Tips on Feature Engineering

- Try to fit how classifiers work; giving a geometry problem to a tree, oversized dimension to a kNN, or interval data to an SVM is not a good idea.
- Remove as many nonlinearities as possible; expecting that some classifier will do Fourier analysis inside is rather naive (even if it does, it will waste a lot of complexity there).
- Make features generic to all objects so that some sampling in the chain won't knock them out.
- Check previous works: often a transformation used for visualisation or for testing similar types of data is already tuned to uncover interesting aspects.
- Avoid unstable, optimizing transformations like PCA, which may lead to overfitting.
- Experiment a lot.
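As a small illustration of the tip about removing nonlinearities, the sketch below fits a linear model to a raw feature and to a log-transformed one. The data are simulated and every name in it (`x`, `y`, `fit_raw`, `fit_log`) is hypothetical; it only assumes a response that happens to depend on the logarithm of the feature.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulated data (an assumption): y is linear in log(x), not in x
x <- runif(200, min = 1, max = 100)
y <- 3 * log(x) + rnorm(200, sd = 0.3)

# Linear model on the raw feature versus on the linearized feature
fit_raw <- lm(y ~ x)
fit_log <- lm(y ~ log(x))

# Compare how much variance each model explains
r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared
cat("R^2 raw feature:", r2_raw, "\n")
cat("R^2 log feature:", r2_log, "\n")
```

Handing the model the already linearized feature spares it from spending capacity on approximating the curve.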

October 8, 2017 · 1 min · 113 words · C Ried

Great Statistics Books to Read

Below you will find a number of the best books to learn more about statistics and its philosophy.

- Opinionated Lessons on Statistics
- Introduction to Statistical Learning
- The Elements of Statistical Learning
- Applied Predictive Modeling
- Statistical Inference
- Statistical Rethinking
- Data Analysis Using Regression and Multilevel/Hierarchical Models
- Mostly Harmless Econometrics
- Mastering Metrics: The Path from Cause to Effect
- All of Statistics
- Statistics
- Statistics for Experimenters
- Think Bayes
- Computer Age Statistical Inference
- Think Stats
- Machine Learning for Hackers
- Probability and Statistics
- Statistical Evidence: A likelihood paradigm

October 6, 2017 · 1 min · 83 words · C Ried

AB Testing in R from Scratch

Using Bayesian Systems

- Quantify the probability of all possibilities, thus measuring risk
- Insert institutional knowledge (add knowledge that changes the probability)
- Learn in an online fashion

A/B Testing with Approximate Bayesian Computation

- No mathematics required
- Able to implement from scratch

A/B Testing

- Measures and figures out the better design

Approximate Bayesian Computation

- Generate a trial value for the thing we want to know (in this case it's the conversion fraction of a layout)
- Simulate our data assuming the trial value
- If the simulation looks like the real data, keep the trial value, otherwise discard it and try again
- Keep doing this until we've got lots of trial values that worked

    library(progress)
    library(ggplot2)
    library(reshape2)

    # Variables
    n_visitors_a <- 100  # number of visitors shown layout A
    n_conv_a <- 4        # number of visitors shown layout A who converted (4%)
    n_visitors_b <- 40   # number of visitors shown layout B
    n_conv_b <- 2        # number of visitors shown layout B who converted (5%)

Using Bayesian Systems...
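The Approximate Bayesian Computation steps listed in this excerpt can be sketched in base R roughly as follows. This is a minimal sketch and not necessarily the post's own implementation: the uniform prior, the exact-match acceptance rule, the number of draws, and the helper name `abc_conversion` are all assumptions made for illustration. Only the visitor and conversion counts come from the post.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical helper: rejection-sampling ABC for one layout's conversion fraction
abc_conversion <- function(n_visitors, n_conv, n_draws = 100000) {
  # 1. Generate trial values from a uniform prior on [0, 1] (an assumption)
  trial_rate <- runif(n_draws)
  # 2. Simulate the number of conversions for each trial value
  sim_conv <- rbinom(n_draws, size = n_visitors, prob = trial_rate)
  # 3. Keep only the trial values whose simulation matches the observed count
  trial_rate[sim_conv == n_conv]
}

# Observed data from the post
post_a <- abc_conversion(n_visitors = 100, n_conv = 4)
post_b <- abc_conversion(n_visitors = 40, n_conv = 2)

# Pair up accepted draws to estimate how often layout B beats layout A
n_keep <- min(length(post_a), length(post_b))
prob_b_better <- mean(post_b[seq_len(n_keep)] > post_a[seq_len(n_keep)])

cat("Accepted trial values (A, B):", length(post_a), length(post_b), "\n")
cat("Estimated P(B converts better than A):", prob_b_better, "\n")
```

Requiring an exact match keeps the acceptance rule simple, which works here because the counts are small; with larger data sets a tolerance around the observed count would usually be needed to accept enough trial values.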

September 29, 2017 · 3 min · 518 words · C Ried