Multi-armed bandit algorithms vs classical A/B Testing

How do you know which ad to serve to a customer to get more conversions, or which version of a website to show?

A multi-armed bandit (“MAB”) works by assigning weights to multiple experiments (“arms”), often using an algorithm known as epsilon-greedy, and follows an explore-vs-exploit strategy to choose which arm to show.
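As a rough sketch of the idea (the arm conversion rates below are made up purely for illustration):

```python
import random

# Minimal epsilon-greedy sketch: with probability epsilon we explore a
# random arm; otherwise we exploit the arm with the best observed
# average reward. The "true" conversion rates are hypothetical.
def epsilon_greedy(true_rates, epsilon=0.1, n_rounds=10000, seed=42):
    rng = random.Random(seed)
    counts = [0] * len(true_rates)    # pulls per arm
    values = [0.0] * len(true_rates)  # running average reward per arm
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))                        # explore
        else:
            arm = max(range(len(true_rates)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values

# Three ads with assumed conversion rates of 3%, 5%, and 8%.
counts, values = epsilon_greedy([0.03, 0.05, 0.08])
```

Over time the bandit shifts most of its traffic to the best-performing arm instead of splitting traffic evenly for the whole experiment, which is the key practical difference from a fixed A/B split.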

In ‘classical’ A/B testing, you conclude that experiment B is significant if the confidence is more than 95% (i.e., p < 0.05).
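To illustrate that threshold, here’s a minimal sketch of a classical two-proportion z-test on made-up conversion counts (all numbers are illustrative, not from any real experiment):

```python
from math import sqrt, erf

# Two-proportion z-test: is variant B's conversion rate significantly
# higher than A's? Uses the pooled standard error and the normal CDF.
def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # One-sided p-value for "B is better than A" via the normal CDF.
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical results: A converts 200/5000, B converts 250/5000.
p = ab_test_p_value(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
significant = p < 0.05  # i.e., more than 95% confidence
```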

MAB is especially useful when you have more than two experiments to run and want to see which one gives better conversions; this is where it truly shines, and classical A/B testing lacks good support for this type of testing.

Here’s a paper by Google that performs several such tests and reports the results.

This article challenges the ‘MAB is better than A/B testing’ argument with some tests that compare the two and get similar results.

Watch this quick intro to learn about multi-armed bandits.

Next, read about ‘contextual bandits’, a strategy Netflix uses to show personalized artwork for its shows to maximize views.


Thoughts on Course 1 of Prof. Andrew Ng’s specialization

I recently passed the first course of the specialization on Coursera and decided to share some thoughts as someone new to Deep Learning.



Course 1 is structured very nicely. Week 1 starts from the very basics of neural networks, though as the course prerequisites say, you need some experience with Machine Learning, ideally in Python, to grasp the concepts presented in these lectures.

The following week covers the basics of binary classification and logistic regression, including the cost function, gradient descent, and derivatives, as well as the basics of vectorization in Python, with examples showing why vectorization is so important in Deep Learning.
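The vectorization point is easy to see for yourself; here’s a quick, self-contained comparison of a plain Python loop against NumPy’s np.dot for the same dot product (array size chosen arbitrarily):

```python
import time
import numpy as np

# The same dot product computed two ways: an explicit Python loop
# versus NumPy's vectorized np.dot.
n = 500_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.perf_counter()
loop_result = 0.0
for i in range(n):
    loop_result += a[i] * b[i]
loop_time = time.perf_counter() - t0

t0 = time.perf_counter()
vec_result = np.dot(a, b)
vec_time = time.perf_counter() - t0
```

On typical hardware the vectorized version is orders of magnitude faster, which is exactly why deep learning code is written with matrix operations rather than explicit loops.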

I feel there cannot be a simpler way to present these concepts. Each lecture video builds on the one before it, which makes it easy to digest information in chunks or to go over a particular concept multiple times to fully understand it.

In week 3 you implement a shallow neural network with the knowledge gained from previous lectures. You learn that a shallow neural network is a neural network with one hidden layer; this week you build and use activation functions, vectorization, cost computation, gradient descent, and more.

Finally, in week 4 you learn to implement a fully connected deep neural network, including forward and backward propagation. This week also explains the difference between parameters and hyperparameters and why they are important, a topic covered further in Course 2.


Quizzes and assignments are nicely prepared to help you gauge your own understanding, and they cover all the important concepts delivered throughout the lectures. I recommend spending some time there to fully understand each week’s lectures before jumping to the following week.

I would highly recommend taking this course to anyone with experience in Machine Learning looking to start with Deep Learning.


How to approach an ML problem: a beginner’s perspective

How do you approach a Machine Learning problem? In my limited experience participating in Kaggle competitions and learning from other kernels, below is a set of steps that I have come to follow, though not necessarily in this order.

Imputing null values

Removing instances with null values is usually not a good approach. Many people impute null values with the mean, or with the median when there are too many outliers and the mean is not a good representation. Fortunately, scikit-learn makes it really easy to impute null values, so this is probably the first step.
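A minimal sketch with scikit-learn’s SimpleImputer, on a made-up feature matrix:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix with missing values (illustrative only).
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 5.0]])

# Replace NaNs with the column mean, or with the median when
# outliers make the mean a poor representative value.
mean_imputer = SimpleImputer(strategy="mean")
median_imputer = SimpleImputer(strategy="median")

X_mean = mean_imputer.fit_transform(X)
X_median = median_imputer.fit_transform(X)
```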

Before imputing null values, it is important to understand why these values were missing in the first place. Was it a result of human error? Or is it an industrial system where periodic missing values are common? Understanding this will help you replace missing values with appropriate ones.

But if you really need to keep the null values, use a tree-based algorithm; some implementations, such as XGBoost, can handle missing values natively.

Removing Multicollinearity

When you’re dealing with many features, it is important to identify inter-correlated features; they don’t necessarily add value to your model and can negatively affect it. One way that I have found to check for them in Python is the VIF, or Variance Inflation Factor; setting a moderate VIF threshold is important for getting rid of unwanted features in your dataset.
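As an illustration, statsmodels provides a variance_inflation_factor helper; here’s a sketch on synthetic data where one column is nearly a linear combination of the others, so its VIF should blow up:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic data: x3 is almost exactly x1 + x2, i.e. highly collinear.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + rng.normal(scale=0.05, size=200)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# VIF for each column: how well it is predicted by the other columns.
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns)}

# A common rule of thumb is to drop features with VIF above ~5-10.
high_vif = [col for col, v in vifs.items() if v > 10]
```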

Outliers and normal distribution

Another very important check is detecting outliers in your dataset and visualizing them using box plots. I have seen approaches where people set upper and lower limits using np.percentile(array, 99) and np.percentile(array, 5) (note that np.percentile expects percentiles on the 0–100 scale) and clip everything above or below this range to the values returned by np.percentile().
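A sketch of that clipping approach on synthetic data with a few injected outliers:

```python
import numpy as np

# Synthetic, roughly normal data with a handful of extreme outliers.
rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=10, size=1000)
data[:5] = [500, -400, 600, -300, 450]  # injected outliers

# Percentile-based limits (0-100 scale), then clip to that range:
# values outside the bounds are set to the bounds themselves.
lower = np.percentile(data, 1)
upper = np.percentile(data, 99)
clipped = np.clip(data, lower, upper)
```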

Also, if the data is right- or left-skewed it can hurt your model’s performance, so it is important to fix the skewness, for example by taking a log to get closer to a normal distribution; but a plain log transform won’t work if your data contains zeros.
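One way around the zeros is np.log1p, which computes log(1 + x) and is defined at zero; a sketch on synthetic right-skewed data:

```python
import numpy as np
from scipy.stats import skew

# Synthetic right-skewed data (exponential), including a zero to show
# that np.log1p still produces finite values where np.log would not.
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=5000)
data[0] = 0.0

before = skew(data)             # strongly right-skewed
transformed = np.log1p(data)    # log(1 + x), safe at zero
after = skew(transformed)       # much closer to symmetric
```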

Label encoding for categorical variables

If the data contains categorical variables, such as the model of a car (Honda, Toyota, Nissan), it is imperative to convert them to numbers using LabelEncoder, one-hot encoding, or get_dummies in pandas, as appropriate.
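A sketch of both encodings on a made-up “Model” column:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"Model": ["Honda", "Toyota", "Nissan", "Honda"]})

# Label encoding: one integer per category (classes are sorted
# alphabetically, so Honda=0, Nissan=1, Toyota=2). The implied order
# is arbitrary, which is why this suits tree models best.
le = LabelEncoder()
df["Model_label"] = le.fit_transform(df["Model"])

# One-hot encoding via pandas get_dummies: one binary column per
# category, with no artificial ordering.
dummies = pd.get_dummies(df["Model"], prefix="Model")
```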

Correlation check with output variable or finding the most important features

It is important to check the correlation with your target variable and recognize your most important features; scikit-learn makes this really easy. You can also generate a heat map or scatter plot to visualize correlations and choose your important features.

We can also use Random Forest or XGBoost, which provide a list of feature importances for your dataset; we can start by discarding the least important features to get a better model. We should also use cross-validation to test our model on held-out training data.
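A sketch of both checks on synthetic data, where one feature drives the label and the other is pure noise (Random Forest stands in here, but the same pattern works with XGBoost):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: "signal" determines the label, "noise" does not.
rng = np.random.default_rng(0)
X = pd.DataFrame({"signal": rng.normal(size=500),
                  "noise": rng.normal(size=500)})
y = (X["signal"] + 0.1 * rng.normal(size=500) > 0).astype(int)

# Linear correlation of each feature with the target.
correlations = X.corrwith(y)

# Tree-based feature importances.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = dict(zip(X.columns, rf.feature_importances_))

# Cross-validation scores the model on held-out folds.
cv_scores = cross_val_score(rf, X, y, cv=5)
```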

Feature Engineering

One thing that is true of any winning Kaggle entry is building an intuition for the data and engineering features. This cannot be emphasized enough; it really takes creativity and experience to bring new features into your dataset that will make your model more robust.

After the above, I have seen users build their first model, often with XGBClassifier or XGBRegressor, but not before reducing dimensionality with PCA or TruncatedSVD and using FeatureUnion to stack features horizontally when faced with many dimensions. More on this in a later blog post.
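A hedged sketch of that pattern, with LogisticRegression standing in for XGBClassifier to keep the example dependency-free, and synthetic data in place of a real competition dataset:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# A wide synthetic dataset: 100 features, only 10 informative.
X, y = make_classification(n_samples=300, n_features=100,
                           n_informative=10, random_state=0)

# FeatureUnion stacks the outputs of both reducers side by side,
# giving 5 PCA components + 5 SVD components = 10 features.
union = FeatureUnion([
    ("pca", PCA(n_components=5)),
    ("svd", TruncatedSVD(n_components=5)),
])
model = Pipeline([("reduce", union),
                  ("clf", LogisticRegression(max_iter=1000))])
model.fit(X, y)

reduced = model.named_steps["reduce"].transform(X)
train_score = model.score(X, y)
```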

So questions for you dear reader

What do you think of the above, do you find it helpful? What practices do you follow?

Let me know in the comments

Thank you Salahuddin and Irtaza Shah for reviewing this post and sharing your feedback

Q&A: Given the barrage of information around us, how do you handle (and propose to handle) information overload?

A member of the Awesome Artificial Intelligence & Deep Learning Facebook group posted this question, to which Arthur Chan replied. I am sharing the Q&A here so that we can refer to it later.

Question: [summarized] How do you keep up with the barrage of information in Machine Learning and Deep Learning?

Answer: [Arthur Chan] I would go with one basic tutorial first – depends on my need, keep on absorbing new material. e.g. I first start from Ng’s class, since I need to learn more about DL on both CV and NLP, I listen to Karpathy’s and Socher’s. Sooner/later you would feel those classes are not as in-depth, that’s when I took the Hinton’s class. For interest, I audit Silver’s class as well.
But narrow it down, one thing at a time. Choose quality material first rather than following sensational hyped news. As you learn more, your judgement would improve on a topic. Then you can start to come up with a list of resources you want to go through.

For a detailed discussion and answers from other members, refer to the original post.


I highly recommend you join this Facebook group

How I achieved classification accuracy of 78.78% on PIMA Indian Diabetes Dataset

I picked my first Machine Learning dataset from this list, and after spending a few days on exploratory analysis and massaging the data, I arrived at an accuracy of 78.78%.

The code for this can be downloaded from GitHub or you can run it directly on Kaggle

Here’s how I did it

After carefully observing the data, I binned the Insulin and Diabetes Pedigree Function features into categories. I then did a train/test split to prepare for analysis, before standardizing with StandardScaler() from sklearn.


After trying various algorithms (Logistic Regression, Random Forest, and XGBoost), I tried a Support Vector Machine with a linear kernel and got an accuracy of 78.78% on this dataset; this is by far the highest consistent accuracy I achieved.
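The overall workflow looks roughly like this sketch; synthetic data stands in for the PIMA dataset here, so the exact accuracy will differ from the 78.78% reported above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data with the same shape as PIMA (768 rows, 8 features).
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit the scaler on the training split only, then apply to both splits,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Linear-kernel SVM, as described above.
svm = SVC(kernel="linear", C=1.0).fit(X_train_s, y_train)
accuracy = svm.score(X_test_s, y_test)
```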


I also noticed that the regularization parameter “C” didn’t have any noticeable impact on the SVM’s final accuracy.

Happy “Machine” Learning