I picked up my first machine learning dataset from this list, and after spending a few days on exploratory analysis and massaging the data, I reached an accuracy of 78.78%.
The code for this can be downloaded from GitHub, or you can run it directly on Kaggle.
Here’s how I did it:
After carefully examining the data, I categorized the Insulin and Diabetes Pedigree Function features. I then did a train/test split to prepare for analysis, and standardized the features using StandardScaler() from sklearn.
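A minimal sketch of that preprocessing, under several assumptions: the column names follow the common Pima Indians Diabetes CSV layout (`Insulin`, `DiabetesPedigreeFunction`, `Outcome`), "categorizing" is done by binning with `pandas.cut`, and the bin edges are hypothetical. The data here is a random placeholder so the snippet runs on its own; it is not the author's code or data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data (assumption): column names mimic the Pima CSV, values are random.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Glucose": rng.integers(50, 200, n),
    "BMI": rng.uniform(18, 50, n),
    "Insulin": rng.integers(0, 600, n),
    "DiabetesPedigreeFunction": rng.uniform(0.08, 2.4, n),
    "Outcome": rng.integers(0, 2, n),
})

# Categorize the two features into integer bin codes (hypothetical bin edges).
df["Insulin"] = pd.cut(df["Insulin"], bins=[-1, 0, 100, 200, np.inf], labels=False)
df["DiabetesPedigreeFunction"] = pd.cut(
    df["DiabetesPedigreeFunction"], bins=[0, 0.3, 0.6, 1.0, np.inf], labels=False
)

X = df.drop(columns="Outcome")
y = df["Outcome"]

# Split first, then fit the scaler on the training set only,
# so statistics from the test set do not leak into training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler after the split (rather than on the whole dataset) keeps the test set truly unseen, which matters when comparing accuracies as close as a few tenths of a percent.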
After trying several algorithms (Logistic Regression, Random Forest, and XGBoost), I tried a Support Vector Machine (SVM) with a linear kernel and got an accuracy of 78.78% on this dataset. This was by far the highest accuracy I could get consistently.
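A rough sketch of how such a comparison might look with scikit-learn, assuming synthetic data from `make_classification` in place of the real dataset (so the printed accuracies say nothing about 78.78%). XGBoost is left out because it is a separate third-party package; a linear-kernel SVM is `SVC(kernel="linear")`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data (assumption): 8 numeric features, binary target.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM (linear kernel)": SVC(kernel="linear"),
}

# Fit each model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.4f}")
```

Using one shared split and one shared scaler for every model keeps the comparison fair; any difference in accuracy then comes from the models rather than the data preparation.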
I also noticed that the regularization parameter “C” had no noticeable impact on the final accuracy of the SVM.
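One way to check an observation like this is to sweep C over several orders of magnitude and compare cross-validated accuracy. The sketch below assumes synthetic placeholder data and an arbitrary grid of C values; the pipeline refits `StandardScaler` inside each fold so scaling never sees the held-out fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data (assumption), not the real dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

c_values = [0.01, 0.1, 1, 10]  # hypothetical grid
mean_acc = {}
for c in c_values:
    # Scaler + linear SVM, refit from scratch in each of the 5 folds.
    pipe = make_pipeline(StandardScaler(), SVC(kernel="linear", C=c))
    mean_acc[c] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
    print(f"C={c}: mean CV accuracy {mean_acc[c]:.4f}")
```

If the mean accuracies stay nearly flat across the grid, that is consistent with C not mattering for this model; averaging over folds helps rule out a quirk of a single train/test split.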
Happy “Machine” Learning