Python Machine Learning Guided Project, Early Diabetes Prediction

Follow along with this Python ML guided project. In this beginner Python project, we build a classification model in Sklearn that uses machine learning to predict whether or not a person has early-stage diabetes.

Learn the basics of setting up a data science project, work through exploratory data analysis to understand the data, and then use the insights you've collected to build a machine learning model in Python with Sklearn.

Template Workbook

Solution Workbook

One on one time with Data Science Teacher Brandyn

Data Science Teacher Brandyn YouTube Channel

Dataset on Kaggle

Follow Data Science Teacher Brandyn

On Facebook

On LinkedIn

Data Groups:

Showcase your DataArt on Facebook

Showcase your DataArt on LinkedIn

Python data analysis group, share your analysis on Facebook

Python data analysis on LinkedIn

Machine learning in Sklearn group

Join the deep learning with TensorFlow Facebook group

Join the deep learning with TensorFlow group on LinkedIn

Use sns.histplot in a for loop to plot the univariate distributions.

An important part of machine learning projects is understanding distributions in your data. Here we plot many histograms to inspect the distributions of our features.

Notes on the distributions are important for understanding what needs to be done in preprocessing.

As we go through the exploratory data analysis, we will gather insights related to building our machine learning model. We will use these insights to guide how we preprocess the data.

Treating outliers is an important preprocessing step to get the data ready for our model.

Outliers can have a big impact on how well our ML models predict. We will use the pandas .clip method to truncate (cap) outliers. This is a good way to deal with outliers when a feature is roughly normally distributed.
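A minimal sketch of clipping with `Series.clip`, assuming the thresholds are the mean plus or minus three standard deviations. That cutoff is a common rule of thumb for roughly normal data, not one stated by the project. The helper `clip_outliers` and the `bmi` values are hypothetical.

```python
import numpy as np
import pandas as pd


def clip_outliers(s: pd.Series, n_std: float = 3.0) -> pd.Series:
    """Cap values lying more than n_std standard deviations from the mean (hypothetical helper)."""
    mean, std = s.mean(), s.std()
    return s.clip(lower=mean - n_std * std, upper=mean + n_std * std)


# Synthetic, roughly normal feature with two extreme values appended.
rng = np.random.default_rng(1)
bmi = pd.Series(np.append(rng.normal(32, 7, 200), [95.0, -10.0]), name="bmi")

clipped = clip_outliers(bmi)
print(bmi.max(), "->", clipped.max())  # the extreme value is pulled down to the upper threshold
```

To treat every numeric column at once, `df[cols] = df[cols].apply(clip_outliers)` applies the same rule column by column. Computing the thresholds on the training split only, and then clipping the test split with those same values, avoids leaking test information into preprocessing.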

Set up your train test split to allow for experimentation with features.

Set up your train test split with Sklearn in a way that lets you experiment with different features as you head into the modeling section of our ML project.
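A minimal sketch, assuming a DataFrame with a binary target column. The column names (`age`, `bmi`, `glucose`, `outcome`) and the random data are hypothetical placeholders. Keeping the feature names in one list means that trying a different feature set only requires editing that list and rerunning the split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with a 0/1 target.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "age": rng.normal(48, 12, 300),
    "bmi": rng.normal(32, 7, 300),
    "glucose": rng.normal(120, 30, 300),
    "outcome": rng.integers(0, 2, 300),
})

TARGET = "outcome"
features = ["age", "bmi", "glucose"]  # edit this list to experiment with feature sets

X = df[features]
y = df[TARGET]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

`stratify=y` keeps the class balance similar in both splits, and a fixed `random_state` makes runs with different feature lists comparable.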

Use StandardScaler and PCA in Sklearn to perform model preprocessing.

Standardize the features and take their principal components so our model can make better use of the data. In Sklearn, use StandardScaler for the standardization and PCA to get the principal components.
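A minimal sketch of the two steps, using synthetic data from `make_classification` in place of the diabetes features. Choosing `n_components=0.95` (keep just enough components to explain 95% of the variance) is an assumption for illustration, not necessarily the setting used in the project. Both transformers are fit on the training split only and then applied to the test split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project's feature matrix and labels.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize: learn mean and std on the training data, reuse them on the test data.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# PCA: keep the smallest number of components explaining at least 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full")
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```

Wrapping both steps in `sklearn.pipeline.make_pipeline(StandardScaler(), PCA(...), model)` is an alternative that keeps this fit-on-train-only ordering automatic.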

Create a model_factory function to allow easy testing of many different models.

A good practice with Sklearn is to build a model factory function that trains each model and collects all of its scores, so that each experiment takes one line of Python.

Here we build a model factory function that creates either a RandomForest or a Bagging machine learning model, trains it, and returns the train and test scores for each Sklearn model.
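A minimal sketch of a `model_factory`, assuming it takes a model name plus the train/test splits and returns the fitted model with its train and test accuracy. The signature, the `"random_forest"` and `"bagging"` keys, and the synthetic data are hypothetical; only the function name and the two model families come from the project description.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split


def model_factory(model_name, X_train, X_test, y_train, y_test, **params):
    """Build, fit and score one model; returns (model, train_score, test_score)."""
    models = {
        "random_forest": RandomForestClassifier,
        "bagging": BaggingClassifier,
    }
    if model_name not in models:
        raise ValueError(f"unknown model: {model_name!r}")
    model = models[model_name](**params)  # extra keyword arguments go to the estimator
    model.fit(X_train, y_train)
    # For classifiers, .score returns mean accuracy.
    return model, model.score(X_train, y_train), model.score(X_test, y_test)


# Synthetic stand-in data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for name in ["random_forest", "bagging"]:
    _, train_acc, test_acc = model_factory(
        name, X_train, X_test, y_train, y_test, random_state=0
    )
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```

Seeing both scores side by side makes overfitting easy to spot: a train score far above the test score suggests the model is memorizing the training data.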

Python Machine Learning Guided Project, Early Diabetes Prediction - Level 2, 15 minutes
