In the second part of the Python Guided Machine Learning Project, the data scientist picks up where the data analyst left off. We use the data analyst's insights to guide the data scientist's preprocessing strategy for machine learning.
This is extremely helpful in a team setting, because the data scientist can focus on building the model. And as we will see, there is a lot to try when building a model.
Here we go a step further than just selecting the best model: we use a pairplot in Seaborn to plot the hyperparameters from our Sklearn grid search against the mean test score, to understand what is really impacting the output of the model.
This workflow is also set up with an experimental science approach: it makes it easy to change the preprocessing and feature selection and run the experiment again. We use Pandas, Seaborn and Sklearn in Python to make predictions for this Kaggle competition.
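One way to get that kind of easy swapping is to put each stage in an Sklearn Pipeline. This is only a minimal sketch, not necessarily how the project is built: the step names, the scalers, the feature selector, the gradient boosting model and the synthetic data are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in data (assumption): the project uses the Kaggle competition data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical step names and choices; each stage is a named, replaceable part.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=5)),
    ("model", GradientBoostingClassifier(random_state=0)),
])
pipe.fit(X, y)
baseline = pipe.score(X, y)  # training accuracy of the first experiment

# Next experiment: swap the scaler and keep fewer features, then refit.
pipe.set_params(scale=MinMaxScaler(), select__k=3)
pipe.fit(X, y)
variant = pipe.score(X, y)  # training accuracy of the second experiment
```

Because every stage has a name, changing one part of the experiment is a single set_params call and the rest of the workflow stays the same.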
Follow Data Science Teacher Brandyn
dataGroups:
Use pairplot in Seaborn to plot each hyperparameter against the other hyperparameters in a grid search and really understand the effect each hyperparameter has on the final test score.
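Here is a rough sketch of the idea, under some assumptions: the data is synthetic, the model is a gradient boosting classifier, and the grid values are made up for illustration. Only GridSearchCV, its cv_results_ attribute and Seaborn's pairplot come from the workflow described here.

```python
import pandas as pd
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): the project uses the Kaggle competition data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical grid; the values are illustrative only.
param_grid = {
    "n_estimators": [10, 30, 50],
    "learning_rate": [0.05, 0.1, 0.5],
    "max_depth": [2, 3],
}

search = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

# One row per hyperparameter combination, plus its mean cross-validated score.
results = pd.DataFrame(search.cv_results_["params"])
results["mean_test_score"] = search.cv_results_["mean_test_score"]

# One scatter panel per hyperparameter, all sharing the score on the y-axis.
grid = sns.pairplot(results, x_vars=list(param_grid), y_vars=["mean_test_score"])
```

Leaving out x_vars and y_vars gives the full grid, with every numeric column plotted against every other one. In a script, call matplotlib's plt.show() afterwards to display the figure.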
A huge benefit of seeing all the hyperparameters next to each other is that we can start to understand what is actually impacting the final predictions. The large range of n_estimators values that all produce similar top scores suggests that n_estimators is not the most important hyperparameter by itself.
When several settings score about the same, it is better to go with the simplest version of what works best.
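One possible way to turn "simplest version of what works best" into code is to keep every combination that scores within a small tolerance of the best one, then take the one with the fewest estimators. The helper name, the tolerance and the toy numbers below are all hypothetical; they are not results from the project.

```python
import pandas as pd

def simplest_near_best(results, complexity_col="n_estimators", tolerance=0.005):
    """Return the lowest-complexity row scoring within `tolerance` of the best."""
    best = results["mean_test_score"].max()
    near_best = results[results["mean_test_score"] >= best - tolerance]
    return near_best.sort_values(complexity_col).iloc[0]

# Made-up scores, only to exercise the helper (assumption, not project output).
toy = pd.DataFrame({
    "n_estimators": [50, 100, 200, 400],
    "mean_test_score": [0.790, 0.829, 0.831, 0.830],
})

choice = simplest_near_best(toy)
print(choice["n_estimators"])  # 100.0: close to the best score with the fewest trees
```

The same helper can be pointed at a DataFrame built from a real grid search, as long as it has a mean_test_score column.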
We can see that deviance appears most often among the top models for the loss hyperparameter. That gives us confidence that it is the loss we should be using in our model.
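The same check can be done with a quick count instead of by eye. The sketch below assumes the grid search results are in a DataFrame with a loss column and a mean_test_score column. The helper name and the toy scores are hypothetical and only there to show the mechanics.

```python
import pandas as pd

def top_value_counts(results, column, k=10):
    """Count how often each value of `column` appears among the k best-scoring rows."""
    top = results.nlargest(k, "mean_test_score")
    return top[column].value_counts()

# Made-up results table (assumption, not project output).
toy = pd.DataFrame({
    "loss": ["deviance", "exponential", "deviance", "deviance", "exponential", "deviance"],
    "mean_test_score": [0.83, 0.82, 0.81, 0.80, 0.78, 0.77],
})

counts = top_value_counts(toy, "loss", k=4)
print(counts)  # deviance appears 3 times in the top 4, exponential once
```

If the model is Sklearn's gradient boosting classifier (an assumption here), note that newer Sklearn releases name this loss log_loss instead of deviance.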