Python Data Analysis Guided Project - Analyze Dog Breeds, Level 2, 31 min

Use Seaborn to understand how fur color affects the height of dogs in this data analysis project


In this Python data analysis guided project, we explore dog breeds using a Kaggle data set. We apply beginner-level Python data analysis skills to understand the eye color, fur color, and height of common dog breeds.

We start the project with a little preprocessing to enable our analyses. This is needed because some features are semi-structured: a single cell holds a list whose length varies from row to row. The character traits feature, for example, lists a different number of traits for each breed, which is not amenable to data analysis. To solve this, we use Pandas' explode function to turn these features into structured data ready for analysis.

After completing a univariate analysis of each feature, we move on to a bivariate analysis to determine how one column affects another.

We will look at how the dogs' fur color, character traits, and common health issues affect their height and life span. We use Seaborn's histplot with the hue argument to give each category its own color in the histogram.

Template Workbook

Solutions Workbook


Dataset on Kaggle


Turn a feature from semi-structured to structured data with Pandas' explode

A common problem is that a feature holds a list of categories whose length differs from row to row. To fix this, we first use Pandas' split function to turn what is one long string into an actual list data type. Once the string is a list, the feature is ready for Pandas' explode function.
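As a minimal sketch of the split-then-explode step, the example below uses a small made-up DataFrame. The column names (breed, fur_color) and the comma-separated format are assumptions for illustration, not the actual layout of the Kaggle data set.

```python
import pandas as pd

# Hypothetical sample rows; column names and the ", " separator are assumptions.
df = pd.DataFrame({
    "breed": ["Akita", "Beagle"],
    "fur_color": ["Black, White, Red", "Brown, White"],
})

# Step 1: split the long string into a real Python list in each row.
df["fur_color"] = df["fur_color"].str.split(", ")

# Step 2: explode gives every list element its own row and repeats the
# other columns, so each row now holds exactly one fur color.
exploded = df.explode("fur_color").reset_index(drop=True)
print(exploded)
```

After the explode, each breed appears once per fur color, so functions such as value_counts work on the column directly.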

Pandas Plot on value_counts

Once a feature is structured data, we can complete our data analysis. Here we look at the most common fur colors of dog breeds by calling Pandas' plot on the value_counts output to create a bar graph.
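A bar graph of category counts might look like the sketch below. The fur colors listed are invented sample values, and the Agg backend line is only there so the script runs without a display; it is not needed in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; omit in a notebook
import pandas as pd

# Hypothetical exploded fur-color column (one color per row).
fur = pd.Series(
    ["White", "Black", "White", "Brown", "White", "Black"], name="fur_color"
)

# value_counts returns a Series of counts sorted from most to least common.
counts = fur.value_counts()

# Series.plot with kind="bar" draws one bar per category.
ax = counts.plot(kind="bar", title="Most common fur colors")
ax.set_xlabel("Fur color")
ax.set_ylabel("Number of rows")
```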

Logical indexing on value_counts

Here we apply logical indexing to the output of Pandas' value_counts function so that we plot only the values with a count greater than one, which keeps the plot user-friendly.
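The filtering step can be sketched as follows, using invented trait values. The boolean mask counts > 1 keeps only the categories that occur more than once.

```python
import pandas as pd

# Hypothetical exploded character-traits column.
traits = pd.Series(
    ["Loyal", "Friendly", "Loyal", "Alert", "Friendly", "Loyal", "Stubborn"]
)

counts = traits.value_counts()

# Logical (boolean) indexing: the mask is True where the count exceeds one,
# so rare categories that would clutter the plot are dropped.
common = counts[counts > 1]
print(common)
```

The filtered Series can then be passed to .plot(kind="bar") just like the unfiltered counts.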

Numeric features stored as a range of values

In our Python data analysis project, we noticed that the height feature had the object data type when we first called Pandas' info function, which gives a count of the non-null values and the data type of each column in our DataFrame.

Upon inspecting this column, we see that each value is a range of heights, so we need to clean this feature before we can analyze it.

Create functions to extract values from string, then apply functions

To extract the values we need from the string in this feature, we create two user-defined functions: one for the minimum value and one for the maximum.

After creating each function, we use Pandas' apply function to run it on the column and save the output to a new column.
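One way the two functions and the apply step could look is sketched below. The range format ("21-24"), the column names, and the function names get_min_height and get_max_height are all assumptions for illustration; the real column may use a different separator or include units that need stripping first.

```python
import pandas as pd

# Hypothetical rows; the "low-high" format is an assumption about the data.
df = pd.DataFrame({
    "breed": ["Akita", "Beagle", "Pug"],
    "height": ["21-24", "13-15", "8-10"],
})

def get_min_height(height_range: str) -> float:
    """Return the lower bound of a 'low-high' range string."""
    return float(height_range.split("-")[0].strip())

def get_max_height(height_range: str) -> float:
    """Return the upper bound of a 'low-high' range string."""
    return float(height_range.split("-")[-1].strip())

# apply calls the function once per value and returns a new Series,
# which we save as a new numeric column.
df["min_height"] = df["height"].apply(get_min_height)
df["max_height"] = df["height"].apply(get_max_height)
print(df)
```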

Pandas' Plot to plot the distribution of height

After extracting the min and max values from the string, we use Pandas' plot with kind="hist" to plot the distribution of the continuous variable.
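A histogram of an extracted height column might be drawn as in this sketch. The max_height values are made up, and the number of bins is an arbitrary choice for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; omit in a notebook
import pandas as pd

# Hypothetical numeric column produced by the extraction step.
heights = pd.DataFrame(
    {"max_height": [24.0, 15.0, 10.0, 28.0, 22.0, 12.0, 26.0, 18.0]}
)

# kind="hist" bins the continuous values and draws one bar per bin.
ax = heights["max_height"].plot(
    kind="hist", bins=5, title="Distribution of max height"
)
ax.set_xlabel("Max height")
```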

Use hue argument in Seaborn to change color by category

Lastly, with each feature converted from a semi-structured to a structured format, we are able at the end of the project to understand how fur color and common health problems affect the height and longevity of common dog breeds.
