In today’s tutorial we will learn how to calculate the standard deviation of a Pandas DataFrame. We’ll calculate the standard deviation for several cases:
- A Pandas Series
- One or more DataFrame columns
- Each row in a Pandas DataFrame
- A groupby object
We’ll start by importing the Pandas library and reading a CSV file with our data into a new DataFrame.
# Import Pandas library
import pandas as pd

# Create DataFrame by reading a CSV file
survey = pd.read_csv('hr_data.csv')
Here’s the DataFrame:
Case 1: Calculate the standard deviation of a Pandas Series
In this simple example, we’ll call the std() method on a single Series (column).
# standard deviation of a Series
survey['avg_salary'].std()
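Note that std() returns the sample standard deviation by default (ddof=1, dividing by n - 1). The sketch below uses a small made-up Series, not the survey data, to contrast it with the population standard deviation (ddof=0).

```python
import pandas as pd

# Hypothetical sample values, not taken from hr_data.csv
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

# Default: sample standard deviation (divides by n - 1)
sample_std = s.std()

# Population standard deviation (divides by n)
population_std = s.std(ddof=0)

print(round(sample_std, 4))  # 2.1381
print(population_std)        # 2.0
```

If your numbers need to match NumPy's np.std(), which defaults to ddof=0, pass ddof=0 explicitly.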
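Case 2: Standard deviation of one or more columns
In this case we will calculate the standard deviation for all columns or only for specific ones.
For the entire DataFrame:
# standard deviation of each numeric column
survey.std(numeric_only=True)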
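A minimal sketch, assuming a made-up DataFrame that mixes a text column with numeric ones. Calling std() on the whole frame returns one value per column. Since pandas 2.0, non-numeric columns are no longer dropped silently (a TypeError is raised instead), so numeric_only=True is the safe choice for mixed frames.

```python
import pandas as pd

# Hypothetical data standing in for the survey DataFrame
df = pd.DataFrame({
    "language": ["Python", "R", "Python", "Java"],
    "num_cand": [10, 4, 8, 6],
    "avg_salary": [90000, 85000, 95000, 88000],
})

# One sample standard deviation per numeric column; "language" is skipped
col_std = df.std(numeric_only=True)
print(col_std)
```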
For specific columns:
We’ll first subset the DataFrame according to specific column labels and then call the std() method.
# standard deviation of selected columns
cols = ['num_cand', 'avg_salary']
survey[cols].std()
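The double brackets matter here: indexing with a list of labels returns a DataFrame, so std() gives a Series with one entry per selected column, while a single label returns a Series and std() gives a single number. A sketch with hypothetical values (the "years" column is made up to show that unselected columns are left out):

```python
import pandas as pd

# Hypothetical values standing in for the survey data
df = pd.DataFrame({
    "num_cand": [10, 4, 8, 6],
    "avg_salary": [90000, 85000, 95000, 88000],
    "years": [3, 5, 2, 4],
})

cols = ["num_cand", "avg_salary"]

# List of labels -> DataFrame -> std() returns a Series indexed by column label
subset_std = df[cols].std()

# Single label -> Series -> std() returns one scalar
single_std = df["num_cand"].std()

print(subset_std)
print(single_std)
```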
Case 3: Standard deviation of each row in a Pandas DataFrame
To calculate the standard deviation across each row rather than down each column, we’ll pass the axis=1 parameter.
# standard deviation of each row (numeric columns only)
survey.std(axis=1, numeric_only=True)
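A row-wise sketch on hypothetical numeric data. With axis=1 the result has one value per row, aligned with the DataFrame's index, and missing values are skipped by default (skipna=True), so a row with a NaN is computed from its remaining values.

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly figures, one row per team
df = pd.DataFrame(
    {"q1": [10.0, 5.0, 7.0], "q2": [12.0, 5.0, np.nan], "q3": [14.0, 5.0, 9.0]},
    index=["team_a", "team_b", "team_c"],
)

# Sample standard deviation across the columns of each row
row_std = df.std(axis=1)
print(row_std)
```

Here team_b has identical values, so its standard deviation is 0, and team_c is computed from its two non-missing values only.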
Case 4: Standard deviation with groupby
In this example we’ll:
- Group the data by one (or more) columns.
- Compute an aggregate for each group, in this case the standard deviation of the salary figures.
# standard deviation per group
data.groupby('language').agg(salary_std=('salary', 'std'))
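A self-contained sketch of the same pattern, assuming a hypothetical DataFrame with language and salary columns. Named aggregation, written as output_name=(column, function), lets you choose the label of the result column. A group containing a single row gets NaN, because the sample standard deviation needs at least two values.

```python
import pandas as pd

# Hypothetical salary records; "language" and "salary" are assumed column names
data = pd.DataFrame({
    "language": ["Python", "Python", "Python", "R", "R", "Java"],
    "salary": [90000, 100000, 110000, 80000, 84000, 95000],
})

# Group by language, then compute the sample std of salary for each group
salary_std = data.groupby("language").agg(salary_std=("salary", "std"))
print(salary_std)
```

Group keys are sorted by default, so the result is indexed Java, Python, R, and Java shows NaN because it has only one record.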
Plot the standard deviation
To quickly plot the standard deviation figures as a simple chart, we can use the Pandas DataFrame.plot() method.
Note that we can also create more sophisticated charts by using the Matplotlib and Seaborn libraries to their full extent.
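A minimal plotting sketch, using hypothetical salary data (the column names are assumptions). DataFrame.plot(kind="bar") draws through Matplotlib, so Matplotlib must be installed. The Agg backend line only lets the sketch run without a display; drop it, and call plt.show() instead of saving, when working interactively.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical salary records; column names are assumptions
data = pd.DataFrame({
    "language": ["Python", "Python", "R", "R", "Java", "Java"],
    "salary": [90000, 110000, 80000, 84000, 95000, 99000],
})

# Standard deviation of salary per language
salary_std = data.groupby("language").agg(salary_std=("salary", "std"))

# Bar chart with one bar per language
ax = salary_std.plot(kind="bar", legend=False,
                     title="Salary standard deviation by language")
ax.set_ylabel("Standard deviation")
plt.tight_layout()
plt.savefig("salary_std.png")  # write the chart to a PNG file
```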
Here’s our chart: