In today’s tutorial we will learn how to calculate the standard deviation of a Pandas DataFrame. We’ll calculate the standard deviation for several cases:
- A Pandas Series
- One or more DataFrame columns
- Each row in a DataFrame
- A groupby object
Example DataFrame
We’ll start by importing the Pandas library and reading a CSV file with our data into a new DataFrame.
# Import Pandas library
import pandas as pd
# Create DataFrame by reading a csv file
survey = pd.read_csv('hr_data.csv')
Here’s the DataFrame:
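The hr_data.csv file isn't included with this tutorial, so if you want to follow along, here is a minimal stand-in. It is a sketch under stated assumptions: the column names (language, num_cand, avg_salary) are taken from the examples below, but every value is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for hr_data.csv: column names come from the tutorial,
# but all values below are invented for illustration only.
csv_text = """language,num_cand,avg_salary
Python,80,120
Python,100,130
R,40,110
R,60,100
Java,90,125
"""

# read_csv accepts any file-like object, so StringIO can replace a file path
survey = pd.read_csv(io.StringIO(csv_text))
print(survey)
```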
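Calculate the standard deviation of a Pandas Series
In this simple example, we’ll call the std() method on a single Series (column). By default, Pandas returns the sample standard deviation (ddof=1, which divides by n - 1 rather than n).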
# standard deviation of a series
survey['avg_salary'].std()
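To see what the ddof parameter changes, here is a small sketch on a hypothetical Series (the values are invented, not from hr_data.csv). It compares the default sample standard deviation with the population version (ddof=0) and checks the default against the textbook formula.

```python
import pandas as pd

# Hypothetical salary figures (not from hr_data.csv)
salaries = pd.Series([100, 110, 120, 130], name="avg_salary")

# Default: sample standard deviation (divides by n - 1)
sample_std = salaries.std()

# Population standard deviation (divides by n)
population_std = salaries.std(ddof=0)

# The same sample value computed by hand from the definition
n = len(salaries)
manual = (((salaries - salaries.mean()) ** 2).sum() / (n - 1)) ** 0.5

print(sample_std, population_std, manual)
```

Here the squared deviations from the mean (115) sum to 500, so the sample value is the square root of 500 / 3 and the population value is the square root of 500 / 4.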
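Standard deviation of one or more DataFrame columns
In this case we will calculate the standard deviation for all columns or for specific ones.
For all numeric columns in the DataFrame (numeric_only=True skips text columns such as language, which would otherwise raise an error in pandas 2.0 and later):
survey.std(numeric_only=True)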
For specific columns:
We’ll first subset the DataFrame by column labels and then call the std() method.
cols = ['num_cand','avg_salary']
survey[cols].std()
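The two approaches above give the same result when the selected columns are exactly the numeric ones. This self-contained sketch uses a small hypothetical DataFrame (invented values, column names borrowed from the tutorial) that includes a text column, to show that numeric_only=True and an explicit column subset agree.

```python
import pandas as pd

# Hypothetical data with one text column, mirroring the tutorial's columns
survey = pd.DataFrame({
    "language": ["Python", "Python", "R", "R"],
    "num_cand": [80, 100, 40, 60],
    "avg_salary": [120, 130, 110, 100],
})

# numeric_only=True skips the text column instead of trying to reduce it
all_numeric = survey.std(numeric_only=True)

# Selecting the numeric columns explicitly sidesteps the text column entirely
cols = ["num_cand", "avg_salary"]
subset = survey[cols].std()

print(all_numeric)
print(subset)
```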
Standard deviation of each row in a DataFrame
To calculate the standard deviation across each row rather than down each column, we pass the axis=1 parameter.
# standard deviation of each row
survey.std(axis=1, numeric_only=True)
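As a quick illustration of axis=1, here is a sketch on a hypothetical table (the team names and quarterly figures are invented) where each row gets its own standard deviation.

```python
import pandas as pd

# Hypothetical quarterly figures; each row is one team
scores = pd.DataFrame(
    {"q1": [10, 20], "q2": [12, 20], "q3": [14, 20]},
    index=["team_a", "team_b"],
)

# axis=1 reduces across the columns, giving one value per row
row_std = scores.std(axis=1)
print(row_std)
```

team_a (10, 12, 14) has a sample standard deviation of 2, while team_b has identical values and therefore 0.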
Standard deviation of a Pandas groupby object
In this example we’ll:
- First, group the data by one (or more) columns.
- Then compute an aggregate, in this case the standard deviation of the salary figures.
# std deviation groupby
survey.groupby('language').agg(salary_std=('avg_salary', 'std'))
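Here is a self-contained sketch of the same named aggregation on hypothetical data (values invented; the language and avg_salary column names follow the tutorial). It also shows the shorter column-selection form, which returns a Series, and what happens to a group with a single row.

```python
import pandas as pd

# Hypothetical data; column names follow the tutorial
survey = pd.DataFrame({
    "language": ["Python", "Python", "R", "R", "Java"],
    "avg_salary": [120, 130, 110, 100, 125],
})

# Named aggregation: new column name = (source column, aggregation function)
by_lang = survey.groupby("language").agg(salary_std=("avg_salary", "std"))

# Equivalent shortcut that returns a Series instead of a DataFrame
by_lang_series = survey.groupby("language")["avg_salary"].std()

print(by_lang)
```

Because the default is the sample standard deviation (ddof=1), a group with only one row, like Java here, yields NaN.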
Plotting the standard deviation
To quickly plot the standard deviation figures as a simple bar chart, we can use the plot() method. Because survey.std() returns a Series, this is Series.plot().
Note that we can create more sophisticated charts by leveraging the Matplotlib and Seaborn libraries to their full extent.
survey.std(numeric_only=True).plot(kind='bar');
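Outside a notebook, you may want to render the chart without a display and save it instead. This is a sketch under a few assumptions: Matplotlib is installed, the non-interactive "Agg" backend is acceptable, and the data is a small hypothetical stand-in for hr_data.csv. Series.plot() returns a Matplotlib Axes, which we use to label the chart and reach its figure.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import pandas as pd

# Hypothetical stand-in data; column names follow the tutorial
survey = pd.DataFrame({
    "language": ["Python", "Python", "R", "R"],
    "num_cand": [80, 100, 40, 60],
    "avg_salary": [120, 130, 110, 100],
})

# std() returns a Series indexed by column name; Series.plot returns an Axes
ax = survey.std(numeric_only=True).plot(kind="bar", title="Standard deviation by column")
ax.set_ylabel("Standard deviation")

# Render the figure to an in-memory PNG instead of showing it interactively
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
```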
Here’s our chart: