How to find the standard deviation of DataFrame columns, series and rows in Pandas?

In today’s tutorial we will learn how to calculate the standard deviation of a Pandas DataFrame. We’ll calculate the standard deviation for several cases:

  • A Pandas Series
  • One or more DataFrame columns
  • Each row in a Pandas DataFrame
  • A groupby object

Example DataFrame

We’ll start by importing the Pandas library and reading our data from a CSV file into a new DataFrame.

# Import Pandas library
import pandas as pd

# Create a DataFrame by reading a CSV file
survey = pd.read_csv('hr_data.csv')

Here’s the DataFrame:
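The file hr_data.csv is not included with this tutorial, so the snippet below builds a small stand-in DataFrame that you can use to follow along. The column names num_cand, avg_salary and language come from the examples in this tutorial. The values are made up for illustration and are not the original survey data.

```python
import pandas as pd

# Hypothetical stand-in for hr_data.csv: the column names match the
# examples in this tutorial, but every value here is invented.
survey = pd.DataFrame({
    'language': ['Python', 'Python', 'R', 'R', 'Java', 'Java'],
    'num_cand': [85, 90, 75, 60, 95, 70],
    'avg_salary': [120000, 110000, 100000, 95000, 130000, 115000],
})

print(survey)
```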

Case 1: Standard deviation of a Pandas Series

In this simple example, we’ll call the std() method on a single Series (column).

# standard deviation of a series
survey['avg_salary'].std()
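Note that Pandas std() returns the sample standard deviation by default (ddof=1, dividing by n - 1), whereas NumPy's np.std() defaults to the population standard deviation (ddof=0, dividing by n). The following sketch uses a few made-up salary values to show the difference. Pass ddof=0 if you need the two libraries to agree.

```python
import numpy as np
import pandas as pd

# Hypothetical salary figures, used only to illustrate the ddof parameter.
salaries = pd.Series([100.0, 120.0, 140.0, 160.0], name='avg_salary')

# Pandas default: ddof=1, the sample standard deviation (divide by n - 1).
sample_std = salaries.std()

# ddof=0: the population standard deviation (divide by n),
# which is also what np.std() returns by default.
population_std = salaries.std(ddof=0)
numpy_std = np.std(salaries.to_numpy())

print(sample_std)      # ≈ 25.82
print(population_std)  # ≈ 22.36
```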

Case 2: Standard deviation of one or more columns

In this case we’ll calculate the standard deviation for all columns or for specific ones.

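For all numeric columns in the DataFrame (from pandas 2.0 onward, numeric_only=True is needed to skip text columns such as language instead of raising an error):

survey.std(numeric_only=True)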
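Here is a minimal sketch of how numeric_only behaves on a mixed-type DataFrame. The values are hypothetical. The text column is simply left out of the result, and the output is a Series with one standard deviation per numeric column.

```python
import pandas as pd

# Hypothetical mixed-type data: 'language' holds strings.
survey = pd.DataFrame({
    'language': ['Python', 'R', 'Java'],
    'num_cand': [80, 60, 100],
    'avg_salary': [110000, 100000, 120000],
})

# numeric_only=True restricts the calculation to numeric columns,
# so the string column 'language' is ignored rather than causing an error.
col_std = survey.std(numeric_only=True)
print(col_std)
```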

For specific columns:

We’ll first subset the DataFrame according to specific column labels and then call the std() method.

cols = ['num_cand','avg_salary']
survey[cols].std()
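The sketch below illustrates what the subset call returns, using invented values under the tutorial's column labels. The result is a Series indexed by the selected column labels, and each entry matches the standard deviation of that column computed on its own. It also shows the difference between double and single brackets: a one-column DataFrame still yields a Series, while a single Series yields a plain number.

```python
import numpy as np
import pandas as pd

# Hypothetical values; only the column labels come from the tutorial.
survey = pd.DataFrame({
    'language': ['Python', 'R', 'Java', 'Go'],
    'num_cand': [80, 60, 100, 90],
    'avg_salary': [110000, 100000, 120000, 125000],
})

cols = ['num_cand', 'avg_salary']

# DataFrame.std() returns a Series indexed by the selected column labels.
subset_std = survey[cols].std()

# Each entry agrees with the std of that single column.
for col in cols:
    assert np.isclose(subset_std[col], survey[col].std())

# Double brackets keep a one-column DataFrame, so std() returns a Series;
# single brackets select a Series, so std() returns a single number.
one_col = survey[['avg_salary']].std()
scalar = survey['avg_salary'].std()

print(subset_std)
```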

Case 3: Standard deviation of each row in a Pandas DataFrame

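To calculate the standard deviation across each row, we’ll pass the axis=1 parameter (axis='columns' is equivalent). As in Case 2, numeric_only=True skips any text columns.

# standard deviation of each row
survey.std(axis=1, numeric_only=True)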
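Row-wise standard deviation is most meaningful when the columns share the same units, for example repeated measurements or survey scores. The sketch below uses a hypothetical scores table with invented columns q1 to q3. It cross-checks the result against NumPy with ddof=1 to match the Pandas default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each row is one respondent, each column one question.
scores = pd.DataFrame({
    'q1': [1.0, 4.0, 2.0],
    'q2': [3.0, 4.0, 4.0],
    'q3': [5.0, 4.0, 6.0],
})

# axis=1 produces one standard deviation per row (computed across columns).
row_std = scores.std(axis=1)

# Cross-check with NumPy; ddof=1 matches the Pandas sample std default.
manual = np.std(scores.to_numpy(), axis=1, ddof=1)

print(row_std.tolist())  # [2.0, 0.0, 2.0]
```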

Case 4: Standard deviation with groupby

In this example we’ll:

  • Group the data by one (or more) columns, here the language column.
  • Aggregate each group into a single figure, in this case the standard deviation of the salary figures.

# standard deviation of the salary per language
survey.groupby('language').agg(salary_std=('avg_salary', 'std'))
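The following sketch runs the named aggregation on a few invented rows. It also shows the shorter groupby(...)[column].std() form, which returns a Series instead of a DataFrame. Because the default is ddof=1, a group with a single row (Java below) gets NaN as its standard deviation.

```python
import pandas as pd

# Hypothetical data: 'language' is the grouping key, 'avg_salary' the values.
survey = pd.DataFrame({
    'language': ['Python', 'Python', 'R', 'R', 'Java'],
    'avg_salary': [100.0, 120.0, 90.0, 110.0, 130.0],
})

# Named aggregation: new_column=(input_column, aggregation_function).
by_lang = survey.groupby('language').agg(salary_std=('avg_salary', 'std'))

# Equivalent shorthand returning a Series indexed by language.
by_lang_series = survey.groupby('language')['avg_salary'].std()

# 'Java' has a single row, so its sample std (ddof=1) is NaN.
print(by_lang)
```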

Plot the standard deviation

To quickly plot the standard deviation figures as a simple bar chart, we can call the Pandas plot() method on the Series that std() returns.

Note that we can also create more sophisticated charts by leveraging the full capabilities of the Matplotlib and Seaborn libraries.

survey.std(numeric_only=True).plot(kind='bar');

Here’s our chart:
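If you want a chart with a title and an axis label, plot() returns a Matplotlib Axes object that you can customize further. The sketch below uses invented data and the non-interactive Agg backend so it also runs as a plain script without a display. The title, label text and output file name are illustrative choices, not part of the original example.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with the tutorial's column labels.
survey = pd.DataFrame({
    'language': ['Python', 'R', 'Java'],
    'num_cand': [80, 60, 100],
    'avg_salary': [110000, 100000, 120000],
})

# One bar per numeric column; plot() returns a Matplotlib Axes.
ax = survey.std(numeric_only=True).plot(kind='bar', title='Standard deviation by column')
ax.set_ylabel('Standard deviation')
plt.tight_layout()

# Uncomment to write the chart to disk (file name is hypothetical).
# plt.savefig('std_by_column.png')
```

Because avg_salary is on a much larger scale than num_cand, the num_cand bar may be barely visible. Passing logy=True to plot() switches the y-axis to a logarithmic scale, which may make both bars easier to compare.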