How to get the variance of Pandas DataFrame columns?

In this short tutorial we’ll learn how to quickly calculate the statistical variance of one or multiple columns in a Python Pandas DataFrame. We will focus on several use cases:

  • Variance of a Series or Pandas DataFrame column
  • Variance of all columns in a Pandas DataFrame
  • Variance of a Pandas Groupby object
  • Pandas covariance

Create a DataFrame

As we typically do, we’ll start by importing the Pandas library into our Data Analysis environment of choice and then create some example data. Feel free to use the DataFrame below to follow along with this example.

import pandas as pd

# Define DataFrame columns
language = ['Go', 'Kotlin', 'Swift', 'Java']
first_interview = (76, 78, 84, 83)
second_interview = (51, 59, 58, 58)
third_interview = (15, 12, 19, 24)

# Gather the data in a dictionary
# (interview_3 reuses the second_interview scores, which is why
# interview_2 and interview_3 show identical variances below)
hr = dict(interview_1=first_interview, interview_2=second_interview, interview_3=second_interview)

# Construct the DataFrame from the dictionary, using the languages as the index
interviews = pd.DataFrame(hr, index=language)

print(interviews.head())

Here’s the output of our test data:

        interview_1  interview_2  interview_3
Go               76           51           51
Kotlin           78           59           59
Swift            84           58           58
Java             83           58           58

Variance of a Pandas Column / Series

To calculate the variance of a Pandas column, we simply select the column and call the var() Series method.
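The call described above can be sketched as follows. The snippet rebuilds the example data so it runs on its own, and the name col_variance is only illustrative. Keep in mind that var() returns the sample variance by default (ddof=1), so the sum of squared deviations is divided by n - 1.

```python
import pandas as pd

# Rebuild the example data; interview_3 reuses the second-interview scores, as in the setup code
language = ['Go', 'Kotlin', 'Swift', 'Java']
hr = dict(interview_1=(76, 78, 84, 83),
          interview_2=(51, 59, 58, 58),
          interview_3=(51, 59, 58, 58))
interviews = pd.DataFrame(hr, index=language)

# Select a single column (a Series) and compute its sample variance,
# rounded to two decimals with the built-in round() function
col_variance = round(interviews['interview_1'].var(), 2)
print(col_variance)
```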


Note that we used the round() function to trim the result to two decimal places.

Alternatively, we can first assign the column to a new Series and then call var() on it:

my_s = interviews['interview_1']
print(round(my_s.var(), 2))

Variance of all DataFrame columns

If we want to calculate the variance of all columns, we can use the DataFrame var() method, as shown below:
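A minimal sketch of that call, with the example data rebuilt so the snippet is self-contained. The rounding to two decimals is an assumption made to match the result displayed in this tutorial, and col_variances is just an illustrative name.

```python
import pandas as pd

# Rebuild the example data; interview_3 reuses the second-interview scores, as in the setup code
language = ['Go', 'Kotlin', 'Swift', 'Java']
hr = dict(interview_1=(76, 78, 84, 83),
          interview_2=(51, 59, 58, 58),
          interview_3=(51, 59, 58, 58))
interviews = pd.DataFrame(hr, index=language)

# DataFrame.var() computes the sample variance of every column
# and returns a Series indexed by column name
col_variances = interviews.var().round(2)
print(col_variances)
```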


This will render the following result:

interview_1    14.92
interview_2    13.67
interview_3    13.67
dtype: float64

If the DataFrame also contains non-numeric columns, you might want to use the select_dtypes() DataFrame method to subset the columns by data type before calculating the variance:
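One way this could look is sketched below. To make the effect visible, the snippet adds a text column to the example data (the area labels used later in this tutorial); that addition is an assumption made for illustration and is not part of the original data at this point.

```python
import pandas as pd

# Rebuild the example data; interview_3 reuses the second-interview scores, as in the setup code
language = ['Go', 'Kotlin', 'Swift', 'Java']
hr = dict(interview_1=(76, 78, 84, 83),
          interview_2=(51, 59, 58, 58),
          interview_3=(51, 59, 58, 58))
interviews = pd.DataFrame(hr, index=language)

# Illustrative text column, added only to show what select_dtypes() filters out
interviews['area'] = ['Full_Stack', 'Full_Stack', 'Server', 'Server']

# Keep only the numeric columns, then compute their variance
numeric_cols = interviews.select_dtypes(include='number')
numeric_variances = numeric_cols.var().round(2)
print(numeric_variances)
```

Filtering first keeps the calculation independent of how a given Pandas version treats text columns in var().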


Calculate variance of some columns

You can use the Pandas ‘brackets’ notation to subset several columns and then apply the calculation to the resulting DataFrame.

my_df = interviews[['interview_2', 'interview_3']]
print(my_df.var().round(2))

This will result in:

interview_2    13.67
interview_3    13.67
dtype: float64

Pandas groupby variance

We’ll first add a new column to our DataFrame, then use it to group the data and calculate the variance of each group.

interviews['area'] = ['Full_Stack','Full_Stack','Server','Server']
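With the area column in place, the grouping step might look like the sketch below. It rebuilds the data so it can run on its own, and area_variances is an illustrative name. Each group holds only two rows here, so the variances rest on very small samples.

```python
import pandas as pd

# Rebuild the example data; interview_3 reuses the second-interview scores, as in the setup code
language = ['Go', 'Kotlin', 'Swift', 'Java']
hr = dict(interview_1=(76, 78, 84, 83),
          interview_2=(51, 59, 58, 58),
          interview_3=(51, 59, 58, 58))
interviews = pd.DataFrame(hr, index=language)
interviews['area'] = ['Full_Stack', 'Full_Stack', 'Server', 'Server']

# Group the rows by area, then compute the sample variance
# of each interview column within every group
area_variances = interviews.groupby('area').var()
print(area_variances)
```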


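Grouping by area and calling var() will result in the following:

            interview_1  interview_2  interview_3
area
Full_Stack          2.0         32.0         32.0
Server              0.5          0.0          0.0
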
Get the Pandas covariance

In the same fashion, we can calculate the covariance between the columns of a DataFrame using the cov() method:
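A hedged sketch of the covariance calculation follows. Because the DataFrame now includes the text column area, the snippet first keeps the numeric columns with select_dtypes(); that filtering step is a choice made here for robustness across Pandas versions, and cov_matrix is an illustrative name.

```python
import pandas as pd

# Rebuild the example data; interview_3 reuses the second-interview scores, as in the setup code
language = ['Go', 'Kotlin', 'Swift', 'Java']
hr = dict(interview_1=(76, 78, 84, 83),
          interview_2=(51, 59, 58, 58),
          interview_3=(51, 59, 58, 58))
interviews = pd.DataFrame(hr, index=language)
interviews['area'] = ['Full_Stack', 'Full_Stack', 'Server', 'Server']

# Drop the text column, then compute the pairwise sample covariance matrix
cov_matrix = interviews.select_dtypes(include='number').cov()
print(cov_matrix.round(2))
```

The diagonal of the resulting matrix holds the variance of each column, and the off-diagonal cells hold the covariance of each pair of columns.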


Or, if we want to calculate the covariance of a specific DataFrame subset:
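For instance, the covariance of two selected columns could be computed as sketched below. The choice of interview_1 and interview_2, the rounding, and the name subset_cov are illustrative assumptions.

```python
import pandas as pd

# Rebuild the example data; interview_3 reuses the second-interview scores, as in the setup code
language = ['Go', 'Kotlin', 'Swift', 'Java']
hr = dict(interview_1=(76, 78, 84, 83),
          interview_2=(51, 59, 58, 58),
          interview_3=(51, 59, 58, 58))
interviews = pd.DataFrame(hr, index=language)

# Subset two columns with the brackets notation, then compute their covariance matrix
subset_cov = interviews[['interview_1', 'interview_2']].cov().round(2)
print(subset_cov)
```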


Additional learning