Pandas: How to count number of elements after aggregating with groupby.

In Data Analysis we often aggregate our data and then typically apply specific functions on it. Today we’ll learn how to count values on data that we have previously aggregated using the DataFrame.groupby() Pandas method.

Creating example data

Let’s first import the Python Pandas library and acquire data into our Python development environment:

import pandas as pd
hr = pd.read_csv('interview_data.csv')

Groupby and count in Pandas

We can easily aggregate our dataset and count the number of observations related to each programming language in our dataset.


Note that unlike the count() method, size() counts also occurrences of nan empty values.

Count pandas group by with condition

Let’s now assume that we want to show up only programming languages for which we interviewed more than twice during the year. We will first aggregate the data and then define a new column displaying the values we counted.

# groupby
languages = hr.groupby('language').agg(number_of_months = ('month', 'count'))

# define condition
filt = languages [ 'number_of_months'] > 2

# filter the DataFrame

languages  [filt]

Here’s our result:

Aggregating and counting with multiple conditions

In the same manner, we can filter our grouped DataFrame by multiple conditions:

# aggregate data
languages_month_salary = hr.groupby('language').agg(number_of_months = ('month', 'count'), avg_sal = ('salary', 'mean'))

# define a complex condition - boolean or
filt = (languages_month_salary [ 'number_of_months']) > 2 |(languages_month_salary [ 'avg_sal'] > 100)

#filter the DataFrame
languages_month_salary  [filt]


Groupby and count distinct values

In this case, we will first go ahead and aggregate the data, and then count the number of unique distinct values. We will then sort the data in a descending orders. The result in this case is a series.


Count specific values

We suggest that you look into our tutorial on how to count unique specific values.