In Data Analysis we often aggregate our data and then typically apply specific functions on it. Today we’ll learn how to count values on data that we have previously aggregated using the DataFrame.groupby() Pandas method.
Creating example data
Let’s first import the Python Pandas library and acquire data into our Python development environment:
import pandas as pd hr = pd.read_csv('interview_data.csv') hr.info()
Groupby and count in Pandas
We can easily aggregate our dataset and count the number of observations related to each programming language in our dataset.
Note that unlike the count() method, size() counts also occurrences of nan empty values.
Count pandas group by with condition
Let’s now assume that we want to show up only programming languages for which we interviewed more than twice during the year. We will first aggregate the data and then define a new column displaying the values we counted.
# groupby languages = hr.groupby('language').agg(number_of_months = ('month', 'count')) # define condition filt = languages [ 'number_of_months'] > 2 # filter the DataFrame languages [filt]
Here’s our result:
Aggregating and counting with multiple conditions
In the same manner, we can filter our grouped DataFrame by multiple conditions:
# aggregate data languages_month_salary = hr.groupby('language').agg(number_of_months = ('month', 'count'), avg_sal = ('salary', 'mean')) # define a complex condition - boolean or filt = (languages_month_salary [ 'number_of_months']) > 2 |(languages_month_salary [ 'avg_sal'] > 100) #filter the DataFrame languages_month_salary [filt]
Groupby and count distinct values
In this case, we will first go ahead and aggregate the data, and then count the number of unique distinct values. We will then sort the data in a descending orders. The result in this case is a series.
Count specific values
We suggest that you look into our tutorial on how to count unique specific values.