Groupby pandas dataframe data by month
Use the dt.month accessor on your date column to group your dataframe data according to a specific month. For example:
my_df.groupby(['my_date_col'].dt.month).sum()
Example data
We will start by importing the pandas library and create a a very simply DataFrame that you can use in order to follow this example.
import pandas as pd
# create a time series
dates = pd.date_range(start='11/27/2022', periods = 6, freq = 'B' )
interviews = [61, 139, 56, 124, 87, 85]
campaign = pd.DataFrame (dict (date = dates, num_interviews = interviews))
campaign .head()
Let’s look into our data:
date | num_interviews | |
---|---|---|
0 | 2022-11-28 | 61 |
1 | 2022-11-29 | 139 |
2 | 2022-11-30 | 56 |
3 | 2022-12-01 | 124 |
4 | 2022-12-02 | 87 |
Grouping data by month
Now that we have the data, we are able to easily aggregate it by any time series / period and / or another columns.
In this snippet we will calculate the average number of interviews that our HR team has conducted – by month. We will use the Series dt accessor to display the month number (dt.month).
(campaign
.groupby([campaign['date'].dt.month])
.agg (avg_interviews_month = ('num_interviews' , 'mean'))
.round())
This will render the following result:
avg_interviews_month | |
---|---|
date | |
11 | 85.0 |
12 | 99.0 |
Aggregating by month with pd.Grouper
We can achieve somewhat similar results by using a pandas grouper object:
(campaign
.groupby([pd.Grouper(key = 'date', freq = 'M')])
.agg (avg_interviews_month = ('num_interviews' , 'mean'))
.round())
The data is grouped by default by the last day of the month:
avg_interviews_month | |
---|---|
date | |
2022-11-30 | 85.0 |
2022-12-31 | 99.0 |
Groupby month and year and then sum
In the same fashion we can aggregate by month and year. We will use the dt accessor and the strftime formatter to accomplish this. Once the data is aggregated we will apply the sum operation to add up the different row values.
(campaign
.groupby(campaign['date'].dt.strftime('%m-%y'))
.agg (total_interviews = ('num_interviews' , 'sum')))
This will render the following result:
total_interviews | |
---|---|
date | |
11-22 | 256 |
12-22 | 296 |
Aggregate by month name
Last case for today is to groupby our data by month name. This is straightforward using the month_name() function made available by the dt accessor. Don’t forget the parentheses.
(campaign
.groupby(campaign['date'].dt.month_name())
.agg (total_interviews = ('num_interviews' , 'sum')))
Next Learning
How to aggregate pandas data by one or multiple column values?