A common requirement when working with time series data is to be able to group your data by specific time frequency. In this short tutorial we will learn how to aggregate our data using the pd.Grouper() function into weekly buckets.
Example data
We’ll get started with importing the Pandas library and creating some data that we’ll later group.
#impor the pandas library
import pandas as pd
#define date range
interview_date = pd.date_range(start='1/1/2023', end = '1/10/2023' )
num_interviews = [145, 25, 28, 32, 58, 79, 50, 100, 120, 110]
hr = dict (interview_date = interview_date, num_interviews= num_interviews)
hr_df = pd.DataFrame (hr)
hr_df.head()
Here is the data that we have just created
interview_date | num_interviews | |
---|---|---|
0 | 2023-01-01 | 145 |
1 | 2023-01-02 | 25 |
2 | 2023-01-03 | 28 |
3 | 2023-01-04 | 32 |
4 | 2023-01-05 | 58 |
5 | 2023-01-06 | 79 |
6 | 2023-01-07 | 50 |
7 | 2023-01-08 | 100 |
8 | 2023-01-09 | 120 |
9 | 2023-01-10 | 110 |
Group by week in Pandas
Now that we have the data we can use the pd.Grouper() function to aggregate our data. Using the freq=’W’ parameter we are able to specify the weekly frequency offset alias. ‘W’ determines a weekly frequency starting Sundays.
(hr_df.groupby([pd.Grouper(key = 'interview_date', freq = 'W')])
.agg(average_interview_day = ('num_interviews' , 'mean'))
.round())
average_interview_day | |
---|---|
interview_date | |
2023-01-01 | 145.0 |
2023-01-08 | 53.0 |
2023-01-15 | 115.0 |
Group by week starting Monday
By tweaking the freq parameter, we are able to aggregate data on a weekly basis starting Monday (or any other day).
(hr_df.groupby([pd.Grouper(key = 'interview_date', freq = 'W-MON')])
.agg(average_interview_day = ('num_interviews' , 'mean'))
.round())
average_interview_day | |
---|---|
interview_date | |
2023-01-02 | 85.0 |
2023-01-09 | 67.0 |
2023-01-16 | 110.0 |
Weekly Group by and sum
By modifying the aggregation method, we are able to perform additional calculations on the grouped data, in this case, a sum.
(hr_df.groupby([pd.Grouper(key = 'interview_date', freq = 'W-FRI')])
.agg(total_interviews = ('num_interviews' , 'sum'))
.round())
total_interviews | |
---|---|
interview_date | |
2023-01-06 | 367 |
2023-01-13 | 380 |