Our task for today is to aggregate data by quarter in a pandas dataframe.
Example data
Let’s start by importing the pandas Data Analysis library; and then creating a very simple pandas DataFrame:
import pandas as pd
stamps = pd.date_range(start='6/28/2023', periods = 6, freq = 'B' )
sales = [122, 124, 106, 101, 145, 100]
revenue = pd.DataFrame (dict (date = stamps, sales = sales))
revenue.head()
Here’s our data:
date | sales | |
---|---|---|
0 | 2023-06-28 | 122 |
1 | 2023-06-29 | 124 |
2 | 2023-06-30 | 106 |
3 | 2023-07-03 | 101 |
4 | 2023-07-04 | 145 |
Grouping the data by quarter
We can use the dt.quarter property in order to then aggregate the data rows by quarter:
revenue_quarter = (revenue
.groupby([revenue['date'].dt.quarter])
.agg (sales_by_quarter = ('sales' , 'sum'))
)
# optionally, rename the index
revenue_quarter.index.name = 'quarter'
print (revenue_quarter)
This will result in the following frame:
sales_by_quarter | |
---|---|
quarter | |
2 | 352 |
3 | 346 |
Aggregating by quarter and year
Next case is to aggregate the data by quarter and year. We will first define a new column that represent the quarter-year value for every row in our DataFrame and group by accordingly:
# define a quarter-year column
revenue['quarter-year'] = revenue['date'].dt.quarter.map(str) + '-' +revenue['date'].dt.year.map(str)
#group the data
(revenue
.groupby(revenue['quarter-year'])
.agg (sales_by_quarter = ('sales' , 'sum')))
Here’s our grouped data:
sales_by_quarter | |
---|---|
quarter-year | |
2-2023 | 352 |
3-2023 | 346 |
Aggregate dataframe columns with a pd.grouper
We can also use a pd.grouper to aggregate our data to the last quarter day:
(revenue
.groupby([pd.Grouper(key = 'date', freq = 'Q')])
.agg (sales_by_quarter = ('sales' , 'sum')))
Exporting your groupby to a csv file
Once the data is aggregated, you can easily export it to a csv file:
revenue_quarter.to_csv('path_to_your_csv_file')