How to group by date time range in pandas?

Step #1: Create example data

We will start by importing the pandas library and creating a very simple DataFrame that you can use to follow along.

import pandas as pd
dates = pd.Series(pd.date_range(start='12/26/22', end = '01/07/23', freq='B'))
channel = ['direct', 'indirect', 'web', 'direct', 'indirect', 'web','indirect', 'web', 'web', 'direct']
sales = [120, 140, 134, 156, 188, 200, 178, 213, 234, 98]

#create the DataFrame
performance = pd.DataFrame(dict(sales=sales, dates=dates, channel=channel))
print (performance.head())

This will return the following data:

salesdateschannel
01202022-12-26direct
11402022-12-27indirect
21342022-12-28web
31562022-12-29direct
41882022-12-30indirect

Step #2: Aggregate by datetime /timestamp

Aggregating by our date range is relatively simple:

performance.groupby ([dates.dt.year]).agg(overall_sales= ('sales' , 'sum'))

This will return the following DataFrame object:

overall_sales
2022738
2023923

Note that we have used the dt accessor to derive the year value from our datetime column.

Step #3: Groupby pandas by date and other column

In the second example we will aggregate our data by the date column and by channel. Syntax is similar to the previous example, with key difference being the column names we pass to the groupby method:

performance.groupby ([dates.dt.year,channel]).agg(overall_sales= ('sales' , 'sum'))

This will return the following DataFrame:

Step #4: Group by Dataframe by year using the date index

Let’s start by creating a new DataFrame for this example:

performance1 = pd.DataFrame(dict(sales=sales), index=dates)

In this time series, the index consists of datetime objects. We can use those to groupby:

performance1.groupby(performance.index.year)['sales'].sum()

The result will be:

2022    738
2023    923
Name: sales, dtype: int64

Note that in order to groupby year, we didn’t need the dt accessor. We can directly call the year property of the Datetime Index to aggregate by year / month / month name / week etc’.

Step# 5: Groupby timestamp and count

Last example for this tutorial is aggregating by a timestamp and then counting distinct values.

performance.groupby ([dates.dt.month_name()]).agg(sales_count= ('sales' , 'count'))

And the result will be:

sales_count
December5
January5