How to sort a Pandas dataframe by month and year?

In this tutorial we’ll learn how we can order a Pandas DataFrame by datetime column values. Specifically we’ll look into how to sort data by date, month and year values.

Example Data

We’ll start by creating some dummy data that you can use to follow along this example.

import pandas as pd

dates = ['2023-02-02','2023-02-06', '2023-02-07', '2023-02-03', '2022-12-01' ]
language = ['Python', 'R', 'R', 'Javascript', 'Java']
salary = [234.0, 180.0, 172.0, 163.0, 234.0]
cand = dict(date=dates, language=language, salary=salary)
cand_df = pd.DataFrame(data=cand)
cand_df.head()

Here is our data:

datelanguagesalary
02023-02-02Python234.0
12023-02-06R180.0
22023-02-07R172.0
32023-02-03Javascript163.0
42022-12-01Java234.0

Sort DataFrame by date

We first would like to convert our date column to the datetime64 type.

cand_df['date'] = cand_df['date'].astype('datetime64')

Next is to sort our date according to ascending dates (earliest first):

cand_df.sort_values(by='date').reset_index(drop=True)

Let’s look into our Data – note that for convenience i have ordered the DataFrame by its index (by resetting the index column and dropping the previous one).

datelanguagesalary
02022-12-01Java234.0
12023-02-02Python234.0
22023-02-03Javascript163.0
32023-02-06R180.0
42023-02-07R172.0

Sorting in descending order

We can easily switch the sorting order to descending, by using the ascending=False parameter.

cand_df.sort_values(by='date', ascending=False).reset_index(drop=True)

Sorting DataFrames by year and month values

We will start by creating new columns in our DataFrame by using the dt accessor on the Series:

cand_df['month'] = cand_df['date'].dt.month
cand_df['year'] = cand_df['date'].dt.year

Now we can go ahead and order the data:

cand_df.sort_values('year')

#and for months


cand_df.sort_values('month')

Sorting data by multiple columns

To order our data by multiple column values we can simply pass a list of columns to the sort() method as shown below:

cand_df.sort_values(['year', 'month']).reset_index(drop=True)

And here is our data

datelanguagesalarymonthyear
02022-12-01Java234.0122022
12023-02-02Python234.022023
22023-02-06R180.022023
32023-02-07R172.022023
42023-02-03Javascript163.022023

Order by datetime index

We can also set the date column as the index and order the data accordingly:

cand_df.set_index('date').sort_values(by='date')

Additional learning