In this tutorial we’ll learn how we can order a Pandas DataFrame by datetime column values. Specifically we’ll look into how to sort data by date, month and year values.
Example Data
We’ll start by creating some dummy data that you can use to follow along this example.
import pandas as pd
dates = ['2023-02-02','2023-02-06', '2023-02-07', '2023-02-03', '2022-12-01' ]
language = ['Python', 'R', 'R', 'Javascript', 'Java']
salary = [234.0, 180.0, 172.0, 163.0, 234.0]
cand = dict(date=dates, language=language, salary=salary)
cand_df = pd.DataFrame(data=cand)
cand_df.head()
Here is our data:
date | language | salary | |
---|---|---|---|
0 | 2023-02-02 | Python | 234.0 |
1 | 2023-02-06 | R | 180.0 |
2 | 2023-02-07 | R | 172.0 |
3 | 2023-02-03 | Javascript | 163.0 |
4 | 2022-12-01 | Java | 234.0 |
Sort DataFrame by date
We first would like to convert our date column to the datetime64 type.
cand_df['date'] = cand_df['date'].astype('datetime64')
Next is to sort our date according to ascending dates (earliest first):
cand_df.sort_values(by='date').reset_index(drop=True)
Let’s look into our Data – note that for convenience i have ordered the DataFrame by its index (by resetting the index column and dropping the previous one).
date | language | salary | |
---|---|---|---|
0 | 2022-12-01 | Java | 234.0 |
1 | 2023-02-02 | Python | 234.0 |
2 | 2023-02-03 | Javascript | 163.0 |
3 | 2023-02-06 | R | 180.0 |
4 | 2023-02-07 | R | 172.0 |
Sorting in descending order
We can easily switch the sorting order to descending, by using the ascending=False parameter.
cand_df.sort_values(by='date', ascending=False).reset_index(drop=True)
Sorting DataFrames by year and month values
We will start by creating new columns in our DataFrame by using the dt accessor on the Series:
cand_df['month'] = cand_df['date'].dt.month
cand_df['year'] = cand_df['date'].dt.year
Now we can go ahead and order the data:
cand_df.sort_values('year')
#and for months
cand_df.sort_values('month')
Sorting data by multiple columns
To order our data by multiple column values we can simply pass a list of columns to the sort() method as shown below:
cand_df.sort_values(['year', 'month']).reset_index(drop=True)
And here is our data
date | language | salary | month | year | |
---|---|---|---|---|---|
0 | 2022-12-01 | Java | 234.0 | 12 | 2022 |
1 | 2023-02-02 | Python | 234.0 | 2 | 2023 |
2 | 2023-02-06 | R | 180.0 | 2 | 2023 |
3 | 2023-02-07 | R | 172.0 | 2 | 2023 |
4 | 2023-02-03 | Javascript | 163.0 | 2 | 2023 |
Order by datetime index
We can also set the date column as the index and order the data accordingly:
cand_df.set_index('date').sort_values(by='date')