In this tutorial we’ll learn how to calculate the time difference between two dates in a Pandas DataFrame. We’ll look into several time difference calculations: in days, months and years.
Example DataFrame
We will get started by creating a simple DataFrame that you can use in order to follow along with this example.
#Python3
import pandas as pd
import numpy as np
office = ['Denver','Paris', 'London' ]
hire_date = ['1-17-2022', '4-7-2022', '2-18-2022']
start_date = ['9-17-2022', '4-29-2022', '6-19-2022']
hiring_dict = dict( office=office,hire_date=(hire_date),start_date = start_date)
hiring_df = pd.DataFrame(data=hiring_dict, dtype='datetime64[ns]')
hiring_df.head()
Here’s our data:
office | hire_date | start_date | |
---|---|---|---|
0 | Denver | 2022-01-17 | 2022-09-17 |
1 | Paris | 2022-04-07 | 2022-04-29 |
2 | London | 2022-02-18 | 2022-06-19 |
Calculate delta between dates in days
We can easily subtract two datetime columns in a Pandas DataFrame to get the time diff in days. Here’s a short snippet for our case:
hiring_df['diff_days'] = hiring_df['start_date']- hiring_df['hire_date']
Time difference in minutes, hours, months and years
In order to calculate time differences for any other time frequencies other than days, we’ll need to use the np.timedelta() Numpy array method. Ensure that numpy is installed in your Data Analysis environment before running this example.
For months:
#import numpy as np
hiring_df['diff_months'] = (hiring_df['diff_days']/np.timedelta64(1, 'M')).astype('float').round(1)
Note that the resulting calculation is provided as a float number, rounded up to one decimal. If we would like to show the calculation as an integer we could cast the resulting time difference column to the integer data type as shown below:
hiring_df['diff_months'] = \
(hiring_df['diff_days']/np.timedelta64(1, 'M')).astype('int8')
For years:
hiring_df['diff_years'] = \ (hiring_df['diff_days']/np.timedelta64(1, 'Y')).astype('float').round(1)
For hours:
hiring_df['diff_hours'] = \ (hiring_df['diff_days']/np.timedelta64(1, 'h')).astype('float').round(1)
Note: Calculating the difference in minutes is easy accomplished by multiplying the hours result by 60.
In summary, here’s our resulting DataFrame:
office | hire_date | start_date | diff_days | diff_months | diff_years | diff_hours | |
---|---|---|---|---|---|---|---|
0 | Denver | 2022-01-17 | 2022-09-17 | 243 days | 8.0 | 0.7 | 5832.0 |
1 | Paris | 2022-04-07 | 2022-04-29 | 22 days | 0.7 | 0.1 | 528.0 |
2 | London | 2022-02-18 | 2022-06-19 | 121 days | 4.0 | 0.3 | 290 |