When wrangling data extracted from different sources, you might need to change the data types of several columns from standard Pandas objects / Python strings to the datatime64 format.
In this short tutorial we’ll look into different cases:
- Casting entire column to Datatime
- Handling column conversion to dates with read_csv
- Keep only dates when converting a column
- Change the format of a datetime object
- Set a dateindex
Create the example dataframe
As we typically do, we’ll create a simple Pandas DataFrame for you to follow along this example.
import pandas as pd
hire_date = ['1-1-2022', '15-1-2022', '25-1-2022', '25-2-2022']
office = ['Atlanta','Atlanta', 'Boston', 'London' ]
salary = [153, 143, 188, 199]
hiring_dict = dict( office=office,hire_date=hire_date,salary = salary)
hiring = pd.DataFrame(data=hiring_dict)
hiring.head()
office | hire_date | salary | |
---|---|---|---|
0 | Atlanta | 1-1-2022 | 153 |
1 | Atlanta | 15-1-2022 | 143 |
2 | Boston | 25-1-2022 | 188 |
3 | London | 25-2-2022 | 199 |
Let’s take a look at the DataFrame columns types:
hiring.dtypes
office object hire_date object salary int64 dtype: object
The hire_date column data type is object. We would like to cast the column to the datetime64 Pandas type.
Pandas Dataframe column to Datetime
We’ll use the pd.to_datetime DataFrame method to cast the column. We will pass the Date format using the format parameter.
hiring['hire_date_dt'] = pd.to_datetime(hiring['hire_date'], format = '%d-%m-%Y')
hiring.head()
This will add a new column to the DataFrame containing the dates in datetime64 fomat.
office | hire_date | salary | hire_date_dt | |
---|---|---|---|---|
0 | Atlanta | 1-1-2022 | 153 | 2022-01-01 |
1 | Atlanta | 15-1-2022 | 143 | 2022-01-15 |
2 | Boston | 25-1-2022 | 188 | 2022-01-25 |
3 | London | 25-2-2022 | 199 | 2022-02-25 |
Instead of defining a new column, we can simply replace the current column:
hiring['hire_date'] = pd.to_datetime(hiring['hire_date'], format = '%d-%m-%Y')
Note that by specifying a time format we were able to convert the column to a datetime object that displays the data but not the time.
Convert column to datatime with read_csv
When importing data from a csv file, you can automatically cast specific or multiple columns to the Datatime type. Here’s a simple example in which we use the parse_dates parameter to define the column to be converted.
hiring3 = pd.read_csv('hiring.csv', parse_dates = 'hire_date')
Convert to datetime with astype and keep the dates only
There will be instances in which you have a column that contains date values which contain also hours, minutes and seconds. Let’s quickly add a new column to our DataFrame.
hiring['full_date'] = ['1-1-2022 13:45:00', '15-1-2022 08:45:00', '25-1-2022 12:45:00', '25-2-2022 13:46:00']
hiring['full_date'] = hiring['full_date'].astype('datetime64')
As we are interested to see only the dates, but not the hours, minutes, seconds and so forth, we’ll extract the date only with the dt accessor:
hiring['date'] = hiring['full_date'].dt.date
hiring['date'].head()
Here’s our DataFrame:
office | hire_date | salary | hire_date_dt | full_date | date_only | |
---|---|---|---|---|---|---|
0 | Atlanta | 1-1-2022 | 153 | 2022-01-01 | 2022-01-01 13:45:00 | 2022-01-01 |
1 | Atlanta | 15-1-2022 | 143 | 2022-01-15 | 2022-01-15 08:45:00 | 2022-01-15 |
2 | Boston | 25-1-2022 | 188 | 2022-01-25 | 2022-01-25 12:45:00 | 2022-01-25 |
3 | London | 25-2-2022 | 199 | 2022-02-25 | 2022-02-25 13:46:00 | 2022-02-2 |
Change Datetime format in Pandas columns
After converting our column to the Datetime object we can easily convert the format of our date objects using the strftime string to datetime formatter.
Here’s a very simple example. Be sure to specify the required formatting for your datetime:
hiring['full_date'].dt.strftime('%d-%m-%y')
Set a datetime index to the DataFrame
We can now go ahead and define one of our datetime columns as the DataFrame index. For example:
hiring.index = hiring['hire_date_dt']