Cast strings to datetime in Pandas
You can use either the pd.datetime() or the Series.astype() to cast a DataFrame column from object to date object. Here’s how to make it work in Python:
pd.to_datetime (your_df['col_name'], format = 'your_date_format')
# or alternatively:
your_df['col_name'].astype ('datetime64[ns]', 'your_date_format')
In today’s tutorial we will learn how to change the format of object (string) columns in pandas DataFrame to the Datetime64 format.
Create Example data
import pandas as pd
dates = ['01-01-24', '01-01-24', '01-01-24', '01-01-24', '02-01-24', '02-01-24']
sales = [345, 481, 331, 387, 244, 402]
revenue = pd.DataFrame(dict (dates = dates, sales=sales))
revenue.head()
Let’s look at the data:
dates | sales | |
---|---|---|
0 | 01-01-24 | 345 |
1 | 01-01-24 | 481 |
2 | 01-01-24 | 331 |
3 | 01-01-24 | 387 |
4 | 02-01-24 | 244 |
5 | 02-01-24 | 402 |
If we look at the pandas data types of our DataFrame object we’ll see that the dates column is an object.
revenue.dtypes
This will return:
dates object sales int64 dtype: object
Convert strings to dates in pandas
We have two simple ways to cast our data to the datimetime formats:
- Using the to_datetime function:
revenue['dates'] = pd.to_datetime(revenue['dates'],format = '%d-%m-%y')
- Using the astype() Series method:
revenue['dates'] = revenue['dates'].astype('datetime64[ns]', '%d-%m-%y')
Let’s look into our DataFrame:
print ( revenue.head())
Both will render the same result:
dates | sales | |
---|---|---|
0 | 2024-01-01 | 345 |
1 | 2024-01-01 | 481 |
2 | 2024-01-01 | 331 |
3 | 2024-01-01 | 387 |
4 | 2024-02-01 | 244 |
5 | 2024-02-01 | 402 |
If we check the data types we’ll see that the dates column has now a datetime type.
dates datetime64[ns] sales int64 dtype: object
You can use the same technique to transform multiple DataFrame columns to dates.
Cast strings with hour, minutes and seconds to datetime
Using similar logic, you can also handle strings that has a timestamp format, and convert them to datetimes.
stamps = ['01-01-24 06:00:00', '01-01-24 12:00:00', '01-01-24 18:00:00', '02-01-24 06:00:00', '02-01-24 12:00:00', '02-01-24 12:00:00']
sales = [345, 481, 331, 387, 244, 402]
revenue = pd.DataFrame(dict (stamps = stamps, sales=sales))
We can convert the stamps column to datetime as following:
pd.to_datetime(revenue['stamps'],format = '%d-%m-%y %H:%M:%S')
Note the format parameter, which needs to be specified to facilitate the conversion.
Now that the stamps column is a date, we can easily access its DateTimeProperties using the dt accessor, for example:
revenue['stamps'].dt.year revenue['stamps'].dt.month_name() revenue['stamps'].dt.isocalendar().week
Additional Learning
How to find the difference between multiple datetimes in pandas and Python?