How to change pandas column strings to datetime format?

Convert pandas string values to datetime format

We are able to convert string values in a pandas DataFrame column using the astype() Series method and the pandas pd.todatetime() function:

your_df['your_col'].astype('datetime64')

# or alternatively:

pd.to_datetime(your_df['your_col'])

Step #1: Create example data

Let’s start by constructing a DataFrame:

import pandas as pd

language = ['Python', 'Javascript', 'Javascript', 'Python']
salary = [187.0, 153.0, 139.0, 172.0]
position_open_date = ['10-1-2023', '1-11-2022', '10-12-2022', '12-2-2023']
data = dict(language = language, salary = salary, start_date = position_open_date )
interviews = pd.DataFrame(data=data)
interviews.head()

Here’s our data:

languagesalarystart_date
0Python187.010-1-2023
1Javascript153.01-11-2022
2Javascript139.010-12-2022
3Python172.012-2-2023

Next, let’s look at the data types of our DataFrame columns:

interviews.dtypes

This will return:

language       object
salary        float64
start_date     object
dtype: object

As you can see, the start_date column is defined as an object data type. Our taks is to convert it to datetime – so that we can use that column to aggregate data according to week, month or other time periods.

Step #2: Change from string to datetime yyyymm-dd with astype

First method o cast our string column to date is using the Series astype() method:


interviews['start_date'] = interviews['start_date'].astype('datetime64')

This will return the following Series:

0   2023-10-01
1   2022-01-11
2   2022-10-12
3   2023-12-02
Name: start_date, dtype: datetime64[ns]

Step #3: Convert to datetime with to_datetime()

Alternatively, you can use the to_datetime() function to achieve the same result:


interviews['start_date'] = pd.to_datetime(interviews['start_date'])

Step #4: Group data by month

Now that the start_date column data type is datetime64, we can easily use it to group our data. We’ll set the start_date datetime as the DataFrame index, and use a pd.Grouper to define the aggregation frequency.

interviews.set_index('start_date').groupby([pd.Grouper( freq = 'Y')])['salary'].mean()

This will return the following Series object:

start_date
2022-12-31    146.0
2023-12-31    179.5
Freq: A-DEC, Name: salary, dtype: float64

Format string as dates with strftime

We can format string columns according to different date formats with the very useful strftime function:


interviews['start_date'].dt.strftime('%d/%m/%y')

This will return a Series of strings, but formatted according to dd/mm/yyyy time format:

0    01/10/23
1    11/01/22
2    12/10/22
3    02/12/23
Name: start_date, dtype: object

Suggested follow-up learning

How to convert list elements to datetime objects in Python?