Convert pandas string column to date time format
We are able to convert string values in a pandas DataFrame column using the astype() Series method and the pandas pd.todatetime() function:
your_df['your_col'].astype('datetime64')
# or alternatively:
pd.to_datetime(your_df['your_col'])
Step #1: Create example data
Let’s start by constructing a DataFrame:
import pandas as pd
language = ['Python', 'Javascript', 'Javascript', 'Python']
salary = [187.0, 153.0, 139.0, 172.0]
position_open_date = ['10-1-2023', '1-11-2022', '10-12-2022', '12-2-2023']
data = dict(language = language, salary = salary, start_date = position_open_date )
interviews = pd.DataFrame(data=data)
interviews.head()
Here’s our data:
language | salary | start_date | |
---|---|---|---|
0 | Python | 187.0 | 10-1-2023 |
1 | Javascript | 153.0 | 1-11-2022 |
2 | Javascript | 139.0 | 10-12-2022 |
3 | Python | 172.0 | 12-2-2023 |
Next, let’s look at the data types of our DataFrame columns:
interviews.dtypes
This will return:
language object salary float64 start_date object dtype: object
As you can see, the start_date column is defined as an object data type. Our taks is to convert it to datetime – so that we can use that column to aggregate data according to week, month or other time periods.
Step #2: Change from string to datetime yyyymm-dd with astype
First method o cast our string column to date is using the Series astype() method:
interviews['start_date'] = interviews['start_date'].astype('datetime64')
This will return the following Series:
0 2023-10-01 1 2022-01-11 2 2022-10-12 3 2023-12-02 Name: start_date, dtype: datetime64[ns]
Step #3: Convert to datetime with to_datetime()
Alternatively, you can use the to_datetime() function to achieve the same result:
interviews['start_date'] = pd.to_datetime(interviews['start_date'])
Step #4: Group data by month
Now that the start_date column data type is datetime64, we can easily use it to group our data. We’ll set the start_date datetime as the DataFrame index, and use a pd.Grouper to define the aggregation frequency.
interviews.set_index('start_date').groupby([pd.Grouper( freq = 'Y')])['salary'].mean()
This will return the following Series object:
start_date
2022-12-31 146.0
2023-12-31 179.5
Freq: A-DEC, Name: salary, dtype: float64
Format string as dates with strftime
We can format string columns according to different date formats with the very useful strftime function:
interviews['start_date'].dt.strftime('%d/%m/%y')
This will return a Series of strings, but formatted according to dd/mm/yyyy time format:
0 01/10/23
1 11/01/22
2 12/10/22
3 02/12/23
Name: start_date, dtype: object