How to convert Pandas DataFrame columns to string type?

In today’s Pandas data analysis tutorial, I would like to cover the basics of converting Pandas DataFrame columns to strings.

We will focus on several key use cases here:

  1. Converting specific columns to strings using the astype() method.
  2. Exporting a DataFrame to a string object.
  3. Converting datetime values to strings.

Example data

We will start by creating some test data so you can follow along with this exercise:

# Import Pandas
import pandas as pd

# Define the data dictionary
cand_dict = {'office_id': ['ny', 2, 3],
             'city': ['nyc', 'boston', 'austin'],
             'num_candidates': [10, 20, 58]}

# Initialize the DataFrame
candidates = pd.DataFrame(cand_dict)

Let’s find out the respective data types of our DataFrame columns:

candidates.dtypes

The result will be:

office_id         object
city              object
num_candidates     int64
dtype: object

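The office_id column mixes integers and text, so Pandas assigns it the generic object data type. The city column holds only text, but it is listed as object as well, because Pandas stores text in object columns by default.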
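To see why office_id ends up as object, you can inspect the Python type of each of its values. The sketch below rebuilds the example DataFrame so it runs on its own, then uses Series.map(type) to list the element types: the first value is a string and the other two are integers.

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet runs standalone
cand_dict = {'office_id': ['ny', 2, 3],
             'city': ['nyc', 'boston', 'austin'],
             'num_candidates': [10, 20, 58]}
candidates = pd.DataFrame(cand_dict)

# Map every value in office_id to its Python type
element_types = candidates['office_id'].map(type)
print(element_types.tolist())

# The column as a whole falls back to the object dtype
print(candidates['office_id'].dtype)
```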

Convert DataFrame columns to strings

Let’s assume that we would like to concatenate the office_id and the city columns.

candidates['city_id'] = candidates['office_id'] + '_'+ candidates['city']

This raises a TypeError, because office_id contains integers and Python cannot add an integer to a string:

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Let’s then convert office_id to the string data type, after which we can easily combine the columns:

candidates['office_id'] = candidates['office_id'].astype('string')
candidates['city_id'] = candidates['office_id'] + '_'+ candidates['city']

print(candidates.head())

We’ll get the city_id column in our DataFrame:

  office_id    city  num_candidates   city_id
0        ny     nyc              10    ny_nyc
1         2  boston              20  2_boston
2         3  austin              58  3_austin
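As an alternative sketch, the same column can be built with the Series.str.cat() method, which joins two Series of text element-wise using a separator. Here astype(str) is used instead of astype('string'); either produces text that the .str accessor accepts.

```python
import pandas as pd

candidates = pd.DataFrame({'office_id': ['ny', 2, 3],
                           'city': ['nyc', 'boston', 'austin'],
                           'num_candidates': [10, 20, 58]})

# Convert office_id to text, then join it to city with an underscore
candidates['city_id'] = (candidates['office_id']
                         .astype(str)
                         .str.cat(candidates['city'], sep='_'))
print(candidates['city_id'].tolist())
```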

Cast DataFrame object to string

We found out earlier that the city column was assigned the object data type. We can cast several columns at once, each to its own data type, by passing astype() a dictionary of column name and dtype pairs:

candidates.astype({'city':'string', 'num_candidates':'int32'}).dtypes

In the output, the two columns we cast now show their new data types:

city              string
num_candidates     int32
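When many columns need the same conversion, the dictionary can also be built programmatically. The sketch below is one possible approach, not the only one: it uses select_dtypes() to pick every object column and casts each of them to 'string'. The selection rule (all object columns) is an assumption you may want to adapt.

```python
import pandas as pd

candidates = pd.DataFrame({'office_id': ['ny', 2, 3],
                           'city': ['nyc', 'boston', 'austin'],
                           'num_candidates': [10, 20, 58]})

# Collect the names of all columns Pandas stored as object
object_cols = candidates.select_dtypes(include='object').columns

# Build a {column: dtype} mapping and cast all of them in one call
converted = candidates.astype({col: 'string' for col in object_cols})
print(converted.dtypes)
```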

Export Pandas DataFrame to strings using to_string()

The to_string() method renders a Series or a DataFrame as a single plain-text string. Here’s an example:

# Render a single DataFrame column as a string
print(candidates['city'].to_string())

# Render the entire DataFrame as a string
print(candidates.to_string())
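to_string() also accepts formatting options. For instance, index=False drops the row labels. The returned value is an ordinary Python str, so you can store it, search it or write it to a text file.

```python
import pandas as pd

candidates = pd.DataFrame({'office_id': ['ny', 2, 3],
                           'city': ['nyc', 'boston', 'austin'],
                           'num_candidates': [10, 20, 58]})

# Render the DataFrame without the index column
text = candidates.to_string(index=False)

# The result is a plain str: one header line plus one line per row
print(type(text))
print(text)
```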

Convert Datetime to string

In our next example, we’ll convert a column containing datetime values to strings.

We’ll first generate a simple DataFrame:

import pandas as pd

# Seven consecutive daily dates and matching sales figures
week = pd.date_range('2022-10-10', periods=7, freq='D')
sales = [120, 130, 150, 167, 180, 120, 150]
sales_df = pd.DataFrame(dict(week=week, sales=sales))

sales_df.dtypes

This returns:

week     datetime64[ns]
sales             int64
dtype: object

Now we can use the astype() method, as shown above, to get a Series of strings:

sales_df['week'].astype('string')

Note that astype() returns a new object, so the conversion is not applied to your original DataFrame. To persist the change, store the result in a new DataFrame:

sales_df_2 = sales_df.astype({'week':'string'})
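Alternatively, if you no longer need the original datetime values, you can overwrite the column by assigning the converted Series back to it. A minimal sketch, reusing the sales data from above:

```python
import pandas as pd

week = pd.date_range('2022-10-10', periods=7, freq='D')
sales = [120, 130, 150, 167, 180, 120, 150]
sales_df = pd.DataFrame(dict(week=week, sales=sales))

# Replace the datetime column with its string representation
sales_df['week'] = sales_df['week'].astype('string')
print(sales_df.dtypes)
```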

Change Datetime format and convert to string

In the next example we’ll change the format of a datetime column using the strftime formatter, accessed through the .dt accessor. We’ll reuse the sales_df data that we generated previously.

sales_df['week'].dt.strftime('%d-%m-%y')

This returns a Series of strings in day-month-year format; the first row, for example, becomes 10-10-22.

In a similar fashion, you can format the datetime values in other ways by combining strftime codes for years (%Y, %y), months (%m, %B), days (%d), hours (%H), minutes (%M) and so forth.
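For example, the sketch below produces a numeric date-and-time layout and a long, human-readable layout from the same column. Day and month names from %A and %B follow the current locale, which is English unless a different locale has been set.

```python
import pandas as pd

week = pd.date_range('2022-10-10', periods=7, freq='D')
sales_df = pd.DataFrame(dict(week=week, sales=[120, 130, 150, 167, 180, 120, 150]))

# Four-digit year, month, day, then hours and minutes
numeric_fmt = sales_df['week'].dt.strftime('%Y/%m/%d %H:%M')

# Weekday name, full month name, day and year
long_fmt = sales_df['week'].dt.strftime('%A, %B %d, %Y')

print(numeric_fmt.iloc[0])  # 2022/10/10 00:00
print(long_fmt.iloc[0])     # Monday, October 10, 2022
```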

Example of Renaming a DataFrame column

A common requirement is to rename a column after casting it to a new data type. Here is a quick snippet that you can use as an example:

cand_2 = candidates.astype({'city': 'string', 'num_candidates': 'int32'})

# rename() returns a new DataFrame, so assign the result to keep it
cand_2 = cand_2.rename(columns={'city': 'my_city'})
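The two steps can also be chained into a single expression, which avoids keeping an intermediate DataFrame around. A sketch using the same column names:

```python
import pandas as pd

candidates = pd.DataFrame({'office_id': ['ny', 2, 3],
                           'city': ['nyc', 'boston', 'austin'],
                           'num_candidates': [10, 20, 58]})

# Cast the columns, then rename city, in one chained expression
cand_2 = (candidates
          .astype({'city': 'string', 'num_candidates': 'int32'})
          .rename(columns={'city': 'my_city'}))
print(cand_2.dtypes)
```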

Converting Pandas column to int

A couple of readers have asked for a simple way to convert a column containing numeric data from string to integer type. I wanted to point you to this tutorial, which explains how to cast columns to int in Pandas.

The gist of it is to call the astype() method on the specific DataFrame column. In our case:

candidates['num_candidates'].astype('int')
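Since num_candidates is already an integer column, here is a sketch with hypothetical Series of numbers stored as text. astype('int64') works when every value is a valid integer string. For messier data, pd.to_numeric() with errors='coerce' is one option: it turns unparseable values into NaN, which makes the result a float column.

```python
import pandas as pd

# Hypothetical column of whole numbers stored as text
counts = pd.Series(['10', '20', '58'])
as_int = counts.astype('int64')

# Hypothetical messy column: 'twenty' cannot be parsed as a number
messy = pd.Series(['10', 'twenty', '58'])
as_num = pd.to_numeric(messy, errors='coerce')

print(as_int.dtype, as_int.tolist())
print(as_num.tolist())
```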