How to convert Pandas DataFrame columns to int types?

Cast pandas column cells to integer

To convert one or more pandas DataFrame columns to an integer data type, use the astype() method. Here’s a simple example:

# single column / series
my_df['my_col'].astype('int64')

# for multiple columns
my_df.astype({'my_first_col':'int64', 'my_second_col':'int64'})
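
Keep in mind that astype() returns a converted copy and leaves the original object untouched. The sketch below illustrates this with a hypothetical my_df; the column names and values are placeholders made up for illustration.

```python
import pandas as pd

# Hypothetical data; column names and values are placeholders.
my_df = pd.DataFrame({'my_first_col': [1.0, 2.0], 'my_second_col': [3.0, 4.0]})

# astype() returns a new DataFrame; my_df itself keeps its float columns.
converted = my_df.astype({'my_first_col': 'int64', 'my_second_col': 'int64'})

# To change a column of the original DataFrame, assign the result back.
my_df['my_first_col'] = my_df['my_first_col'].astype('int64')

print(converted.dtypes)
print(my_df.dtypes)
```

Here only my_first_col changes in my_df, because it is the only column whose converted result was assigned back.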

In this tutorial, we will look into three main use cases:

  • Casting a specific column from float to int
  • Converting a column containing NaN (missing) values to int
  • Converting multiple columns to int / int64

Sample Pandas DataFrame

Let’s get started by creating some test data that you can use to follow along.


import pandas as pd

#Lists containing test data

offices = ['Paris', 'Madrid', 'London', 'Barcelona', 'Brussels']
num_interviews = [129.0, 132.0, 145.0, 230.0, pd.NA]
positions = [12.0, 15.0, 13.0, 13.5, 3]

#Create pandas DataFrame from dictionary

interviews_dict = dict(office=offices, total_interviews=num_interviews, total_positions=positions)

interviews = pd.DataFrame(interviews_dict)

print(interviews.head())

Here’s our DataFrame:

      office  total_interviews  total_positions
0      Paris             129.0             12.0
1     Madrid             132.0             15.0
2     London             145.0             13.0
3  Barcelona             230.0             13.5
4   Brussels              <NA>              3.0

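Let’s find out the data types of the DataFrame columns. Note that total_interviews is reported as object rather than float64, because the list it was built from mixes floats with pd.NA:

print(interviews.dtypes)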
office               object
total_interviews     object
total_positions     float64
dtype: object

Convert a single column from float to integer

We will start by converting a single column from float64 to int and int64 data types.

interviews['total_positions'].astype('int')

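This returns a new Series cast to int; the DataFrame itself is not modified. The cast truncates the fractional part, so 13.5 becomes 13. On most platforms int already maps to int64; to request int64 explicitly, use: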


interviews['total_positions'].astype('int64')
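
Because the cast drops the fractional part, it can be worth deciding explicitly how non-whole values should be handled. The following minimal sketch uses a standalone Series holding the same values as total_positions and compares plain truncation with rounding first.

```python
import pandas as pd

# Same values as the total_positions column in the sample data.
positions = pd.Series([12.0, 15.0, 13.0, 13.5, 3])

# Casting directly truncates toward zero: 13.5 becomes 13.
truncated = positions.astype('int64')

# Rounding first changes the result: round() sends halves to the
# nearest even value, so 13.5 becomes 14.
rounded = positions.round().astype('int64')

print(truncated.tolist())
print(rounded.tolist())
```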

Handling conversion of columns to int with nan values

You might have noticed that the total_interviews column contains a missing value (pd.NA). Trying to cast it to integer raises the following error:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'

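We should therefore handle the missing values first and then cast the column. Note that fillna() is called without inplace=True here: with inplace=True it returns None, so the chained astype() call would fail.

interviews['total_interviews'].fillna(0).astype(int)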
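
Filling with 0 is not the only option. If a missing value should stay missing, pandas also offers the nullable 'Int64' extension dtype (note the capital I). The sketch below compares both approaches; it assumes a float64 column in which the missing value is stored as NaN, which differs slightly from the object column used in this tutorial.

```python
import numpy as np
import pandas as pd

# Assumption: a float64 column where the missing value is stored as NaN.
total_interviews = pd.Series([129.0, 132.0, 145.0, 230.0, np.nan])

# Option 1: replace the missing value, then cast to a NumPy integer dtype.
filled = total_interviews.fillna(0).astype('int64')

# Option 2: cast to the nullable 'Int64' extension dtype, which keeps
# the missing value as <NA> instead of replacing it with 0.
nullable = total_interviews.astype('Int64')

print(filled)
print(nullable)
```

Option 2 avoids inventing a 0 that later calculations (for example a mean) would count as a real observation.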

Converting multiple columns to int types

Let’s look at a more realistic scenario in which we cast multiple columns at the same time. We’ll first replace the missing values with 0:


interviews.fillna(0, inplace=True)

We’ll then cast multiple columns to int64. Unlike before, we’ll pass a dictionary containing the columns to convert and the required dtype for each.

interviews_2 = interviews.astype({'total_interviews':'int64', 'total_positions':'int64'})

We’ll finish up by verifying the data types:

interviews_2.dtypes

This will render:

office              object
total_interviews     int64
total_positions      int64
dtype: object
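
When many columns need converting, writing the dictionary by hand gets tedious. One possible approach, sketched below on a small stand-in DataFrame (df, float_cols and df_int are names chosen for this example), is to build the mapping from select_dtypes(). It assumes every float column should become an integer and that missing values have already been handled.

```python
import pandas as pd

# Small stand-in for the tutorial's DataFrame, with no missing values.
df = pd.DataFrame({
    'office': ['Paris', 'Madrid'],
    'total_interviews': [129.0, 132.0],
    'total_positions': [12.0, 15.0],
})

# Collect the names of all float64 columns...
float_cols = df.select_dtypes(include='float64').columns

# ...and build the {column: dtype} mapping that astype() expects.
df_int = df.astype({col: 'int64' for col in float_cols})

print(df_int.dtypes)
```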

Rename converted columns

Lastly, we can rename the columns that we just converted. Here too, we’ll pass a mapping dictionary as a parameter, this time to the rename() method of interviews_2, the DataFrame that holds the converted columns. Here’s a short snippet:

interviews_2.rename(mapper={'total_interviews': 'ti_int', 'total_positions': 'tp_int'}, axis=1, inplace=True)
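
If you would rather keep the original column names available, rename() can also return a renamed copy instead of modifying the DataFrame in place. A minimal sketch on a small stand-in DataFrame (df and renamed are names chosen for this example), using the columns keyword:

```python
import pandas as pd

# Stand-in for the converted DataFrame.
df = pd.DataFrame({'total_interviews': [129, 132], 'total_positions': [12, 15]})

# Without inplace=True, rename() returns a new DataFrame and leaves df as is.
renamed = df.rename(columns={'total_interviews': 'ti_int', 'total_positions': 'tp_int'})

print(list(renamed.columns))
print(list(df.columns))
```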