How to convert a pandas DataFrame column to float and int types?

As part of the data wrangling process, we often need to cast certain DataFrame columns to other data types.

In today’s tutorial we’ll learn how to convert one or multiple columns to integer and float data types.

Create the example data

Let’s start by quickly creating a test DataFrame that you can use in order to follow along:

# Import the pandas library
import pandas as pd

# Define the dictionary
emp_dict = {'city': ['New York', 'Boston', 'Austin'],
            'num_employees': ['10', '20', '58'],
            'annual_revenue': ['10.54', '21.45', '58.65']}

# Initialize the DataFrame
offices = pd.DataFrame(emp_dict)

Let’s look at the column dtypes. Because the numeric values were entered as strings, every column is stored as object:

offices.dtypes
city              object
num_employees     object
annual_revenue    object
dtype: object

Change type of a single column to float or int

The code below returns a Series containing the converted column values:


offices['num_employees'].astype(dtype='int64')

Note that although the values are converted in the returned Series, the change isn’t persisted in your original DataFrame. Unlike some other pandas methods, astype() doesn’t have an inplace parameter.

If you want to persist the changes you can use the following:


offices['num_employees'] = offices['num_employees'].astype(dtype='int64')

Similarly for float64 dtypes:


offices['annual_revenue'] = offices['annual_revenue'].astype(dtype='float64')

Let us now go ahead and check our DataFrame data types. Run the following command:

offices.dtypes

And here is the result we will receive:

city               object
num_employees       int64
annual_revenue    float64
dtype: object
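
To make the difference between calling astype() on its own and assigning its result back more concrete, here is a minimal, self-contained sketch. It rebuilds the same offices DataFrame used in this tutorial; the variable names converted and dtype_before are only illustrative and are not part of the original example.

```python
import pandas as pd

# Rebuild the tutorial's example data (numeric values stored as strings)
offices = pd.DataFrame({
    'city': ['New York', 'Boston', 'Austin'],
    'num_employees': ['10', '20', '58'],
    'annual_revenue': ['10.54', '21.45', '58.65'],
})

# astype() returns a new Series; the DataFrame column itself is untouched
converted = offices['num_employees'].astype('int64')
dtype_before = offices['num_employees'].dtype  # still the original string-based dtype

# Assigning the result back is what persists the conversion
offices['num_employees'] = converted
offices['annual_revenue'] = offices['annual_revenue'].astype('float64')

print(dtype_before)
print(offices.dtypes)
```

The first print shows the dtype the column had before the assignment, and the second shows the dtypes after both columns have been reassigned.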

Convert multiple columns to different data types

We can convert multiple columns at once by passing astype() a dictionary that maps each column label to the required data type.

Here is an example:

offices = offices.astype({'num_employees': 'int64', 'annual_revenue': 'float64'})
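
As a quick check of the dictionary approach, the sketch below rebuilds the example data and confirms that only the columns named in the mapping change type. The names target_dtypes, converted and total_employees are assumptions introduced for illustration.

```python
import pandas as pd

# Rebuild the tutorial's example data (numeric values stored as strings)
offices = pd.DataFrame({
    'city': ['New York', 'Boston', 'Austin'],
    'num_employees': ['10', '20', '58'],
    'annual_revenue': ['10.54', '21.45', '58.65'],
})

# Map each column label to its target dtype; columns not listed stay as they are
target_dtypes = {'num_employees': 'int64', 'annual_revenue': 'float64'}
converted = offices.astype(target_dtypes)

# Once the column is numeric, arithmetic behaves as expected
total_employees = converted['num_employees'].sum()  # 10 + 20 + 58 = 88

print(converted.dtypes)
print(total_employees)
```

Because astype() returns a new DataFrame, the result is stored in a separate variable here; reassigning it to offices, as in the example above, works equally well.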

Using pd.to_numeric()

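pd.to_numeric() offers a slightly less flexible way to cast column values to numeric types. You don’t name the target dtype directly; instead, the optional downcast parameter shrinks the result to the smallest integer or float dtype that can hold the values. As with astype(), a new Series is returned and the original DataFrame is not modified:

# downcast to the smallest integer dtype (int8 for these values)
pd.to_numeric(offices['num_employees'], downcast='integer')

# downcast to the smallest float dtype (float32)
pd.to_numeric(offices['annual_revenue'], downcast='float')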
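
The following sketch shows one way the pd.to_numeric() calls might be used end to end, including assigning the results back so the change persists. It is self-contained and rebuilds the example data; the variable names employees and revenue are illustrative only, and the exact dtypes chosen by downcast depend on the values in the column.

```python
import pandas as pd

# Rebuild the tutorial's example data (numeric values stored as strings)
offices = pd.DataFrame({
    'city': ['New York', 'Boston', 'Austin'],
    'num_employees': ['10', '20', '58'],
    'annual_revenue': ['10.54', '21.45', '58.65'],
})

# Parse the strings and shrink to the smallest integer dtype that fits the values
employees = pd.to_numeric(offices['num_employees'], downcast='integer')

# Parse the strings and shrink to the smallest float dtype
revenue = pd.to_numeric(offices['annual_revenue'], downcast='float')

# pd.to_numeric() returns new Series, so assign them back to persist the change
offices['num_employees'] = employees
offices['annual_revenue'] = revenue

print(offices.dtypes)
```

Downcasting trades range and precision for memory, so it is worth checking that the smaller dtype is still adequate for values the column may hold later.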
