As part of the data wrangling process, we often need to cast certain columns of a DataFrame to other data types.
In today’s tutorial we’ll learn how to change data types of one or multiple columns to integer and float data types.
Create the example data
Let’s start by quickly creating a test DataFrame that you can use in order to follow along:
# Importing the Pandas library
import pandas as pd
# Define dictionary
emp_dict = {'city': ['New York', 'Boston', 'Austin'],
            'num_employees': ['10', '20', '58'],
            'annual_revenue': ['10.54', '21.45', '58.65']}
# Initialize DataFrame
offices = pd.DataFrame(emp_dict)
Let’s look into the different column dtypes:
offices.dtypes
city              object
num_employees     object
annual_revenue    object
dtype: object
Change type of a single column to float or int
The code below returns a Series containing the converted column values:
offices['num_employees'].astype(dtype='int64')
Note that although the values are converted, the change is not persisted in the original DataFrame: astype() returns a new object and, unlike some other Pandas methods, has no inplace parameter.
If you want to persist the changes you can use the following:
offices['num_employees'] = offices['num_employees'].astype(dtype='int64')
Similarly for float64 dtypes:
offices['annual_revenue'] = offices['annual_revenue'].astype(dtype='float64')
Let us now go ahead and check our DataFrame data types. Run the following command:
offices.dtypes
And here is the result we will receive:
city               object
num_employees       int64
annual_revenue    float64
dtype: object
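To make the difference between converting and persisting concrete, here is a minimal self-contained sketch. It rebuilds the same example data; the variable names converted, dtype_before, dtype_of_copy and dtype_after are illustrative assumptions, not part of the original tutorial.

```python
import pandas as pd

offices = pd.DataFrame({
    'city': ['New York', 'Boston', 'Austin'],
    'num_employees': ['10', '20', '58'],
    'annual_revenue': ['10.54', '21.45', '58.65'],
})

# astype() returns a new Series; the DataFrame column itself is untouched
converted = offices['num_employees'].astype('int64')
dtype_before = offices['num_employees'].dtype  # still the original string dtype
dtype_of_copy = converted.dtype                # int64

# Assigning the result back to the column is what persists the change
offices['num_employees'] = converted
dtype_after = offices['num_employees'].dtype   # int64
```

Comparing dtype_before with dtype_after shows that only the assignment changed the DataFrame.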
Convert multiple columns to different data types
We can convert multiple columns at once by passing astype() a dictionary that maps each column label to its required data type. Columns that are not listed in the dictionary keep their current type.
Here is an example:
offices = offices.astype({'num_employees': 'int64', 'annual_revenue': 'float64'})
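As a quick check of the dictionary approach, the sketch below starts again from the string-valued example data and converts both numeric columns in one call. The names dtype_map, converted, city_unchanged and total_employees are assumptions introduced only for illustration.

```python
import pandas as pd

offices = pd.DataFrame({
    'city': ['New York', 'Boston', 'Austin'],
    'num_employees': ['10', '20', '58'],
    'annual_revenue': ['10.54', '21.45', '58.65'],
})

# Map each column label to its target dtype; 'city' is deliberately left out
dtype_map = {'num_employees': 'int64', 'annual_revenue': 'float64'}
converted = offices.astype(dtype_map)

# Columns missing from the mapping keep their dtype
city_unchanged = converted['city'].dtype == offices['city'].dtype

# Once the column is numeric, arithmetic behaves as expected
total_employees = converted['num_employees'].sum()
```

Summing the column before the conversion would concatenate the strings instead of adding the numbers, which is a common reason for casting in the first place.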
Using pd.to_numeric()
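pd.to_numeric() is a slightly less flexible alternative: it only converts to numeric types, and instead of naming an exact target dtype you can ask it, through the downcast parameter, for the smallest integer or float type that can hold the values. Like astype(), it returns a new Series rather than modifying the DataFrame: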
# cast to int8
pd.to_numeric(offices['num_employees'], downcast='integer')
# cast to float32
pd.to_numeric(offices['annual_revenue'], downcast='float')
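The following sketch puts the two downcast calls into a runnable form, assuming the columns were first converted with astype() as in the previous section. The names employees_small and revenue_small are hypothetical and used only for illustration.

```python
import pandas as pd

offices = pd.DataFrame({
    'city': ['New York', 'Boston', 'Austin'],
    'num_employees': ['10', '20', '58'],
    'annual_revenue': ['10.54', '21.45', '58.65'],
}).astype({'num_employees': 'int64', 'annual_revenue': 'float64'})

# downcast='integer' picks the smallest signed integer dtype that fits the values
employees_small = pd.to_numeric(offices['num_employees'], downcast='integer')

# downcast='float' picks the smallest float dtype, which is never below float32
revenue_small = pd.to_numeric(offices['annual_revenue'], downcast='float')

# to_numeric() returns new Series, so assign them back to persist the change
offices['num_employees'] = employees_small
offices['annual_revenue'] = revenue_small
```

With the values used here, the integer column ends up as int8 and the float column as float32; larger values would lead to a wider integer type.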