Cast pandas column cells to integer
To convert one or more pandas DataFrame columns to the integer data type, use the astype() method. Here’s a simple example:
# single column / series
my_df['my_col'].astype('int64')
# for multiple columns
my_df.astype({'my_first_col':'int64', 'my_second_col':'int64'})
In this tutorial, we will look into three main use cases:
- Casting a specific column from float to int
- Converting a column containing NaN (empty) values to int
- Converting multiple columns to int / int64
Sample Pandas DataFrame
Let’s start by writing some simple Python code to create test data that you can use to follow along.
import pandas as pd
#Lists containing test data
offices = ['Paris', 'Madrid', 'London', 'Barcelona', 'Brussels']
num_interviews = [129.0, 132.0, 145.0, 230.0, pd.NA]
positions = [12.0, 15.0, 13.0, 13.5, 3]
#Create pandas DataFrame from dictionary
interviews_dict = dict(office=offices, total_interviews =num_interviews, total_positions = positions )
interviews = pd.DataFrame(interviews_dict)
print(interviews.head())
Here’s our DataFrame:
| | office | total_interviews | total_positions |
|---|---|---|---|
| 0 | Paris | 129.0 | 12.0 |
| 1 | Madrid | 132.0 | 15.0 |
| 2 | London | 145.0 | 13.0 |
| 3 | Barcelona | 230.0 | 13.5 |
| 4 | Brussels | <NA> | 3.0 |
Let’s find out the data types for the different DataFrame columns:
print(interviews.dtypes)
office object
total_interviews object
total_positions float64
dtype: object
Convert a single column from float to integer
We’ll start by casting a single float64 column to int:
interviews['total_positions'].astype('int')
This returns a new Series cast to int. The original DataFrame is left unchanged, and any fractional part is truncated (13.5 becomes 13). To request int64 explicitly, use:
interviews['total_positions'].astype('int64')
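Because astype() drops the fractional part rather than rounding, it can help to compare both behaviors side by side. The following is a minimal sketch that rebuilds the total_positions values as a standalone Series. The variable names truncated and rounded are illustrative only.

```python
import pandas as pd

# Same values as the total_positions column in this tutorial
positions = pd.Series([12.0, 15.0, 13.0, 13.5, 3], name="total_positions")

# astype() discards the fractional part (truncation toward zero)
truncated = positions.astype("int64")

# Rounding first is one alternative; Series.round() rounds halves
# to the nearest even number, so 13.5 becomes 14
rounded = positions.round().astype("int64")

print(truncated.tolist())  # [12, 15, 13, 13, 3]
print(rounded.tolist())    # [12, 15, 13, 14, 3]
```

Which of the two is appropriate depends on what the fractional values mean in your data.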
Converting columns with NaN values to int
You might have noticed that one of our DataFrame columns contains a missing value (<NA>). Trying to cast that column to integer raises the following error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'
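We should therefore fill the missing values first and then cast the column. Note that fillna() must not be called with inplace=True here: in that mode it returns None, so the chained astype() call would fail.
interviews['total_interviews'].fillna(0).astype(int)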
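If replacing missing values with 0 would distort your data, one alternative is pandas’ nullable integer dtype, written "Int64" with a capital I. The sketch below assumes a pandas version that supports nullable integers (1.0 or later) and rebuilds the total_interviews values as a standalone Series. The name nullable is illustrative.

```python
import pandas as pd

# Same values as the total_interviews column in this tutorial
total_interviews = pd.Series([129.0, 132.0, 145.0, 230.0, pd.NA])

# "Int64" (capital I) is the nullable integer dtype: whole numbers are
# stored as integers and the missing entry stays <NA>
nullable = total_interviews.astype("Int64")

print(nullable.dtype)         # Int64
print(nullable.isna().sum())  # 1
```

This cast only succeeds when every non-missing value is a whole number. A value such as 13.5 would raise an error instead of being truncated.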
Converting multiple columns to int types
Let us look into a more realistic scenario in which we cast multiple columns at the same time. We’ll first go ahead and take care of cells containing empty values.
interviews.fillna(0, inplace=True)
We’ll then cast multiple columns to int64. Unlike before, we’ll pass a dictionary containing the columns to convert and the required dtype for each.
interviews_2 = interviews.astype({'total_interviews':'int64', 'total_positions':'int64'})
We’ll finish up by verifying the data types:
interviews_2.dtypes
This will render:
office object
total_interviews int64
total_positions int64
dtype: object
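When a DataFrame has many float columns, writing the dictionary by hand gets tedious. As a rough sketch, the mapping can be built from select_dtypes(). The frame df and the names float_cols, dtype_map and converted below are hypothetical stand-ins for your own data, with missing values assumed to be filled already.

```python
import pandas as pd

# Hypothetical frame mirroring this tutorial's columns (no missing values)
df = pd.DataFrame({
    "office": ["Paris", "Madrid", "London"],
    "total_interviews": [129.0, 132.0, 145.0],
    "total_positions": [12.0, 15.0, 13.0],
})

# Collect the names of all float64 columns
float_cols = df.select_dtypes(include="float64").columns

# Build the {column: dtype} mapping that astype() expects
dtype_map = {col: "int64" for col in float_cols}

# astype() returns a new DataFrame; df itself is unchanged
converted = df.astype(dtype_map)
print(converted.dtypes)
```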
Rename converted columns
Last, we can rename the columns that we just converted. Here too, we’ll pass a mapping dictionary as a parameter, this time to the rename() method. Here’s a short snippet:
interviews_2.rename(mapper = {'total_interviews':'ti_int', 'total_positions':'tp_int'}, axis=1, inplace=True)
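If you would rather keep the original DataFrame untouched, rename() can also return a new DataFrame when inplace is left out. Here is a small sketch using the columns keyword, which is equivalent to passing mapper with axis=1. The frame converted is a hypothetical stand-in for the converted data.

```python
import pandas as pd

# Hypothetical stand-in for the converted DataFrame
converted = pd.DataFrame({
    "total_interviews": [129, 132],
    "total_positions": [12, 15],
})

# Without inplace=True, rename() returns a renamed copy
renamed = converted.rename(
    columns={"total_interviews": "ti_int", "total_positions": "tp_int"}
)

print(list(renamed.columns))    # ['ti_int', 'tp_int']
print(list(converted.columns))  # ['total_interviews', 'total_positions']
```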