How to convert a pandas column to numeric types?

This tutorial will teach you how to cast a pandas column of objects to numeric types such as integer or floats. This is a common task you will typically execute during the data cleaning phase of your data acquisition workflow; typically when harmonizing the data types in a column that contains figures, but also non-numeric data such as strings.

Create example data

As we typically do, we will first create a simple DataFrame that you can use to follow along.

import pandas as pd
dates = pd.date_range(start='1/10/2023', periods = 5, freq = 'B' )
sales = [162, '142', 136, '134', None]

campaign = pd.DataFrame (dict (dates = dates, sales = sales))
campaign.head()

This will render the following DataFrame rows:

datessales
02023-01-10162
12023-01-11142
22023-01-12136
32023-01-13134
42023-01-16None

Let’s look at the columns data types:

campaign.dtypes

This will return:

dates    datetime64[ns]
sales            object
dtype: object

The sales column has mixed data types, hence as expected its type is object.

Note: The sales column also contains an empty value, which we would like to replace. If we’ll run the fillna() command on the column we will get the following TypeError exception:

#TypeError: unsupported operand type(s) for +: 'int' and 'str'

Before attempting to replace the empty values in our DataFrame we should first convert the column to numeric values.

Convert pandas column to numbers

There are two main options to cast a Series/ column to integers or float numbers: the pd.to_numeric function and the astype() method. As we have empty values in our column, we’ll opt for pd.to_numeric.

pd.to_numeric vs astype

  • Unlike pd.to_numeric, astype() doesn’t handle NAN empty values. To circumvent this, we use the parameter errors=’ignore’.
  • While pd.to_numeric know to infer the expected data type for the conversion, when using astype() we need to provide the target data type as a parameter.

Using pd.to_numeric

campaign['sales'] = pd.to_numeric(campaign['sales'], errors = 'raise')
campaign['sales'].head()
0    162.0
1    142.0
2    136.0
3    134.0
4      NaN
Name: sales, dtype: float64

Replace empty values

We can now easily replace the empty value in the sales column by the column average value:

campaign['sales'] = campaign['sales'].fillna(campaign['sales'].mean())
campaign['sales'].head()

Here’s the result:

0    162.0
1    142.0
2    136.0
3    134.0
4    143.5
Name: sales, dtype: float64

Converting pandas column from float to integer

Now we can use the astype() method:

campaign['sales'] = campaign['sales'].astype('int64')
campaign['sales'].head()

And a final look at our column:

0    162
1    142
2    136
3    134
4    143
Name: sales, dtype: int64