How to replace nan values with a string in Pandas?

Today we’ll learn how to replace empty values in Pandas DataFrame columns with a string object. We’ll show how to use the Pandas DataFrame replace() method for this.

We’ll start by defining a sample DataFrame that you can use to follow along with this example:

import pandas as pd
import numpy as np

month = ['March', 'March', 'March', 'April', 'April', 'March']
office = ['Toronto', 'New York', 'Las Vegas', 'Madrid', 'Toronto', 'Rome']
salary = [np.nan, 138.0, 108.0, "", pd.NA, np.nan]
salary_dict = dict(month = month, office = office, salary = salary)
hr_df = pd.DataFrame(data=salary_dict)

Let’s look into our DataFrame:

   month     office salary
0  March    Toronto    NaN
1  March   New York  138.0
2  March  Las Vegas  108.0
3  April     Madrid       
4  April    Toronto   <NA>
5  March       Rome    NaN

As can be observed, we have several empty values (NaN, pd.NA, or just a blank string). We would like to handle these.
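Before replacing anything, it can help to count how many cells are affected. The following is a minimal sketch, assuming the same salary values as in the sample DataFrame; the variable names missing_count and blank_count are illustrative only.

```python
import numpy as np
import pandas as pd

# Same values as the 'salary' column of the sample DataFrame
salary = pd.Series([np.nan, 138.0, 108.0, "", pd.NA, np.nan], name="salary")

# isna() flags np.nan and pd.NA, but not the empty string
missing_count = salary.isna().sum()

# Blank strings are ordinary values, so they are counted separately
blank_count = (salary == "").sum()

print(missing_count, blank_count)
```

The blank string is not treated as missing by isna(), which is why it needs its own handling later on.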

Replacing nan with other values in Pandas

The replace() method is fairly self-explanatory. We first define the values to replace and then the replacement value (in our case, we opted to use the string ‘None’).

hr_df.replace(to_replace=np.nan, value='None')

Note: replace() returns a new DataFrame by default. Optionally, we can pass inplace=True to persist the changes in our original DataFrame.
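To illustrate the note above, here is a small sketch that assigns the result of replace() to a new variable instead of using inplace=True. It assumes a reduced three-row version of the sample data, and replaced_df is a name chosen for this example.

```python
import numpy as np
import pandas as pd

# Reduced version of the sample data (an assumption for brevity)
hr_df = pd.DataFrame({
    "office": ["Toronto", "New York", "Rome"],
    "salary": [np.nan, 138.0, np.nan],
})

# replace() returns a new DataFrame; hr_df itself is left untouched
replaced_df = hr_df.replace(to_replace=np.nan, value="None")

print(hr_df["salary"].isna().sum())    # number of NaN values still in the original
print(replaced_df["salary"].tolist())  # values after the replacement
```

Assigning the result back (hr_df = hr_df.replace(...)) is a common alternative to inplace=True.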

Let’s take a look at the result:

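   month     office salary
0  March    Toronto   None
1  March   New York  138.0
2  March  Las Vegas  108.0
3  April     Madrid       
4  April    Toronto   None
5  March       Rome   None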

Replacing empty fields in our DataFrame

We still have an empty cell in row 3. Replacing it is simple: instead of passing a single value into the to_replace parameter, we’ll pass a list. Here’s the script:

hr_df.replace(to_replace=[np.nan,""], value='None')

That’s it!
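One side effect worth keeping in mind: once the string 'None' sits next to the numeric salaries, the column holds mixed types. The sketch below assumes the same salary values as the sample DataFrame; cleaned is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same values as the 'salary' column of the sample DataFrame
salary = pd.Series([np.nan, 138.0, 108.0, "", pd.NA, np.nan], name="salary")

# Replace both missing values and blank strings with the string 'None'
cleaned = salary.replace(to_replace=[np.nan, ""], value="None")

print(cleaned.tolist())
print(cleaned.dtype)  # mixed strings and floats are stored with the object dtype
```

Because the column now mixes strings and floats, numeric operations such as mean() are likely to fail on it, so a string placeholder is best suited to display or export.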

Replace nan with 0 values

A very simple tweak lets us fill in zeros for every nan or empty value in the DataFrame (or in one of its columns).

hr_df.replace(to_replace=[np.nan,""], value=0)
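To limit the replacement to one column, the same call can be made on that column and the result assigned back. This is a sketch under the assumption that only the salary column needs cleaning; the pd.to_numeric() step is an addition for this example and is not part of the snippet above.

```python
import numpy as np
import pandas as pd

# Reduced version of the sample DataFrame (month column omitted for brevity)
hr_df = pd.DataFrame({
    "office": ["Toronto", "New York", "Las Vegas", "Madrid", "Toronto", "Rome"],
    "salary": [np.nan, 138.0, 108.0, "", pd.NA, np.nan],
})

# Replace missing and blank values in the salary column only
hr_df["salary"] = hr_df["salary"].replace(to_replace=[np.nan, ""], value=0)

# Convert the column to a numeric dtype so arithmetic works as expected
hr_df["salary"] = pd.to_numeric(hr_df["salary"])

total_salary = hr_df["salary"].sum()
print(total_salary)
```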

Replace nan with median values

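For completeness, here’s a simple snippet that leverages the fillna() method to replace nan values in a column with the median value for that specific column. First we’ll convert the empty values to np.nan and make the column numeric (median() needs a numeric dtype); then we’ll invoke fillna(). Neither replace() nor fillna() modifies the column in place by default, so we assign each result back.

hr_df['salary'] = hr_df['salary'].replace(to_replace="", value=np.nan)
# errors='coerce' turns any remaining non-numeric entry into NaN
hr_df['salary'] = pd.to_numeric(hr_df['salary'], errors='coerce')
hr_df['salary'] = hr_df['salary'].fillna(value=hr_df['salary'].median())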
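As a worked check of the median approach, the sketch below runs the steps on the sample salary values. Only two salaries are known (138.0 and 108.0), so the median is (138.0 + 108.0) / 2 = 123.0 and every missing entry is filled with that number. Using pd.to_numeric() with errors="coerce" is an assumption made here to turn the blank string and pd.NA into NaN in one step.

```python
import numpy as np
import pandas as pd

# Same values as the 'salary' column of the sample DataFrame
salary = pd.Series([np.nan, 138.0, 108.0, "", pd.NA, np.nan], name="salary")

# Coerce to a float column: the blank string and pd.NA become NaN
numeric_salary = pd.to_numeric(salary, errors="coerce")

# median() skips NaN values by default
median_salary = numeric_salary.median()

# Fill every missing entry with the median
filled = numeric_salary.fillna(median_salary)

print(median_salary)
print(filled.tolist())
```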
