Today we’ll learn how to replace empty values in Pandas DataFrame columns with a string. We’ll use the Pandas DataFrame replace() method to convert them.
We’ll start by defining a sample DataFrame that you can use to follow along with this example:
import pandas as pd
import numpy as np
month = ['March', 'March', 'March', 'April', 'April', 'March']
office = ['Toronto', 'New York', 'Las Vegas', 'Madrid', 'Toronto', 'Rome']
salary = [np.nan, 138.0, 108.0, "", pd.NA, np.nan]
salary_dict = dict(month = month, office = office, salary = salary)
hr_df = pd.DataFrame(data=salary_dict)
hr_df
Let’s look into our DataFrame:
| | month | office | salary |
|---|---|---|---|
| 0 | March | Toronto | NaN |
| 1 | March | New York | 138.0 |
| 2 | March | Las Vegas | 108.0 |
| 3 | April | Madrid | |
| 4 | April | Toronto | <NA> |
| 5 | March | Rome | NaN |
As can be observed, the salary column has several empty values: NaN, <NA>, and a blank string. We would like to handle these.
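Before replacing anything, it can help to count what is actually missing. The following is a minimal sketch, assuming the same sample data as above (rebuilt here so the snippet runs on its own); the variable names missing_per_column and blank_per_column are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so the snippet is self-contained.
hr_df = pd.DataFrame({
    "month": ["March", "March", "March", "April", "April", "March"],
    "office": ["Toronto", "New York", "Las Vegas", "Madrid", "Toronto", "Rome"],
    "salary": [np.nan, 138.0, 108.0, "", pd.NA, np.nan],
})

# isna() flags np.nan and pd.NA, but not the empty string.
missing_per_column = hr_df.isna().sum()

# Blank strings are ordinary values, so they are counted separately.
blank_per_column = (hr_df == "").sum()

print(missing_per_column)
print(blank_per_column)
```

The two counts differ because a blank string is a regular value as far as isna() is concerned, which is why it needs separate handling later on.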
Replacing nan with other values in Pandas
replace() is a fairly self-explanatory method. We first define the values to replace and then the replacement value (in our case, we opted to use the string 'None').
hr_df.replace(to_replace=np.nan, value='None')
Note: replace() returns a new DataFrame and leaves the original untouched. Optionally, we can use the inplace=True parameter to persist the changes in our original DataFrame.
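To illustrate the note, here is a small sketch, assuming the sample data from the start of the post; it assigns the result of replace() to a new name (cleaned is just an illustrative variable) and shows that the original DataFrame still holds its missing values.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so the snippet is self-contained.
hr_df = pd.DataFrame({
    "month": ["March", "March", "March", "April", "April", "March"],
    "office": ["Toronto", "New York", "Las Vegas", "Madrid", "Toronto", "Rome"],
    "salary": [np.nan, 138.0, 108.0, "", pd.NA, np.nan],
})

# replace() returns a new DataFrame; keep it by assigning the result.
cleaned = hr_df.replace(to_replace=np.nan, value="None")

# The original is unchanged: its missing values are still there.
still_missing = int(hr_df["salary"].isna().sum())

print(cleaned)
print("Missing values left in the original:", still_missing)
```

Assigning the result back to the same name (hr_df = hr_df.replace(...)) has the same practical effect as inplace=True and is often easier to read in a chain of steps.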
Let’s take a look at the result:
| | month | office | salary |
|---|---|---|---|
| 0 | March | Toronto | None |
| 1 | March | New York | 138.0 |
| 2 | March | Las Vegas | 108.0 |
| 3 | April | Madrid | |
| 4 | April | Toronto | None |
| 5 | March | Rome | None |
Replacing empty fields in our DataFrame
We still have an empty cell in row 3. Replacing it is simple: instead of passing a single value into the to_replace parameter, we’ll pass a list. Here’s the script:
hr_df.replace(to_replace=[np.nan,""], value='None')
That’s it!
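As a quick sanity check, the sketch below (assuming the same sample data; remaining_missing and remaining_blank are illustrative names) applies the list-based replacement and confirms that neither missing values nor blank strings are left.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so the snippet is self-contained.
hr_df = pd.DataFrame({
    "month": ["March", "March", "March", "April", "April", "March"],
    "office": ["Toronto", "New York", "Las Vegas", "Madrid", "Toronto", "Rome"],
    "salary": [np.nan, 138.0, 108.0, "", pd.NA, np.nan],
})

# Replace both missing values and blank strings in one call.
cleaned = hr_df.replace(to_replace=[np.nan, ""], value="None")

# Count whatever is still missing or blank across the whole DataFrame.
remaining_missing = int(cleaned.isna().sum().sum())
remaining_blank = int((cleaned == "").sum().sum())

print(cleaned)
print(remaining_missing, remaining_blank)
```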
Replace nan with 0 values
A very simple tweak lets us fill in zeros for every NaN or empty value in the DataFrame (or in one of its columns).
hr_df.replace(to_replace=[np.nan,""], value=0)
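To limit the replacement to one column, the same call can be made on that column alone. The sketch below assumes the sample data from the start of the post; the extra pd.to_numeric() step is an addition of this example, used because the salary column holds mixed types and would otherwise not be guaranteed to end up with a numeric dtype.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so the snippet is self-contained.
hr_df = pd.DataFrame({
    "month": ["March", "March", "March", "April", "April", "March"],
    "office": ["Toronto", "New York", "Las Vegas", "Madrid", "Toronto", "Rome"],
    "salary": [np.nan, 138.0, 108.0, "", pd.NA, np.nan],
})

# Replace missing values and blanks with 0 in the salary column only.
salary_filled = hr_df["salary"].replace(to_replace=[np.nan, ""], value=0)

# Convert the cleaned column to a numeric dtype so arithmetic works as expected.
salary_numeric = pd.to_numeric(salary_filled)

print(salary_numeric)
print("Total salary:", salary_numeric.sum())
```

Keep in mind that filling with 0 changes statistics such as the mean, so it suits cases where a missing salary really should count as zero.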
Replace nan with median values
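For completeness, here’s a simple snippet that uses the fillna() method to replace NaN values in a column with that column’s median. First we’ll convert the empty values to np.nan and make the column numeric; then we’ll invoke fillna(). Both replace() and fillna() return a new Series, so we assign the results back to the column.
# Turn blanks into NaN, then convert the mixed-type column to float
hr_df['salary'] = pd.to_numeric(hr_df['salary'].replace(to_replace="", value=np.nan), errors='coerce')
# Fill the missing salaries with the column median
hr_df['salary'] = hr_df['salary'].fillna(value=hr_df['salary'].median())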
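Here is a self-contained sketch of the median fill on the sample data. It is a variation rather than the exact snippet above: it relies on pd.to_numeric() with errors="coerce" to turn blanks and other non-numeric entries into NaN in one step, and the names salary and median_salary are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so the snippet is self-contained.
hr_df = pd.DataFrame({
    "month": ["March", "March", "March", "April", "April", "March"],
    "office": ["Toronto", "New York", "Las Vegas", "Madrid", "Toronto", "Rome"],
    "salary": [np.nan, 138.0, 108.0, "", pd.NA, np.nan],
})

# Coerce anything that is not a number (blank string, pd.NA) to NaN.
salary = pd.to_numeric(hr_df["salary"], errors="coerce")

# median() skips NaN by default, so only 138.0 and 108.0 are used.
median_salary = salary.median()

# Fill the gaps and write the numeric column back to the DataFrame.
hr_df["salary"] = salary.fillna(median_salary)

print(median_salary)
print(hr_df)
```

With only two known salaries (138.0 and 108.0), the median is their midpoint, so every missing row is filled with 123.0.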