In this tutorial we will learn how to replace empty values in one or multiple pandas DataFrame columns. Empty cells in pandas might contain any of the following values: np.nan, pd.NA, None or just be empty. In pandas we typically use the fillna() DataFrame and Series method to replace missing values with
Example data
First and foremost we will import the pandas and numpy libraries. We will use numpy to create the empty values in our Dataframe using np.nan.
import pandas as pd
import numpy as np
We will then initialize our DataFrame:
sales_data = {
'domain' : ['Retail', 'B2C', 'Online', 'B2B'],
'telesales' : [340, np.nan, 244, 209],
'direct' : [337, 263, 168, np.nan],
'indirect' : [171, 200, 210, np.nan],
}
sales = pd.DataFrame (sales_data)
sales.head()
Here’s our data, note the empty cells, marked by NaN:
domain | telesales | direct | indirect | |
---|---|---|---|---|
0 | Retail | 340.0 | 337.0 | 171.0 |
1 | B2C | NaN | 263.0 | 200.0 |
2 | Online | 244.0 | 168.0 | 210.0 |
3 | B2B | 209.0 | NaN | NaN |
Find and count empty cells
We can use the isna() DataFrame method to find and count cells with empty values in our DataFrame columns:
sales.isna().sum()
This will return the following series consisting of the number of empty cells for each column:
domain 0 telesales 1 direct 1 indirect 1 dtype: int64
Fill empty cells with zero values
Now that we know the overall number of empty cells, we can replace their value with 0. We can invoke the fillna() method on the entire DataFrame as shown below:
sales = sales.fillna(0)
# or alternatively
sales.fillna(0, inplace = True)
Fill empty values with values from other column
In this case we would like to use values from our column in order to replace missing values in other column:
sales['direct'] = round(sales['direct'].fillna(sales['indirect'].mean()))
This will render the following values:
0 337.0 1 263.0 2 168.0 3 194.0 Name: direct, dtype: float64
Replace nan values with string
We can also using a string in order to replace missing values:
sales['telesales'].fillna('Not Available')
This will render the following Series:
0 340.0 1 Not Available 2 244.0 3 209.0 Name: telesales, dtype: object
Replace empty values with previous one
Other option is to use a value from the previous observation to fill an empty cell in our column:
sales['telesales'].fillna(method= 'ffill')
The result:
0 340.0 1 340.0 2 244.0 3 209.0 Name: telesales, dtype: float64
Fill missing value from another row
Last case is to fill a cell with values from the next observation:
sales['telesales'].fillna(method = 'bfill')
This will result in:
0 340.0 1 244.0 2 244.0 3 209.0 Name: telesales, dtype: float64