How to fill missing values in pandas columns?

In this tutorial we will learn how to replace empty values in one or multiple pandas DataFrame columns. Empty cells in pandas might contain any of the following values: np.nan, pd.NA, None or just be empty. In pandas we typically use the fillna() DataFrame and Series method to replace missing values with

Example data

First and foremost we will import the pandas and numpy libraries. We will use numpy to create the empty values in our Dataframe using np.nan.

import pandas as pd
import numpy as np

We will then initialize our DataFrame:

sales_data = {
            'domain' : ['Retail', 'B2C', 'Online', 'B2B'], 
            'telesales' : [340, np.nan, 244, 209],
             'direct' : [337, 263, 168, np.nan],
            'indirect' : [171, 200, 210, np.nan],           
}

sales = pd.DataFrame (sales_data)

sales.head()

Here’s our data, note the empty cells, marked by NaN:

domaintelesalesdirectindirect
0Retail340.0337.0171.0
1B2CNaN263.0200.0
2Online244.0168.0210.0
3B2B209.0NaNNaN

Find and count empty cells

We can use the isna() DataFrame method to find and count cells with empty values in our DataFrame columns:

sales.isna().sum()

This will return the following series consisting of the number of empty cells for each column:

domain       0
telesales    1
direct       1
indirect     1
dtype: int64

Fill empty cells with zero values

Now that we know the overall number of empty cells, we can replace their value with 0. We can invoke the fillna() method on the entire DataFrame as shown below:

sales = sales.fillna(0)

# or alternatively 

sales.fillna(0, inplace = True)

Fill empty values with values from other column

In this case we would like to use values from our column in order to replace missing values in other column:

sales['direct'] = round(sales['direct'].fillna(sales['indirect'].mean()))

This will render the following values:

0    337.0
1    263.0
2    168.0
3    194.0
Name: direct, dtype: float64

Replace nan values with string

We can also using a string in order to replace missing values:

sales['telesales'].fillna('Not Available')

This will render the following Series:

0            340.0
1    Not Available
2            244.0
3            209.0
Name: telesales, dtype: object

Replace empty values with previous one

Other option is to use a value from the previous observation to fill an empty cell in our column:

sales['telesales'].fillna(method= 'ffill')

The result:

0    340.0
1    340.0
2    244.0
3    209.0
Name: telesales, dtype: float64

Fill missing value from another row

Last case is to fill a cell with values from the next observation:

sales['telesales'].fillna(method  = 'bfill')

This will result in:

0    340.0
1    244.0
2    244.0
3    209.0
Name: telesales, dtype: float64

Next Learning

How to initialize a list from pandas DataFrame data?