How to replace zeros with specific values in Pandas DataFrame columns?

As part of your data cleaning tasks, you may need to process large amounts of data that are missing or otherwise flawed. In this short tutorial we will demonstrate several uses of the Pandas DataFrame replace method to change zero values in specific DataFrame columns.

Create example DataFrame

Let’s get started by creating a simple Pandas DataFrame:

# Import the Pandas library
import pandas as pd

# Define DataFrame columns as lists
city = ['Atlanta', 'Barcelona', 'Atlanta', 'Paris']
skill = ['Python', 'R', 'Scala', 'Python']
salary = [102, 104, 120, 0]

# Initialize DataFrame
salaries = dict(city=city, skill=skill, salary=salary)
campaigns = pd.DataFrame(salaries)

# Display DataFrame
print(campaigns)

Here’s our DataFrame content:

        city   skill  salary
0    Atlanta  Python     102
1  Barcelona       R     104
2    Atlanta   Scala     120
3      Paris  Python       0

Replace zero value with the column mean

You might want to replace zero values, which here stand in for missing data, with the average value of a DataFrame column. In our case, we’ll modify the salary column. Here is a simple snippet that you can use:

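salary_col = campaigns['salary']
campaigns['salary'] = salary_col.replace(to_replace=0, value=salary_col.mean())

Here’s our DataFrame:

        city   skill  salary
0    Atlanta  Python   102.0
1  Barcelona       R   104.0
2    Atlanta   Scala   120.0
3      Paris  Python    81.5

Important note: By default, the replace method returns a new object and doesn’t persist changes in your DataFrame, so we assign the result back to the salary column. Calling replace with inplace=True on a column selected from the DataFrame is not reliable: in recent pandas versions the selected column may behave as a copy, leaving the original DataFrame unchanged. Also note that the zero itself is counted when computing the mean, which is why the replacement value is 81.5.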
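If the zeros stand for missing salaries, you may prefer a mean that ignores them. The following is a minimal sketch under that assumption; the variable names mean_with_zeros and mean_without_zeros are introduced here for illustration only.

```python
import pandas as pd

campaigns = pd.DataFrame({
    'city': ['Atlanta', 'Barcelona', 'Atlanta', 'Paris'],
    'skill': ['Python', 'R', 'Scala', 'Python'],
    'salary': [102, 104, 120, 0],
})

salary = campaigns['salary']

# Mean over all rows, zeros included: (102 + 104 + 120 + 0) / 4
mean_with_zeros = salary.mean()

# Mean over the non-zero rows only: (102 + 104 + 120) / 3
mean_without_zeros = salary[salary != 0].mean()

# Assign the result back so the DataFrame itself is updated
campaigns['salary'] = salary.replace(0, mean_without_zeros)
```

Which of the two means is appropriate depends on whether a zero is a real measurement or a placeholder for an unknown value.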

Replacing 0 with NaN values

In the same fashion we can change zero values to pd.NA or np.nan by using the following code:

campaigns['salary'] = campaigns['salary'].replace(to_replace=0, value=pd.NA)
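To illustrate why marking zeros as missing can be useful, here is a small sketch using np.nan on a standalone Series holding the same salary values. The names salary_nan, missing_count and mean_of_known are illustrative and not part of the tutorial's code.

```python
import numpy as np
import pandas as pd

salary = pd.Series([102, 104, 120, 0], name='salary')

# Replacing with np.nan converts the integer column to float64
salary_nan = salary.replace(0, np.nan)

# Missing entries can now be counted, and are skipped by mean()
missing_count = salary_nan.isna().sum()
mean_of_known = salary_nan.mean()
```

Once the zeros are NaN, the usual missing-data tools such as isna, fillna and dropna apply to them.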

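Change zero values to the previous value in the column

By using the method parameter, we can specify a different replacement strategy for our cell values. In our case we will use method='ffill', which copies the value from the previous row. Conversely, we can use method='bfill' to assign the next value to our cell. Note that the method parameter of replace is deprecated as of pandas 2.1, so this snippet may emit a warning or fail on newer versions.

salary_col = campaigns['salary']
campaigns['salary'] = salary_col.replace(to_replace=0, method='ffill')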
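A way to get the same forward-fill or backward-fill effect without the method parameter is to mask the zeros first and then fill them. This is a sketch of that alternative, not the approach used above; the sample Series is hypothetical and contains two zeros so that both directions are visible.

```python
import pandas as pd

# Hypothetical data with zeros in the middle and at the end
salary = pd.Series([102, 0, 120, 0], name='salary')

# mask() turns entries where the condition holds into NaN
masked = salary.mask(salary == 0)

forward_filled = masked.ffill()   # each zero takes the previous row's value
backward_filled = masked.bfill()  # each zero takes the next row's value
```

A zero in the last row has no following value, so it stays NaN after bfill; likewise a zero in the first row stays NaN after ffill.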

Replace specific values according to condition

We can also use the loc indexer to modify zero cell values as shown below. In our case, we’ll set the zero values to be equal to the column’s maximum value.

condition = (campaigns['salary'] == 0)
campaigns.loc[condition, 'salary'] = campaigns['salary'].max()
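As an alternative to the loc assignment, DataFrame.replace also accepts a nested dictionary that limits the replacement to named columns. The sketch below assumes a hypothetical frame with a second numeric column, bonus, to show that zeros elsewhere are left alone.

```python
import pandas as pd

# Hypothetical frame with zeros in two columns
df = pd.DataFrame({'salary': [102, 0, 120], 'bonus': [0, 5, 0]})

# Outer keys are column names; inner dicts map old values to new ones
result = df.replace({'salary': {0: df['salary'].max()}})
```

Here only the salary column is modified, and replace returns a new DataFrame rather than changing df.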
