As part of your data cleaning tasks, you might need to process large amounts of data that are missing or somewhat flawed. In this short tutorial we will demonstrate several uses of the pandas replace() method to change zero values in specific DataFrame columns.
Replace zero values in Pandas columns
In Pandas, you can use the DataFrame and Series replace() method to modify the content of your DataFrame cells. For example, if your DataFrame is named my_df, you can use the following code to change all cells containing zeros to empty strings.
my_df.replace(to_replace=0, value='', inplace=True)
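To make the behavior of replace() concrete, here is a minimal sketch on a small frame. The column names a and b and their values are hypothetical and only serve as an illustration. It shows that, without inplace=True, the method returns a new DataFrame and leaves the original as it was.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
my_df = pd.DataFrame({"a": [0, 1, 2], "b": [3, 0, 5]})

# Without inplace=True, replace() returns a new DataFrame
cleaned = my_df.replace(to_replace=0, value="")

# The original frame still contains its two zeros
zeros_in_original = int((my_df == 0).sum().sum())

# The returned frame holds empty strings where the zeros were
first_cell = cleaned.loc[0, "a"]
```

Keep in mind that mixing empty strings with numbers turns the affected columns into the generic object dtype, which may matter if you plan to do arithmetic on them later.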
Change cells with zero values – practical example
Let’s start by creating a simple pandas DataFrame:
# Import Pandas library
import pandas as pd
# Define DataFrame columns as lists
city = ['Atlanta', 'Barcelona', 'Atlanta','Paris']
skill = ['Python', 'R', 'Scala', 'Python']
salary = [102,104,120,0]
# Initialize DataFrame
salaries = dict(city=city, skill=skill, salary=salary)
campaigns = pd.DataFrame(salaries)
# Display DataFrame
campaigns.head()
Here’s our DataFrame content:
| | city | skill | salary |
|---|---|---|---|
| 0 | Atlanta | Python | 102 |
| 1 | Barcelona | R | 104 |
| 2 | Atlanta | Scala | 120 |
| 3 | Paris | Python | 0 |
Replace zero value with the column mean
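You might want to replace those zero values, which here stand in for missing data, with the average value of your DataFrame column. In our case, we’ll modify the salary column. Here is a simple snippet that you can use:
# replace() returns a new Series, so we assign the result back to the column
campaigns['salary'] = campaigns['salary'].replace(to_replace=0, value=campaigns['salary'].mean())
Here’s our DataFrame:
| | city | skill | salary |
|---|---|---|---|
| 0 | Atlanta | Python | 102.0 |
| 1 | Barcelona | R | 104.0 |
| 2 | Atlanta | Scala | 120.0 |
| 3 | Paris | Python | 81.5 |
Note that the mean is computed over all four rows, including the zero itself, which is why the replacement value is 81.5.
Important note: By default, the replace method returns a new object and leaves the original unchanged. To keep the result, either assign it back to the column as we did above, or pass inplace=True. Be aware that calling replace with inplace=True on a column selected from a DataFrame (for example, salary_col = campaigns['salary']) may not update the parent DataFrame in recent pandas versions that use Copy-on-Write.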
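Since the zero counts toward the column mean, you may prefer an average of the real salaries only. The following is a minimal sketch of that variation, not part of the original snippet. It rebuilds the same DataFrame and introduces a helper variable, nonzero_mean, purely for illustration.

```python
import pandas as pd

# Same sample data as in the tutorial
campaigns = pd.DataFrame({
    "city": ["Atlanta", "Barcelona", "Atlanta", "Paris"],
    "skill": ["Python", "R", "Scala", "Python"],
    "salary": [102, 104, 120, 0],
})

salary = campaigns["salary"]

# Mean of the non-zero salaries only: (102 + 104 + 120) / 3
nonzero_mean = salary[salary != 0].mean()

# Assign the result back to the column instead of relying on inplace=True
campaigns["salary"] = salary.replace(to_replace=0, value=nonzero_mean)
```

With this variation the Paris row receives roughly 108.67 instead of 81.5, because the placeholder zero no longer drags the average down.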
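Replacing 0 with NaN values
In the same fashion we can change zero values to pd.NA, np.nan or None by using the following code (np.nan requires import numpy as np, and the handling of an explicit None has varied between pandas versions):
campaigns['salary'] = campaigns['salary'].replace(to_replace=0, value=pd.NA)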
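One practical benefit of marking zeros as missing is that pandas then skips them in aggregations. Here is a minimal sketch using np.nan on a standalone Series that mirrors the salary column. The variable names with_nan, missing_count and column_mean are illustrative.

```python
import numpy as np
import pandas as pd

# Standalone copy of the salary column from the tutorial
salary = pd.Series([102, 104, 120, 0], name="salary")

# Replace the zero placeholder with NaN; the result becomes float64
with_nan = salary.replace(to_replace=0, value=np.nan)

# Count the cells that are now flagged as missing
missing_count = int(with_nan.isna().sum())

# mean() skips NaN by default, so only the three real salaries are averaged
column_mean = with_nan.mean()
```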
Change zero values to the previous value in the column
By using the method parameter, we can specify a different replacement strategy for our cell values. In our case we will use method='ffill', which copies the value from the previous row. Conversely, we can use method='bfill' to assign the value from the next row to our cell. Note that the method parameter of replace was deprecated in pandas 2.1, so this snippet applies to older versions.
campaigns['salary'] = campaigns['salary'].replace(to_replace=0, method='ffill')
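Because the method parameter of replace() was deprecated in pandas 2.1, a forward fill can also be written without it. The sketch below is one possible alternative rather than the approach used above: it hides the zeros with mask() and then calls ffill(). The Series mirrors the tutorial's salary column.

```python
import pandas as pd

# Standalone copy of the salary column from the tutorial
salary = pd.Series([102, 104, 120, 0], name="salary")

# mask() turns the zeros into NaN, then ffill() copies the
# previous valid value forward into each gap
filled = salary.mask(salary == 0).ffill()
```

Swapping ffill() for bfill() takes the next valid value instead. In both cases the result is a float column, since NaN is introduced as an intermediate step, and a zero with no neighbor in the fill direction stays NaN.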
Replace specific values based on condition
We can also use the loc accessor to modify zero cell values as shown below. In our case, we’ll set the missing values to be equal to the column’s maximum value.
condition = (campaigns['salary'] == 0 )
campaigns.loc[condition, 'salary'] = campaigns['salary'].max()
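We can also modify values using the iloc accessor. Because iloc is position-based, it does not accept a boolean Series as a mask, so we convert the condition to a NumPy array first; 2 is the position of the salary column:
condition = (campaigns['salary'] == 0)
campaigns.iloc[condition.to_numpy(), 2] = campaigns['salary'].max()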
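To compare the two accessors side by side, here is a self-contained sketch that applies both to copies of the same DataFrame. The names by_label, by_position and salary_pos are illustrative. The column position is looked up with columns.get_loc rather than hard-coded.

```python
import pandas as pd

# Same sample data as in the tutorial
campaigns = pd.DataFrame({
    "city": ["Atlanta", "Barcelona", "Atlanta", "Paris"],
    "skill": ["Python", "R", "Scala", "Python"],
    "salary": [102, 104, 120, 0],
})

condition = campaigns["salary"] == 0
max_salary = campaigns["salary"].max()

# Label-based: loc accepts the boolean Series and the column name
by_label = campaigns.copy()
by_label.loc[condition, "salary"] = max_salary

# Position-based: iloc needs a boolean array and an integer column position
by_position = campaigns.copy()
salary_pos = by_position.columns.get_loc("salary")
by_position.iloc[condition.to_numpy(), salary_pos] = max_salary
```

Both versions yield the same result. Looking up the position by name keeps the iloc version working if the column order changes.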