As part of your data cleaning tasks, you might need to process large amounts of data that are missing or flawed. In this short tutorial we will demonstrate several uses of the Pandas replace method to change zero values in specific DataFrame columns.
Replace zero values in Pandas columns
In Pandas, you can use the DataFrame and Series replace() method to modify the content of your DataFrame cells. For example, if your DataFrame is named my_df, you can use the following code to change all cells containing zeros to empty strings.
my_df.replace(to_replace=0, value='', inplace=True)
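To limit the replacement to specific columns rather than the whole DataFrame, one option is to pass a nested dictionary to replace. The following is a minimal sketch; the DataFrame contents and the column names units and returns are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical data: both columns contain zeros
my_df = pd.DataFrame({'units': [0, 5, 0], 'returns': [0, 1, 2]})

# Nested dict: {column: {old_value: new_value}}
# Only zeros in 'units' are replaced; 'returns' is left untouched
result = my_df.replace({'units': {0: -1}})
print(result)
```

Because inplace is not set here, my_df itself is unchanged and the modified copy is held in result.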
Change cells with zero values – practical example
Let’s get started with creating a simple Python DataFrame:
# Import Pandas library
import pandas as pd
# Define DataFrame columns as lists
city = ['Atlanta', 'Barcelona', 'Atlanta', 'Paris']
skill = ['Python', 'R', 'Scala', 'Python']
salary = [102, 104, 120, 0]
# Initialize DataFrame
salaries = dict(city=city, skill=skill, salary=salary)
campaigns = pd.DataFrame(salaries)
# Display DataFrame
campaigns.head()
Here’s our DataFrame content:
| | city | skill | salary |
|---|---|---|---|
| 0 | Atlanta | Python | 102 |
| 1 | Barcelona | R | 104 |
| 2 | Atlanta | Scala | 120 |
| 3 | Paris | Python | 0 |
Replace zero value with the column mean
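You might want to replace zero values that stand in for missing data with the average value of your DataFrame column. In our case, we’ll modify the salary column. Note that the mean is computed over all four rows, including the zero itself: (102 + 104 + 120 + 0) / 4 = 81.5. Here is a simple snippet that you can use: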
salary_col = campaigns['salary']
salary_col.replace(to_replace = 0, value = salary_col.mean(), inplace=True)
Here’s our DataFrame:
| | city | skill | salary |
|---|---|---|---|
| 0 | Atlanta | Python | 102.0 |
| 1 | Barcelona | R | 104.0 |
| 2 | Atlanta | Scala | 120.0 |
| 3 | Paris | Python | 81.5 |
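Important note: By default, the replace method returns a new object and leaves the original unchanged. We passed inplace=True so that the change is applied to the existing Series. In recent pandas versions with Copy-on-Write enabled, an in-place change to a column selected from a DataFrame may not propagate to the parent DataFrame; assigning the result back to the column is the more reliable form.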
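As a sketch of the assignment form, the snippet below rebuilds the example DataFrame and writes the result of replace back to the salary column. It also computes a mean that excludes the zero rows, which is an alternative not used in the tutorial and shown only for comparison; the names mean_all, mean_nonzero and filled_nonzero are illustrative.

```python
import pandas as pd

campaigns = pd.DataFrame({
    'city': ['Atlanta', 'Barcelona', 'Atlanta', 'Paris'],
    'skill': ['Python', 'R', 'Scala', 'Python'],
    'salary': [102, 104, 120, 0],
})

# Mean over all rows, zeros included (the value used in the tutorial)
mean_all = campaigns['salary'].mean()

# Alternative: mean over the non-zero rows only
mean_nonzero = campaigns.loc[campaigns['salary'] != 0, 'salary'].mean()

# A separate copy filled with the non-zero mean, for comparison
filled_nonzero = campaigns.assign(
    salary=campaigns['salary'].replace(0, mean_nonzero)
)

# Assign the result back instead of relying on inplace=True
campaigns['salary'] = campaigns['salary'].replace(0, mean_all)
print(campaigns)
```

Which mean is appropriate depends on whether a zero is a real observation or a placeholder for missing data.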
Replacing 0 with NaN or blank values
In the same fashion we can change zero values to pd.NA, np.nan or None by using the following code:
campaigns['salary'].replace(to_replace = 0, value = pd.NA, inplace=True)
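The effect of marking zeros as missing can be checked with a small standalone Series. This is a minimal sketch using np.nan; the final step, which casts to the nullable Int64 dtype and uses mask, is an assumed alternative rather than something the tutorial itself does.

```python
import numpy as np
import pandas as pd

salary = pd.Series([102, 104, 120, 0], name='salary')

# Replacing with np.nan converts the integer column to float64
with_nan = salary.replace(0, np.nan)
print(with_nan.isna().sum())  # number of missing entries

# Aggregations skip NaN by default, so only the three real salaries count
print(with_nan.mean())

# Assumed alternative: the nullable Int64 dtype keeps integers and shows <NA>
with_na = salary.astype('Int64').mask(salary == 0)
print(with_na)
```

Once the zeros are marked as missing, methods such as mean, fillna and ffill treat them as gaps rather than as real values.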
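Change zero values to the previous value in the column
By using the method parameter, we can specify a different replacement strategy for our cell values. In our case we will use method='ffill', which copies the value from the previous row into the zero cell. Conversely, we can use method='bfill' to assign the next value to our cell. Note that the method parameter of replace is deprecated as of pandas 2.1, so this form may emit a warning or fail on newer versions.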
salary_col = campaigns['salary']
salary_col.replace(to_replace = 0, method='ffill', inplace=True)
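On pandas versions where the method parameter is unavailable, the same result can be reached in two steps: mark the zeros as missing, then forward or backward fill. The following is a sketch on a standalone Series, not the tutorial's original approach.

```python
import numpy as np
import pandas as pd

salary = pd.Series([102, 104, 120, 0], name='salary')

# Mark zeros as missing, then carry the previous valid value forward
forward = salary.replace(0, np.nan).ffill()
print(forward)

# Backward fill takes the next valid value; the last row has no
# successor, so it stays NaN
backward = salary.replace(0, np.nan).bfill()
print(backward)
```

The backward-fill case shows an edge worth checking in real data: a zero in the last row (or, for forward fill, in the first row) has no neighbour to copy from.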
Replace specific values based on condition
We can also use the loc accessor to modify zero cell values as shown below. In our case, we’ll set the zero values to the column’s maximum value.
condition = (campaigns['salary'] == 0 )
campaigns.loc[condition, 'salary'] = campaigns['salary'].max()
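We can also modify values using the iloc accessor. Because iloc is position-based and does not accept a boolean Series as a mask, we first convert the condition to a NumPy array:
condition = (campaigns['salary'] == 0).to_numpy()
campaigns.iloc[condition, 2] = campaigns['salary'].max()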
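A hard-coded column position such as 2 breaks silently if the column order changes. As a sketch, the position can be looked up by name with columns.get_loc; the variable names mask and col_pos are illustrative.

```python
import pandas as pd

campaigns = pd.DataFrame({
    'city': ['Atlanta', 'Barcelona', 'Atlanta', 'Paris'],
    'skill': ['Python', 'R', 'Scala', 'Python'],
    'salary': [102, 104, 120, 0],
})

# iloc needs a positional boolean array, not a labelled Series
mask = (campaigns['salary'] == 0).to_numpy()

# Look up the integer position of the column instead of hard-coding it
col_pos = campaigns.columns.get_loc('salary')

campaigns.iloc[mask, col_pos] = campaigns['salary'].max()
print(campaigns)
```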
Replace column cells with one value
If we would like to set every row of a column to the same value, we can use the following snippet:
campaigns = campaigns.assign(salary = 120)
Modifying certain values in Pandas DataFrames
Once we import data from different data sources, we usually need to clean it up so that it is easier to analyze and visualize. In today’s tutorial we’ll show how to replace values in a DataFrame column. We’ll look into a couple of use cases:
- Replace all occurrences of a value in a column
- Replace one or multiple values based on a condition
- Replace NaN values in a column
Creating a test DataFrame
We will get started by importing the Pandas library to our Python Data Analysis environment and then go ahead and create some simple data:
import pandas as pd
city = ['Paris', 'Madrid', 'Boston', 'Paris']
office = ['North-West', 'South-West', 'South', 'East']
interviews = [102, 104, 210, pd.NA]
offices = dict(city=city, office=office, interviews=interviews)
campaign = pd.DataFrame(offices)
print(campaign.head())
Here’s our DataFrame:
| | city | office | interviews |
|---|---|---|---|
| 0 | Paris | North-West | 102 |
| 1 | Madrid | South-West | 104 |
| 2 | Boston | South | 210 |
| 3 | Paris | East | <NA> |
Update values in a DataFrame column
We can use the replace method and apply it to a specific column as follows:
campaign['city'].replace(to_replace='Paris', value= 'Versailles', inplace= True)
Note that the inplace=True parameter persists the updated values in our DataFrame.
| | city | office | interviews |
|---|---|---|---|
| 0 | Versailles | North-West | 102 |
| 1 | Madrid | South-West | 104 |
| 2 | Boston | South | 210 |
| 3 | Versailles | East | <NA> |
Update Multiple values
Note: Updating multiple values instead of a single one is relatively easy when passing a dictionary as a parameter to the replace method, as shown in the code below:
campaign['city'].replace(to_replace={'Versailles':'Paris', 'Boston':'Cambridge'}, inplace= True)
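The dictionary form can be tried on a standalone Series. The sketch below also shows an equivalent call that passes two parallel lists, which replace accepts as well; the variable names are illustrative.

```python
import pandas as pd

city = pd.Series(['Versailles', 'Madrid', 'Boston', 'Versailles'], name='city')

# Dictionary form: {old_value: new_value}
renamed = city.replace({'Versailles': 'Paris', 'Boston': 'Cambridge'})
print(renamed)

# Equivalent form: parallel lists of old and new values
renamed_lists = city.replace(['Versailles', 'Boston'], ['Paris', 'Cambridge'])
print(renamed_lists)
```

Values that do not appear in the mapping, such as Madrid here, are left unchanged.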
Replace values in column based on condition
A more interesting case is when we would like to manipulate our DataFrame by replacing column cells that match a specific condition.
In our example we would like to replace all occurrences of the North-West and South-West values in the office column.
We will first define a condition statement. This can be a very simple one, or a more complex one containing several conditions combined with the boolean AND (&) or OR (|) operators.
condition = (campaign['office'] == 'North-West') | (campaign['office'] == 'South-West')
We’ll then use the condition in our loc indexer to replace the required values as shown below:
campaign.loc[condition, 'office'] = 'West'
The corresponding result will be:
| | city | office | interviews |
|---|---|---|---|
| 0 | Paris | West | 102 |
| 1 | Madrid | West | 104 |
| 2 | Cambridge | South | 210 |
| 3 | Paris | East | <NA> |
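When a condition tests one column against several values, isin is a compact alternative to chaining comparisons with |. The sketch below also combines two conditions with &; the extra city filter is a hypothetical requirement added only to illustrate the AND case.

```python
import pandas as pd

campaign = pd.DataFrame({
    'city': ['Paris', 'Madrid', 'Boston', 'Paris'],
    'office': ['North-West', 'South-West', 'South', 'East'],
})

# isin is equivalent to (office == 'North-West') | (office == 'South-West')
is_west = campaign['office'].isin(['North-West', 'South-West'])

# & requires both conditions; wrap each comparison in parentheses
paris_west = is_west & (campaign['city'] == 'Paris')

campaign.loc[paris_west, 'office'] = 'West'
print(campaign)
```

Only the first row satisfies both conditions, so the Madrid office keeps its South-West value.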
Replace NaN values in a Pandas DataFrame column
If you would like to fill all missing values in your DataFrame column or Series, you can use the fillna method. Here’s a very simple example:
campaign['interviews'].fillna(0, inplace=True)
This simple snippet updates all null values to 0 in the interviews column.
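As a sketch, the same fill can be written without inplace by assigning the result back, or by passing a per-column dictionary to DataFrame.fillna. The snippet assumes the interviews column uses the nullable Int64 dtype so that it stays integer-typed; the tutorial's own DataFrame stores it as a generic object column.

```python
import pandas as pd

campaign = pd.DataFrame({
    'city': ['Paris', 'Madrid', 'Boston', 'Paris'],
    # Assumption: nullable Int64 dtype, so the column stays integer-typed
    'interviews': pd.array([102, 104, 210, pd.NA], dtype='Int64'),
})

# Per-column dictionary: fill only 'interviews', returning a new DataFrame
filled = campaign.fillna({'interviews': 0})

# Assignment form: fill the column and write it back
campaign['interviews'] = campaign['interviews'].fillna(0)
print(campaign)
```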