Once we import data from different data sources, we usually need to clean up the imported data so it can be easier to analyze and visualize it. In today’s tutorial we’ll show how to how to replace value in a DataFrame column. We’ll look into a couple of use cases:
- Replace all occurrences of a value in a column
- Replace one or multiple values based on a condition
- Replace a cell with nan value in a column
Creating a test DataFrame
We will get started by importing the Pandas library to our Python Data Analysis environment and then go ahead and create some simple data:
import pandas as pd
city = ['Paris', 'Madrid', 'Boston','Paris']
office = ['North-West', 'South-West', 'South', 'East']
interviews = [102,104,210,pd.NA]
offices = dict(city=city, office=office, interviews=interviews)
campaign = pd.DataFrame (offices)
print(campaign.head())
Here’s our DataFrame:
city | office | interviews | |
---|---|---|---|
0 | Paris | North-West | 102 |
1 | Madrid | South-West | 104 |
2 | Boston | South | 210 |
3 | Paris | East | <NA> |
Update values in a DataFrame column
We can use the replace method and apply it on a specific column as following:
campaign['city'].replace(to_replace='Paris', value= 'Versailles', inplace= True)
Note that the inplace=True parameter persist the updated values in our DataFrame.
city | office | interviews | |
---|---|---|---|
0 | Versailles | West | 102 |
1 | Madrid | West | 104 |
2 | Boston | South | 210 |
3 | Versailles | East | <NA> |
Update Multiple values
Note: Updating multiple values instead of a single one is relatively easy when passing a dictionary as a parameter to the replace method, as shown in the code below:
campaign['city'].replace(to_replace={'Versailles':'Paris', 'Boston':'Cambridge'}, inplace= True)
Replace values in column based on condition
A more interesting case is when we would like to manipulate our DataFrame by replacing column cells which much a specific condition.
In our example we would like to replace all occurrences of the North-West and South-West values in the office column.
We will first define a condition statement. This can be a very simple one, and or a more complex one containing several conditions separated by boolean AND or OR operators.
condition = (campaign['office'] == 'North-West') | (campaign['office'] == 'South-West')
We’ll then use the conditions in our loc indexer to replace the required values as shown below:
campaign.loc[condition, 'office'] = 'West'
The corresponding result will be:
city | office | interviews | |
---|---|---|---|
0 | Versailles | West | 102 |
1 | Madrid | West | 104 |
2 | Boston | South | 210 |
3 | Versailles | East | <NA> |
Replace NAN values in Pandas dataframe column
If you would like to set all empty values in your DataFrame column or Series, you can use the fillna method. Here’s a very simple example:
campaign['interviews'].fillna(0, inplace=True)
This simple snippet updates all null values to 0 for the interviews column.