How to replace values in Pandas DataFrame columns?

Once we import data from different data sources, we usually need to clean up the imported data so it can be easier to analyze and visualize it. In today’s tutorial we’ll show how to how to replace value in a DataFrame column. We’ll look into a couple of use cases:

  • Replace all occurrences of a value in a column
  • Replace one or multiple values based on a condition
  • Replace a cell with nan value in a column

Creating a test DataFrame

We will get started by importing the Pandas library to our Python Data Analysis environment and then go ahead and create some simple data:

import pandas as pd
city =  ['Paris', 'Madrid', 'Boston','Paris']
office =  ['North-West', 'South-West', 'South', 'East']
interviews = [102,104,210,pd.NA]

offices = dict(city=city, office=office, interviews=interviews)
campaign = pd.DataFrame (offices)

print(campaign.head())

Here’s our DataFrame:

cityofficeinterviews
0ParisNorth-West102
1MadridSouth-West104
2BostonSouth210
3ParisEast<NA>

Update values in a DataFrame column

We can use the replace method and apply it on a specific column as following:

campaign['city'].replace(to_replace='Paris', value= 'Versailles', inplace= True)

Note that the inplace=True parameter persist the updated values in our DataFrame.

cityofficeinterviews
0VersaillesWest102
1MadridWest104
2BostonSouth210
3VersaillesEast<NA>

Update Multiple values

Note: Updating multiple values instead of a single one is relatively easy when passing a dictionary as a parameter to the replace method, as shown in the code below:

campaign['city'].replace(to_replace={'Versailles':'Paris', 'Boston':'Cambridge'}, inplace= True)

Replace values in column based on condition

A more interesting case is when we would like to manipulate our DataFrame by replacing column cells which much a specific condition.

In our example we would like to replace all occurrences of the North-West and South-West values in the office column.

We will first define a condition statement. This can be a very simple one, and or a more complex one containing several conditions separated by boolean AND or OR operators.

condition = (campaign['office'] == 'North-West') | (campaign['office'] == 'South-West')

We’ll then use the conditions in our loc indexer to replace the required values as shown below:

campaign.loc[condition, 'office'] = 'West'

The corresponding result will be:

cityofficeinterviews
0VersaillesWest102
1MadridWest104
2BostonSouth210
3VersaillesEast<NA>

Replace NAN values in Pandas dataframe column

If you would like to set all empty values in your DataFrame column or Series, you can use the fillna method. Here’s a very simple example:

campaign['interviews'].fillna(0, inplace=True)

This simple snippet updates all null values to 0 for the interviews column.

Additional recommended learning