How to find and replace values in Pandas DataFrames and Series?

As part of your cleaning you data and prepare it for analysis, you might need to find and replace data stored in your pandas DataFrame. In this hands-on tutorial, we will use the very helpful replace() method; available for both DataFrames and Series objects to quickly change cell values in pandas.

Define example DataFrame

We will begin by importing the pandas library and building our example dataset.

import pandas as pd

office = ['Istanbul', pd.NA, 'Istanbul', 'New York', 'Toronto', 'Osaka']
month = ['June', 'October', 'June', 'September', 'December', 'March']
language = ['R', 'Javascript', 'R', 'Java', 'Javascript', 'Python']
salary = [136.0, 156.0, 199.0, 164.0, 85.0, pd.NA]

hr = dict(month = month, language  =language, salary = salary, office = office)

sal_df = pd.DataFrame(data=hr)

print(sal_df)

Here’s the output:

monthlanguagesalaryoffice
0JuneR136.0Istanbul
1OctoberJavascript156.0 <NA>
2JuneR199.0Istanbul
3SeptemberJava164.0New York
4DecemberJavascript85.0Toronto
5MarchPython<NA>Osaka

Replacing a value in a pandas column / Series

Let’s assume that we would like to replace all occurrences of a value in a column.

We can easily see the impacted rows by filtering the DataFrame as shown below:

sal_df[sal_df['language'] == 'R']
monthlanguagesalaryoffice
0JuneR136.0Istanbul
2JuneR199.0Istanbul

We’ll now replace all occurrences of the value R with the value Python:


sal_df['language'] = sal_df['language'].replace(to_replace = 'R', value = 'Python')

Or alternatively persist your changes by using the inplace=True parameter.


sal_df['language'].replace(to_replace = 'R', value = 'Python', inplace=True)

And the resulting Series will be:

0        Python
1    Javascript
2        Python
3          Java
4    Javascript
5        Python
Name: language, dtype: object

Find and Change a sub string in a pandas DataFrame column

In this next recipe, we’ll modify all occurrences of a specific part of a string in a DataFrame column:

# define a string to search for
sub_str = 'Java'

# find the impacted rows
sal_df[sal_df['language'].str.contains(sub_str)]

# Replace the string
sal_df['language'].replace(to_replace= sub_str, value = 'J', regex=True)

The output will be a Series. Note that the parameter regex=True allowed to replace part of a string even if the entire cell value didn’t match.

0     Python
1    Jscript
2     Python
3          J
4    Jscript
5     Python
Name: language, dtype: object

Find and replace multiple values in a pandas column

If we want to match and change several values we will pass a list to the replace() method:

value_lst = ['Istanbul','Tokyo', 'Osaka']
sal_df['office'].replace(to_replace= value_lst , value = 'Asia')

Search and change NAN in pandas DataFrames

We can obviously run the replace(0 function on the DataFrame. To replace all empty cell occurrences we can use the code below

Find in which DataFrame columns we have nan values:

sal_df.isna().any()

The result will be a boolean Series:

month       False
language    False
salary       True
office       True
dtype: bool

It is now clear that we have two columns containing empty nan values. We can replace the nan values by passing a dictionary:

sal_df.fillna({'office' : '', 'salary' : sal_df['salary'].mean()})