How to change values in a pandas DataFrame column?

As part of your data cleaning work you might need a replace one or multiple substrings in a pandas DataFrame. In today’s tutorial we will learn how to search for specific column values in your DataFrame that meet certain criteria and change them to other strings.

Example data

Let’s start by initializing a DataFrame:

import pandas as pd

month = [1,2,3,3,2]
language = ['R', 'Java', 'Javascript', 'Javascript', 'Python']
office = ['NYC-Midtown', 'Osaka', 'NYC-Midtown', 'Bangkok', 'Toronto']
salary = [144.00, 194.00, 131.00, 131.00, 107.00]
interviews = dict(month = month, office = office, language = language, salary = salary)
hr_df = pd.DataFrame(data=interviews)
hr_df.head()

Here’s our data:

monthofficelanguagesalary
01NYC-MidtownR144.0
12OsakaJava194.0
23NYC-MidtownJavascript131.0
33BangkokJavascript131.0
42TorontoPython107.0

Replace string in pandas column

First case will be to change specific strings in a DataFrame column and replace those with another value.


hr_df['language'] = hr_df['language'].replace(to_replace = 'Javascript', value= 'JS')


hr_df['language']

This will render the following Series / column:

0         R
1      Java
2        JS
3        JS
4    Python
Name: language, dtype: object

Change cell value based on a condition or other column value

Next example is to modify values of a column according to specific if condition that is evaluated base on values in other single or multiple columns. Let us assume for example that we would like to apply a 25% raise to the salary offered in our NYC-Midtown office. This would look as following:

hr_df.loc[hr_df['office'] == "NYC-Midtown", 'salary'] = hr_df['salary'] * 1.25

hr_df.head()
monthofficelanguagesalary
01NYC-MidtownR180.00
12OsakaJava194.00
23NYC-MidtownJavascript163.75
33BangkokJavascript131.00
42TorontoPython107.00

Set column values according to a dictionary map

In our next example we would like to modify column values according to a mapping dictionary object that we will pass to the replace() method as shown below. In our case we just mapped the month values to month names.

hr_df.replace (to_replace = {1: 'January', 2 : 'February', 3:'March'})

Our DataFrame will look as following:

monthofficelanguagesalary
0JanuaryNYC-MidtownR180.00
1FebruaryOsakaJava194.00
2MarchNYC-MidtownJavascript163.75
3MarchBangkokJavascript131.00
4FebruaryTorontoPython107.00

Update column values based on a list

In this last example for today, we will modify the contents of a DataFrame column based on a Python list. We’ll first define the list object, then drop the existing column. Last step will be to insert a new column based on the list values.

lang_lst = ['Python', 'JavaScript', 'Python', 'R', 'Julia']
hr_df_1 = hr_df.drop('language', axis=1)
hr_df_1.insert(2, column = 'language', value = lang_lst)
hr_df_1

Related learning

How to cast a pandas DataFrame column values to integer type?