How to modify values in a Pandas DataFrame?

As part of our data wrangling process, we are often required to modify data previously acquired from a csv, text, json, API, database or other data source.

In this short tutorial we would like to discuss the basics of replacing/changing/updating manipulation inside Pandas DataFrames.

Replace specific data in Pandas DataFrames

In this tutorial we will look into several cases:

  1. Replacing values in an entire DataFrame
  2. Updating values in specific cells by index
  3. Changing values in an entire DF row
  4. Replace cells content according to condition
  5. Modify values in a Pandas column / series.

Creating example data

Let’s define a simple survey DataFrame:


# Import DA packages

import pandas as pd
import numpy as np

# Create test Data

survey_dict = {             'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
                          'salary': [120,85,95,80,90],
                          'num_candidates': [18,22,34,10, np.nan]
                                       }

# Initialize the survey DataFrame
survey_df = pd.DataFrame(survey_dict)

# Review our DF
languagesalarynum_candidates
0Python12018.0
1Java8522.0
2Haskell9534.0
3Go8010.0
4C++90NaN

1. Set cell values in the entire DF using replace()

We’ll use the DataFrame replace method to modify DF sales according to their value. In the example we’ll replace the empty cell in the last row with the value 17.

survey_df.replace(to_replace= np.nan, value = 17, inplace=True )
survey_df.head()

Note: The replace method is pretty self explanatory, note the usage of the inplace=True to persist the updates in the DataFrame going forward.

Here’s the output we will get:

languagesalarynum_candidates
0Python12018.0
1Java9022.0
2Haskell9534.0
3Go9010.0
4C++9017.0

Note: that we could accomplish the same result with the more elegant fillna() method.

survey_df.fillna(value = 17, axis = 1)

Follow up learning: We can also change empty values to strings.

2. Change value of cell content by index

To pick a specific row index to be modified, we’ll use the iloc indexer.

survey_df.iloc[0].replace(to_replace=120, value = 130)

Our output will look as following:

language          Python
salary               130
num_candidates      18.0
Name: 0, dtype: object

Note: We could also use the loc indexer to update one or multiple cells by row/column label. The code below sets the value 130 the first three cells or the salary column.

survey_df.loc[[0,1,2],'salary'] = 130

3. Modify multiple cells in a DataFrame row

Similar to before, but this time we’ll pass a list of values to replace and their respective replacements:

survey_df.loc[0].replace(to_replace=(130,18), value=(120, 20))

4. Update cells based on conditions

In reality, we’ll update our data based on specific conditions. Here’s an example on how to update cells with conditions. Let’s assume that we would like to update the salary figures in our data so that the minimal salary will be $90/hour.

We’ll first slide the DataFrame and find the relevant rows to update:

cond = survey_df['salary'] < 90

We’ll then pass the rows and columns labels to be updated into the loc indexer:

survey_df.loc[cond,'salary'] = 90
survey_df

Here’s our output:

languagesalarynum_candidates
0Python12018.0
1Java9022.0
2Haskell9534.0
3Go9010.0
4C++9017.0

Important note: We can obviously write more complex conditions as needed. Below if an example with multiple conditions.

cond = (survey_df['salary'] < 90) | (survey_df['num_candidates'] <18)

5. Set and Replace values for an entire Pandas column / Series.

Let’s now assume that we would like to modify the num_candidates figure for all observations in our DataFrame. That’s fairly easy to accomplish.

survey_df['num_candidates'] = 25

Let’s now assume that management has decided that all candidates will be offered an 20% raise. We can easily change the salary column using the following Python code:

survey_df['salary'] = survey_df['salary'] * 1.2

6. Replace string in Pandas DataFrame column

We can also replace specific strings in a DataFrame column / series using the syntx below:

survey_df['language'] = 
survey_df['language'].replace(to_replace = 'Java', value= 'Go')

Follow up learning

How to replace strings or part of strings in pandas columns?