How to modify values in a Pandas DataFrame?

As part of our data wrangling process, we are often required to modify data previously acquired from a csv, text, json, API, database or other data source.

In this short tutorial we would like to discuss the basics of replacing/changing/updating manipulation inside Pandas DataFrames.

Replace specific data in Pandas DataFrames

We’ll look into several cases:

  1. Replacing values in an entire DF.
  2. Updating values in specific cells by index
  3. Changing values in an entire DF row
  4. Replace cells content according to condition
  5. Modify values in a Pandas column / series.

Creating the data

Let’s define a simple survey DataFrame:

# Import DA packages

import pandas as pd
import numpy as np

# Create test Data

survey_dict = {             'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
                          'salary': [120,85,95,80,90],
                          'num_candidates': [18,22,34,10, np.nan]

# Initialize the survey DataFrame
survey_df = pd.DataFrame(survey_dict)

# Review our DF

1. Set cell values in the entire DF using replace()

We’ll use the DataFrame replace method to modify DF sales according to their value. In the example we’ll replace the null value in the last row. Note that we could accomplish the same result with the more elegant fillna() method.

survey_df.replace(to_replace= np.nan, value = 17, inplace=True )

Note: The replace method is pretty self explanatory, note the usage of inplace=True to persist the updates in the DataFrame going forward.

Here’s the output:


2. Change value of cell content by index

To pick a specific row index to be modified, we’ll use the iloc indexer. Note that we could also use the loc indexer to update the cell by row/column label.

survey_df.iloc[0].replace(to_replace=120, value = 130)

Our output:

language          Python
salary               130
num_candidates      18.0
Name: 0, dtype: object

3. Modify multiple cells in a DataFrame row

Similar to before, but this time we’ll pass a list of values to replace and their respective replacements:

survey_df.loc[0].replace(to_replace=(130,18), value=(120, 20))

4. Update cells based on conditions

In reality, we’ll update our data based on specific conditions. Here’s an example on how to update cells with conditions. Let’s assume that we would like to update the salary figures in our data so that the minimal salary will be $90/hour.

We’ll first slide the DataFrame and find the relevant rows to update:

cond = survey_df['salary'] < 90

We’ll then pass the rows and columns labels to be updated into the loc indexer:

survey_df.loc[cond,'salary'] = 90

Here’s our output:


Important note: We can obviously write more complex conditions as needed. Below if an example with multiple conditions.

cond = (survey_df['salary'] < 90) | (survey_df['num_candidates'] <18)

5. Set and Replace values for an entire Pandas column / Series.

Let’s now assume that we would like to modify the num_candidates figure for all the DF entries. That’s fairly easy:

survey_df['num_candidates'] = 25

Let’s now assume that all candidates will be paid a 20% raise. We can easily replace the values in the salary column using the following Python code:

survey_df['salary'] = survey_df['salary'] * 1.2