How to modify values in a Pandas DataFrame?

cell, column, row, based on condition, replace, update, series, index

As part of our data wrangling process, we are often required to modify data previously acquired from a csv, text, json, API, database or other data source.

In this short tutorial we would like to discuss the basics of replacing/changing/updating manipulation in Pandas DataFrames.

Replace existing data in Pandas DataFrames

We’ll look into several cases:

  1. Replacing values in an entire DF.
  2. Updating values in specific cells by index
  3. Changing values in an entire DF row
  4. Replace cells content according to condition
  5. Set values for an entire column / series.

Creating the data

Let’s define a simple survey DataFrame:


# Import DA packages

import pandas as pd
import numpy as np

# Create test Data

survey_dict = {             'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
                          'salary': [120,85,95,80,90],
                          'num_candidates': [18,22,34,10, np.nan]
                                       }

# Initialize the survey DataFrame
survey_df = pd.DataFrame(survey_dict)

# Review our DF
languagesalarynum_candidates
0Python12018.0
1Java8522.0
2Haskell9534.0
3Go8010.0
4C++90NaN

1. Set cell values in the entire DF using replace()

We’ll use the DataFrame replace method to modify DF sales according to their value. In the example we’ll replace the null value in the last row. Note that we could accomplish the same result with the more elegant fillna() method.

survey_df.replace(to_replace= np.nan, value = 17, inplace=True )
survey_df

Note: The replace method is pretty self explanatory, note the usage of inplace=True to persist the updates in the DataFrame going forward.

Here’s the output:

languagesalarynum_candidates
0Python12018.0
1Java9022.0
2Haskell9534.0
3Go9010.0
4C++9017.0

2. Change value of cell content by index

To pick a specific row index to be modified, we’ll use the iloc indexer. Note that we could also use the loc indexer to update the cell by row/column label.

survey_df.iloc[0].replace(to_replace=120, value = 130)

Our output:

language          Python
salary               130
num_candidates      18.0
Name: 0, dtype: object

3. Modify multiple cells in a DataFrame row

Similar to before, but this time we’ll pass a list of values to replace and their respective replacements:

survey_df.loc[0].replace(to_replace=(130,18), value=(120, 20))

4. Update cells based on conditions

In reality, we’ll update our data based on specific conditions. Here’s an example on how to update cells with conditions. Let’s assume that we would like to update the salary figures in our data so that the minimal salary will be $90/hour.

We’ll first slide the DataFrame and find the relevant rows to update:

cond = survey_df['salary'] < 90

We’ll then pass the rows and columns labels to be updated into the loc indexer:

survey_df.loc[cond,'salary'] = 90
survey_df

Here’s our output:

languagesalarynum_candidates
0Python12018.0
1Java9022.0
2Haskell9534.0
3Go9010.0
4C++9017.0

Important note: We can obviously write more complex conditions as needed. Below if an example with multiple conditions.


cond = (survey_df['salary'] < 90) | (survey_df['num_candidates'] <18)

5. Replace values for an entire column

Let’s now assume that we would like to modify the num_candidates figure for all the DF entries. That’s fairly easy:

survey_df['num_candidates'] = 25