Today we would like to show how to change cell values based on specific conditions in a pandas DataFrame.
We will start by creating some example data, that you can use if order to follow along with this example:
rev_dict = {
'date' : pd.date_range('1/1/2023', periods=4, freq='MS') ,
'team': ['B', 'A', 'A', pd.NA],
'direct_sales' : [303, 234, 302, 461],
'indirect_sales' : [199, 175, 108, 142],
}
revenue = pd.DataFrame (rev_dict)
print( revenue.head())
This will return the following DataFrame rows:
date | team | direct_sales | indirect_sales | |
---|---|---|---|---|
0 | 2023-01-01 | B | 303 | 199 |
1 | 2023-02-01 | A | 234 | 175 |
2 | 2023-03-01 | A | 302 | 108 |
3 | 2023-04-01 | <NA> | 461 | 142 |
Change empty / missing values in our columns
We clearly see that this last DataFrame rows contain an empty value. We can use the fillna() function to replace values in the team column.
revenue['team'] = revenue['team'].fillna('A')
Modify value based on condition
Next we will go ahead and adjust the value of specific row cell based on a condition.
# define condition
rows = (revenue.indirect_sales > 150)
# update dataframe
revenue.loc [rows, 'team'] = 'B'
print ( revenue.head() )
Here’s our DataFrame:
date | team | direct_sales | indirect_sales | |
---|---|---|---|---|
0 | 2023-01-01 | B | 303 | 199 |
1 | 2023-02-01 | B | 234 | 175 |
2 | 2023-03-01 | A | 302 | 108 |
3 | 2023-04-01 | A | 461 | 142 |
Apply a function to set values based on multiple conditions
In this example, we will define a simple function that will determine values based on row values. We will then apply the function as needed to the DataFrame column.
# define a Python function
def set_team(r):
if (r['indirect_sales'] > 140) | (r['direct_sales'] > 400):
return 'B'
else:
return 'A'
# apply function to column
revenue['team'] = revenue.apply(lambda r: set_team(r), axis=1)
Update values in multiple rows of a column
Next case is to adjust values of all cells in a column by multiplying the pandas column values by a constant. In this example, we will augment the indirect_sales column by 15%.
revenue.indirect_sales = revenue.indirect_sales *1.15
Next Learning
How to slice all columns but the first in a pandas DataFrame?