There might be cases in which you’ll need to perform calculations across several columns in your Pandas DataFrame. In this post we’ll learn how to easily use the max() DataFrame to retrieve the maximum value across two or more Pandas column.
Example DataFrame
We’ll get started with a very simple DataFrame that you can use in your Python environment in order to follow along with this example.
import pandas as pd
sales_dict = {'Year':['2019', '2020', '2021', '2022'], 'sales_Q1':[150, 250, 175, 190], 'sales_Q2': [ 160, 270, 220, 185], 'sales_Q3' : [ 285, 210, 290, 400] }
sales = pd.DataFrame (sales_dict)
sales
Here’s our data:
Year | sales_Q1 | sales_Q2 | sales_Q3 | |
---|---|---|---|---|
0 | 2019 | 150 | 160 | 285 |
1 | 2020 | 250 | 270 | 210 |
2 | 2021 | 175 | 220 | 290 |
3 | 2022 | 190 | 185 | 400 |
Calculate the maximum value of two columns
As a simple example, we will calculate for example the maximum of the Q1 and Q2 columns.We first need to subset our DataFrame and apply the max() function. There are several ways to subset a DataFrame.
Using the brackets notation:
subset = sales[['Q1', 'Q2']]
Using the loc and iloc indexers:
# using index location
subset = sales.iloc[:,1:3]
#or using loc
subset = sales.loc[:,['Q1', 'Q2']]
Note that you can use the same method to calculate the maximum across any number of columns. Now we’ll find the maximal value:
sales['max_two_cols'] = subset.max(axis=1)
Note the usage of the axis=1 parameter to ensure that the max calculations is done across the columns (instead of across the DataFrame rows).
Here’s our DataFrame:
Year | sales_Q1 | sales_Q2 | sales_Q3 | max_q1_q2 | |
---|---|---|---|---|---|
0 | 2019 | 150 | 160 | 285 | 160 |
1 | 2020 | 250 | 270 | 210 | 270 |
2 | 2021 | 175 | 220 | 290 | 220 |
3 | 2022 | 190 | 185 | 400 | 190 |
Maximum of Pandas column and constant
In a similar fashion we can find the maximum value between a constant and a column. We’ll compare between the Q1 figure and a quarterly target expense.
Method A – using numpy.maximum()
import numpy as np
q_expense = 210
sales['max_q1_q_expense'] = np.maximum(sales['sales_Q1'], q_expense)
Method B – using two Pandas columns
sales['q_expense'] = 210
sales['max_q1_q_expense'] = sales[['sales_Q1', 'q_expense']].max(axis=1)
And our DataFrame will look as following:
Year | sales_Q1 | sales_Q2 | sales_Q3 | max_q1_q2 | q_expense | max_q1_q_expense | |
---|---|---|---|---|---|---|---|
0 | 2019 | 150 | 160 | 285 | 160 | 210 | 210 |
1 | 2020 | 250 | 270 | 210 | 270 | 210 | 250 |
2 | 2021 | 175 | 220 | 290 | 220 | 210 | 210 |
3 | 2022 | 190 | 185 | 400 | 190 | 210 | 210 |