Today We will learn how to quickly calculate the minimum value between two columns / series in a pandas DataFrame. In most cases, we will use the DataFrame min() to quickly calculate the lowest values across columns and rows.
Example DataFrame
Before getting started, we will build some simple generic sales data:
import pandas as pd
rev_dict = {
'area' : ['B2B', 'B2C', 'Retail', 'Online'],
'Q1' : [187, 211, 364, 534],
'Q2' : [324, 234, 561, 345],
'Q3' : [278, 345, 183, pd.NA],
}
sales= pd.DataFrame (rev_dict)
sales.head()
Here’s the data we will be working with:
area | Q1 | Q2 | Q3 | |
---|---|---|---|---|
0 | B2B | 187 | 324 | 278 |
1 | B2C | 211 | 234 | 345 |
2 | Retail | 364 | 561 | 183 |
3 | Online | 534 | 345 | <NA> |
Find the minimum value between two columns
Let’s assume we would like to focus on Q1 and Q2.
subset = ['area','Q2', 'Q3']
sales[subset]
This returns a subset of our DataFrame. To calculate the minimum of each column we’ll just call the min() method. Note the usage of the numeric_only parameter that allows to compare only numeric values.
sales[subset].min(numeric_only = True)
A Series representing the smallest value of each column is returned.
area B2B Q2 234 Q3 183 dtype: object
Calculate Lowest value between row values
Conversely, if you want to calculate the min value across multiple columns of the same row by using the axis = 1 parameter as shown below:
sales.min(numeric_only = True, axis = 1)
Here’s our series – representing the lowest value per row.
0 187 1 211 2 364 3 345 dtype: int64
Minimum between column and number
Let us now assume that we need to compare a value in a Pandas column or Series with a constant number. The easiest way here is to use numpy np.minimum function. In our case we will compare the Q2 series vs a fixed cost constant.
import numpy as np
fixed_cost = 340
sales['max_cost_rev'] = np.minimum(sales['Q2'], fixed_cost)
Note: in case that you encounter errors importing numpy, read this tutorial.
Difference between a min and a max of a column
Now we would like to calculate the range between each of our DataFrame rows maximum value and the minimal one:
sales['diff_min_max'] = sales.max(numeric_only=True , axis=1) - sales.min(numeric_only=True , axis=1)
And the result will be:
area | Q1 | Q2 | Q3 | diff_min_max | |
---|---|---|---|---|---|
0 | B2B | 187 | 324 | 278 | 137 |
1 | B2C | 211 | 234 | 345 | 23 |
2 | Retail | 364 | 561 | 183 | 197 |
3 | Online | 534 | 345 | <NA> | 189 |