To rank rows in a group by pandas object use the following method:
your_df.groupby('your_col')['your_variable_col].rank()
Create Example DataFrame
We will start by importing the pandas library into our Python development environment and constructing a very simple DataFrame.
import pandas as pd
month = ['March', 'March', 'June', 'October', 'March', 'June']
office = ['Hong Kong', 'Toronto', 'Paris', 'Paris', 'Osaka', 'Paris']
interviews = [195, 225, 186, 180, 185, 156]
hiring_data = dict(month = month, office = office, interviews = interviews)
hiring = pd.DataFrame(data=hiring_data)
Here are our DataFrame rows:
hiring
month | office | interviews | |
---|---|---|---|
0 | March | Hong Kong | 195 |
1 | March | Toronto | 225 |
2 | June | Paris | 186 |
3 | October | Paris | 180 |
4 | March | Osaka | 185 |
5 | June | Paris | 156 |
Aggregate and rank within a group
In this first example, we would like to aggregate our data by month, then rank the values within each group by the number of interviews.
First let’s see how many records have been aggregated into each group
hiring.groupby('month')['interviews'].size().reset_index()
month | interviews | |
---|---|---|
0 | June | 2 |
1 | March | 3 |
2 | October | 1 |
Ranking each of the records is easy. Pandas returns a Series showing the rank of every record in its group.
relative_rank = hiring.groupby('month')['interviews'].rank(ascending= False)
We can assign the Series to the Dataframe as a new column:
hiring.assign(relative_rank = relative_rank )
Here’s our Dataframe:
month | office | interviews | relative_rank | |
---|---|---|---|---|
0 | March | Hong Kong | 195 | 2.0 |
1 | March | Toronto | 225 | 1.0 |
2 | June | Paris | 186 | 1.0 |
3 | October | Paris | 180 | 1.0 |
4 | March | Osaka | 185 | 3.0 |
5 | June | Paris | 156 | 2.0 |
Group by multiple columns and rank
In the same fashion we are able to aggregate our DataFrame by multiple columns and determine the relative ranking:
hiring.groupby(['month', 'office'])['interviews'].rank(ascending= False)
As shown above, the rank method returns a pandas Series.