How to group and rank by multiple columns in pandas?

To rank rows in a group by pandas object use the following method:

your_df.groupby('your_col')['your_variable_col].rank()

Create Example DataFrame

We will start by importing the pandas library into our Python development environment and constructing a very simple DataFrame.

import pandas as pd

month = ['March', 'March', 'June', 'October', 'March', 'June']
office = ['Hong Kong', 'Toronto', 'Paris', 'Paris', 'Osaka', 'Paris']
interviews = [195, 225, 186, 180, 185, 156]
hiring_data = dict(month = month, office = office, interviews = interviews)
hiring = pd.DataFrame(data=hiring_data)

Here are our DataFrame rows:

hiring
monthofficeinterviews
0MarchHong Kong195
1MarchToronto225
2JuneParis186
3OctoberParis180
4MarchOsaka185
5JuneParis156

Aggregate and rank within a group

In this first example, we would like to aggregate our data by month, then rank the values within each group by the number of interviews.

First let’s see how many records have been aggregated into each group

hiring.groupby('month')['interviews'].size().reset_index()
monthinterviews
0June2
1March3
2October1

Ranking each of the records is easy. Pandas returns a Series showing the rank of every record in its group.

relative_rank = hiring.groupby('month')['interviews'].rank(ascending= False)

We can assign the Series to the Dataframe as a new column:

hiring.assign(relative_rank = relative_rank )

Here’s our Dataframe:

monthofficeinterviewsrelative_rank
0MarchHong Kong1952.0
1MarchToronto2251.0
2JuneParis1861.0
3OctoberParis1801.0
4MarchOsaka1853.0
5JuneParis1562.0

Group by multiple columns and rank

In the same fashion we are able to aggregate our DataFrame by multiple columns and determine the relative ranking:

hiring.groupby(['month', 'office'])['interviews'].rank(ascending= False)

As shown above, the rank method returns a pandas Series.