In today’s tutorial you will learn how to order the values of your DataFrame by column values. We will also learn how you can rearrange your DataFrame according to specific column order.
Example Data
First off, we will import the pandas library into our Python development environment and create a sample DataFrame that you can use to follow along this tutorial.
import pandas as pd
revenue_dict = {
'area' : [ 'Retail', 'Online', 'B2B', 'B2C'] ,
'indirect' : [324, 324, 561, 345],
'direct' : [187, 211, 364, 534],
'distributors' : [278, 345, 183, 765],
'time' : pd.date_range(start='1/1/2019', periods=4, freq = 'y')}
revenue_df = pd.DataFrame (revenue_dict)
print(revenue_df)
Here’s is the resulting data:
area | indirect | direct | distributors | time | |
---|---|---|---|---|---|
0 | Retail | 324 | 187 | 278 | 2019-12-31 |
1 | Online | 324 | 211 | 345 | 2020-12-31 |
2 | B2B | 561 | 364 | 183 | 2021-12-31 |
3 | B2C | 345 | 534 | 765 | 2022-12-31 |
Sort pandas DataFrame by column value
We can easily order the DataFrame rows by values of a specific column using the sort_values() DataFrame method.
revenue_df.sort_values(by='area')
The DataFrame rows are sorted by the area value in ascending order (default):
area | indirect | direct | distributors | time | |
---|---|---|---|---|---|
2 | B2B | 561 | 364 | 183 | 2021-12-31 |
3 | B2C | 345 | 534 | 765 | 2022-12-31 |
1 | Online | 324 | 211 | 345 | 2020-12-31 |
0 | Retail | 324 | 187 | 278 | 2019-12-31 |
Sort pandas Date columns
Ordering date values is relatively simple. In this case we’ll re-arrange the values in the time column in descending order:
revenue_df.sort_values(by='time', ascending= False)
Order DataFrame values by multiple columns
We are able to pass a list of columns to the sort() method and order our DataFrame accordingly:
revenue_df.sort_values(by=['direct', 'distributors'])
Sort DataFrame columns by column name
We can rearrange the DataFrame column order very easily.
Let’s get started by printing the current column index:
print ("The DataFrame column sequence is: " + str (revenue_df.columns))
This will return the following result.
The DataFrame column sequence is: Index(['area', 'indirect', 'direct', 'distributors', 'time'], dtype='object')
We’ll now reorder the column sequence alphabetically in an ascending order:
revenue_df.columns = revenue_df.columns.sort_values()
And check again the column index sequence to receive the following result:
print ("The DataFrame column sequence is: " + str (revenue_df.columns))
The DataFrame column sequence is: Index(['area', 'direct', 'distributors', 'indirect', 'time'], dtype='object')
Note that you can rearrange the column sequence in additional ways for example:
order_cols = ['direct', 'time', 'indirect', 'distributors', 'area']
revenue_df.columns = order_cols
Additional Learning
How to calculate the average value in a Pandas DataFrame column?