In this pandas Data Analysis tutorial we will learn how to set a column as the index of your pandas DataFrame. We will extensively use the set_index() DataFrame method and look into several prevalent use cases that you might encounter as part of your data wrangling process.
As usual, we'll start by importing the pandas library and creating a simple example DataFrame that you can use to follow along with this tutorial.
import pandas as pd
sales_dict = {
'channel' : ['B2B', 'B2C', 'Retail', 'Online'],
'direct_sales' : [166, 211, 355, 534],
'tele_sales' : [112, 321, 223, 166],
'date' : pd.date_range('1/1/2020', periods=4, freq='YS')
}
perf_df = pd.DataFrame(sales_dict)
Here are our DataFrame contents:
|   | channel | direct_sales | tele_sales | date |
|---|---|---|---|---|
| 0 | B2B | 166 | 112 | 2020-01-01 |
| 1 | B2C | 211 | 321 | 2021-01-01 |
| 2 | Retail | 355 | 223 | 2022-01-01 |
| 3 | Online | 534 | 166 | 2023-01-01 |
The DataFrame index is the default integer range, running from 0 to 3.
Set first column as index
If we want to set the first column (or, for practical purposes, any column) as the DataFrame index, we pass the column name to the set_index() method:
perf_df_1 = perf_df.set_index('channel')
print(perf_df_1)
| channel | direct_sales | tele_sales | date |
|---|---|---|---|
| B2B | 166 | 112 | 2020-01-01 |
| B2C | 211 | 321 | 2021-01-01 |
| Retail | 355 | 223 | 2022-01-01 |
| Online | 534 | 166 | 2023-01-01 |
We could also pick the first column by position instead of by name, using the following snippet:
perf_df_1 = perf_df.set_index(perf_df.columns[0])
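Note that both snippets assign the result to a new variable. A minimal sketch of why, assuming the default behavior of set_index() (it returns a new DataFrame and leaves the original as-is); the name indexed_df is only used for illustration:

```python
import pandas as pd

# Same example data as above, rebuilt so the snippet runs on its own
perf_df = pd.DataFrame({
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
    'tele_sales': [112, 321, 223, 166],
})

# set_index() returns a new DataFrame by default
indexed_df = perf_df.set_index('channel')

# The original still has its default integer index and the channel column
print(perf_df.index)
print('channel' in perf_df.columns)

# The new DataFrame is indexed by channel, so rows can be looked up by label
print(indexed_df.loc['Retail', 'direct_sales'])
```

Once the channel is the index, label-based lookups with .loc become the natural way to fetch a row.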
Make column an index but keep column
If we would like to use a column as the index while also keeping it among the regular columns, we pass drop=False to set_index().
# keep the column after setting it as the index
perf_df_3 = perf_df.set_index('channel', drop=False)
perf_df_3
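To see the effect of drop=False, it can help to compare the column lists of both variants side by side. This is a small sketch on a trimmed copy of the example data; the names dropped_df and kept_df are made up for this comparison:

```python
import pandas as pd

# Trimmed copy of the example data, rebuilt so the snippet runs on its own
perf_df = pd.DataFrame({
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
    'tele_sales': [112, 321, 223, 166],
})

dropped_df = perf_df.set_index('channel')            # default: drop=True
kept_df = perf_df.set_index('channel', drop=False)   # column is kept

# With the default, channel only lives in the index
print(list(dropped_df.columns))

# With drop=False, channel is both the index and a regular column
print(list(kept_df.columns))
print(kept_df.index.name)
```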
Set a column as datetime index
Because we created the date column using the pd.date_range function, it already holds datetime values, so we can set it directly as a datetime index:
perf_df_4 = perf_df.set_index('date')
# print the index values
perf_df_4.index
This will return the following values:
DatetimeIndex(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'], dtype='datetime64[ns]', name='date', freq=None)
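If your dates arrive as plain strings instead (for example, after reading a CSV file), one option is to convert them with pd.to_datetime before calling set_index(). The sketch below assumes ISO-formatted date strings; raw_df and dated_df are hypothetical names used only here:

```python
import pandas as pd

# Hypothetical variant of the example data where dates are plain strings
raw_df = pd.DataFrame({
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
    'date': ['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'],
})

# Convert the strings to datetime values, then use the column as the index
raw_df['date'] = pd.to_datetime(raw_df['date'])
dated_df = raw_df.set_index('date')

print(type(dated_df.index).__name__)

# A datetime index allows selecting rows by year or by a range of years
print(dated_df.loc['2021'])
print(dated_df.loc['2021':'2022'])
```

Selecting by a partial date string such as '2021' is one of the main practical benefits of a datetime index.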
Rename an index column
The last case is renaming the index after we have set a column as the index. This is accomplished in the following way:
# set column as index
perf_df_2 = perf_df.set_index('channel')
# rename the index
perf_df_2.index.names = ['idx']
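As an alternative, the rename can be chained right after set_index() with the rename_axis() method, which returns a new DataFrame rather than modifying the existing one. A short sketch on a trimmed copy of the example data (renamed_df is a name chosen for this illustration):

```python
import pandas as pd

# Trimmed copy of the example data, rebuilt so the snippet runs on its own
perf_df = pd.DataFrame({
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
})

# Set the index and rename it in a single chained expression
renamed_df = perf_df.set_index('channel').rename_axis('idx')

print(renamed_df.index.name)
print(renamed_df)
```

Which style to use is mostly a matter of taste: assigning to index.names changes the DataFrame in place, while rename_axis() fits well into method chains.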