How to set a column as the index of a pandas DataFrame?

In this pandas Data Analysis tutorial we will learn how to set a column as the index of your pandas DataFrame. We will extensively use the set_index() DataFrame method and look into several prevalent use cases that you might encounter as part of your data wrangling process.

As we typically do, we’ll import the pandas library and start by creating a simple example DataFrame that you can use to follow along with this tutorial.

import pandas as pd

sales_dict = {
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
    'tele_sales': [112, 321, 223, 166],
    'date': pd.date_range('1/1/2020', periods=4, freq='YS')
}

perf_df = pd.DataFrame(sales_dict)

Here are our DataFrame contents:

  channel  direct_sales  tele_sales       date
0     B2B           166         112 2020-01-01
1     B2C           211         321 2021-01-01
2  Retail           355         223 2022-01-01
3  Online           534         166 2023-01-01

The DataFrame index is the default integer range (a RangeIndex running from 0 to 3).
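You can confirm this by inspecting the index directly. The sketch below rebuilds the same example DataFrame so it runs on its own; the commented output is what pandas prints for a default index of four rows.

```python
import pandas as pd

sales_dict = {
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
    'tele_sales': [112, 321, 223, 166],
    'date': pd.date_range('1/1/2020', periods=4, freq='YS'),
}
perf_df = pd.DataFrame(sales_dict)

# Before set_index() is called, pandas assigns a RangeIndex from 0 to n-1
print(perf_df.index)  # RangeIndex(start=0, stop=4, step=1)
```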

Set first column as index

If we want to set the first (or, for practical purposes, any) column as the DataFrame index, we pass the column name to the set_index() method:


perf_df_1 = perf_df.set_index('channel')
print(perf_df_1)
         direct_sales  tele_sales       date
channel
B2B               166         112 2020-01-01
B2C               211         321 2021-01-01
Retail            355         223 2022-01-01
Online            534         166 2023-01-01
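Once channel is the index, it is no longer one of the regular columns, and rows can be looked up by channel label with .loc. A short sketch, re-creating the example data so it runs standalone:

```python
import pandas as pd

sales_dict = {
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
    'tele_sales': [112, 321, 223, 166],
    'date': pd.date_range('1/1/2020', periods=4, freq='YS'),
}
perf_df = pd.DataFrame(sales_dict)

perf_df_1 = perf_df.set_index('channel')

# 'channel' moved into the index, so it is no longer a regular column
print('channel' in perf_df_1.columns)  # False
print(perf_df_1.index.name)            # channel

# Rows can now be selected by their channel label
print(perf_df_1.loc['Retail', 'direct_sales'])  # 355
```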

We could also make the first column the index by referring to it by position (column 0) rather than by name:

perf_df_1 = perf_df.set_index(perf_df.columns[0])
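Both forms give the same result, because perf_df.columns[0] simply evaluates to the label 'channel'. A quick check, assuming the same example data (perf_df_a and perf_df_b are illustrative names):

```python
import pandas as pd

sales_dict = {
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
    'tele_sales': [112, 321, 223, 166],
    'date': pd.date_range('1/1/2020', periods=4, freq='YS'),
}
perf_df = pd.DataFrame(sales_dict)

# By name vs. by position: columns[0] is just the label of the first column
perf_df_a = perf_df.set_index('channel')
perf_df_b = perf_df.set_index(perf_df.columns[0])

print(perf_df.columns[0])           # channel
print(perf_df_a.equals(perf_df_b))  # True
```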

Make a column the index but keep the column

If we would like to set a column as the index while keeping it among the DataFrame columns, we pass drop=False. By default, set_index() uses drop=True, which removes the column once it becomes the index.

# keep the column after setting it as the index
perf_df_3 = perf_df.set_index('channel', drop=False)

print(perf_df_3)
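With drop=False, channel appears both as the index and as an ordinary column holding the same values. A minimal check on the example data:

```python
import pandas as pd

sales_dict = {
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
    'tele_sales': [112, 321, 223, 166],
    'date': pd.date_range('1/1/2020', periods=4, freq='YS'),
}
perf_df = pd.DataFrame(sales_dict)

perf_df_3 = perf_df.set_index('channel', drop=False)

# The index takes its name from the column...
print(perf_df_3.index.name)     # channel
# ...and the column itself is still there
print(list(perf_df_3.columns))  # ['channel', 'direct_sales', 'tele_sales', 'date']
```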

Set a column as datetime index

Because we created the date column with the pd.date_range function, it already holds datetime values, so we can easily set it as a datetime index:

perf_df_4 = perf_df.set_index('date')

# print the index values
print(perf_df_4.index)

This will return the following values:

DatetimeIndex(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'], dtype='datetime64[ns]', name='date', freq=None)
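A DatetimeIndex allows date-based selection such as partial string indexing (for example, selecting a whole year with .loc). The sketch below also covers a case the example data avoids: if the dates arrive as strings (the hypothetical date_str column here), convert them with pd.to_datetime() before calling set_index(), otherwise the result is a plain Index rather than a DatetimeIndex.

```python
import pandas as pd

# Hypothetical variant of the example data, with dates stored as strings
raw_df = pd.DataFrame({
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
    'date_str': ['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'],
})

# Setting the string column directly does not produce a DatetimeIndex
print(type(raw_df.set_index('date_str').index).__name__)  # Index

# Convert to datetime first, then set the converted column as the index
raw_df['date'] = pd.to_datetime(raw_df['date_str'])
dated_df = raw_df.drop(columns='date_str').set_index('date')
print(type(dated_df.index).__name__)  # DatetimeIndex

# Partial string indexing: select every row that falls in 2021
print(dated_df.loc['2021'])
```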

Rename an index column

The last case: after setting a column as the index, we would like to rename the index. One way to do this is to assign a new name to the index's names attribute:

# set column as index
perf_df_2 = perf_df.set_index('channel')

# rename the index
perf_df_2.index.names = ['idx']
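For a single-level index, two other common options are assigning the index name directly or chaining rename_axis(), which returns a new DataFrame and fits naturally after set_index(). A sketch on the example data (perf_df_5 is an illustrative name):

```python
import pandas as pd

sales_dict = {
    'channel': ['B2B', 'B2C', 'Retail', 'Online'],
    'direct_sales': [166, 211, 355, 534],
    'tele_sales': [112, 321, 223, 166],
    'date': pd.date_range('1/1/2020', periods=4, freq='YS'),
}
perf_df = pd.DataFrame(sales_dict)

# Option 1: set the single index name in place
perf_df_2 = perf_df.set_index('channel')
perf_df_2.index.name = 'idx'

# Option 2: chain rename_axis(), which returns a new DataFrame
perf_df_5 = perf_df.set_index('channel').rename_axis('idx')

print(perf_df_2.index.name, perf_df_5.index.name)  # idx idx
```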

Suggested follow-up learning

How to subtract date times objects in pandas DataFrames?