In this tutorial we’ll explain the mechanics of changing column names in pandas DataFrames, so that you can troubleshoot most issues you might encounter when defining new column headers after loading data into a DataFrame from a text file, a CSV file, or another source.
Example DataFrame for setting column names
We will first import the pandas library into our Python development environment and create a very simple DataFrame whose column names we can manipulate. Feel free to use the data to follow along with this tutorial.
import pandas as pd
sales_dict = {
    'biz_domain': ['B2B', 'B2C', 'Online'],
    'biz_sales': [187, 211, 364],
    'biz_expenses': [125, 162, 174]
}
# initialize DataFrame from dictionary
revenue_df = pd.DataFrame(sales_dict)
print(revenue_df)
This will render the following:
| | biz_domain | biz_sales | biz_expenses |
|---|---|---|---|
| 0 | B2B | 187 | 125 |
| 1 | B2C | 211 | 162 |
| 2 | Online | 364 | 174 |
Get DataFrame column names into a list
We can easily access the column index using the DataFrame columns property:
#will return the column index
revenue_df.columns
# convert the column index to list
df_cols = revenue_df.columns.to_list()
print(df_cols)
['biz_domain', 'biz_sales', 'biz_expenses']
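As a small aside, the sketch below shows that the columns property is a pandas Index rather than a plain list, and two other common ways to get the names into a list. It rebuilds the same sample data so it can run on its own; the names cols_via_list and cols_via_iter are made up for this illustration.

```python
import pandas as pd

# Same sample data as the example DataFrame above
revenue_df = pd.DataFrame({
    'biz_domain': ['B2B', 'B2C', 'Online'],
    'biz_sales': [187, 211, 364],
    'biz_expenses': [125, 162, 174],
})

# The columns attribute is a pandas Index object, not a Python list
print(type(revenue_df.columns))

# Two other ways to obtain a plain list of column names
cols_via_list = list(revenue_df.columns)
cols_via_iter = list(revenue_df)  # iterating a DataFrame yields its column labels

print(cols_via_list == cols_via_iter)  # True
```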
Set DataFrame column names from list
Next, we will rename the columns using the values in a Python list. Specifically, we want to get rid of the biz_ prefix on each column name. We’ll use a simple list comprehension that slices off the first four characters of every name, which is the length of the prefix.
new_cols = [col_name[4:] for col_name in df_cols]
# apply the new column names to the DataFrame:
revenue_df.columns = new_cols
print(revenue_df.columns)
Here’s the new column index:
Index(['domain', 'sales', 'expenses'], dtype='object')
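Slicing with [4:] removes the first four characters of every name, whether or not the name actually starts with biz_. The sketch below is one possible, more defensive variant that strips the prefix only when it is present. The region column, the PREFIX constant and the strip_prefix helper are hypothetical and exist only for this illustration.

```python
import pandas as pd

revenue_df = pd.DataFrame({
    'biz_domain': ['B2B', 'B2C', 'Online'],
    'biz_sales': [187, 211, 364],
    'region': ['North', 'South', 'West'],  # hypothetical column without the prefix
})

PREFIX = 'biz_'

def strip_prefix(col_name):
    """Remove PREFIX only when the name actually starts with it."""
    if col_name.startswith(PREFIX):
        return col_name[len(PREFIX):]
    return col_name

# Assigning a list to .columns requires one name per existing column
revenue_df.columns = [strip_prefix(c) for c in revenue_df.columns]
print(revenue_df.columns.to_list())  # ['domain', 'sales', 'region']
```

With plain slicing, the region column would have been renamed to 'on'; the startswith check leaves it untouched.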
Change column names by index position
To rename columns by their position rather than by their current name, we can look up each current name in the column index and map it to a new name with the rename DataFrame method, as shown in the following code:
cols = revenue_df.columns
revenue_df.rename(columns={cols[0]: 'domain', cols[1]: 'sales', cols[2]: 'expenses'}, inplace=True)
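Unlike assigning a full list to the columns property, rename accepts a partial mapping, so a single column can be renamed by position. The sketch below assumes the columns already carry the short names; 'revenue' is a hypothetical new name, and the call omits inplace=True so that it returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

revenue_df = pd.DataFrame({
    'domain': ['B2B', 'B2C', 'Online'],
    'sales': [187, 211, 364],
    'expenses': [125, 162, 174],
})

cols = revenue_df.columns

# Rename only the second column (position 1); the other columns keep their names
renamed_df = revenue_df.rename(columns={cols[1]: 'revenue'})

print(renamed_df.columns.to_list())  # ['domain', 'revenue', 'expenses']
print(revenue_df.columns.to_list())  # ['domain', 'sales', 'expenses']
```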
Set column names on empty DataFrame
We can easily create an empty DataFrame object using the pd.DataFrame constructor, passing a list of names through the columns parameter to define its column index:
new_rev_df = pd.DataFrame(columns = new_cols)
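A quick way to confirm the result is to inspect the empty, shape and columns attributes. The sketch below defines new_cols inline so that it runs on its own.

```python
import pandas as pd

new_cols = ['domain', 'sales', 'expenses']
new_rev_df = pd.DataFrame(columns=new_cols)

print(new_rev_df.empty)              # True: the DataFrame has no rows
print(new_rev_df.shape)              # (0, 3): zero rows, three columns
print(new_rev_df.columns.to_list())  # ['domain', 'sales', 'expenses']
```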
Rename the DataFrame index
We’ll first set a column as an index:
revenue_df.set_index('domain', inplace=True)
Now we can set the name of the index:
revenue_df.index.name = 'area'
print(revenue_df)
And here’s our modified DataFrame:
| | sales | expenses |
|---|---|---|
| area | | |
| B2B | 187 | 125 |
| B2C | 211 | 162 |
| Online | 364 | 174 |
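As an alternative sketch, the same result can be reached without inplace=True by chaining set_index with the rename_axis DataFrame method. The variable names indexed_df and restored_df are made up for this illustration. Calling reset_index afterwards turns the named index back into a regular column that carries the index name.

```python
import pandas as pd

revenue_df = pd.DataFrame({
    'domain': ['B2B', 'B2C', 'Online'],
    'sales': [187, 211, 364],
    'expenses': [125, 162, 174],
})

# Set the index and name it in one chained expression
indexed_df = revenue_df.set_index('domain').rename_axis('area')
print(indexed_df.index.name)  # area

# Moving the index back to a column keeps the new name
restored_df = indexed_df.reset_index()
print(restored_df.columns.to_list())  # ['area', 'sales', 'expenses']
```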
Convert all column names to lower case
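If your column names mix lower and upper case and you want to standardize them, you can use the pandas str accessor on the column index to convert the names to lower case, upper case, or capitalized form. The three lines below are alternatives: pick the one you need, since running them in sequence leaves only the last conversion in effect.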
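# all characters lower case
revenue_df.columns = revenue_df.columns.str.lower()
# all characters upper case
revenue_df.columns = revenue_df.columns.str.upper()
# first character upper case, the rest lower case
revenue_df.columns = revenue_df.columns.str.capitalize()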
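To make the difference between the conversions visible, here is a small sketch on a hypothetical DataFrame with mixed-case headers (mixed_df and its column names are invented for this example). It also shows str.title, which capitalizes every word rather than only the first one.

```python
import pandas as pd

# Hypothetical headers with inconsistent casing
mixed_df = pd.DataFrame(columns=['Domain', 'SALES', 'total expenses'])

lower_cols = mixed_df.columns.str.lower().to_list()
upper_cols = mixed_df.columns.str.upper().to_list()
cap_cols = mixed_df.columns.str.capitalize().to_list()
title_cols = mixed_df.columns.str.title().to_list()

print(lower_cols)  # ['domain', 'sales', 'total expenses']
print(upper_cols)  # ['DOMAIN', 'SALES', 'TOTAL EXPENSES']
print(cap_cols)    # ['Domain', 'Sales', 'Total expenses']
print(title_cols)  # ['Domain', 'Sales', 'Total Expenses']
```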