How to add one or multiple pandas columns if they don’t exist?

Follow this tutorial to check whether your DataFrame contains one or more columns and add them as needed.

Step 1: Create your DataFrame

We start by importing the pandas library and defining a simple DataFrame:

import pandas as pd

cols = ['Area', 'Office', 'Sales', 'Expenses']
sales_df = pd.DataFrame(columns=cols)

The cols variable holds four column names. We then initialize an empty DataFrame that uses those names as its columns.

Note: Make sure to install the pandas package before importing it into your Jupyter notebook.

We can easily list the DataFrame column index using the following command:

sales_df.columns

This will return the index of columns:

Index(['Area', 'Office', 'Sales', 'Expenses'], dtype='object')
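If a plain Python list is easier to work with than an Index, the column names can be converted. The short sketch below rebuilds the same DataFrame and also prints its shape, which confirms that it has four columns and no rows yet. The column_names variable is introduced here only for illustration.

```python
import pandas as pd

cols = ['Area', 'Office', 'Sales', 'Expenses']
sales_df = pd.DataFrame(columns=cols)

# Convert the column Index into a regular Python list
column_names = sales_df.columns.tolist()
print(column_names)  # ['Area', 'Office', 'Sales', 'Expenses']

# The DataFrame has 0 rows and 4 columns
print(sales_df.shape)  # (0, 4)
```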

Step 2: Validate whether the pandas column exists

It’s now quite simple to check whether a specific column is part of the index:

col_name = 'Margin'
col_name not in sales_df.columns

This returns the boolean True, because our DataFrame has no column named Margin.
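The same membership test extends to several names. The sketch below is one possible way to find out which of a list of expected columns are absent. The required_cols list and the variable names are made up for illustration.

```python
import pandas as pd

sales_df = pd.DataFrame(columns=['Area', 'Office', 'Sales', 'Expenses'])

# Hypothetical list of columns we expect to find
required_cols = ['Sales', 'Margin', 'Date']

# Keep only the names that are absent from the DataFrame, preserving order
missing_cols = [col for col in required_cols if col not in sales_df.columns]
print(missing_cols)  # ['Margin', 'Date']

# True only if every required column is already present
all_present = set(required_cols).issubset(sales_df.columns)
print(all_present)  # False
```

A list comprehension keeps the missing names in the order you listed them, which is handy if the new columns should be added in a specific order.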

Step 3: Add a new column if it doesn’t exist

Now we can bring everything together. We first look for the Margin column in the DataFrame. If it is not found, we add a column named Margin that holds the difference between two existing columns (in our case Sales minus Expenses).

if col_name not in sales_df.columns:
    sales_df[col_name] = sales_df['Sales'] - sales_df['Expenses']
else:
    print('The column already exists in your DataFrame.')

We can now list the DataFrame columns again:

sales_df.columns

This will return:

Index(['Area', 'Office', 'Sales', 'Expenses', 'Margin'], dtype='object')
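Because our DataFrame has no rows, the new Margin column is empty. To see actual values, here is a minimal sketch that uses made-up sample data and wraps the check in a small helper. The add_column_if_missing function is a hypothetical helper written for this example, not part of pandas.

```python
import pandas as pd

def add_column_if_missing(df, col_name, values):
    """Add col_name to df only if it is absent. Return True if it was added."""
    if col_name in df.columns:
        return False
    df[col_name] = values
    return True

# Made-up sample data for illustration
sales_df = pd.DataFrame({
    'Area': ['North', 'South'],
    'Office': ['A', 'B'],
    'Sales': [100, 250],
    'Expenses': [60, 200],
})

added = add_column_if_missing(
    sales_df, 'Margin', sales_df['Sales'] - sales_df['Expenses']
)
print(added)                        # True
print(sales_df['Margin'].tolist())  # [40, 50]

# A second call leaves the existing column untouched
added_again = add_column_if_missing(sales_df, 'Margin', 0)
print(added_again)                  # False
```

Note that the helper modifies the DataFrame in place, just like the if/else block above.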

Step 4: Add multiple columns at once if they don’t exist

In this example we handle several columns at once, adding each one only if it is missing:

cols_to_add = ['Date', 'Time']
for col in cols_to_add:
    if col not in sales_df.columns:
        sales_df[col] = None  # placeholder: replace None with your value
    else:
        print(f'The column {col} already exists in your DataFrame.')
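If the new columns only need to exist and can start out empty, a loop-free alternative is DataFrame.reindex, which fills any newly added column with NaN. This is a sketch with made-up sample data, not the only way to do it.

```python
import pandas as pd

# Made-up sample data for illustration
sales_df = pd.DataFrame({'Area': ['North'], 'Sales': [100]})
cols_to_add = ['Date', 'Sales', 'Time']

# Names from cols_to_add that the DataFrame does not have yet
missing = [col for col in cols_to_add if col not in sales_df.columns]

# reindex returns a new DataFrame; columns that did not exist are filled with NaN
sales_df = sales_df.reindex(columns=[*sales_df.columns, *missing])
print(list(sales_df.columns))  # ['Area', 'Sales', 'Date', 'Time']
```

Unlike the loop, reindex returns a new DataFrame, so the result has to be assigned back to sales_df.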
