Follow this tutorial to check whether your DataFrame contains one or more columns and to add them if they are missing.
Step 1: Create your DataFrame
We start by importing the pandas library and defining a simple DataFrame:
import pandas as pd
cols = ['Area', 'Office', 'Sales', 'Expenses']
sales_df = pd.DataFrame(columns=cols)
The cols variable holds four column names. We then initialize an empty DataFrame with those columns.
Note: Make sure to install the pandas package before importing it into your Jupyter notebook.
We can easily list the DataFrame column index using the following command:
sales_df.columns
This will return the index of columns:
Index(['Area', 'Office', 'Sales', 'Expenses'], dtype='object')
Step 2: Validate whether the pandas column exists
It’s now quite simple to check whether a specific column is part of the index:
col_name = 'Margin'
col_name not in sales_df.columns
This returns the boolean True, because our DataFrame has no column named Margin.
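The same membership test extends to several names at once. The following is a minimal sketch, assuming a hypothetical list named required that holds the column names we want to verify; it collects the names that are absent from the DataFrame:

```python
import pandas as pd

sales_df = pd.DataFrame(columns=['Area', 'Office', 'Sales', 'Expenses'])

# Hypothetical list of column names we want to verify (not part of the tutorial)
required = ['Sales', 'Expenses', 'Margin']

# Keep every requested name that is not in the column index
missing = [c for c in required if c not in sales_df.columns]

# True only when no requested name is missing
all_present = not missing

print(missing)      # ['Margin']
print(all_present)  # False
```

Note that the comparison is case-sensitive, so 'sales' and 'Sales' count as different column names.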
Step 3: Add new column if not existing
Now we can bring everything together. We first look for the Margin column in the DataFrame; if it is not found, we add a column named Margin that holds the difference between two existing columns (in our case, Sales minus Expenses).
if col_name not in sales_df.columns:
    # Margin is Sales minus Expenses, computed row by row
    sales_df[col_name] = sales_df['Sales'] - sales_df['Expenses']
else:
    print('The column already exists in your DataFrame.')
We can now list the DataFrame columns again:
sales_df.columns
This will return:
Index(['Area', 'Office', 'Sales', 'Expenses', 'Margin'], dtype='object')
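Because sales_df has no rows, the new Margin column is empty. To see the subtraction at work, here is a minimal sketch that applies the same check to a DataFrame filled with made-up figures (the values below are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
sales_df = pd.DataFrame({
    'Area': ['North', 'South'],
    'Office': ['A', 'B'],
    'Sales': [100, 250],
    'Expenses': [60, 175],
})

col_name = 'Margin'

# Add the column only when it is not already present
if col_name not in sales_df.columns:
    sales_df[col_name] = sales_df['Sales'] - sales_df['Expenses']
else:
    print('The column already exists in your DataFrame.')

print(sales_df['Margin'].tolist())  # [40, 75]
```

Running the block a second time on the same DataFrame would take the else branch, since Margin now exists.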
Step 4: Add multiple columns if they do not exist
In this example we add several columns, creating each one only if it is missing:
cols_to_add = ['Date', 'Time']
for col in cols_to_add:
    if col not in sales_df.columns:
        sales_df[col] = None  # placeholder: replace None with your value
    else:
        print(f'The column {col} already exists in your DataFrame.')
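As an alternative to the loop, the missing columns can be added in one call. This is a sketch, not the only way to do it: it assumes that empty (NaN) values are an acceptable default for the new columns, and it uses DataFrame.reindex to append the names that are not yet present:

```python
import pandas as pd

sales_df = pd.DataFrame(columns=['Area', 'Office', 'Sales', 'Expenses'])

# 'Sales' is included on purpose to show that existing columns are skipped
cols_to_add = ['Date', 'Time', 'Sales']

# Names that are not yet in the DataFrame, in their original order
missing = [c for c in cols_to_add if c not in sales_df.columns]

# reindex keeps the existing columns and creates the new ones filled with NaN
sales_df = sales_df.reindex(columns=[*sales_df.columns, *missing])

print(list(sales_df.columns))
# ['Area', 'Office', 'Sales', 'Expenses', 'Date', 'Time']
```

Existing columns and their data are left untouched; only the names in missing are appended.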