How to drop a pandas column if it exists?

The key challenge when automating your data wrangling process is to gracefully handle possible exceptions. As an example, as part of the automation, might need to drop one or more columns from the DataFrame (provided they exists). In this tutorial we’ll learn how to check if columns exist in our DataFrame, and if so, remove them from our data.

Create sample Data

We will first import the pandas module and initialize a DataFrame with example data that you can use to follow along.

import pandas as pd

quarter = [1,2,3,4]
sales = [248, 417, 387, 411] 
expenses = [207, 312, 332, 299]

performance = pd.DataFrame(dict (quarter = quarter, sales = sales, expenses = expenses))
performance.head()

Here’s our data:

quartersalesexpenses
01248207
12417312
23387332
34411299

Let’s look at the column index:

df_cols = performance.columns
print(cols)

This will return:

Index(['quarter', 'sales', 'expenses'], dtype='object')

Drop column from pandas DataFrame

We will first define a list of colums to drop (one or multiple). We will then loop through the list and remove the column from the DataFrame if it exists.


cols_to_drop = ['margin', 'expenses']

for col in cols_to_drop :
    if col in df_cols:
        performance.drop(col, axis=1, inplace=True)
        print(f'Column {col} dropped from DataFrame.')
    else:
        print(f'Column {col} doesn\'t exist in DataFrame.')

This will be the output:

Column margin doesn't exist in DataFrame.
Column expenses dropped from DataFrame.

Remove a pandas column and ignoring errors

A somewhat different way to approach this task is to use the DataFrame drop() method with the parameter errors=’ignore.

cols_to_drop = ['margin', 'expenses']
performance.drop(cols_to_drop, axis=1, errors = 'ignore')

When running this snippet, pandas will normally raise an error as the margin column is not part of the column index and stop the execution of the code. Using the errors = ‘ignore’ parameter will ensure that any errors when dropping the columns are ignored and the code will work for any column names – including those not in the DataFrame.

Remove a column which name contains a string

In the next example we will use a list comprehension to create a list of columns to drop which name contains a string . This is typically useful in case you have lots of unnamed columns then importing a csv file for example.

#find columns containing the string 'es
cols_to_drop = [col for col in df_cols if col.find('es') > -1]

print (cols_to_drop)

This will return:

['sales', 'expenses']

You can then drop those as shown above or using the following snippet:

performance.drop(cols_to_drop, axis=1)

Drop a column containing specific value

Another case is to drop a column if it contains specific values:

cols_to_drop  = [col for col in df_cols if df[col].str.contains('your_string')]

Additional learning

How to cast a pandas DataFrame column to integer data type?