How to drop one or multiple columns in a Pandas DataFrame?

In today’s Data Wrangling tutorial we’ll learn how to manipulate Pandas DataFrames by removing one or more columns from them.

In most examples we’ll use the DataFrame.drop() method to subset our data so we can focus our analysis.

Define and load example dataset

We’ll start by loading the Python Pandas library and creating our dataset:

# drop multiple columns Python DataFrame example

import pandas as pd

inter_dict = {'language': [ 'Python', 'R', 'Scala', 'Java', 'SQL'],
             'salary':[120, 100, 90, 87, 78],
             'candidates' : [13,5,4,6,8],
             'city' : ['NYC', 'NYC', 'SFO', 'CHI', 'STL']}

# create the DataFrame
interviews = pd.DataFrame(data=inter_dict)
interviews.head()

Here’s our data:

Dropping multiple DataFrame columns by name / label

The first case is to remove specific columns (one or multiple) simply by passing their labels to the DataFrame.drop() method.

# define list of columns
cols = ['candidates', 'city']

# drop columns and assign to a new DataFrame
iv1=interviews.drop(columns = cols)
iv1.head()

Here’s our filtered DataFrame.

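Important note: we assigned the filtered DataFrame to the iv1 variable, which is a new DataFrame; the original interviews DataFrame is unchanged. If you would like to modify the original interviews DataFrame permanently, you would use the inplace=True parameter. Keep in mind that the remaining examples still refer to the dropped columns, so they assume you have not run this line: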

interviews.drop(columns = cols,inplace=True)
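The difference between the two styles can be checked directly. The sketch below rebuilds the example data under a separate name (demo is a hypothetical variable, used here so the interviews DataFrame stays intact), drops a single column by passing one label instead of a list, and then repeats the drop in place.

```python
import pandas as pd

# rebuild the example data under a separate, illustrative name
demo = pd.DataFrame({'language': ['Python', 'R', 'Scala', 'Java', 'SQL'],
                     'salary': [120, 100, 90, 87, 78],
                     'candidates': [13, 5, 4, 6, 8],
                     'city': ['NYC', 'NYC', 'SFO', 'CHI', 'STL']})

# a single label works as well as a list; drop() returns a new DataFrame
no_city = demo.drop(columns='city')
cols_after_copy = list(demo.columns)      # demo still has all four columns

# inplace=True modifies demo itself and returns None
result = demo.drop(columns='city', inplace=True)
cols_after_inplace = list(demo.columns)   # 'city' is now gone from demo
```

Because the in-place call returns None, assigning its result to a variable (as in result above) does not give you a DataFrame.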

Removing columns by label using df.loc indexer

Alternatively, you can accomplish a similar result by using the df.loc indexer.

# select the columns with loc, then take their labels
cols = interviews.loc[:, ['candidates', 'city']].columns

# pass the labels to drop()
iv2 = interviews.drop(cols, axis=1)

iv2.head()

Drop Pandas columns by index / index range

You can also select the columns to drop by their position, using the df.iloc indexer. The slice 2:4 picks the columns at positions 2 and 3, since the end of the slice is excluded.

# drop by index with iloc

# labels of the columns at positions 2 and 3
cols_idx = interviews.iloc[:, 2:4].columns
iv4 = interviews.drop(cols_idx, axis=1)
iv4.head()

Note: we use the axis=1 parameter to ensure that columns, and not rows, are dropped. The reason is that by default, axis=0 (the DataFrame rows) is assumed.
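To make the role of axis concrete, here is a minimal sketch that rebuilds the example data and compares the axis=1 form with the columns= keyword. The variable names by_axis, by_keyword, by_position and raised are illustrative only.

```python
import pandas as pd

interviews = pd.DataFrame({'language': ['Python', 'R', 'Scala', 'Java', 'SQL'],
                           'salary': [120, 100, 90, 87, 78],
                           'candidates': [13, 5, 4, 6, 8],
                           'city': ['NYC', 'NYC', 'SFO', 'CHI', 'STL']})

# two equivalent ways of saying "these labels are columns"
by_axis = interviews.drop(['candidates', 'city'], axis=1)
by_keyword = interviews.drop(columns=['candidates', 'city'])
same_result = by_axis.equals(by_keyword)

# positions can be turned into labels through interviews.columns
by_position = interviews.drop(columns=interviews.columns[2:4])

# without axis=1, drop() searches the row index for the labels and fails
try:
    interviews.drop(['candidates', 'city'])
    raised = False
except KeyError:
    raised = True
```

The columns= keyword is arguably the more readable of the two forms, since it cannot be confused with a row operation.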

Filtering out columns not in a list

Instead of passing a list of columns to drop, you might want to pass a list of columns to keep for further analysis:

# drop columns not in list or except
cols_keep = ['language']
cols_drop = [x for x in interviews.columns if x not in cols_keep]
iv5 = interviews.drop(columns=cols_drop)
iv5.head()
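When you already know which columns to keep, there are shorter routes to the same result. The sketch below, which assumes a keep list of two columns for illustration, compares plain column selection with Index.difference(), which returns the labels that are not in the keep list.

```python
import pandas as pd

interviews = pd.DataFrame({'language': ['Python', 'R', 'Scala', 'Java', 'SQL'],
                           'salary': [120, 100, 90, 87, 78],
                           'candidates': [13, 5, 4, 6, 8],
                           'city': ['NYC', 'NYC', 'SFO', 'CHI', 'STL']})

cols_keep = ['language', 'salary']   # illustrative keep list

# option 1: select the columns to keep directly
kept_by_selection = interviews[cols_keep]

# option 2: drop every column whose label is not in the keep list
cols_drop = interviews.columns.difference(cols_keep)
kept_by_drop = interviews.drop(columns=cols_drop)
```

One difference to be aware of: direct selection returns the columns in the order given in cols_keep, while drop() preserves their original order in the DataFrame.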

Drop specific DataFrame columns not meeting a condition

The last example for today keeps only the columns whose label contains a certain string pattern.

# boolean mask: True for each column label that contains 'c'
cols = interviews.columns.str.contains('c')
# keep only the matching columns
iv6 = interviews.loc[:, cols]
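The same boolean mask can be inverted to drop the matching columns instead of keeping them. This is a minimal sketch on the example data; the names mask, kept, dropped and via_drop are illustrative.

```python
import pandas as pd

interviews = pd.DataFrame({'language': ['Python', 'R', 'Scala', 'Java', 'SQL'],
                           'salary': [120, 100, 90, 87, 78],
                           'candidates': [13, 5, 4, 6, 8],
                           'city': ['NYC', 'NYC', 'SFO', 'CHI', 'STL']})

# True for labels containing 'c' (the pattern is treated as a regex by default)
mask = interviews.columns.str.contains('c')

kept = interviews.loc[:, mask]       # only the matching columns
dropped = interviews.loc[:, ~mask]   # everything except the matching columns

# the same removal expressed with drop()
via_drop = interviews.drop(columns=interviews.columns[mask])
```

If the pattern contains characters that have a special meaning in regular expressions, passing regex=False to str.contains() makes it match the text literally.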

Next Learning:

If you are starting out with Pandas, you might also want to take a look at our tutorial on adding new columns to your Python DataFrame.