In this short tutorial we’ll learn how to troubleshoot issues related one of the most basic tasks when wrangling pandas DataFrames: removing columns off your DataFrame.
Can’t drop pandas columns
Let’s assume that you are working with the following DataFrame:
import pandas as pd month = ['November', 'December', 'October'] language = ['R', 'Python', 'R'] salary = [ 134.0, 142.0, 165.0] data = dict(month = month, office = office, salary = salary) my_df = pd.DataFrame(data=data) print (my_df.head())
This will return the following data set:
Example #1 – specifying the right axis
Let’s assume that we would like to remove the office column. We will use the drop() DataFrame method:
This will return a KeyError exception:
KeyError: "['office'] not found in axis"
A DataFrame has two axes: rows (index) and columns. When we invoke the drop() method, pandas searches by default for the specific index to delete in the rows. If we would like to delete a column we simply add the axis=1 parameter:
my_df.drop('office', axis = 1)
Our Jupyter Notebook (or other Python development environment) will return the following output:
Example #2 – Keeping changes with inplace=True
By default, when dropping rows or columns, the changes to the original DataFrame are not kept. If we would like to keep changes, we can either save the DataFrame as a new object or use the inplace=True parameter.
my_new_df = my_df.drop('office', axis = 1)
my_df.drop('office', axis = 1, inplace=True)
Example # 3 – drop column not working when importing tsv files
Most often, we acquire our data from external files such as text , csv, Excelor json or databases such as MySQL and sqlite.
I have noticed that errors related to dropping columns are prevalent when working with tab separated values (tsv files) .This is puzzling at times as your file includes the column name but you get a KeyError when trying to get rid of it!
You can solve this by specifying the right separator (using the sep parameter) when invoking the read_csv() method. Here is a simple example:
my_df = pd.read_csv('salaries.tsv', sep='\t', usecols = cols)
Example # 4 – failing to drop multiple columns
Another case is when you try to remove multiple columns. Example:
my_df.drop('office', 'salary', axis = 1, inplace=True)
This will raise the following Type Error:
TypeError: drop() got multiple values for argument 'axis'
Fixing this is easy – just pass a list of columns to drop:
my_df.drop(['office', 'salary'], axis = 1, inplace=True)