When importing data from csv (comma separated values) files into your dataframe, you might have noticed that some of the header column labels are unnamed. Some of the columns could be empty as well. After looking carefully into your DataFrame you decide to drop the unnamed columns.
Let’s look into how we can achieve that. We will start by creating a simple example DataFrame that we’ll use throughout the tutorial.
import pandas as pd
# read data from a csv file into a DataFrame
sales = pd.read_csv('sales.csv')
#look into your data
sales.head()
Here’s the result – note the unnamed files:
Unnamed: 0 | Person | Sales | Unnamed: 3 | |
---|---|---|---|---|
0 | 123 | Joe | 50 | 60 |
1 | 866 | Sam | 70 | 80 |
2 | 855 | Tim | 80 | 100 |
Drop unnamed columns in Pandas
We’ll use the DataFrame.drop() method to remove one or multiple rows or columns from a DataFrame. But first, we need to get those columns without header labels.
unnamed_cols = sales.columns.str.contains('Unnamed')
unnamed_cols
This will return an array of boolean items, which we’ll use to filter our DataFrame.
array([ True, False, False, True])
Next we can call the drop() method:
sales_new = sales.drop(sales[sales.columns[unnamed_cols]], axis=1)
sales_new.head()
Note the usage of the parameter axis=1 to drop from the column index.
This will effectively remove all unnamed columns off our DataFrame as shown below:
Person | Sales | |
---|---|---|
0 | Joe | 50 |
1 | Sam | 70 |
2 | Tim | 80 |
An alternative way to accomplish a similar result is using the iloc accesor:
sales_new = sales.iloc[:,~ unnamed_cols]
Explanation: the iloc accessor gets two parameters: an index of rows and one of columns. In our example we use the expression ~ unnamed_cols to pass the named columns indexes to the iloc accessor and consequently filter out the unnamed ones.