In today’s tutorial we’ll learn how to drop one or multiple columns from a pandas DataFrame, including how to remove columns based on predefined conditions.
Define an example DataFrame
We’ll start by importing the pandas library and then create a very simple DataFrame populated with some candidate data:
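The DataFrame definition itself isn’t included in this text, so here is a minimal stand-in that makes the examples reproducible. The candidate names, offices and the two month columns are hypothetical, chosen only so that the later examples (the 'office' column and the columns containing 'month') have something to act on.

```python
import pandas as pd

# Hypothetical candidate data: column names and values are assumptions
# chosen to fit the examples in this tutorial ('office' and 'month' columns)
interviews_df = pd.DataFrame({
    "candidate": ["Anna", "Ben", "Carla"],
    "office": ["Boston", "Denver", "Austin"],
    "month_applied": [1, 2, 3],
    "interview_month": [2, 3, 4],
})

print(interviews_df)
```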
Delete a single column by name
The simplest case is dropping a single column from the DataFrame:
# define the column to remove
col = 'office'
# remove the column and assign the result to a new DataFrame
interviews_df_1 = interviews_df.drop(col, axis=1)
Note: When calling the drop method, you can pass the inplace=True parameter to persist your changes (in this case, the column removal) in the original DataFrame instead of getting a modified copy back:
interviews_df.drop(col, axis=1, inplace=True)
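Here is a short sketch of the difference between the two styles, again on a hypothetical stand-in DataFrame. It also shows dropping several columns at once by passing a list of labels, and the columns= keyword, which is equivalent to passing labels together with axis=1. With inplace=True, drop() returns None, so assigning its result would leave you holding None rather than a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's DataFrame
interviews_df = pd.DataFrame({
    "candidate": ["Anna", "Ben", "Carla"],
    "office": ["Boston", "Denver", "Austin"],
    "month_applied": [1, 2, 3],
    "interview_month": [2, 3, 4],
})

# Without inplace, drop() returns a new DataFrame and leaves the original intact
interviews_df_1 = interviews_df.drop("office", axis=1)

# A list of labels drops several columns at once; columns= is equivalent to axis=1
interviews_df_multi = interviews_df.drop(columns=["office", "month_applied"])

# With inplace=True the original is modified and drop() returns None,
# so don't assign its result back to a variable
result = interviews_df.drop("office", axis=1, inplace=True)
print(result)  # None
print(list(interviews_df.columns))
```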
Unable to delete columns
If removing a column from your DataFrame doesn’t seem to work, you are most likely running into one of these two issues:
- You drop the column from the DataFrame, but when you display its contents, the dropped column is still there. If that’s the case, remember to either assign the modified DataFrame to a new variable, or use the inplace=True parameter mentioned above.
- You get the following error:
KeyError: "<your column name> not found in axis"
The problem here is that drop() looks for labels along the row index by default (axis=0), so pandas searches the rows for your column name. The solution is to pass the axis=1 parameter (or use the columns= keyword instead), as shown in the examples throughout this tutorial.
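To see both symptoms in action, the sketch below (using a small hypothetical DataFrame) triggers the KeyError by leaving axis at its default of 0, then repeats the call with axis=1 and shows that the original DataFrame is unchanged unless the result is assigned.

```python
import pandas as pd

# Small hypothetical DataFrame for the demonstration
interviews_df = pd.DataFrame({
    "candidate": ["Anna", "Ben"],
    "office": ["Boston", "Denver"],
})

# axis defaults to 0 (the row index), so pandas looks for a *row* labelled 'office'
raised_key_error = False
try:
    interviews_df.drop("office")
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)

# Targeting the column axis fixes it; the result must be assigned (or use inplace=True)
fixed_df = interviews_df.drop("office", axis=1)
print(list(fixed_df.columns))
print(list(interviews_df.columns))  # the original still has 'office'
```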
Remove columns only if they exist in the DataFrame
In the next example, we’ll delete the column only if it is part of the columns index.
To display the columns of our DataFrame, we can access interviews_df.columns. The result is an Index object.
We can then write a very simple conditional statement that removes the column only if it is part of that index:
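A minimal sketch of inspecting the columns, assuming the same hypothetical stand-in DataFrame. The in operator works directly on the Index, so there is no need to convert it to a list first; 'salary' is a hypothetical name used only to show a missing column.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's DataFrame
interviews_df = pd.DataFrame({
    "candidate": ["Anna", "Ben", "Carla"],
    "office": ["Boston", "Denver", "Austin"],
    "month_applied": [1, 2, 3],
    "interview_month": [2, 3, 4],
})

cols = interviews_df.columns
print(cols)                  # an Index object, not a plain Python list
print(type(cols).__name__)
# Membership tests work directly on the Index
print("office" in cols)
print("salary" in cols)      # hypothetical column that doesn't exist
```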
col = 'office'
# drop the column only if it is present in the columns index
if col in interviews_df.columns:
    interviews_df_2 = interviews_df.drop(col, axis=1)
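As an alternative to the explicit if check, drop() accepts errors='ignore', which silently skips labels that are not present instead of raising a KeyError. This is handy when you have a list of candidate columns and only some of them exist. In the sketch below, 'salary' is a hypothetical column name that does not exist in the stand-in DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's DataFrame
interviews_df = pd.DataFrame({
    "candidate": ["Anna", "Ben", "Carla"],
    "office": ["Boston", "Denver", "Austin"],
    "month_applied": [1, 2, 3],
    "interview_month": [2, 3, 4],
})

# 'salary' is hypothetical and absent; errors="ignore" skips it instead of raising
to_drop = ["office", "salary"]
safe_dropped_df = interviews_df.drop(columns=to_drop, errors="ignore")
print(list(safe_dropped_df.columns))
```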
Delete columns whose names match a certain pattern
In this example we’ll use a list comprehension to loop through the columns index and build a list of the column names that contain our pattern. Then we’ll go ahead and remove those columns.
pattern = 'month'
# collect the names of all columns that contain the pattern
drop_lst = [col for col in interviews_df.columns if pattern in col]
interviews_df_4 = interviews_df.drop(drop_lst, axis=1)
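The same filter can also be written without an explicit loop, using the vectorized .str.contains() accessor on the columns Index to build a boolean mask. The sketch below assumes the hypothetical stand-in DataFrame; regex=False makes the pattern match as a literal substring, which mirrors the list comprehension above.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's DataFrame
interviews_df = pd.DataFrame({
    "candidate": ["Anna", "Ben", "Carla"],
    "office": ["Boston", "Denver", "Austin"],
    "month_applied": [1, 2, 3],
    "interview_month": [2, 3, 4],
})

pattern = "month"
# Boolean mask: True for every column name that contains the pattern as a substring
mask = interviews_df.columns.str.contains(pattern, regex=False)
# Select the matching labels from the Index and drop them
contains_dropped_df = interviews_df.drop(columns=interviews_df.columns[mask])
print(list(contains_dropped_df.columns))
```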
Remove columns whose names start with a specific string
In a similar fashion, we can search for specific column names starting with a provided string and wipe them off our DataFrame.
pattern = 'month'
# collect the names of all columns that start with the pattern
drop_lst = [col for col in interviews_df.columns if col.startswith(pattern)]
interviews_df_3 = interviews_df.drop(drop_lst, axis=1)
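Here is a vectorized variant using .str.startswith() on the columns Index, on the same hypothetical stand-in DataFrame. Unlike the substring match in the previous section, it drops 'month_applied' but keeps 'interview_month', since only the former begins with 'month'. If you need several prefixes, Python’s built-in str.startswith() also accepts a tuple, which the list comprehension version can use directly; the second prefix here is just an illustration.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's DataFrame
interviews_df = pd.DataFrame({
    "candidate": ["Anna", "Ben", "Carla"],
    "office": ["Boston", "Denver", "Austin"],
    "month_applied": [1, 2, 3],
    "interview_month": [2, 3, 4],
})

pattern = "month"
# Boolean mask: True only for column names that begin with the pattern
mask = interviews_df.columns.str.startswith(pattern)
prefix_dropped_df = interviews_df.drop(columns=interviews_df.columns[mask])
print(list(prefix_dropped_df.columns))

# Several prefixes at once: Python's str.startswith accepts a tuple
prefixes = ("month", "office")
drop_lst = [col for col in interviews_df.columns if col.startswith(prefixes)]
multi_prefix_df = interviews_df.drop(columns=drop_lst)
print(list(multi_prefix_df.columns))
```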