In today’s data wrangling tutorial, we’ll learn how to use Python to check whether one or more columns exist in a pandas DataFrame. This can be helpful after importing a very wide data set, or before joining two DataFrames.
Create the Python DataFrame
We’ll first create a very simple DataFrame. Feel free to use it to follow along with this example.
# import the Pandas library
import pandas as pd
# define your data
cols = ['emp_id', 'name', 'target', 'achievement']
data = [[1, 'John Sales', 450, 500], [2, 'Debbie Financials', pd.NA, pd.NA]]
sales = pd.DataFrame(data=data, columns=cols)
sales.head()
Let us now look into the DataFrame that we have just created:
| | emp_id | name | target | achievement |
|---|---|---|---|---|
| 0 | 1 | John Sales | 450 | 500 |
| 1 | 2 | Debbie Financials | <NA> | <NA> |
Check if a column exists in a DataFrame
To find out whether a specific column exists in a pandas DataFrame, you can use the following snippet:
# check whether the column label exists in the DataFrame
print('target' in sales)
This will print the boolean True, because the in operator on a DataFrame tests its column labels.
Alternatively, use the following code:
col = 'target'
pd.Series(col).isin(sales.columns)
This will return a boolean Series with one entry per label checked, here a single True.
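The membership test can also be wrapped in a small helper so that later code only touches a column once it is known to exist. The following is a minimal sketch; the helper name `column_exists` and the `bonus` column are hypothetical and used only for illustration.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "emp_id": [1, 2],
        "name": ["John Sales", "Debbie Financials"],
        "target": [450, pd.NA],
        "achievement": [500, pd.NA],
    }
)


def column_exists(df, col):
    """Return True if `col` is one of the column labels of `df` (hypothetical helper)."""
    return col in df.columns


has_target = column_exists(sales, "target")  # label present in the DataFrame
has_bonus = column_exists(sales, "bonus")    # label absent from the DataFrame
print(has_target, has_bonus)
```

Testing against `df.columns` makes the intent explicit, although `col in df` performs the same label lookup.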
Find if multiple columns are contained in your DataFrame
To check whether your DataFrame contains several specific columns, pass a list of labels to the pd.Series constructor and check it against your DataFrame's columns index, as shown below:
subset = ['target', 'achievement']
pd.Series(subset).isin(sales.columns)
The result will be a Series of boolean values, one per label in the list:
0    True
1    True
dtype: bool
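When a single yes/no answer is needed, or a list of the labels that are missing, plain Python set and list operations are an alternative to the Series approach. This is a sketch under the assumption that column labels are unique strings; the `bonus` label is hypothetical and added only to show a failing case.

```python
import pandas as pd

# An empty DataFrame is enough here, since only the column labels matter
sales = pd.DataFrame(columns=["emp_id", "name", "target", "achievement"])

required = ["target", "achievement", "bonus"]

# True only if every required label is a column of the DataFrame
all_present = set(required).issubset(sales.columns)

# Required labels that are absent from the DataFrame, in their original order
missing = [col for col in required if col not in sales.columns]

print(all_present)
print(missing)
```

A check like this can be run on the key columns of both DataFrames before a join, so that a missing key is reported clearly instead of surfacing as a KeyError.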
Find if a column contains a specific string
To find the rows in which a DataFrame column matches a specific string pattern:
mask = sales['name'].str.contains('Sales')
sales[mask]
Result will be:
| | emp_id | name | target | achievement |
|---|---|---|---|---|
| 0 | 1 | John Sales | 450 | 500 |
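If the column being searched can itself contain missing values, the mask returned by `str.contains` may include NA entries and cannot be used directly for filtering. The sketch below uses a hypothetical `names` Series with one missing entry to show the `na`, `case` and `regex` parameters of `Series.str.contains`.

```python
import pandas as pd

# Hypothetical data: the third name is missing
names = pd.Series(["John Sales", "Debbie Financials", None])

# na=False counts missing names as non-matches, so the mask is purely boolean
# case=False makes the match case-insensitive
# regex=False treats the pattern as a literal string, not a regular expression
mask = names.str.contains("sales", case=False, na=False, regex=False)

matches = names[mask].tolist()
print(matches)
```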
Pandas: check if a column has empty values
The last case for today is finding out whether a specific column has empty (missing) values:
# column has empty values
sales['target'].isna().sum()
This will return the number of empty values in the column – in this case 1.
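The same idea extends to a plain yes/no answer for one column and to a count for every column at once. This is a minimal sketch that rebuilds the example DataFrame so it can run on its own; the variable names `target_has_na` and `na_counts` are illustrative.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "emp_id": [1, 2],
        "name": ["John Sales", "Debbie Financials"],
        "target": [450, pd.NA],
        "achievement": [500, pd.NA],
    }
)

# True if the column holds at least one missing value
target_has_na = bool(sales["target"].isna().any())

# Number of missing values in every column, as a plain dictionary
na_counts = sales.isna().sum().to_dict()

print(target_has_na)
print(na_counts)
```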