Today we will learn how to quickly find whether one or multiple cells in pandas DataFrame rows or columns contains empty values (Null, N/A, nan, None).
Create a DataFrame
First off, we’ll create a very simple Data set that you can use in order to follow along with this example.
import pandas as pd
import numpy as np
prog_language = [np.nan, 'Javascript', 'Python', 'Java', 'Python', 'Java']
office = ['Bangkok', 'Osaka', 'Paris', None, pd.NA, 'Paris']
salary = [133.0, 136.0, 127.0, 185.0, 164.0, pd.NA]
data = dict(office=office, prog_language = prog_language, salary = salary)
test_df = pd.DataFrame(data=data)
test_df.head()
Here’s our test data. Notice that some of the cells in the DataFrame contain null / NAN / none empty values:
office | prog_language | salary | |
---|---|---|---|
0 | Bangkok | NaN | 133.0 |
1 | Osaka | Javascript | 136.0 |
2 | Paris | Python | 127.0 |
3 | None | Java | 185.0 |
4 | <NA> | Python | 164.0 |
Check if a specific DataFrame cell is empty
We can use the pandas functions isna or isnull and the loc or iloc accessors to determine whether a specific cell is empty:
if pd.isna(test_df.loc[2,'office']) is False:
print("Your cell is empty.")
else:
print("Your cell is not empty.")
This will return the following:
Your cell is empty.
In a similar manner, using isnull and the indexer loc (iloc) accesor to test if our cell contains null values.
if pd.isnull(test_df.iloc[2,1]) is False:
print("Your cell is empty.")
else:
print("Your cell is not empty.")
Check if cells are empty in a column
Let’s assume we would like to check if a column contains empty cells:
# define a column to search
my_col = test_df['prog_language']
my_col.isna().any()
This returns True.
Find if nan values exists in DataFrame
To check the entire DataFrame, we use a similar technique:
test_df.isna().any()
This returns a boolean Series, each value pertaining to each of our DataFrame columns.
office True prog_language True salary True dtype: bool
Note: this technique apply to whether we would like to check for NAN values in multiple columns or rows:
subset = test_df[['office', 'salary']]
# find rows with nulls
subset.isna().any(axis=1)
# find columns with nulls
subset.isna().any()
Check if there are empty columns in your DataFrame
Assuming that you add the following column to the DataFrame:
test_df = test_df.assign(null_col = [np.nan , np.nan, np.nan,np.nan,np.nan,np.nan])
We can check whether your specific column containts null values only.
test_df['prog_language'].isna().all()
This returns True.
We can also find empty columns in our entire DataFrame:
test_df.isna().all()
This returns the following boolean series:
office False prog_language False salary False null_col True dtype: bool
Search for DataFrame rows containing nan values
If we want to find the rows in our DataFrame which contains any empty values, we can use the following snippet:
test_df[test_df.isna().any(axis=1)]
#alternatively
test_df[test_df.isnull().any(axis=1)]
FAQ
How to replace empty columns cells with a default value?
You can replace nan column values to the value ‘N/A’:
test_df.fillna('N/A', inplace=True)
How to determine that a DataFrame column is empty?
You can check that a DataFrame column is empty using the isnull() function:
is_empty = test_df['your_column_name'].isnull().all()
This will return a True value if the column is completely empty.