How to select rows with NaN (missing) values in Pandas?

As part of our data wrangling process, we might need to find, replace or even drop empty values in our dataset so that they don't skew our analysis.

In this short tutorial, we’ll learn a few ways to use Python and the Pandas library to query our data and find data rows containing empty values.

Creating our example dataset

We’ll start by importing the Pandas library and creating a very simple dataset that you can use to follow along:

import pandas as pd
import numpy as np

city = ['DTW', 'NYC', 'SFO', 'ORD']
office = ['West', 'North', np.nan, 'East']
interviews = [10, 104, 210, np.nan]

# Create the pandas DataFrame
data_dict = dict(city=city, office=office, interviews=interviews)
hr = pd.DataFrame(data_dict)

print(hr.head())

Here’s our DataFrame:

  city office  interviews
0  DTW   West        10.0
1  NYC  North       104.0
2  SFO    NaN       210.0
3  ORD   East         NaN

Find rows with null values in a Pandas Series (column)

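To quickly find cells containing NaN values in a specific DataFrame column, we can use the isna() or isnull() Series methods. Both return a boolean mask that is True wherever the value is missing, and we pass that mask to the DataFrame to keep only the matching rows.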

Using isna()

nan_in_col = hr[hr['interviews'].isna()]

Using isnull()

nan_in_col = hr[hr['interviews'].isnull()]

Both methods return the same result:

print(nan_in_col)
  city office  interviews
3  ORD   East         NaN
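The filter works because isna() returns a boolean Series, one entry per row, which pandas then uses as a row mask. The sketch below rebuilds the same hr DataFrame and prints that mask; the names mask and has_value are just illustrative. It also shows that isnull() is an alias of isna(), and that notna() gives the inverse selection.

```python
import numpy as np
import pandas as pd

# Recreate the example DataFrame from above
hr = pd.DataFrame({
    "city": ["DTW", "NYC", "SFO", "ORD"],
    "office": ["West", "North", np.nan, "East"],
    "interviews": [10, 104, 210, np.nan],
})

# isna() returns a boolean Series with one entry per row
mask = hr["interviews"].isna()
print(mask.tolist())  # [False, False, False, True]

# isnull() is an alias of isna(), so both masks are identical
print(mask.equals(hr["interviews"].isnull()))  # True

# notna() is the inverse: rows where 'interviews' does have a value
has_value = hr[hr["interviews"].notna()]
print(has_value["city"].tolist())  # ['DTW', 'NYC', 'SFO']
```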

Select rows with missing values in a Pandas DataFrame

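If we want to quickly find rows containing empty values anywhere in the DataFrame, we can use the DataFrame isna() or isnull() method, chained with the any() method. Passing axis=1 to any() collapses each row to a single True/False value: True if at least one cell in that row is missing.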

nan_rows = hr[hr.isna().any(axis=1)]

or

nan_rows = hr[hr.isnull().any(axis=1)]

Alternatively, we can use the loc indexer to select the rows containing empty cells:

nan_rows = hr.loc[hr.isna().any(axis=1)]

All of the above return the following result:

print(nan_rows)
  city office  interviews
2  SFO    NaN       210.0
3  ORD   East         NaN
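To see what any(axis=1) is doing, the following sketch rebuilds hr, prints the per-row mask, counts the missing cells in each row with sum(axis=1), and inverts the mask with ~ to get the complete rows instead. The names missing, row_mask and complete_rows are illustrative.

```python
import numpy as np
import pandas as pd

hr = pd.DataFrame({
    "city": ["DTW", "NYC", "SFO", "ORD"],
    "office": ["West", "North", np.nan, "East"],
    "interviews": [10, 104, 210, np.nan],
})

# isna() on the whole DataFrame gives a boolean DataFrame of the same shape
missing = hr.isna()

# any(axis=1) collapses each row to True if at least one cell is missing
row_mask = missing.any(axis=1)
print(row_mask.tolist())  # [False, False, True, True]

# sum(axis=1) counts the missing cells per row (True counts as 1)
print(missing.sum(axis=1).tolist())  # [0, 0, 1, 1]

# Inverting the mask with ~ selects the rows with no missing values
complete_rows = hr[~row_mask]
print(complete_rows["city"].tolist())  # ['DTW', 'NYC']
```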

Select DataFrame columns with NaN values

You can use the following snippet to find all columns containing empty values in your DataFrame.

nan_cols = hr.loc[:, hr.isna().any(axis=0)]
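Here axis=0 (the default for any()) produces one True/False value per column rather than per row. A minimal sketch, assuming the same hr DataFrame; col_mask is an illustrative name. If you only need the column names rather than the data, indexing hr.columns with the mask is a common shortcut.

```python
import numpy as np
import pandas as pd

hr = pd.DataFrame({
    "city": ["DTW", "NYC", "SFO", "ORD"],
    "office": ["West", "North", np.nan, "East"],
    "interviews": [10, 104, 210, np.nan],
})

# any() defaults to axis=0, i.e. one result per column
col_mask = hr.isna().any()
print(col_mask.tolist())  # [False, True, True]

# Select the columns that contain at least one NaN (all rows kept)
nan_cols = hr.loc[:, col_mask]
print(list(nan_cols.columns))  # ['office', 'interviews']

# Only the names, without the data
print(hr.columns[col_mask].tolist())  # ['office', 'interviews']
```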

Find the first row containing NaN values

If we want to find the first row that contains a missing value in our DataFrame, we can use the following snippet:

hr.loc[hr.isna().any(axis=1)].head(1)
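head(1) returns a one-row DataFrame. If you only need the index label of that row, one option (a sketch, not the only way) is to keep the True entries of the row mask and take the first label, guarding against the case where no row has missing values. The names row_mask, first_row and first_label are illustrative.

```python
import numpy as np
import pandas as pd

hr = pd.DataFrame({
    "city": ["DTW", "NYC", "SFO", "ORD"],
    "office": ["West", "North", np.nan, "East"],
    "interviews": [10, 104, 210, np.nan],
})

row_mask = hr.isna().any(axis=1)

# Same as the snippet above: a one-row DataFrame
first_row = hr.loc[row_mask].head(1)

# Labels of the rows where the mask is True, in order
labels = row_mask[row_mask].index

# Take the first label, or None if the DataFrame has no missing values
first_label = labels[0] if len(labels) > 0 else None
print(first_label)  # 2
```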

Replace missing NaN values with zero

Once found, we might decide to fill or replace the missing values according to some specific logic. We typically use the fillna() DataFrame or Series method for that. In this example, we'll replace the missing values in the interviews column with 0.

hr.fillna({'interviews': 0}, inplace=True)
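With inplace=True, fillna() modifies hr directly and returns None. The sketch below shows the equivalent assignment style, which leaves the original DataFrame untouched, and confirms that only the column named in the dictionary is filled. The name filled is illustrative.

```python
import numpy as np
import pandas as pd

hr = pd.DataFrame({
    "city": ["DTW", "NYC", "SFO", "ORD"],
    "office": ["West", "North", np.nan, "East"],
    "interviews": [10, 104, 210, np.nan],
})

# Without inplace=True, fillna() returns a new DataFrame
filled = hr.fillna({"interviews": 0})
print(filled["interviews"].tolist())  # [10.0, 104.0, 210.0, 0.0]

# Only the columns named in the dict are filled; 'office' still has its NaN
print(filled["office"].isna().sum())  # 1

# The original DataFrame is unchanged
print(hr["interviews"].isna().sum())  # 1
```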

Additional learning