How to delete rows with null / missing (NaN) values from your Python DataFrame?

As part of our data wrangling process, we often clean our dataset by removing incomplete or outlier observations before proceeding with further analysis and visualization.

In today’s tutorial we’ll learn how to use the Pandas library’s DataFrame.dropna() method to get rid of rows containing missing values.

Example Dataset

We’ll get started by importing the Pandas and NumPy libraries and creating a very simple DataFrame from a dictionary.

import pandas as pd
import numpy as np

employees = {'employee': [ 'John', 'Don', 'Joe'],
             'salary':[110, 120, 190],
             'employer' : [np.nan,'ABC Corp',np.nan]}

my_data = pd.DataFrame(data=employees)
my_data.head()

Here’s our data:

  employee  salary  employer
0     John     110       NaN
1      Don     120  ABC Corp
2      Joe     190       NaN

Counting the number of missing values

Before dropping anything, you might want to identify and count the missing values in your DataFrame. Here’s the code:

my_data.isna().sum()

The result is a Pandas Series containing the number of missing values in each column.

employee    0
salary      0
employer    2
dtype: int64
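Beyond the per-column count, it can also help to know the overall number of missing cells and which rows are affected. The following is a minimal sketch that rebuilds the example data so it runs on its own; the variable names total_missing and rows_with_missing are ours, chosen for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example data so the snippet is self-contained
my_data = pd.DataFrame({
    'employee': ['John', 'Don', 'Joe'],
    'salary': [110, 120, 190],
    'employer': [np.nan, 'ABC Corp', np.nan],
})

# Total number of missing cells across the whole DataFrame
total_missing = int(my_data.isna().sum().sum())

# Boolean Series: True for every row holding at least one missing value
rows_with_missing = my_data.isna().any(axis=1)

print(total_missing)
print(my_data[rows_with_missing])
```

Here the mask flags the rows for John and Joe, which are the ones dropna() would remove by default.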

Drop rows with missing values from our Python DataFrame

As mentioned before, we’ll use the DataFrame.dropna() method.

We can create a new DataFrame containing only the rows that have no missing values:

my_data1 = my_data.dropna(axis=0)
my_data1.head()

Here’s the result:

  employee  salary  employer
1      Don     120  ABC Corp
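One detail worth noticing is that the surviving row keeps its original index label. The sketch below rebuilds the example data and shows one way to renumber the rows afterwards with reset_index(); the name renumbered is ours, and whether you need this step depends on your own workflow.

```python
import numpy as np
import pandas as pd

# Rebuild the example data so the snippet is self-contained
my_data = pd.DataFrame({
    'employee': ['John', 'Don', 'Joe'],
    'salary': [110, 120, 190],
    'employer': [np.nan, 'ABC Corp', np.nan],
})

# Keep only the rows without any missing value
my_data1 = my_data.dropna(axis=0)

# The remaining row still carries its original index label (1)
print(my_data1.index.tolist())

# Optionally renumber the rows from 0, discarding the old labels
renumbered = my_data1.reset_index(drop=True)
print(renumbered.index.tolist())
```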

We can also pass the inplace=True parameter to apply the changes directly to our original DataFrame:

my_data.dropna(axis=0, inplace=True)
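Keep in mind that an in-place drop changes my_data itself, so any later example that expects the full data needs the DataFrame to be re-created. As a rough comparison, the sketch below works on copies and shows that reassigning the result of dropna() gives the same outcome as inplace=True; the names in_place and reassigned are ours.

```python
import numpy as np
import pandas as pd

# Rebuild the example data so the snippet is self-contained
my_data = pd.DataFrame({
    'employee': ['John', 'Don', 'Joe'],
    'salary': [110, 120, 190],
    'employer': [np.nan, 'ABC Corp', np.nan],
})

# Option 1: modify a copy in place
in_place = my_data.copy()
in_place.dropna(axis=0, inplace=True)

# Option 2: reassign the returned DataFrame to the same name
reassigned = my_data.copy()
reassigned = reassigned.dropna(axis=0)

# Both approaches leave the same single row
print(in_place.equals(reassigned))

# The original is untouched because we only modified copies
print(len(my_data))
```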

Delete rows with NaN in specific columns

What if we would like to drop rows with NaN, but only if the missing values are located in specific columns?

Luckily, we can use the subset parameter and pass the relevant columns to the dropna() method. The following code will look for missing values in two specific columns only. Note that in our sample data neither of these columns contains NaN, so no rows are removed.

my_data.dropna(axis=0, subset=['employee', 'salary'])
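To see the subset parameter actually remove something, we can point it at the employer column, which is the only one holding missing values in our sample. This is a small sketch on freshly rebuilt data; the names unchanged and cleaned are ours.

```python
import numpy as np
import pandas as pd

# Rebuild the example data so the snippet is self-contained
my_data = pd.DataFrame({
    'employee': ['John', 'Don', 'Joe'],
    'salary': [110, 120, 190],
    'employer': [np.nan, 'ABC Corp', np.nan],
})

# No NaN in these two columns, so every row is kept
unchanged = my_data.dropna(axis=0, subset=['employee', 'salary'])

# NaN in 'employer' for John and Joe, so only Don's row is kept
cleaned = my_data.dropna(axis=0, subset=['employer'])

print(len(unchanged))
print(cleaned['employee'].tolist())
```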

Remove columns with NaN

If we would like to delete the columns that contain NaN values instead of the rows, we’ll pass the axis=1 parameter to dropna():

my_data2 = my_data.dropna(axis=1)
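As a quick check of what axis=1 does with our sample, the sketch below rebuilds the original data (before any in-place drop) and lists the columns that survive.

```python
import numpy as np
import pandas as pd

# Rebuild the example data so the snippet is self-contained
my_data = pd.DataFrame({
    'employee': ['John', 'Don', 'Joe'],
    'salary': [110, 120, 190],
    'employer': [np.nan, 'ABC Corp', np.nan],
})

# Drop every column that contains at least one missing value
my_data2 = my_data.dropna(axis=1)

# 'employer' is removed; all three rows are kept
print(my_data2.columns.tolist())
print(len(my_data2))
```

Dropping a column discards its valid values too (here, Don's employer), so this option fits best when a column is mostly empty.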

Further Learning

You might also want to look into our tutorials on subsetting and slicing DataFrames according to certain conditions.