How to select rows containing NA missing values in R?

In this Data Analysis tutorial we will learn how to find and subset R DataFrame rows containing empty values.

We will first define a sample R DataFrame. Note that the vectors include empty / missing values (NA).

var1 <- c(NA, 165.0, 165.0, 140.0, 216.0, 134.0)
var2 <- c(71.0, NA, 54.0, NA, 86.0, NA)

var_df <- data.frame(var1 = var1, var2 = var2)

print(var_df)

Here’s our data:

  var1 var2
1   NA   71
2  165   NA
3  165   54
4  140   NA
5  216   86
6  134   NA

Find rows with missing values in R DataFrame

complete.cases is a function that, when run on a DataFrame, returns a logical vector flagging the rows which are complete, meaning they don’t have empty / missing values.

In order to find rows (observations) that do contain NA values, we need to negate (using the ! sign) the output of complete.cases, and then use the result to subset the DataFrame rows.

rows_with_na <- var_df[!complete.cases(var_df), ]
print(rows_with_na)

This will return as expected the following four rows:

  var1 var2
1   NA   71
2  165   NA
4  140   NA
6  134   NA
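
To see why the negation works, it can help to inspect the logical vector before using it as a row index. The following is a minimal sketch that rebuilds the same var_df; the names complete_mask, na_mask and na_row_ids are illustrative and not part of the original example.

```r
var1 <- c(NA, 165.0, 165.0, 140.0, 216.0, 134.0)
var2 <- c(71.0, NA, 54.0, NA, 86.0, NA)
var_df <- data.frame(var1 = var1, var2 = var2)

# TRUE for rows that have no missing values in any column
complete_mask <- complete.cases(var_df)

# Negate the mask: TRUE for rows with at least one NA
na_mask <- !complete_mask

# Positions of the incomplete rows (illustrative helper variable)
na_row_ids <- which(na_mask)

# Use the mask as the row index; the trailing comma keeps all columns
rows_with_na <- var_df[na_mask, ]

print(na_mask)
print(na_row_ids)
print(rows_with_na)
```

Keeping the mask in its own variable makes it easy to reuse, for example to count incomplete rows with sum(na_mask).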

Extract rows with NA in a column

What if we would like to subset only rows with NAs in one specific column? The following snippet uses is.na to find observations with missing values in column var2.

rows_with_na_var2 <- var_df[is.na(var_df$var2), ]
print(rows_with_na_var2)

This will return the following 3 observations:

  var1 var2
2  165   NA
4  140   NA
6  134   NA
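
The same column-level check can be broken into steps, which also makes it easy to count the missing values or to repeat the filter for another column. This is a sketch under the assumption that var_df is defined as at the top of this tutorial; the names var2_is_na, n_missing_var2 and rows_with_na_var1 are illustrative.

```r
var_df <- data.frame(
  var1 = c(NA, 165.0, 165.0, 140.0, 216.0, 134.0),
  var2 = c(71.0, NA, 54.0, NA, 86.0, NA)
)

# is.na() returns TRUE wherever var2 is missing
var2_is_na <- is.na(var_df$var2)

# Number of missing values in var2 (TRUE counts as 1)
n_missing_var2 <- sum(var2_is_na)

# Keep only the rows where var2 is missing
rows_with_na_var2 <- var_df[var2_is_na, ]

# The same pattern applied to var1
rows_with_na_var1 <- var_df[is.na(var_df$var1), ]

print(n_missing_var2)
print(rows_with_na_var2)
print(rows_with_na_var1)
```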

Fix the undefined columns selected error

Note that when subsetting the DataFrame we included a comma after the row selection.

rows_with_na_var2 <- var_df[is.na(var_df$var2), ]

Failing to include a comma will trigger an error in RStudio (or other R environments):

#Error in `[.data.frame`(var_df, is.na(var_df$var2)) : 
#  undefined columns selected

R expects the following syntax when subsetting a DataFrame: df[row, col]. Omitting the comma tells R that we are trying to subset columns rather than rows, which causes the error.
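
The difference between the two forms can be checked safely by wrapping the faulty call in tryCatch, so the script keeps running. This is only a sketch; the names result and var2_only are illustrative, and the exact error text may vary with the R locale.

```r
var_df <- data.frame(
  var1 = c(NA, 165.0, 165.0, 140.0, 216.0, 134.0),
  var2 = c(71.0, NA, 54.0, NA, 86.0, NA)
)

# Without the comma, the logical vector is treated as a column selector.
# It has 6 elements but the DataFrame has only 2 columns, so R raises an error.
result <- tryCatch(
  var_df[is.na(var_df$var2)],
  error = function(e) conditionMessage(e)
)
print(result)

# With the comma, the logical vector selects rows and all columns are kept
rows_with_na_var2 <- var_df[is.na(var_df$var2), ]
print(rows_with_na_var2)

# Both positions filled in: rows where var2 is NA, column var2 only
# drop = FALSE keeps the result as a DataFrame instead of a plain vector
var2_only <- var_df[is.na(var_df$var2), "var2", drop = FALSE]
print(var2_only)
```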