Filter R DataFrame rows by multiple conditions with tidyverse and dplyr?

In this tutorial we will learn how to subset an R DataFrame (or tibble) by multiple conditions. The simplest way to do this is to use the tidyverse package and its core component dplyr.

Create an R Example DataFrame

Let’s first import the tidyverse package into RStudio and create a simple R DataFrame that we will use for our example:

library (tidyverse)
col01 <- c('West', 'South', 'East', 'West', 'South', 'East')
col11 <- c (126.0, 143.0, 185.0, 159.0, 138.0, 125.0)
col12 <- c (65.0, 47.0, 58.0, 68.0, 79.0, 83.0)
col13 <- c (62.0, 63.0, 79.0, NA, NA, 60.0)

var_df <- data.frame (team = col01, var1 = col11, var2 = col12, var3 = col13)

Note: Here’s how to troubleshoot tidyverse no package found issues.

Let’s look into our dataset (named var_df)

print(var_df)

Here’s are our DataFrame rows, note that one column contains NA missing values.


team
var1var2var3
1West1266562
2South1434763
3East1855879
4West15968NA
5South13879NA
6East1258360

Subset R DataFrame rows by conditions (AND)

We can easily define a complex set of conditions and concatenate those using a boolean AND (&)

# 
subset <- filter (var_df, var1 < 138 & var2 >=75)
print (subset)

This will render the following:

  team var1 var2 var3
1 East  125   83   60

Select R rows by conditions (with OR)

In the same fashion we can construct our condition using a boolean OR (|)

var_df %>% filter (var2 >=75 | is.na(var3) )

Filter rows with NA values and other condition

Note the usan of the is.na() function(from R Base):

> var_df %>% filter (var2 >=75 | is.na(var3) )

This will return the following result:

   team var1 var2 var3
1  West  159   68   NA
2 South  138   79   NA
3  East  125   83   60

Query Rows containing a string or empty values

In the following example we query for rows with cells matching a string pattern (in our case ‘W’), or containing NA empty values in the var3 column:

 var_df %>% filter (str_detect(team, 'W') | is.na(var3) )