In this tutorial we will learn how to subset an R DataFrame (or tibble) by multiple conditions. The simplest way to do this is to use the tidyverse package and its core component dplyr.
Create an R Example DataFrame
Let’s first import the tidyverse package into RStudio and create a simple R DataFrame that we will use for our example:
library (tidyverse)
col01 <- c('West', 'South', 'East', 'West', 'South', 'East')
col11 <- c (126.0, 143.0, 185.0, 159.0, 138.0, 125.0)
col12 <- c (65.0, 47.0, 58.0, 68.0, 79.0, 83.0)
col13 <- c (62.0, 63.0, 79.0, NA, NA, 60.0)
var_df <- data.frame (team = col01, var1 = col11, var2 = col12, var3 = col13)
Note: Here’s how to troubleshoot tidyverse no package found issues.
Let’s look into our dataset (named var_df)
print(var_df)
Here’s are our DataFrame rows, note that one column contains NA missing values.
team | var1 | var2 | var3 | |
---|---|---|---|---|
1 | West | 126 | 65 | 62 |
2 | South | 143 | 47 | 63 |
3 | East | 185 | 58 | 79 |
4 | West | 159 | 68 | NA |
5 | South | 138 | 79 | NA |
6 | East | 125 | 83 | 60 |
Subset R DataFrame rows by conditions (AND)
We can easily define a complex set of conditions and concatenate those using a boolean AND (&)
#
subset <- filter (var_df, var1 < 138 & var2 >=75)
print (subset)
This will render the following:
team var1 var2 var3 1 East 125 83 60
Select R rows by conditions (with OR)
In the same fashion we can construct our condition using a boolean OR (|)
var_df %>% filter (var2 >=75 | is.na(var3) )
Filter rows with NA values and other condition
Note the usan of the is.na() function(from R Base):
> var_df %>% filter (var2 >=75 | is.na(var3) )
This will return the following result:
team var1 var2 var3 1 West 159 68 NA 2 South 138 79 NA 3 East 125 83 60
Query Rows containing a string or empty values
In the following example we query for rows with cells matching a string pattern (in our case ‘W’), or containing NA empty values in the var3 column:
var_df %>% filter (str_detect(team, 'W') | is.na(var3) )