Use the str_detect function from the stringr library to check whether an R DataFrame column contains a specific value:
library(stringr)
if (sum(str_detect(my_df$my_column, 'string_value')) > 0) {
  # Your code here
}
Note: Make sure to install the stringr library (also included in tidyverse) before utilizing the str_detect function in your RStudio script or Jupyter notebook.
Check if column contains value in R – Practical example
Create a sample DataFrame
Use the following code to create an R DataFrame that we’ll use in this tutorial:
language <- c('JavaScript', 'Python', 'Java', 'R')
interviews <- c(12, 15, 23, 25)
hiring <- data.frame(language = language, interviews = interviews)
print(hiring)
Here’s our DataFrame:
| | language | interviews |
|---|---|---|
| 1 | JavaScript | 12 |
| 2 | Python | 15 |
| 3 | Java | 23 |
| 4 | R | 25 |
Check if a specific string exists in a column
To check whether a specific value exists in a column, we can count the cells that match it. The following snippet counts the cells that start with the string Java:
sum(str_detect(hiring$language, '^Java'))
This will return the value 2, as one row contains the word Java and another contains the word JavaScript.
Note the use of the regular expression anchor ^, which denotes the start of the string we are searching.
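To see why the count is 2, it can help to look at what str_detect returns before it is summed: one TRUE or FALSE per cell. The sketch below uses a standalone vector named languages (an assumption made for illustration; it holds the same values as hiring$language) and also shows any(), which answers the yes/no question directly.

```r
library(stringr)

# Same values as hiring$language, repeated here so the example runs on its own.
languages <- c('JavaScript', 'Python', 'Java', 'R')

# str_detect returns one logical value per element.
matches <- str_detect(languages, '^Java')
print(matches)       # TRUE FALSE TRUE FALSE

# Summing the logical vector counts the TRUE values.
print(sum(matches))  # 2

# any() is TRUE as soon as at least one element matches.
print(any(matches))  # TRUE
```

Using any() instead of sum(...) > 0 is a matter of taste; both express the same check.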
Check if column contains one or more values
The following snippet checks whether a column contains either the string Java or the string Python:
sum(str_detect(hiring$language, 'Java|Python'))
This will return the value 3, as we have one row containing the string Java, one containing the string JavaScript, and one containing the string Python.
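When the values to look for are stored in a vector, the pattern can be built rather than typed by hand. The sketch below is one possible approach, not part of the original snippet: the names languages and wanted are assumptions made for illustration, and it assumes the values contain no regex metacharacters (a value such as C++ would need escaping first).

```r
library(stringr)

# Same values as hiring$language, repeated here so the example runs on its own.
languages <- c('JavaScript', 'Python', 'Java', 'R')

# Hypothetical list of values to look for.
wanted <- c('Java', 'Python')

# Join the values with "|" to get the pattern 'Java|Python'.
pattern <- paste(wanted, collapse = '|')

# Count the elements that contain at least one of the values.
n_matches <- sum(str_detect(languages, pattern))
print(n_matches)  # 3: JavaScript, Python and Java
```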
Verify that a column contains an exact value
The following snippet counts the cells in our language column that contain exactly the word Java and nothing else.
sum(str_detect(hiring$language, '^Java$'))
The regex anchor $ marks the end of the string. Hence, the snippet above will return the value 1, as only cells matching the exact word Java are counted.
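For exact matches, a regular expression is not strictly required. As an alternative sketch in base R (not the stringr approach used in this tutorial), the == and %in% operators compare whole values; the vector name languages is an assumption made for illustration.

```r
# Same values as hiring$language, repeated here so the example runs on its own.
languages <- c('JavaScript', 'Python', 'Java', 'R')

# Count the elements that are exactly equal to 'Java'.
exact_count <- sum(languages == 'Java')
print(exact_count)  # 1

# Check membership directly, without counting.
has_java <- 'Java' %in% languages
print(has_java)     # TRUE
```

This avoids having to anchor the pattern with ^ and $, at the cost of losing regex features such as alternation.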
Adding to a conditional statement in R
You can now include the snippets above in a conditional if/else statement to verify that a specific value appears in one of your column cells.
if (sum(str_detect(hiring$language, '^Java')) > 0) {
  print('The string you specified appears in the column.')
} else {
  print('The string does not exist in the column.')
}
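If the column may contain missing values, note that str_detect returns NA for an NA cell, so sum(...) > 0 can itself become NA and make the if statement fail. The sketch below wraps the check in a small helper that ignores missing cells. The function name column_contains and the data frame hiring_na are hypothetical, introduced only for this example.

```r
library(stringr)

# Hypothetical helper (not from the original tutorial): returns TRUE when at
# least one non-missing cell of `column` in `df` matches `pattern`.
column_contains <- function(df, column, pattern) {
  any(str_detect(df[[column]], pattern), na.rm = TRUE)
}

# Illustrative data: the tutorial's DataFrame with one language set to NA.
hiring_na <- data.frame(
  language = c('JavaScript', 'Python', NA, 'R'),
  interviews = c(12, 15, 23, 25)
)

if (column_contains(hiring_na, 'language', '^Java')) {
  print('The string you specified appears in the column.')
} else {
  print('The string does not exist in the column.')
}
```

Here the first branch runs, because the JavaScript row still matches even though another cell is missing.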