How to check if a Pandas DataFrame row or cell contains a string?

In this tutorial we’ll learn how to quickly validate whether a specific pandas DataFrame row or specific one or multiple column cells contain a specific text pattern.

Example DataFrame

We will get started with importing the pandas library and defining some simple data that you are welcome to use in order to follow along this example.

import pandas as pd

month = ['April', 'March', 'January', 'March', 'May']
language = ['Python', 'R', 'Java', 'Javascript', 'Python']
office = ['LA', 'Rome', 'Madrid', 'New York', 'Toronto']
salary = [177.0, 177.0, 155.0, 139.0, 155.0]
salaries = dict(month=month, language=language, salary = salary)
salary_df = pd.DataFrame(data=salaries)
salary_df.head()

Let’s take a quick look at our data:

monthlanguagesalary
0AprilPython177.0
1MarchR177.0
2JanuaryJava155.0
3MarchJavascript139.0
4MayPython155.0

Check if a DataFrame row contains specific text

We’ll start by defining a string pattern and then search our DataFrame rows:

# define value to search for
str_pattern = 'Python'

# row by row check for the string pattern in the row values
filt = (salary_df
.apply(lambda r: r.astype('string').str.contains(str_pattern)
.any(), axis=1))

print(filt)

Note tat in order to search row by row, we first need to convert each DataFrame row to the string type. Also note that we have used axis=1 to search across the columns of each row. We’ll get a Series of boolean values highlighting the DataFrame rows contain the string pattern:

0     True
1    False
2    False
3    False
4     True
dtype: bool

We can now filter our DataFrame to highlight the relevant rows:

salary_df[filt]

The result will be:

monthlanguagesalary
0AprilPython177.0
4MayPython155.0

Verify if a column cell contains a value in list

Let’s start by defining a list of strings we will be searching for:

lst_pattern = ['Python', 'R']

Now filter the our column (Series) accordingly:

filt = salary_df['language'].isin(lst_pattern)
salary_df[filt]

This will be out output:

monthlanguagesalary
0AprilPython177.0
1MarchR177.0
4MayPython155.0

Search for text string in multiple columns

In a similar fashion we can earch for strings only in a subset of our DataFrame.

# define list and DataFrame subset
lst_pattern = ['Python', 'R']
subset = salary_df[['month','language']]

# Search for value in list in DataFrame
filt = subset.apply (lambda r: r.isin(lst_pattern).any(), axis=1)

Show the relevant rows:

subset[filt]

And the output will be:

monthlanguage
0AprilPython
1MarchR
4MayPython

Additional Learning