How to check if a Pandas column contains a string?

In this Data Analysis tutorial we’ll learn how to use Python to search for one or more strings in a Pandas DataFrame column.

In a nutshell, we can check whether a string is contained in a column using the Series .str.contains() method.
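As a minimal sketch, here is the idiom applied to a small, hypothetical Series. The values are illustrative only and are not the tutorial's data:

```python
import pandas as pd

# A small illustrative Series of strings (hypothetical values)
s = pd.Series(['Python', 'Go', 'Haskell'])

# .str.contains() returns one boolean per element
mask = s.str.contains('Py')
print(mask.tolist())  # [True, False, False]
```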


Read on for several examples of using this capability.

Creating test data

We’ll start by creating a simple DataFrame for this example:

import pandas as pd

# Create the test data
month = ['October', 'November', 'July', 'June', 'February']
language = ['Python', 'Python', 'Haskell', 'JavaScript', 'Go']
salary = [81, 77, 74, 80, 75]

# Build a DataFrame with one column per list
hr_dict = dict(month=month, language=language, salary=salary)
hr_df = pd.DataFrame(data=hr_dict)


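This will result in the following DataFrame:

      month    language  salary
0   October      Python      81
1  November      Python      77
2      July     Haskell      74
3      June  JavaScript      80
4  February          Go      75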

Check if Series contains a string

The simplest example is to check whether a DataFrame column contains a string literal.
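The following sketch runs that check against the hr_df DataFrame from above. The DataFrame is rebuilt inside the snippet so it runs on its own:

```python
import pandas as pd

# Rebuild the test DataFrame so the snippet is self-contained
hr_df = pd.DataFrame({
    'month': ['October', 'November', 'July', 'June', 'February'],
    'language': ['Python', 'Python', 'Haskell', 'JavaScript', 'Go'],
    'salary': [81, 77, 74, 80, 75],
})

# One boolean per row: True where 'language' contains 'Python'
is_python = hr_df['language'].str.contains('Python')
print(is_python.tolist())  # [True, True, False, False, False]
```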


This will return a Series of boolean values, one per row, that is True wherever the value contains the string.

More useful is to count the rows that contain the string ‘Python’. Note that this match is case-sensitive.
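Calling .sum() on the boolean Series counts the matching rows, because True counts as 1 and False counts as 0. The sketch below rebuilds hr_df so it runs standalone. The lowercase search is included only to demonstrate case sensitivity:

```python
import pandas as pd

hr_df = pd.DataFrame({
    'month': ['October', 'November', 'July', 'June', 'February'],
    'language': ['Python', 'Python', 'Haskell', 'JavaScript', 'Go'],
    'salary': [81, 77, 74, 80, 75],
})

# Sum of the boolean mask = number of rows containing 'Python'
python_count = hr_df['language'].str.contains('Python').sum()
# The default search is case-sensitive, so 'python' matches nothing
lower_count = hr_df['language'].str.contains('python').sum()

print(python_count, lower_count)  # 2 0
```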


In this case, the result is 2.

Check for case-insensitive strings

In a similar fashion, we’ll add the parameter case=False to search for occurrences of the string literal, this time ignoring case:

hr_df['language'].str.contains('python', case=False).sum()
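The standalone sketch below compares case=False with the flags parameter, which accepts flags from Python's re module. Both approaches should give the same count here:

```python
import re
import pandas as pd

hr_df = pd.DataFrame({
    'month': ['October', 'November', 'July', 'June', 'February'],
    'language': ['Python', 'Python', 'Haskell', 'JavaScript', 'Go'],
    'salary': [81, 77, 74, 80, 75],
})

# case=False ignores case when matching
count_case = hr_df['language'].str.contains('python', case=False).sum()
# Equivalent: pass re.IGNORECASE through the flags parameter
count_flags = hr_df['language'].str.contains('python', flags=re.IGNORECASE).sum()

print(count_case, count_flags)  # 2 2
```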

Check if a substring exists in a column with Regex

Another case is that you might want to search for substrings matching a regex pattern. Here the pattern \Dt matches any non-digit character followed by a lowercase ‘t’:

hr_df['language'].str.contains(r'\Dt', regex=True).sum()
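To see which rows match, it can help to inspect the boolean mask before summing. The snippet below is standalone. The second pattern, ^J (values starting with ‘J’), is an extra illustration and does not come from the original:

```python
import pandas as pd

hr_df = pd.DataFrame({
    'month': ['October', 'November', 'July', 'June', 'February'],
    'language': ['Python', 'Python', 'Haskell', 'JavaScript', 'Go'],
    'salary': [81, 77, 74, 80, 75],
})

# \D matches any non-digit character; the raw string keeps the backslash intact
mask = hr_df['language'].str.contains(r'\Dt', regex=True)
print(mask.tolist())  # [True, True, False, True, False]
print(mask.sum())  # 3

# ^ anchors the match at the start of the value (extra illustration)
starts_with_j = hr_df['language'].str.contains(r'^J').sum()
print(starts_with_j)  # 1
```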

Search for multiple strings in DataFrame column

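We might want to search for several strings at once. Because .str.contains() treats its pattern as a regular expression by default, we can use the regex alternation operator ‘|’ inside the pattern to match any one of several character sequences. To require that several strings all appear, we can combine separate boolean Series with the ‘&’ operator. In this case we search for occurrences of Python or Haskell in our language Series.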
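The standalone sketch below shows both approaches. The pattern is built from a list with '|'.join() and re.escape(). This is an addition for illustration that makes any regex metacharacters in the search terms match literally. The ‘&’ example uses the hypothetical search terms ‘a’ and ‘v’:

```python
import re
import pandas as pd

hr_df = pd.DataFrame({
    'month': ['October', 'November', 'July', 'June', 'February'],
    'language': ['Python', 'Python', 'Haskell', 'JavaScript', 'Go'],
    'salary': [81, 77, 74, 80, 75],
})

# OR: regex alternation matches rows containing Python or Haskell
terms = ['Python', 'Haskell']
pattern = '|'.join(map(re.escape, terms))  # 'Python|Haskell'
or_count = hr_df['language'].str.contains(pattern).sum()

# AND: combine two boolean Series element-wise with &
has_a = hr_df['language'].str.contains('a')
has_v = hr_df['language'].str.contains('v')
and_count = (has_a & has_v).sum()

print(or_count, and_count)  # 3 1
```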


Check for strings contained in a list

In this case we’ll use the Series method isin() to check whether each value in our column exactly matches one of the elements in a Python list:

# Define the list of values to look for
lang_lst = ['Python', 'Go']

# Count the rows whose value appears in the list
hr_df['language'].isin(lang_lst).sum()

The result will be 3.
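Note that isin() tests for exact equality, not substrings. The standalone sketch below contrasts the two approaches, using ‘Pyth’ as a hypothetical partial term:

```python
import pandas as pd

hr_df = pd.DataFrame({
    'month': ['October', 'November', 'July', 'June', 'February'],
    'language': ['Python', 'Python', 'Haskell', 'JavaScript', 'Go'],
    'salary': [81, 77, 74, 80, 75],
})
lang_lst = ['Python', 'Go']

# Exact matches against the list: Python, Python, Go
exact_count = hr_df['language'].isin(lang_lst).sum()

# A partial value never equals a full cell value, so isin() finds nothing
partial_isin = hr_df['language'].isin(['Pyth']).sum()
# .str.contains() does substring matching instead
partial_contains = hr_df['language'].str.contains('Pyth').sum()

print(exact_count, partial_isin, partial_contains)  # 3 0 2
```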

