How to find a value in columns of your pandas DataFrame?

In today’s tutorial we’ll learn how to find specific single or multiple values across columns of your pandas DataFrame.

We’ll first import the pandas Data Analysis library into your Python development environment (Jupyter, PyCharm or others).

import pandas as pd

We’ll now define a simple pandas DataFrame that you are able to use to follow along this example.

month = ['July', 'October', 'September', 'July', 'November', 'July']
language = ['Java', 'Python', 'Python', 'Javascript', 'R', 'Javascript']
office = ['Istanbul', 'New York', 'Osaka', 'Toronto', 'New York', 'Hong Kong']
salary = [181.0, 203.0, 163.0, 181.0, 121.0, 132.0]
hr_campaign = dict(month = month, language = language, salary = salary)
interviews_data = pd.DataFrame(data=hr_campaign)

Let’s take a look at the data:

interviews_data.head()
monthlanguagesalary
0JulyJava181.0
1OctoberPython203.0
2SeptemberPython163.0
3JulyJavascript181.0
4NovemberR121.0

Finding specific value in Pandas DataFrame column

Let’s assume that we would like to find interview data related to Python candidates. We’ll define our search criteria and filter the pandas DataFrame accordingly.

value = 'Python'

mask = interviews_data['language'].str.contains(value)

interviews_data[mask]

Here’s our result:

monthlanguagesalary
1OctoberPython203.0
2SeptemberPython163.0

Search for multiple strings in a column

In the same fashion we can find multiple strings. We’ll first define a Python list containing values to search for and then subset the DataFrame.


value_lst = ['Java', 'Python']
mask = interviews_data['language'].isin(value_lst)
interviews_data[mask]

This also returns a DataFrame:

monthlanguagesalary
0JulyJava181.0
1OctoberPython203.0
2SeptemberPython163.0

Same result can be returned by filtering your pandas DataFrame using the loc indexer:

interviews_data.loc[mask]

Filter column by value greater than

Same logic applies for selecting specific rows according to a condition. In the following example we will retrieve rows with average salary expectation higher to the maximal salary threshold defined by the company:

max_salary = 170
mask = interviews_data['salary'] >= max_salary
interviews_data[mask]

Searching for a string and return the index

We can return the index of the relevant rows using the index DataFrame method:

interviews_data[mask].index

This will return the following index:

Int64Index([0, 1, 3], dtype='int64')

Find values based on other column values

In the next use case, we’ll use the query DataFrame method to find the salaries pertaining to the Python candidates:


value = 'Python'
python_salaries = interviews_data.query('language ==  @value')[['month','salary']]
python_salaries

This will render the following DataFrame subset:

monthsalary
1October203.0
2September163.0

Replace values found in column

After finding specific string or numeric values, we can also use the replace DataFrame method to replace values as needed. In this very trivial example we’ll replace the string “Javascript” with the string “JavaScript”.

max_salary = 170
mask = interviews_data['salary'] >= max_salary

high_salaries = interviews_data[mask].replace('Javascript', 'JavaScript')