How to find DataFrame rows according to condition in Pandas?

In this short tutorial we’ll learn how to subset a Pandas DataFrame according to simple and complex conditions. This is a core capability that you should master in order to effectively wrangle data. You can think about it as similar functionality to the one provided by the SQL where statement but in Pandas.

Create our example DataFrame

As we typically do, we’ll start by creating a DataFrame that you can use to follow along this example in your own computer. Enter the code below into your Python Data Analysis development environment:

import pandas as pd

month = ['June', 'August', 'February', 'July']
language = ['Java', 'Kotlin', 'PHP', 'Python']
first_interview = (78, 93, 76, 89)
hr_dict = dict(month=month, language=language, interview_1=first_interview)
hr_df = pd.DataFrame(data=hr_dict)

print(hr_df.head())

Here’s our DataFrame:

monthlanguageinterview_1
0JuneJava78
1AugustKotlin93
2FebruaryPHP76
3JulyPython89

Find rows by single condition

First case will be to filter our DataFrame according to rows containing specific values. Initially we’ll use a simple condition as an example:

# select rows by simple condition
condition = (hr_df['language'] == 'Python')
hr_df[condition]

The following records will be selected:

monthlanguageinterview_1
3JulyPython89

Select rows by multiple conditions (with AND and OR)

More interesting is to query our DataFrame according to multiple conditions. We concatenate the conditions using AND (&) and OR (|) boolean operators.

Using the OR operator

# multiple conditions (OR) 

condition = (hr_df['language'] == 'Python') | (hr_df['month'] == 'June')
hr_df[condition]

The result will be:

monthlanguageinterview_1interview_2
0JuneJava7843
3JulyPython8961

Using the AND operator

# multiple conditions (AND)
condition = (hr_df['language'] == 'Python') & (hr_df['month'] == 'June')
hr_df[condition]

No results will be found.

Selecting rows when column value starts with

Could be that you might want to select row values that matches a specific string pattern. In the example below, we’ll filter only entries in the language column which starts with the letter K.

#select rows column value starts with
condition = (hr_df['language'].str.startswith('K'))
hr_df[condition]

Here’s the result:

monthlanguageinterview_1interview_2
1AugustKotlin9368

Pick records by conditions in list

In our last example for today, we’ll select one or multiple rows only if they match a specific list of values.

# find rows by condition in list

#define the list of values
lang_lst = ['PHP', 'Python']
#subset the dataframe
hr_df[hr_df['language'].isin(lang_lst)]
monthlanguageinterview_1interview_2
2FebruaryPHP7647
3JulyPython8961

Additional Learning