How to filter rows by multiple conditions in Pandas?

In today’s tutorial we’ll learn how to select DataFrame rows by specific or multiple conditions. For people new to Pandas but experienced in SQL, we’ll learn how to write where statements aimed at selecting data from our DataFrames.

We’ll look into several cases:

  • Filtering rows by column value
  • Selecting by multiple boolean conditions
  • Selecting only rows containing items in a list
  • Using a lambda function to define a filter

Example DataFrame

We’ll start by defining a very simple DataFrame that you can use to follow along with the tutorial.

import pandas as pd

year = [2022, 2021, 2022, 2022, 2020]
area = ['R', 'Python', 'VisualBasic', 'Python', 'JavaScript']
candidates= (76, 79, 78, 88, 83)
interview = dict(year = year, area = area, candidates = candidates)
hr = pd.DataFrame(data=interview)

print(hr)

Here are our DataFrame rows:

yearareacandidates
02022R76
12021Python79
22022VisualBasic78
32022Python88
42020JavaScript83

#1 Filter rows by column value

This is the simplest example, we have a simple condition to subset our DataFrame with.

col_val = 'Python'
filt1 = (hr['area'] == col_val)

hr[filt1]

Here is the result:

yearareacandidates
12021Python79
32022Python88

#2 Select DataFrame rows by multiple conditions

A little bit more realistic scenario is that we need to write a complex filter made of several boolean expressions.

# we can also use or ('\') boolean gates
filt2 = (hr['area'] == 'Python') & (hr['candidates'] > 87)
hr[filt2]
yearareacandidates
32022Python88

#3 Subset rows by condition in list

In this next example we’ll search and filter by values in a list.

area_lst = ['R', 'Python']
filt3 = hr['area'].isin(area_lst)

hr[filt3]

The result:

yearareacandidates
02022R76
12021Python79
32022Python88

#4 Filtering DataFrames using lambda functions

Lambda functions are very powerful artifacts that we can leverage to select specific DataFrame records. Here’s a very simple example:

# select only rows with more than 80 candidates OR records from 2020
filt4 = lambda  x: (hr['candidates'] >80) | (hr['year'] == 2020)
hr[filt4]

And here’s our result:

yearareacandidates
32022Python88
42020JavaScript83

Using the loc indexer to select specific data

For completeness, we can also filter our data using the loc indexer. This will work for all examples we provided so far in this tutorial. Example:

hr.loc[filt3]