How to use pandas loc to select rows by multiple conditions?

One of the prevalent tasks in data wrangling is filtering data that we have previously imported into one or multiple DataFrames. In today’s tutorial we will learn how to use the loc accessor to filter pandas DataFrame rows and columns.

Example DataFrame

As we typically do, we’ll import the pandas library and create a very simple dataset that you can use in order to follow along.

import pandas as pd

area = ['Javascript', 'Java', 'Java', 'R', 'Python', 'R']
office = ['Shenzhen', 'Bangalore', 'N', 'Oew York', 'Rio de Janeiro', 'Rio de Janeiro']
salary = [151.0, 148.0, 248.0, 181.0, 134.0, 136.0]
interviews = dict(area = area, office = office, salary = salary)
sal_df = pd.DataFrame(data = interviews)

Let’s take a look at our data:

sal_df.head()
areaofficesalary
0JavascriptShenzhen151.0
1JavaBangalore148.0
2JavaNew York248.0
3RNew York181.0
4PythonRio de Janeiro134.0

Using the pandas loc accesor to filter by multiple conditions

The loc accessor allows to filter a Pandas DataFrame by rows and columns labels. The basic syntax is simple:

my_df.loc[row_labels, column_labels]

If we want to filter by multiple criteria, we’ll define a conditional statement – in this case using an AND (&) condition:

filt = (sal_df['area'] == 'Java') & (sal_df['salary'] > 200)

This will return a series of boolean elements that we can pass to the loc accessor:

sal_df.loc[filt]

The result will look as following:

areaofficesalary
2JavaNew York248.0

We can define a complex condition using OR (|):

filt = (sal_df['area'] == 'Java') | (sal_df['salary'] > 200)
sal_df.loc[filt]

Here’s the DataFrame subset:

areaofficesalary
1JavaBangalore148.0
2JavaNew York248.0

Filter by multiple string values

Let’s now assume that we would like to subset the DataFrame according to multiple string values. In this case we will show rows which contains the values Java or JavaScript. Here’s a simple example of how to do that:

filt = sal_df['area'].str.contains ('Java|Javascript')

sal_df.loc[filt]

Here’s the result:

areaofficesalary
0JavascriptShenzhen151.0
1JavaBangalore148.0
2JavaNew York248.0

Subset by multiple values in a list

In this case we would like to pass a list of strings or numbers and then use those as a criteria for sub setting our DataFrame. In this case we will use the isin() Series method is search for specific rows and then filter the DataFRame accordingly.

value_lst = ['Javascript', 'Java']
filt = sal_df['area'].isin(value_lst)

sal_df.loc[filt]

Questions / feedback? Feel free to let us know in the Comments section.

Leave a Comment