One of the most common tasks in data wrangling is filtering data that we have previously imported into one or more DataFrames. In this tutorial we will learn how to use the loc accessor to filter pandas DataFrame rows and columns.
As usual, we’ll import the pandas library and create a very simple dataset that you can use to follow along.
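A minimal sketch of such a dataset is shown here. The DataFrame name sal_df and the area and salary columns match the snippets used later in this tutorial. The office column name and every row except the last are hypothetical values chosen only so that the filters below return something.

```python
import pandas as pd

# Illustrative data only: 'office' is an assumed column name, and all rows
# except index 4 are made-up values for demonstration purposes.
sal_df = pd.DataFrame({
    "area": ["Java", "Python", "Java", "R", "Python"],
    "office": ["Paris", "London", "Lima", "Paris", "Rio de Janeiro"],
    "salary": [210.0, 180.0, 150.0, 120.0, 134.0],
})

print(sal_df)
```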
Let’s take a look at our data:
| 4 | Python | Rio de Janeiro | 134.0 |
Using the pandas loc accessor to filter by multiple conditions
The loc accessor allows us to filter a pandas DataFrame by row and column labels. The basic syntax is simple:
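As a rough illustration, loc takes a row selector first and, optionally, a column selector second. The sketch below assumes a small made-up DataFrame with a default integer index; the office column name and the values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; 'office' is an assumed column name.
sal_df = pd.DataFrame({
    "area": ["Java", "Python", "Java", "R", "Python"],
    "office": ["Paris", "London", "Lima", "Paris", "Rio de Janeiro"],
    "salary": [210.0, 180.0, 150.0, 120.0, 134.0],
})

# General form: sal_df.loc[row_labels, column_labels]
subset = sal_df.loc[[0, 4], ["area", "salary"]]  # two rows, two columns
single = sal_df.loc[4, "salary"]                 # one cell, returned as a scalar
label_slice = sal_df.loc[1:3]                    # label slices include both endpoints

print(subset)
print(single)
print(label_slice)
```

Note that, unlike positional slicing, a label slice such as 1:3 includes the row labeled 3.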
If we want to filter by multiple criteria, we’ll define a compound condition, in this case using AND (&). Each comparison must be wrapped in parentheses, because & binds more tightly than == and >:
filt = (sal_df['area'] == 'Java') & (sal_df['salary'] > 200)
This returns a boolean Series, with one True or False value per row, that we can pass to the loc accessor:
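Put together, the filtering step might look like the following sketch. The data is hypothetical (including the office column name), so the rows returned here are illustrative rather than the tutorial's actual output.

```python
import pandas as pd

# Hypothetical sample data; 'office' is an assumed column name.
sal_df = pd.DataFrame({
    "area": ["Java", "Python", "Java", "R", "Python"],
    "office": ["Paris", "London", "Lima", "Paris", "Rio de Janeiro"],
    "salary": [210.0, 180.0, 150.0, 120.0, 134.0],
})

# Boolean mask: True only where both conditions hold.
filt = (sal_df["area"] == "Java") & (sal_df["salary"] > 200)

# Passing the mask to loc keeps the rows where the mask is True.
java_high = sal_df.loc[filt]

# A column selector can be added to keep only some columns.
java_high_salary = sal_df.loc[filt, ["area", "salary"]]

print(java_high)
```

With this sample data, only row 0 (Java, 210.0) satisfies both conditions.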
The result contains only the rows that satisfy both conditions.
We can define a complex condition using OR (|):
filt = (sal_df['area'] == 'Java') | (sal_df['salary'] > 200)
sal_df.loc[filt]
The resulting DataFrame subset contains every row that satisfies at least one of the two conditions.
Filter by multiple string values
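One possible way to filter by several string values is to combine equality checks with OR (|). The sketch below assumes hypothetical sample data (the office column name and the values are made up) and is not necessarily the exact snippet this section originally used.

```python
import pandas as pd

# Hypothetical sample data; 'office' is an assumed column name.
sal_df = pd.DataFrame({
    "area": ["Java", "Python", "Java", "R", "Python"],
    "office": ["Paris", "London", "Lima", "Paris", "Rio de Janeiro"],
    "salary": [210.0, 180.0, 150.0, 120.0, 134.0],
})

# Keep rows whose 'area' equals either of two strings.
filt = (sal_df["area"] == "Python") | (sal_df["area"] == "R")
by_strings = sal_df.loc[filt]

print(by_strings)
```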
The result keeps only the rows whose string column matches one of the given values.
Subset by multiple values in a list
Here we would like to pass a list of strings or numbers and use it as the criterion for subsetting our DataFrame. We will use the isin() Series method to flag the rows whose value appears in the list, and then filter the DataFrame accordingly.
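A short sketch of this approach, again on hypothetical sample data (the office column name and the values are assumptions made for illustration):

```python
import pandas as pd

# Hypothetical sample data; 'office' is an assumed column name.
sal_df = pd.DataFrame({
    "area": ["Java", "Python", "Java", "R", "Python"],
    "office": ["Paris", "London", "Lima", "Paris", "Rio de Janeiro"],
    "salary": [210.0, 180.0, 150.0, 120.0, 134.0],
})

# List of strings: isin() returns True where 'area' is one of the values.
areas = ["Python", "R"]
by_area = sal_df.loc[sal_df["area"].isin(areas)]

# The same method works with a list of numbers.
salaries = [134.0, 150.0]
by_salary = sal_df.loc[sal_df["salary"].isin(salaries)]

print(by_area)
print(by_salary)
```

Compared with chaining several == checks with |, isin() stays readable as the list of values grows.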
Questions / feedback? Feel free to let us know in the Comments section.