In today’s quick tutorial we’ll learn how to filter a Python Pandas DataFrame with the loc indexer. Specifically we will look into sub-setting data using complex condition criteria beyond the basics.
Data Preparation
#Import the Pandas library
import pandas as pd
#Initialize the DataFrame
hr = pd.read_csv('interviews.csv')
# Browse the first DF rows
hr.head()
Here’s our dataframe:
language | month | salary | num_candidates | days_to_hire | |
---|---|---|---|---|---|
1 | PHP | April | 128.0 | 74.0 | 56.0 |
2 | C# | February | 97.0 | 83.0 | 46.0 |
3 | R | January | 128.0 | 86.0 | 57.0 |
4 | Swift | October | 123.0 | 80.0 | 76.0 |
5 | Swift | December | 108.0 | 81.0 | 60 |
Select rows by conditions with iloc
We will start by writing a simple condition. Let’s assume that we ant to filter the rows realted to the Swift language.
subset = (hr['language'] == 'Swift')
# using the loc indexer
hr.loc[subset]
# using the brackets notation
hr[subset]
Both will render a similar result:
language | month | salary | num_candidates | days_to_hire | |
---|---|---|---|---|---|
4 | Swift | October | 123.0 | 80.0 | 76.0 |
5 | Swift | December | 108.0 | 81.0 | 60.0 |
Filter row by multiple conditions with loc
Let’s write a bit more complex statement, Instead of showing all columns, we’ll subset / select specific columns:
# Define a query string with isin
subset = hr['language'].isin (['PHP', 'R'])
# filter specific columns
cols = ['language', 'salary', 'days_to_hire']
Now we’ll pass the two parameters to the loc indexer on our DataFrame
hr.loc[subset, cols]
The result will be:
language | salary | days_to_hire | |
---|---|---|---|
1 | PHP | 128.0 | 56.0 |
3 | R | 128.0 | 57.0 |
Writing complex conditions (OR and AND)
In this last section will show you how to concatenate conditions using the & and | operators.
Condition based on OR operator:
subset = (hr['language'] == 'PHP') | (hr['salary'] < 128)
hr[subset]
Condition based on AND operator:
subset = (hr['language'] == 'PHP') & (hr['salary'] == 128)
hr[subset]
Drop DataFrame rows based on multiple conditions
We first define the complex condition. Then call the DataFRame drop method on the DF index.
subset = (hr['language'] == 'PHP') & (hr['salary'] == 128)
hr_new = hr.drop(hr[subset].index)
hr_new