How to filter a Pandas DataFrame by specific column and row values?

In today’s Data Wrangling tutorial we’ll learn how to use Python in order to subset Pandas DataFrames and select specific columns according to column data and label values.

Create an example dataset

Use the following code to initialize our DataFrame.

import pandas pd

candidates = {'city': ['Miami', 'Boston', 'Boston','New York', 'Atlanta'],
              'month': ['01', '02', '03', '04', '05',],
              'num_candidates': [190,82,88, 48, 56]}

hr = pd.DataFrame(candidates)

hr.head()

Here’s our DataFrame:

Select column by one or multiple value in list

The following code will select rows in which the city name matches multiple values specified in the cities list.

# column value in list
cities = ['Boston', 'Atlanta']
hr[hr['city'].isin(cities)]

Here’s the result:

Subset dataframe column if it contains specific values

filt = (hr['city'] == 'Boston')
hr[filt]

Filter according to complex conditions

In this case we’ll filter out rows in which column values match specific complex conditions. Here is an example:

# multiple conditions / greater than
filt = (hr['city'] == 'Boston') | (hr['num_candidates'] > 60)

Subset using a regex

In this example we’ll subset only columns which label matches a specific expression. We’ll use the filter() method and pass the expression into the like parameter as shown in the example depicted below.

# filter by column label value
hr.filter(like='ity', axis=1)

We can also cast the column values into strings and then go ahead and use the contains() method to filter only columns containing a specific pattern.

filt = hr['city'].str.contains('ton')
hr[filt]

Filter according to the column label

For completeness, below you can find how we can use the loc indexer to subset a DataFrame by column label.

# filter by column label value
hr.loc[:,'city']

Note that you can obviously accomplish the same result by using the following techniques:

Select columns with bracket notation:

hr['city']

or using the iloc indexer, we can write the following expression

hr.iloc[:,0]