How to select rows by column values in a Pandas DataFrame

In today’s tutorial we will learn how to filter one or multiple pandas DataFrame and to select specific rows by value and condition.

Example DataFrame

Let’s start by importing the pandas Python library and defining an example DataFrame that you can use to follow along. In your Python dev environment run the following code

sales_dict = {
    'area' : ['South', 'North', 'East', 'West'],
             'type' : ['B2B', 'B2C', 'B2B', 'B2C'],
             'revenue' : [1345, 1987, 5634, 3424]
              }

sales_df = pd.DataFrame (sales_dict, columns = {'area': 'area', 'type':'cost_center', 'revenue':'revenue'})

To browse through our DataFrame values, run the following command:

print(sales_df.head())

Here’s our data:

areatyperevenue
0SouthB2B1345
1NorthB2C1987
2EastB2B5634
3WestB2C3424

Filter DataFrame row by index value

In this first example, we’ll use the iloc accesor in order to slice out a single row from our DataFrame by its index.

sales_df.iloc[0]

The output is a Series representing the row values:

area       South
type         B2B
revenue     1345
Name: 0, dtype: object

Filter one or multiple rows by value

The next simple example is to filter our data by a single value in a column.

filt = (sales_df ['area'] == 'North')
sales_df[filt]

The output will be:

areatyperevenue
1NorthB2C1987

Select rows by column value in list

Next example, we would like to find only rows that include specific column values which are elements of a Python list:

filt = (sales_df ['area'].isin ([ 'North', 'South']))
sales_df[filt]

Here’s the output:

areatyperevenue
0SouthB2B1345
1NorthB2C1987

Aside: make sure that you invoke the Series isin method correctly. The following code will return a Type Error because the isin builtin method can’t receive a list as a parameter if not enclosed by parentheses.

# this will render an error:
filt = (sales_df ['area'].isin [ 'North', 'South'])

TypeError: 'method' object is not subscriptable

Filter rows by value range

We can use the between function to define value ranges on numeric data as shown below:

filt = (sales_df ['revenue'].between(2500, 4000))
sales_df[filt]

Here’s the output:

areatyperevenue
3WestB2C3424

Get the maximum value row

To get the highest value of a column we use the max() function. Let’s take a look:

filt = (sales_df ['revenue'] == sales_df ['revenue'].max(axis=0))
sales_df[filt]

Here’s the maximum value row:

areatyperevenue
2EastB2B5634

Find the minimum value row

Conversely, to find the row containing a column minimal value, we use the min() function.

filt = (sales_df ['revenue'] == sales_df ['revenue'].min(axis=0))
sales_df[filt]

Here’s the minimal value row:

areatyperevenue
0SouthB2B1345

Select by condition value in any / multiple columns

We will wrap up this tutorial with filtering our DataFrame according to a complex condition that searches for specific values in multiple columns. In this case we created an OR (|) condition. You can obviously concatenate conditions also with an AND (&) boolean gate.

filt = (sales_df['area'].isin ([ 'North', 'South'])) | (sales_df['type'] == 'B2C')
sales_df[filt]

Here’s our output:

areatyperevenue
0SouthB2B1345
1NorthB2C1987
3WestB2C3424