How to check if one or multiple Pandas rows exist?

Step 1: Prepare your Data

We will first create a couple of a simple DataFrame:

import pandas as pd 

hired = pd.DataFrame(
                    {'area': ['Python', 'DAX', 'Power BI'],
                    'applicants' : [110, 100, 130],
                     'hired' : [11, 10, 13]
                     })

Step 2: Check if a single row exists in your DataFrame

Let’s assume we have a simple list we want to add as a row to our DataFrame.

new_row = ['Python', 110, 11]

Using the loc accessor we can check whether there is a row which already contains same values in your DataFrame.

if hired.loc[(hired['area'] == 'Python') & (hired['applicants'] == 110) & (hired['hired'] == 11)].empty is True:
    print('Your df doesn\'t contain your row')
else:
    print('Your df contains your row')

Step 3: Check if multiple rows exists in another DataFrame

Let’s define a second DataFrame structure:

applications = pd.DataFrame(
                    {'area': ['Python', 'R', 'Power BI'],
                    'applicants' : [110, 120, 130],
                     })

A simple way to check whether rows exists is to merge the two DataFrames into a new one:

apps_vs_hired = pd.merge(applications,hired, on = ['area', 'applicants'], how = 'outer', indicator='exists_yes_no')

Let’s look at the new DataFrame:

apps_vs_hired.head()
areaapplicantshiredexists_yes_no
0Python11011.0both
1R120NaNleft_only
2Power BI13013.0both
3DAX10010.0right_only

The exists_yes_no column depicts whether all rows in both DataFrames exists in each other.

We can make the column values a bit more legible by mapping them to simpler to understand values using a Python dictionary:

exist_dict = {'left_only': 'Row exists in Applicants only',
             'right_only': 'Row exists in Hired only',
             'both': 'Row exists in both'}

apps_vs_hired['exists_yes_no'] = apps_vs_hired['exists_yes_no'].map(exist_dict)

We can now take a look at our DataFrame and clearly understand whether columns appear in any or both of the datasets:

areaapplicantshiredexists_yes_no
0Python11011.0Row exists in both
1R120NaNRow exists in Applicants only
2Power BI13013.0Row exists in both
3DAX10010.0Row exists in Hired only

FAQ:

Are there additional ways to find specific rows in DataFrame?

In the example above, we use the loc accessor to find rows matching specific values. An additional method is to use the isin Series method:

languages = ['Python', 'R']

#boolean check for matching values in DataFrame rows
matches = applications['language'].isin(languages)

#filter DataFrame according to matching conditions
matched_rows = applications[matches]

print(matched_rows)

Are there performance considerations when searching for matching rows in Pandas?

Usage of vectorized functions (isin() or str.contains()) or boolean indexing are more effective than using for loops when searching text in pandas.