Step 1: Prepare your Data
We will first create a couple of a simple DataFrame:
import pandas as pd
hired = pd.DataFrame(
{'area': ['Python', 'DAX', 'Power BI'],
'applicants' : [110, 100, 130],
'hired' : [11, 10, 13]
})
Step 2: Check if a single row exists in your DataFrame
Let’s assume we have a simple list we want to add as a row to our DataFrame.
new_row = ['Python', 110, 11]
Using the loc accessor we can check whether there is a row which already contains same values in your DataFrame.
if hired.loc[(hired['area'] == 'Python') & (hired['applicants'] == 110) & (hired['hired'] == 11)].empty is True:
print('Your df doesn\'t contain your row')
else:
print('Your df contains your row')
Step 3: Check if multiple rows exists in another DataFrame
Let’s define a second DataFrame structure:
applications = pd.DataFrame(
{'area': ['Python', 'R', 'Power BI'],
'applicants' : [110, 120, 130],
})
A simple way to check whether rows exists is to merge the two DataFrames into a new one:
apps_vs_hired = pd.merge(applications,hired, on = ['area', 'applicants'], how = 'outer', indicator='exists_yes_no')
Let’s look at the new DataFrame:
apps_vs_hired.head()
area | applicants | hired | exists_yes_no | |
---|---|---|---|---|
0 | Python | 110 | 11.0 | both |
1 | R | 120 | NaN | left_only |
2 | Power BI | 130 | 13.0 | both |
3 | DAX | 100 | 10.0 | right_only |
The exists_yes_no column depicts whether all rows in both DataFrames exists in each other.
We can make the column values a bit more legible by mapping them to simpler to understand values using a Python dictionary:
exist_dict = {'left_only': 'Row exists in Applicants only',
'right_only': 'Row exists in Hired only',
'both': 'Row exists in both'}
apps_vs_hired['exists_yes_no'] = apps_vs_hired['exists_yes_no'].map(exist_dict)
We can now take a look at our DataFrame and clearly understand whether columns appear in any or both of the datasets:
area | applicants | hired | exists_yes_no | |
---|---|---|---|---|
0 | Python | 110 | 11.0 | Row exists in both |
1 | R | 120 | NaN | Row exists in Applicants only |
2 | Power BI | 130 | 13.0 | Row exists in both |
3 | DAX | 100 | 10.0 | Row exists in Hired only |
FAQ:
Are there additional ways to find specific rows in DataFrame?
In the example above, we use the loc accessor to find rows matching specific values. An additional method is to use the isin Series method:
languages = ['Python', 'R']
#boolean check for matching values in DataFrame rows
matches = applications['language'].isin(languages)
#filter DataFrame according to matching conditions
matched_rows = applications[matches]
print(matched_rows)
Are there performance considerations when searching for matching rows in Pandas?
Usage of vectorized functions (isin() or str.contains()) or boolean indexing are more effective than using for loops when searching text in pandas.