Step 1: Prepare your Data
We will first create a couple of a simple DataFrame:
import pandas as pd
hired = pd.DataFrame(
{'area': ['Python', 'DAX', 'Power BI'],
'applicants' : [110, 100, 130],
'hired' : [11, 10, 13]
})
Step 2: Check if a single row exists in your DataFrame
Let’s assume we have a simple list we want to add as a row to our DataFrame.
new_row = ['Python', 110, 11]
Using the loc accessor we can check whether there is a row which already contains same values in your DataFrame.
if hired.loc[(hired['area'] == 'Python') & (hired['applicants'] == 110) & (hired['hired'] == 11)].empty is True:
print('Your df doesn\'t contain your row')
else:
print('Your df contains your row')
Step 3: Check if multiple rows exists in another DataFrame
Let’s define a second DataFrame structure:
applications = pd.DataFrame(
{'area': ['Python', 'R', 'Power BI'],
'applicants' : [110, 120, 130],
})
A simple way to check whether rows exists is to merge the two DataFrames into a new one:
apps_vs_hired = pd.merge(applications,hired, on = ['area', 'applicants'], how = 'outer', indicator='exists_yes_no')
Let’s look at the new DataFrame:
apps_vs_hired.head()
area | applicants | hired | exists_yes_no | |
---|---|---|---|---|
0 | Python | 110 | 11.0 | both |
1 | R | 120 | NaN | left_only |
2 | Power BI | 130 | 13.0 | both |
3 | DAX | 100 | 10.0 | right_only |
The exists_yes_no column depicts whether all rows in both DataFrames exists in each other.
We can make the column values a bit more legible by mapping them to simpler to understand values using a Python dictionary:
exist_dict = {'left_only': 'Row exists in Applicants only',
'right_only': 'Row exists in Hired only',
'both': 'Row exists in both'}
apps_vs_hired['exists_yes_no'] = apps_vs_hired['exists_yes_no'].map(exist_dict)
We can now take a look at our DataFrame and clearly understand whether columns appear in any or both of the datasets:
area | applicants | hired | exists_yes_no | |
---|---|---|---|---|
0 | Python | 110 | 11.0 | Row exists in both |
1 | R | 120 | NaN | Row exists in Applicants only |
2 | Power BI | 130 | 13.0 | Row exists in both |
3 | DAX | 100 | 10.0 | Row exists in Hired only |