In today’s data analysis tutorial we’ll learn how to remove one or more rows from a Pandas DataFrame.
We’ll look at several cases in which we use the pd.DataFrame.drop() method to remove irrelevant rows from our data:
- Drop row by index / row number
- Drop multiple rows
- Delete row based on condition
- Delete row if it’s empty / null / NaN
Creating test data set
We’ll start by importing the Pandas library into our Jupyter Notebook, and create some test data.
import pandas as pd

city = ['New York', 'Boston', 'Atlanta', 'New York']
office = ['West', 'South', 'South', 'East']
interviews = [90, 89, 100, pd.NA]

# create dictionary
offices = dict(city=city, office=office, interviews=interviews)

# create DataFrame from dictionary
hr = pd.DataFrame(offices)
hr.head()
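Here’s our DataFrame:
       city office interviews
0  New York   West         90
1    Boston  South         89
2   Atlanta  South        100
3  New York   East       <NA>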
Drop row by index
In the first case, we pass the relevant row index label to the DataFrame.drop() method:
rows = 3
hr.drop(index=rows)
This returns a copy of our DataFrame without the last row (index label 3). The original hr is left unchanged.
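One way to check this behaviour is to compare the result with the original. The sketch below rebuilds the tutorial’s hr DataFrame so that it runs on its own; the name trimmed is ours, chosen only for illustration.

```python
import pandas as pd

hr = pd.DataFrame({
    "city": ["New York", "Boston", "Atlanta", "New York"],
    "office": ["West", "South", "South", "East"],
    "interviews": [90, 89, 100, pd.NA],
})

# drop() returns a new DataFrame without the row labelled 3
trimmed = hr.drop(index=3)

print(trimmed.index.tolist())  # [0, 1, 2]
print(hr.index.tolist())       # [0, 1, 2, 3] (original is unchanged)
```

Note that drop() raises a KeyError if the label does not exist; passing errors="ignore" suppresses that.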
Deleting multiple rows by index
We can get rid of multiple rows by passing a list of row labels:
rows = [2, 3]
hr.drop(index=rows)
Note that drop() returns a new DataFrame and leaves hr untouched. If you want to apply the change to the existing DataFrame, pass the inplace=True parameter.
Alternatively, you can save the modified DataFrame under a new name:
rows = [2, 3]
hr1 = hr.drop(index=rows)
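The two options can be compared side by side. This is a minimal sketch that rebuilds the hr DataFrame so it runs standalone; hr_copy and result are names we introduce here to avoid modifying hr itself.

```python
import pandas as pd

hr = pd.DataFrame({
    "city": ["New York", "Boston", "Atlanta", "New York"],
    "office": ["West", "South", "South", "East"],
    "interviews": [90, 89, 100, pd.NA],
})

# Option 1: assign the result to a new name; hr keeps all four rows
hr1 = hr.drop(index=[2, 3])

# Option 2: modify a DataFrame in place; drop() then returns None
hr_copy = hr.copy()
result = hr_copy.drop(index=[2, 3], inplace=True)

print(result)                           # None
print(len(hr), len(hr1), len(hr_copy))  # 4 2 2
```

Because the in-place call returns None, writing hr = hr.drop(..., inplace=True) would replace the DataFrame with None; use one style or the other, not both.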
Remove the first row of a Pandas DataFrame
After importing our DataFrame data from an external file (such as CSV, JSON and so forth) or a SQL database, we might want to get rid of the header row. You can do that by tweaking your data import code or by using something simple such as:
# drop first row
hr.drop(index=0)
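Keep in mind that drop(index=0) removes the row whose label is 0, which is the first row only when the DataFrame has the default 0, 1, 2, ... index. The sketch below uses a hypothetical DataFrame with a different index to show two ways of removing the first row by position.

```python
import pandas as pd

# Hypothetical DataFrame whose index does not start at 0
df = pd.DataFrame(
    {"city": ["Boston", "Atlanta", "New York"]},
    index=[10, 20, 30],
)

# Position-based: keep everything from the second row onwards
without_first = df.iloc[1:]

# Label-based: look up the first label, then drop it
also_without_first = df.drop(index=df.index[0])

print(without_first.index.tolist())       # [20, 30]
print(also_without_first.index.tolist())  # [20, 30]
```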
Drop rows based on conditions
Let’s now assume that we want to filter specific rows out of our DataFrame based on conditions. In our case, we’ll want to remove rows pertaining to offices which are not based in New York:
filt = hr[hr['city'] != 'New York'].index
hr_new_york = hr.drop(index=filt)
We could have filtered the DataFrame more easily by using the brackets notation:
hr_new_york = hr[hr['city'] == 'New York']
Both commands produce the same result: a DataFrame containing only the two New York rows (index labels 0 and 3).
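The following sketch runs both approaches against a rebuilt copy of hr and then shows how several conditions might be combined. The names via_drop, via_mask and remaining are ours.

```python
import pandas as pd

hr = pd.DataFrame({
    "city": ["New York", "Boston", "Atlanta", "New York"],
    "office": ["West", "South", "South", "East"],
    "interviews": [90, 89, 100, pd.NA],
})

# Approach 1: collect the labels of unwanted rows, then drop them
filt = hr[hr["city"] != "New York"].index
via_drop = hr.drop(index=filt)

# Approach 2: keep only the rows where the boolean mask is True
via_mask = hr[hr["city"] == "New York"]

print(via_drop.index.tolist())  # [0, 3]
print(via_mask.index.tolist())  # [0, 3]

# Combine conditions with & (and) or | (or), each wrapped in parentheses.
# ~ negates the mask, so this removes New York offices in the East.
remaining = hr[~((hr["city"] == "New York") & (hr["office"] == "East"))]
print(remaining.index.tolist())  # [0, 1, 2]
```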
Delete rows with empty (NaN) values
We have a complete tutorial on this topic which you might want to look at: Removing rows with null values in Pandas.
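As a brief illustration on our test data, here is a minimal sketch using the DataFrame.dropna() method, assuming we only care about missing values in the interviews column. The name cleaned is ours.

```python
import pandas as pd

hr = pd.DataFrame({
    "city": ["New York", "Boston", "Atlanta", "New York"],
    "office": ["West", "South", "South", "East"],
    "interviews": [90, 89, 100, pd.NA],
})

# Drop every row whose interviews value is missing
cleaned = hr.dropna(subset=["interviews"])

print(cleaned.index.tolist())  # [0, 1, 2]
```

Omitting subset makes dropna() check all columns, which gives the same result here because interviews is the only column with a missing value.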
We have several tutorials on deleting specific columns from your DataFrame: