In today’s quick data analysis tutorial, we’ll learn how to remove the first row, or the first few rows, of a pandas DataFrame.
Remove first rows in pandas – Summary
Option 1: drop by row label
mydf.drop(labels='label_first_row', inplace=True)
Option 2: using the index keyword (drops by index label; with the default RangeIndex, label 0 is the first row)
mydf.drop(index=0, inplace=True)
Option 3: using the iloc accessor (position-based; assign the result to persist it)
mydf = mydf.iloc[1:]
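The three options differ in one important way: drop() works on index labels, while iloc works on positions. The minimal sketch below, assuming a small hypothetical DataFrame df with a string index, runs each option and uses df.index[0] to drop the first row whatever its label is.

```python
import pandas as pd

# Hypothetical sample data with a non-default (string) index
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# Option 1: drop by row label (returns a new DataFrame)
by_label = df.drop(labels="a")

# Option 2: drop via the index= keyword; df.index[0] looks up
# the label of the first row, so this works for any index
by_index = df.drop(index=df.index[0])

# Option 3: iloc slices by position and keeps rows from position 1 on
by_position = df.iloc[1:]

print(by_label.index.tolist())     # ['b', 'c']
print(by_index.index.tolist())     # ['b', 'c']
print(by_position.index.tolist())  # ['b', 'c']
```

With this string index, df.drop(index=0) would raise a KeyError, because 0 is not one of the labels. With the default RangeIndex, label 0 and position 0 happen to coincide, which is why Option 2 works in the examples that follow.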
Delete the first row in pandas – Example
Let’s walk through some use cases in which you need to drop one or multiple rows.
Creating the DataFrame
We will start by importing the pandas library. If you have issues importing pandas, read this tutorial. Next, we’ll define a sample dataset.
import pandas as pd
language = ['R', 'Javascript', 'R', 'Python', 'R', 'Javascript']
area = ['Paris', 'Rio de Janeiro', 'Buenos Aires', 'New York', 'Buenos Aires', 'London']
salary = [149.0, 157.0, 117.0, 146.0, 130.0, 191.0]
emp = dict(area=area, language=language, salary=salary)
emp_df = pd.DataFrame(data=emp)
# Display the first 5 rows of the DataFrame
emp_df.head()
Here’s the output:
| | area | language | salary |
|---|---|---|---|
| 0 | Paris | R | 149.0 |
| 1 | Rio de Janeiro | Javascript | 157.0 |
| 2 | Buenos Aires | R | 117.0 |
| 3 | New York | Python | 146.0 |
| 4 | Buenos Aires | R | 130.0 |
Drop first row from the DataFrame
We can use the DataFrame drop() method here (the same method selects and drops any single record by its index label):
emp_df.drop(index=0)
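Note: by default, drop() returns a new DataFrame and leaves emp_df unchanged. To persist the change, either assign the result back (emp_df = emp_df.drop(index=0)) or add the inplace=True parameter: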
emp_df.drop(index=0, inplace=True)
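A quick way to see the difference is to compare return values. This sketch rebuilds the tutorial's emp_df and shows that drop() without inplace leaves the original untouched, whereas inplace=True modifies emp_df and returns None.

```python
import pandas as pd

# Same sample data as the tutorial's emp_df
emp_df = pd.DataFrame({
    "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
    "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
    "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
})

# Without inplace, drop() returns a modified copy; emp_df keeps all 6 rows
copy_df = emp_df.drop(index=0)
print(len(emp_df), len(copy_df))  # 6 5

# With inplace=True, drop() mutates emp_df and returns None
result = emp_df.drop(index=0, inplace=True)
print(result)           # None
print(emp_df.index[0])  # 1
```

Because the in-place call returns None, writing emp_df = emp_df.drop(index=0, inplace=True) would replace your DataFrame with None; use either assignment or inplace=True, not both.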
Alternatively, use the iloc accessor to select every row from position 1 onward:
emp_df.iloc[1:]
iloc does not modify emp_df; to persist the change, assign the result to a variable:
emp_df_2 = emp_df.iloc[1:]
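The sketch below, again rebuilding the tutorial's emp_df, confirms that slicing with iloc leaves the original intact. It adds .copy() so that emp_df_2 is fully independent of emp_df (which can also avoid SettingWithCopyWarning if you later modify emp_df_2 in pandas versions without Copy-on-Write), and shows tail() with a negative n as an equivalent positional alternative.

```python
import pandas as pd

emp_df = pd.DataFrame({
    "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
    "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
    "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
})

# iloc[1:] selects rows by position; emp_df itself is not modified.
# .copy() makes emp_df_2 an independent DataFrame.
emp_df_2 = emp_df.iloc[1:].copy()
print(len(emp_df), len(emp_df_2))  # 6 5

# Equivalent positional alternative: tail() with a negative n
# returns all rows except the first |n|
same_rows = emp_df.tail(-1)
print(same_rows.index.tolist())  # [1, 2, 3, 4, 5]
```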
Remove first two rows
Using the iloc accessor:
emp_df.iloc[2:]
Here’s our output:
| | area | language | salary |
|---|---|---|---|
| 2 | Buenos Aires | R | 117.0 |
| 3 | New York | Python | 146.0 |
| 4 | Buenos Aires | R | 130.0 |
| 5 | London | Javascript | 191.0 |
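The same idea generalizes to the first n rows. In this sketch, drop_first_n is a hypothetical helper name (not part of pandas) that wraps iloc[n:]; the drop() version passes the labels of the first n rows, taken from emp_df.index[:n], so it also works with a non-default index.

```python
import pandas as pd

def drop_first_n(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: return df without its first n rows (by position)."""
    return df.iloc[n:]

emp_df = pd.DataFrame({
    "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
    "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
    "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
})

first_two_gone = drop_first_n(emp_df, 2)
print(first_two_gone.index.tolist())  # [2, 3, 4, 5]

# The same rows via drop(): pass the labels of the first 2 rows
via_drop = emp_df.drop(index=emp_df.index[:2])
print(via_drop.index.tolist())  # [2, 3, 4, 5]
```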
Delete multiple rows off your DataFrame
Removing several rows is just as easy. We already saw how to use the iloc accessor to remove leading rows. We can also select the rows to drop by passing a list of index labels:
emp_df.drop(index=[0, 2, 3])
# or alternatively
rows = [0, 2, 3]
emp_df.drop(index=rows)
The selected rows are removed, as shown below:
| | area | language | salary |
|---|---|---|---|
| 1 | Rio de Janeiro | Javascript | 157.0 |
| 4 | Buenos Aires | R | 130.0 |
| 5 | London | Javascript | 191.0 |
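Passing [0, 2, 3] to drop() works here because the default index labels match the row positions. If your DataFrame has a different index, one option is to translate positions into labels first. This sketch assumes hypothetical employee IDs ("e1" to "e6") as the index and also shows a positional boolean mask built with NumPy as an alternative.

```python
import numpy as np
import pandas as pd

# Same data, indexed by hypothetical employee IDs instead of 0..5
emp_by_id = pd.DataFrame(
    {
        "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
        "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
        "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
    },
    index=["e1", "e2", "e3", "e4", "e5", "e6"],
)

positions = [0, 2, 3]

# Translate positions into index labels, then drop by label
trimmed = emp_by_id.drop(index=emp_by_id.index[positions])
print(trimmed.index.tolist())  # ['e2', 'e5', 'e6']

# Alternative: keep every row whose position is NOT in the list
mask = ~np.isin(np.arange(len(emp_by_id)), positions)
kept = emp_by_id[mask]
print(kept.index.tolist())  # ['e2', 'e5', 'e6']
```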
Remove first duplicated row
This is a related question from a reader: how do you get rid of the first occurrence of a duplicated row? By default, drop_duplicates() keeps the first occurrence and removes the later ones. The trick is to pass keep='last', which removes the earlier occurrences and keeps the last one. Note that keep=False deletes every duplicated record.
emp_df.drop_duplicates(keep='last')
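On the tutorial's emp_df, no two rows are identical across all three columns (the two Buenos Aires/R rows have different salaries), so the call above removes nothing. This sketch uses the subset parameter to compare only area and language, under which rows 2 and 4 count as duplicates, and shows the effect of each keep value.

```python
import pandas as pd

emp_df = pd.DataFrame({
    "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
    "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
    "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
})

# Compare only these columns; rows 2 and 4 match on both
cols = ["area", "language"]

keep_first = emp_df.drop_duplicates(subset=cols)              # default keep='first': drops row 4
keep_last = emp_df.drop_duplicates(subset=cols, keep="last")  # drops row 2, the first occurrence
keep_none = emp_df.drop_duplicates(subset=cols, keep=False)   # drops both rows 2 and 4

print(keep_first.index.tolist())  # [0, 1, 2, 3, 5]
print(keep_last.index.tolist())   # [0, 1, 3, 4, 5]
print(keep_none.index.tolist())   # [0, 1, 3, 5]
```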