How to drop the first rows from your pandas dataframe?

In today’s quick data analysis tutorial we’ll learn how to remove the first row, or the first few rows, of a pandas DataFrame.

Remove first rows in pandas

Option 1: drop by row label

mydf.drop(labels='label_first_row', inplace=True)

Option 2: using the row index (drop() matches index labels, so 0 is the first row when the DataFrame has the default integer index)

mydf.drop(index=0, inplace=True)

Option 3: using the iloc accessor

mydf.iloc[1:]
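The three options can be compared side by side on a small throwaway DataFrame. This is a minimal sketch: the frame df and its string labels 'a', 'b', 'c' are made up for illustration, and inplace is left out so each call returns a new DataFrame. Options 1 and 2 both drop by index label, while Option 3 selects by position.

```python
import pandas as pd

# Hypothetical frame with string row labels, used only for illustration
df = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])

# Option 1: drop by row label ('a' is the label of the first row here)
by_label = df.drop(labels="a")

# Option 2: index= also matches labels; after reset_index the labels are
# the default 0, 1, 2, so index=0 refers to the first row
df_default = df.reset_index(drop=True)
by_index = df_default.drop(index=0)

# Option 3: iloc slices by position, regardless of the labels
by_position = df.iloc[1:]

print(by_label.index.tolist())     # ['b', 'c']
print(by_index.index.tolist())     # [1, 2]
print(by_position.index.tolist())  # ['b', 'c']
```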

Delete the first row in pandas – Example

Let’s look at some use cases in which you need to drop one or multiple rows.

Creating the DataFrame

We will start by importing the pandas library (if you have issues importing pandas, read this tutorial). Next we’ll define a sample dataset.

import pandas as pd

language = ['R', 'Javascript', 'R', 'Python', 'R', 'Javascript']
area = ['Paris', 'Rio de Janeiro', 'Buenos Aires', 'New York', 'Buenos Aires', 'London']
salary = [149.0, 157.0, 117.0, 146.0, 130.0, 191.0]
emp = dict(area=area, language=language, salary=salary)

emp_df = pd.DataFrame(data=emp)

# Display the first 5 rows of the DataFrame
emp_df.head()

Here’s the output:

             area    language  salary
0           Paris           R   149.0
1  Rio de Janeiro  Javascript   157.0
2    Buenos Aires           R   117.0
3        New York      Python   146.0
4    Buenos Aires           R   130.0

Drop first row from the DataFrame

We can use the DataFrame drop() method here (the same method lets you select and drop any single record by its label):

emp_df.drop(index=0)

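Note: by default drop() returns a new DataFrame and leaves emp_df unchanged. Add the inplace=True parameter (or reassign the result, as in emp_df = emp_df.drop(index=0)) to persist your changes.

emp_df.drop(index=0, inplace=True)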
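Here is a small sketch of the difference, rebuilding the sample emp_df so it runs on its own. The variable names without_inplace and result are just for illustration.

```python
import pandas as pd

emp_df = pd.DataFrame({
    "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
    "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
    "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
})

# Without inplace, drop() returns a new DataFrame and emp_df keeps all 6 rows
without_inplace = emp_df.drop(index=0)
print(len(emp_df), len(without_inplace))  # 6 5

# With inplace=True, emp_df itself loses the row and drop() returns None
result = emp_df.drop(index=0, inplace=True)
print(result, len(emp_df))    # None 5
print(emp_df.index.tolist())  # [1, 2, 3, 4, 5]
```

If you run the inplace=True line on your own emp_df, re-create the DataFrame before trying the remaining examples, since they assume all six rows are still present.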

Alternatively:

emp_df.iloc[1:]

iloc also returns a new DataFrame, so to persist the change, assign the result to a variable (a new DataFrame, or emp_df itself):

emp_df_2 = emp_df.iloc[1:]
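drop(index=0) and iloc[1:] only agree while the first row carries the label 0. After sorting or filtering, the first row can have a different label. The sketch below sorts the sample data by salary (a step chosen just for illustration) to show the difference, and uses by_salary.index[0] to look up the label of the first row so that drop() removes it by position.

```python
import pandas as pd

emp_df = pd.DataFrame({
    "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
    "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
    "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
})

# Sorting keeps the original labels, so label 0 is no longer the first row
by_salary = emp_df.sort_values("salary")
print(by_salary.index.tolist())  # [2, 4, 3, 0, 1, 5]

# drop(index=0) removes the row labelled 0 (Paris), wherever it sits
drop_label0 = by_salary.drop(index=0)
print(drop_label0.index.tolist())  # [2, 4, 3, 1, 5]

# iloc[1:] removes whatever row comes first by position (label 2 here)
drop_first_pos = by_salary.iloc[1:]
print(drop_first_pos.index.tolist())  # [4, 3, 0, 1, 5]

# To drop the first row by position with drop(), look up its label first
drop_first_via_label = by_salary.drop(index=by_salary.index[0])
print(drop_first_via_label.index.tolist())  # [4, 3, 0, 1, 5]
```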

Remove first two rows

Using the iloc accessor:

emp_df.iloc[2:]

Here’s our output:

           area    language  salary
2  Buenos Aires           R   117.0
3      New York      Python   146.0
4  Buenos Aires           R   130.0
5        London  Javascript   191.0
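The same idea extends to any number of rows. This minimal sketch compares three equivalent ways to drop the first n rows (n = 2 here, as an example): slicing with iloc, passing the first n labels to drop(), and calling tail() with a negative count, which returns all rows except the first n.

```python
import pandas as pd

emp_df = pd.DataFrame({
    "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
    "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
    "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
})

n = 2  # number of leading rows to remove

via_iloc = emp_df.iloc[n:]                         # by position
via_drop = emp_df.drop(index=emp_df.index[:n])     # first n labels, then drop
via_tail = emp_df.tail(-n)                         # negative n: all but the first n

print(via_iloc.index.tolist())                              # [2, 3, 4, 5]
print(via_iloc.equals(via_drop), via_iloc.equals(via_tail))  # True True
```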

Delete multiple rows from your DataFrame

Removing a few rows is quite easy as well. We already saw how to use the iloc accessor for that. We can also select the rows to drop by passing a list of index labels to drop():

emp_df.drop(index=[0, 2, 3])

# or alternatively

rows = [0, 2, 3]
emp_df.drop(index=rows)

The selected rows are removed, as can be seen below:

             area    language  salary
1  Rio de Janeiro  Javascript   157.0
4    Buenos Aires           R   130.0
5          London  Javascript   191.0
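The list passed to index= holds labels, not positions. If you know the positions of the rows to remove, one option is to translate them into labels with emp_df.index[...] and then call drop(). In this sketch the positions [0, 2, 3] and the salary sort (which makes positions and labels differ) are illustrative choices.

```python
import pandas as pd

emp_df = pd.DataFrame({
    "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
    "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
    "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
})

# After sorting, labels no longer match positions: [2, 4, 3, 0, 1, 5]
by_salary = emp_df.sort_values("salary")

positions = [0, 2, 3]  # positions to remove (illustrative)

# Translate positions into labels, then drop by label
labels = by_salary.index[positions]
remaining = by_salary.drop(index=labels)

print(labels.tolist())           # [2, 3, 0]
print(remaining.index.tolist())  # [4, 1, 5]
```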

Remove first duplicated row

This is a tangential question from a reader: how do you get rid of the first occurrence of a duplicated row? By default, drop_duplicates() keeps the first occurrence and removes the later ones. The trick is to pass keep='last', which keeps the last occurrence and drops the earlier ones. Note that keep=False deletes every occurrence of a duplicated record.

emp_df.drop_duplicates(keep='last')
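In the sample emp_df no two rows match across all columns (the salaries differ), so drop_duplicates() on the whole frame removes nothing. To see the keep parameter in action, this sketch compares rows on the area and language columns only, using the subset parameter; that choice of columns is an assumption for illustration. Rows 2 and 4 are both Buenos Aires / R.

```python
import pandas as pd

emp_df = pd.DataFrame({
    "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
    "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
    "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
})

cols = ["area", "language"]  # compare only these columns (illustrative choice)

keep_first = emp_df.drop_duplicates(subset=cols)               # drops row 4
keep_last = emp_df.drop_duplicates(subset=cols, keep="last")   # drops row 2
keep_none = emp_df.drop_duplicates(subset=cols, keep=False)    # drops rows 2 and 4

# Whole-row comparison: nothing is duplicated, so all 6 rows remain
whole_rows = emp_df.drop_duplicates(keep="last")

print(keep_first.index.tolist())  # [0, 1, 2, 3, 5]
print(keep_last.index.tolist())   # [0, 1, 3, 4, 5]
print(keep_none.index.tolist())   # [0, 1, 3, 5]
print(len(whole_rows))            # 6
```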
