In today’s quick data analysis tutorial we’ll learn how to remove the first row, or the first few rows, of a pandas DataFrame.
Remove first rows in pandas – Summary
Option 1: drop by row label
mydf.drop(labels='label_first_row', inplace=True)
Option 2: using the row index
mydf.drop(index=0, inplace=True)
Option 3: using the iloc accessor
mydf.iloc[1:]
Delete the first row in pandas – Example
Let’s walk through some use cases in which you need to drop one or more rows.
Creating the DataFrame
We will start by importing the pandas library. Next we’ll define a sample dataset.
import pandas as pd
language = ['R', 'Javascript', 'R', 'Python', 'R', 'Javascript']
area = ['Paris', 'Rio de Janeiro', 'Buenos Aires', 'New York', 'Buenos Aires', 'London']
salary = [149.0, 157.0, 117.0, 146.0, 130.0, 191.0]
emp = dict(area=area, language=language, salary=salary)
emp_df = pd.DataFrame(data=emp)
# display the first 5 rows of the DataFrame
emp_df.head()
Here’s the output:
|   | area | language | salary |
|---|---|---|---|
| 0 | Paris | R | 149.0 |
| 1 | Rio de Janeiro | Javascript | 157.0 |
| 2 | Buenos Aires | R | 117.0 |
| 3 | New York | Python | 146.0 |
| 4 | Buenos Aires | R | 130.0 |
Drop first row from the DataFrame
We can use the DataFrame drop() method here (the same method can be used to select and drop any single record by its index label):
emp_df.drop(index=0)
Note: drop() returns a new DataFrame and leaves emp_df unchanged. To modify emp_df itself, add the inplace=True parameter (the examples that follow assume the original, unmodified emp_df):
emp_df.drop(index=0, inplace=True)
Alternatively:
emp_df.iloc[1:]
Slicing with iloc also returns a new DataFrame, so assign the result to a variable to keep it:
emp_df_2 = emp_df.iloc[1:]
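To make the difference between returning a new object and modifying in place concrete, here is a minimal sketch. The small frame and the variable names (df, trimmed, sliced, renumbered) are illustrative stand-ins, not part of the tutorial's dataset. The reset_index call is an optional extra for renumbering the rows afterwards.

```python
import pandas as pd

# Small stand-in frame (hypothetical data, used only for this sketch)
df = pd.DataFrame({"city": ["Paris", "London", "New York"],
                   "salary": [149.0, 191.0, 146.0]})

# drop() returns a new DataFrame; df itself is left untouched
trimmed = df.drop(index=0)

# iloc slicing also returns a new object, so keep a reference to it
sliced = df.iloc[1:]

# Optionally renumber the remaining rows from 0
renumbered = trimmed.reset_index(drop=True)

print(len(df), len(trimmed), len(sliced))
print(list(renumbered.index))
```

Both approaches leave the original frame at its full length. Only inplace=True or a reassignment changes what the original name refers to.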
Remove first two rows
Using the iloc accessor:
emp_df.iloc[2:]
Here’s our output:
|   | area | language | salary |
|---|---|---|---|
| 2 | Buenos Aires | R | 117.0 |
| 3 | New York | Python | 146.0 |
| 4 | Buenos Aires | R | 130.0 |
| 5 | London | Javascript | 191.0 |
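The same slicing idea generalizes to any number of leading rows. The helper below is a sketch only: the name drop_first_rows and the sample frame are made up for illustration. It also shows an equivalent drop() call, which assumes the index labels are unique.

```python
import pandas as pd

def drop_first_rows(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a copy of df without its first n rows (selected by position)."""
    return df.iloc[n:].copy()

# Hypothetical sample with string labels, to show that iloc ignores labels
sample = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

result = drop_first_rows(sample, 2)

# Equivalent with drop(): look up the labels of the first two positions
via_drop = sample.drop(index=sample.index[:2])

print(result)
```

Because iloc works on positions, this behaves the same whether the index holds integers, strings, or dates.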
Delete multiple rows off your DataFrame
Removing several rows is just as easy. We already saw how to slice with the iloc accessor; to remove arbitrary rows, pass a list of index labels to drop():
emp_df.drop(index=[0, 2, 3])
# or alternatively
rows = [0, 2, 3]
emp_df.drop(index=rows)
The relevant rows are removed, as can be seen below:
|   | area | language | salary |
|---|---|---|---|
| 1 | Rio de Janeiro | Javascript | 157.0 |
| 4 | Buenos Aires | R | 130.0 |
| 5 | London | Javascript | 191.0 |
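Note that drop() matches index labels, not positions. In emp_df the two coincide because the index is the default 0..5 range. When the index holds other labels, one way to drop rows by position is to translate positions into labels first. This is a sketch with a made-up frame and made-up variable names:

```python
import pandas as pd

# Hypothetical frame whose index is made of month names
sample = pd.DataFrame(
    {"language": ["R", "Swift", "Ruby", "Go"]},
    index=["June", "July", "August", "September"],
)

positions = [0, 2]                         # rows to remove, by position
labels_to_drop = sample.index[positions]   # the labels at those positions
remaining = sample.drop(index=labels_to_drop)

print(remaining)
```

This relies on unique index labels. With repeated labels, drop() removes every row that carries a matching label.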
Remove first duplicated row
This is a tangential question from a reader: how do you get rid of the first occurrence of a duplicated row? By default, drop_duplicates() keeps the first occurrence and removes the later ones. The trick is to pass the keep='last' parameter, which keeps the last occurrence instead. Note that using keep=False deletes all duplicated records.
emp_df.drop_duplicates(keep='last')
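One caveat: in the sample data no two rows match on every column, so the call above leaves emp_df unchanged. The sketch below rebuilds the same data and uses the subset parameter, so that rows count as duplicates when area and language match. The choice of columns is an assumption made for illustration.

```python
import pandas as pd

emp_df = pd.DataFrame({
    "area": ["Paris", "Rio de Janeiro", "Buenos Aires", "New York", "Buenos Aires", "London"],
    "language": ["R", "Javascript", "R", "Python", "R", "Javascript"],
    "salary": [149.0, 157.0, 117.0, 146.0, 130.0, 191.0],
})

# No two rows are identical across all columns, so nothing is removed
unchanged = emp_df.drop_duplicates(keep="last")

# Compare on area and language only: rows 2 and 4 are now duplicates
cols = ["area", "language"]
keep_first = emp_df.drop_duplicates(subset=cols)               # removes row 4
keep_last = emp_df.drop_duplicates(subset=cols, keep="last")   # removes row 2
keep_none = emp_df.drop_duplicates(subset=cols, keep=False)    # removes rows 2 and 4

print(list(keep_last.index))
```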
Get and Write Pandas DataFrame first rows
Next in this tutorial we’ll quickly find out how to extract the first row of a Pandas DataFrame to a list.
Create our example DataFrame
We will get started by quickly importing the Pandas library and creating a simple DataFrame that we can use for this example.
import pandas as pd
# Define data using lists
month = ['June', 'November', 'December']
language = ['R', 'Swift', 'Ruby']
first_interview = [71, 74, 76]
second_interview = [68, 53, 56]
# Construct the DataFrame
hr_data = dict(language=language, interview_1=first_interview, interview_2=second_interview)
hr_df = pd.DataFrame(data=hr_data, index=month)
Get the first row of a Pandas DataFrame
To look at the first row of our data we’ll use the head() method:
hr_df.head(1)
|   | language | interview_1 | interview_2 |
|---|---|---|---|
| June | R | 71 | 68 |
Exporting the first DataFrame row as list
There are several options here; we’ll focus on the iloc and loc indexers.
Using iloc to fetch the first row by integer location (in this case 0):
first_rec = hr_df.iloc[0]
Using loc to select the first row using its label:
first_rec = hr_df.loc['June']
In both cases the first row values are retrieved into a Pandas Series. We can then use the to_list() method to export the Series to a list:
first_rec.to_list()
And our result will be:
['R', 71, 68]
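Put together, the two lookups can be checked against each other. The sketch below rebuilds hr_df from the data above. The variable names and the to_dict() call, which keeps the column names alongside the values, are additions for illustration.

```python
import pandas as pd

hr_df = pd.DataFrame(
    {"language": ["R", "Swift", "Ruby"],
     "interview_1": [71, 74, 76],
     "interview_2": [68, 53, 56]},
    index=["June", "November", "December"],
)

by_position = hr_df.iloc[0]     # first row, by integer location
by_label = hr_df.loc["June"]    # same row, by index label

row_as_list = by_position.to_list()   # values only
row_as_dict = by_position.to_dict()   # column name -> value

print(row_as_list)
print(row_as_dict)
```

Unlike head(1), which returns a one-row DataFrame, both iloc[0] and loc['June'] return a Series whose index holds the column names.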
Exporting the first record to an array
We can use the to_numpy() method to retrieve the row values as a NumPy array:
first_rec.to_numpy()
#This will result in
array(['R', 71, 68], dtype=object)
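The dtype=object in the result comes from the row mixing a string with integers. If only the numeric values are needed, one option is to select the numeric columns before taking the row. This is a sketch, with hr_df rebuilt from the data above and the variable names mixed and scores chosen for illustration:

```python
import pandas as pd

hr_df = pd.DataFrame(
    {"language": ["R", "Swift", "Ruby"],
     "interview_1": [71, 74, 76],
     "interview_2": [68, 53, 56]},
    index=["June", "November", "December"],
)

# The full row mixes a string and integers, so the array has dtype=object
mixed = hr_df.iloc[0].to_numpy()

# Restricting to the numeric columns first keeps an integer dtype
scores = hr_df[["interview_1", "interview_2"]].iloc[0].to_numpy()

print(mixed.dtype, scores.dtype)
```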
Get first DataFrame column to a list
For completeness, here is a simple snippet that uses the iloc indexer to export the first column (location = 0) to a list.
hr_df.iloc[:,0].to_list()
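When the column name is known, selecting by name gives the same result as selecting by position. A short sketch, with hr_df rebuilt from the data above and the variable names chosen for illustration:

```python
import pandas as pd

hr_df = pd.DataFrame(
    {"language": ["R", "Swift", "Ruby"],
     "interview_1": [71, 74, 76],
     "interview_2": [68, 53, 56]},
    index=["June", "November", "December"],
)

first_col_by_position = hr_df.iloc[:, 0].to_list()   # column at location 0
first_col_by_name = hr_df["language"].to_list()      # same column, by name

# The row labels live in the index, not in a column
row_labels = hr_df.index.to_list()

print(first_col_by_position)
print(row_labels)
```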