In this tutorial we’ll learn how to remove the last row, or the last several rows, of a pandas DataFrame.
We’ll be touching on several cases:
- Getting the last (or last n) rows in a DataFrame.
- Removing the last (or last n) rows from the DataFrame.
- Dropping all rows except the first row.
- Dropping the last column.
Example data
We’ll start by defining a simple DataFrame that you can use in order to follow along with this exercise.
import pandas as pd
month = ['March', 'March', 'March', 'April', 'April', 'March']
language = ['Java', 'Javascript', 'Javascript', 'R', 'R', 'Javascript']
salary = [138.0, 138.0, 108.0, 109.0, 109.0, 127.0]
salaries = dict(month=month, language=language, salary=salary)
salary_df = pd.DataFrame(data=salaries)
salary_df
Here’s our small DataFrame:
| | month | language | salary |
|---|---|---|---|
| 0 | March | Java | 138.0 |
| 1 | March | Javascript | 138.0 |
| 2 | March | Javascript | 108.0 |
| 3 | April | R | 109.0 |
| 4 | April | R | 109.0 |
| 5 | March | Javascript | 127.0 |
Get the last row of a Pandas DataFrame
We are already familiar with the head() DataFrame method, which fetches the first rows of a DataFrame. Conversely, the tail() method retrieves the last rows:
salary_df.tail(1)
This retrieves the last row:
| | month | language | salary |
|---|---|---|---|
| 5 | March | Javascript | 127.0 |
Note that we can retrieve more rows from the tail of the DataFrame. In this example, we fetch the last 3 rows:
n = 3
salary_df.tail(n)
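As a minimal sketch of the same idea, tail(n) can be compared with positional slicing through iloc[-n:], which selects the same rows when n is greater than zero. The DataFrame is rebuilt here so the snippet runs on its own, and the names last_n_tail and last_n_iloc are only illustrative.

```python
import pandas as pd

# Same example data as above, rebuilt so this snippet is self-contained.
salary_df = pd.DataFrame({
    "month": ["March", "March", "March", "April", "April", "March"],
    "language": ["Java", "Javascript", "Javascript", "R", "R", "Javascript"],
    "salary": [138.0, 138.0, 108.0, 109.0, 109.0, 127.0],
})

n = 3
last_n_tail = salary_df.tail(n)      # last n rows via tail()
last_n_iloc = salary_df.iloc[-n:]    # last n rows via positional slicing

# Both approaches select the rows with index labels 3, 4 and 5.
print(last_n_tail.equals(last_n_iloc))
```

One difference worth keeping in mind: tail(0) returns an empty DataFrame, whereas iloc[-0:] is the same as iloc[0:] and returns every row.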
Drop the last row from the DataFrame
We can now use the drop() method to remove the last row from our DataFrame. drop() removes rows by index label, so we first look up the label of the last row with tail(1).index:
last_row = salary_df.tail(1).index
salary_df.drop(last_row, inplace=True)
Passing inplace=True persists the change in the original DataFrame. If you don’t want to modify your DataFrame, omit it and assign the result to a new DataFrame instead:
new_df = salary_df.drop(last_row)
Drop the last n rows
In a similar fashion, we can drop the last n rows:
n = 3
last_n_rows = salary_df.tail(n).index
salary_df.drop(last_n_rows, inplace=True)
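Because drop() works on index labels, it removes every row that carries a given label. The sketch below assumes a small made-up DataFrame with a repeated label to show the effect, and contrasts it with a position-based alternative. The helper name drop_last_n is hypothetical, not part of pandas.

```python
import pandas as pd

# Hypothetical data with a repeated index label ("x") to show the difference.
df = pd.DataFrame(
    {"salary": [138.0, 108.0, 109.0, 127.0]},
    index=["x", "y", "z", "x"],
)

# Label-based: drops every row labelled "x", not just the last one.
by_label = df.drop(df.tail(1).index)


def drop_last_n(frame, n):
    """Return a copy of frame without its last n rows, selected by position."""
    keep = max(len(frame) - n, 0)  # guard against n = 0 and n > len(frame)
    return frame.iloc[:keep].copy()


# Position-based: removes only the final row, whatever its label is.
by_position = drop_last_n(df, 1)

print(len(by_label), len(by_position))
```

With the default RangeIndex used in this tutorial the labels are unique, so both approaches give the same result. The positional version is simply the safer choice when the index may contain duplicates.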
Removing all rows except the first
We can just as easily drop every row except the first. Without inplace=True, the call returns a new DataFrame and leaves salary_df unchanged:
all_rows_except_first = salary_df.tail(len(salary_df) - 1).index
salary_df.drop(all_rows_except_first)
Here’s our result:
| | month | language | salary |
|---|---|---|---|
| 0 | March | Java | 138.0 |
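If the goal is only to keep the first row, selecting it directly is a shorter route than dropping everything else. The following is a minimal sketch using head(1), with the example data rebuilt so it runs on its own. The name first_row is only illustrative.

```python
import pandas as pd

# Same example data as above, rebuilt so this snippet is self-contained.
salary_df = pd.DataFrame({
    "month": ["March", "March", "March", "April", "April", "March"],
    "language": ["Java", "Javascript", "Javascript", "R", "R", "Javascript"],
    "salary": [138.0, 138.0, 108.0, 109.0, 109.0, 127.0],
})

# Keep only the first row; the original DataFrame is left untouched.
first_row = salary_df.head(1)

print(first_row)
```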
Removing the last column from your DataFrame
So far we have dealt with rows, but a similar technique also lets us get rid of specific columns.
# find the last label in the column index
last_col = salary_df.columns[-1]
# drop that column; the result is a new DataFrame
new_df = salary_df.drop(last_col, axis=1)
Note the use of axis=1, which specifies that we want to remove a column rather than a row.
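As a self-contained sketch of the column case, the snippet below removes the last column in two ways: by name, using the columns keyword of drop() (equivalent to passing axis=1), and by position, using iloc. The names by_name and by_position are only illustrative.

```python
import pandas as pd

# Same example data as above, rebuilt so this snippet is self-contained.
salary_df = pd.DataFrame({
    "month": ["March", "March", "March", "April", "April", "March"],
    "language": ["Java", "Javascript", "Javascript", "R", "R", "Javascript"],
    "salary": [138.0, 138.0, 108.0, 109.0, 109.0, 127.0],
})

last_col = salary_df.columns[-1]              # "salary"

by_name = salary_df.drop(columns=last_col)    # label-based removal
by_position = salary_df.iloc[:, :-1]          # all columns except the last

print(list(by_name.columns))
```

Neither call modifies salary_df, so the original three columns remain available afterwards.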