You can delete the first row of an R DataFrame by using the following syntax in your RStudio script or Jupyter Notebook:
# using r-base
my_df_subset <- my_df[-1,]
# using tidyverse
library(dplyr)
my_df_subset <- my_df %>% slice(-1)
Read on for some practical examples.
Creating a sample R DataFrame
office <- c('Bangalore', 'New York', 'Bangalore', 'Osaka', 'Toronto', 'New York')
language <- c('Javascript', 'Python', 'Javascript', 'Java', 'Java', 'Java')
salary <- c(160.0, 170.0, 173.0, 197.0, 171.0, 204.0)
hr <- data.frame(language = language, office = office, salary = salary)
We can visualize the DataFrame rows using the print function:
print(hr)
This prints the following six rows:
| | language | office | salary |
---|---|---|---|
1 | Javascript | Bangalore | 160 |
2 | Python | New York | 170 |
3 | Javascript | Bangalore | 173 |
4 | Java | Osaka | 197 |
5 | Java | Toronto | 171 |
6 | Java | New York | 204 |
Delete first row in R DataFrame using dplyr
The dplyr package has the very handy slice function, which selects or drops DataFrame rows according to their position index. A negative index drops the row at that position:
library(dplyr)
hr_subset <- hr %>% slice(-1)
Drop first n rows with dplyr
In a similar fashion we can also remove two or more rows. Let’s assume for example that we would like to filter out the first 3 rows in our R DataFrame:
rows_to_remove <- 3
library(dplyr)
hr_subset <- hr %>% slice((rows_to_remove + 1):nrow(hr))
Explanation: We pass a range of row indexes to the slice function. The parentheses around rows_to_remove + 1 are required because R evaluates the : operator before +. The function nrow() returns the number of rows in the DataFrame, which in our case is six. So effectively we are passing the following:
hr_subset <- hr %>% slice(4 : 6)
print(hr_subset)
Which returns the last 3 rows of our DataFrame:
1 | Java | Osaka | 197 |
2 | Java | Toronto | 171 |
3 | Java | New York | 204 |
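When building a row range like the one passed to slice, keep in mind that `:` binds more tightly than `+` in R. The short base R sketch below (the variable names `n_rows`, `unparenthesized` and `parenthesized` are illustrative only) compares the two ways of writing the range for a six-row DataFrame:

```r
# Operator precedence: `:` is evaluated before `+`
rows_to_remove <- 3
n_rows <- 6  # number of rows in the sample DataFrame

# Without parentheses this is 3 + (1:6), a six-element vector that runs past row 6
unparenthesized <- rows_to_remove + 1:n_rows

# With parentheses this is the intended range from row 4 to row 6
parenthesized <- (rows_to_remove + 1):n_rows

print(unparenthesized)
print(parenthesized)
```

Wrapping the start of the range in parentheses keeps the positions within the actual rows of the DataFrame.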
Remove first DataFrame rows with R base
Removing the first row is also simple with r-base:
hr_subset <- hr[-1,]
Removing first N rows:
rows_to_remove <- 3
hr_subset <- hr[(rows_to_remove + 1):nrow(hr), ]
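Note: If you omit the trailing comma when subsetting rows, R interprets the index as a column selection. A negative index such as hr[-1] then silently drops the first column, while a positive range such as hr[4:6] fails on our three-column DataFrame with the following error: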
undefined columns selected
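The effect of the missing comma can be reproduced with a minimal sketch. It rebuilds the sample DataFrame so it runs on its own; the names `with_comma`, `err_msg` and `no_comma_negative` are illustrative only:

```r
# Sample DataFrame with the same three columns as `hr` above
hr <- data.frame(
  language = c("Javascript", "Python", "Javascript", "Java", "Java", "Java"),
  office   = c("Bangalore", "New York", "Bangalore", "Osaka", "Toronto", "New York"),
  salary   = c(160, 170, 173, 197, 171, 204)
)

# With the comma: rows 4 to 6, all columns
with_comma <- hr[4:6, ]

# Without the comma the index selects columns; hr has only 3, so 4:6 fails
err_msg <- tryCatch(
  {
    hr[4:6]
    NA_character_  # only reached if no error was raised
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# A negative index without the comma raises no error: it drops the first column
no_comma_negative <- hr[-1]
print(names(no_comma_negative))
```

Because the negative form fails silently, it is worth checking the dimensions of the result with dim() after subsetting.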
Remove specific DataFrame rows
Let’s assume that we would like to remove only the second and third rows:
hr_subset <- hr[-c(2,3),]
This will return a DataFrame subset made of four rows.
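One detail worth knowing: base R subsetting keeps the original row labels, so the subset is labelled 1, 4, 5, 6 rather than 1 to 4. The sketch below rebuilds the sample DataFrame so it is self-contained and shows one way to renumber the rows; whether you need this step depends on how you use the labels later. The names `labels_before` and `labels_after` are illustrative only.

```r
# Sample DataFrame with the same three columns as `hr` above
hr <- data.frame(
  language = c("Javascript", "Python", "Javascript", "Java", "Java", "Java"),
  office   = c("Bangalore", "New York", "Bangalore", "Osaka", "Toronto", "New York"),
  salary   = c(160, 170, 173, 197, 171, 204)
)

# Drop the second and third rows
hr_subset <- hr[-c(2, 3), ]

# The surviving rows keep their original labels
labels_before <- rownames(hr_subset)
print(labels_before)

# Assigning NULL resets the labels to a plain 1..n sequence
rownames(hr_subset) <- NULL
labels_after <- rownames(hr_subset)
print(labels_after)
```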
Dropping the last row
To drop the last row, you can use the following syntax:
Using dplyr:
library(dplyr)
hr_subset <- hr %>% slice(-nrow(hr))
Using r-base:
hr_subset <- hr[-nrow(hr), ]
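As an alternative sketch, the head() and tail() functions that ship with R accept a negative n, meaning "all rows except that many". The example rebuilds the sample DataFrame so it is self-contained; the names `without_last` and `without_first` are illustrative only.

```r
# Sample DataFrame with the same three columns as `hr` above
hr <- data.frame(
  language = c("Javascript", "Python", "Javascript", "Java", "Java", "Java"),
  office   = c("Bangalore", "New York", "Bangalore", "Osaka", "Toronto", "New York"),
  salary   = c(160, 170, 173, 197, 171, 204)
)

# head() with a negative n keeps everything except the last row
without_last <- head(hr, -1)

# tail() with a negative n keeps everything except the first row
without_first <- tail(hr, -1)

print(without_last)
print(without_first)
```

This form avoids computing nrow(hr) by hand and extends naturally to dropping the last or first few rows by changing the number.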