Filter rows by index in R dataframes
You can subset one or more rows of an R DataFrame by their index using the following syntax:
# subset a single row using base R
your_df[row_number, ]
# choose multiple rows using base R
your_df[row_number_vector, ]
# subset rows using dplyr
library(dplyr)
subset <- slice(your_df, row_number_vector)
We’ll start by creating the following example DataFrame that you can use to follow along:
# create vectors
month <- c('October', 'December', 'October', 'June', 'June', 'December')
office <- c('Hong Kong', 'Toronto', 'Hong Kong', 'Buenos Aires', 'Toronto', 'Los Angeles')
salary <- c(134.0, 99.0, 234.0, 134.0, 86.0, 186.0)
# initialize DataFrame
hrdf <- data.frame(month = month, office = office, salary = salary)
Printing hrdf displays the following DataFrame:
| | month | office | salary |
|---|---|---|---|
| 1 | October | Hong Kong | 134 |
| 2 | December | Toronto | 99 |
| 3 | October | Hong Kong | 234 |
| 4 | June | Buenos Aires | 134 |
| 5 | June | Toronto | 86 |
| 6 | December | Los Angeles | 186 |
Subset a specific single row by index
To choose a single row by index, pass the row number inside square brackets followed by a comma; leaving the column position after the comma empty keeps all columns. The following snippet selects the second row of our DataFrame. Note that in R the index count starts at 1, unlike in Python / pandas, where it starts at 0.
second_row <- hrdf[2,]
Printing second_row returns:
| | month | office | salary |
|---|---|---|---|
| 2 | December | Toronto | 99 |
Note: without the comma, hrdf[2] returns the second column (office) as a one-column DataFrame, not the second row.
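To make the difference between row and column indexing concrete, the sketch below re-creates hrdf and compares the shapes returned by three bracket expressions. It also shows a consequence of 1-based indexing: index 0 does not refer to the first row, it simply selects no rows.

```r
# Re-create the example DataFrame so this snippet runs on its own
hrdf <- data.frame(
  month  = c("October", "December", "October", "June", "June", "December"),
  office = c("Hong Kong", "Toronto", "Hong Kong", "Buenos Aires", "Toronto", "Los Angeles"),
  salary = c(134, 99, 234, 134, 86, 186)
)

row_df <- hrdf[2, ]   # row 2, all columns: a 1 x 3 DataFrame
col_df <- hrdf[2]     # no comma: column 2 ("office") as a 6 x 1 DataFrame
empty  <- hrdf[0, ]   # index 0 matches no row: a 0 x 3 DataFrame

dim(row_df)   # 1 3
dim(col_df)   # 6 1
dim(empty)    # 0 3
```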
Choose multiple rows by index
To select multiple rows, pass a vector of row indices before the comma:
row_vector <- c(2,3,4)
hrdf[row_vector,]
This will return the second, third and fourth rows.
We could have achieved the same output by passing a range of rows:
hrdf[2:4, ]
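A quick way to confirm that the explicit vector and the range select exactly the same rows is identical(). The sketch below re-creates hrdf, runs that check, and also uses seq() to build another index pattern (every second row), which is an illustration of the same idea rather than something from the original example.

```r
# Re-create the example DataFrame so this snippet runs on its own
hrdf <- data.frame(
  month  = c("October", "December", "October", "June", "June", "December"),
  office = c("Hong Kong", "Toronto", "Hong Kong", "Buenos Aires", "Toronto", "Los Angeles"),
  salary = c(134, 99, 234, 134, 86, 186)
)

by_vector <- hrdf[c(2, 3, 4), ]   # explicit vector of row indices
by_range  <- hrdf[2:4, ]          # the same indices written with the colon operator

identical(by_vector, by_range)    # TRUE

# seq() builds other index vectors, e.g. every second row starting at row 1
every_other <- hrdf[seq(1, nrow(hrdf), by = 2), ]
rownames(every_other)             # "1" "3" "5"
```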
Filter R DataFrame rows with dplyr
Whenever possible I use the dplyr library, which simplifies data wrangling in R. In this case, we start by loading dplyr (install it first in your R environment, for example with install.packages("dplyr")). We then call the dplyr slice() function, passing the DataFrame and a vector of row indices (in this case, the second and fifth rows).
library(dplyr)
subset <- slice(hrdf, c(2, 5))
print(subset)
This will return the following rows:
| | month | office | salary |
|---|---|---|---|
| 1 | December | Toronto | 99 |
| 2 | June | Toronto | 86 |
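Base R bracket subsetting and slice() return the same rows but label them differently: hrdf[c(2, 5), ] keeps the original row names 2 and 5, whereas slice() renumbers the result from 1. The sketch below checks this on a re-created hrdf. It assumes dplyr might not be installed and skips the dplyr part in that case, calling it as dplyr::slice() so the package does not need to be attached.

```r
# Re-create the example DataFrame so this snippet runs on its own
hrdf <- data.frame(
  month  = c("October", "December", "October", "June", "June", "December"),
  office = c("Hong Kong", "Toronto", "Hong Kong", "Buenos Aires", "Toronto", "Los Angeles"),
  salary = c(134, 99, 234, 134, 86, 186)
)

# Base R keeps the original row names of the selected rows
base_rows <- hrdf[c(2, 5), ]
print(rownames(base_rows))       # "2" "5"

# Run the dplyr comparison only if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  sliced_rows <- dplyr::slice(hrdf, c(2, 5))
  print(rownames(sliced_rows))   # "1" "2": slice() renumbers the rows
  # The column values match; only the row labels differ
  print(identical(sliced_rows$salary, base_rows$salary))   # TRUE
}
```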
Select the last row by index
The nrow() function returns the number of rows in a DataFrame. Because R indices start at 1, that number is also the index of the last row, so we can use it to subset the last row:
hrdf[nrow(hrdf),]
For completeness, you can do the same using the tail() function:
tail(hrdf,1)
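One pitfall for readers coming from Python: a negative index in R does not count from the end, it drops rows, so hrdf[-1, ] returns every row except the first. The sketch below re-creates hrdf, shows that behavior, and generalizes the nrow() approach to the last n rows with a hypothetical helper, last_rows(), which is not part of base R or dplyr. If dplyr is installed, it also prints two dplyr equivalents of tail(hrdf, 1): slice() with n(), and slice_tail(), which assumes dplyr 1.0.0 or later.

```r
# Re-create the example DataFrame so this snippet runs on its own
hrdf <- data.frame(
  month  = c("October", "December", "October", "June", "June", "December"),
  office = c("Hong Kong", "Toronto", "Hong Kong", "Buenos Aires", "Toronto", "Los Angeles"),
  salary = c(134, 99, 234, 134, 86, 186)
)

# A negative index removes rows: this keeps rows 2 to 6 rather than selecting the last row
without_first <- hrdf[-1, ]

# Hypothetical helper (not part of base R): select the last n rows by index
last_rows <- function(df, n = 1) {
  stopifnot(n >= 0)
  n <- min(n, nrow(df))                           # never request more rows than exist
  df[nrow(df) - n + seq_len(n), , drop = FALSE]   # e.g. n = 2 on 6 rows -> rows 5 and 6
}

print(last_rows(hrdf, 2))   # same rows as tail(hrdf, 2)

# dplyr equivalents of tail(hrdf, 1), run only if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::slice(hrdf, dplyr::n()))   # n() is the row count inside slice()
  print(dplyr::slice_tail(hrdf, n = 1))   # slice_tail() assumes dplyr >= 1.0.0
}
```

Capping n with min() mirrors how tail() behaves when asked for more rows than exist, and drop = FALSE keeps the result a DataFrame even for single-column inputs.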