How to select rows by index in an R dataframe?

Filter rows by index in R dataframes

You can subset one or multiple rows from an R DataFrame using the following R syntax:

# subset a single row using base R
your_df[row_number,]

# choose multiple rows using base R
your_df[row_number_vector,]

# subset rows using dplyr
library(dplyr)
subset <- slice(your_df, row_number_vector)

We’ll start by creating the following example DataFrame that you can use to follow along:

# create vectors
month <- c ('October', 'December', 'October', 'June', 'June', 'December')
office <- c ('Hong Kong', 'Toronto', 'Hong Kong', 'Buenos Aires', 'Toronto', 'Los Angeles')
salary <- c (134.0, 99.0, 234.0, 134.0, 86.0, 186.0)

# initialize DataFrame
hrdf <- data.frame (month = month, office = office, salary = salary)

Printing hrdf displays the following R DataFrame:

     month       office salary
1  October    Hong Kong    134
2 December      Toronto     99
3  October    Hong Kong    234
4     June Buenos Aires    134
5     June      Toronto     86
6 December  Los Angeles    186

Subset a specific single row by index

To choose a single row by index, we simply pass the row number inside the brackets, before the comma. The following snippet selects the second row of our DataFrame. Note that in R the index count starts at 1, unlike in Python / pandas, where it starts at 0.

second_row <- hrdf[2,]
print(second_row)

#      month  office salary
# 2 December Toronto     99
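Because R counts from 1, index 1 refers to the first row, and index 0 refers to no row at all. The short sketch below illustrates this; it rebuilds the hrdf example DataFrame so it runs on its own, and the names first_row and no_rows are just illustrative.

```r
# Rebuild the example DataFrame from above so this snippet runs on its own
# (stringsAsFactors = FALSE is the default since R 4.0; set explicitly for older R)
hrdf <- data.frame(
  month  = c('October', 'December', 'October', 'June', 'June', 'December'),
  office = c('Hong Kong', 'Toronto', 'Hong Kong', 'Buenos Aires', 'Toronto', 'Los Angeles'),
  salary = c(134, 99, 234, 134, 86, 186),
  stringsAsFactors = FALSE
)

first_row <- hrdf[1, ]   # index 1 is the first row in R
no_rows   <- hrdf[0, ]   # index 0 matches nothing: a DataFrame with zero rows

print(first_row$office)  # [1] "Hong Kong"
print(nrow(no_rows))     # [1] 0
```

The zero-row result still keeps all three columns, so code that expects the usual column names keeps working.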

Note: The statement hrdf[2] (without a comma) returns the second column of your DataFrame as a one-column DataFrame, not the second row.
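To make the difference concrete, here is a small sketch comparing the two forms on the example data (rebuilt so the snippet is self-contained); col_two and row_two are illustrative names only.

```r
# Rebuild the example DataFrame from above so this snippet runs on its own
hrdf <- data.frame(
  month  = c('October', 'December', 'October', 'June', 'June', 'December'),
  office = c('Hong Kong', 'Toronto', 'Hong Kong', 'Buenos Aires', 'Toronto', 'Los Angeles'),
  salary = c(134, 99, 234, 134, 86, 186),
  stringsAsFactors = FALSE
)

col_two <- hrdf[2]     # no comma: selects the second COLUMN (office)
row_two <- hrdf[2, ]   # index before the comma: selects the second ROW

print(dim(col_two))    # [1] 6 1  (6 rows, 1 column)
print(dim(row_two))    # [1] 1 3  (1 row, 3 columns)
print(names(col_two))  # [1] "office"
```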

Choose multiple rows by index

To select multiple rows, we’ll pass a vector containing the row numbers, in the order we want them returned:

row_vector <- c(2,3,4)
hrdf[row_vector,]

This will return the second, third and fourth rows.

We could have achieved the same output by passing a range of rows:

hrdf[2:4,]
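A quick way to confirm that the vector and the range produce the same result is to compare them with identical(). The sketch below (with the example data rebuilt, and by_vector, by_range and reordered as illustrative names) also shows that base R keeps the original row names and returns rows in the order given in the vector.

```r
# Rebuild the example DataFrame from above so this snippet runs on its own
hrdf <- data.frame(
  month  = c('October', 'December', 'October', 'June', 'June', 'December'),
  office = c('Hong Kong', 'Toronto', 'Hong Kong', 'Buenos Aires', 'Toronto', 'Los Angeles'),
  salary = c(134, 99, 234, 134, 86, 186),
  stringsAsFactors = FALSE
)

row_vector <- c(2, 3, 4)
by_vector  <- hrdf[row_vector, ]
by_range   <- hrdf[2:4, ]

# Both calls return the same rows, keeping the original row names
print(identical(by_vector, by_range))  # [1] TRUE
print(rownames(by_range))              # [1] "2" "3" "4"

# The vector also controls the order of the returned rows
reordered <- hrdf[c(4, 2), ]
print(reordered$office)                # [1] "Buenos Aires" "Toronto"
```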

Filter R DataFrame rows with dplyr

Whenever possible, I try to use the dplyr library, which simplifies data wrangling in R. In this case, we’ll start by loading dplyr (you need to install it first in your R development environment, for example with install.packages("dplyr")). We then call the dplyr slice() function, passing the DataFrame and a vector of row numbers (in this case, the second and fifth rows).

library(dplyr)
subset <- slice(hrdf, c(2,5))
print(subset)

This will return the following rows:

     month  office salary
1 December Toronto     99
2     June Toronto     86

Select the last row by index

The nrow() function returns the number of rows in a DataFrame. Because R indexes rows from 1, that number is also the index of the last row, so we can use it to subset the last row of the DataFrame:

hrdf[nrow(hrdf),]

For completeness, you can do the same using the tail() function:

tail(hrdf,1)
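The sketch below (example data rebuilt; last_by_index, last_by_tail and second_last are illustrative names) checks that both approaches pick the same row, and shows that the nrow() arithmetic also reaches other rows counted from the end, such as the second-to-last one.

```r
# Rebuild the example DataFrame from above so this snippet runs on its own
hrdf <- data.frame(
  month  = c('October', 'December', 'October', 'June', 'June', 'December'),
  office = c('Hong Kong', 'Toronto', 'Hong Kong', 'Buenos Aires', 'Toronto', 'Los Angeles'),
  salary = c(134, 99, 234, 134, 86, 186),
  stringsAsFactors = FALSE
)

last_by_index <- hrdf[nrow(hrdf), ]  # nrow(hrdf) is 6, the index of the last row
last_by_tail  <- tail(hrdf, 1)       # tail() returns the last n rows, here n = 1

print(last_by_index$office)  # [1] "Los Angeles"
print(last_by_tail$office)   # [1] "Los Angeles"
print(rownames(last_by_tail)) # [1] "6"

# Subtracting from nrow() selects rows counted from the end
second_last <- hrdf[nrow(hrdf) - 1, ]
print(second_last$office)    # [1] "Toronto"
```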