How to keep specific rows in an R DataFrame?

In today’s R Data Analysis tutorial we will learn how to subset and filter one or multiple rows in an R DataFrame, then keep those rows as a new DataFrame for further analysis and visualization.

In this tutorial we will mostly use the dplyr package, which makes it easier to manipulate R DataFrames.

Initialize R DataFrame

We will start by creating a very simple DataFrame:

#R
# build one vector per column, then combine them into a DataFrame
area <- c("North", "South", "West", "East")
indirect <- c(275, 218, 217, 226)
direct <- c(353, 350, 326, 368)
online <- c(150, 186, 132, 136)
revenue <- data.frame(area = area, indirect = indirect, direct = direct, online = online)

After running this script in RStudio or another R development environment, printing revenue displays the following DataFrame:

   area indirect direct online
1 North      275    353    150
2 South      218    350    186
3  West      217    326    132
4  East      226    368    136

Select by index with base R

We can use the script below to keep specific rows by their position. Using base R, we first define a vector of row positions and then use that vector to subset our DataFrame. The trailing comma inside the brackets means "keep all columns".

# select by index with base R
selected_rows <- c(2, 3, 4)
subset <- revenue[selected_rows, ]
print(subset)

The result is a DataFrame:

   area indirect direct online
2 South      218    350    186
3  West      217    326    132
4  East      226    368    136
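Base R indexing also accepts ranges and negative positions. The following is a minimal sketch that rebuilds the tutorial's revenue DataFrame so it runs on its own; the names by_range and without_first are illustrative and not part of the original example.

```r
# Rebuild the tutorial's example data so this snippet is self-contained
revenue <- data.frame(
  area = c("North", "South", "West", "East"),
  indirect = c(275, 218, 217, 226),
  direct = c(353, 350, 326, 368),
  online = c(150, 186, 132, 136)
)

# A range of positions: keep rows 2 through 4
by_range <- revenue[2:4, ]

# A negative position drops rows instead of keeping them:
# here we keep everything except row 1
without_first <- revenue[-1, ]

print(by_range)
print(without_first)
```

Both calls keep the South, West and East rows. Negative positions are convenient when it is easier to name the rows to discard than the rows to keep.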

Subset rows by position with dplyr

Using the slice function from the dplyr package, we can extract one or more rows at specific positions:

library(dplyr)
selected_rows <- c(2, 3, 4)
subset <- slice(revenue, selected_rows)
print(subset)

Note: make sure to install the dplyr package, for example with install.packages("dplyr"), before loading it. Otherwise, library(dplyr) fails with the following error:

Error in library(dplyr) : there is no package called ‘dplyr’
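One way to avoid that error is to check whether the package is available before loading it. This is a minimal sketch using base R's requireNamespace function; the variable name dplyr_available is illustrative.

```r
# requireNamespace() returns TRUE when the package can be loaded
# and FALSE otherwise, without stopping the script
dplyr_available <- requireNamespace("dplyr", quietly = TRUE)

if (dplyr_available) {
  library(dplyr)
} else {
  # Tell the user how to fix the problem rather than failing later
  message("dplyr is not installed; run install.packages(\"dplyr\") first")
}
```

The check leaves the decision to install with the user, which is usually preferable to installing packages silently from inside a script.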

Extract rows containing certain column values

We can use the filter function provided by dplyr to keep only the rows whose column values meet a condition. Here we keep the areas with online revenue of at least 150:

library(dplyr)
subset <- filter(revenue, online >= 150)
print(subset)

You’ll get the following DataFrame:

   area indirect direct online
1 North      275    353    150
2 South      218    350    186
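For comparison, the same selection can be sketched in base R with a logical vector, which makes the mechanism visible: the condition is evaluated once per row, and only rows marked TRUE are kept. The snippet rebuilds the revenue DataFrame so it runs on its own; keep and high_online are illustrative names.

```r
# Rebuild the tutorial's example data so this snippet is self-contained
revenue <- data.frame(
  area = c("North", "South", "West", "East"),
  indirect = c(275, 218, 217, 226),
  direct = c(353, 350, 326, 368),
  online = c(150, 186, 132, 136)
)

# The condition yields one TRUE/FALSE value per row
keep <- revenue$online >= 150
print(keep)

# Using the logical vector as a row index keeps only the TRUE rows
high_online <- revenue[keep, ]
print(high_online)
```

Here keep is TRUE for North and South only, so those are the two rows retained.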

Subset data by multiple conditions

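In the same fashion, we can combine several conditions with & (and) or | (or) to refine our row selection. Here we keep the areas with online revenue below 150 and direct revenue of at least 350:

library(dplyr)
subset <- filter(revenue, online < 150 & direct >= 350)
print(subset)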

This returns a single-row DataFrame:

  area indirect direct online
1 East      226    368    136
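Conditions can also be joined with | (or), and membership in a set of values can be tested with %in%. The sketch below uses base R indexing so that it runs without any extra package; the thresholds and the names either_condition and chosen_areas are assumptions made for illustration.

```r
# Rebuild the tutorial's example data so this snippet is self-contained
revenue <- data.frame(
  area = c("North", "South", "West", "East"),
  indirect = c(275, 218, 217, 226),
  direct = c(353, 350, 326, 368),
  online = c(150, 186, 132, 136)
)

# OR: keep rows where at least one of the two conditions holds
either_condition <- revenue[revenue$direct < 330 | revenue$online > 180, ]
print(either_condition)

# %in%: keep rows whose area appears in a list of values
chosen_areas <- revenue[revenue$area %in% c("North", "East"), ]
print(chosen_areas)
```

The first selection keeps South (online above 180) and West (direct below 330); the second keeps North and East.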

Keep random rows

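In our last example, we’ll use dplyr's sample_n function to select a given number of rows at random from our data. In our case, we’ll select two rows. Note that replace = TRUE samples with replacement, so the same row can be returned twice; use replace = FALSE to get distinct rows. In dplyr 1.0.0 and later, sample_n is superseded by slice_sample, although it still works.

library(dplyr)
random_subset <- sample_n(revenue, 2, replace = TRUE)
print(random_subset)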
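Random selections differ from run to run unless the random number generator is seeded. The following is a minimal base R sketch of a reproducible draw without replacement; the seed value 42 is arbitrary and the name picked is illustrative.

```r
# Rebuild the tutorial's example data so this snippet is self-contained
revenue <- data.frame(
  area = c("North", "South", "West", "East"),
  indirect = c(275, 218, 217, 226),
  direct = c(353, 350, 326, 368),
  online = c(150, 186, 132, 136)
)

# Fixing the seed makes the "random" choice repeatable (42 is arbitrary)
set.seed(42)

# Draw two distinct row positions; sample() does not replace by default
picked <- sample(nrow(revenue), size = 2)

# Use the drawn positions as a row index
random_subset <- revenue[picked, ]
print(random_subset)
```

Calling set.seed with the same value before the draw returns the same two rows each time, which helps when results need to be shared or checked.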