How to keep specific columns by name in R DataFrames?

Subset R DataFrames by one or multiple column names

To subset an R DataFrame and keep only some of its columns, use the following code:

Base R:

hr_subset <- subset(hr, select = c(language, interviews))

dplyr package:

library(dplyr)
hr_subset <- select(hr, language, interviews)

Initialize DataFrame

To get started, we will create a sample R DataFrame.

language <- c('Java', 'Python', 'Javascript', 'C#', 'R', 'Julia')
month <- c('September', 'September', 'October', 'December', 'October', 'September')
interviews <- c(163, 229, 131, 191, 131, 153)
hr <- data.frame(month = month, language = language, interviews = interviews)
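Before subsetting, it can help to confirm the column names and their positions, since the sections below refer to columns both by name and by index. A minimal sketch using the sample DataFrame; the variable name col_positions is only illustrative.

```r
# Recreate the sample DataFrame from this tutorial
hr <- data.frame(
  month = c("September", "September", "October", "December", "October", "September"),
  language = c("Java", "Python", "Javascript", "C#", "R", "Julia"),
  interviews = c(163, 229, 131, 191, 131, 153)
)

# List the column names in order
names(hr)
# "month" "language" "interviews"

# Look up the positions of the columns we want to keep
col_positions <- match(c("language", "interviews"), names(hr))
col_positions
# 2 3
```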

Keep one or multiple columns with base R

Use the subset function (part of base R) and pass the column names to its select argument to keep only the required columns:

hr_subset <- subset(hr, select = c(language, interviews))

This will return the following DataFrame:

    language interviews
1       Java        163
2     Python        229
3 Javascript        131
4         C#        191
5          R        131
6      Julia        153
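When the column names are stored in a character vector, for example because they are built elsewhere in a script, bracket indexing with that vector is an alternative to subset. The sketch below assumes the sample hr DataFrame from this tutorial; keep_cols is a hypothetical variable name.

```r
# Recreate the sample DataFrame from this tutorial
hr <- data.frame(
  month = c("September", "September", "October", "December", "October", "September"),
  language = c("Java", "Python", "Javascript", "C#", "R", "Julia"),
  interviews = c(163, 229, 131, 191, 131, 153)
)

# keep_cols is a hypothetical name for the vector of columns to keep
keep_cols <- c("language", "interviews")

# Index the columns by name; rows are left unrestricted
hr_subset <- hr[, keep_cols]
names(hr_subset)
# "language" "interviews"
```

Unlike subset, which takes unquoted column names, this form takes quoted names, so it works directly with names held in a variable.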

Subset DataFrame by column index

To keep columns by position rather than by name, index the DataFrame with a vector of column numbers:

hr_subset <- hr[, c(2, 3)]

Because language and interviews are the second and third columns of hr, the result is the same subset we created before.
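One detail to watch with bracket indexing: selecting a single column returns a plain vector by default rather than a DataFrame. Passing drop = FALSE keeps the DataFrame structure. A short sketch on the sample data; the variable names are illustrative.

```r
# Recreate the sample DataFrame from this tutorial
hr <- data.frame(
  month = c("September", "September", "October", "December", "October", "September"),
  language = c("Java", "Python", "Javascript", "C#", "R", "Julia"),
  interviews = c(163, 229, 131, 191, 131, 153)
)

# Selecting one column by index drops the DataFrame structure
one_col_vector <- hr[, 2]
is.data.frame(one_col_vector)
# FALSE

# drop = FALSE keeps the result as a one-column DataFrame
one_col_df <- hr[, 2, drop = FALSE]
is.data.frame(one_col_df)
# TRUE
```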

Select DataFrame columns with dplyr

We can also use dplyr, a widely used data manipulation package for R, to subset our DataFrame by column name:

library(dplyr)
hr_subset <- select(hr, language, interviews)
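If the column names are held in a character vector rather than typed directly into the call, dplyr's all_of helper can be used inside select. The sketch below assumes dplyr is installed and uses the sample hr DataFrame; keep_cols is a hypothetical variable name.

```r
library(dplyr)

# Recreate the sample DataFrame from this tutorial
hr <- data.frame(
  month = c("September", "September", "October", "December", "October", "September"),
  language = c("Java", "Python", "Javascript", "C#", "R", "Julia"),
  interviews = c(163, 229, 131, 191, 131, 153)
)

# keep_cols is a hypothetical name for the vector of columns to keep
keep_cols <- c("language", "interviews")

# all_of() selects exactly the named columns and errors if any is missing
hr_subset <- select(hr, all_of(keep_cols))
names(hr_subset)
# "language" "interviews"
```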

Remark: dplyr must be installed in your R library before you can load it. You can check whether it is installed by running the following command, which returns the package's installation path or raises an error if the package is not found:

find.package("dplyr")
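If you would rather get a TRUE/FALSE answer than an error, base R's requireNamespace can be used instead. This is a minimal sketch: the variable name has_dplyr is illustrative, and the install step is only printed as a hint rather than run automatically.

```r
# requireNamespace returns TRUE if the package can be loaded, FALSE otherwise
has_dplyr <- requireNamespace("dplyr", quietly = TRUE)

if (has_dplyr) {
  message("dplyr is installed")
} else {
  message('dplyr is not installed; run install.packages("dplyr") first')
}
```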
