Subset R DataFrames by one or more column names
To subset an R DataFrame and keep only some of its columns, use the following code:
Base R:
hr_subset <- subset(hr, select = c(language, interviews))
dplyr package:
library(dplyr)
hr_subset <- select(hr, language, interviews)
Initialize DataFrame
To get started, we will create a sample R DataFrame.
language <- c('Java', 'Python', 'Javascript', 'C#', 'R', 'Julia')
month <- c('September', 'September', 'October', 'December', 'October', 'September')
interviews <- c(163, 229, 131, 191, 131, 153)
hr <- data.frame(month = month, language = language, interviews = interviews)
Keep one or more columns with base R
Use the subset function (part of base R) and list the columns to keep in its select argument:
hr_subset <- subset(hr, select = c(language, interviews))
This will return the following DataFrame:
| | language | interviews |
|---|---|---|
| 1 | Java | 163 |
| 2 | Python | 229 |
| 3 | Javascript | 131 |
| 4 | C# | 191 |
| 5 | R | 131 |
| 6 | Julia | 153 |
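As a complementary sketch, the same columns can also be kept with bracket indexing by name, and subset accepts a negative selection to drop a column rather than listing the ones to keep. The object names hr_by_name and hr_without_month are illustrative and not part of the original tutorial.

```r
# Recreate the sample DataFrame from this tutorial
hr <- data.frame(
  month = c("September", "September", "October", "December", "October", "September"),
  language = c("Java", "Python", "Javascript", "C#", "R", "Julia"),
  interviews = c(163, 229, 131, 191, 131, 153),
  stringsAsFactors = FALSE
)

# Keep columns by passing their names as a character vector
hr_by_name <- hr[, c("language", "interviews")]

# Drop a column instead of listing the ones to keep
hr_without_month <- subset(hr, select = -month)

# Both results contain the same two columns
names(hr_by_name)
names(hr_without_month)
```

Dropping by name tends to be more convenient when the DataFrame has many columns and only a few should be removed.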
Subset DataFrame by column index
To keep columns by their position instead of their name, use the following R code. R indexes start at 1, so positions 2 and 3 refer to the language and interviews columns:
hr_subset <- hr[, c(2, 3)]
The result is the same subset we created before.
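One detail worth illustrating: when a single column is selected by index, base R simplifies the result to a plain vector unless drop = FALSE is passed. The sketch below assumes the sample DataFrame from this tutorial; the names languages_vector and languages_df are illustrative.

```r
# Recreate the sample DataFrame from this tutorial
hr <- data.frame(
  month = c("September", "September", "October", "December", "October", "September"),
  language = c("Java", "Python", "Javascript", "C#", "R", "Julia"),
  interviews = c(163, 229, 131, 191, 131, 153),
  stringsAsFactors = FALSE
)

# Selecting one column by index returns a plain vector by default
languages_vector <- hr[, 2]

# drop = FALSE keeps the result as a one-column DataFrame
languages_df <- hr[, 2, drop = FALSE]

class(languages_vector)  # "character"
class(languages_df)      # "data.frame"
```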
Select DataFrame columns with dplyr
dplyr, a widely used data manipulation package for R, provides the select function to subset a DataFrame by column name:
library(dplyr)
hr_subset <- select(hr, language, interviews)
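When the column names are stored in a character vector, or when it is easier to name the column to drop, select can handle both cases. The following is a minimal sketch assuming dplyr 1.0.0 or later (which exports all_of); the variable names wanted_columns and hr_without_month are illustrative.

```r
library(dplyr)

# Recreate the sample DataFrame from this tutorial
hr <- data.frame(
  month = c("September", "September", "October", "December", "October", "September"),
  language = c("Java", "Python", "Javascript", "C#", "R", "Julia"),
  interviews = c(163, 229, 131, 191, 131, 153),
  stringsAsFactors = FALSE
)

# Column names held in a character vector (illustrative variable name)
wanted_columns <- c("language", "interviews")

# all_of() tells select() to treat the vector's values as column names
hr_subset <- select(hr, all_of(wanted_columns))

# A minus sign drops the named column and keeps the rest
hr_without_month <- select(hr, -month)

names(hr_subset)         # "language" "interviews"
names(hr_without_month)  # "language" "interviews"
```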
Remark: dplyr must be installed in your R environment before you can load it. You can check whether it is by running the following command, which returns the package's installation path if dplyr is found and raises an error otherwise:
find.package("dplyr")
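Because find.package stops with an error when the package is missing, a check that returns TRUE or FALSE can be friendlier inside a script. The sketch below uses base R's requireNamespace; the variable name dplyr_available is illustrative.

```r
# requireNamespace() returns TRUE or FALSE instead of raising an error
dplyr_available <- requireNamespace("dplyr", quietly = TRUE)

if (dplyr_available) {
  message("dplyr is installed at: ", find.package("dplyr"))
} else {
  message("dplyr is not installed; run install.packages(\"dplyr\") to add it")
}
```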