To merge two or more R DataFrame columns into one, use one of the following options:
# merge columns using unite
library(tidyverse)
tidyr::unite(my_df, merge_col_name, c(col_1, col_2, col_n), sep = '-')
# combine columns using base R (sep = '-' avoids the spaces paste adds by default)
new_col <- paste(my_df$col_1, my_df$col_2, my_df$col_n, sep = '-')
# concatenate columns using stringr str_c
library(tidyverse)
new_col <- stringr::str_c(my_df$col_1, '-', my_df$col_2, '-', my_df$col_n)
Sample DataFrame
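Before getting started, load the tidyverse package, which bundles the libraries we’ll be using, tidyr and stringr, along with other key libraries such as dplyr and ggplot2. The date helpers used below (ymd, quarter and year) come from lubridate, which library(tidyverse) attaches from tidyverse 2.0.0 onward; on older versions, also run library(lubridate). We then create a simple DataFrame to use in the exercise.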
library(tidyverse)
# parse the hire dates once; ymd(), quarter() and year() come from lubridate
hire_dates <- ymd(c('2022-11-29', '2022-11-30', '2023-01-02', '2023-01-03', '2024-01-04'))
hire_quarter <- quarter(hire_dates)
hire_year <- year(hire_dates)
new_hires <- c(13, 18, 9, 10, 12)
hr <- data.frame(quarter = hire_quarter, year = hire_year, employees = new_hires)
Let’s look into our DataFrame contents:
print(hr)
| | quarter | year | employees |
|---|---|---|---|
| 1 | 4 | 2022 | 13 |
| 2 | 4 | 2022 | 18 |
| 3 | 1 | 2023 | 9 |
| 4 | 1 | 2023 | 10 |
| 5 | 1 | 2024 | 12 |
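To make the quarter and year columns less of a black box, here is a minimal sketch of what the lubridate helpers return for two of the dates above. It assumes lubridate is installed; the variable names demo_dates, demo_quarters and demo_years are illustrative only.

```r
library(lubridate)

# Two of the hire dates used in the sample DataFrame
demo_dates <- ymd(c('2022-11-29', '2023-01-02'))

# quarter() returns the calendar quarter (1 to 4), year() the calendar year
demo_quarters <- quarter(demo_dates)
demo_years <- year(demo_dates)

print(demo_quarters)  # 4 1
print(demo_years)     # 2022 2023
```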
Merge multiple variables into one
The tidyr package delivers the very handy unite function, which combines two or more columns of an existing DataFrame into one. By default it also removes the merged columns from the DataFrame and replaces them with the newly created column.
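# store the result under a new name so that hr keeps quarter and year for the next sections
hr_united <- unite(hr, quarter_year, c(quarter, year), sep = '-')
print(hr_united)
This renders the following DataFrame. Note the new quarter_year column, which replaces quarter and year: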
| | quarter_year | employees |
|---|---|---|
| 1 | 4-2022 | 13 |
| 2 | 4-2022 | 18 |
| 3 | 1-2023 | 9 |
| 4 | 1-2023 | 10 |
| 5 | 1-2024 | 12 |
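If you would rather keep the original columns, unite accepts a remove argument. The sketch below assumes tidyr is installed and retypes the sample data as a hypothetical hr_demo DataFrame so that it runs on its own.

```r
library(tidyr)

# Same values as the sample DataFrame, typed in directly (hr_demo is an illustrative name)
hr_demo <- data.frame(
  quarter = c(4, 4, 1, 1, 1),
  year = c(2022, 2022, 2023, 2023, 2024),
  employees = c(13, 18, 9, 10, 12)
)

# remove = FALSE keeps quarter and year alongside the new quarter_year column
hr_kept <- unite(hr_demo, quarter_year, c(quarter, year), sep = '-', remove = FALSE)
print(names(hr_kept))
print(hr_kept$quarter_year)
```

This can be handy when the separate quarter and year values are still needed for grouping or sorting later on.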
Common errors when using unite:
- Can’t subset columns that don’t exist: the root cause is that one or more of the columns passed to unite are not in the DataFrame. This typically happens after a typo in a column name, or when unite has already run and removed the original columns.
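The error described above can be reproduced with a short sketch. It assumes tidyr is installed and uses a hypothetical two-row hr_demo DataFrame; the exact wording of the message may differ between package versions, so the code prints whatever message it receives rather than assuming one.

```r
library(tidyr)

hr_demo <- data.frame(quarter = c(4, 1), year = c(2022, 2023), employees = c(13, 9))

# The first call succeeds and drops quarter and year from hr_demo
hr_demo <- unite(hr_demo, quarter_year, c(quarter, year), sep = '-')

# The second call fails because quarter and year no longer exist;
# tryCatch captures the error message instead of stopping the script
unite_error <- tryCatch(
  unite(hr_demo, quarter_year, c(quarter, year), sep = '-'),
  error = function(e) conditionMessage(e)
)
print(unite_error)

# Checking the column names before calling unite avoids the error
columns_present <- all(c('quarter', 'year') %in% names(hr_demo))
print(columns_present)  # FALSE
```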
Combine two or more columns with base R
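You can use the paste function to concatenate two or more columns. Note that in this case the new column is added to the DataFrame but the original columns are not removed automatically. Also note that paste separates its arguments with a space by default, so the call below produces values such as 4 - 2022; use paste(hr$quarter, hr$year, sep = '-') to get 4-2022 instead:
hr$quarter_year <- paste(hr$quarter, '-', hr$year)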
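The effect of the separator is easiest to see side by side. This is a small base R sketch with a hypothetical two-row hr_demo DataFrame; it needs no packages.

```r
hr_demo <- data.frame(quarter = c(4, 1), year = c(2022, 2023))

# '-' passed as its own argument: paste joins all three pieces with its default space
with_spaces <- paste(hr_demo$quarter, '-', hr_demo$year)

# '-' passed as the separator: no extra spaces
with_sep <- paste(hr_demo$quarter, hr_demo$year, sep = '-')

# paste0 uses an empty separator, so the literal '-' is all that sits between the values
with_paste0 <- paste0(hr_demo$quarter, '-', hr_demo$year)

print(with_spaces)  # "4 - 2022" "1 - 2023"
print(with_sep)     # "4-2022" "1-2023"
print(with_paste0)  # "4-2022" "1-2023"
```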
To remove the original columns you can use the following command. This creates a new DataFrame with the required columns:
new_hr <- hr[c("quarter_year", "employees")]
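An alternative to listing the columns to keep is to name the columns to drop, which may be more convenient for wide DataFrames. The sketch below uses base R only; hr_demo, merged_cols and new_hr_demo are illustrative names.

```r
hr_demo <- data.frame(quarter = c(4, 1), year = c(2022, 2023), employees = c(13, 9))
hr_demo$quarter_year <- paste(hr_demo$quarter, hr_demo$year, sep = '-')

# Keep every column except the ones that were merged
merged_cols <- c('quarter', 'year')
new_hr_demo <- hr_demo[, !(names(hr_demo) %in% merged_cols), drop = FALSE]

print(names(new_hr_demo))  # "employees" "quarter_year"
```

Unlike the explicit selection above, this keeps the remaining columns in their existing order, so employees comes before quarter_year.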
We can check the column names of the new DataFrame using the names() function:
names (new_hr)
As expected, this returns two column names:
> names(new_hr)
[1] "quarter_year" "employees"
Concatenate R columns with stringr
The third method we can use to combine column values is using the str_c function:
hr$quarter_year2 <- str_c (hr$quarter,'-', hr$year)
print(hr)
This method also creates a new column but doesn’t remove the merged variables (similar to the previous method we reviewed).
| | quarter | year | employees | quarter_year | quarter_year2 |
|---|---|---|---|---|---|
| 1 | 4 | 2022 | 13 | 4 - 2022 | 4-2022 |
| 2 | 4 | 2022 | 18 | 4 - 2022 | 4-2022 |
| 3 | 1 | 2023 | 9 | 1 - 2023 | 1-2023 |
| 4 | 1 | 2023 | 10 | 1 - 2023 | 1-2023 |
| 5 | 1 | 2024 | 12 | 1 - 2024 | 1-2024 |
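One practical difference between str_c and paste shows up with missing values. The following sketch assumes stringr is installed and uses made-up vectors (quarter_demo, year_demo) that include an NA; it also shows that the separator can be passed through the sep argument instead of as a literal '-' between the columns.

```r
library(stringr)

quarter_demo <- c(4, 1, NA)
year_demo <- c(2022, 2023, 2024)

# str_c propagates missing values: any NA input yields NA in the result
with_str_c <- str_c(quarter_demo, year_demo, sep = '-')

# paste converts NA to the text "NA" before joining
with_paste <- paste(quarter_demo, year_demo, sep = '-')

print(with_str_c)  # "4-2022" "1-2023" NA
print(with_paste)  # "4-2022" "1-2023" "NA-2024"
```

Which behaviour is preferable depends on the data; str_c makes missing inputs easy to spot, while paste always returns a string.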