How to merge multiple DataFrame columns into one in R?

To merge two or more R DataFrame columns into one, use one of the following options:

# merge columns using unite
library(tidyverse)
tidyr::unite(my_df, merge_col_name, c(col_1, col_2, col_n), sep = '-')

# combine columns using base R
new_col <- paste(my_df$col_1, my_df$col_2, my_df$col_n, sep = '-')

# concatenate columns using stringr str_c
library(tidyverse)
new_col <- stringr::str_c(my_df$col_1, '-', my_df$col_2, '-', my_df$col_n)

Sample DataFrame

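Before getting started, load the tidyverse package, which attaches libraries we’ll be using, such as tidyr and stringr, as well as other key libraries such as dplyr and ggplot2. The date helpers ymd(), quarter() and year() come from lubridate, which tidyverse only attaches from version 2.0 onwards, so we load it explicitly too. We then create a simple DataFrame to use in the exercise.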

library(tidyverse)
library(lubridate)

hire_dates <- ymd(c('2022-11-29', '2022-11-30', '2023-01-02', '2023-01-03', '2024-01-04'))
hire_quarter <- quarter(hire_dates)
hire_year <- year(hire_dates)

new_hires <- c(13, 18, 9, 10, 12)
hr <- data.frame(quarter = hire_quarter, year = hire_year, employees = new_hires)

Let’s look at the DataFrame contents:

print(hr)
  quarter year employees
1       4 2022        13
2       4 2022        18
3       1 2023         9
4       1 2023        10
5       1 2024        12

Merge multiple columns into one with tidyr unite

The tidyr package delivers the very handy unite() function, which combines two or more columns of an existing DataFrame into one. By default (remove = TRUE), it also removes the merged columns from the DataFrame and replaces them with the newly created column.

hr_united <- unite(hr, quarter_year, c(quarter, year), sep = '-')
print(hr_united)

This renders the following DataFrame. Note the new quarter_year column in place of quarter and year:

  quarter_year employees
1       4-2022        13
2       4-2022        18
3       1-2023         9
4       1-2023        10
5       1-2024        12

Common errors when using unite():

    • Can’t subset columns that don’t exist: the root cause is that one or more of the columns passed to unite() are not in the DataFrame, often because of a typo in a column name or because an earlier step (such as a previous unite() call) already removed them.
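One way to catch this before unite() stops with an error is to compare the requested names against names() of the DataFrame. A minimal sketch, assuming the column names arrive as a character vector; cols_to_merge and missing_cols are hypothetical names, and the "yr" entry is a deliberate typo to trigger the check. all_of() is the tidyselect helper (re-exported by tidyr) for selecting columns from a character vector; it also errors on missing names, so the explicit check mainly gives a clearer message.

```r
library(tidyr)

hr <- data.frame(
  quarter = c(4, 4, 1),
  year = c(2022, 2022, 2023),
  employees = c(13, 18, 9)
)

# Columns we intend to merge; "yr" is a deliberate typo for illustration
cols_to_merge <- c("quarter", "yr")

# Collect any requested columns that are not in the DataFrame
missing_cols <- setdiff(cols_to_merge, names(hr))

if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
} else {
  hr <- unite(hr, quarter_year, all_of(cols_to_merge), sep = "-")
}
```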

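Combine two or more columns with base R

You can use the paste() function to concatenate two or more columns. paste() separates its arguments with a space by default, so set the separator with sep = '-' rather than passing '-' as an extra value. Note that in this case the new column is added to the DataFrame, but the original columns are not removed automatically:

hr$quarter_year <- paste(hr$quarter, hr$year, sep = '-')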

To drop the original columns, keep only the columns you need. This creates a new DataFrame with the required columns:

new_hr <- hr[c("quarter_year", "employees")]

We can check the column names of the new DataFrame using the names() function:

names(new_hr)

As expected, this returns the two remaining columns:

> names(new_hr)
[1] "quarter_year" "employees"
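When there are many columns to merge, listing each one inside paste() gets tedious. The base R sketch below assumes the column names are held in a character vector (cols is a hypothetical name): do.call() passes every selected column to paste() as a separate argument, and the source columns are then dropped with setdiff().

```r
# Sample data mirroring the hr DataFrame
hr <- data.frame(
  quarter = c(4, 4, 1, 1, 1),
  year = c(2022, 2022, 2023, 2023, 2024),
  employees = c(13, 18, 9, 10, 12)
)

# Names of the columns to merge; extend this vector for more columns
cols <- c("quarter", "year")

# c(hr[cols], sep = "-") builds a list of columns plus the sep argument,
# so the same line works for two columns or twenty
hr$quarter_year <- do.call(paste, c(hr[cols], sep = "-"))

# Keep every column except the ones that were merged
hr <- hr[setdiff(names(hr), cols)]
print(hr)
```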

Concatenate R columns with stringr

The third method for combining column values is the str_c() function from stringr:

hr$quarter_year2 <- str_c(hr$quarter, '-', hr$year)
print(hr)

Like paste(), this method creates a new column but doesn’t remove the merged columns.

  quarter year employees quarter_year quarter_year2
1       4 2022        13       4-2022        4-2022
2       4 2022        18       4-2022        4-2022
3       1 2023         9       1-2023        1-2023
4       1 2023        10       1-2023        1-2023
5       1 2024        12       1-2024        1-2024
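The three approaches differ in how they treat missing values, which matters once real data has gaps. A small sketch on a hypothetical two-row DataFrame (df, with columns a and b) illustrates this, assuming a reasonably recent tidyr that supports na.rm in unite(): paste() turns NA into the string "NA", str_c() returns NA when any input is NA, and unite() can skip missing values with na.rm = TRUE.

```r
library(tidyr)
library(stringr)

# Hypothetical two-row DataFrame with a missing value in column a
df <- data.frame(a = c("x", NA), b = c("y", "z"), stringsAsFactors = FALSE)

# paste() converts NA to the literal string "NA"
pasted <- paste(df$a, df$b, sep = "-")   # c("x-y", "NA-z")

# str_c() propagates NA: any missing input gives a missing result
joined <- str_c(df$a, df$b, sep = "-")   # c("x-y", NA)

# unite() with na.rm = TRUE drops missing values before joining
united <- unite(df, ab, c(a, b), sep = "-", na.rm = TRUE)$ab   # c("x-y", "z")

print(pasted)
print(joined)
print(united)
```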