If you often work with datasets in Excel, i am sure that you are familiar with cases in which you need to concatenate values from multiple columns into a new column.
Concatenating values is also very common as part of our Data Wrangling workflow. In this tutorial we’ll learn how to combine two o more columns for further analysis.
Example DataFrame
cand_dict = {'city': ['New York', 'Boston', 'Boston','New York'],
'office': [1,2,1,2],
'num_candidates': [90,92,88, 78]}
candidates = pd.DataFrame(cand_dict)
Concatenate two columns with a separating string
A common use case is to combine two column values and concatenate them using a separator.
#concatenate two columns values
candidates['city-office'] = candidates['city']+'-'+candidates['office'].astype(str)
candidates.head()
Here’s our result:
Important Note: Before joining the columns, make sure to cast numerical values to string with the astype() method, as otherwise Pandas will throw an exception similar to the one below:
# error: can only concatenate str (not "int") to str
Combining two columns using the Series.cat() method
An alternative method to accomplish the same result as above is to use the Series.cat() method as shown below:
# concatenate two series using the cat method
candidates['city-office-cat'] = candidates['city'].str.cat(candidates['office'].astype(str), sep = '-')
Note: Also here, before merging the two columns, we converted the Series into a string as well as defined the separator using sep parameter.
Combine column values based in a condition
In this case, we’ll choose to combine only specific values. In our case, we’ll concatenate only values pertaining to the New York city offices:
# based on condition
candidates['city-office-condition'] = candidates[candidates['city'] =='New York']['city'].str.cat(candidates['office'].astype(str), sep = '-')
Here’s our DataFrame:
Concatenating columns into a list
If we want to export the combined values into a list, we can use the to_list() method as shown below:
candidates_list = candidates['city-office-cat'].to_list()
# alternatively:
candidates['city-office'] = candidates['city']+'-'+candidates['office'].astype(str).to_list()