In this quick tutorial we will go ahead and learn how to combine multiple Pandas columns. We will cover the following use cases, which you will typically encounter in your data preparation process.
- Concatenate multiple columns into one
- Merge several columns into a list
- Merge columns if one of them has NAN values.
Example Data
As we typically do, we will start by importing the Pandas library and creating data:
import pandas as pd
name = ['Dave', 'John', 'Matt','Horace']
family = ['Jordan', 'Jones', 'Temp', 'Joyce']
count = [12,15,14,18]
perf_dict = dict(name=name, family=family, count=count)
sales = pd.DataFrame (perf_dict)
print (sales.head())
name | family | count | |
---|---|---|---|
0 | Dave | Jordan | 12 |
1 | John | Jones | 15 |
2 | Matt | Temp | 14 |
3 | Horace | Joyce | 18 |
Merge multiple Pandas columns into one
We can easily combine two text / string columns into one int two ways:
# Method 1
sales['full_name'] = sales['family']+' , '+ sales['name']
Alternatively, we can use the str.cat function:
sales['full_name'] = sales['family'].str.cat(sales['name'], sep = ' , ')
Note that we have defined commas (,) as the separator between the two objects.
Merge two dataframe columns into a list
In this case, we would like to concatenate the column values, and export them to a list. In this case, we’ll first create the new column, and then go ahead and call the to_list() Series method.
sales['full_name'] = sales['family']+' , '+ sales['name']
sales['full_name'].to_list()
The result will be:
['Jordan , Dave', 'Jones , John', 'Temp , Matt', 'Joyce , Horace']
Combine two column if one is null
When trying to concatenate two values which one of them in null, the result will be a null / NAN / NA value.
Therefore before merging two columns we should first handle the null values using the fillna() DataFrame or Series method.
# Handling NAN values
sales.fillna(' ', inplace=True)
#combining the two columns
sales['full_name'] = sales['family']+' , '+ sales['name']
Rename the new column in Pandas
Once we create the new column, we can easily rename it:
sales.rename(columns= {'full_name': 'last_first'})