How to add one or multiple columns from one Pandas dataframe to another?

In this tutorial we’ll learn how to use Python to join columns from multiple Pandas DataFrame objects. We’ll look into several cases:

  1. Join columns of similar length
  2. Add columns based on index
  3. Join DataFrame columns based on conditions
  4. Add columns with different length

Creating Example DataFrames

As we always do, we’ll start by importing Pandas and creating some data that can help you follow along this example. Here’s how to troubleshoot your Pandas install in case the library can’t be imported.

import pandas as pd

#define DataFrames

sales = pd.DataFrame({'area': ['North', 'East', 'South', 'West'], 'sales': [250,350,500,400]})
quotas = pd.DataFrame({'area': ['North', 'East', 'South', 'West'], 'target': [350,200,550,400], 'num_sales' : [10,15,13,14]})

targets = quotas['target']

Append a single column from one DataFrame to other

We can use the join() DataFrame method to add columns from different DataFrames. We chose to persist the resulting data as a new DataFrame.

sales_vs_targets = sales.join(targets, how='left')

sales_vs_targets.head()

Note that a left outer join is performed by default

Here’s our data:

areasalestarget
0North250350
1East350200
2South500550
3West400400

Adding columns based on specific index

In the following case, instead of using the default DataFrame index, we’ll join based on a different index column.

sales_vs_targets = sales.set_index('area').join(quotas.set_index('area')['target'])

The result is slightly different:

salestarget
area
North250350
East350200
South500550
West400400

Add column based on condition

A more interesting case is that you might need to add columns based on a specific condition. Let’s assume that you want to focus only on areas with high sales targets.

#filter dataframe according to condition
filt = quotas['target'] > 350
quotas_subset = quotas[filt]

#join the dataframes
sales_vs_high_targets = sales.set_index('area').join(quotas_subset.set_index('area')['target'], how='right')

The result will be:

salestarget
area
South500550
West400400

Note: the right outer join helped to ensure that only the filtered rows in the quotes dataframe will be joined. In general tweaking the join strategy using the how parameter will help us to deal with dataframes having different length.

Adding multiple columns

In the same fashion we can add multiple DataFrame columns. Instead of passing a single column, we’ll pass a list.

sales_vs_targets_multiple = sales.set_index('area').join(quotas.set_index('area')[['target', 'num_sales']])
salestargetnum_sales
area
North25035010
East35020015
South50055013
West40040014

DataFrames with different length

In reality, you will most probably be joining complex DataFrames having different shapes.

Let’s see a very simple example:


quotas_short = pd.DataFrame({'area': ['North', 'East', 'South'], 'target': [350,200,550]})

sales.set_index('area').join(quotas_short.set_index('area'))

Will render the following dataframe containing empty values.

salestarget
area
North250350.0
East350200.0
South500550.0
West400NaN

Whereas specifying a right outer join will show only rows with non empty values.

sales.set_index('area').join(quotas_short.set_index('area'), how='right')

sales
target
area
North250350
East350200
South500550