In today’s tutorial we’ll learn how to concatenate two or more DataFrames having the same columns in Pandas. This will come very handy in cases that you receive several comma separated value (CSV) files or Excel spreadsheets that you need to merge together before starting your Data Analysis process.
Creating example data
Let’s start by creating several DataFrames that we would like to append to each other.
import pandas as pd
europe_cand = {'area': ['Python', 'R', 'Python'],
'city': ['London', 'Paris', 'Madrid'],
'candidates': [24, 65, 82]}
usa_cand = {'area': ['R', 'Python', 'R'],
'city': ['NYC', 'LA', 'Seattle'],
'candidates': [72, 110, 123]}
asia_cand = {'area': ['R', 'Python', 'R'],
'city': ['Delhi', 'Tokyo', 'Seoul'],
'candidates': [72, 110, 123],
'salary': [100, 150, 125]}
# create DataFrame from dictionaries
europe_df = pd.DataFrame (europe_cand)
usa_df = pd.DataFrame (usa_cand)
asia_df = pd.DataFrame (asia_cand)
cand_df_lst = [europe_df, usa_df, asia_df]
Concatenate multiple DataFrames
In the first example we would like to use the pd.concat() function to append two or more DataFrames. Let’s see how we make it with Python:
cand = pd.concat([europe_df, usa_df], ignore_index=True)
cand
Note: Watch out for TypeError messages if passing a list of lists or dictionary object to pd.concat. You can pass a list of DataFrames a shown above.
TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid
Here’s the resulting DataFrame values:
area | city | candidates | |
---|---|---|---|
0 | Python | London | 24 |
1 | R | Paris | 65 |
2 | Python | Madrid | 82 |
3 | R | NYC | 72 |
4 | Python | LA | 110 |
5 | R | Seattle | 123 |
Note: The pd.concat function receives an iterable, in our case a list of two DataFrames and simply brings them together. Using this method is specially useful if both DataFrames have the same columns.
We could have reached a similar result if using the append DataFrame method:
cand = europe_df.append(usa_df, ignore_index=True)
Append DataFrames using a for loop
A more interesting example is when we would like to concatenate DataFrame that have different columns. In this example we’ll loop into a list of DataFrames and append their content to an empty DataFrame. Let’s take a look:
cand =pd.DataFrame()
for df in cand_df_lst:
cand = cand.append(df, ignore_index=True).fillna('')
Note that we replaced NAN values using the fillna() DataFrame method.
We could have accomplish the same with pd.concat:
cand1 = pd.concat(cand_df_lst, ignore_index=True).fillna('')