In this tutorial we’ll show how to quickly append a new column to a Pandas DataFrame, and how to add new rows as well. Feel free to follow along with the step-by-step process.
Example data to create the DataFrame
We’ll first define some lists of randomly generated numbers that we’ll use to build a DataFrame.
import pandas as pd
# define lists containing the data
week = ['2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30',
        '2021-05-31', '2021-06-30', '2021-07-31', '2021-08-31']
salary = [127., 125., 105., 126., 131., 113., 110., 106.]
candidates = [40., 38., 49., 74., 31., 46., 64., 52.]
Let’s quickly create our Pandas DataFrame using the pd.DataFrame constructor as shown below:
hr_df = pd.DataFrame({'week': week,
                      'salary': salary})
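To double-check that the constructor produced what we expect, here is a small self-contained sketch that rebuilds the same DataFrame and inspects it. It only relies on standard pandas attributes (shape, columns, dtypes).

```python
import pandas as pd

week = ['2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30',
        '2021-05-31', '2021-06-30', '2021-07-31', '2021-08-31']
salary = [127., 125., 105., 126., 131., 113., 110., 106.]

# each dictionary key becomes a column name, each list that column's values
hr_df = pd.DataFrame({'week': week, 'salary': salary})

print(hr_df.shape)          # (8, 2)
print(list(hr_df.columns))  # ['week', 'salary']
print(hr_df.dtypes)
```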
Adding a list or series as a new DataFrame column
We’ll show three methods for adding a list or a Series as a new column to the DataFrame.
Method #1 : Assign a Series to the DataFrame
We’ll start by creating a Series out of our candidates list. We do that using the pd.Series constructor.
cand_s = pd.Series(candidates)
Now we’ll append the newly created Series as a column of the DataFrame using the DataFrame.assign() method, which returns a new DataFrame with the extra column and leaves hr_df unchanged.
new_hr = hr_df.assign(candidates=cand_s)
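A small sketch illustrating that assign() is non-destructive. For brevity it uses only the first three rows of the sample data; the original DataFrame keeps its two columns while the returned copy gains the new one.

```python
import pandas as pd

hr_df = pd.DataFrame({'week': ['2021-01-31', '2021-02-28', '2021-03-31'],
                      'salary': [127., 125., 105.]})
cand_s = pd.Series([40., 38., 49.])

# assign() builds and returns a new DataFrame; hr_df itself is not modified
new_hr = hr_df.assign(candidates=cand_s)

print(list(hr_df.columns))   # ['week', 'salary']
print(list(new_hr.columns))  # ['week', 'salary', 'candidates']
```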
Note: if the DataFrame is longer than the Series, the assignment will still work, because assign() aligns the Series on the index labels. Rows without a matching label will be populated with NaN values.
In that case you can use the Series fillna() method to fill those missing values, for example with the column mean:
new_hr['candidates'] = new_hr['candidates'].fillna(new_hr['candidates'].mean())
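The following sketch reproduces the NaN case on a trimmed copy of the data (four rows but only three candidate values) and then fills the gap with the column mean. It assigns the filled Series back to the column instead of calling fillna(..., inplace=True) on a column selection, a chained-assignment pattern that recent pandas versions warn about.

```python
import pandas as pd

hr_df = pd.DataFrame({'week': ['2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30'],
                      'salary': [127., 125., 105., 126.]})
cand_s = pd.Series([40., 38., 49.])  # index labels 0, 1, 2 only

# label 3 has no match in cand_s, so that row gets NaN
new_hr = hr_df.assign(candidates=cand_s)
print(new_hr['candidates'].isna().sum())  # 1

# mean() skips NaN by default, so the gap is filled with (40 + 38 + 49) / 3
new_hr['candidates'] = new_hr['candidates'].fillna(new_hr['candidates'].mean())
print(new_hr['candidates'].tolist())
```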
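Method #2 : Add a Python list to a DataFrame using Join
Here we first need to convert the list to a DataFrame, then join its content to the source DataFrame. Note that join() matches rows by index label, not by position.
# name the column explicitly; otherwise it would be labelled 0
cand_df = pd.DataFrame({'candidates': candidates})
new_hr_2 = hr_df.join(cand_df)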
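Because join() aligns on the index, what matters is each value’s index label rather than its position in the list. This sketch (trimmed to three rows) joins a correctly indexed frame and then a hypothetical frame whose labels are shifted by one, to show how the mismatch turns into NaN.

```python
import pandas as pd

hr_df = pd.DataFrame({'week': ['2021-01-31', '2021-02-28', '2021-03-31'],
                      'salary': [127., 125., 105.]})
candidates = [40., 38., 49.]

# naming the column up front avoids a column labelled 0
cand_df = pd.DataFrame({'candidates': candidates})  # index 0, 1, 2

# join() is a left join on the index by default: every row of hr_df is kept
# and values are matched by index label
new_hr_2 = hr_df.join(cand_df)
print(new_hr_2)

# hypothetical frame whose labels are shifted by one (1, 2, 3)
shifted = pd.DataFrame({'candidates': candidates}, index=[1, 2, 3])
misaligned = hr_df.join(shifted)['candidates'].tolist()
print(misaligned)  # [nan, 40.0, 38.0]
```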
Method #3 : Assign the list directly to a new DataFrame column
This modifies hr_df in place. The list must contain exactly one value per row.
hr_df['candidates'] = candidates
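A short sketch of the length requirement: a list of the right length is assigned by position, while a shorter list raises a ValueError instead of being padded with NaN the way an index-aligned Series would be. The column name too_short is purely illustrative.

```python
import pandas as pd

hr_df = pd.DataFrame({'salary': [127., 125., 105.]})

# a plain list is matched by position, one value per row
hr_df['candidates'] = [40., 38., 49.]

# a list of the wrong length is rejected and no column is created
raised = False
try:
    hr_df['too_short'] = [1., 2.]  # two values for three rows
except ValueError as err:
    raised = True
    print('ValueError:', err)
```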
Adding a column based on another column
We can easily derive column values from other columns. In our example we’ll define a column named weekly_salary by dividing the salary by 4.
hr_df['weekly_salary'] = hr_df['salary'] / 4
An alternative is the DataFrame.assign() method. Since assign() returns a new DataFrame, remember to store the result:
hr_df = hr_df.assign(weekly_salary=hr_df['salary'] / 4)
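assign() also accepts callables, which is handy when chaining operations. In this sketch the daily_salary column is hypothetical (it assumes a five-day work week) and exists only to show that, within a single assign() call, a later keyword argument can use a column created by an earlier one.

```python
import pandas as pd

hr_df = pd.DataFrame({'salary': [127., 125., 105.]})

# assign() returns a new DataFrame, so we overwrite hr_df with the result
hr_df = hr_df.assign(
    # a callable receives the DataFrame being built and returns the column
    weekly_salary=lambda df: df['salary'] / 4,
    # hypothetical column: keyword arguments are evaluated in order,
    # so weekly_salary already exists at this point
    daily_salary=lambda df: df['weekly_salary'] / 5,
)
print(hr_df)
```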
Append a Series as a DataFrame Row
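Another use case to cover is adding a Series as a new row of an existing DataFrame.
# first we define a Series with one value per column
# (hr_df now has four columns: week, salary, candidates, weekly_salary)
new_s = pd.Series(['2021-09-30', 137., 48., 137. / 4], index=hr_df.columns)
# using the loc indexer with a new label, append the Series to the end of the df
# (len(hr_df) is an unused label because the index runs from 0 to n-1)
hr_df.loc[len(hr_df)] = new_s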
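If you prefer not to enlarge the DataFrame through loc, pd.concat() is another common way to append a row (the older DataFrame.append() method was removed in pandas 2.0). A minimal sketch on a trimmed three-column version of the data; the infer_objects() call is there because a Series mixing strings and numbers has object dtype.

```python
import pandas as pd

hr_df = pd.DataFrame({'week': ['2021-01-31', '2021-02-28'],
                      'salary': [127., 125.],
                      'candidates': [40., 38.]})
new_s = pd.Series(['2021-09-30', 137., 48.], index=hr_df.columns)

# to_frame().T turns the Series into a one-row DataFrame whose columns are the
# Series index; infer_objects() restores numeric dtypes lost in the mixed Series
new_row = new_s.to_frame().T.infer_objects()

# ignore_index=True renumbers the combined rows 0..n-1
hr_df = pd.concat([hr_df, new_row], ignore_index=True)
print(hr_df)
```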
Insert a row at a specific index in Pandas
In the previous section, we added a Series as the last row of our DataFrame. But what if we want to insert it at an arbitrary position?
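Let’s assume that we want to add a new row between the 2nd and 3rd rows of our DataFrame. Those rows have the index labels 1 and 2, so a label of 1.5 will sort between them:
# insert a list (one value per column) between the 2nd and 3rd rows
hr_df.loc[1.5] = ['2021-09-30', 137., 48., 137. / 4]
# sort the index so the new row moves into place, then replace the
# float labels with a fresh 0..n-1 index (drop=True discards the old one)
hr_df = hr_df.sort_index().reset_index(drop=True)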
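The float-label trick works, but an alternative that avoids temporary float labels is to slice the DataFrame by position and concatenate the pieces around the new row. The helper insert_row below is hypothetical, not a pandas function: it is a sketch assuming the new row is given as a dict keyed by column name.

```python
import pandas as pd

hr_df = pd.DataFrame({'week': ['2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30'],
                      'salary': [127., 125., 105., 126.],
                      'candidates': [40., 38., 49., 74.]})


def insert_row(df, pos, row):
    """Hypothetical helper: return a copy of df with `row` (a dict keyed by
    column name) inserted before the row currently at integer position pos."""
    new_row = pd.DataFrame([row], columns=df.columns)
    # iloc slices by position; ignore_index=True renumbers the result 0..n-1
    return pd.concat([df.iloc[:pos], new_row, df.iloc[pos:]], ignore_index=True)


# between the 2nd and 3rd rows means before position 2
hr_df = insert_row(hr_df, 2, {'week': '2021-09-30', 'salary': 137., 'candidates': 48.})
print(hr_df)
```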