How to add columns to a Python Pandas DataFrame ?

In a recent tutorial, we learn how to add rows to existing DataFrames in Pandas. Today, we’ll work with DF columns.

In this post, we would like to double click on several use cases that are foundational when wrangling tabular data with Pandas:

Adding columns into Python DataFrames

  1. New columns based on other columns
  2. Adding columns with default / constant / same value (could be a column of zeros).
  3. Inserting a column based on values in another DataFrame
  4. Adding new column to col index
  5. Create a Series from a column
  6. Export a column to a csv file

Create Data

# Import libraries

import pandas as pd
import numpy as np
np.random.seed(100)

# Create data frame, with specific column names

dates =pd.date_range(start="20210101",periods=5)
scores = pd.DataFrame(data = np.random.randint(60,100, size = (5,2)), columns = ['score_1', 'score_2'], index =dates )
display(scores)

Here’s our DataFrame:

score_1score_2
2021-01-016884
2021-01-026399
2021-01-038375
2021-01-047090
2021-01-059462

2. Column based on other column values

Simple example that showcases a calculated column:

# based on other columns
scores['dif'] = scores['score_2']-scores['score_1']

Here’s our output:

score_1score_2dif
2021-01-01688416
2021-01-02639936
2021-01-038375-8
2021-01-04709020
2021-01-059462-32

We could as well use a lambda function:

scores['dif2'] = scores.apply (lambda x:scores['score_2']-scores['score_1']) 

3. Column with default values

Let’s assume we want to assign all our students to the Engineering department:

# with constant value
scores['area'] = 'Engineering'
scores
score_1score_2difarea
2021-01-01688416Engineering
2021-01-02639936Engineering
2021-01-038375-8Engineering
2021-01-04709020Engineering
2021-01-059462-32Engineering

If we want to insert a column of zeros:

scores['zeros'] = 0

4. Column with values from other DataFrame

Let’s start by defining a new DataFrame from a list:

# from other dataframe
cities = pd.DataFrame(['NYC', 'FRA', 'SFO', 'MUC', 'SYD'], columns=['city'], index=dates)

We can then assign as following:

scores['city'] = cities['city']
scores
score_1score_2difcity
2021-01-01688416NYC
2021-01-02639936FRA
2021-01-038375-8SFO
2021-01-04709020MUC
2021-01-059462-32SYD

Or alternatively use pd.concat():

# recreating our original DF
scores = pd.DataFrame(data = np.random.randint(60,100, size = (5,2)), columns = ['score_1', 'score_2'], index =dates )
scores_cities = pd.concat([scores, cities], axis=1)

Note: the method above will work well if both DataFrames have a similar index. If that’s not the case you might want to use:

scores['city'] = cities['city'].values
scores

4. Adding new column to index

One we added a new column, we can easily add it to the DataFrame index:

scores_cities.set_index('city', append=True)

5. Create new Series from Column

We are easily able to drop column values to a Series.

city_s = scores_cities['city']
type(city_s)

Calling the type function of city_s will return:

pandas.core.series.Series

6. Export new column to CSV file

Last topic in today’s tutorial is show how you can easily export a specific column to a CSV file:

city_s.to_csv('cities.csv')