In today’s tutorial we’ll learn how to write specific values into a new or existing pandas DataFrame column. We’ll look into several use cases.
Example data
We will start by defining a simple dataset:
import pandas as pd
dates = pd.date_range(start='3/1/24', end = '3/08/24', freq='B')
sales = [284, 472, 361, 269, 386, 401]
revenue = pd.DataFrame (dict (dates=dates, sales=sales))
print (revenue.head())
Here’s our data:
dates | sales | |
---|---|---|
0 | 2024-03-01 | 284 |
1 | 2024-03-04 | 472 |
2 | 2024-03-05 | 361 |
3 | 2024-03-06 | 269 |
4 | 2024-03-07 | 386 |
5 | 2024-03-08 | 401 |
Write a single value to a pandas column
First example will be to use the assign DataFrame method to add a column containing a single value into our pandas DataFrame. In our case we will set all columns of the channel column to be equal to Web.
revenue = revenue.assign(channel = 'Web')
print (revenue.head(6))
Here’s our data:
dates | sales | channel | |
---|---|---|---|
0 | 2024-03-01 | 284 | Web |
1 | 2024-03-04 | 472 | Web |
2 | 2024-03-05 | 361 | Web |
3 | 2024-03-06 | 269 | Web |
4 | 2024-03-07 | 386 | Web |
5 | 2024-03-08 | 401 | Web |
Fill column with column based on other column
In the same fashion we would like to introduce new column named margin. Historically we know that our operating margin is 25% hence we’ll apply this logic to the data:
revenue = revenue.assign(margin = revenue['sales'] * 0.25)
We can as well use a direct assignment or a lambda function here:
revenue['margin'] = revenue['sales'] * 0.25
Or using a lambda function:
revenue ['margin'] = revenue.apply(lambda r : r['sales'] *0.25 , axis = 1)
If using the apply method with the lambda function, make sure to specify the parameter axis = 1 to ensure the calculation will be executed across the column axis – meaning that the function will be applied to each row.
All the above will render the following:
print(revenue.head(6))
dates | sales | margin | |
---|---|---|---|
0 | 2024-03-01 | 284 | 71.00 |
1 | 2024-03-04 | 472 | 118.00 |
2 | 2024-03-05 | 361 | 90.25 |
3 | 2024-03-06 | 269 | 67.25 |
4 | 2024-03-07 | 386 | 96.50 |
5 | 2024-03-08 | 401 | 100.25 |
Fill column with increasing numbers
In our case we can use the following code:
revenue ['order'] = revenue.index + 1
Write random values from list to column
Next, we would like to modify the channel column and populate it with random values from a Python list.
import random
rand_lst = random.choices (['Web', 'Indirect', 'Direct'], k = len(revenue) )
Follow up learning
How to filter a pandas DataFrame by value in one or more columns?