How to apply a function to one or multiple pandas columns?

While preparing your data for analysis, you might need to harmonize some of your raw feature data during preprocessing so that business stakeholders can understand it more easily.

In this short tutorial we will learn how you can easily apply standard or lambda functions to columns of your DataFrame.

Example Data

As we typically do, let’s define a very simple DataFrame that you can use to follow along with this example.

import pandas as pd

sales = pd.DataFrame({
    'area': [101, 102, 103, 104],
    'sales': [250, 350, 500, 400],
    'num_account_managers': [10, 15, 13, 14],
})

sales_wip = sales.copy()

sales_wip.head()
   area  sales  num_account_managers
0   101    250                    10
1   102    350                    15
2   103    500                    13
3   104    400                    14

Note: To complete this tutorial, you will need the third-party pandas library installed in your development environment. Here’s how to troubleshoot pandas module import issues that you might encounter.

Apply a lambda function to a specific pandas column (Series)

The area column in our DataFrame is populated with area codes, but we would like to see the area names instead. We can apply a lambda function to the area column as shown below:

# define a mapping dictionary
area_dict = {101: 'NYC', 102: 'LAX', 103: 'LON', 104: 'CDG'}

# apply the lambda function
sales_wip['area'] = sales_wip['area'].apply(lambda x: area_dict[x])

sales_wip.head()

Here’s the result:

  area  sales  num_account_managers
0  NYC    250                    10
1  LAX    350                    15
2  LON    500                    13
3  CDG    400                    14
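As a side note, when the lookup is a plain dictionary, Series.map accepts the dictionary directly, so the lambda is not strictly needed. The sketch below is only an illustration: it rebuilds the codes as a standalone Series and adds a hypothetical code 105 (not part of the tutorial data) that is missing from area_dict. With map, an unmatched code becomes NaN, whereas the lambda above would raise a KeyError for it.

```python
import pandas as pd

area_dict = {101: 'NYC', 102: 'LAX', 103: 'LON', 104: 'CDG'}

# 105 is a hypothetical code that has no entry in area_dict
codes = pd.Series([101, 102, 103, 104, 105])

# map looks each value up in the dictionary; unmatched codes become NaN
names = codes.map(area_dict)

# optionally fall back to the original code (as text) where no name was found
names_filled = names.fillna(codes.astype(str))

print(names_filled.tolist())
```

Whether a silent NaN or a loud KeyError is preferable depends on how much you trust the raw codes, so treat this as one option rather than a replacement for the approach above.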

Apply a function to a pandas column and create a new column based on a condition

In the next example, we will apply a function to create a new column in our DataFrame.

# define function
def calc_performance_level(sales_amnt):
    if sales_amnt > 400:
        return 'high'
    else:
        return 'low'

# apply function to create a new column
sales_wip['performance'] = sales_wip['sales'].apply(calc_performance_level)

We could have accomplished the same result using a lambda function:

sales_wip['performance'] = sales_wip['sales'].apply(lambda x: calc_performance_level(x))

Here’s the result:

  area  sales  num_account_managers performance
0  NYC    250                    10         low
1  LAX    350                    15         low
2  LON    500                    13        high
3  CDG    400                    14         low
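For a simple two-way condition like this one, a vectorized expression can stand in for apply. The following is a minimal sketch that assumes NumPy is available alongside pandas. It uses numpy.where, which is not part of the tutorial's own code, and it rebuilds only the sales column so that it runs on its own.

```python
import numpy as np
import pandas as pd

sales = pd.DataFrame({'sales': [250, 350, 500, 400]})

# same rule as calc_performance_level: 'high' when sales > 400, otherwise 'low'
sales['performance'] = np.where(sales['sales'] > 400, 'high', 'low')

print(sales)
```

On a four-row DataFrame the difference is negligible. The vectorized form mainly pays off on large columns, while apply remains the more flexible choice when the logic does not reduce to a single condition.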

Apply functions with arguments to a Series

When applying a function, we can also pass additional arguments to it as needed. Here’s a very simple example that calculates the weekly sales by passing the number of weeks as a keyword argument.

# define function
def calc_weekly_sales(sales_amnt, weeks):
    return round(sales_amnt / weeks, 2)

# call function on DataFrame column
sales_wip['weekly_sales'] = sales_wip['sales'].apply(calc_weekly_sales, weeks=52)
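To check the result, here is a self-contained sketch that repeats the call on the same sales figures. It also shows the positional form, where extra arguments are passed through the args tuple of Series.apply. The variable names by_keyword and by_position are made up for this illustration.

```python
import pandas as pd

def calc_weekly_sales(sales_amnt, weeks):
    return round(sales_amnt / weeks, 2)

sales = pd.DataFrame({'sales': [250, 350, 500, 400]})

# keyword form, as used in the tutorial
by_keyword = sales['sales'].apply(calc_weekly_sales, weeks=52)

# positional form: extra arguments are supplied as a tuple via args
by_position = sales['sales'].apply(calc_weekly_sales, args=(52,))

print(by_keyword.tolist())
```

Both forms should give the same values: 4.81, 6.73, 9.62 and 7.69 for the four rows.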