How to divide DataFrame columns in Pandas?

In previous tutorials, we learned how to sum and multiply column values in Pandas. Today, we will look at several common cases of dividing columns in Pandas DataFrames.

In this post, we will cover the step-by-step process to divide a column:

  1. By value / constant / scalar
  2. By other column
  3. By sum of a column
  4. By its first value

Create an example Pandas DataFrame

We will start by creating a very simple DataFrame with some data that you can use to follow along with the examples.

# Python3
import pandas as pd
employee = ['Dorothy', 'Sarah', 'Liam', 'Larry']
salary = [183, 48, 92, 181]
bonus = [5, 6, 4, 7]

hr = pd.DataFrame(dict(employee=employee, salary=salary, bonus=bonus))

hr.head()

Here’s our DataFrame data:

  employee  salary  bonus
0  Dorothy     183      5
1    Sarah      48      6
2     Liam      92      4
3    Larry     181      7

Divide columns by constant / scalar / value

Our first example simply divides a DataFrame column by a constant value. In our case, we’ll calculate the monthly salary of each employee by dividing the annual salary by 12. Here’s the code you’ll need to accomplish that:

# By value / constant
num_months = 12
hr['monthly_salary'] = (hr['salary'] / num_months).round(2)

hr.head()

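A new column will be added to our DataFrame:

  employee  salary  bonus  monthly_salary
0  Dorothy     183      5           15.25
1    Sarah      48      6            4.00
2     Liam      92      4            7.67
3    Larry     181      7           15.08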
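The `/` operator also has a method form, `Series.div()`, which can be convenient inside a method chain. Here is a minimal sketch, assuming the same salary figures as in the example; the variable name `monthly` is only illustrative.

```python
import pandas as pd

salary = pd.Series([183, 48, 92, 181], name="salary")
num_months = 12

# Series.div(other) is the method form of `salary / other`
monthly = salary.div(num_months).round(2)
print(monthly.tolist())
```

Both forms should give the same values, so the choice is mostly a matter of style.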

Divide a DataFrame column by other column

Another common use case is to create a new column in our DataFrame by dividing two or more columns. In this case, we’ll calculate the bonus as a percentage of the annual salary. Here we go:

# division by other column
hr['bonus_pct'] = (hr['bonus'] / hr['salary'] * 100).round(2)

hr.head()

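Here’s the resulting DataFrame:

  employee  salary  bonus  monthly_salary  bonus_pct
0  Dorothy     183      5           15.25       2.73
1    Sarah      48      6            4.00      12.50
2     Liam      92      4            7.67       4.35
3    Larry     181      7           15.08       3.87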
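If you need to divide several columns by the same Series in one step, `DataFrame.div()` with `axis=0` may help. The sketch below assumes the example data from this post; the `total` and `shares` names are hypothetical and only serve the illustration.

```python
import pandas as pd

hr = pd.DataFrame({
    "employee": ["Dorothy", "Sarah", "Liam", "Larry"],
    "salary": [183, 48, 92, 181],
    "bonus": [5, 6, 4, 7],
})

# Hypothetical helper: total compensation per employee
total = hr["salary"] + hr["bonus"]

# axis=0 aligns the divisor with the row index, so each selected
# column is divided row by row by `total`
shares = hr[["salary", "bonus"]].div(total, axis=0).round(3)
print(shares)
```

Each row of `shares` should add up to roughly 1, apart from rounding.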

Calculate percentage from sum of a column

In the next example, we’ll divide each value in a column by the column’s sum. This is helpful for calculating each row’s percentage of the total.

total_sal = hr['salary'].sum()
hr['sal_pct'] = (hr['salary'] / total_sal * 100).round(2)

hr.head()

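Here’s the resulting DataFrame:

  employee  salary  bonus  monthly_salary  bonus_pct  sal_pct
0  Dorothy     183      5           15.25       2.73    36.31
1    Sarah      48      6            4.00      12.50     9.52
2     Liam      92      4            7.67       4.35    18.25
3    Larry     181      7           15.08       3.87    35.91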
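Rounded percentages do not always add up to exactly 100. The following sketch, which assumes the same salary figures on a standalone Series, computes the shares and checks their total.

```python
import pandas as pd

salary = pd.Series(
    [183, 48, 92, 181],
    index=["Dorothy", "Sarah", "Liam", "Larry"],
)

# Each value as a percentage of the column total
sal_pct = (salary / salary.sum() * 100).round(2)
print(sal_pct)

# The rounded shares can miss 100 by a small rounding error
print(round(sal_pct.sum(), 2))
```

If an exact total matters, consider rounding only when displaying the values.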

Divide column rows by first value

In the last example, we divide all values of a column by its first value. The same logic applies if you want to divide by the maximum, minimum, mean, standard deviation and so forth of the column.

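# .iloc[0] picks the first row by position, whatever the index labels are
first_val = hr['salary'].iloc[0]

# store the ratio in its own column so that sal_pct is not overwritten
hr['sal_vs_first'] = (hr['salary'] / first_val).round(2)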
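To illustrate dividing by other summary values, here is a minimal sketch that assumes the same salary figures. The column names `vs_first`, `vs_max` and `vs_mean` are made up for this example.

```python
import pandas as pd

salary = pd.Series([183, 48, 92, 181])

ratios = pd.DataFrame({
    "vs_first": (salary / salary.iloc[0]).round(2),  # first value by position
    "vs_max": (salary / salary.max()).round(2),      # largest value becomes 1.0
    "vs_mean": (salary / salary.mean()).round(2),    # 1.0 means "equal to the average"
})
print(ratios)
```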

Divide by zero error in Pandas

There might be cases in which a value in your denominator column is equal to zero. Pandas does not raise an error in that case: dividing a non-zero number by zero yields an infinite value, displayed as inf (or -inf) in your DataFrame, while 0 / 0 yields NaN.

You might want to convert the inf values to missing values such as np.nan or pd.NA. If so, use the following code (make sure that you replace the col_name placeholder with your relevant column name):

import numpy as np

hr['col_name'] = hr['col_name'].where(~np.isinf(hr['col_name']), pd.NA)

This will keep all values which are not infinite and replace the ones that are with pd.NA.
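As an alternative sketch (not the approach shown above), `Series.replace()` can swap both positive and negative infinity for NaN in one call. The data and the column names `num`, `den` and `ratio` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: the second and third denominators are zero
df = pd.DataFrame({"num": [10, 0, 5], "den": [2, 0, 0]})

# 10 / 2 is a regular value, 0 / 0 gives NaN, 5 / 0 gives inf
df["ratio"] = df["num"] / df["den"]

# Replace +inf and -inf with NaN
df["ratio"] = df["ratio"].replace([np.inf, -np.inf], np.nan)
print(df)
```

After the replacement, the usual missing-value tools such as `fillna()` or `dropna()` apply to the column.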

How to multiply pandas Series element wise?

Now, we will learn how to multiply two or more pandas Series objects element-wise, as shown below.

series_3 = series_1 * series_2

Data Preparation

We will first import the random module and pandas:

import random
import pandas as pd

Then create two random Series objects, each consisting of 20 elements.

s_employees = pd.Series(random.choices(range(30, 100), k=20))
s_working_days = pd.Series(random.choices(range(20, 30), k=20))

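Expert Tip: When trying to generate a random list, you might have used the random.sample() function. Because random.sample() draws without replacement, asking for more elements than the population contains (for example, 20 values from range(20, 30), which holds only 10) raises the following error: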

# ValueError: Sample larger than population or is negative

If so, make sure to use the random.choices() function as shown above.
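The difference between the two functions can be reproduced with the standard library alone. In this sketch, the names `population`, `sample_failed` and `picks` are illustrative.

```python
import random

population = range(20, 30)  # only 10 distinct values

try:
    # Sampling without replacement: k cannot exceed len(population)
    random.sample(population, k=20)
    sample_failed = False
except ValueError as err:
    sample_failed = True
    print(f"random.sample failed: {err}")

# random.choices draws with replacement, so k may exceed the population size
picks = random.choices(population, k=20)
print(len(picks))
```

Since `random.choices()` can return the same value more than once, use it only when duplicates are acceptable.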

Multiplying your Series elements

We can now calculate the product of the two Series using the following vectorized operation:

s_total_working_days = s_employees * s_working_days
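One detail worth keeping in mind is that pandas aligns Series by index label, not by position, before multiplying. The sketch below uses made-up Series `a` and `b` with partly different labels to show the effect, along with `Series.mul()` and its `fill_value` parameter.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Aligned by label: labels present in only one Series give NaN
product = a * b
print(product)

# Series.mul with fill_value=1 treats the missing side as 1
filled = a.mul(b, fill_value=1)
print(filled)
```

The two random Series in this post share the same default index, so their elements pair up by position as expected.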

Multiply by a constant / scalar / float

You are able to multiply your series by an integer scalar:

s_yearly_hours = s_total_working_days * 22

Similarly, you can multiply your pandas column by a float value:

s_yearly_hours = s_total_working_days * 22.545

Convert string series to numeric values and multiply

If your Series contains strings rather than numbers, arithmetic operations won’t behave as expected: multiplying a string by an integer repeats the string instead of computing a product. Consider this example:

s1 = pd.Series([5,2,3,2,1])
s2 = pd.Series(["100", "200", "300", "400", "500"])  # series of strings

# we'll try to multiply the series objects:
s1 * s2

This will render the following string series:

0    100100100100100
1             200200
2          300300300
3             400400
4                500
dtype: object

You can calculate the arithmetic product by first converting the strings to numbers with the pandas pd.to_numeric() function:

s1*pd.to_numeric(s2)

This will render the right result:

0    500
1    400
2    900
3    800
4    500
dtype: int64
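If some of the strings cannot be parsed as numbers, `pd.to_numeric()` raises an error by default. Here is a minimal sketch, using made-up data with one invalid entry, of how `errors="coerce"` turns such entries into NaN instead.

```python
import pandas as pd

s1 = pd.Series([5, 2, 3])
s2 = pd.Series(["100", "n/a", "300"])  # "n/a" cannot be parsed as a number

# errors="coerce" converts unparseable entries to NaN instead of raising
s2_num = pd.to_numeric(s2, errors="coerce")

result = s1 * s2_num
print(result)
```

The NaN carries through the multiplication, so the affected row stays visible as missing.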

Sum your multiplied Series

After multiplying your two or more series you can easily sum the total:

print(f"The total number of working days was: {s_total_working_days.sum()}")

This will print the following string (your result will be different as we are using random data).

The total number of working days was: 31640
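When only the total is needed, multiplying and summing can be combined with `Series.dot()`. The sketch below uses small made-up Series instead of the random data so that the result is reproducible.

```python
import pandas as pd

# Made-up values, for illustration only
s_employees = pd.Series([40, 55, 70])
s_working_days = pd.Series([22, 25, 21])

# Multiply element-wise, then sum the products
total_via_sum = (s_employees * s_working_days).sum()

# Dot product: the same total in a single call
total_via_dot = s_employees.dot(s_working_days)

print(total_via_sum, total_via_dot)
```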