How to sum all, multiple and specific columns in a DataFrame?

In today’s quick data Analysis tip we would like to learn how to summarize one or multiple Python DataFrame columns at once.

Creating test data

As we normally do, let’s create a simple test DataFrame using the red_csv() method.

# import required Python libraries
import pandas as pd

'Use the pd.read_csv() method to import a csv file
inter = pd.read_csv('interviews.csv')

print(inter)

Note: If for some reason, you are not able to import the pandas library, you might want to look into this article on how to troubleshoot your Pandas installation.

Here’s our data:

monthfirst_interviewsecond_interview
1November85.078.0
2August87.082.0
3March85.084.0
4August77.080.0
5June81.087.0

Sum all columns in a Pandas DataFrame into new column

If we want to summarize all the columns, then we can simply use the DataFrame sum() method. Note that we passed the following parameters:

  • axis: If we want to aggregate the columns, then we’ll use axis=1. For rows we’ll use axis=0.
  • numeric_only = we’ll take under consideration only numeric columns.

We’ll aggregate the sum of the columns into a newly created column – named total.

inter['total'] = inter.sum(axis=1, numeric_only= True)

inter
monthfirst_interviewsecond_interviewtotal
1November85.078.0163.0
2August87.082.0169.0
3March85.084.0169.0
4August77.080.0157.0
5June81.087.0168.0

Add all columns using Apply and Lambda

This might not always be practical, but for completeness: if your DataFrame contains only numeric columns you add up all columns by using a simple apply statement and call a lambda function.


inter['total'] = inter.apply(lambda r:r.sum(), axis =1)

Sum DataFrame columns into a Pandas Series

Instead of creating a new column, we’ll receive a Python series:

int_s = inter.sum(axis=1, numeric_only= True)

Sum multiple columns in a Python DataFrame

If we want to go ahead and sum only specific columns, then we can subset the DataFrame by those columns and then summarize the result. We can obviously pass one column, specific ones, all columns except one etc’.

'Define your list of columns
cols = ['first_interview', 'second_interview']

inter ['total_2'] = inter[cols].sum(axis=1)

inter
monthfirst_interviewsecond_interviewtotaltotal_2
1November85.078.0163.0163.0
2August87.082.0169.0169.0
3March85.084.0169.0169.0
4August77.080.0157.0157.0
5June81.087.0168.0168.0

DataFrame sum columns based on name (or other condition)

You could potentially only sum columns which name matches a specific string value as well. In this example, we’ll select the column to aggregate based on a specific text pattern. We then pass the selection into the column parameter or a loc indexer on our DataFrame.

col_array = inter.columns.str.contains('first')
inter.loc[:, col_array].sum(axis=1)

Next learning

Now that you are proficient with summing up columns, we suggest that you look into our tutorial on how to sum Pandas rows.