How to total DataFrame columns and rows in Pandas?

In today’s data analysis tutorial we’ll learn how to sum across rows in Pandas DataFrame columns and add a total summary row.

Create Example Data

We will start by creating some sample data based on a fictitious human resources dataset.

# import the Python Pandas library

import pandas as pd

# Random data set
language = ['JavaScript', 'Kotlin', 'PHP', 'Java', 'R', 'Python', 'Swift', 'SQL']
first_interview = (78, 82, 85, 86, 82, 93, 82, 77)
second_interview = (54, 53, 48, 59, 54, 58, 50, 52)
third_interview = (13, 5, 11, 16, 16, 2, 17, 18)
hr = dict(language=language, interview_1=first_interview, interview_2=second_interview, interview_3=third_interview)

# Inititialize DataFrame
hr_df = pd.DataFrame(data=hr)

Adding a Total Row to the DataFrame

We can use the sum() DataFrame method to quickly total all DataFrame columns. The result will be a Series.

row_sum  = hr_df.sum()
row_sum

The result looks as following:

language       JavaScriptKotlinPHPJavaRPythonSwiftSQL
interview_1                                       665
interview_2                                       428
interview_3                                        98
dtype: object

This is a step forward but not exactly what we aimed for. Let’s first of all ensure that we are adding only numeric values.

row_sum  = hr_df.sum(numeric_only = True)
print (row_sum)

Our result will be:

interview_1    665
interview_2    428
interview_3     98
dtype: int64

We can also use the iloc accessor to filter out the specific columns to sum:

row_sum = hr_df.iloc[:,1:4].sum()

Next we will append the resulting total Series as a new row to the DataFrame. We can do it using the loc accessor:

hr_df.loc['Total'] = row_sum
hr_df.fillna('')

As you can see below, a new row, labeled Total was added to your DataFrame.

Adding a total column

For completeness, here is a short snippet that adds a total column to your DataFrame. Note the use of the axis = 1 parameter to ensure that the sum is made across the columns, rather than by default across rows.

hr_df['Total'] = hr_df.sum(numeric_only = True, axis =1)