How to sum in a Pandas pivot table?

When working with pivot tables, you are able to aggregate the data as needed for your analysis. In today’s tutorial we’ll learn how to use Pandas to sum up aggregated pivot table values. Below you’ll find a step-by-step example that you are welcome to follow along.

Pandas Pivot table sum example

Creating the example DataFrame

We’ll first go ahead and create a DataFrame that we’ll use throughout our example.

import pandas as pd
month = ['May', 'February', 'May', 'April', 'April', 'January', 'April', 'March']
language = ['Java', 'Java', 'Java', 'C#', 'Java', 'Java', 'C#', 'C#']
office = ['Atlanta', 'New York', 'New York', 'New York', 'Atlanta', 'New York', 'Atlanta', 'Atlanta']
salary = [124.0, 119.0, 117.0, 119.0, 124.0, 124.0, 129.0, 117.0]

salaries = dict(month=month, language=language, office=office,salary = salary)
salarydf = pd.DataFrame(data=salaries)
salarydf.head()

Here’s our DataFrame:

Pandas pivot table with sum aggfunc

Pandas delivers a pivot_table method for DataFrames. For every pivot table you can specify the table index (rows), columns and values. THe aggfunc parameter allows you to summarize your pivot table values according to specific logic. Below is a short snippet that creates the pivot and summarizes using sum:

salarydf.pivot_table(values='salary', index = 'month', \
                     columns= 'language', aggfunc= 'sum')

Here’s the initial pivot table that we have created:

Add a Summary / Total row & column to Pandas pivot table

We can easily insert a total / sum row to our Python pivot table by using the margins and margin_names parameters. The margins parameters insets the summary row and columns. THe margin_names parameters, allows us to rename the pivot table summary columns.

salarydf.pivot_table(values='salary', index = 'month', \
                     columns= 'language', aggfunc= 'sum', \
                     margins = True, margins_name='Total')

Pivot table count and sum

Next example will be to use multiple aggregation functions for our Pivot table calculated field values. In this example, we’ll use both count and sum. To do that, we’ll pass a list to the aggfunc parameter.

salarydf.pivot_table(values='salary', index = 'month', \
                     columns= 'language', \
                     aggfunc= ['count','sum'], \
                     margins = True, margins_name='Total')

Sum multiple columns

In this next section, we will aggregate and summarize our pivot data by multiple columns. In this example we’ll first group the data by the office and then by the language column. For that, we’ll pass a list to the columns parameter.

salarydf.pivot_table(values='salary', index = 'month', \
                     columns= ['office','language'], \
                     aggfunc= 'sum', \
                     margins = True, margins_name='Total')

Here’s our table:

Pivot columns to rows in Pandas

It’s relatively simple to transpose a pivot table by using the the T (transpose) DataFrame method as shown in the snippet below:


salarydf.pivot_table(values='salary', index = 'month', \
                     columns= 'language', aggfunc= 'sum').T

Export pivot table to csv

Once done with defining the layout of our pivot table, we are able to very easily export it to an external format, such as a comma separated value (csv) file.

As the Python pivot table is a DataFrame, we can simply use the to_csv() DataFrame method. Here’s a very simple example:

my_pivot = salarydf.pivot_table(values='salary', index = 'month', \
                     columns= ['office','language'], \
                     aggfunc= 'sum', \
                     margins = True, margins_name='Total')


my_pivot.to_csv('salaries.csv', sep = ',')

Additional Learning