How to sum in a Pandas pivot table?

When working with pivot tables, you are able to aggregate the data as needed for your analysis. In today’s tutorial we’ll learn how to use Pandas to sum up aggregated pivot table values. Below you’ll find a step-by-step example that you are welcome to follow along.

Pandas Pivot table sum example

Creating the example DataFrame

We’ll first go ahead and create a DataFrame that we’ll use throughout our example.

import pandas as pd
month = ['May', 'February', 'May', 'April', 'April', 'January', 'April', 'March']
language = ['Java', 'Java', 'Java', 'C#', 'Java', 'Java', 'C#', 'C#']
office = ['Atlanta', 'New York', 'New York', 'New York', 'Atlanta', 'New York', 'Atlanta', 'Atlanta']
salary = [124.0, 119.0, 117.0, 119.0, 124.0, 124.0, 129.0, 117.0]

salaries = dict(month=month, language=language, office=office,salary = salary)
salarydf = pd.DataFrame(data=salaries)
salarydf.head()

Here’s our DataFrame:

Pandas pivot table with sum aggfunc

Pandas delivers a pivot_table method for DataFrames. For every pivot table you can specify the table index (rows), columns and values. The aggfunc parameter allows you to summarize your pivot table values according to specific logic. Below is a short snippet that creates the pivot and summarizes using sum:

salarydf.pivot_table(values='salary', index = 'month', \
                     columns= 'language', aggfunc= 'sum')

Here’s the initial pivot table that we have created:

Add a Summary / Total row & column to Pandas pivot table

We can easily insert a total / sum row to our Python pivot table by using the margins and margin_names parameters. The margins parameters insets the summary row and columns. THe margin_names parameters, allows us to rename the pivot table summary columns.

salarydf.pivot_table(values='salary', index = 'month', \
                     columns= 'language', aggfunc= 'sum', \
                     margins = True, margins_name='Total')

Pivot table count and sum

Next example will be to use multiple aggregation functions for our Pivot table calculated field values. In this example, we’ll use both count and sum. To do that, we’ll pass a list to the aggfunc parameter.

salarydf.pivot_table(values='salary', index = 'month', \
                     columns= 'language', \
                     aggfunc= ['count','sum'], \
                     margins = True, margins_name='Total')

Sum multiple columns

In this next section, we will aggregate and summarize our pivot data by multiple columns. In this example we’ll first group the data by the office and then by the language column. For that, we’ll pass a list to the columns parameter.

salarydf.pivot_table(values='salary', index = 'month', \
                     columns= ['office','language'], \
                     aggfunc= 'sum', \
                     margins = True, margins_name='Total')

Here’s our table:

Pivot table without aggregation

We can create a pivot table without specifically specifying an aggregation method. Pandas will use ‘mean’ as the default aggregation method.

salarydf.pivot_table(values='salary', index = 'month', \
                     columns= 'language')

Pivot columns to rows in Pandas

It’s relatively simple to transpose a pivot table by using the the T (transpose) DataFrame method as shown in the snippet below:


salarydf.pivot_table(values='salary', index = 'month', \
                     columns= 'language', aggfunc= 'sum').T

Export pivot table to a csv file

Once done with defining the layout of our pivot table, we are able to very easily export it to an external format, such as a comma separated value (.csv) file.

As the Python pivot table is a DataFrame, we can simply use the to_csv() DataFrame method. Here’s a very simple example:

my_pivot = salarydf.pivot_table(values='salary', index = 'month', \
                     columns= ['office','language'], \
                     aggfunc= 'sum', \
                     margins = True, margins_name='Total')


my_pivot.to_csv('salaries.csv', sep = ',')

Creating Pandas crosstab tables

Next in this tutorial, we will quickly create and customize Pandas crosstabs. As part of our Data Analysis work, we typically use crosstabs in order to count , plot and analyze frequencies of occurrences on multiple columns.

Let’s start by importing data:

# import pandas
import pandas as pd

sal_df = pd.read_csv('hr.csv')

print(sal_df)

Here’s our data:

Create a Simple Pandas crosstab

We’ll call the pd.crosstab function and render a very simple crosstab:

crosstb1 = pd.crosstab(index = sal_df['month'], columns = sal_df['language'])

crosstb1

Note that by default the crosstab displays the occurrence count (not the unique / distinct values) and there is no need to specify the aggfunc = count parameter.

Create a cross tab with percentages

By definition, the crosstab counts the occurrences of each row in the DataFrame. We can do the same using a Pandas Pivot Table. What’s special about the crosstab function, if that as we mentioned before, it also allows to easily computer the relative frequency between values in different columns. We do that by adding the normalize=True parameter.

crosstb2 = pd.crosstab(index = sal_df['month'],  \
                      columns = sal_df['language'] , \
                      normalize=True) *100
crosstb2

Using a simple snippet we computed the relative frequency (percentage) of each observation in our data.

Adding a total column and row to the crosstab

Next, we’ll use the margins and margins_name parameters to add a summary column and row to the table.

crosstb3 = pd.crosstab(index = sal_df['month'],  \
                      columns = sal_df['language'] , \
                      normalize=True, margins = True, \
                      margins_name= "Total") *100
crosstb3

To improve the look and feel of our table, we’ll append percentage signs to each cell in the cross tab. In order to do so, we’ll first convert the crosstab contents to a Python string data type. We do that by using the astype DataFrame method. We then append the % character using a simple lambda function.

crosstb3.astype(str).apply(lambda x:x + '%')

Here is the resulting crosstab:

Pandas crosstabs with multiple columns

Next, we would like to allow an easier drill down into the data by adding another level to our crosstab columns section. Note that instead of passing a single column we are passing a list containing the columns we want displayed.

crosstb4 = pd.crosstab(index = sal_df['month'],  \
                      columns = [sal_df['language'], sal_df['office']] , \
                      normalize=True, margins = True, \
                      margins_name= "Total") *100

crosstb4.astype(str).applymap(lambda x:x + '%')