In one of our previous tutorials we learnt about how to sum multiple values in Pandas pivot tables. Today we’ll look specifically into the usage of the Pandas pivot_table module to display a cross tabular representation that shows both unique and non-unique occurrence counts.
Pivot table all and unique count in Pandas – Example
Example data
We’ll start by defining some sample data for our example. Feel free to use the code below to generate a simple data set for you to follow along.
import pandas as pd
# Define dataset
language = ['R', 'C#', 'Python', 'C#', 'C#', 'Python', 'C#', 'C#', 'C#', 'C#', 'Java', 'Java', 'Python', 'Java', 'C#', 'Python']
office = ['NYC', 'SFO', 'NYC', 'SFO', 'SFO', 'ATL', 'SFO', 'ATL', 'ATL', 'SFO', 'SFO', 'NYC', 'SFO', 'NYC', 'ATL', 'ATL']
salary = [115.0, 136.0, 116.0, 129.0, 136.0, 129.0, 115.0, 133.0, 132.0, 132.0, 115.0, 133.0, 136.0, 141.0, 98.0, 118.0]
#Intitialize DataFrame from Python dictionary
hr = dict( language=language, office=office, salary = salary)
sal_df = pd.DataFrame(data=hr)
Pandas pivot table with count aggregate function
The Pandas library delivers several method that allows to pivot data for easier analysis and presentation. Today we will use the pivot_table method to quickly define a cross tabular summary of our data.
The pivot_table parameters are self explanatory:
- values => specify the numeric data that will be aggregated.
- index => represents the pivot table rows
- columns => represents the pivot table columns
If we want to aggregate our data in a way that it will show a count of all instances of a value, we’ll use the aggfunc=’count’ parameter as shown below:
sal_df.pivot_table(values='salary', index = 'office', \
columns= 'language', aggfunc= 'count')
Here’s the result:
Counting distinct values in Pandas pivot
If we want to count the unique occurrences of a specific observation (row) we’ll need to use a somewhat different aggregation method. aggfunc= pd.Series.nunique will allow us to count only the distinct rows in the DataFrame that we pivoted.
sal_df.pivot_table(values='salary', index = 'office', \
columns= 'language', \
aggfunc= pd.Series.nunique)
Here’s the result:
Note that only unique occurrences of each row were counted.
Count and sum in one Pivot table
Pandas pivot_tables support multiple aggregation functions. In the next example we show how we can pass a list into the aggfunc parameter to calculate bothunique occurrences count and sum in the same pivot table:
sal_df.pivot_table(values='salary', index = 'office', \
columns= 'language', \
aggfunc= [pd.Series.nunique, 'sum'], \
fill_value = '-')
Here’s our result:
Note: the fill_value parameter, allows to specify a filling character that replaces NAN values in the table.