Fix ValueError: Index contains duplicate entries, cannot reshape in Python

Reproducing the Value Error cannot reshape in Pandas

Let’s assume that we have the following Python code that creates a very small DataFrame:

import pandas as pd

month = ['Jan', 'Jan', 'Feb', 'Mar']
office = ['Los Angeles', 'Los Angeles', 'New York', 'New York']
salary = [ 98, 92, 87, 92]
data = dict(month = month, office = office, salary = salary)
test_df = pd.DataFrame(data=data)

Let’s look at our data:

print (test_df.head())
monthofficesalary
0JanLos Angeles98
1JanLos Angeles92
2FebNew York87
3MarNew York92

To reproduce the error message, we will call the DataFrame pivot() function:

test_df.pivot(values= 'salary', index='office', columns='month')

This will render the following exception error message:

ValueError: Index contains duplicate entries, cannot reshape

Fixing the index contains duplicate entries exception

The root cause for the error, is that if you look carefully, the first and second rows in our DataFrame has the same values in the office and month columns. Those are used as the pivot row and columns index respectively. As the pivot function in Pandas doesn’t (by default) aggregate numeric values it doesn’t know how to “crunch” those two rows.

Solution #1 – create a pandas pivot table

We can use the pivot_table() function instead, and specify an aggregating function. In our case we’ll aggregate the numeric values by their average value.

test_df.pivot_table(values= 'salary', index='office', columns='month', aggfunc='mean', fill_value = 0)

This will return the following pivot table (technically a DataFrame):

monthFebJanMar
office
Los Angeles92980
New York87092

Related: How to write a pandas pivot table to a new DataFrame?

Solution #2 : Use groupby and unstack

We first aggregate the data as required and the use the unstack() DataFrame function in order to arrange the pivot rows and columns. Last, we replace missing values in our pivot table.

test_df.groupby(['office','month']).mean().unstack().fillna(0)

The result will be similar to the one produced in the previous section.