Below is an example from pandas official document for pivot():
import pandas as pd df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'], 'bar': ['A', 'B', 'C', 'A', 'B', 'C'], 'baz': [1, 2, 3, 4, 5, 6], 'zoo': ['x', 'y', 'z', 'q', 'w', 't']}) result = df.pivot(index='foo', columns='bar', values='baz') print(result)
the result will be
bar A B C foo one 1 2 3 two 4 5 6
But if there are duplicate rows in the Dataframe, it will report error:
ValueError: Index contains duplicate entries, cannot reshape
To fix this, we can just drop_duplicates()
first and then pivot()
:
result = df.drop_duplicates().pivot(index='foo', columns='bar', values='baz')
As matter of fact, there are situations that drop_duplicates()
couldn’t fix:
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'], 'bar': ['A', 'A', 'C', 'A', 'B', 'C'], 'baz': [1, 2, 3, 4, 5, 6], 'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
Now we will need to use groupby()
and unstack()
to replace pivot()
:
result = (df.groupby(["foo", "bar"]) .baz .first() .unstack() )
And the result is
bar A B C foo one 1.0 NaN 3.0 two 4.0 5.0 6.0