Below is an example from pandas official document for pivot():
import pandas as pd
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
result = df.pivot(index='foo', columns='bar', values='baz')
print(result)
the result will be
bar A B C
foo
one 1 2 3
two 4 5 6
But if there are duplicate rows in the Dataframe, it will report error:
ValueError: Index contains duplicate entries, cannot reshape
To fix this, we can just drop_duplicates()
first and then pivot()
:
result = df.drop_duplicates().pivot(index='foo', columns='bar', values='baz')
As matter of fact, there are situations that drop_duplicates()
couldn’t fix:
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
'bar': ['A', 'A', 'C', 'A', 'B', 'C'],
'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
Now we will need to use groupby()
and unstack()
to replace pivot()
:
result = (df.groupby(["foo", "bar"])
.baz
.first()
.unstack()
)
And the result is
bar A B C
foo
one 1.0 NaN 3.0
two 4.0 5.0 6.0