I rarely pay attention to the datetime64 type in pandas, but yesterday a problem with it caught me out.
It involved a Parquet file with a column “start_date”:
>>> df["start_date"]
                  start_date
0  2022-03-22 00:00:00+11:00
1  2022-03-22 00:00:00+11:00
2  2022-03-22 00:00:00+11:00
3  2022-03-22 00:00:00+11:00
4  2022-03-22 00:00:00+11:00
They look like “2022-03-22”, a Tuesday. But after I exported the data to BigQuery and selected them, they became “2022-03-21 UTC”, a Monday. BigQuery shows TIMESTAMP values in UTC by default, and 2022-03-22 00:00 at +11:00 is 2022-03-21 13:00 in UTC.
The problem is clearly the time zone attached to this column:
>>> df.dtypes
start_date    datetime64[ns, Australia/Sydney]
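To see where the Monday comes from, here is a minimal sketch that rebuilds a column like the one above and converts it to UTC, the zone BigQuery uses when displaying TIMESTAMP values. The toy data (five identical Sydney midnights) is an assumption, since the original Parquet file isn't shown.

```python
import pandas as pd

# Assumed stand-in for the Parquet column: midnight local time in Sydney.
start_date = pd.Series(pd.to_datetime(["2022-03-22"] * 5)).dt.tz_localize("Australia/Sydney")
print(start_date.iloc[0])             # 2022-03-22 00:00:00+11:00
print(start_date.iloc[0].day_name())  # Tuesday

# The same instants expressed in UTC, which is how BigQuery shows TIMESTAMP values.
as_utc = start_date.dt.tz_convert("UTC")
print(as_utc.iloc[0])                 # 2022-03-21 13:00:00+00:00
print(as_utc.iloc[0].day_name())      # Monday
```

Nothing is lost here: both columns describe the same instants, only the calendar date you read off them changes with the zone.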
To line up with BigQuery, all we need to do is remove the time zone so that the value is just “2022-03-22”.
The fix is blunt but simple:
df["start_date"] = df["start_date"].dt.tz_localize(None)  # drop the time zone, keep the Sydney wall-clock time
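Here is a self-contained sketch of that fix on a toy column (an assumed stand-in for the Parquet data). `tz_localize(None)` drops the zone but keeps the local wall-clock time. For contrast, `tz_convert(None)` first converts to UTC and then drops the zone, which would bring back the Monday date.

```python
import pandas as pd

# Assumed stand-in for the Parquet data: tz-aware midnights in Sydney.
df = pd.DataFrame(
    {"start_date": pd.Series(pd.to_datetime(["2022-03-22"] * 5)).dt.tz_localize("Australia/Sydney")}
)

# tz_convert(None): convert to UTC, then drop the zone -> the Monday date again.
utc_naive = df["start_date"].dt.tz_convert(None)
print(utc_naive.iloc[0])          # 2022-03-21 13:00:00

# tz_localize(None): drop the zone, keep the local wall-clock time -> Tuesday.
df["start_date"] = df["start_date"].dt.tz_localize(None)
print(df["start_date"].dt.tz)     # None
print(df["start_date"].iloc[0])   # 2022-03-22 00:00:00
```

Which call is right depends on what the column means. Here the values are calendar dates in Sydney, so keeping the local wall time is the intent.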