Imagine we have a small CSV file:
```
name,enroll_time
robin,2021-01-15 09:50:33
tony,2021-01-14 01:50:33
jaime,2021-01-13 00:50:33
tyrion,2021-2-15 13:22:17
bran,2022-3-16 14:00:01
```
Let’s try to load it into a Pandas DataFrame and upload it to a BigQuery table:
```python
import pandas as pd
from google.cloud import bigquery

df = pd.read_csv("test.csv", parse_dates=["enroll_time"], index_col=0)

schema = []
schema.append(bigquery.SchemaField("name", "STRING"))
schema.append(bigquery.SchemaField("enroll_time", "DATE"))
job_config = bigquery.LoadJobConfig(schema=schema)

bq_client = bigquery.Client()
table = "project.dataset.test_table"
job = bq_client.load_table_from_dataframe(
    df, table, job_config=job_config
)
job.result()
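Before looking at the failure, it helps to check what `read_csv` actually produced. A quick sketch, reusing the same `df` as above (the printed dtype is what I'd expect from `parse_dates`, not output shown in the original run):

```python
# parse_dates turns enroll_time into a datetime64[ns] column,
# which pyarrow represents as timestamp[ns] during the upload.
print(df.dtypes)
# enroll_time    datetime64[ns]
# dtype: object
```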
But it reports an error:
File "pyarrow/array.pxi", line 176, in pyarrow.lib.array File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to date32[day] would lose data: 1610704233000000000
It seems the BigQuery library couldn't recognize 1610704233000000000 as nanoseconds. I then tried dividing 1610704233000000000 by 1e9, but that also failed.
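As a side note, that big integer is just the first row's enroll time expressed as nanoseconds since the Unix epoch, which pandas can confirm. A minimal sketch (the value is copied from the error message above):

```python
import pandas as pd

# An integer passed to pd.Timestamp is interpreted as nanoseconds since the epoch,
# so the value from the error maps back to the first CSV row.
print(pd.Timestamp(1610704233000000000))  # 2021-01-15 09:50:33
```

Casting that value to `date32[day]` would silently drop the 09:50:33 time-of-day part, which is exactly the data loss pyarrow is complaining about.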
Actually, all we need to do is use `TIMESTAMP` instead of `DATE` as the type of the `enroll_time` column:
schema.append(bigquery.SchemaField("name", "STRING")) schema.append(bigquery.SchemaField("enroll_time", "TIMESTAMP"))
and the BigQuery library recognizes the column correctly, even though it is stored in nanosecond units.
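To double-check the result, you can ask BigQuery for the table metadata after the load finishes. A small sketch, reusing `bq_client` and the placeholder table name from above:

```python
# Fetch the table and confirm the row count and the enroll_time type.
table_obj = bq_client.get_table("project.dataset.test_table")
print(table_obj.num_rows)  # should be 5, one row per line of the CSV
print([(field.name, field.field_type) for field in table_obj.schema])
# [('name', 'STRING'), ('enroll_time', 'TIMESTAMP')]
```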