Some tips about BigQuery on GCP

Migrate SQL script from AWS Redshift to BigQuery

in Redshift should be changed to

in BigQuery.
Since BigQuery doesn’t force type conversion, some NULL value in Redshift could be a NULL value or a ‘NULL’ string in BigQuery. Make sure you use both

and

for checking.
In BigQuery, we can also use UDF like this:

Performance improvement of BigQuery SQL
Remove ‘DISTINCT’ in SQL and de-dup data later in Pandas could boost whole performance for data processing. Even ‘CAST’ in BigQuery would hurt the performance. The best way to find the bottlenecks for your SQL is by looking at the ‘Execution details‘ in GUI.

Loading speed
For pandas-gbq, we can accelerate the speed of reading BigQuery table by adding argument ‘use_bqstorage_api=True’ in ‘read_gbq()’ function:

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.