Some tips about BigQuery on GCP

Migrate SQL script from AWS Redshift to BigQuery

CONVERT_TIMEZONE('AEDT',getdate())::DATE

in Redshift should be changed to

current_date("Australia/Sydney")

in BigQuery.
Since BigQuery doesn’t force type conversion, some NULL value in Redshift could be a NULL value or a ‘NULL’ string in BigQuery. Make sure you use both

column is NULL

and

column = 'NULL'

for checking.
In BigQuery, we can also use UDF like this:

create temp function change_date(the_date DATE, offset_day INT64) AS (
    DATE_ADD(the_date, INTERVAL offset_day DAY)
);
create temp function next_thursday(the_date DATE) AS (
    change_date(DATE_TRUNC(the_date, WEEK(THURSDAY)), 7)
);

Performance improvement of BigQuery SQL
Remove ‘DISTINCT’ in SQL and de-dup data later in Pandas could boost whole performance for data processing. Even ‘CAST’ in BigQuery would hurt the performance. The best way to find the bottlenecks for your SQL is by looking at the ‘Execution details‘ in GUI.
Loading speed
For pandas-gbq, we can accelerate the speed of reading BigQuery table by adding argument ‘use_bqstorage_api=True’ in ‘read_gbq()’ function:

df = pandas_gbq.read_gbq(bqsqlfile, project_id='myproject', use_bqstorage_api=True)

Robin on Linux

Some tips about BigQuery on GCP

Leave a Reply Cancel reply

Robin on Linux

Related Posts

Leave a Reply Cancel reply