Migrate SQL script from AWS Redshift to BigQuery


in Redshift should be changed to


in BigQuery.
Since BigQuery doesn’t force type conversion, some NULL value in Redshift could be a NULL value or a ‘NULL’ string in BigQuery. Make sure you use both

column is NULL


column = 'NULL'

for checking.
In BigQuery, we can also use UDF like this:

create temp function change_date(the_date DATE, offset_day INT64) AS (                        
    DATE_ADD(the_date, INTERVAL offset_day DAY)                                               
create temp function next_thursday(the_date DATE) AS (                                        
    change_date(DATE_TRUNC(the_date, WEEK(THURSDAY)), 7)                                      

Performance improvement of BigQuery SQL
Remove ‘DISTINCT’ in SQL and de-dup data later in Pandas could boost whole performance for data processing. Even ‘CAST’ in BigQuery would hurt the performance. The best way to find the bottlenecks for your SQL is by looking at the ‘Execution details‘ in GUI.

Loading speed
For pandas-gbq, we can accelerate the speed of reading BigQuery table by adding argument ‘use_bqstorage_api=True’ in ‘read_gbq()’ function:

df = pandas_gbq.read_gbq(bqsqlfile, project_id='myproject', use_bqstorage_api=True)