Migrate SQL script from AWS Redshift to BigQuery
Some Redshift-specific functions and syntax should be changed to their BigQuery equivalents.
Since BigQuery doesn’t force type conversion, a NULL value in Redshift could arrive as either a true NULL or a ‘NULL’ string in BigQuery. Make sure you check for both:

    column IS NULL OR column = 'NULL'
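The same cleanup can be done on the client side after reading the table into pandas. A minimal sketch (the frame and column name are hypothetical stand-ins for a table read from BigQuery):

```python
import pandas as pd
import numpy as np

# Hypothetical data standing in for a table read from BigQuery,
# where missing values arrive as either a true NULL or the string 'NULL'.
df = pd.DataFrame({"column": ["a", None, "NULL", "b"]})

# Normalize the 'NULL' string to a real missing value before processing.
df["column"] = df["column"].replace("NULL", np.nan)

# isna() now catches both original forms of missingness.
mask = df["column"].isna()
```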
In BigQuery, we can also define temporary UDFs like this:
    CREATE TEMP FUNCTION change_date(the_date DATE, offset_day INT64) AS (
      DATE_ADD(the_date, INTERVAL offset_day DAY)
    );

    CREATE TEMP FUNCTION next_thursday(the_date DATE) AS (
      change_date(DATE_TRUNC(the_date, WEEK(THURSDAY)), 7)
    );
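For unit-testing the same logic outside BigQuery, the two UDFs can be mirrored in Python. This is a sketch of the equivalent logic, not BigQuery code: `DATE_TRUNC(d, WEEK(THURSDAY))` snaps to the most recent Thursday on or before `d`, and adding 7 days then gives the next Thursday.

```python
from datetime import date, timedelta

def change_date(the_date: date, offset_day: int) -> date:
    # Mirrors DATE_ADD(the_date, INTERVAL offset_day DAY).
    return the_date + timedelta(days=offset_day)

def next_thursday(the_date: date) -> date:
    # DATE_TRUNC(the_date, WEEK(THURSDAY)) truncates to the most recent
    # Thursday on or before the_date; weekday() is 0=Mon ... 3=Thu.
    days_since_thursday = (the_date.weekday() - 3) % 7
    truncated = the_date - timedelta(days=days_since_thursday)
    return change_date(truncated, 7)
```

For example, for Monday 2024-01-01 the truncation lands on Thursday 2023-12-28, and adding 7 days yields Thursday 2024-01-04.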
Performance improvement of BigQuery SQL
Removing ‘DISTINCT’ from the SQL and de-duplicating the data later in pandas can boost the overall performance of data processing. Even ‘CAST’ in BigQuery can hurt performance. The best way to find the bottlenecks in your SQL is to look at the ‘Execution details’ tab in the BigQuery console.
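The client-side de-duplication step is a one-liner in pandas. A minimal sketch, with a hypothetical frame standing in for the result of a query run without DISTINCT:

```python
import pandas as pd

# Hypothetical result of a query run without DISTINCT: duplicates included.
df = pd.DataFrame({"user_id": [1, 1, 2], "event": ["click", "click", "view"]})

# De-duplicate on the client side instead of paying for DISTINCT in BigQuery.
deduped = df.drop_duplicates().reset_index(drop=True)
```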
For pandas-gbq, we can speed up reading a BigQuery table by passing the argument ‘use_bqstorage_api=True’ to the ‘read_gbq()’ function:
df = pandas_gbq.read_gbq(bqsqlfile, project_id='myproject', use_bqstorage_api=True)