Author Archives: Robin Dong

Using Single Shot Detection to detect birds (Episode four)

In the previous article, I reached an mAP of 0.770 on the VOC2007 test set. Four months have passed since then. After trying many interesting ideas from different papers, such as FPN, CELU and RFBNet, I finally realised that data matters more than network structure. So I switched to using COCO2017+VOC instead of VOC alone…
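The mAP figure above comes from the VOC2007 test protocol. That protocol averages per-class AP values, each computed by 11-point interpolation. The block below is a minimal sketch of that metric, not the post's own evaluation code. It assumes the per-class recall and precision arrays for ranked detections have already been produced by IoU matching, which is not shown. The function names `voc07_ap` and `mean_ap` are hypothetical.

```python
def voc07_ap(recall, precision):
    """11-point interpolated AP (PASCAL VOC2007 style).

    recall, precision: equal-length sequences, one entry per ranked detection.
    """
    ap = 0.0
    for t in (i / 10 for i in range(11)):  # thresholds 0.0, 0.1, ..., 1.0
        # Highest precision reached at any recall >= t; 0 if recall never reaches t.
        p = max((p for r, p in zip(recall, precision) if r >= t), default=0.0)
        ap += p / 11
    return ap


def mean_ap(per_class_ap):
    """Mean of per-class AP values, e.g. {"bird": 0.8, "cat": 0.7}."""
    return sum(per_class_ap.values()) / len(per_class_ap)


if __name__ == "__main__":
    # One class whose ranked detections reach recall 1.0 with falling precision.
    ap_bird = voc07_ap([0.5, 0.5, 1.0], [1.0, 0.5, 0.667])
    print(round(ap_bird, 4), round(mean_ap({"bird": ap_bird, "cat": 0.6}), 4))
```

In this scheme, a detector that never reaches high recall is penalised at the upper thresholds because those points contribute 0.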

An example of using Spark Structured Streaming

This snippet monitors two directories and, whenever a new CSV file appears in either of them, joins the data from both.

The join operation is implemented with Spark SQL, which is easy to use (for DBAs) and also easy to maintain. Some articles said if the Spark process…
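As a rough sketch of this kind of job (not the original snippet), the block below reads each directory as a streaming CSV source, registers both streams as temporary views, and joins them with plain Spark SQL. It assumes Spark 2.3 or later, which is where stream-stream joins were introduced. The directory paths, view names and columns (`id`, `name`, `score`) are hypothetical. Because no watermark is set, the join keeps every past row in state.

```python
"""Sketch: join CSV files arriving in two directories with Structured Streaming.

Hypothetical assumptions: the directories below, and CSV files without a
header row whose columns are (id, name) and (id, score) respectively.
"""

DIR_A = "/tmp/input_a"  # hypothetical path
DIR_B = "/tmp/input_b"  # hypothetical path

# The join itself is ordinary SQL, which keeps it readable for DBAs.
JOIN_SQL = """
SELECT a.id, a.name, b.score
FROM stream_a AS a
JOIN stream_b AS b ON a.id = b.id
"""


def main() -> None:
    # Imported lazily so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("join-two-dirs").getOrCreate()

    # Streaming file sources need an explicit schema.
    schema_a = StructType([StructField("id", IntegerType()),
                           StructField("name", StringType())])
    schema_b = StructType([StructField("id", IntegerType()),
                           StructField("score", IntegerType())])

    # Every new CSV file dropped into a directory adds rows to that stream.
    spark.readStream.schema(schema_a).csv(DIR_A).createOrReplaceTempView("stream_a")
    spark.readStream.schema(schema_b).csv(DIR_B).createOrReplaceTempView("stream_b")

    joined = spark.sql(JOIN_SQL)

    # Inner stream-stream joins run in append mode; print results to the console.
    query = joined.writeStream.outputMode("append").format("console").start()
    query.awaitTermination()


if __name__ == "__main__":
    main()
```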

A problem of using Pyspark SQL

Here is the code:

It reports an error after running ‘cat xxx.py|bin/pyspark’:

At first I thought this was because ‘2’ is a string, so I changed ‘row’ to ‘[2, 29, 29, 29]’. But the error just changed to:

Then I searched on Google and found this…

Processing date and time in AWS Redshift

Since AWS Redshift doesn’t have a function like FROM_UNIX(), getting a formatted time from a UNIX timestamp (called ‘epoch’ in Redshift) is rather awkward:
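A widely used Redshift idiom adds the epoch seconds, as an interval, to `TIMESTAMP 'epoch'`. The sketch below keeps that query in a Python string and mirrors the same arithmetic with `datetime`, so the expected value can be checked locally. The table `events` and column `ts` are hypothetical. Redshift's `TIMESTAMP` has no time zone, so the mirror uses UTC.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table `events` with an integer column `ts` holding UNIX seconds.
EPOCH_TO_TIMESTAMP_SQL = """
SELECT TIMESTAMP 'epoch' + ts * INTERVAL '1 second' AS event_time
FROM events
"""

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_timestamp(seconds: int) -> datetime:
    """Python mirror of TIMESTAMP 'epoch' + seconds * INTERVAL '1 second' (UTC)."""
    return _EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    print(epoch_to_timestamp(1500000000).isoformat())  # 2017-07-14T02:40:00+00:00
```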

Ref: https://stackoverflow.com/questions/39815425/how-to-convert-epoch-to-datetime-redshift

If we want to see the statistics grouped by hour:
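One way to get hourly statistics is to wrap the converted timestamp in `DATE_TRUNC('hour', ...)` and group on it. The sketch below shows such a query as a string and adds a pure-Python mirror that buckets UNIX seconds by UTC hour, so the grouping logic can be tested without a cluster. The table `events`, the column `ts` and the helper `hourly_counts` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical table `events` with an integer UNIX-seconds column `ts`.
HOURLY_COUNTS_SQL = """
SELECT DATE_TRUNC('hour', TIMESTAMP 'epoch' + ts * INTERVAL '1 second') AS hour,
       COUNT(*) AS events
FROM events
GROUP BY 1
ORDER BY 1
"""


def hourly_counts(epochs):
    """Python mirror of the query: count UNIX timestamps per UTC hour."""
    buckets = Counter()
    for seconds in epochs:
        # Dropping the sub-hour remainder matches DATE_TRUNC('hour', ...).
        start = seconds - seconds % 3600
        buckets[datetime.fromtimestamp(start, tz=timezone.utc)] += 1
    return sorted(buckets.items())


if __name__ == "__main__":
    for hour, count in hourly_counts([0, 59, 3600, 7199, 7200]):
        print(hour.isoformat(), count)
```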

Some tips about using AWS Glue

Configure the data format

To use AWS Glue, I wrote a ‘catalog table’ into my Terraform script:

But when my PySpark script accessed this table, it reported:
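For context, a Glue PySpark job usually reaches a catalog table through `GlueContext`. The sketch below shows only that access path and does not reproduce the fix described in this post. The database and table names are hypothetical placeholders. The `awsglue` module is available only inside the Glue job runtime, so it is imported lazily.

```python
"""Sketch: read an AWS Glue Data Catalog table from a Glue PySpark job.

DATABASE and TABLE_NAME are hypothetical placeholders.
"""

DATABASE = "my_database"     # hypothetical
TABLE_NAME = "my_csv_table"  # hypothetical


def read_catalog_table(database: str, table_name: str):
    """Load a catalog table as a Spark DataFrame via GlueContext."""
    # Available inside the Glue runtime; imported lazily so this file loads elsewhere.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    # Glue takes the table's location and format settings from the catalog entry,
    # so the Terraform table definition determines how the files are parsed.
    dynamic_frame = glue_context.create_dynamic_frame.from_catalog(
        database=database, table_name=table_name
    )
    return dynamic_frame.toDF()


if __name__ == "__main__":
    read_catalog_table(DATABASE, TABLE_NAME).show(5)
```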

It seems we can’t use ‘OpenCSVSerde’. Actually, the correct answer is:

The version of Zeppelin

When using Zeppelin to run…