Author Archives: Robin Dong

How to ignore illegal sample of dataset in PyTorch?

I implemented a dataset class for my image samples, but it can’t handle the case where a corrupted image is read:

The correct solution is on the PyTorch forum, so I changed my code:

But it reports:

It seems default_collate() couldn’t recognize the ‘filter’ object. Don’t worry…
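The original code and the final fix are cut off in this excerpt, so what follows is only a minimal sketch of the general idea, not the author's code. It assumes the dataset returns None for a corrupted image. In Python 3, filter() returns a lazy iterator rather than a list, so the sketch builds a real list before handing the batch to the collate function. The names make_skipping_collate and stack_pairs are hypothetical, and stack_pairs is a toy stand-in for default_collate so that the example runs without PyTorch.

```python
def make_skipping_collate(base_collate):
    """Wrap a collate function so that None samples are dropped first."""
    def collate(batch):
        # Build a real list: in Python 3, filter() only returns a lazy iterator.
        kept = [sample for sample in batch if sample is not None]
        return base_collate(kept)
    return collate


def stack_pairs(samples):
    """Toy stand-in for default_collate: split (image, label) pairs into two lists."""
    if not samples:
        return [], []
    images, labels = zip(*samples)
    return list(images), list(labels)


# A batch in which the second sample failed to load (hypothetical data).
batch = [("img0", 0), None, ("img2", 1)]
collate = make_skipping_collate(stack_pairs)
images, labels = collate(batch)
print(images, labels)
```

With PyTorch, one would presumably pass torch.utils.data.dataloader.default_collate as base_collate and give the wrapped function to DataLoader through its collate_fn argument. Note that a batch made up entirely of corrupted samples still reaches the base collate function as an empty list, which may need separate handling.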

Tips about pytest

1. Error: “fixture ‘mocker’ not found”

After running pytest, it reported:

The solution is simply to install the missing pip package (the mocker fixture is provided by the pytest-mock plugin):

2. How to make sure a function has been called without caring about its arguments?

There are two methods. The first is to check the “.called” attribute.

The…
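The excerpt is cut off before the examples, so here is a minimal sketch of the “.called” check using the standard library's unittest.mock (the mocker fixture wraps the same mock objects). The function notify and the object sender are hypothetical names made up for illustration. The assert_called() call at the end is another standard way to make the same check; it is not necessarily the second method the original post goes on to describe.

```python
from unittest.mock import Mock


def notify(sender, message):
    """Hypothetical function under test: forwards a message with fixed retries."""
    sender.send(message, retries=3)


sender = Mock()
assert not sender.send.called   # nothing has been called yet

notify(sender, "hello")

# .called is True after any call, whatever arguments were passed.
assert sender.send.called

# assert_called() raises AssertionError if the mock was never called.
sender.send.assert_called()
```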

The generating speed for random number in Python3

I just want to generate a random number in a range (float or integer, it doesn’t matter) using Python. Since I only need one random number at a time in my code, the speed of each call to the generating function is critical. So let’s run an experiment:

The result is:

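The original benchmark and its numbers are not included in this excerpt. As a rough sketch of how such a comparison might be set up, the following uses the standard timeit module to time one call at a time for a few candidates from the random module. The choice of candidates, the range 0 to 100, and the call count are assumptions made for illustration; timings vary by machine and Python version, so no results are shown.

```python
import random
import timeit

# Candidate ways to get one random number in [0, 100] (an assumed range).
CANDIDATES = {
    "random.random() * 100": lambda: random.random() * 100,
    "random.uniform(0, 100)": lambda: random.uniform(0, 100),
    "random.randint(0, 100)": lambda: random.randint(0, 100),
}


def benchmark(calls=100_000):
    """Return total seconds spent on `calls` invocations of each candidate."""
    return {
        name: timeit.timeit(fn, number=calls)
        for name, fn in CANDIDATES.items()
    }


results = benchmark()
# Print the candidates from fastest to slowest.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:26s} {seconds:.4f} s")
```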

A problem about using DataFrame in Apache Spark

Here is the code for loading a CSV file (table employee) into an Apache Spark DataFrame:

But after I ran the jar in Spark, it reported:

It seems the data hasn’t been loaded correctly. After carefully reviewing the documentation for the CSV format, I noticed that the quote in my CSV file…

Using Single Shot Detection to detect birds (Episode four)

In the previous article, I reached an mAP of 0.770 on the VOC2007 test set. Four months have passed. After trying a lot of interesting ideas from different papers, such as FPN, celu, and RFBNet, I finally realised that the data is more important than the network structure. So I used COCO2017+VOC instead of only VOC…

An example of using Spark Structured Streaming

This snippet monitors two directories and joins the data from them whenever a new CSV file appears in either directory.

The join operation is implemented with Spark SQL, which is easy to use (for DBAs) and also easy to maintain. Some articles said that if the Spark process…

A problem of using Pyspark SQL

Here is the code:

It reports an error after running ‘cat xxx.py|bin/pyspark’:

At first I thought it was because ‘2’ is a string, so I changed ‘row’ to ‘[2, 29, 29, 29]’. But the error then changed to:

Then I searched on Google and found this…