1. Parentheses ‘()’ may create a tuple or mean nothing at all.

print(len(("birds")))   # the inner '()' only groups an expression, so this is len("birds")
print(len(("birds",)))  # the inner '()' is a tuple because of the trailing comma

The result is:

5
1
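A short sketch of the rule behind this: in Python it is the comma, not the parentheses, that makes a tuple. The one exception is the empty tuple, which is written as bare `()`. The variable names below are just for illustration.

```python
# A trailing comma creates a tuple even without parentheses.
single = "birds",
print(type(single).__name__, len(single))

# Parentheses alone only group an expression; the value stays a str.
grouped = ("birds")
print(type(grouped).__name__, len(grouped))

# The one exception: empty parentheses are the empty tuple.
empty = ()
print(type(empty).__name__, len(empty))
```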

2. Unlike TensorFlow’s static graph, PyTorch runs the neural network eagerly, executing it line by line exactly as the code is written. This brings several conveniences. First, we can print any tensor anywhere in the program, whether during prediction or training. Second, simply inserting ‘time.time()’ calls into the code lets us profile every step of training.
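The profiling idea can be sketched in plain Python so it runs without a GPU. The context manager `timed` and the stand-in steps `load_batch`, `forward` and `backward` are hypothetical; in a real PyTorch loop you would wrap your own DataLoader iteration, model call and `loss.backward()` instead. Because execution is eager, the intermediate value can also be printed right where it is computed.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per named training step.
timings = {}

@contextmanager
def timed(name):
    """Add the elapsed wall-clock time of the enclosed block to timings[name]."""
    start = time.time()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.time() - start)

# Hypothetical stand-ins for real training steps; replace with your own code.
def load_batch():
    time.sleep(0.02)  # pretend data loading takes 20 ms
    return list(range(8))

def forward(batch):
    time.sleep(0.01)  # pretend the forward pass takes 10 ms
    return sum(batch)

def backward(loss):
    time.sleep(0.01)  # pretend the backward pass takes 10 ms

for step in range(3):
    with timed("load"):
        batch = load_batch()
    with timed("forward"):
        loss = forward(batch)
        print(f"step {step}: loss = {loss}")  # inspect values mid-training
    with timed("backward"):
        backward(loss)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

On a GPU, CUDA kernels are launched asynchronously, so call `torch.cuda.synchronize()` before each clock reading; otherwise the GPU time gets charged to whichever later step happens to wait for it.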
3. Following the example of NVIDIA’s apex, I wrote a prefetcher that lets PyTorch load data and compute in parallel. In my test, however, the ‘data_prefetcher’ actually slowed training down. The reason may be that my model (VGG16) is not compute-dense enough, so computation takes less time than data loading and there is little work for the prefetching to overlap with.
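The overlap idea behind a prefetcher can be illustrated with a background thread and a bounded queue. This is a swapped-in technique, not apex’s `data_prefetcher` (which copies the next batch to the GPU on a separate CUDA stream); the class `ThreadPrefetcher` and the stand-in `slow_batches` loader are hypothetical and only meant to show loading and computing running at the same time.

```python
import queue
import threading
import time

class ThreadPrefetcher:
    """Hypothetical helper: load up to `depth` items ahead in a background thread.

    Single-pass only: iterate over it once.
    """

    def __init__(self, iterable, depth=2):
        self._queue = queue.Queue(maxsize=depth)
        worker = threading.Thread(target=self._worker, args=(iterable,), daemon=True)
        worker.start()

    def _worker(self, iterable):
        try:
            for item in iterable:
                self._queue.put((True, item))  # blocks while the queue is full
            self._queue.put((False, None))     # signal end of data
        except Exception as exc:
            self._queue.put((False, exc))      # forward the error to the consumer

    def __iter__(self):
        while True:
            ok, payload = self._queue.get()
            if ok:
                yield payload
            elif payload is None:
                return
            else:
                raise payload

def slow_batches(n):
    # Stand-in for a data loader: each batch takes 10 ms to "load".
    for i in range(n):
        time.sleep(0.01)
        yield i

start = time.time()
results = []
for batch in ThreadPrefetcher(slow_batches(10)):
    time.sleep(0.01)  # stand-in for the compute step on this batch
    results.append(batch)
print(results, f"{time.time() - start:.2f} s")
```

The speedup only appears when compute takes long enough to hide the loading time, which matches the VGG16 observation above. Threads also help only when loading releases the GIL (file I/O, many C extensions); PyTorch’s own `DataLoader` already loads batches in worker processes when `num_workers > 0`.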