This week I was trying to train two deep-learning models. Unlike my previous training jobs, both of them turned out to be very hard to converge to a small loss.
The first model is a bird image classifier. Previously we had written a modified ResNet-50 model in MXNet and reached 78% evaluation accuracy with it, but after we rewrote the same model in TensorFlow it only reached 50%, which seemed very weird. My first thought was that this was a regularization problem, so I randomly padded/cropped and rotated the training images:

  # Pad the image, rotate it by a random angle in [-60°, +60°], then crop back to the original size.
  image = tf.image.resize_image_with_crop_or_pad(image, IMAGE_HEIGHT + 80, IMAGE_WIDTH + 80)
  image = tf.contrib.image.rotate(image, tf.random_uniform([1], minval=-math.pi / 3.0, maxval=math.pi / 3.0))
  image = tf.random_crop(image, [IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS])
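
For context, here is how that snippet might sit inside a complete tf.data input pipeline. This is a minimal sketch under my own assumptions: the function names, the 224×224 input size, and the batch size are placeholders, not the project's actual code.

  import math
  import tensorflow as tf

  IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS = 224, 224, 3  # assumed ResNet-50 input size

  def augment(image, label):
      # The random pad / rotate / crop from the snippet above, applied per example.
      image = tf.image.resize_image_with_crop_or_pad(image, IMAGE_HEIGHT + 80, IMAGE_WIDTH + 80)
      angle = tf.random_uniform([1], minval=-math.pi / 3.0, maxval=math.pi / 3.0)
      image = tf.contrib.image.rotate(image, angle)
      image = tf.random_crop(image, [IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS])
      return image, label

  def make_train_input(dataset, batch_size=32):
      # 'dataset' yields (image, label) pairs; only training data is augmented.
      return dataset.map(augment, num_parallel_calls=4).shuffle(1024).batch(batch_size)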

With data augmentation, the evaluation accuracy rose to about 60%, but that was still far from the MXNet result.
Then I changed the optimizer from AdamOptimizer to GradientDescentOptimizer, since a colleague told me that AdamOptimizer is so powerful that it tends to overfit. I also added weight decay to my ResNet-50 model. This time, the evaluation accuracy climbed to 76%. The effect of weight decay was significantly positive.
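As a rough TF 1.x sketch of that change (my own illustration, not our actual training code; the function name and the 1e-4 / 0.1 values are placeholders), weight decay can be added as an explicit L2 penalty on the trainable variables, with plain gradient descent doing the minimization:

  import tensorflow as tf

  def build_train_op(cross_entropy, weight_decay=1e-4, learning_rate=0.1):
      # Weight decay implemented as an L2 penalty added to the data loss
      # (in practice, batch-norm and bias variables are often excluded).
      l2_penalty = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
      total_loss = cross_entropy + weight_decay * l2_penalty
      # Plain SGD instead of Adam, which tended to overfit here.
      return tf.train.GradientDescentOptimizer(learning_rate).minimize(total_loss)
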
The second model is for object detection. We simply used the examples in TensorFlow's models repository, which includes many cutting-edge object-detection models. I just wanted to try SSD (Single Shot MultiBox Detector) on MobileNetV2:

python object_detection/train.py \
  --logtostderr \
  --pipeline_config_path=/disk3/donghao/models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config \
  --train_dir=/disk3/donghao/myckpt/ \
  --num_clones=2

The loss dropped rapidly from the hundreds down to about twelve, but then stayed around eleven for a very long time, as if it would stay there forever. So I began to adjust hyper-parameters, but after testing several learning rates and optimizers, the results didn't change at all.
Eventually, I noticed that the loss does not stay there forever: it WILL decrease in the end. For some tasks, such as classification, the loss converges quickly and visibly. For others, such as object detection, the loss shrinks extremely slowly. For this kind of task, AdamOptimizer with a small learning rate is the better choice.
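To state that takeaway as code, here is a generic TF 1.x sketch (my illustration only; in the Object Detection API the optimizer is actually chosen in the pipeline .config file rather than in Python code, and the 1e-4 learning rate below is just a placeholder):

  import tensorflow as tf

  def build_detection_train_op(total_loss):
      # For a loss that shrinks very slowly, Adam with a small learning rate
      # kept making progress where other settings appeared to plateau.
      optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
      return optimizer.minimize(total_loss, global_step=tf.train.get_or_create_global_step())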