In the previous article, I reached an mAP of 0.770 on the VOC2007 test set.
Four months have passed. After trying a lot of interesting ideas from different papers, such as FPN, CELU, and RFBNet, I finally realised that the data matters more than the network structure. So I trained my model on COCO2017+VOC instead of VOC alone, and the mAP on the VOC2007 test set eventually reached 0.797.
But then another strange thing happened: on the 16-birds image, the model predicted one big bounding box around the whole image. Even after adding dropout and changing the augmentation policies, the big box was still there.
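To make this kind of failure easier to spot across many images, a small check like the one below can help. It is only a sketch, not the code used for this model: the detection layout `(xmin, ymin, xmax, ymax, score)`, the helper names, and the 0.9 coverage cutoff are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed layout: (xmin, ymin, xmax, ymax, score) in pixel coordinates.
Detection = Tuple[float, float, float, float, float]


def covered_fraction(det: Detection, width: int, height: int) -> float:
    """Fraction of the image area covered by the box, after clipping to the image."""
    xmin, ymin, xmax, ymax, _ = det
    w = max(0.0, min(xmax, width) - max(xmin, 0.0))
    h = max(0.0, min(ymax, height) - max(ymin, 0.0))
    return (w * h) / float(width * height)


def split_detections(
    dets: List[Detection],
    width: int,
    height: int,
    score_thresh: float = 0.65,  # assumed default
    max_cover: float = 0.9,      # hypothetical cutoff for "covers the whole image"
) -> Tuple[List[Detection], List[Detection]]:
    """Drop low-score boxes, then separate normal boxes from whole-image boxes."""
    kept, suspicious = [], []
    for det in dets:
        if det[4] < score_thresh:
            continue  # below the confidence threshold
        if covered_fraction(det, width, height) >= max_cover:
            suspicious.append(det)
        else:
            kept.append(det)
    return kept, suspicious


# Made-up detections for a 500x375 image.
dets = [(10, 10, 50, 50, 0.9), (0, 0, 500, 375, 0.8), (20, 20, 40, 40, 0.3)]
kept, suspicious = split_detections(dets, 500, 375)
print(len(kept), len(suspicious))
```

Counting the suspicious boxes over a validation set gives a rough number to compare before and after a change in data or augmentation.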
I suspected that the bird images in COCO2017 were not general enough, so I decided to use a more abundant dataset: Open Images Dataset V5. After retrieving all the bird images from Open Images Dataset V5, I had 18525 images with corresponding annotations. Training on them finally gave a more promising bird detection result for the 16-birds image (using a threshold of 0.65):

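For reference, retrieving the bird annotations can look roughly like the sketch below. It is not the exact script I used. It assumes the Open Images CSV layout as I understand it: a class description file with `LabelName,DisplayName` rows and no header, and a box annotation file with `ImageID`, `LabelName`, `XMin`, `XMax`, `YMin`, `YMax` columns holding normalized coordinates. The function names and the sample rows are made up for illustration.

```python
import csv
import io


def find_label_id(class_csv, display_name):
    """Look up the label ID for a class name (rows: LabelName, DisplayName; no header)."""
    for row in csv.reader(class_csv):
        if len(row) >= 2 and row[1] == display_name:
            return row[0]
    raise KeyError(display_name)


def collect_boxes(bbox_csv, label_id):
    """Map ImageID -> list of (xmin, ymin, xmax, ymax) for one label."""
    boxes = {}
    for row in csv.DictReader(bbox_csv):
        if row["LabelName"] != label_id:
            continue
        box = tuple(float(row[k]) for k in ("XMin", "YMin", "XMax", "YMax"))
        boxes.setdefault(row["ImageID"], []).append(box)
    return boxes


# Illustrative in-memory stand-ins for the real CSV files.
classes = io.StringIO("/m/01yrx,Cat\n/m/015p6,Bird\n")
annotations = io.StringIO(
    "ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax\n"
    "img_a,xclick,/m/015p6,1,0.10,0.40,0.20,0.60\n"
    "img_a,xclick,/m/01yrx,1,0.50,0.90,0.50,0.90\n"
    "img_b,xclick,/m/015p6,1,0.00,0.25,0.30,0.70\n"
)

bird_id = find_label_id(classes, "Bird")
bird_boxes = collect_boxes(annotations, bird_id)
print(len(bird_boxes), "images contain at least one bird box")
```

With the real files, the two `io.StringIO` objects would be replaced by `open(...)` handles, and the normalized boxes would still need to be scaled by each image's width and height before training.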
It seems the bird images in Open Images Dataset V5 are more general than those in COCO2017. However, the model trained on Open Images gets a lower mAP under the COCO evaluation than the model trained on COCO2017. So it looks like I need a more comprehensive evaluation metric now.