In the previous article, I reached an mAP of 0.740 on the VOC2007 test set. A month later, I realized that boosting object detection performance depends not only on a cutting-edge model but also on a careful augmentation pipeline. So I manually checked every image generated by ‘utils/‘. Soon, some odd-looking images turned up:

These images contain a lot of bright speckle noise. The reason is that we change the contrast and brightness of images using only addition and multiplication, which can push some pixel values outside the valid [0, 255] range and make them overflow. To prevent this, I use clip() from numpy:

  im, boxes, labels = self.rand_light_noise(im, boxes, labels)
  return np.clip(im, 0, 255), boxes, labels

Now the images look much more normal:

After this tiny modification, the mean AP jumped from 0.740 to 0.769. This is the power of fine-tuned augmentation!

Next, I changed the augmentation function Expand() in ‘utils/’. The original code uses a fixed value to build a ‘background’ for every image. My version instead randomly chooses images from VOC2012 (with the foreground objects cropped out) as the background. The result looks like this:

This method is borrowed from mixup [1,2]. With it, the mean AP reached 0.770.
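
The background-replacement variant of Expand() described above could be sketched roughly as follows. This is not the original code: the function name `expand_with_background`, the `max_ratio` parameter, the `[xmin, ymin, xmax, ymax]` pixel box layout, and the nearest-neighbour resize are all assumptions made for illustration, and the background image is assumed to have had its foreground objects removed already.

```python
import numpy as np

def expand_with_background(im, boxes, background, max_ratio=4.0, rng=None):
    """Place `im` at a random offset on a larger canvas filled from `background`.

    im:         H x W x 3 array
    boxes:      N x 4 array of [xmin, ymin, xmax, ymax] in pixels (assumed layout)
    background: h x w x 3 array, e.g. an image with its objects already removed
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = im.shape

    # Pick a random canvas size and a random position for the original image.
    ratio = rng.uniform(1.0, max_ratio)
    new_h, new_w = int(h * ratio), int(w * ratio)
    top = int(rng.integers(0, new_h - h + 1))
    left = int(rng.integers(0, new_w - w + 1))

    # Nearest-neighbour resize of the background to the canvas size
    # (a dependency-free stand-in for a proper image resize).
    rows = np.arange(new_h) * background.shape[0] // new_h
    cols = np.arange(new_w) * background.shape[1] // new_w
    canvas = background[rows[:, None], cols[None, :]].astype(im.dtype)

    # Paste the original image and shift the boxes by the same offset.
    canvas[top:top + h, left:left + w] = im
    shifted = boxes.astype(np.float32).copy()
    shifted[:, [0, 2]] += left
    shifted[:, [1, 3]] += top
    return canvas, shifted
```

Compared with a fixed fill value, the only change is how the canvas is initialized; the paste and the box shift stay the same.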