In the previous article, I reached an mAP of 0.740 on the VOC2007 test set. After one month, I found that the key to boosting object detection performance is not only a cutting-edge model but also a sophisticated augmentation methodology. I therefore manually checked every image generated by ‘utils/augmentations.py’. Soon, some confusing images turned up:
data:image/s3,"s3://crabby-images/e8a90/e8a90809a80ccf66d92d243df88b52dac8e6827f" alt=""
data:image/s3,"s3://crabby-images/a1a80/a1a803c20dd483928e1468a7b311093aa833f6bf" alt=""
data:image/s3,"s3://crabby-images/84a15/84a150dd271d9bba8063a0640a9501ee4cef38b9" alt=""
These images contain a lot of bright, shining noise. The reason is that we only use addition and multiplication to change the contrast and brightness of the images, which can push some pixel values outside the valid range of 0 to 255 (overflow). To prevent this, I use clip() from numpy:
im, boxes, labels = self.rand_light_noise(im, boxes, labels)
return np.clip(im, 0, 255), boxes, labels
Now the images look much more normal:
data:image/s3,"s3://crabby-images/2d051/2d05194d14d3dc9da0278b5b899e1bb3416491f3" alt=""
data:image/s3,"s3://crabby-images/6c7aa/6c7aa148acba80a0b7008b464fe6a82fb6726061" alt=""
After this tiny modification, the mean AP jumped from 0.740 to 0.769. This is the power of fine-tuned augmentation!
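To make the overflow concrete, here is a minimal sketch of what can go wrong and how clipping repairs it. It is not the code from ‘utils/augmentations.py’: the helper name `adjust_contrast_brightness` and the values of `alpha` and `delta` are made up for illustration, and the wrap-around step assumes the image is eventually converted back to 8-bit integers.

```python
import numpy as np

def adjust_contrast_brightness(im, alpha, delta):
    """Hypothetical helper: scale by alpha (contrast), shift by delta (brightness)."""
    return im.astype(np.float32) * alpha + delta

im = np.array([[100, 200, 250]], dtype=np.uint8)
out = adjust_contrast_brightness(im, alpha=1.5, delta=20.0)
print(out)      # [[170. 320. 395.]] -> two values are above 255

# Converting back to 8-bit integers wraps the out-of-range values around,
# so very bright pixels turn into unrelated values: the "shining noise".
wrapped = out.astype(np.int32).astype(np.uint8)
print(wrapped)  # [[170  64 139]]

# Clipping first keeps every pixel inside the valid range.
clipped = np.clip(out, 0, 255)
print(clipped)  # [[170. 255. 255.]]
```

Clipping saturates bright pixels at 255 rather than letting them wrap, which is usually what you want from a brightness or contrast change.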
Next, I changed the augmentation function Expand() in ‘utils/augmentations.py’. The original code uses a fixed value to build a ‘background’ for every image. My program instead randomly chooses images from VOC2012, with the foreground objects cropped out, as the background. The result looks like this:
data:image/s3,"s3://crabby-images/d7966/d7966fc4308f0c01e5cd683e3d7bc31dbfce8ff6" alt=""
data:image/s3,"s3://crabby-images/9dd50/9dd5087b52272134dbea7d75a88ba77961bf357f" alt=""
data:image/s3,"s3://crabby-images/410f4/410f47b8d994b92997cf9e2fe9d34592a8b29dee" alt=""
This method is borrowed from mixup [1,2]. With it, the mean AP reached 0.770.
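The idea of expanding onto a real image instead of a constant fill can be sketched as follows. This is only an illustration under several assumptions, not the actual Expand() code: the function name `expand_with_background`, the `max_ratio` parameter, the random offset sampling, and the tiling of small backgrounds are all hypothetical, and the background is assumed to have been prepared already (picked from VOC2012 with its foreground objects removed).

```python
import numpy as np

def expand_with_background(im, boxes, background, rng, max_ratio=4.0):
    """Paste `im` at a random position on a larger canvas taken from `background`.

    im:         H x W x 3 image array
    boxes:      N x 4 array of [xmin, ymin, xmax, ymax] in pixels
    background: h x w x 3 image array; tiled if smaller than the canvas
    rng:        a numpy.random.Generator
    """
    h, w, _ = im.shape
    ratio = rng.uniform(1.0, max_ratio)
    new_h, new_w = int(h * ratio), int(w * ratio)
    top = int(rng.uniform(0, new_h - h))
    left = int(rng.uniform(0, new_w - w))

    # Tile the background until it covers the canvas, then crop to size.
    bh, bw, _ = background.shape
    reps = (-(-new_h // bh), -(-new_w // bw), 1)  # ceiling division
    canvas = np.tile(background, reps)[:new_h, :new_w].astype(im.dtype)

    # Paste the original image and shift the boxes by the same offset.
    canvas[top:top + h, left:left + w] = im
    shifted = boxes.astype(np.float32)
    shifted[:, [0, 2]] += left
    shifted[:, [1, 3]] += top
    return canvas, shifted

# Usage with synthetic data standing in for real images
rng = np.random.default_rng(0)
im = np.full((4, 6, 3), 200, dtype=np.uint8)
background = np.zeros((3, 3, 3), dtype=np.uint8)
boxes = np.array([[1, 1, 5, 3]], dtype=np.float32)
canvas, shifted = expand_with_background(im, boxes, background, rng)
```

Only the fill changes compared with a constant background; the box coordinates are shifted in the same way in both cases.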