Learning Diversified Features for Object Detection via Multi-region Occlusion Example Generating

  • Junsheng Liang
  • Zhiqiang Li
  • Hongchen Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11440)

Abstract

Object detection refers to the classification and localization of objects within an image by learning their diversified features. However, existing detection models are usually sensitive to the important features in a few local regions of an object and cannot effectively learn the diversified features of each region, which limits the performance of the model. In this paper, we propose a novel and principled method called Multi-region Occlusion Example Generating (MOEG) to guide the detection model in fully learning the features of the various regions of an object. MOEG generates completely new occlusion examples: by blocking the important regions in a proposal, it enables the detection model to learn the features of the remaining regions of the object. It is a general method for generating occlusion examples and can be incorporated very easily into most mainstream region-based detectors, such as Fast-RCNN and Faster-RCNN. Our experimental results show a 2.4% mAP boost on the VOC2007 dataset and a 4.1% mAP boost on the VOC2012 dataset over the Fast-RCNN pipeline. Moreover, as datasets become larger and more challenging, MOEG becomes more effective, as demonstrated by the results on the MS COCO dataset.
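The abstract gives no implementation details, but the core mechanism, occluding the most informative sub-region of a proposal so that the detector must rely on the remaining regions, can be illustrated with a short sketch. The Python/NumPy snippet below is a hypothetical illustration rather than the authors' code: the grid size, the externally supplied importance map, and the mean-value fill are all assumptions made for the example.

```python
import numpy as np

def occlude_important_region(proposal, importance_map, grid=3):
    """Generate an occlusion example from a region proposal.

    proposal:       (H, W, C) float array holding a cropped proposal.
    importance_map: (H, W) float array scoring how much each pixel
                    contributes to the detector's decision (assumed to
                    be supplied, e.g. from activations or gradients).
    grid:           the proposal is split into grid x grid sub-regions.

    The sub-region with the highest total importance is blocked with
    the proposal's mean color, so the model is pushed to learn the
    features of the remaining regions.
    """
    h, w = importance_map.shape
    ch, cw = h // grid, w // grid
    best_ij, best_score = (0, 0), -np.inf
    # Score each sub-region by its summed importance.
    for i in range(grid):
        for j in range(grid):
            score = importance_map[i * ch:(i + 1) * ch,
                                   j * cw:(j + 1) * cw].sum()
            if score > best_score:
                best_score, best_ij = score, (i, j)
    i, j = best_ij
    occluded = proposal.copy()
    # Block the most important sub-region with the mean color.
    occluded[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw, :] = \
        proposal.mean(axis=(0, 1))
    return occluded

# Example: occlude a random 64x64 proposal with a random importance map.
proposal = np.random.rand(64, 64, 3)
importance = np.random.rand(64, 64)
occlusion_example = occlude_important_region(proposal, importance)
```

In a Fast-RCNN-style pipeline, occluded proposals generated this way would presumably be mixed into the training batches as additional examples, so that no single region's features dominate the learned representation.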

Keywords

Feature extraction · Data augmentation · Object detection

References

  1. Bell, S., Lawrence Zitnick, C., Bala, K., Girshick, R.: Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2874–2883 (2016)
  2. Chen, X., Gupta, A.: Spatial memory for context reasoning in object detection. arXiv preprint arXiv:1704.04224 (2017)
  3. Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: AutoAugment: learning augmentation policies from data. arXiv preprint arXiv:1805.09501 (2018)
  4. Denton, E.L., Chintala, S., Fergus, R., et al.: Deep generative image models using a Laplacian pyramid of adversarial networks. In: Advances in Neural Information Processing Systems, pp. 1486–1494 (2015)
  5. Everingham, M., Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)
  6. Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1915–1929 (2013)
  7. Gidaris, S., Komodakis, N.: Object detection via a multi-region and semantic segmentation-aware CNN model. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1134–1142 (2015)
  8. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
  9. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
  10. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
  11. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 346–361. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_23
  12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
  13. Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675–678. ACM (2014)
  14. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
  15. Li, M., Zhang, Z., Yu, H., Chen, X., Li, D.: S-OHEM: stratified online hard example mining for object detection. In: Yang, J., et al. (eds.) CCCV 2017. CCIS, vol. 773, pp. 166–177. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-7305-2_15
  16. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  17. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
  18. Loshchilov, I., Hutter, F.: Online batch selection for faster training of neural networks. arXiv preprint arXiv:1511.06343 (2016)
  19. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
  20. Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 761–769 (2016)
  21. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  22. Singh, K.K., Lee, Y.J.: Hide-and-seek: forcing a network to be meticulous for weakly-supervised object and action localization. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017)
  23. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
  24. Uijlings, J.R., Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)
  25. Wang, X., Shrivastava, A., Gupta, A.: A-Fast-RCNN: hard positive generation via adversary for object detection. arXiv preprint arXiv:1704.03414 (2017)
  26. Zeng, X., Ouyang, W., Yang, B., Yan, J., Wang, X.: Gated bi-directional CNN for object detection. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 354–369. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_22
  27. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. arXiv preprint arXiv:1708.04896 (2017)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China