Enhance the recognition ability to occlusions and small objects with Robust Faster R-CNN

  • Tao Zhou
  • Zhixin Li
  • Canlong Zhang
Original Article

Abstract

Recognizing objects at vastly different scales and objects with occlusions is a fundamental challenge in computer vision. This paper addresses the problem by proposing Robust Faster R-CNN, a novel approach for detecting objects in multi-label images. Robust Faster R-CNN employs a cascaded network structure based on the Faster R-CNN architecture to extract features from objects of different sizes. The proposed design is more robust than Faster R-CNN because it replaces the RoIPooling operation with RoIAligns, which eliminates the harsh quantization performed by RoIPooling. We further design a multi-scale RoIAligns operation that uses multiple pool sizes, so that the detection ability of the network adapts to objects of different sizes. In addition, we combine an adversarial network with the proposed network to generate occluded training samples that significantly affect the classification ability of the model, which improves its robustness to occlusions. Experimental results on the PASCAL VOC 2012 and 2007 datasets demonstrate the superiority of the proposed object detection approach relative to several state-of-the-art approaches.
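
The idea of pooling one region of interest at several pool sizes can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the pool sizes (2 and 4), the number of sampling points per bin, and the fusion by simple concatenation are all assumptions made for illustration. What it shows is the general RoIAlign principle of sampling the feature map at continuous coordinates with bilinear interpolation instead of rounding the region boundaries to integer cells.

```python
import math


def bilinear(feat, y, x):
    """Sample a 2-D feature map (list of rows) at a continuous (y, x) point."""
    h, w = len(feat), len(feat[0])
    # Clamp so that border samples stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, roi, pool_size, samples=2):
    """Average-pool a region into pool_size x pool_size bins without rounding.

    roi is (x_min, y_min, x_max, y_max) in feature-map coordinates.
    Each bin is the mean of samples x samples bilinear samples.
    """
    x_min, y_min, x_max, y_max = roi
    bin_h = (y_max - y_min) / pool_size
    bin_w = (x_max - x_min) / pool_size
    out = []
    for i in range(pool_size):
        row = []
        for j in range(pool_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    y = y_min + (i + (sy + 0.5) / samples) * bin_h
                    x = x_min + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feat, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out


def multi_scale_roi_align(feat, roi, pool_sizes=(2, 4)):
    """Assumed fusion: flatten the output of each pool size and concatenate."""
    fused = []
    for pool_size in pool_sizes:
        for row in roi_align(feat, roi, pool_size):
            fused.extend(row)
    return fused


# Toy feature map whose value equals its column index (a horizontal ramp).
feature_map = [[float(x) for x in range(6)] for _ in range(6)]
roi = (0.5, 0.5, 2.5, 2.5)  # fractional boundaries are kept, not rounded

coarse = roi_align(feature_map, roi, pool_size=2)
fused = multi_scale_roi_align(feature_map, roi, pool_sizes=(2, 4))
print(coarse)
print(len(fused))
```

On the ramp-shaped toy map, each bin value equals the horizontal centre of that bin, which makes it easy to check that the fractional region boundaries are respected. In a real detector the fused vector would feed the classification and regression heads; how the paper combines the different pool sizes is not stated in the abstract.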

Keywords

Object detection, Robust Faster R-CNN, Multi-cascaded network, Adversarial network, Feature fusion

Notes

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61966004, 61663004, 61762078, 61866004), the Guangxi Natural Science Foundation (Nos. 2016GXNSFAA380146, 2017GXNSFAA198365, 2018GXNSFDA281009), the Research Fund of Guangxi Key Lab of Multi-source Information Mining and Security (Nos. 16-A-03-02, MIMS18-08), the Guangxi Special Project of Science and Technology Base and Talents (No. AD16380008), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China
