Abstract
Recognizing objects at vastly different scales and objects with occlusion is a fundamental challenge in computer vision. In this paper, we propose a novel method called Robust Faster R-CNN for detecting objects in multi-label images. The framework is based on the Faster R-CNN architecture. We improve Faster R-CNN by replacing its RoIPool layers with RoIAlign layers to remove the harsh quantization of RoIPool, and we design a multi-RoIAlign module that adds align operations with different pooling sizes in order to adapt to objects of different sizes. Furthermore, we adopt multi-feature fusion to enhance the ability to recognize small objects. During training, we train an adversarial network to generate examples with occlusions and combine it with our model to make the model invariant to occlusions. Experimental results on the Pascal VOC 2012 and 2007 datasets demonstrate the superiority of the proposed approach over many state-of-the-art approaches.
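The difference between quantized pooling and RoIAlign, and the idea of pooling one region at several output sizes, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`bilinear`, `roi_align`, `multi_roi_align`), the single sample point per bin, the chosen output sizes, and the concatenation of the pooled outputs are all assumptions made for illustration.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D list `feat` at real-valued (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp the sample point to the valid feature-map area.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, roi, out_size):
    """Pool an RoI into out_size x out_size bins without rounding.

    `roi` is (y_start, x_start, y_end, x_end) in feature-map coordinates.
    Simplification (assumption): one sample at each bin centre, whereas
    practical implementations usually average several samples per bin.
    """
    y_start, x_start, y_end, x_end = roi
    bin_h = (y_end - y_start) / out_size
    bin_w = (x_end - x_start) / out_size
    return [
        [
            bilinear(feat, y_start + (i + 0.5) * bin_h, x_start + (j + 0.5) * bin_w)
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


def multi_roi_align(feat, roi, out_sizes=(1, 2)):
    """Hypothetical multi-size variant: pool the same RoI at several
    output sizes and concatenate the flattened results."""
    vec = []
    for size in out_sizes:
        for row in roi_align(feat, roi, size):
            vec.extend(row)
    return vec


# Toy 4x4 feature map whose value at (row, col) is 4 * row + col.
feat = [[float(4 * r + c) for c in range(4)] for r in range(4)]
roi = (0.25, 0.25, 2.25, 2.25)  # fractional coordinates are kept as-is
aligned = roi_align(feat, roi, 2)
fused = multi_roi_align(feat, roi, out_sizes=(1, 2))
```

A quantizing pooling layer would first round the fractional RoI above to integer cell boundaries, shifting the pooled region; the sketch samples at the exact fractional positions instead. How the pooled outputs of different sizes are combined in the paper is not stated in the abstract, so the concatenation here is only one plausible choice.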
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Nos. 61663004, 61762078, 61866004), the Guangxi Natural Science Foundation (Nos. 2016GXNSFAA380146, 2017GXNSFAA198365, 2018GXNSFDA281009), the Research Fund of Guangxi Key Lab of Multi-source Information Mining and Security (16-A-03-02, MIMS18-08), the Guangxi Special Project of Science and Technology Base and Talents (AD16380008), the Guangxi Bagui Scholar Teams for Innovation and Research Project, and the Innovation Project of Guangxi Graduate Education (grant XYCSZ2018077).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, T., Li, Z., Zhang, C. (2019). Robust Faster R-CNN: Increasing Robustness to Occlusions and Multi-scale Objects. In: U., L., Lauw, H. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11607. Springer, Cham. https://doi.org/10.1007/978-3-030-26142-9_26
DOI: https://doi.org/10.1007/978-3-030-26142-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26141-2
Online ISBN: 978-3-030-26142-9
eBook Packages: Computer Science (R0)