Residual Joint Attention Network with Graph Structure Inference for Object Detection

  • Chuansheng Xu
  • Gaoyun An
  • Qiuqi Ruan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11901)

Abstract

Most object detectors consist of three main parts: CNN feature extraction, proposal classification, and duplicate detection removal. In this work, focusing on improving feature extraction, we propose the Residual Joint Attention Network, a convolutional neural network built on an advanced object detector with graph structure inference and equipped with a residual joint attention module composed of a spatial attention branch, a channel attention branch, and a residual learning branch. The attention map generated by the joint attention mechanism weights the original features extracted from a specific layer of VGG16 in order to perform feature recalibration. In addition, the residual learning mechanism complements the joint attention mechanism and preserves good attributes of the original features. Experimental results show that the branches of our residual joint attention module do not contradict one another: by combining them, the proposed network achieves higher mAP than many advanced detectors, including the baseline, on the PASCAL VOC dataset.
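The recalibration described above — a joint attention map multiplicatively reweighting backbone features, with a residual branch adding the original features back — can be sketched as follows. This is a minimal illustration under assumed simplifications, not the paper's exact formulation: the gates here are plain global-pooling-plus-sigmoid operations with no learned parameters, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_joint_attention(features):
    """Sketch of a residual joint attention module on a (C, H, W) feature map.

    Assumptions: the channel gate is global average pooling followed by a
    sigmoid, the spatial gate is a cross-channel mean followed by a sigmoid,
    and the joint attention map is their outer product. Learned convolution
    and fully connected layers of the real module are omitted.
    """
    # Channel attention branch: one gate per channel.
    channel_gate = sigmoid(features.mean(axis=(1, 2)))   # shape (C,)
    # Spatial attention branch: one gate per spatial location.
    spatial_gate = sigmoid(features.mean(axis=0))        # shape (H, W)
    # Joint attention map weights the original features (recalibration).
    attention = channel_gate[:, None, None] * spatial_gate[None, :, :]
    recalibrated = features * attention
    # Residual learning branch: keep attributes of the original features.
    return features + recalibrated
```

Because the residual branch is an identity shortcut, the module can only amplify or preserve the input features, never suppress them below their original values — which is why, in the sketch, the attention and residual branches cannot contradict each other.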

Keywords

Joint attention · Residual learning · Graph structure inference · Object detection

Notes

Acknowledgment

This work was supported partly by the National Natural Science Foundation of China (61772067, 61472030, 61471032).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute of Information Science, Beijing Jiaotong University, Beijing, China
  2. Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China