Computational Visual Media, Volume 4, Issue 3, pp 231–244

Learning adaptive receptive fields for deep image parsing networks

  • Zhen Wei
  • Yao Sun
  • Junyu Lin
  • Si Liu
Open Access
Research Article

Abstract

In this paper, we introduce a novel approach to automatically regulate receptive fields in deep image parsing networks. Unlike previous work, which focused on obtaining better receptive fields through manually selected dilated convolutional kernels, our approach inserts two affine transformation layers into the network’s backbone, where they operate on feature maps. Each new layer inflates or shrinks the feature maps, thereby changing the receptive fields of the following layers. Because the network is trained end-to-end, the whole framework is data-driven and requires no laborious manual intervention. The proposed method is generic across datasets and tasks. We have conducted extensive experiments on both general image parsing and face parsing tasks as concrete examples, demonstrating that the method regulates receptive fields better than manual designs.
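
The idea of rescaling a feature map to change what later kernels see can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-channel 2D feature map, an isotropic scale factor that is fixed by hand rather than learned, and plain bilinear interpolation. The names affine_rescale and effective_extent are hypothetical helpers introduced only for this illustration.

```python
import math


def affine_rescale(feature_map, scale):
    """Resample a 2D feature map by an isotropic scale factor (assumed, not learned).

    scale > 1 inflates the map, scale < 1 shrinks it. Bilinear interpolation
    is used, with source coordinates clamped to the map border.
    """
    h, w = len(feature_map), len(feature_map[0])
    out_h = max(1, round(h * scale))
    out_w = max(1, round(w * scale))
    out = []
    for i in range(out_h):
        # Inverse mapping: output row i comes from source row i / scale.
        y = min(i / scale, h - 1)
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = min(j / scale, w - 1)
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            top = (1 - dx) * feature_map[y0][x0] + dx * feature_map[y0][x1]
            bottom = (1 - dx) * feature_map[y1][x0] + dx * feature_map[y1][x1]
            row.append((1 - dy) * top + dy * bottom)
        out.append(row)
    return out


def effective_extent(kernel_size, scale):
    """Width, in original feature-map units, spanned by a kernel of the given
    size when it is applied to the rescaled map."""
    return kernel_size / scale


if __name__ == "__main__":
    fmap = [[0.0, 1.0], [2.0, 3.0]]
    inflated = affine_rescale(fmap, 2.0)
    print(len(inflated), len(inflated[0]))
    print(effective_extent(3, 2.0), effective_extent(3, 0.5))
```

Under these assumptions, a 3-wide kernel applied after inflating by a factor of 2 spans only 1.5 units of the original map, while the same kernel applied after shrinking by a factor of 0.5 spans 6 units. In the paper the transformation parameters are learned end-to-end rather than set by hand as they are here.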

Keywords

semantic segmentation; receptive field; data-driven; face parsing

Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. U1536203, 61572493), the Cutting Edge Technology Research Program of the Institute of Information Engineering, CAS (No. Y7Z0241102), the Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of the Ministry of Education (No. Y6Z0021102), and Nanjing University of Science and Technology (No. JYB201702).

Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  4. Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing, China
