Learning adaptive receptive fields for deep image parsing networks

Wei, Zhen; Sun, Yao; Lin, Junyu; Liu, Si

doi:10.1007/s41095-018-0112-1

Research Article
Open access
Published: 04 April 2018

Volume 4, pages 231–244, (2018)

Computational Visual Media

Zhen Wei^1,2, Yao Sun^1, Junyu Lin^3 & Si Liu^1,4

Abstract

In this paper, we introduce a novel approach to automatically regulate receptive fields in deep image parsing networks. Unlike previous work, which focused on obtaining better receptive fields through manually selected dilated convolutional kernels, our approach inserts two affine transformation layers into the network’s backbone that operate on feature maps. Each new layer inflates or shrinks the feature maps, thereby changing the receptive fields of the following layers. Because the network is trained end-to-end, the whole framework is data-driven and requires no laborious manual intervention. The proposed method is generic across datasets and tasks. We have conducted extensive experiments on both general image parsing and face parsing tasks as concrete examples, which demonstrate that the method regulates receptive fields better than manual designs.
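
The inflate-or-shrink idea in the abstract can be illustrated with a small sketch. It is not the authors' layer: the abstract does not give the layer's parameterization, so the code below assumes a single isotropic scale about the map centre, fixed by hand rather than learned, with bilinear resampling. The function names `bilinear`, `affine_scale`, and `footprint` are hypothetical.

```python
import math


def bilinear(fmap, y, x):
    """Sample a 2-D feature map at real-valued (y, x); outside the map reads as 0."""
    h, w = len(fmap), len(fmap[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * fmap[yy][xx]
    return value


def affine_scale(fmap, scale):
    """Resample fmap so its content is scaled by `scale` about the centre.

    scale > 1 inflates the content, scale < 1 shrinks it.
    The output has the same size as the input.
    Assumption: one hand-set scale stands in for learned affine parameters.
    """
    h, w = len(fmap), len(fmap[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Inverse mapping: output cell (i, j) reads from this source location.
            src_y = cy + (i - cy) / scale
            src_x = cx + (j - cx) / scale
            row.append(bilinear(fmap, src_y, src_x))
        out.append(row)
    return out


def footprint(fmap, threshold=1e-6):
    """Count the cells that carry a non-negligible response."""
    return sum(1 for row in fmap for v in row if abs(v) > threshold)


# A 9x9 map whose only response is a 3x3 block of ones in the middle.
fmap = [[1.0 if 3 <= i <= 5 and 3 <= j <= 5 else 0.0 for j in range(9)]
        for i in range(9)]

identity = affine_scale(fmap, 1.0)
inflated = affine_scale(fmap, 2.0)
shrunk = affine_scale(fmap, 0.5)

print(footprint(shrunk), footprint(fmap), footprint(inflated))
```

After inflation the same object spreads over more cells, so a fixed-size kernel in a following layer covers a smaller part of it. After shrinking, the kernel covers a larger part. In this sense, rescaling the feature map changes the effective receptive field of later layers without changing their kernels.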

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. U1536203, 61572493), the Cutting Edge Technology Research Program of the Institute of Information Engineering, CAS (No. Y7Z0241102), the Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of the Ministry of Education (No. Y6Z0021102), and Nanjing University of Science and Technology (No. JYB201702).

Author information

Authors and Affiliations

State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100093, China
Zhen Wei, Yao Sun & Si Liu
University of Chinese Academy of Sciences, Beijing, 101408, China
Zhen Wei
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100093, China
Junyu Lin
Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing, 210094, China
Si Liu

Corresponding author

Correspondence to Yao Sun.

Additional information

Zhen Wei received his B.S. degree in computer science and technology from Yingcai Honors School, the University of Electronic Science and Technology of China, Chengdu, China. He is now a master's student in the Institute of Information Engineering, the Chinese Academy of Sciences.

Yao Sun is an associate professor in the Institute of Information Engineering, Chinese Academy of Sciences. He received his Ph.D. degree from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences.

Junyu Lin is assistant director of the Laboratory of Cyberspace Technology of the Institute of Information Engineering, Chinese Academy of Sciences. He is a member of the CCF YOCSEF academic committee and the CCF TCAPP standing committee. He is also a member of the CCF council. He has more than 50 publications in Peer to Peer Networking and Applications, the Journal of Software, and IEEE conferences and journals.

Si Liu is an associate professor in the Institute of Information Engineering, Chinese Academy of Sciences. She was a research fellow in the Learning and Vision Research Group at National University of Singapore. She obtained her Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences. Her research interests include object categorization, object detection, image parsing, and human pose estimation.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wei, Z., Sun, Y., Lin, J. et al. Learning adaptive receptive fields for deep image parsing networks. Comp. Visual Media 4, 231–244 (2018). https://doi.org/10.1007/s41095-018-0112-1

Received: 06 December 2017
Accepted: 14 January 2018
Published: 04 April 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s41095-018-0112-1
