Dense-scale dynamic network with filter-varying atrous convolution for semantic segmentation

Li, Zhiqiang; Jiang, Jie; Chen, Xi; Laganière, Robert; Li, Qingli; Liu, Min; Qi, Honggang; Wang, Yong; Zhang, Min

doi:10.1007/s10489-023-04935-4

Published: 29 August 2023

Applied Intelligence, Volume 53, pages 26810–26826 (2023)

Zhiqiang Li^1,2,3,4,
Jie Jiang^1,2,
Xi Chen^1,2,3,4 (ORCID: orcid.org/0000-0001-8611-0742),
Robert Laganière^5,
Qingli Li^4,
Min Liu^1,2,
Honggang Qi^6,
Yong Wang^7 &
Min Zhang^8

Abstract

Deep convolutional neural networks (DCNNs) are widely used in semantic segmentation. However, the filters of most regular convolutions in DCNNs are spatially invariant to local transformations, which reduces localization accuracy and limits segmentation performance. Dynamic convolution with pixel-level filters can improve localization accuracy through its region-awareness, but it is sensitive to objects whose scale varies widely. To address both the low localization accuracy and the large scale variations of objects, we propose a filter-varying atrous convolution (FAC) that efficiently enlarges the per-pixel receptive fields pertaining to various objects. FAC mainly consists of a conditional-filter-generating network (CFGN) and a dynamic local filtering operation (DLFO). In the CFGN, a class probability map is used to generate the corresponding filters, making the FAC genuinely dynamic. The DLFO replaces the position-by-position sliding convolution with a single dot-product operation, which greatly improves the efficiency of the algorithm. In addition, a dense scale module (DSM) is constructed to generate denser scales and larger receptive fields for exploring long-range contextual information. Finally, a dense-scale dynamic network (DsDNet) assigns FAC to different spatial locations at dense scales, which simultaneously enhances localization accuracy and reduces the effect of large scale variations of objects. To accelerate network convergence and improve segmentation accuracy, our network employs two pixel-wise cross-entropy loss functions: one between the backbone and the DSM, and the other at the end of the network. Extensive experiments on the Cityscapes, PASCAL VOC 2012, and ADE20K datasets show that DsDNet outperforms non-dynamic and multi-scale convolutional neural networks.
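To make the FAC idea concrete, the NumPy sketch below illustrates the two components named in the abstract under simplifying assumptions. It is not the authors' implementation. The abstract does not specify the CFGN architecture, so `generate_filters` is a hypothetical stand-in that projects the class probability map linearly at each pixel. The function names, the 3×3 kernel, the dilation rate, and the sharing of each per-pixel filter across channels are likewise assumptions made for illustration. `filter_varying_atrous_conv` gathers the dilated neighbours of every pixel once and then applies all per-pixel filters in a single multiply-and-sum, in the spirit of the DLFO.

```python
import numpy as np


def unfold_atrous(x, k=3, rate=2):
    """Gather the k*k dilated neighbours of every pixel.

    x: feature map of shape (C, H, W).
    Returns an array of shape (C, k*k, H, W); zero padding keeps H and W.
    """
    c, h, w = x.shape
    pad = rate * (k // 2)
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # constant zero padding
    taps = []
    for i in range(k):
        for j in range(k):
            dy, dx = i * rate, j * rate
            # Tap (i, j) reads the input shifted by (i - k//2, j - k//2) * rate.
            taps.append(xp[:, dy:dy + h, dx:dx + w])
    return np.stack(taps, axis=1)


def generate_filters(class_prob, weight):
    """Hypothetical stand-in for the CFGN (an assumption, not the paper's design).

    class_prob: (K, H, W) class probability map.
    weight:     (k*k, K) projection applied independently at each pixel.
    Returns per-pixel filters of shape (k*k, H, W).
    """
    return np.einsum("fk,khw->fhw", weight, class_prob)


def filter_varying_atrous_conv(x, filters, k=3, rate=2):
    """Apply a different k*k atrous filter at every spatial location.

    filters: (k*k, H, W), one filter per pixel, shared across channels here.
    """
    patches = unfold_atrous(x, k, rate)            # (C, k*k, H, W)
    # One elementwise product and one reduction replace the sliding window.
    return (patches * filters[None]).sum(axis=1)   # (C, H, W)
```

A convenient sanity check: with a filter bank whose only non-zero tap is the centre one (index 4 for a 3×3 kernel), the operation returns its input unchanged, and scaling that tap at a single pixel scales only that pixel of the output.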

Acknowledgements

This work was supported in part by the National Key Research and Development Project of China (2017YFE0100700), in part by the National Natural Science Foundation of China (Grant No. 41871340), in part by the Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University (No. 2020KEY003), in part by the Beijing Municipal Science & Technology Commission (No. Z201100003920003), in part by the Key Research Project of Shanghai Agricultural Science and Technology (Grant No. SASTI-2018-2-1), in part by the Fundamental Research Funds for the Central Universities, and in part by the Shanghai "Science and Technology Innovation Action Plan" Project (22002400300).

Author information

Authors and Affiliations

1. School of Geographic Sciences, East China Normal University, Shanghai, 200241, China
Zhiqiang Li, Jie Jiang, Xi Chen & Min Liu
2. The Key Laboratory of Geographic Information Science, Ministry of Education of China, East China Normal University, Shanghai, 200241, China
Zhiqiang Li, Jie Jiang, Xi Chen & Min Liu
3. Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, Ministry of Natural Resources, East China Normal University, Shanghai, 200241, China
Zhiqiang Li & Xi Chen
4. Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, 200241, China
Zhiqiang Li, Xi Chen & Qingli Li
5. School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, K1N 6N5, ON, Canada
Robert Laganière
6. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, 100049, China
Honggang Qi
7. School of Aeronautics and Astronautics, Sun Yat-sen University, Guangzhou, 518107, China
Yong Wang
8. Engineering University of PAP, Xi’an, 710086, China
Min Zhang

Corresponding author

Correspondence to Xi Chen.

Ethics declarations

Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Jie Jiang, Xi Chen, Robert Laganière, Qingli Li, Min Liu, Honggang Qi, Yong Wang and Min Zhang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Z., Jiang, J., Chen, X. et al. Dense-scale dynamic network with filter-varying atrous convolution for semantic segmentation. Appl Intell 53, 26810–26826 (2023). https://doi.org/10.1007/s10489-023-04935-4

Accepted: 01 August 2023
Published: 29 August 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s10489-023-04935-4
