Skip to main content
Log in

Dense-scale dynamic network with filter-varying atrous convolution for semantic segmentation

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Deep convolution neural networks (DCNNs) in deep learning have been widely used in semantic segmentation. However, the filters of most regular convolutions in DCNNs are spatially invariant to local transformations, which reduces localization accuracy and hinders the improvement of semantic segmentation. Dynamic convolution with pixel-level filters can enhance the localization accuracy through its region-awareness, but these are sensitive to objects with large-scale variations in semantic segmentation. To simultaneously address the low localization accuracy and objects with large-scale variations, we propose a filter-varying atrous convolution (FAC) to efficiently enlarge the per-pixel receptive fields pertaining to various objects. FAC mainly consists of a conditional-filter-generating network (CFGN) and a dynamic local filtering operation (DLFO). In the CFGN, a class probability map is used to generate the corresponding filters, making the FAC genuinely dynamic. In the DLFO, by replacing the sliding convolution operation one by one with a one-time dot product operation, the efficiency of the algorithm is greatly improved. Also, a dense scale module (DSM) is constructed to generate denser scales and larger receptive fields for exploring long-range contextual information. Finally, a dense-scale dynamic network (DsDNet) simultaneously enhances the localization accuracy and reduces the effect of large-scale variations of the object, by assigning FAC to different spatial locations at dense scales. In addition, to accelerate network convergence and improve segmentation accuracy, our network employs two pixel-wise cross-entropy loss functions. One is between the Backbone and DSM, and the other is at the network’s end. Extensive experiments on Cityscapes, PASCAL VOC 2012, and ADE20K datasets verify that the performance of our DsDNet is superior to the non-dynamic and multi-scale convolution neural networks.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

References

  1. Li Z, Jiang J, Chen X, Qi H, Li Q, Liu J, Zheng L, Liu M, Zhang Y (2022) Superdense-scale network for semantic segmentation. Neurocomputing 504:30–41

    Article  Google Scholar 

  2. Wang D, Zhang J, Du B, Zhang L, Tao D (2023) Dcn-t: Dual context network with transformer for hyperspectral image classification. IEEE Trans Image Process 32:2536–2551. https://doi.org/10.1109/TIP.2023.3270104

    Article  Google Scholar 

  3. Sang S, Zhou Y, Islam MT, Xing L (2023) Small-object sensitive segmentation using across feature map attention. IEEE Trans Pattern Anal Mach Intell 45(5):6289–6306. https://doi.org/10.1109/TPAMI.2022.3211171

    Article  Google Scholar 

  4. Zhang J, Liu Y, Guo C, Zhan J (2022) Optimized segmentation with image inpainting for semantic mapping in dynamic scenes. Appl Intell 1–16

  5. Hou C, Zhang W, Wang H, Liu F, Liu D, Chang J (2022) A semantic segmentation model for lumbar mri images using divergence loss. Appl Intell 1–14

  6. Wang C, Zhong J, Dai Q, Li R, Yu Q, Fang B (2022) Local structure consistency and pixel-correlation distillation for compact semantic segmentation. Appl Intell 1–17

  7. Minaee S, Boykov Y, Porikli F, Plaza A, Kehtarnavaz N, Terzopoulos D (2022) Image segmentation using deep learning: A survey. IEEE Trans Pattern Anal Mach Intell 44(7):3523–3542. https://doi.org/10.1109/TPAMI.2021.3059968

    Article  Google Scholar 

  8. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848. https://doi.org/10.1109/TPAMI.2017.2699184

    Article  Google Scholar 

  9. Kwon HJ, Koo HI, Soh JW, Cho NI (2022) Inverse-based approach to explaining and visualizing convolutional neural networks. IEEE Trans Neural Netw Learn Syst 33(12):7318–7329. https://doi.org/10.1109/TNNLS.2021.3084757

    Article  Google Scholar 

  10. Liu J, He J, Qiao Y, Ren JS, Li H (2020) Learning to predict contextadaptive convolution for semantic segmentation. In: Vedaldi A, Bischof H, Brox T, Frahm J-M (eds) Computer Vision - ECCV 2020. Springer, Cham, pp 769–786

    Google Scholar 

  11. Yang B, Bender G, Le QV, Ngiam J (2019) Condconv: Conditionally parameterized convolutions for efficient inference. In: Advances in neural information processing systems, pp 1307–1318

  12. Chen Y, Dai X, Liu M, Chen D, Yuan L, Liu Z (2020) Dynamic convolution: Attention over convolution kernels. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11030–11039

  13. Dong Q, Gong S, Zhu X (2018) Imbalanced deep learning by minority class incremental rectification. IEEE Trans Pattern Anal Mach Intell 41(6):1367–1381

    Article  Google Scholar 

  14. Chen J, Wang X, Guo Z, Zhang X, Sun J (2021) Dynamic region-aware convolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8064–8073

  15. Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3223

  16. Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88(2):303–338

    Article  Google Scholar 

  17. Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A (2017) Scene parsing through ade20k dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 633–641

  18. Yu B, Jiao L, Liu X, Li L, Liu F, Yang S, Tang X (2022) Entire deformable convnets for semantic segmentation. Knowl-Based Syst 108871

  19. Lu L, Xiao Y, Chang X, Wang X, Ren P, Ren Z (2022) Deformable attention-oriented feature pyramid network for semantic segmentation. Knowl-Based Syst 109623

  20. Zhou J, Jampani V, Pi Z, Liu Q, Yang M-H (2021) Decoupled dynamic filter networks. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6643–6652 . https://doi.org/10.1109/CVPR46437.2021.00658

  21. Ding J, Xue N, Xia G-S, Bai X, Yang W, Yang MY, Belongie S, Luo J, Datcu M, Pelillo M, Zhang L (2022) Object detection in aerial images: A large-scale benchmark and challenges. IEEE Trans Pattern Anal Mach Intell 44(11):7778–7796. https://doi.org/10.1109/TPAMI.2021.3117983

    Article  Google Scholar 

  22. Liu Y, Fan B, Wang L, Bai J, Xiang S, Pan C (2018) Semantic labeling in very high resolution images via a self-cascaded convolutional neural network. ISPRS J Photogrammetry Remote Sensing 145:78–95

    Article  Google Scholar 

  23. Yang M, Yu K, Zhang C, Li Z, Yang K (2018) Denseaspp for semantic segmentation in street scenes. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 3684–3692

  24. Xu J, Li Y, Wang S (2022) Adazoom: Towards scale-aware large scene object detection. IEEE Trans Multimedia 1–1. https://doi.org/10.1109/TMM.2022.3178871

  25. Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881–2890

  26. Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoderdecoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818

  27. He J, Deng Z, Zhou L, Wang Y, Qiao Y (2019) Adaptive pyramid context network for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7519–7528

  28. Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3146–3154

  29. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3146–3154

  30. Zhang H, Dana K, Shi J, Zhang Z, Wang X, Tyagi A, Agrawal A (2018) Context encoding for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7151–7160

  31. Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X, et al. (2020) Deep high-resolution representation learning for visual recognition. IEEE Trans Pattern Anal Mach Intell

  32. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  33. Jia X, De Brabandere B, Tuytelaars T, Gool LV (2016) Dynamic filter networks. In: Advances in neural information processing systems, pp 667–675

  34. Rota Buló S, Porzi L, Kontschieder P (2018) In-place activated batchnorm for memory-optimized training of dnns. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5639–5647

  35. Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. In: International conference on learning representations, pp 10–19

  36. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252

    Article  MathSciNet  Google Scholar 

  37. Chen L-C, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587

  38. Liu C, Chen L-C, Schroff F, Adam H, Hua W, Yuille AL, Fei-Fei L (2019) Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 82–92

  39. Chen L-C, Collins M, Zhu Y, Papandreou G, Zoph B, Schroff F, Adam H, Shlens J (2018) Searching for efficient multi-scale architectures for dense image prediction. In: Advances in neural information processing systems, pp 8699–8710

  40. Wang H, Zhu Y, Green B, Adam H, Yuille A, Chen L-C (2020) Axial-deeplab: Stand-alone axial-attention for panoptic segmentation. In: European conference on computer vision, pp 108–126

  41. Yuan Y, Chen X, Wang J (2020) Object-contextual representations for semantic segmentation. In: European conference on computer vision, pp 173–190

  42. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: European conference on computer vision, pp 740–755

  43. Lin G, Shen C, Van Den Hengel A, Reid I (2016) Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3194–3203

  44. Zhang H, Zhang H, Wang C, Xie J (2019) Co-occurrent features in semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 548–557

  45. He J, Deng Z, Qiao Y (2019) Dynamic multi-scale filters for semantic segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 3562–3572

  46. Hariharan B, Arbeláez P, Girshick R, Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 447–456

  47. Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W (2019) Ccnet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 603–612

  48. Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7794–7803

  49. Chen Y, Kalantidis Y, Li J, Yan S, Feng J (2018) A \(\hat{}\) 2-nets: Double attention networks. In: Advances in neural information processing systems, pp 352–361

  50. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848

Download references

Acknowledgements

This work was supported in part by the National Key Research and Development Project of China (2017YFE0100700), in part by the National Natural Science Foundation of China (Grant No.41871340), in part by the Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, No. 2020KEY003, in part by Beijing Municipal Science & Technology Commission No. Z201100003920003, in part by the Key Research Project of Shanghai Agricultural Science and Technology (Grant No. SASTI-2018-2-1), in part by the Fundamental Research Funds for the Central Universities, and in part by Shanghai "Science and Technology Innovation Action Plan" Project (22002400300)

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xi Chen.

Ethics declarations

Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Jie Jiang, Xi Chen, Robert Laganière, Qingli Li, Min Liu, Honggang Qi, Yong Wang and Min Zhang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Li, Z., Jiang, J., Chen, X. et al. Dense-scale dynamic network with filter-varying atrous convolution for semantic segmentation. Appl Intell 53, 26810–26826 (2023). https://doi.org/10.1007/s10489-023-04935-4

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-023-04935-4

Keywords

Navigation