Supervised semantic segmentation based on deep learning: a survey

Multimedia Tools and Applications

Abstract

Semantic segmentation methods based on fully supervised learning currently lead the computer vision field. In particular, deep neural networks, above all convolutional neural networks, can effectively solve many challenging semantic segmentation tasks. To achieve finer semantic image segmentation, this paper examines the task from a new perspective that considers three key issues affecting segmentation quality. First, it is hard to predict accurate classification results at high resolution from a reduced feature map, because the two differ in scale. Second, the multi-scale nature of targets and the complexity of backgrounds make semantic features difficult to extract. Third, intra-class differences and inter-class similarities can lead to misclassification at boundaries. To find solutions to these issues among existing methods, the paper explores the connections between past and ongoing research. It also provides qualitative and quantitative analyses to help researchers build an intuitive understanding of the various methods. Finally, it draws conclusions from existing methods on how to enhance segmentation performance, examines their deficiencies, and offers guidance on future directions.
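The first issue, recovering a high-resolution prediction from a reduced feature map, can be illustrated with a small NumPy sketch. It is not taken from the paper: the label map, the stride of 4, and the use of average pooling over one-hot labels as a stand-in for a network's downsampled score map are all assumptions chosen for illustration.

```python
import numpy as np

def upsample_nearest(scores, factor):
    """Nearest-neighbour upsampling of a (classes, h, w) score map."""
    return scores.repeat(factor, axis=1).repeat(factor, axis=2)

# Hypothetical 8x8 ground-truth label map: class 1 fills a 3-pixel-wide
# vertical strip (columns 1-3); everything else is class 0 (background).
labels = np.zeros((8, 8), dtype=int)
labels[:, 1:4] = 1

# Stand-in for a network with output stride 4 (an assumption, not a real
# model): each 4x4 block is reduced to the fraction of its pixels that
# belong to each class, i.e. average pooling of the one-hot labels.
stride = 4
one_hot = np.stack([(labels == c).astype(float) for c in (0, 1)])  # (2, 8, 8)
low_res = one_hot.reshape(2, 8 // stride, stride, 8 // stride, stride).mean(axis=(2, 4))

# Upsample the reduced score map back to input resolution and classify.
pred = upsample_nearest(low_res, stride).argmax(axis=0)  # (8, 8)
pixel_accuracy = (pred == labels).mean()
print(pixel_accuracy)
```

Even with ideal low-resolution scores, the strip's boundary snaps to the 4-pixel grid, so column 0 is labelled as class 1 although it is background. This is the kind of scale mismatch the abstract refers to.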

Data availability

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We convey our sincere appreciation to Qingdao University of Technology and its Machine Vision Laboratory. We also wish to acknowledge the junior colleagues who provided help but do not appear among the authors.

Code availability

Not applicable.

Funding

This work was supported by a National Natural Science Foundation grant (Grant No. 62171247) and two Natural Science Foundation of Shandong Province grants (Grant Nos. ZR2020MF001 and ZR2020QF101).

Author information

Contributions

Conceptualization: Zhou YG; Methodology: Ren YB, Zhou YG; Formal analysis and investigation: Zhou LJ; Writing - original draft preparation: Ren YB; Writing - review and editing: Zhou YG, Xu EY, Liu SL; Funding acquisition: Zhou YG; Resources: Zhou YG; Supervision: Zhou YG. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Lijian Zhou.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Zhou, Y., Ren, Y., Xu, E. et al. Supervised semantic segmentation based on deep learning: a survey. Multimed Tools Appl 81, 29283–29304 (2022). https://doi.org/10.1007/s11042-022-12842-y

  • DOI: https://doi.org/10.1007/s11042-022-12842-y
