Light-Deeplabv3+: a lightweight real-time semantic segmentation method for complex environment perception

  • Research
  • Journal of Real-Time Image Processing

Abstract

Current semantic segmentation methods achieve high accuracy, but their high computational complexity and long inference time make it difficult to meet application requirements in complex environments. To achieve fast and accurate semantic segmentation of images, we propose a lightweight semantic segmentation method called Light-Deeplabv3+. First, we propose MobileNetV2Lite-SE, an architecture with an SE module, as the backbone network of the model; it reduces the number of model parameters and improves segmentation speed. Second, we propose an ACsc-ASPP module, based on an asymmetric dilated convolution block (ADCB) and an scSE module, to address the loss of semantic information during feature extraction. These improvements capture more semantic features and improve segmentation accuracy. Finally, we propose a DSC-Blaze module to replace the original \(3\times 3\) standard convolution. It consists of a depthwise separable convolution (DSC) and a Blaze module, and it improves segmentation speed while maintaining the receptive field. Experimental results show that Light-Deeplabv3+ achieves a Mean Intersection over Union (MIoU) of 73.15\(\%\) on the PASCAL VOC2012 dataset, with a parameter size of only 6.291 MB and a computational cost of only 20.883 G, and that it runs at 37.66 frames per second (FPS) on the 3060Ti platform. Compared with the traditional Deeplabv3+, Light-Deeplabv3+ achieves efficient and accurate image segmentation with less computational overhead, and its performance is comparable to state-of-the-art algorithms.
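To illustrate why replacing a \(3\times 3\) standard convolution with a depthwise separable convolution can reduce model size, the following minimal sketch compares weight counts for the two layer types. It covers only the generic DSC idea, not the DSC-Blaze module itself, whose internal structure is not given in the abstract. The channel widths (256 in, 256 out) and the function names are assumptions chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a depthwise separable convolution (bias ignored).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical channel widths, not taken from the paper.
    c_in, c_out = 256, 256
    std = standard_conv_params(c_in, c_out)
    dsc = depthwise_separable_params(c_in, c_out)
    print(f"standard 3x3 conv weights:   {std}")
    print(f"depthwise separable weights: {dsc}")
    print(f"reduction factor:            {std / dsc:.2f}x")
```

Under these assumed widths the separable layer needs roughly one ninth of the weights. The exact saving in Light-Deeplabv3+ depends on the actual layer configuration, which the abstract does not specify.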

Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgements

This work is supported by the Key-Area Research and Development Program of Guangdong Province under Grant 2020B0909020001 and the National Natural Science Foundation of China under Grant No. 61573113.

Author information

Corresponding author

Correspondence to Huaming Qian.

About this article

Cite this article

Ding, P., Qian, H. Light-Deeplabv3+: a lightweight real-time semantic segmentation method for complex environment perception. J Real-Time Image Proc 21, 1 (2024). https://doi.org/10.1007/s11554-023-01380-x

  • DOI: https://doi.org/10.1007/s11554-023-01380-x
