EMFANet: a lightweight network with efficient multi-scale feature aggregation for real-time semantic segmentation

Hu, Xuegang; Ke, Yan

doi:10.1007/s11554-024-01421-z

Research
Published: 27 February 2024

Journal of Real-Time Image Processing
Volume 21, article number 40 (2024)

Xuegang Hu¹ & Yan Ke¹,²

Abstract

In recent years, real-time semantic segmentation has become a research focus for real-time applications such as autonomous driving. Although large deep models achieve excellent segmentation results, their slow inference and high complexity make them difficult to deploy in practice. To address these problems, this paper proposes EMFANet, a lightweight network with efficient multi-scale feature aggregation for real-time semantic segmentation, which adopts an encoder–decoder framework with an efficient channel attention mechanism. In EMFANet, an effective symmetric attention residual unit (SARU) is presented to rapidly capture rich multi-scale contextual information, and a lightweight multi-scale information aggregation unit (MIAU) is presented for efficient fusion of multi-scale features. Experimental results on the Cityscapes test set show that EMFANet achieves 72.1% mean intersection over union (mIoU) at 143 FPS with only 1.03 M parameters. It also shows competitive segmentation capability on the low-resolution CamVid test set, with a fast inference speed of 357 FPS. EMFANet thus achieves a strong balance among segmentation accuracy, inference speed and model size.
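
For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how mean intersection over union (mIoU) is commonly computed from a confusion matrix. It is not the authors' evaluation code. The function names `confusion_matrix` and `mean_iou`, the toy labels, and the use of 255 as an "ignore" label are assumptions made for illustration only.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count (true class, predicted class) pairs over flattened pixel labels.

    ignore_index is an assumed convention: pixels whose ground-truth label
    equals it are skipped entirely.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        cm[t][p] += 1  # rows: ground truth, columns: prediction
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp   # pixels wrongly labelled as c
        denom = tp + fp + fn
        if denom > 0:                               # skip classes absent from both maps
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and one ignored pixel (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
cm = confusion_matrix(pred, target, num_classes=3)
print(round(mean_iou(cm), 4))  # 0.7222
```

In the toy example the per-class IoU values are 1/2, 2/3 and 1, so the mean is 13/18. A benchmark score such as the one reported in the abstract is accumulated in the same way over every pixel of the test set.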

Data availability

Data will be made available on request.

Funding

This work is supported by the National Natural Science Foundation of China (Project Number 62076044) and the Natural Science Foundation of Chongqing, China (Grant No. cstc2019jcyj-zdxm0011).

Author information

Authors and Affiliations

1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
   Xuegang Hu & Yan Ke
2. Chongqing Key Laboratory of Signal and Information Processing, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
   Yan Ke

Contributions

XH: research direction, resources, formal analysis, project administration, writing—review and editing. YK: conceptualization, methodology, experiment, data collection and analysis, test, writing—original draft.

Corresponding author

Correspondence to Yan Ke.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hu, X., Ke, Y. EMFANet: a lightweight network with efficient multi-scale feature aggregation for real-time semantic segmentation. J Real-Time Image Proc 21, 40 (2024). https://doi.org/10.1007/s11554-024-01421-z

Received: 03 December 2023
Accepted: 16 January 2024
Published: 27 February 2024
DOI: https://doi.org/10.1007/s11554-024-01421-z
