A Lightweight Multi-scale Feature Fusion Network for Real-Time Semantic Segmentation

Singha, Tanmay; Pham, Duc-Son; Krishna, Aneesh; Gedeon, Tom

doi:10.1007/978-3-030-92270-2_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13109)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Semantic segmentation has recently become an active research area in computer vision, driven by strong demand from autonomous vehicles, robotics, video surveillance, and medical image processing. To meet this demand, several real-time semantic segmentation models have been introduced. Relying on existing deep convolutional neural networks (DCNNs), these models extract contextual features from the input image and construct the output at the decoder end by simply fusing deep features with shallow features, which leaves a large semantic gap between them. This gap causes boundary degeneration and noisy feature effects in the output. To address this issue, we propose a novel architecture, called Feature Scaling Feature Fusion Network (FSFFNet), which narrows the gap by successively fusing features at consecutive levels in multiple directions. For better dense pixel-level representation, we also employ a feature scaling technique, which helps the model assimilate more contextual information from the global features and improves model performance. Our proposed model achieves 71.8% validation accuracy (mIoU) on the Cityscapes dataset whilst having only 1.3M parameters.
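For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from predicted and ground-truth class labels. It is not taken from the paper: the function name `mean_iou`, the flattened label sequences, and the `ignore_index` value of 255 (a common convention for unlabelled pixels) are assumptions made for illustration, and benchmark tooling typically accumulates the counts over the whole validation set rather than per image.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over the classes that occur in the prediction or the target.

    `pred` and `target` are flattened per-pixel class labels. Pixels whose
    target equals `ignore_index` (an assumed convention) are skipped.
    """
    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:                  # skip classes absent from both
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with two classes and four pixels.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy example, class 0 has an IoU of 1/2 and class 1 an IoU of 2/3, so the mean is 7/12. Averaging per class rather than per pixel means that small classes weigh as much as large ones, which is why the metric is sensitive to the boundary errors the abstract describes.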

Author information

Authors and Affiliations

School of Electrical Engineering, Computing, and Mathematical Sciences, Curtin University, Bentley, WA, 6102, Australia
Tanmay Singha, Duc-Son Pham & Aneesh Krishna
Research School of Computer Science, The Australian National University, Canberra, Australia
Tom Gedeon

Corresponding author

Correspondence to Tanmay Singha.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

Cite this paper

Singha, T., Pham, DS., Krishna, A., Gedeon, T. (2021). A Lightweight Multi-scale Feature Fusion Network for Real-Time Semantic Segmentation. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13109. Springer, Cham. https://doi.org/10.1007/978-3-030-92270-2_17

DOI: https://doi.org/10.1007/978-3-030-92270-2_17
Published: 07 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92269-6
Online ISBN: 978-3-030-92270-2
eBook Packages: Computer Science, Computer Science (R0)
