Abstract
Semantic segmentation can be considered as a per-pixel localization and classification problem, which gives a meaningful label to each pixel in an input image. Deep convolutional neural networks have made extremely successful in semantic segmentation in recent years. However, some challenges still exist. The first challenge task is that most current networks are complex and it is hard to deploy these models on mobile devices because of the limitation of computational cost and memory. Getting more contextual information from downsampled feature maps is another challenging task. To this end, we propose an asymmetric depthwise separable convolution network (ADSCNet) which is a lightweight neural network for real-time semantic segmentation. To facilitating information propagation, Dense Dilated Convolution Connections (DDCC), which connects a set of dilated convolutional layers in a dense way, is introduced in the network. Pooling operation is inserted before ADSCNet unit to cover more contextual information in prediction. Extensive experimental results validate the superior performance of our proposed method compared with other network architectures. Our approach achieves mean intersection over union (mIOU) of 67.5% on Cityscapes dataset at 76.9 frames per second.
Similar content being viewed by others
References
Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40 (4):834–848
Chen L, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587
Chen L.-C., Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3213–3223
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
He Y, Han S (2018) Adc: Automated deep compression and acceleration with reinforcement learning. arXiv:1802.03494
Howard A, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861
Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) Squeezenet: Alexnet-level accuracy with 50x fewer parameters and < 0.5 mb model size. arXiv:1602.07360
Ioannou Y, Robertson D, Shotton J, Cipolla R, Criminisi A (2015) Training cnns with low-rank filters for efficient image classification. arXiv:1511.06744
Wei J, He J, Zhou Y, Chen K, Tang Z, Xiong Z (2019) Enhanced object detection with deep convolutional neural networks for advanced driving assistance. IEEE Transactions on Intelligent Transportation Systems
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the international conference on machine learning (ICML), pp 448–456
Jaderberg M, Vedaldi A, Zisserman A (2014) Speeding up convolutional neural networks with low rank expansions. arXiv:1405.38661405.3866
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. arXiv:1608.08710
Liu Z, Li J, Shen Z, Huang G, Yan S, Zhang C (2017) Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2755–2763
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3431–3440
Paszke A, Chaurasia A, Kim S, Culurciello E (2016) Enet: a deep neural network architecture for real-time semantic segmentation. arXiv:1606.02147
Peng C, Zhang X, Yu G, Luo G, Sun J (2017) Large kernel matters – improve semantic segmentation by global convolutional network. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1743–1751
Romera E, Alvarez JM, Bergasa LM, Arroyo R (2018) Erfnet: efficient residual factorized convnet for real-time semantic segmentation. IEEE Trans Intell Transp Syst 19(1):263–272
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. arXiv:1801.04381
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Wang P, Hu Q, Zhang Y, Zhang C, Liu Y, Cheng J (2018) Two-step quantization for low-bit neural networks. Proc IEEE Conf Comput Vis Pattern Recognit, 4376–4384
Xie G, Wang J, Zhang T, Lai J, Hong R, Qi GJ (2018) Interleaved structured sparse convolutional neural networks. Proc IEEE Conf Comput Vis Pattern Recognit, 8847–8856
Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5987–5995
Yoon J, Hwang SJ (2017) Combined group and exclusive sparsity for deep neural networks. In: Proceedings of the international conference on machine learning (ICML), pp 3958–3966
Yu H, Yang Z, Tan L, Wang Y, Sun W, Sun M, Tang Y (2018) Methods and datasets on semantic segmentation: a review. Neurocomputing 304:82–103
Yu X, Yu Z, Ramalingam S (2018) Learning strict identity mappings in deep residual networks. Proc IEEE Conf Comput Vis Pattern Recognit, 4432–4440
Zhang X, Zhou X, Lin M, Sun J (2018) Shufflenet: an extremely efficient convolutional neural network for mobile devices. Proc IEEE Conf Comput Vis Pattern Recognit, 6848–6856
Zhang X, Zou J, He K, Sun J (2016) Accelerating very deep convolutional networks for classification and detection. IEEE Trans Pattern Anal Mach Intell 38(10):1943–1955
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2881–2890
Maggiori E, Tarabalka Y, Charpiat G, Alliez P (2016) Convolutional neural networks for large-scale remote-sensing image classification. IEEE Trans Geosci Remote Sens 55(2):645–657
Everingham M, Eslami A, Van Gool L, Williams K, Winn J, Zisserman A (2015) The pascal visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3156–3164
Alhaija A, Mustikovela K, Mescheder L, Geiger A, Rother C (2018) Augmented reality meets computer vision: efficient data generation for urban driving scenes. Int J Comput Vis 126(9):961–972
Xie D, Deng C, Wang H, Li C, Tao D (2018) Semantic adversarial network with multi-scale pyramid attention for video classification. Association for the Advancement of Artificial Intelligence (AAAI)
Deng C, Yang E, Liu T, Liu W, Li J, Tao D (2019) Unsupervised semantic-preserving adversarial hashing for image search. IEEE Trans Image Process 28(8):4032–4044
Li N, Li C, Deng C, Liu X, Gao X (2018) Deep joint semantic-embedding hashing. Int Joint Conf Artif Intell, 2397–2403
Rawat W, Wang Z (2017) Deep convolutional neural networks for image classification: a comprehensive review. Neur Comput 29(9):2352–2449
Cai Z, Fan Q, Feris R, Vasconcelos N (2016) A unified multi-scale deep convolutional neural network for fast object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 354–370
Li Y, Zhang Y, Huang X, Ma J (2018) Learning source-invariant deep hashing convolutional neural networks for cross-source remote sensing image retrieval. IEEE Trans Geosci Remote Sens 56(11):6521–6536
Liu C, Chen L, Schroff F, Adam H, Hua W, Yuille A, Fei-Fei L (2019) Auto-deeplab: hierarchical neural architecture search for semantic image segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 82–92
Bischke B, Helber P, Folz J, Borth D, Dengel A (2019) Multi-task learning for segmentation of building footprints with deep neural networks. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 1480–1484
Lowe D (1999) Object recognition from local scale-invariant features. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1150–1157
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 886–893
Li J, Allinson N (2008) A comprehensive review of current local features for computer vision. Neurocomputing 71(10):1771–1787
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Farabet C, Couprie C, Najman L, LeCun Y (2012) Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 35(8):1915–1929
Mostajabi M, Yadollahpour P, Shakhnarovich G (2015) Feedforward semantic segmentation with zoom-out features. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3376–3385
Vezhnevets A, Ferrari V, Buhmann J (2012) Weakly supervised structured output learning for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 845–852
Papandreou G, Chen L, Murphy K, Yuille A (2015) Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation. In: Proceedings of the IEEE International conference on computer vision (ICCV), pp 1742–1750
Liu S, Yan S, Zhang T, Xu C, Liu J, Lu H (2011) Weakly supervised graph propagation towards collective image parsing. IEEE Trans Multimed 14(2):361–373
Acknowledgements
This work is supported by the Fundamental Research Funds for the Central Universities of Central South University under grant 2017zzts730. We appreciate Xiangyu Zhang for helping on the discussion.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Wang, J., Xiong, H., Wang, H. et al. ADSCNet: asymmetric depthwise separable convolution for semantic segmentation in real-time. Appl Intell 50, 1045–1056 (2020). https://doi.org/10.1007/s10489-019-01587-1
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-019-01587-1