ADSCNet: asymmetric depthwise separable convolution for semantic segmentation in real-time

  • Jiawei Wang
  • Hongyun Xiong
  • Haibo Wang
  • Xiaohong Nian
Article

Abstract

Semantic segmentation can be considered a per-pixel localization and classification problem that assigns a meaningful label to each pixel in an input image. Deep convolutional neural networks have been extremely successful in semantic segmentation in recent years. However, some challenges remain. The first is that most current networks are complex and hard to deploy on mobile devices because of limits on computational cost and memory. The second is obtaining more contextual information from downsampled feature maps. To this end, we propose an asymmetric depthwise separable convolution network (ADSCNet), a lightweight neural network for real-time semantic segmentation. To facilitate information propagation, Dense Dilated Convolution Connections (DDCC), which connect a set of dilated convolutional layers in a dense way, are introduced in the network. A pooling operation is inserted before the ADSCNet unit to cover more contextual information in prediction. Extensive experimental results validate the superior performance of the proposed method compared with other network architectures. Our approach achieves a mean intersection over union (mIoU) of 67.5% on the Cityscapes dataset at 76.9 frames per second.
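To give a rough sense of why such a design is lightweight, the sketch below compares weight counts for three layer types with the same input and output channels. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the layer layout, so the asymmetric variant is assumed to factorize the k×k depthwise kernel into a k×1 and a 1×k depthwise kernel followed by a 1×1 pointwise convolution. Biases and normalization parameters are ignored, and the function names and the 64-channel example are hypothetical.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def asymmetric_depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Assumed layout: k x 1 and 1 x k depthwise convolutions, then a 1 x 1 pointwise one."""
    depthwise = 2 * k * c_in      # k x 1 filter and 1 x k filter per input channel
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 64, 64, 3    # hypothetical layer sizes for illustration
    for name, fn in [
        ("standard", standard_conv_params),
        ("depthwise separable", depthwise_separable_params),
        ("asymmetric depthwise separable", asymmetric_depthwise_separable_params),
    ]:
        print(f"{name}: {fn(c_in, c_out, k)} weights")
```

Under these assumptions, most of the saving comes from separating spatial filtering from channel mixing. The asymmetric factorization trims the depthwise term further, and its benefit grows with the kernel size k.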

Keywords

Semantic segmentation, Dense connection, Real-time, Depthwise separable convolution

Notes

Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities of Central South University under grant 2017zzts730. We thank Xiangyu Zhang for helpful discussions.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Software, Central South University, Changsha, People’s Republic of China
  2. School of Information Science and Engineering, Central South University, Changsha, People’s Republic of China