Abstract
In this paper, we explore augmenting semantic segmentation with depth maps, motivated by the geometric structure of automotive scenes. Depth is typically already computed in an automotive system for object localization and path planning, and can thus be leveraged for semantic segmentation. We construct two baseline networks, “RGB only” and “Depth only”, and investigate the impact of fusing both cues with two further networks, “RGBD concat” and “Two Stream RGB+D”. We evaluate these networks on two automotive datasets: Virtual KITTI, using synthetic depth, and Cityscapes, using a standard stereo depth estimation algorithm. Additionally, we evaluate our approach using the monoDepth unsupervised estimator [10]. The two-stream architecture achieves the best results, with an improvement of 5.7% IoU on Virtual KITTI and 1% IoU on Cityscapes. Certain classes improve substantially: truck, building, van and car gain 29%, 11%, 9% and 8% respectively on Virtual KITTI. Surprisingly, the CNN model is able to produce good semantic segmentation from depth images alone. The proposed network runs at 4 fps on a TitanX GPU (Maxwell architecture).
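The two fusion strategies named in the abstract differ only in where the depth cue enters the network: “RGBD concat” appends depth as a fourth input channel, while “Two Stream RGB+D” runs separate encoders and concatenates their feature maps before the decoder. The sketch below illustrates both tensor layouts, plus the per-class IoU metric used for evaluation, with numpy on dummy data; `toy_encoder` is a hypothetical stand-in for the actual CNN encoders, which the abstract does not specify.

```python
import numpy as np

# Early fusion ("RGBD concat"): stack the depth map as a 4th input channel.
rgb = np.random.rand(1, 3, 64, 64).astype(np.float32)    # NCHW RGB image
depth = np.random.rand(1, 1, 64, 64).astype(np.float32)  # aligned depth map
rgbd = np.concatenate([rgb, depth], axis=1)              # shape (1, 4, 64, 64)

def toy_encoder(x, out_channels=8):
    """Hypothetical stand-in for a CNN encoder: a fixed random 1x1
    channel projection, just to produce feature maps of the right shape."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal((out_channels, x.shape[1])).astype(np.float32)
    return np.einsum("oc,nchw->nohw", w, x)

# Mid-level fusion ("Two Stream RGB+D"): encode each modality separately,
# then concatenate the feature maps channel-wise before decoding.
feat_rgb = toy_encoder(rgb)
feat_depth = toy_encoder(depth)
fused = np.concatenate([feat_rgb, feat_depth], axis=1)   # (1, 16, 64, 64)

def iou(pred, gt, cls):
    """Intersection-over-union for one class, as reported in the results."""
    inter = np.logical_and(pred == cls, gt == cls).sum()
    union = np.logical_or(pred == cls, gt == cls).sum()
    return inter / union if union else float("nan")
```

Class IoU scores are averaged over all classes to give the mean IoU figures quoted above.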
References
Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561 (2015)
Brostow, G.J., Shotton, J., Fauqueur, J., Cipolla, R.: Segmentation and recognition using structure from motion point clouds. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5302, pp. 44–57. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88682-2_5
Cao, Y., Shen, C., Shen, H.T.: Exploiting depth from single monocular images for object detection and semantic segmentation. IEEE Trans. Image Process. 26(2), 836–846 (2017)
Chen, L.C., Yang, Y., Wang, J., Xu, W., Yuille, A.L.: Attention to scale: scale-aware semantic image segmentation. arXiv preprint arXiv:1511.03339 (2015)
Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. arXiv preprint arXiv:1604.01685 (2016)
Cordts, M., et al.: The stixel world: a medium-level representation of traffic scenes. Image Vis. Comput. 68, 40–52 (2017)
Das, A., Yogamani, S.: Evaluation of residual learning in lightweight deep networks for object classification. In: Proceedings of the Irish Machine Vision and Image Processing Conference, pp. 205–208 (2018)
Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1915–1929 (2013)
Gaidon, A., Wang, Q., Cabon, Y., Vig, E.: Virtual worlds as proxy for multi-object tracking analysis. In: CVPR (2016)
Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: CVPR, vol. 2, p. 7 (2017)
Grangier, D., Bottou, L., Collobert, R.: Deep convolutional networks for scene parsing. In: ICML 2009 Deep Learning Workshop, vol. 3. Citeseer (2009)
Hazirbas, C., Ma, L., Domokos, C., Cremers, D.: FuseNet: incorporating depth into semantic segmentation via fusion-based CNN architecture. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) ACCV 2016. LNCS, vol. 10111, pp. 213–228. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54181-5_14
Hirschmuller, H.: Accurate and efficient stereo processing by semi-global matching and mutual information. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 807–814. IEEE (2005)
Horgan, J., Hughes, C., McDonald, J., Yogamani, S.: Vision-based driver assistance systems: survey, taxonomy and advances. In: 2015 IEEE 18th International Conference on Intelligent Transportation Systems (ITSC), pp. 2032–2039. IEEE (2015)
Jain, S.D., Xiong, B., Grauman, K.: FusionSeg: learning to combine motion and appearance for fully automatic segmentation of generic objects in videos. arXiv preprint arXiv:1701.05384 (2017)
Kundu, A., Li, Y., Dellaert, F., Li, F., Rehg, J.M.: Joint semantic segmentation and 3D reconstruction from monocular video. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 703–718. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_45
Lin, D., Chen, G., Cohen-Or, D., Heng, P.A., Huang, H.: Cascaded feature network for semantic segmentation of RGB-D images. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1320–1328. IEEE (2017)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Ma, L., Stückler, J., Kerl, C., Cremers, D.: Multi-view deep learning for consistent semantic mapping with RGB-D cameras. arXiv preprint arXiv:1703.08866 (2017)
Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4040–4048 (2016)
McCormac, J., Handa, A., Davison, A., Leutenegger, S.: SemanticFusion: dense 3D semantic mapping with convolutional neural networks. arXiv preprint arXiv:1609.05130 (2016)
Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015)
Qi, G.J.: Hierarchically gated deep networks for semantic segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Siam, M., Elkerdawy, S., Jagersand, M., Yogamani, S.: Deep semantic segmentation for automated driving: taxonomy, roadmap and challenges. arXiv preprint arXiv:1707.02432 (2017)
Siam, M., Mahgoub, H., Zahran, M., Yogamani, S., Jagersand, M., El-Sallab, A.: MODNET: moving object detection network with motion and appearance for autonomous driving. arXiv preprint arXiv:1709.04821 (2017)
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems, pp. 568–576 (2014)
Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., Geiger, A.: Sparsity invariant CNNs. arXiv preprint arXiv:1708.06500 (2017)
Wang, W., Neumann, U.: Depth-aware CNN for RGB-D segmentation. arXiv preprint arXiv:1803.06791 (2018)
Whelan, T., Leutenegger, S., Salas-Moreno, R.F., Glocker, B., Davison, A.J.: ElasticFusion: dense SLAM without a pose graph. In: Robotics: Science and Systems, vol. 11 (2015)
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
Zhou, T., Brown, M., Snavely, N., Lowe, D.G.: Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1851–1858 (2017)
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rashed, H., Yogamani, S., El-Sallab, A., Das, A., El-Helw, M. (2019). Depth Augmented Semantic Segmentation Networks for Automated Driving. In: Arora, C., Mitra, K. (eds) Computer Vision Applications. WCVA 2018. Communications in Computer and Information Science, vol 1019. Springer, Singapore. https://doi.org/10.1007/978-981-15-1387-9_1
DOI: https://doi.org/10.1007/978-981-15-1387-9_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1386-2
Online ISBN: 978-981-15-1387-9
eBook Packages: Computer Science (R0)