A Shared Encoder DNN for Integrated Recognition and Segmentation of Traffic Scenes

Oeljeklaus, Malte; Hoffmann, Frank; Bertram, Torsten

doi:10.1007/978-3-319-67789-7_7

Part of the book series: Studies in Computational Intelligence (SCI, volume 739)

Abstract

Detection of traffic-related objects in the vehicle's surroundings is an important task for future automated cars. Visual object recognition and scene labeling from onboard cameras provide valuable information for the driving task. In computer vision, the task of generating meaningful image regions that represent specific object categories, such as cars or road area, is denoted as semantic segmentation. In contrast, scene recognition computes a global label that reflects the overall category of the scene. This contribution presents an efficient deep neural network (DNN) capable of solving both problems. The network topology avoids redundant computations by employing a shared feature encoder stage combined with designated decoders for the two specific tasks. Additionally, element-wise weights in a novel Hadamard layer efficiently exploit spatial priors for the segmentation task. Traffic scene segmentation is examined in conjunction with road topology recognition based on the Cityscapes dataset [2] augmented with manually labeled road topology data.
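
The shared-encoder idea and the element-wise weighting described in the abstract can be illustrated with a toy sketch. This is not the chapter's network (which, according to the note below, is a GoogLeNet variant in Caffe); it is a minimal pure-Python illustration under stated assumptions. All function names, the 2x2 average-pooling "encoder", the two-class score maps, the threshold-based scene label, and the placement of the Hadamard layer directly on the class score maps are hypothetical choices made for illustration only.

```python
def encoder(image):
    """Toy shared encoder (assumption): 2x2 average pooling of a grayscale image."""
    h, w = len(image), len(image[0])
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
             for c in range(0, w - 1, 2)]
            for r in range(0, h - 1, 2)]


def hadamard_layer(scores, weights):
    """Element-wise (Hadamard) product of class score maps with same-shaped weights.

    Both arguments are nested lists indexed as [class][row][column]. In a
    trained network the weights would be learned; here they are fixed by hand.
    """
    return [[[s * w for s, w in zip(score_row, weight_row)]
             for score_row, weight_row in zip(score_map, weight_map)]
            for score_map, weight_map in zip(scores, weights)]


def segmentation_decoder(features, prior_weights):
    """Toy segmentation head: score maps -> spatial prior -> per-pixel argmax."""
    # Hypothetical scores: class 0 ("road") = f, class 1 ("other") = 1 - f.
    scores = [features, [[1.0 - f for f in row] for row in features]]
    weighted = hadamard_layer(scores, prior_weights)
    n_classes = len(weighted)
    return [[max(range(n_classes), key=lambda k: weighted[k][r][c])
             for c in range(len(features[0]))]
            for r in range(len(features))]


def recognition_decoder(features):
    """Toy recognition head: global average pooling, then a threshold label."""
    values = [f for row in features for f in row]
    return 1 if sum(values) / len(values) >= 0.5 else 0


def forward(image, prior_weights):
    """Run the encoder once and feed the same features to both decoders."""
    features = encoder(image)
    return segmentation_decoder(features, prior_weights), recognition_decoder(features)


image = [
    [1.0, 1.0, 0.25, 0.25],
    [1.0, 1.0, 0.25, 0.25],
    [0.25, 0.25, 0.75, 0.75],
    [0.25, 0.25, 0.75, 0.75],
]
ones = [[1.0, 1.0], [1.0, 1.0]]
uniform_prior = [ones, ones]                       # no spatial preference
road_prior = [[[1.0, 1.0], [4.0, 4.0]], ones]      # favors "road" in the bottom row

print(forward(image, uniform_prior))
print(forward(image, road_prior))
```

With the uniform weights the per-pixel labels depend only on the scores; with the hand-set `road_prior` the bottom-left pixel flips to class 0, which is the sense in which position-dependent weights can encode a spatial prior. The scene label is unchanged because both heads read the same encoder output, which is computed only once per image.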

Notes

1. This work employs a variant of the architecture published at https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet. Accessed: 18.01.2017.

References

Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2015) Semantic image segmentation with deep convolutional nets and fully connected crfs. In: 3rd international conference on learning representations. arXiv:1412.7062
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3223
Ess A, Müller T, Grabner H, van Gool L (2009) Segmentation-based urban traffic scene understanding. In: Proceedings of the 20th British machine vision conference, pp 84–1
Fritsch J, Kühnl T, Geiger A (2013) A new performance measure and evaluation benchmark for road detection algorithms. In: Proceedings of the 16th IEEE conference on intelligent transportation systems, pp 1693–1700
Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Aistats, vol 15, p 275
Hariharan B, Arbeláez P, Girshick R, Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 447–456
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hong S, Noh H, Han B (2015) Decoupled deep neural network for semi-supervised semantic segmentation. In: Advances in neural information processing systems, vol 28. MIT Press, pp 1495–1503
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 675–678
Kendall A, Badrinarayanan V, Cipolla R (2015) Bayesian segnet: model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv:1511.02680
Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Lin G, Shen C, van den Hengel A, Reid I (2016) Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 3194–3203
Lin M, Chen Q, Yan S (2013) Network in network. arXiv preprint. arXiv:1312.4400
Liu B, He X, Gould S (2015) Multi-class semantic video segmentation with exemplar-based object reasoning. In: Proceedings of the IEEE winter conference on applications of computer vision, pp 1014–1021
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Mostajabi M, Yadollahpour P, Shakhnarovich G (2015) Feedforward semantic segmentation with zoom-out features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3376–3385
Papandreou G, Chen LC, Murphy K, Yuille AL (2015) Weakly-and semi-supervised learning of a dcnn for semantic image segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 648–656
Posada LF, Hoffmann F, Bertram T (2014) Visual semantic robot navigation in indoor environments. In: Proceedings of the 41st international symposium on robotics, VDE, pp 1–7
Posada LF, Narayanan KK, Hoffmann F, Bertram T (2013) Semantic classification of scenes and places with omnidirectional vision. In: Proceedings of the IEEE European conference on mobile robots, pp 113–118
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. doi:10.1007/s11263-015-0816-y
Shuai B, Zuo Z, Wang B, Wang G (2016) Dag-recurrent neural networks for scene labeling. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3620–3629
Sikirić I, Brkić K, Krapac J, Šegvić S (2014) Image representations on a budget: traffic scene classification in a restricted bandwidth scenario. In: Proceedings of the IEEE intelligent vehicles symposium, pp 845–852
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Sutskever I, Martens J, Dahl G, Hinton G (2013) On the importance of initialization and momentum in deep learning. In: Proceedings of the 30th international conference on machine learning, pp 1139–1147
Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2016) Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv:1602.07261
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Teichmann M, Weber M, Zoellner M, Cipolla R, Urtasun R (2016) Multinet: Real-time joint semantic reasoning for autonomous driving. arXiv:1612.07695
Wu Z, Shen C, Hengel Avd (2016) Wider or deeper: revisiting the resnet model for visual recognition. arXiv:1611.10080
Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? In: Advances in neural information processing systems, vol 27. MIT Press, pp 3320–3328
Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. In: 4th International conference on learning representations. arXiv:1511.07122
Zhao H, Shi J, Qi X, Wang X, Jia J (2016) Pyramid scene parsing network. arXiv:1612.01105
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2015) Object detectors emerge in deep scene cnns. In: 3rd International conference on learning representations. arXiv:1412.6856

Acknowledgements

The funding for this work was provided by the European Regional Development Fund (ERDF).

Author information

Authors and Affiliations

Institute of Control Theory and Systems Engineering, TU Dortmund University, Otto-Hahn-Str. 8, 44227, Dortmund, Germany
Malte Oeljeklaus, Frank Hoffmann & Torsten Bertram

Corresponding author

Correspondence to Malte Oeljeklaus.

Editor information

Editors and Affiliations

Faculty of Computer Science, Otto von Guericke University Magdeburg, Magdeburg, Germany
Sanaz Mostaghim
Faculty of Computer Science, Otto von Guericke University Magdeburg, Magdeburg, Germany
Andreas Nürnberger
Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
Christian Borgelt

About this chapter

Cite this chapter

Oeljeklaus, M., Hoffmann, F., Bertram, T. (2018). A Shared Encoder DNN for Integrated Recognition and Segmentation of Traffic Scenes. In: Mostaghim, S., Nürnberger, A., Borgelt, C. (eds) Frontiers in Computational Intelligence. Studies in Computational Intelligence, vol 739. Springer, Cham. https://doi.org/10.1007/978-3-319-67789-7_7

DOI: https://doi.org/10.1007/978-3-319-67789-7_7
Published: 27 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67788-0
Online ISBN: 978-3-319-67789-7
eBook Packages: Engineering, Engineering (R0)
