Abstract
Detecting traffic-related objects in the vehicle's surroundings is an important task for future automated cars. Visual object recognition and scene labeling from onboard cameras provide valuable information for the driving task. In computer vision, the task of generating meaningful image regions that represent specific object categories, such as cars or road area, is denoted as semantic segmentation. In contrast, scene recognition computes a global label that reflects the overall category of the scene. This contribution presents an efficient deep neural network (DNN) capable of solving both problems. The network topology avoids redundant computations by employing a shared feature encoder stage combined with designated decoders for the two specific tasks. Additionally, element-wise weights in a novel Hadamard layer efficiently exploit spatial priors for the segmentation task. Traffic scene segmentation is examined in conjunction with road topology recognition based on the Cityscapes dataset [2], augmented with manually labeled road topology data.
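The shared-encoder idea and the element-wise weighting can be illustrated with a toy sketch in plain Python. This is not the authors' network: the 2x2 average pooling merely stands in for a convolutional encoder, the thresholds stand in for learned classifiers, and all function names (`encoder`, `hadamard_layer`, `segmentation_decoder`, `scene_decoder`, `run_network`) are hypothetical. It only shows that the features are computed once and reused by both decoders, and that a Hadamard layer multiplies each spatial position by its own weight.

```python
def encoder(image):
    """Toy shared encoder: 2x2 average pooling (assumed stand-in for a CNN)."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


def hadamard_layer(features, weights):
    """Element-wise (Hadamard) product: one weight per spatial position."""
    return [
        [f * w for f, w in zip(feature_row, weight_row)]
        for feature_row, weight_row in zip(features, weights)
    ]


def segmentation_decoder(features, prior_weights, threshold=0.5):
    """Per-position label: 1 where the prior-weighted feature exceeds the threshold."""
    weighted = hadamard_layer(features, prior_weights)
    return [[1 if value > threshold else 0 for value in row] for row in weighted]


def scene_decoder(features, threshold=0.5):
    """Global label: 1 if the mean feature value exceeds the threshold, else 0."""
    flat = [value for row in features for value in row]
    return 1 if sum(flat) / len(flat) > threshold else 0


def run_network(image, prior_weights):
    """Run the encoder once and feed the same features to both decoders."""
    features = encoder(image)  # shared computation, done a single time
    return segmentation_decoder(features, prior_weights), scene_decoder(features)


if __name__ == "__main__":
    # Uniform 4x4 input; the assumed prior favours the lower half of the image.
    image = [[0.6] * 4 for _ in range(4)]
    prior = [[0.5, 0.5], [1.0, 1.0]]
    segmentation, scene = run_network(image, prior)
    print(segmentation, scene)
```

With a uniform input, the segmentation output differs between the upper and lower rows only because of the position-dependent weights, which is the role a spatial prior plays here. In a trained network these weights would be learned rather than set by hand.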
Notes
1. This work employs a variant of the architecture published at https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet. Accessed: 18.01.2017.
References
Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2015) Semantic image segmentation with deep convolutional nets and fully connected crfs. In: 3rd international conference on learning representations. arXiv:1412.7062
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3223
Ess A, Müller T, Grabner H, van Gool L (2009) Segmentation-based urban traffic scene understanding. In: Proceedings of the 20th British machine vision conference, pp 84–1
Fritsch J, Kühnl T, Geiger A (2013) A new performance measure and evaluation benchmark for road detection algorithms. In: Proceedings of the 16th IEEE conference on intelligent transportation systems, pp 1693–1700
Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: AISTATS, vol 15, p 275
Hariharan B, Arbeláez P, Girshick R, Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 447–456
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hong S, Noh H, Han B (2015) Decoupled deep neural network for semi-supervised semantic segmentation. In: Advances in neural information processing systems, vol 28. MIT Press, pp 1495–1503
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 675–678
Kendall A, Badrinarayanan V, Cipolla R (2015) Bayesian segnet: model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv:1511.02680
Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Lin G, Shen C, van den Hengel A, Reid I (2016) Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 3194–3203
Lin M, Chen Q, Yan S (2013) Network in network. arXiv preprint. arXiv:1312.4400
Liu B, He X, Gould S (2015) Multi-class semantic video segmentation with exemplar-based object reasoning. In: Proceedings of the IEEE winter conference on applications of computer vision, pp 1014–1021
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Mostajabi M, Yadollahpour P, Shakhnarovich G (2015) Feedforward semantic segmentation with zoom-out features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3376–3385
Papandreou G, Chen LC, Murphy K, Yuille AL (2015) Weakly-and semi-supervised learning of a dcnn for semantic image segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 648–656
Posada LF, Hoffmann F, Bertram T (2014) Visual semantic robot navigation in indoor environments. In: Proceedings of the 41st international symposium on robotics, VDE, pp 1–7
Posada LF, Narayanan KK, Hoffmann F, Bertram T (2013) Semantic classification of scenes and places with omnidirectional vision. In: Proceedings of the IEEE European conference on mobile robots, pp 113–118
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. doi:10.1007/s11263-015-0816-y
Shuai B, Zuo Z, Wang B, Wang G (2016) Dag-recurrent neural networks for scene labeling. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3620–3629
Sikirić I, Brkić K, Krapac J, Šegvić S (2014) Image representations on a budget: traffic scene classification in a restricted bandwidth scenario. In: Proceedings of the IEEE intelligent vehicles symposium, pp 845–852
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Sutskever I, Martens J, Dahl G, Hinton G (2013) On the importance of initialization and momentum in deep learning. In: Proceedings of the 30th international conference on machine learning, pp 1139–1147
Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2016) Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv:1602.07261
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Teichmann M, Weber M, Zoellner M, Cipolla R, Urtasun R (2016) Multinet: Real-time joint semantic reasoning for autonomous driving. arXiv:1612.07695
Wu Z, Shen C, Hengel Avd (2016) Wider or deeper: revisiting the resnet model for visual recognition. arXiv:1611.10080
Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? In: Advances in neural information processing systems, vol 27. MIT Press, pp 3320–3328
Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. In: 4th International conference on learning representations. arXiv:1511.07122
Zhao H, Shi J, Qi X, Wang X, Jia J (2016) Pyramid scene parsing network. arXiv:1612.01105
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2015) Object detectors emerge in deep scene cnns. In: 3rd International conference on learning representations. arXiv:1412.6856
Acknowledgements
The funding for this work was provided by the European Regional Development Fund (ERDF).
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Oeljeklaus, M., Hoffmann, F., Bertram, T. (2018). A Shared Encoder DNN for Integrated Recognition and Segmentation of Traffic Scenes. In: Mostaghim, S., Nürnberger, A., Borgelt, C. (eds) Frontiers in Computational Intelligence. Studies in Computational Intelligence, vol 739. Springer, Cham. https://doi.org/10.1007/978-3-319-67789-7_7
DOI: https://doi.org/10.1007/978-3-319-67789-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67788-0
Online ISBN: 978-3-319-67789-7
eBook Packages: Engineering, Engineering (R0)