Skip to main content

A Shared Encoder DNN for Integrated Recognition and Segmentation of Traffic Scenes

  • Chapter
  • First Online:
Frontiers in Computational Intelligence

Abstract

Detection of traffic related objects in the vehicles surroundings is an important task for future automated cars. Visual object recognition and scene labeling from onboard cameras provides valuable information for the driving task. In computer vision, the task of generating meaningful image regions representing specific object categories such as cars or road area, is denoted as semantic segmentation. In contrast, scene recognition computes a global label that reflects the overall category of the scene. This contribution presents an efficient deep neural network (DNN) capable of solving both problems. The network topology avoids redundant computations, by employing a shared feature encoder stage combined with designated decoders for the two specific tasks. Additionally, element-wise weights in a novel Hadamard layer efficiently exploit spatial priors for the segmentation task. Traffic scene segmentation is examined in conjunction with road topology recognition based on the cityscapes dataset [2] augmented with manually labeled road topology data.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

eBook
USD 16.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    This work employs a variant of the architecture published at https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet. Accessed: 18.01.2017.

References

  1. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2015) Semantic image segmentation with deep convolutional nets and fully connected crfs. In: 3rd international conference on learning representations. arXiv:1412.7062

  2. Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3223

    Google Scholar 

  3. Ess A, Müller T, Grabner H, van Gool L (2009) Segmentation-based urban traffic scene understanding. In: Proceedings of the 20th British machine vision conference, pp 84–1

    Google Scholar 

  4. Fritsch J, Kühnl T, Geiger A (2013) A new performance measure and evaluation benchmark for road detection algorithms. In: Proceedings of the 16th IEEE conference on intelligent transportation systems, pp 1693–1700

    Google Scholar 

  5. Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Aistats, vol 15, p 275

    Google Scholar 

  6. Hariharan B, Arbeláez P, Girshick R, Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 447–456

    Google Scholar 

  7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

    Google Scholar 

  8. Hong S, Noh H, Han B (2015) Decoupled deep neural network for semi-supervised semantic segmentation. In: Advances in neural information processing systems, vol 28. MIT Press, pp 1495–1503

    Google Scholar 

  9. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp. 675–678

    Google Scholar 

  10. Kendall A, Badrinarayanan V, Cipolla R (2015) Bayesian segnet: model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv:1511.02680

  11. Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

    Google Scholar 

  12. Lin G, Shen C, van den Hengel A, Reid I (2016) Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 3194–3203

    Google Scholar 

  13. Lin M, Chen Q, Yan S (2013) Network in network. arXiv preprint. arXiv:1312.4400

  14. Liu B, He X, Gould S (2015) Multi-class semantic video segmentation with exemplar-based object reasoning. In: Proceedings of the IEEE winter conference on applications of computer vision, pp 1014–1021

    Google Scholar 

  15. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

    Google Scholar 

  16. Mostajabi M, Yadollahpour P, Shakhnarovich G (2015) Feedforward semantic segmentation with zoom-out features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3376–3385

    Google Scholar 

  17. Papandreou G, Chen LC, Murphy K, Yuille AL (2015) Weakly-and semi-supervised learning of a dcnn for semantic image segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 648–656

    Google Scholar 

  18. Posada LF, Hoffmann F, Bertram T (2014) Visual semantic robot navigation in indoor environments. In: Proceedings of the 41st international symposium on robotics, VDE, pp 1–7

    Google Scholar 

  19. Posada LF, Narayanan KK, Hoffmann F, Bertram T (2013) Semantic classification of scenes and places with omnidirectional vision. In: Proceedings of the IEEE European conference on mobile robots, pp 113–118

    Google Scholar 

  20. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. doi:10.1007/s11263-015-0816-y

  21. Shuai B, Zuo Z, Wang B, Wang G (2016) Dag-recurrent neural networks for scene labeling. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3620–3629

    Google Scholar 

  22. Sikirić I, Brkić K, Krapac J, Šegvić S (2014) Image representations on a budget: traffic scene classification in a restricted bandwidth scenario. In: Proceedings of the IEEE intelligent vehicles symposium, pp 845–852

    Google Scholar 

  23. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  24. Sutskever I, Martens J, Dahl G, Hinton G (2013) On the importance of initialization and momentum in deep learning. In: Proceedings of the 30th international conference on machine learning, pp 1139–1147

    Google Scholar 

  25. Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2016) Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv:1602.07261

  26. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

    Google Scholar 

  27. Teichmann M, Weber M, Zoellner M, Cipolla R, Urtasun R (2016) Multinet: Real-time joint semantic reasoning for autonomous driving. arXiv:1612.07695

  28. Wu Z, Shen C, Hengel Avd (2016) Wider or deeper: revisiting the resnet model for visual recognition. arXiv:1611.10080

  29. Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? In: Advances in neural information processing systems, vol 27. MIT Press, pp 3320–3328

    Google Scholar 

  30. Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. In: 4th International conference on learning representations. arXiv:1511.07122

  31. Zhao H, Shi J, Qi X, Wang X, Jia J (2016) Pyramid scene parsing network. arXiv:1612.01105

  32. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2015) Object detectors emerge in deep scene cnns. In: 3rd International conference on learning representations. arXiv:1412.6856

Download references

Acknowledgements

The funding for this work was provided by the European Regional Development Fund (ERDF).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Malte Oeljeklaus .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Oeljeklaus, M., Hoffmann, F., Bertram, T. (2018). A Shared Encoder DNN for Integrated Recognition and Segmentation of Traffic Scenes. In: Mostaghim, S., Nürnberger, A., Borgelt, C. (eds) Frontiers in Computational Intelligence. Studies in Computational Intelligence, vol 739. Springer, Cham. https://doi.org/10.1007/978-3-319-67789-7_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-67789-7_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67788-0

  • Online ISBN: 978-3-319-67789-7

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics