VIENA\(^2\): A Driving Anticipation Dataset

  • Conference paper

Computer Vision – ACCV 2018 (ACCV 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11361)

Abstract

Action anticipation is critical in scenarios where one needs to react before an action is completed. This is, for instance, the case in automated driving, where a car must, e.g., avoid hitting pedestrians and respect traffic lights. While solutions have been proposed to tackle subsets of driving anticipation tasks by making use of diverse, task-specific sensors, no single dataset or framework addresses them all in a consistent manner. In this paper, we therefore introduce a new, large-scale dataset, called VIENA\(^2\), covering 5 generic driving scenarios with a total of 25 distinct action classes. It contains more than 15K full-HD, 5-second-long videos acquired in varied driving conditions, weather, times of day, and environments, complemented by a common and realistic set of sensor measurements. This amounts to more than 2.25M frames, each annotated with an action label, corresponding to 600 samples per action class. We discuss our data acquisition strategy and the statistics of our dataset, and benchmark state-of-the-art action anticipation techniques, including a new multi-modal LSTM architecture with an effective loss function for action anticipation in driving scenarios.
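The paper's own multi-modal LSTM and loss function are not reproduced on this page. Purely as an illustration of the general idea behind anticipation losses (tolerating uncertainty early in a video while rewarding early commitment to the correct class), a minimal time-weighted cross-entropy might be sketched as follows; the function name, the linear weight ramp, and the per-frame probability format are assumptions for this sketch, not the paper's actual formulation:

```python
import math

def anticipation_loss(frame_probs, true_label):
    """Time-weighted cross-entropy sketch for action anticipation.

    frame_probs: list of per-frame probability distributions over the
                 action classes, one entry per observed video frame.
    true_label:  index of the ground-truth action class.

    The weight on each frame grows linearly with time, so mistakes on
    early, ambiguous frames are penalized less, while the model is
    still pushed to commit to the correct class as evidence accumulates.
    """
    T = len(frame_probs)
    loss = 0.0
    for t, probs in enumerate(frame_probs, start=1):
        weight = t / T  # linear ramp: later frames count more
        loss += -weight * math.log(probs[true_label] + 1e-12)
    return loss / T

# A model that is confident in the correct class incurs a lower loss
# than one that stays uncertain throughout the clip.
confident = [[0.1, 0.8, 0.1]] * 4
uncertain = [[1 / 3, 1 / 3, 1 / 3]] * 4
print(anticipation_loss(confident, 1) < anticipation_loss(uncertain, 1))
```

Weighting later frames more heavily is only one design choice; losses in this family (e.g. reference [1] above, "Encouraging LSTMs to anticipate actions very early") use more elaborate schedules, and the multi-modal aspect of the paper's architecture (fusing video with sensor measurements) is not modeled here at all.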



Author information

Corresponding author: Mohammad Sadegh Aliakbarian

Electronic supplementary material

Supplementary material 1 (PDF 11594 KB)


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Aliakbarian, M.S., Saleh, F.S., Salzmann, M., Fernando, B., Petersson, L., Andersson, L. (2019). VIENA\(^2\): A Driving Anticipation Dataset. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol. 11361. Springer, Cham. https://doi.org/10.1007/978-3-030-20887-5_28


  • DOI: https://doi.org/10.1007/978-3-030-20887-5_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20886-8

  • Online ISBN: 978-3-030-20887-5

  • eBook Packages: Computer Science, Computer Science (R0)
