Abstract
Action anticipation is critical in scenarios where one needs to react before an action is finalized. This is, for instance, the case in automated driving, where a car must avoid hitting pedestrians and respect traffic lights. While solutions have been proposed to tackle subsets of the driving anticipation tasks, by making use of diverse, task-specific sensors, no single dataset or framework addresses them all in a consistent manner. In this paper, we therefore introduce a new, large-scale dataset, called VIENA\(^2\), covering 5 generic driving scenarios with a total of 25 distinct action classes. It contains more than 15K full-HD, 5 s-long videos acquired in varying driving conditions, weather, times of day, and environments, complemented by a common and realistic set of sensor measurements. This amounts to more than 2.25M frames, each annotated with an action label, corresponding to 600 samples per action class. We discuss our data acquisition strategy and the statistics of our dataset, benchmark state-of-the-art action anticipation techniques, and introduce a new multi-modal LSTM architecture with an effective loss function for action anticipation in driving scenarios.
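To make the abstract's description concrete, the sketch below renders a multi-modal LSTM with a time-weighted anticipation loss in PyTorch. This is a minimal, hypothetical illustration, not the authors' implementation: the class name MultiModalAnticipationLSTM, the per-modality LSTM design, the feature dimensions, and the linear t/T loss weighting are all assumptions chosen to mirror the abstract's account of fusing visual features with sensor measurements and of a loss that rewards committing to the correct class early.

    # Hedged sketch: a two-stream (multi-modal) LSTM for action anticipation.
    # All names, dimensions, and the time-based loss weighting below are
    # illustrative assumptions; the paper's exact architecture may differ.
    import torch
    import torch.nn as nn

    class MultiModalAnticipationLSTM(nn.Module):
        def __init__(self, visual_dim=2048, sensor_dim=16,
                     hidden_dim=512, num_classes=25):
            super().__init__()
            # One LSTM per modality: appearance features and vehicle sensors.
            self.visual_lstm = nn.LSTM(visual_dim, hidden_dim, batch_first=True)
            self.sensor_lstm = nn.LSTM(sensor_dim, hidden_dim, batch_first=True)
            # Fused per-time-step classifier over the 25 action classes.
            self.classifier = nn.Linear(2 * hidden_dim, num_classes)

        def forward(self, visual_seq, sensor_seq):
            v, _ = self.visual_lstm(visual_seq)   # (B, T, H)
            s, _ = self.sensor_lstm(sensor_seq)   # (B, T, H)
            fused = torch.cat([v, s], dim=-1)     # (B, T, 2H)
            return self.classifier(fused)         # per-frame logits (B, T, C)

    def anticipation_loss(logits, labels):
        """Time-weighted cross-entropy: later frames are penalized more,
        so early mistakes are forgiven but the model is pushed to commit
        to the correct class as soon as possible. The linear ramp t/T is
        an assumption, not the paper's exact loss."""
        B, T, C = logits.shape
        ce = nn.functional.cross_entropy(
            logits.reshape(B * T, C),
            labels.unsqueeze(1).expand(B, T).reshape(B * T),
            reduction="none",
        ).view(B, T)
        weights = torch.arange(1, T + 1, device=logits.device).float() / T
        return (weights * ce).mean()

As a sanity check on the dataset figures, if the 5 s clips were recorded at 30 fps (an assumption consistent with the reported totals), each clip has 150 frames and 15,000 clips yield exactly 2.25M annotated frames; visual_seq would then have shape (batch, 150, visual_dim) when using per-frame CNN features.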
Cite this paper
Aliakbarian, M.S., Saleh, F.S., Salzmann, M., Fernando, B., Petersson, L., Andersson, L. (2019). VIENA\(^2\): A Driving Anticipation Dataset. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds.) Computer Vision – ACCV 2018. Lecture Notes in Computer Science, vol. 11361. Springer, Cham. https://doi.org/10.1007/978-3-030-20887-5_28
DOI: https://doi.org/10.1007/978-3-030-20887-5_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20886-8
Online ISBN: 978-3-030-20887-5