Multi-view Vision Transformer for Driver Action Recognition

  • Conference paper
2021 6th International Conference on Intelligent Transportation Engineering (ICITE 2021)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 901)

Abstract

Distracted driving causes many accidents every year, most of which could be avoided if distraction were recognized automatically. As a result, vision-based driver action recognition is receiving increasing research attention. In the confined space of a vehicle cabin, actions can be highly ambiguous when observed from a single view, so an efficient multi-view action recognition architecture is worth exploring. This study aims to detect driver distraction and, at the same time, identify its cause. A novel driver action recognition architecture named multi-view vision transformer (MVVT) is proposed, which combines classical convolutional neural networks (CNNs) with a vision transformer. A self-attention mechanism is used to dynamically aggregate temporal information and to jointly fuse features from different views. Experiments demonstrate that MVVT can effectively recognize drivers' behaviors from multi-view input, achieving a promising accuracy of 84.9% on a large public driver action dataset.
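
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it names: per-frame feature vectors from several camera views are treated as tokens, and scaled dot-product self-attention lets a class token aggregate them across time and views. It is not the authors' MVVT. The function names (fuse_views, self_attention), the toy sizes, the random stand-ins for CNN features, and the single-head, single-layer design are all assumptions made for illustration; positional and view embeddings, layer normalization, and the feed-forward layers of a real transformer are omitted.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [dot(row, x) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token list."""
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    scale = 1.0 / math.sqrt(len(q[0]))
    outputs, weights = [], []
    for qi in q:
        w = softmax([scale * dot(qi, kj) for kj in k])
        # Each output is the attention-weighted sum of the value vectors.
        outputs.append([sum(wj * vj[c] for wj, vj in zip(w, v))
                        for c in range(len(v[0]))])
        weights.append(w)
    return outputs, weights

def fuse_views(features, cls_token, Wq, Wk, Wv):
    """features[view][frame] is one feature vector (hypothetical CNN output).

    All view/frame tokens are flattened into one sequence behind a class
    token, so attention mixes temporal and cross-view information jointly.
    Returns the class token's output and its attention weights.
    """
    tokens = [cls_token] + [frame for view in features for frame in view]
    outputs, weights = self_attention(tokens, Wq, Wk, Wv)
    return outputs[0], weights[0]

# Toy setup: 2 views, 4 frames per view, 8-dimensional features (all assumed).
rng = random.Random(0)
D, VIEWS, FRAMES = 8, 2, 4

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

features = [rand_matrix(FRAMES, D) for _ in range(VIEWS)]  # stand-in for CNN features
cls_token = rand_matrix(1, D)[0]
Wq, Wk, Wv = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)

fused, attn = fuse_views(features, cls_token, Wq, Wk, Wv)
print(len(fused), len(attn))
```

In a sketch like this, the fused class-token vector would be passed to a classifier over the action classes, and the attention weights show how much each view and frame contributed to it.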

Acknowledgment

This research was funded by the Key R&D Program of Guangdong Province (grant number 2018B010107005) and the Natural Science Foundation of Guangdong Province (grant number 2016A030313288).

Author information

Corresponding author

Correspondence to Qingge Ji.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shan, G., Ji, Q., Xie, Y. (2022). Multi-view Vision Transformer for Driver Action Recognition. In: Zhang, Z. (eds) 2021 6th International Conference on Intelligent Transportation Engineering (ICITE 2021). ICITE 2021. Lecture Notes in Electrical Engineering, vol 901. Springer, Singapore. https://doi.org/10.1007/978-981-19-2259-6_85
