Multi-view Vision Transformer for Driver Action Recognition

Shan, Guangwei; Ji, Qingge; Xie, Yuguang

doi:10.1007/978-981-19-2259-6_85

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 901)

Included in the following conference series:

International Conference on Intelligent Transportation Engineering

Abstract

Distracted driving causes many accidents every year, most of which could be avoided with automatic recognition of driver distraction. As a result, vision-based driver action recognition is receiving increasing research attention. In the limited space inside a vehicle, actions can be highly ambiguous when seen from a single view, so exploring efficient multi-view action recognition architectures is worthwhile. This study aims to detect driver distraction and, at the same time, identify its cause. It proposes a novel driver action recognition architecture, the multi-view vision transformer (MVVT), which combines classical convolutional neural networks (CNNs) with a vision transformer. A self-attention mechanism dynamically aggregates temporal information and jointly fuses the features of the different views. Experiments demonstrate that MVVT can effectively recognize drivers’ behaviors from multi-view input, reaching a promising 84.9% accuracy on a large public driver action dataset.
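
The idea of fusing views and frames jointly through self-attention can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the layer layout, so the code assumes that a CNN backbone has already produced one feature vector per frame and per view, uses a single attention head with identity query/key/value projections instead of learned ones, and mean-pools the result. The names `self_attention` and `fuse_views` and the toy features are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a trained model would learn these projections.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        # Similarity of this token to every token, across all views and frames.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        # Weighted average of all tokens, one output dimension at a time.
        attended.append([
            sum(w * value[d] for w, value in zip(weights, tokens))
            for d in range(dim)
        ])
    return attended


def fuse_views(view_features):
    """Fuse per-frame features from several views into one clip-level vector.

    view_features[v][t] is the (hypothetical) CNN feature of frame t in view v.
    All view/frame tokens attend to each other jointly, then are mean-pooled.
    """
    tokens = [frame for view in view_features for frame in view]
    attended = self_attention(tokens)
    dim = len(attended[0])
    return [sum(tok[d] for tok in attended) / len(attended) for d in range(dim)]


# Two views, two frames each, 3-dimensional toy features.
clip = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],  # e.g. a side-view camera
    [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],  # e.g. a front-view camera
]
fused = fuse_views(clip)
```

Because every frame of every view sits in the same token sequence, a frame from one camera can draw on evidence from another camera and from other time steps in a single attention step. In a full model the fused vector would typically feed a classification head over the action classes.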

Acknowledgment

The research was funded by Key R&D Program of Guangdong Province, grant number 2018B010107005 and the Natural Science Foundation of Guangdong Province, grant number 2016A030313288.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Guangwei Shan, Qingge Ji & Yuguang Xie
Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, 510006, China
Guangwei Shan, Qingge Ji & Yuguang Xie

Corresponding author

Correspondence to Qingge Ji.

Editor information

Editors and Affiliations

Department of Mechanical and Electrical Engineering, University of Electronic Science and Technology, Chengdu, Sichuan, China
Zhenyuan Zhang

About this paper

Cite this paper

Shan, G., Ji, Q., Xie, Y. (2022). Multi-view Vision Transformer for Driver Action Recognition. In: Zhang, Z. (eds) 2021 6th International Conference on Intelligent Transportation Engineering (ICITE 2021). ICITE 2021. Lecture Notes in Electrical Engineering, vol 901. Springer, Singapore. https://doi.org/10.1007/978-981-19-2259-6_85

DOI: https://doi.org/10.1007/978-981-19-2259-6_85
Published: 01 June 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2258-9
Online ISBN: 978-981-19-2259-6
eBook Packages: Intelligent Technologies and Robotics (R0)
