Abstract
Distracted driving causes many accidents every year, most of which could be avoided with automatic recognition of driver distraction. As a result, vision-based driver action recognition is receiving increasing research attention. In the confined space of a vehicle, actions can be highly ambiguous when observed from a single view, so an efficient multi-view action recognition architecture is worth exploring. This study aims to detect driver distraction and identify its cause. A novel driver action recognition architecture, the multi-view vision transformer (MVVT), is proposed; it combines classical convolutional neural networks (CNNs) with a vision transformer. A self-attention mechanism dynamically aggregates temporal information and jointly fuses features from different views. Experiments demonstrate that MVVT effectively recognizes driver behaviors from multi-view input, achieving 84.9% accuracy on a large public driver action dataset.
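The abstract does not specify how the attention-based fusion is implemented, so the following is only an illustrative sketch of the general idea, not the authors' MVVT. It assumes that a CNN backbone has already produced one feature vector per (view, frame) pair, treats each vector as a token, applies single-head scaled dot-product self-attention across all tokens, and mean-pools the result. The names `self_attention` and `fuse_views` are hypothetical, and the learned query/key/value projections, positional or view embeddings, and classification head of a real transformer are all omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections stand in for the learned
    query/key/value matrices of a real transformer layer.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    attended = []
    for query in tokens:
        # Similarity of this token to every token (all views, all frames).
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        # Weighted sum of the token values.
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(d)]
        )
    return attended


def fuse_views(features):
    """Fuse per-view, per-frame features into one clip-level vector.

    features[view][frame] is a feature vector (assumed to come from a CNN).
    Mean pooling over attended tokens is an assumption made for brevity.
    """
    tokens = [vec for view in features for vec in view]
    attended = self_attention(tokens)
    d = len(attended[0])
    return [sum(tok[i] for tok in attended) / len(attended) for i in range(d)]


if __name__ == "__main__":
    # Two hypothetical camera views, three frames each, 4-dimensional features.
    demo = [
        [[0.2, 0.1, 0.0, 0.5], [0.3, 0.1, 0.1, 0.4], [0.2, 0.2, 0.0, 0.5]],
        [[0.9, 0.0, 0.3, 0.1], [0.8, 0.1, 0.2, 0.1], [0.9, 0.1, 0.3, 0.0]],
    ]
    print(fuse_views(demo))
```

Because every token attends to tokens from all views and all frames, temporal aggregation and view fusion happen in the same attention step, which is the joint behavior the abstract describes. A practical model would add learned projections and feed the fused vector to a classifier.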
Acknowledgment
This research was funded by the Key R&D Program of Guangdong Province (grant number 2018B010107005) and the Natural Science Foundation of Guangdong Province (grant number 2016A030313288).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shan, G., Ji, Q., Xie, Y. (2022). Multi-view Vision Transformer for Driver Action Recognition. In: Zhang, Z. (eds) 2021 6th International Conference on Intelligent Transportation Engineering (ICITE 2021). ICITE 2021. Lecture Notes in Electrical Engineering, vol 901. Springer, Singapore. https://doi.org/10.1007/978-981-19-2259-6_85
DOI: https://doi.org/10.1007/978-981-19-2259-6_85
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2258-9
Online ISBN: 978-981-19-2259-6
eBook Packages: Intelligent Technologies and Robotics (R0)