Driver intention prediction based on multi-dimensional cross-modality information interaction

  • Regular Paper
  • Published in Multimedia Systems

Abstract

Driver intention prediction allows drivers to perceive possible dangers as early as possible and has become one of the most important research topics in the field of autonomous driving in recent years. In this study, we propose a driver intention prediction method based on multi-dimensional cross-modality information interaction. First, an efficient video recognition network is designed to extract channel-temporal features from the inside (driver) and outside (road) videos, respectively. Within this network, we design a cross-modality channel-spatial weight mechanism to achieve information interaction between the two feature extraction networks, one per modality, and we introduce a contrastive learning module that forces the two networks to enhance structural knowledge interaction. The resulting representations of the inside and outside videos are then fused by a ResLayer-based module to produce a preliminary prediction, which is corrected by incorporating GPS information to obtain the final decision. In addition, we employ a multi-task framework to train the entire network. We validate the proposed method on the public Brain4Cars dataset, and the results show that it achieves competitive accuracy while balancing performance and computational cost.
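The full text is not reproduced here, but the abstract outlines the pipeline in enough detail for a rough sketch. Below is a minimal PyTorch illustration of that pipeline; it is an assumption-laden sketch, not the authors' implementation: single frames stand in for video clips, a CBAM-style gate stands in for the cross-modality channel-spatial weight mechanism, a SimCLR-style InfoNCE term stands in for the contrastive module, and a GRU over raw GPS coordinates stands in for the GPS correction branch. All names, shapes, and loss weights are illustrative.

```python
# Minimal sketch of the pipeline described in the abstract. Every module,
# tensor shape and hyperparameter below is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalityWeight(nn.Module):
    """Hypothetical channel-spatial weighting: one modality's features gate
    the channels and spatial locations of the other (a CBAM-style stand-in)."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())
        self.spatial_conv = nn.Conv2d(1, 1, kernel_size=7, padding=3)

    def forward(self, target, source):
        # Channel weights derived from the other modality's pooled descriptor.
        w_c = self.channel_fc(source.mean(dim=(2, 3)))             # (B, C)
        target = target * w_c[:, :, None, None]
        # Spatial weights from the other modality's channel-averaged map.
        w_s = torch.sigmoid(self.spatial_conv(source.mean(dim=1, keepdim=True)))
        return target * w_s


def info_nce(z_in, z_out, tau=0.07):
    """SimCLR-style contrastive term pulling paired inside/outside embeddings
    together (an assumed form of the paper's contrastive module)."""
    z_in, z_out = F.normalize(z_in, dim=1), F.normalize(z_out, dim=1)
    logits = z_in @ z_out.t() / tau                                # (B, B)
    labels = torch.arange(z_in.size(0), device=z_in.device)
    return F.cross_entropy(logits, labels)


class IntentionPredictor(nn.Module):
    def __init__(self, dim=512, gps_dim=2, num_classes=5):
        super().__init__()
        # Toy stand-ins for the two channel-temporal video backbones.
        self.driver_net = nn.Conv2d(3, dim, 3, padding=1)
        self.road_net = nn.Conv2d(3, dim, 3, padding=1)
        self.cm_weight = CrossModalityWeight(dim)
        # Residual ("ResLayer"-style) fusion over the concatenated features.
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, 2 * dim))
        self.head = nn.Linear(2 * dim, num_classes)
        # GPS branch whose output corrects the preliminary logits.
        self.gps_gru = nn.GRU(gps_dim, 64, batch_first=True)
        self.gps_head = nn.Linear(64, num_classes)

    def forward(self, driver_frames, road_frames, gps_seq):
        f_in = self.driver_net(driver_frames)     # (B, C, H, W)
        f_out = self.road_net(road_frames)
        # Bidirectional cross-modality information interaction.
        g_in = self.cm_weight(f_in, f_out)
        g_out = self.cm_weight(f_out, f_in)
        z_in, z_out = g_in.mean(dim=(2, 3)), g_out.mean(dim=(2, 3))
        z = torch.cat([z_in, z_out], dim=1)
        z = z + self.fuse(z)                      # residual fusion
        prelim = self.head(z)                     # preliminary prediction
        _, h = self.gps_gru(gps_seq)              # h: (1, B, 64)
        final = prelim + self.gps_head(h[-1])     # GPS-based correction
        return final, prelim, z_in, z_out


# Usage with an assumed multi-task objective (loss weights are made up):
model = IntentionPredictor()
driver = torch.randn(4, 3, 56, 56)                # in-cabin frames
road = torch.randn(4, 3, 56, 56)                  # road frames
gps = torch.randn(4, 10, 2)                       # 10-step GPS sequence
y = torch.randint(0, 5, (4,))                     # maneuver labels
final, prelim, z_in, z_out = model(driver, road, gps)
loss = (F.cross_entropy(final, y) + 0.5 * F.cross_entropy(prelim, y)
        + 0.1 * info_nce(z_in, z_out))
loss.backward()
```

A real implementation would replace the toy backbones with the paper's channel-temporal video networks and train all branches jointly under the multi-task loss, as the abstract describes.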


Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.


Author information

Authors and Affiliations

Authors

Contributions

Zengkui Xu, Shaohua Qiao and Jiannan Zheng wrote the main manuscript text; Dongliang Peng, Mengfan Xue and Tao Li revised the manuscript and checked the innovative points of the article; and Yuerong Wang designed the contrastive learning module. All authors reviewed the manuscript.

Corresponding author

Correspondence to Dongliang Peng.

Ethics declarations

Competing interests

The authors declare no competing interests.


Additional information

Communicated by H. Li.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xue, M., Xu, Z., Qiao, S. et al. Driver intention prediction based on multi-dimensional cross-modality information interaction. Multimedia Systems 30, 83 (2024). https://doi.org/10.1007/s00530-024-01282-3


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00530-024-01282-3
