Recent advancements in driver’s attention prediction

Published in: Multimedia Tools and Applications

Abstract

Accurately anticipating a driver’s attention is crucial for safety in human-centric transportation scenarios: it enables driver assistance systems, as well as autonomous driving systems, to identify and assess accident risks. Beyond traditional approaches that rely on contextual variables and non-visual cues, such as the driver’s mental state and brain activity patterns, recent advances in deep learning-based computer vision have accelerated progress in visual attention prediction. The core idea behind these techniques is to replicate the human visual system’s ability to identify the critical objects or regions that capture a driver’s attention; in particular, salient object detection and attention mechanisms have gained popularity across many application domains. This paper provides a comprehensive overview of the current state of the art in the field, focusing on the architectures and workflows of deep learning-based models for driver attention prediction. To this end, we review recent works that leverage (simulated) attention mechanisms to estimate the driver’s attention area, analyze each model’s contributions and architectural organization, and assess its effectiveness based on the reported experimental results.
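To make the surveyed paradigm concrete, the sketch below shows the skeleton shared by many deep driver-attention predictors of this kind: a convolutional encoder extracts scene features from a dashcam frame, a learned spatial attention map reweights those features, and a decoder upsamples them into a per-pixel fixation (saliency) map. This is a minimal illustrative example, not the implementation of any specific model reviewed here; the layer sizes, module names, and input resolution are arbitrary assumptions chosen for brevity.

```python
# Illustrative sketch only (assumed architecture, not a surveyed model):
# encoder -> spatial attention -> decoder producing a saliency map.
import torch
import torch.nn as nn

class TinyAttentionSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: two stride-2 convolutions downsample the RGB frame 4x.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Spatial attention: squeeze features to a single-channel weight map.
        self.attention = nn.Sequential(
            nn.Conv2d(64, 1, kernel_size=1), nn.Sigmoid(),
        )
        # Decoder: upsample attended features back to input resolution and
        # predict one saliency channel in [0, 1].
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame):
        feats = self.encoder(frame)        # (B, 64, H/4, W/4)
        attn = self.attention(feats)       # (B, 1, H/4, W/4), weights in [0, 1]
        return self.decoder(feats * attn)  # (B, 1, H, W) saliency map

model = TinyAttentionSaliency()
frame = torch.randn(1, 3, 224, 224)        # a dummy dashcam frame
saliency = model(frame)
print(saliency.shape)                      # torch.Size([1, 1, 224, 224])
```

In practice, the models reviewed in this survey typically replace the toy encoder with a pretrained backbone (e.g., a ResNet) and add temporal modules such as ConvLSTMs, 3D convolutions, or transformers to exploit video context, training against ground-truth fixation maps collected from drivers.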

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Author information

Corresponding author

Correspondence to Morteza Moradi.

Ethics declarations

Conflict of Interest

The authors declare that there are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Moradi, M., Palazzo, S., Rundo, F. et al. Recent advancements in driver’s attention prediction. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19368-5
