Abstract
Accurately predicting a driver’s attention is crucial for safety in human-centric transportation scenarios. This capability helps identify and evaluate accident risks in driver assistance systems as well as in autonomous driving. Beyond traditional approaches that consider contextual variables and non-visual cues, such as the driver’s mental state and brain activity patterns, recent advances in deep learning-based computer vision have accelerated progress in visual attention prediction. The fundamental idea behind these techniques is to replicate the human visual system’s ability to identify the critical objects or regions that capture a driver’s attention. Notably, the integration of salient object detection and attention mechanisms has gained popularity across various application domains. This paper provides a comprehensive overview of the current state of the art in this field. The study focuses on the architectures and workflows of deep learning-based models dedicated to attention prediction. To this end, we examine recent works, specifically selecting those that leverage (simulated) attention mechanisms to estimate the driver’s attention area. Our analysis covers the contributions of each model, explaining its architectural design and underlying organization. Additionally, we assess the effectiveness of each model by discussing the reported experimental results.
Data Availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Ethics declarations
Conflict of Interest
The authors declare that there are no conflicts of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Moradi, M., Palazzo, S., Rundo, F. et al. Recent advancements in driver’s attention prediction. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19368-5
DOI: https://doi.org/10.1007/s11042-024-19368-5