Abstract
Robust driver attention prediction in critical situations is a challenging computer vision problem, yet essential for autonomous driving. Because critical driving moments are so rare, collecting enough data for these situations is difficult with the conventional in-car data collection protocol of tracking eye movements during actual driving. Here, we first propose a new in-lab driver attention collection protocol and introduce a new driver attention dataset, the Berkeley DeepDrive Attention (BDD-A) dataset, built upon braking-event videos selected from a large-scale, crowd-sourced driving video dataset. We further propose Human Weighted Sampling (HWS), a method that uses human gaze behavior to identify the crucial frames of a driving dataset and weight them heavily during model training. With our dataset and HWS, we built a driver attention prediction model that outperforms the state of the art and demonstrates sophisticated behaviors, such as attending to crossing pedestrians without raising false alarms for pedestrians walking safely on the sidewalk. To human observers, its predictions are nearly indistinguishable from the ground truth. Although trained only on our in-lab attention data, the model also predicts in-car driver attention during routine driving with state-of-the-art accuracy. This result not only demonstrates the performance of our model but also validates our dataset and data collection protocol.
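To make the sampling idea concrete, below is a minimal, hypothetical sketch of Human Weighted Sampling in PyTorch. It assumes a frame's training weight grows with the KL divergence between its human gaze map and the dataset's mean gaze map, so frames where drivers look somewhere unusual (e.g., at a crossing pedestrian) are drawn more often. The helper names `hws_weights` and `kl_divergence` and the exact weighting rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of Human Weighted Sampling (HWS): frames whose gaze
# maps deviate from the dataset average are oversampled during training.
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two gaze maps, each normalized to sum to 1."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def hws_weights(gaze_maps):
    """One sampling weight per frame: divergence from the mean gaze map."""
    mean_map = gaze_maps.mean(axis=0)
    w = np.array([kl_divergence(g, mean_map) for g in gaze_maps])
    return w + 0.1 * w.mean()  # floor so routine frames are still sampled

# Toy stand-ins: 1000 frames with 36x64 gaze maps and dummy image tensors.
rng = np.random.default_rng(0)
gaze = rng.random((1000, 36, 64)).astype(np.float32)
frames = torch.randn(1000, 3, 72, 128)

sampler = WeightedRandomSampler(
    weights=torch.from_numpy(hws_weights(gaze)).double(),
    num_samples=len(frames),
    replacement=True,  # lets crucial frames be drawn multiple times per epoch
)
loader = DataLoader(TensorDataset(frames, torch.from_numpy(gaze)),
                    batch_size=32, sampler=sampler)
```

Under such a scheme, a model trained from the reweighted loader sees rare, attention-demanding moments far more often than their raw frequency in the dataset would allow.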
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Xia, Y., Zhang, D., Kim, J., Nakayama, K., Zipser, K., Whitney, D. (2019). Predicting Driver Attention in Critical Situations. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds.) Computer Vision – ACCV 2018. Lecture Notes in Computer Science, vol. 11365. Springer, Cham. https://doi.org/10.1007/978-3-030-20873-8_42
DOI: https://doi.org/10.1007/978-3-030-20873-8_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20872-1
Online ISBN: 978-3-030-20873-8
eBook Packages: Computer Science (R0)