Problems with Saliency Maps

  • Giuseppe Boccignone
  • Vittorio Cuculo
  • Alessandro D’Amelio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11752)


Despite the popularity that saliency models have gained in the computer vision community, they are most often conceived, exploited, and benchmarked with little regard for a number of problems and subtle issues they bring along. When saliency maps are used as proxies for the likelihood of fixating a location in a viewed scene, one such issue is the temporal dimension of visual attention deployment. Through a simple simulation, it is shown how neglecting this dimension leads to results that, at best, cast shadows on the predictive performance of a model and on its assessment via benchmarking procedures.
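The issue can be seen in miniature. The sketch below is illustrative only (the toy 3×3 map, the scanpaths, and the helper name are assumptions, not taken from the paper): it scores two scanpaths that fixate the same locations in different temporal orders with Normalized Scanpath Saliency (NSS), a common map-based metric. Because such a metric only sees *where* fixations land, never *when*, the two scanpaths are indistinguishable.

```python
from statistics import mean, pstdev

# Toy static saliency map over a 3x3 grid (an illustrative assumption;
# real saliency maps are pixel-level images).
saliency = [[0.1, 0.2, 0.1],
            [0.2, 0.9, 0.2],
            [0.1, 0.2, 0.1]]

# Two hypothetical scanpaths fixating the SAME locations,
# but in different temporal orders.
scanpath_a = [(1, 1), (0, 0), (2, 2), (1, 1)]
scanpath_b = [(2, 2), (1, 1), (1, 1), (0, 0)]

def nss(smap, fixations):
    """Normalized Scanpath Saliency: mean of the z-scored map values
    at the fixated locations. Note it never inspects fixation order."""
    values = [v for row in smap for v in row]
    mu, sigma = mean(values), pstdev(values)
    return mean((smap[r][c] - mu) / sigma for r, c in fixations)

# Identical scores: the map-based metric is blind to the temporal
# dimension of gaze deployment.
assert nss(saliency, scanpath_a) == nss(saliency, scanpath_b)
```

Any purely spatial score behaves the same way; distinguishing these scanpaths requires sequence-aware comparison (e.g., string-edit or recurrence-based methods), which is precisely the dimension the paper argues benchmarks tend to neglect.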


Keywords: Saliency model · Visual attention · Gaze deployment



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. PHuSe Lab - Dipartimento di Informatica, University of Milan, Milan, Italy