Towards Modelling an Attention-Based Text Localization Process

  • Antonio Clavelli
  • Dimosthenis Karatzas
  • Josep Lladós
  • Mario Ferraro
  • Giuseppe Boccignone
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7887)


This note introduces a visual attention model of text localization in real-world scenes. We discuss the core of the model, which is built upon the proto-object concept, and show how such a dynamic mid-level representation of the scene can be derived within an action-perception loop that engages salience, text information value computation, and eye guidance mechanisms.

Preliminary results are presented that compare model-generated scanpaths with eye-tracked scanpaths recorded from human subjects.
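
To make the idea of an action-perception loop concrete, the following is a minimal illustrative sketch and not the authors' model. It assumes that salience and text information value are already available as two grids of equal size, combines them with a hypothetical mixing weight, and selects fixations by a deterministic winner-take-all rule with inhibition of return. That rule is a common simplification swapped in here for illustration; the proto-object representation and the actual eye guidance mechanism of the paper are not reproduced. The function name `generate_scanpath` and all parameters are assumptions.

```python
def generate_scanpath(salience, text_value, n_fixations=3, weight=0.5, ior_radius=1):
    """Return a list of (row, col) fixations from two equally sized 2D grids.

    Illustrative assumptions (not taken from the paper):
    - priority is a convex combination of salience and text value;
    - the next fixation is the current priority maximum (winner-take-all);
    - visited neighbourhoods are suppressed (inhibition of return).
    """
    rows, cols = len(salience), len(salience[0])

    # Perception step: fuse the two cues into a single priority map.
    priority = [
        [(1.0 - weight) * salience[r][c] + weight * text_value[r][c]
         for c in range(cols)]
        for r in range(rows)
    ]

    scanpath = []
    for _ in range(n_fixations):
        # Action step: shift gaze to the highest-priority location.
        r, c = max(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda rc: priority[rc[0]][rc[1]],
        )
        scanpath.append((r, c))

        # Inhibition of return: zero the square neighbourhood just visited
        # so that the loop moves on to unexplored regions.
        for i in range(max(0, r - ior_radius), min(rows, r + ior_radius + 1)):
            for j in range(max(0, c - ior_radius), min(cols, c + ior_radius + 1)):
                priority[i][j] = 0.0

    return scanpath


if __name__ == "__main__":
    # Toy 4x4 maps: a salient blob near the top left, text-like regions
    # at the bottom right and the top right.
    salience = [
        [0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.1, 0.0],
        [0.0, 0.1, 0.1, 0.3],
        [0.0, 0.0, 0.3, 0.4],
    ]
    text_value = [
        [0.0, 0.0, 0.0, 0.6],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    print(generate_scanpath(salience, text_value))
```

With these toy maps the text-like region at the bottom right is fixated before the purely salient blob, which illustrates how a text information value term can redirect gaze away from the salience maximum. A scanpath produced this way could then be compared with human eye-tracking data, although the comparison measure used in the paper is not stated here.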


Keywords: text localization, visual attention, eye guidance





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Antonio Clavelli (1)
  • Dimosthenis Karatzas (1)
  • Josep Lladós (1)
  • Mario Ferraro (2)
  • Giuseppe Boccignone (3)

  1. Computer Vision Center, Universitat Autònoma de Barcelona, Bellaterra (Cerdanyola), Spain
  2. Dipartimento di Fisica, Università di Torino, Torino, Italy
  3. Dipartimento di Informatica, Università di Milano, Italy
