Multimedia Tools and Applications

Volume 42, Issue 1, pp 31–56

Crossing textual and visual content in different application scenarios

  • Julien Ah-Pine
  • Marco Bressan
  • Stephane Clinchant
  • Gabriela Csurka
  • Yves Hoppenot
  • Jean-Michel Renders


This paper deals with multimedia information access. We propose two new approaches for hybrid text-image information processing that generalize straightforwardly to the broader multimodal scenario. Both approaches belong to the trans-media pseudo-relevance feedback category. The first method uses a mixture model of the aggregate components, treating them as a single relevance concept. The second defines trans-media similarities as an aggregation of monomodal similarities between the elements of the aggregate and the new multimodal object. We also introduce the monomodal similarity measures for text and images that serve as basic components for both proposed trans-media similarities. We show how a wide variety of problems can be framed so that the proposed techniques apply: image annotation or captioning, text illustration, and multimedia retrieval and clustering. Finally, we show how these methods can be integrated into two applications: a travel blog assistant system and a tool for browsing Wikipedia that takes the multimedia nature of its content into account.
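As a rough illustration of the second idea (trans-media similarity as an aggregation of monomodal similarities), the following Python sketch scores a candidate image against a text query. It is only a sketch under stated assumptions: cosine similarity stands in for the monomodal text and image measures, the aggregation is a text-similarity-weighted sum over the k nearest text neighbours, and the names `cosine`, `transmedia_similarity`, and `k` are hypothetical. None of these choices are taken from the paper itself.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def transmedia_similarity(query_text, candidate_image, collection, k=2):
    """Score an image against a text query through a multimodal collection.

    collection is a list of (text_vector, image_vector) pairs.
    Assumption: aggregation is a sum of image similarities, each weighted
    by the text similarity of the neighbour it comes from.
    """
    # Step 1: monomodal text retrieval, keeping the k documents nearest to the query.
    ranked = sorted(collection, key=lambda doc: cosine(query_text, doc[0]), reverse=True)
    neighbours = ranked[:k]
    # Step 2: cross to the image side of those neighbours and aggregate.
    return sum(
        cosine(query_text, text) * cosine(image, candidate_image)
        for text, image in neighbours
    )


# Toy collection: two documents about topic 0, one about topic 1.
collection = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([0.9, 0.1], [0.8, 0.2]),
    ([0.0, 1.0], [0.0, 1.0]),
]
query = [1.0, 0.0]
score_related = transmedia_similarity(query, [1.0, 0.0], collection)
score_unrelated = transmedia_similarity(query, [0.0, 1.0], collection)
print(score_related > score_unrelated)
```

In this toy setting the text query never touches image features directly. It reaches the candidate image only through the images attached to its nearest text neighbours, which is the pseudo-relevance feedback step.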


Text-image information processing; Trans-media similarities; Cross-content information retrieval and browsing; Image auto-annotation; Multimedia document generation



The authors particularly want to thank INA for their contributions to our work and Florent Perronnin for his greatly appreciated help in applying some of the Generic Visual Categorizer (GVC) components. We would also like to acknowledge the following Flickr users whose photographs we reproduced here under Creative Commons licences:

Tatiana Sapateiro

Pedro Paulo Silva de Souza

Leonardo Pallotta

Laszlo Ilyes

Jorge Wagner


Scott Robinson

Gabriel Flores Romero

Jenny Mealing


David Katarina

T. Chu

Bill Wilcox


Fred Hsu

Abel Pardo López



Elena Heredero

Rick McCharles

Marília Almeida

Gustavo Madico

Douglas Fernandes

James Preston

Rodrigo Della Fávera

Dinesh Rao

Marina Campos Vinhal

Jorge Gobbi

Steve Taylor

Finally, we would also like to acknowledge the users who wrote the blog paragraphs that were used and reproduced here. These texts can be found at the following addresses:



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Julien Ah-Pine (1)
  • Marco Bressan (1)
  • Stephane Clinchant (1)
  • Gabriela Csurka (1)
  • Yves Hoppenot (1)
  • Jean-Michel Renders (1)

  1. Xerox Research Centre Europe, Meylan, France
