Crossing textual and visual content in different application scenarios

Multimedia Tools and Applications

Abstract

This paper deals with multimedia information access. We propose two new approaches for hybrid text-image information processing that extend straightforwardly to the more general multimodal scenario. Both approaches fall into the trans-media pseudo-relevance feedback category. Our first method uses a mixture model of the aggregate components, treating them as a single relevance concept. In our second approach, we define trans-media similarities as an aggregation of monomodal similarities between the elements of the aggregate and the new multimodal object. We also introduce the monomodal similarity measures for text and images that serve as basic components for both proposed trans-media similarities. We show how a large variety of problems can be framed so that the proposed techniques address them: image annotation or captioning, text illustration, and multimedia retrieval and clustering. Finally, we present how these methods can be integrated into two applications: a travel blog assistant system and a tool for browsing Wikipedia that takes into account the multimedia nature of its content.
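As a rough illustration of the second approach, the sketch below scores images against a text-only query in the spirit of trans-media pseudo-relevance feedback. It is not the authors' implementation: the abstract does not specify the monomodal measures or the aggregation, so cosine similarity, a top-k aggregate, and an unweighted sum are assumptions made here, and the names `cosine` and `trans_media_scores` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as a stand-in monomodal similarity (assumption)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def trans_media_scores(query_text, docs, k=2):
    """Score the image of every document against a text-only query.

    docs is a list of (text_vector, image_vector) pairs.
    """
    # Step 1: rank documents by monomodal text similarity to the query.
    ranked = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_text, docs[i][0]),
        reverse=True,
    )
    # Step 2: the top-k documents form the aggregate (pseudo-relevant set).
    aggregate = ranked[:k]
    # Step 3: aggregate the monomodal image similarities between each
    # element of the aggregate and each candidate image (assumed: plain sum).
    return [
        sum(cosine(docs[j][1], docs[i][1]) for j in aggregate)
        for i in range(len(docs))
    ]


if __name__ == "__main__":
    # Toy data: document 3 shares no text with the query but looks like
    # the images of the documents that do match the query text.
    docs = [
        ([1.0, 0.0, 0.0], [1.0, 0.0]),
        ([0.9, 0.1, 0.0], [0.9, 0.1]),
        ([0.0, 0.0, 1.0], [0.0, 1.0]),
        ([0.0, 1.0, 0.0], [1.0, 0.1]),
    ]
    for index, score in enumerate(trans_media_scores([1.0, 0.0, 0.0], docs)):
        print(index, round(score, 3))
```

In the toy data, the last document receives a high score even though its text does not match the query, because its image resembles the images of the textually relevant documents. This is the crossing of modalities that the abstract describes; a real system would replace the toy vectors and the plain sum with the paper's own similarity measures and weighting.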

Notes

  1. such as a graph embedded into a 2D representation, for example

  2. http://download.wikimedia.org/

  3. http://stats.wikimedia.org/EN/CategoryOverviewIndex.htm

Acknowledgements

The authors particularly want to thank INA for their contributions to our work and Florent Perronnin for his greatly appreciated help in applying some of the Generic Visual Categorizer (GVC) components. We would also like to acknowledge the following Flickr users whose photographs we reproduced here under Creative Commons licences:

Tatiana Sapateiro   http://www.flickr.com/photos/tatianasapateiro

Pedro Paulo Silva de Souza   http://www.flickr.com/photos/pedrop

Leonardo Pallotta   http://www.flickr.com/photos/groundzero

Laszlo Ilyes   http://www.flickr.com/photos/laszlo-photo

Jorge Wagner   http://www.flickr.com/photos/jorgewagner

UminDaGuma   http://www.flickr.com/photos/umindaguma

Scott Robinson   http://www.flickr.com/photos/clearlyambiguous

Gabriel Flores Romero   http://www.flickr.com/photos/gabofr

Jenny Mealing   http://www.flickr.com/photos/jennifrog

Roney   http://www.flickr.com/photos/roney

David Katarina   http://www.flickr.com/photos/davidkatarina

T. Chu   http://www.flickr.com/photos/spyderball

Bill Wilcox   http://www.flickr.com/photos/billwilcox

S2RD2   http://www.flickr.com/photos/stuardo

Fred Hsu   http://www.flickr.com/photos/fhsu

Abel Pardo López   http://www.flickr.com/photos/sancho_panza

Cat   http://www.flickr.com/photos/clspeace

Thowra_uk   http://www.flickr.com/photos/thowra

Elena Heredero   http://www.flickr.com/photos/elenaheredero

Rick McCharles   http://www.flickr.com/photos/rickmccharles

Marília Almeida   http://www.flickr.com/photos/68306118@N00

Gustavo Madico   http://www.flickr.com/photos/desdegus

Douglas Fernandes   http://www.flickr.com/photos/thejourney1972

James Preston   http://www.flickr.com/photos/jamespreston

Rodrigo Della Fávera   http://www.flickr.com/photos/rodrigofavera

Dinesh Rao   http://www.flickr.com/photos/dinrao

Marina Campos Vinhal   http://www.flickr.com/photos/marinacvinhal

Jorge Gobbi   http://www.flickr.com/photos/morrissey

Steve Taylor   http://www.flickr.com/photos/theboywiththethorninhisside/

Finally, we would also like to acknowledge the users who wrote the blog paragraphs that were used and reproduced here. These texts can be found at the following addresses:

http://realtravel.com/cuzco-journals-j1879736.html

http://realtravel.com/machu_picchu-journals-j5181463.html

http://realtravel.com/rio-journals-j4669810.html

http://www.travelpod.com/travel-blog-entries/sarah_s_america/south_america/1140114720/tpod.html

http://www.travelpod.com/travel-blog-entries/rachel_john/roundtheworld/1146006300/tpod.html

http://www.travelpod.com/travel-blog-entries/eatdessertfirst/world_tour_05/1160411340/tpod.html

http://www.travelpod.com/travel-blog-entries/idarich/rtw_2005/1140476400/tpod.html

http://www.travelpod.com/travel-blog-entries/twittg/rtw/1132765860/tpod.html

http://www.travelpod.com/travel-blog-entries/emanddave/worldtrip2006/1155492420/tpod.html

Author information

Corresponding author

Correspondence to Gabriela Csurka.

About this article

Cite this article

Ah-Pine, J., Bressan, M., Clinchant, S. et al. Crossing textual and visual content in different application scenarios. Multimed Tools Appl 42, 31–56 (2009). https://doi.org/10.1007/s11042-008-0246-8

  • DOI: https://doi.org/10.1007/s11042-008-0246-8
