Crossing textual and visual content in different application scenarios

Ah-Pine, Julien; Bressan, Marco; Clinchant, Stephane; Csurka, Gabriela; Hoppenot, Yves; Renders, Jean-Michel

doi:10.1007/s11042-008-0246-8


Published: 13 November 2008

Multimedia Tools and Applications, Volume 42, pages 31–56 (2009)

Abstract

This paper deals with multimedia information access. We propose two new approaches for hybrid text-image information processing that generalize straightforwardly to the broader multimodal scenario. Both approaches fall into the trans-media pseudo-relevance feedback category. Our first method uses a mixture model of the aggregate components, treating them as a single relevance concept. In our second approach, we define trans-media similarities as an aggregation of monomodal similarities between the elements of the aggregate and the new multimodal object. We also introduce the monomodal similarity measures for text and images that serve as the basic components of both proposed trans-media similarities. We show how a wide variety of problems can be framed so that the proposed techniques can address them: image annotation or captioning, text illustration, and multimedia retrieval and clustering. Finally, we show how these methods can be integrated into two applications: a travel blog assistant system and a tool for browsing Wikipedia that takes into account the multimedia nature of its content.
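The idea behind the second approach (a trans-media similarity built by aggregating monomodal similarities over a pseudo-relevant aggregate) can be illustrated with a minimal sketch. It assumes that every document already has a visual feature vector and a text feature vector, uses plain cosine similarity for both modalities, takes the top-k visual neighbours of an image-only query as the aggregate, and sums the neighbours' text similarities to each candidate weighted by their visual similarity to the query. The function names, the use of cosine, and this particular weighting are illustrative assumptions, not the monomodal measures or the exact aggregation defined in the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity (assumed monomodal measure); 0.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def transmedia_scores(
    query_visual: Vector,
    collection_visual: List[Vector],
    collection_text: List[Vector],
    k: int = 2,
) -> List[float]:
    """Hypothetical image-to-text trans-media scoring via pseudo-relevance feedback.

    1. Score every document visually against the image query.
    2. Keep the k most visually similar documents as the aggregate.
    3. Score each document d by summing, over aggregate members d',
       s_V(query, d') * s_T(d', d), i.e. visual evidence is "translated"
       into the text modality through the neighbours' text.
    """
    visual = [cosine(query_visual, v) for v in collection_visual]
    aggregate = sorted(range(len(visual)), key=lambda i: visual[i], reverse=True)[:k]
    return [
        sum(visual[j] * cosine(collection_text[j], text) for j in aggregate)
        for text in collection_text
    ]


if __name__ == "__main__":
    # Toy data: doc1 is visually orthogonal to the query but shares its
    # text with doc0, the best visual match, so it is still scored highly.
    scores = transmedia_scores(
        query_visual=[1.0, 0.0],
        collection_visual=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        collection_text=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
        k=2,
    )
    print([round(s, 3) for s in scores])  # [1.0, 1.0, 0.707]
```

The toy run shows the point of crossing modalities: a document with no visual resemblance to the query is retrieved because its text matches that of the query's visual neighbours.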

Notes

For example, a graph embedded into a 2D representation.
http://download.wikimedia.org/
http://stats.wikimedia.org/EN/CategoryOverviewIndex.htm

Acknowledgements

The authors particularly thank INA for their contributions to our work and Florent Perronnin for his greatly appreciated help in applying some of the Generic Visual Categorizer (GVC) components. We would also like to acknowledge the following Flickr users whose photographs we reproduced here under Creative Commons licences:

Tatiana Sapateiro http://www.flickr.com/photos/tatianasapateiro

Pedro Paulo Silva de Souza http://www.flickr.com/photos/pedrop

Leonardo Pallotta http://www.flickr.com/photos/groundzero

Laszlo Ilyes http://www.flickr.com/photos/laszlo-photo

Jorge Wagner http://www.flickr.com/photos/jorgewagner

UminDaGuma http://www.flickr.com/photos/umindaguma

Scott Robinson http://www.flickr.com/photos/clearlyambiguous

Gabriel Flores Romero http://www.flickr.com/photos/gabofr

Jenny Mealing http://www.flickr.com/photos/jennifrog

Roney http://www.flickr.com/photos/roney

David Katarina http://www.flickr.com/photos/davidkatarina

T. Chu http://www.flickr.com/photos/spyderball

Bill Wilcox http://www.flickr.com/photos/billwilcox

S2RD2 http://www.flickr.com/photos/stuardo

Fred Hsu http://www.flickr.com/photos/fhsu

Abel Pardo López http://www.flickr.com/photos/sancho_panza

Cat http://www.flickr.com/photos/clspeace

Thowra_uk http://www.flickr.com/photos/thowra

Elena Heredero http://www.flickr.com/photos/elenaheredero

Rick McCharles http://www.flickr.com/photos/rickmccharles

Marília Almeida http://www.flickr.com/photos/68306118@N00

Gustavo Madico http://www.flickr.com/photos/desdegus

Douglas Fernandes http://www.flickr.com/photos/thejourney1972

James Preston http://www.flickr.com/photos/jamespreston

Rodrigo Della Fávera http://www.flickr.com/photos/rodrigofavera

Dinesh Rao http://www.flickr.com/photos/dinrao

Marina Campos Vinhal http://www.flickr.com/photos/marinacvinhal

Jorge Gobbi http://www.flickr.com/photos/morrissey

Steve Taylor http://www.flickr.com/photos/theboywiththethorninhisside/

Finally, we would also like to acknowledge the users who wrote the blog paragraphs used and reproduced here. These texts can be found at the following addresses:

http://realtravel.com/cuzco-journals-j1879736.html

http://realtravel.com/machu_picchu-journals-j5181463.html

http://realtravel.com/rio-journals-j4669810.html

http://www.travelpod.com/travel-blog-entries/sarah_s_america/south_america/1140114720/tpod.html

http://www.travelpod.com/travel-blog-entries/rachel_john/roundtheworld/1146006300/tpod.html

http://www.travelpod.com/travel-blog-entries/eatdessertfirst/world_tour_05/1160411340/tpod.html

http://www.travelpod.com/travel-blog-entries/idarich/rtw_2005/1140476400/tpod.html

http://www.travelpod.com/travel-blog-entries/twittg/rtw/1132765860/tpod.html

http://www.travelpod.com/travel-blog-entries/emanddave/worldtrip2006/1155492420/tpod.html

Author information

Authors and Affiliations

Xerox Research Centre Europe, 6, chemin de Maupertuis, 38240, Meylan, France
Julien Ah-Pine, Marco Bressan, Stephane Clinchant, Gabriela Csurka, Yves Hoppenot & Jean-Michel Renders

Corresponding author

Correspondence to Gabriela Csurka.

About this article

Cite this article

Ah-Pine, J., Bressan, M., Clinchant, S. et al. Crossing textual and visual content in different application scenarios. Multimed Tools Appl 42, 31–56 (2009). https://doi.org/10.1007/s11042-008-0246-8

Issue Date: March 2009