Leveraging Image, Text and Cross–media Similarities for Diversity–focused Multimedia Retrieval

Part of the book series: The Information Retrieval Series (INRE, volume 32)

Abstract

This chapter summarizes the cross–modal information retrieval techniques that Xerox Research Centre implemented during three years of participation in the ImageCLEF Photo tasks. The main challenge remained constant: how to optimally couple visual and textual similarities when they capture content at different semantic levels and when one of the media (the textual one) gives, most of the time, much better retrieval performance. Some core components proved very effective throughout those years: the visual similarity metrics based on the Fisher Vector representation of images, and the cross–media similarity principle based on relevance models. Other components were introduced to solve additional issues. We tried different query– and document–enrichment methods that exploit auxiliary resources such as Flickr or open–source thesauri, or that apply statistical ‘semantic smoothing’. We also implemented clustering mechanisms to promote diversity in the top results and to provide faster access to relevant information. This chapter describes, analyses and assesses each of these components, namely the monomodal similarity measures, the different cross–media similarities, the query and document enrichment, and the mechanisms that ensure diversity in the results proposed to the user. To conclude, we discuss the lessons learnt over the years of working on this very challenging task.
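
To make the idea of coupling visual and textual similarities concrete, the following is a minimal sketch of one common pseudo-relevance-feedback style of cross–media scoring. It is an illustration under stated assumptions, not the chapter's actual formulation: the function names, the weighting of neighbours by their visual similarity, the max-normalisation, the mixing weight `alpha` and the toy similarity values are all hypothetical.

```python
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores, best first (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def cross_media_scores(
    visual_to_query: Sequence[float],
    text_sim: Sequence[Sequence[float]],
    k: int,
) -> List[float]:
    """Score each document by its textual similarity to the k documents
    that are visually closest to the query image.

    Assumption: each neighbour's contribution is weighted by its visual
    similarity to the query; other aggregation schemes are possible.
    """
    neighbours = top_k_indices(visual_to_query, k)
    return [
        sum(visual_to_query[j] * text_sim[j][d] for j in neighbours)
        for d in range(len(text_sim))
    ]


def late_fusion(
    text_scores: Sequence[float],
    cross_scores: Sequence[float],
    alpha: float,
) -> List[float]:
    """Linear combination of max-normalised score lists (alpha weights the text side)."""

    def normalise(scores: Sequence[float]) -> List[float]:
        top = max(scores)
        return [s / top if top > 0 else 0.0 for s in scores]

    t, c = normalise(text_scores), normalise(cross_scores)
    return [alpha * ti + (1 - alpha) * ci for ti, ci in zip(t, c)]


# Toy collection of four documents (all values invented for illustration).
visual_to_query = [0.9, 0.8, 0.1, 0.2]  # visual similarity of each document image to the query image
text_sim = [                            # pairwise textual similarity between documents
    [1.0, 0.2, 0.7, 0.0],
    [0.2, 1.0, 0.6, 0.1],
    [0.7, 0.6, 1.0, 0.0],
    [0.0, 0.1, 0.0, 1.0],
]
text_to_query = [0.1, 0.5, 0.4, 0.3]    # textual similarity of each document to the query text

cross = cross_media_scores(visual_to_query, text_sim, k=2)
fused = late_fusion(text_to_query, cross, alpha=0.5)
print("cross-media ranking:", top_k_indices(cross, len(cross)))
print("fused ranking:", top_k_indices(fused, len(fused)))
```

In this toy data, document 2 looks nothing like the query image, yet it obtains the highest cross–media score because its text is close to that of both visual nearest neighbours. This is the kind of bridging between media that a purely visual or purely textual ranking would miss.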

Author information

Corresponding author

Correspondence to Julien Ah-Pine.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ah-Pine, J., Clinchant, S., Csurka, G., Perronnin, F., Renders, JM. (2010). Leveraging Image, Text and Cross–media Similarities for Diversity–focused Multimedia Retrieval. In: Müller, H., Clough, P., Deselaers, T., Caputo, B. (eds) ImageCLEF. The Information Retrieval Series, vol 32. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15181-1_17

  • DOI: https://doi.org/10.1007/978-3-642-15181-1_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15180-4

  • Online ISBN: 978-3-642-15181-1

  • eBook Packages: Computer Science, Computer Science (R0)