Multimedia Tools and Applications, Volume 69, Issue 3, pp 991–1019

Multimodal retrieval with relevance feedback based on genetic programming

  • Rodrigo Tripodi Calumby
  • Ricardo da Silva Torres
  • Marcos André Gonçalves


This paper presents a framework for multimodal retrieval with relevance feedback based on genetic programming. In this supervised learning-to-rank framework, genetic programming is used to discover effective functions for combining (multimodal) similarity measures, using the information gathered over the user's relevance feedback iterations. These functions combine several similarity measures, including those extracted from different modalities (e.g., text and content), into a single measure that encodes the user's preferences. The framework was instantiated for multimodal image retrieval using visual and textual features and was validated on two image collections, one from the Washington University and another from the ImageCLEF Photographic Retrieval Task. For this image retrieval instance, several multimodal relevance feedback techniques were implemented and evaluated. The proposed approach produced statistically significantly better results for multimodal retrieval than single-modality approaches, and superior effectiveness compared to the best submissions of the ImageCLEF Photographic Retrieval Task 2008.
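The idea summarized above can be illustrated with a small sketch. It is not the authors' implementation: the operator set, the average-precision fitness, the mutation-only evolutionary loop (a full genetic programming system would also use crossover), and all function names and data values below are assumptions chosen for illustration. Each labelled object is represented by a vector of similarity scores to the query (for example, one textual and one visual score), and the search looks for an expression tree over those scores that ranks the objects marked relevant ahead of the others.

```python
import operator
import random

# Binary operators a combination function may use (an assumed, minimal set).
OPS = {"add": operator.add, "mul": operator.mul, "max": max}

def random_tree(n_measures, depth, rng):
    """Build a random expression tree whose leaves index similarity measures."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_measures)  # leaf: index of one similarity measure
    op = rng.choice(sorted(OPS))
    return (op, random_tree(n_measures, depth - 1, rng),
            random_tree(n_measures, depth - 1, rng))

def evaluate(tree, sims):
    """Apply the combination function to one object's similarity vector."""
    if isinstance(tree, int):
        return sims[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, sims), evaluate(right, sims))

def average_precision(ranked_labels):
    """Average precision of a ranked list of 0/1 relevance labels."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def fitness(tree, feedback):
    """Rank the user-labelled objects with `tree` and score that ranking."""
    ranked = sorted(feedback, key=lambda item: evaluate(tree, item[0]), reverse=True)
    return average_precision([label for _, label in ranked])

def mutate(tree, n_measures, rng):
    """Replace a randomly chosen subtree with a freshly generated one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_measures, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_measures, rng), right)
    return (op, left, mutate(right, n_measures, rng))

def evolve(feedback, n_measures, generations=20, pop_size=30, seed=0):
    """Keep the fitter half each generation and refill it with mutants."""
    rng = random.Random(seed)
    population = [random_tree(n_measures, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, feedback), reverse=True)
        survivors = population[: pop_size // 2]
        mutants = [mutate(rng.choice(survivors), n_measures, rng)
                   for _ in range(pop_size - len(survivors))]
        population = survivors + mutants
    return max(population, key=lambda t: fitness(t, feedback))

# Hypothetical feedback: ((text_similarity, visual_similarity), relevant?)
feedback = [
    ((0.9, 0.2), 1),
    ((0.8, 0.7), 1),
    ((0.6, 0.8), 1),
    ((0.3, 0.9), 0),
    ((0.2, 0.1), 0),
]
best = evolve(feedback, n_measures=2)
print(best, fitness(best, feedback))
```

In an interactive setting, a loop of this kind would be rerun after each feedback iteration with the enlarged set of labelled objects, and the best tree found would then be used to re-rank the whole collection.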


Keywords: Multimodal retrieval, Learn to rank, Image retrieval, Relevance feedback, Genetic programming



We would like to thank all partners from LIS (Laboratory of Information Systems - IC/UNICAMP), RECOD (Reasoning for Complex Data - IC/UNICAMP), and LDB (Databases Lab - DCC/UFMG). This work was supported by the National Council for Scientific and Technological Development (CNPq), the Coordination for the Improvement of Higher Level Personnel (CAPES), the São Paulo Research Foundation (FAPESP), and the Minas Gerais Agency for Research and Development (FAPEMIG).



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Rodrigo Tripodi Calumby (1, 2)
  • Ricardo da Silva Torres (2)
  • Marcos André Gonçalves (3)

  1. Department of Exact Sciences, University of Feira de Santana, Feira de Santana, Brazil
  2. RECOD Lab, Institute of Computing, University of Campinas, Campinas, Brazil
  3. Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil
