Information Retrieval Journal, Volume 21, Issue 1, pp 81–106

A study of untrained models for multimodal information retrieval

  • Melanie Imhof
  • Martin Braschler


Operational multimodal information retrieval systems have to deal with increasingly complex document collections and queries that are composed of a large set of textual and non-textual modalities such as ratings, prices, timestamps, and geographical coordinates. The resulting combinatorial explosion of modality combinations makes it intractable to treat each modality individually and to obtain suitable training data. As a consequence, instead of finding and training new models for each individual modality or combination of modalities, it is crucial to establish unified models and to fuse their outputs in a robust way. Since the most popular weighting schemes for textual retrieval have in the past generalized well to many retrieval tasks, we demonstrate how they can be adapted for use with non-textual modalities, which is a first step towards finding such a unified model. We show that the popular weighting scheme BM25 is suitable for multimodal IR systems and analyze the underlying assumptions of the BM25 formula with respect to merging modalities under the so-called raw-score merging hypothesis, which requires no training. We establish a multimodal baseline for two multimodal test collections, show how modalities differ in their contribution to relevance, and discuss the difficulty of treating modalities with overlapping information. Our experiments demonstrate that our multimodal baseline with no training achieves a significantly higher retrieval effectiveness than using just the textual modality for the Social Book Search 2016 collection, and lies in the range of a trained multimodal approach using the optimal linear combination of the modality scores.
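The two core ingredients of the abstract, BM25 term weighting and raw-score merging, can be illustrated with a minimal sketch. The function and data-structure names below (`bm25_score`, `raw_score_merge`, a dict mapping modality names to per-document scores) are hypothetical choices for illustration, not the authors' implementation; the BM25 formula follows the standard Robertson/Zaragoza formulation, and the fusion step simply sums per-modality raw scores without any trained weights, as the raw-score merging hypothesis prescribes.

```python
import math

def bm25_score(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """Classic BM25 weight for one term in one document.

    tf: term frequency in the document; df: document frequency of the term;
    n_docs: collection size. The +1 inside the log keeps the IDF non-negative.
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm

def raw_score_merge(modality_scores):
    """Fuse per-modality scores by summing raw scores (no training).

    modality_scores: {modality_name: {doc_id: score}} -- a hypothetical
    structure chosen for this sketch. Returns docs ranked by fused score.
    """
    fused = {}
    for scores in modality_scores.values():
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: textual BM25 scores fused with scores derived from ratings.
textual = {"d1": 2.4, "d2": 1.1}
ratings = {"d1": 0.3, "d2": 0.9}
ranking = raw_score_merge({"text": textual, "rating": ratings})
```

Under the raw-score merging hypothesis the modality scores must already be on comparable scales for the plain sum to be meaningful; a trained alternative would instead learn a weight per modality and compute a linear combination.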


Keywords: Multimodal information retrieval · BM25 · Raw-score merging hypothesis



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Université de Neuchâtel, Neuchâtel, Switzerland
  2. Zurich University of Applied Sciences, Winterthur, Switzerland
