Artificial Intelligence Review, Volume 25, Issue 1–2, pp 179–191

Probability-based fusion of information retrieval result sets

  • D. Lillis
  • F. Toolan
  • A. Mur
  • L. Peng
  • R. Collier
  • J. Dunnion
Article

Abstract

Information Retrieval (IR) forms the basis of many information management tasks. Information management itself has become an extremely important area as the amount of electronically available information increases dramatically. There are numerous ways of performing the IR task, both by using different techniques and by using different representations of the available information. It has been shown that some algorithms outperform others on certain tasks, and combining the results produced by different algorithms has yielded superior retrieval performance, making result fusion an important research area. This paper introduces probFuse, a probability-based fusion technique that shows initial promise in this area. It also compares probFuse with CombMNZ, a common data fusion technique.
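The abstract uses CombMNZ as its baseline. A minimal sketch of CombMNZ as it is commonly formulated follows: each document's fused score is the sum of its normalised scores across the input systems (CombSUM), multiplied by the number of systems that retrieved it. The min-max normalisation step, the dictionary input format, and the function names `min_max_normalise` and `comb_mnz` are assumptions made for illustration. This is not the authors' implementation, and probFuse itself is not shown.

```python
from collections import defaultdict


def min_max_normalise(run):
    """Rescale one system's scores to [0, 1].

    Min-max normalisation is an assumption here; other schemes exist.
    run: dict mapping document id -> raw retrieval score.
    """
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores are equal: treat every retrieved document as top-scored.
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_mnz(runs):
    """Fuse several result sets with CombMNZ.

    runs: list of dicts, one per input system, mapping doc id -> raw score.
    Returns (doc_id, fused_score) pairs sorted by descending fused score,
    with ties broken by doc id.
    """
    score_sum = defaultdict(float)  # CombSUM: sum of normalised scores
    hits = defaultdict(int)         # number of systems that returned the doc
    for run in runs:
        for doc, s in min_max_normalise(run).items():
            score_sum[doc] += s
            hits[doc] += 1
    # CombMNZ multiplies CombSUM by the number of systems retrieving the doc.
    fused = {doc: score_sum[doc] * hits[doc] for doc in score_sum}
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Two hypothetical systems returning overlapping result sets.
    runs = [
        {"d1": 10.0, "d2": 6.0, "d3": 2.0},
        {"d2": 9.0, "d4": 5.0, "d1": 1.0},
    ]
    print(comb_mnz(runs))
    # [('d2', 3.0), ('d1', 2.0), ('d4', 0.5), ('d3', 0.0)]
```

In this toy run, `d1` finishes second even though the second system ranks it last: the multiplier rewards documents that several systems agree on, which is the behaviour that distinguishes CombMNZ from plain CombSUM.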

Keywords

Data fusion, Information retrieval, ProbFuse

Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  • D. Lillis, F. Toolan, A. Mur, L. Peng, R. Collier, J. Dunnion
    School of Computer Science and Informatics, University College Dublin, Dublin, Ireland