
Extending Probabilistic Data Fusion Using Sliding Windows

  • David Lillis
  • Fergus Toolan
  • Rem Collier
  • John Dunnion
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)

Abstract

Recent developments in the field of data fusion have focused on techniques that use training queries to estimate the probability that various documents are relevant to a given query, and then use those estimates to assign the scores by which the documents are ranked. This paper introduces SlideFuse, which builds on these techniques by introducing a sliding window to compensate for situations where little relevance information is available to aid in estimating these probabilities.
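The abstract does not spell out SlideFuse's formulas, so the following is only an illustrative sketch under stated assumptions, not the authors' definitive method. It assumes that (a) for each input system, training queries give a per-position probability of relevance (the fraction of training queries whose document at that position was judged relevant), (b) the sliding window replaces each estimate with the average of the estimates up to `w` positions either side, truncated at the ends of the list, and (c) a document's fused score is the sum of those smoothed probabilities over the systems that returned it. The function names and the half-width parameter `w` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def position_probabilities(training_runs: Sequence[Sequence[str]],
                           qrels: Sequence[Set[str]],
                           depth: int) -> List[float]:
    """Assumed estimator: for one input system, the probability that the
    document at each (0-based) position is relevant is the fraction of
    training queries whose document at that position is in the judgments."""
    n_queries = len(training_runs)
    probs = []
    for p in range(depth):
        hits = sum(1 for run, rel in zip(training_runs, qrels)
                   if p < len(run) and run[p] in rel)
        probs.append(hits / n_queries if n_queries else 0.0)
    return probs


def windowed(probs: List[float], w: int) -> List[float]:
    """Assumed sliding window: average each estimate with up to w neighbours
    on either side; the window is truncated at the ends of the list."""
    n = len(probs)
    smoothed = []
    for p in range(n):
        lo, hi = max(0, p - w), min(n - 1, p + w)
        window = probs[lo:hi + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def slide_fuse(results: Dict[str, List[str]],
               probs_by_system: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Assumed combination: sum, over the systems that returned a document,
    the smoothed probability for the position at which it was returned."""
    scores: Dict[str, float] = {}
    for system, ranking in results.items():
        probs = probs_by_system[system]
        for p, doc in enumerate(ranking):
            if p < len(probs):  # positions beyond the training depth are ignored
                scores[doc] = scores.get(doc, 0.0) + probs[p]
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, with `w = 1` the per-position estimates `[1.0, 0.0, 0.0, 1.0]` become `[0.5, 1/3, 1/3, 0.5]`: positions that saw no relevant documents during training still receive a non-zero estimate borrowed from their neighbours, which is the kind of sparse-data situation the abstract says the window is meant to address.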

SlideFuse is shown to perform favourably in comparison with CombMNZ, ProbFuse and SegFuse. CombMNZ is the standard baseline technique against which data fusion algorithms are compared, whereas ProbFuse and SegFuse represent the state of the art in probabilistic data fusion.
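For readers unfamiliar with the baseline, Fox and Shaw's CombMNZ sums the normalised scores that the input systems assign to a document and multiplies that sum by the number of systems that returned it. The sketch below assumes min-max normalisation of each system's scores and counts every returned document as a hit (so a document normalised to 0 still counts); both choices, and the function names, are illustrative assumptions rather than details taken from this paper.

```python
from typing import Dict, List, Tuple


def min_max_normalise(run: Dict[str, float]) -> Dict[str, float]:
    """Map one system's scores onto [0, 1] (assumed normalisation scheme)."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:  # all scores equal: avoid dividing by zero
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_mnz(runs: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """CombMNZ: sum of a document's normalised scores, multiplied by the
    number of input systems that returned it."""
    total: Dict[str, float] = {}
    hits: Dict[str, int] = {}
    for run in runs:
        if not run:  # skip systems that returned nothing
            continue
        for doc, score in min_max_normalise(run).items():
            total[doc] = total.get(doc, 0.0) + score
            hits[doc] = hits.get(doc, 0) + 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Highest score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

With runs `{"a": 10, "b": 5, "c": 0}` and `{"b": 3, "d": 1}`, document `b` has normalised scores 0.5 and 1.0; their sum of 1.5 is doubled to 3.0 because two systems returned it, placing `b` above `a` (1.0). Unlike SlideFuse and ProbFuse, CombMNZ needs no training queries.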



References

  1. Bartell, B.T., Cottrell, G.W., Belew, R.K.: Automatic combination of multiple ranked retrieval systems. In: SIGIR 1994: Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 173–181. Springer, New York (1994)
  2. Beitzel, S.M., Jensen, E.C., Chowdhury, A., Grossman, D., Frieder, O., Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system. J. Am. Soc. Inf. Sci. Technol. 55(10), 859–868 (2004)
  3. Vogt, C.C., Cottrell, G.W.: Fusion via a linear combination of scores. Information Retrieval 1(3), 151–173 (1999)
  4. Aslam, J.A., Montague, M.: Bayes optimal metasearch: a probabilistic model for combining the results of multiple retrieval systems. In: SIGIR 2000: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 379–381. ACM Press, New York (2000)
  5. Voorhees, E.M., Gupta, N.K., Johnson-Laird, B.: The collection fusion problem. In: Proceedings of the Third Text REtrieval Conference (TREC-3), pp. 95–104 (1994)
  6. Lillis, D., Toolan, F., Collier, R., Dunnion, J.: ProbFuse: a probabilistic approach to data fusion. In: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 139–146. ACM Press, New York (2006)
  7. Shokouhi, M.: Segmentation of search engine results for effective data-fusion. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECIR 2007. LNCS, vol. 4425. Springer, Heidelberg (2007)
  8. Fox, E.A., Shaw, J.A.: Combination of multiple searches. In: Proceedings of the 2nd Text REtrieval Conference (TREC-2), National Institute of Standards and Technology Special Publication 500-215, pp. 243–252 (1994)
  9. Callan, J.P., Lu, Z., Croft, W.B.: Searching distributed collections with inference networks. In: SIGIR 1995: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 21–28. ACM Press, New York (1995)
  10. Si, L., Callan, J.: Using sampled data and regression to merge search engine results. In: SIGIR 2002: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 19–26. ACM Press, New York (2002)
  11. Montague, M., Aslam, J.A.: Condorcet fusion for improved retrieval. In: CIKM 2002: Proceedings of the eleventh international conference on Information and knowledge management, pp. 538–548. ACM Press, New York (2002)
  12. Lee, J.H.: Analyses of multiple evidence combination. SIGIR Forum 31(SI), 267–276 (1997)
  13. Voorhees, E.M., Gupta, N.K., Johnson-Laird, B.: Learning collection fusion strategies. In: SIGIR 1995: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 172–179. ACM Press, New York (1995)
  14. Aslam, J.A., Montague, M.: Models for metasearch. In: SIGIR 2001: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 276–284. ACM Press, New York (2001)
  15. Craswell, N., Hawking, D., Thistlewaite, P.B.: Merging results from isolated search engines. In: Australasian Database Conference, Auckland, New Zealand, pp. 189–200 (1999)
  16. Lawrence, S., Giles, C.L.: Inquirus, the NECI meta search engine. In: Seventh International World Wide Web Conference, Brisbane, Australia, pp. 95–105. Elsevier, Amsterdam (1998)
  17. Gravano, L., Chang, K., Garcia-Molina, H., Paepcke, A.: STARTS: Stanford protocol proposal for internet retrieval and search. Technical report, Stanford, CA, USA (1997)
  18. Lillis, D., Toolan, F., Collier, R., Dunnion, J.: Probabilistic data fusion on a large document collection. In: Proceedings of the 17th Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2006), Belfast, Northern Ireland, Queen’s University Belfast (2006)
  19. Craswell, N., Hawking, D.: Overview of the TREC-2004 web track. In: Proceedings of the Thirteenth Text REtrieval Conference (TREC-2004) (2004)
  20. Buckley, C., Voorhees, E.M.: Retrieval evaluation with incomplete information. In: SIGIR 2004: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 25–32. ACM Press, New York (2004)
  21. Silverstein, C., Henzinger, M., Marais, H., Moricz, M.: Analysis of a Very Large AltaVista Query Log. Technical Report 1998-014, Digital SRC (1998), http://gatekeeper.dec.com/pub/DEC/SRC/technical-notes/abstracts/src-tn-1998-014.html

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • David Lillis (1)
  • Fergus Toolan (2)
  • Rem Collier (1)
  • John Dunnion (1)
  1. School of Computer Science and Informatics, University College Dublin
  2. Department of Computing Science, Griffith College Dublin
