Information Retrieval, Volume 1, Issue 3, pp 151–173

Fusion Via a Linear Combination of Scores

  • Christopher C. Vogt
  • Garrison W. Cottrell
Article

Abstract

We present a thorough analysis of the capabilities of the linear combination (LC) model for fusion of information retrieval (IR) systems. The LC model combines the result lists of multiple IR systems by scoring each document with a weighted sum of the scores from the component systems. We first present both empirical and analytical justification for the hypothesis that such a model should be used only when the systems involved have high performance, a large overlap of relevant documents, and a small overlap of nonrelevant documents. The empirical approach allows us to predict the performance of a combined system very accurately. We also derive a formula for a theoretically optimal weighting scheme for combining two systems. We introduce d, the difference between the average score on relevant documents and the average score on nonrelevant documents, as a performance measure that not only allows mathematical reasoning about system performance but also allows the selection of weights that generalize well to new documents. We describe a number of experiments involving large numbers of different IR systems that support these findings.
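As an illustration of the two definitions in the abstract, the following is a minimal Python sketch, not the authors' implementation. The function names `lc_fuse` and `d_measure`, the example scores, and the equal weights are all hypothetical. Treating a document that a system did not return as contributing a score of 0 is also an assumption; the abstract does not say how missing scores are handled, nor does it give the optimal weighting formula, so no attempt is made to reproduce it here.

```python
from collections import defaultdict


def lc_fuse(system_scores, weights):
    """Linear combination fusion: score each document by the weighted
    sum of its scores from the component systems.

    system_scores: list of dicts mapping doc id -> score, one per system.
    weights: list of floats, one per system.
    Assumption: a document missing from a system's list contributes 0.
    Returns (doc id, fused score) pairs sorted by descending score.
    """
    if len(system_scores) != len(weights):
        raise ValueError("need exactly one weight per system")
    fused = defaultdict(float)
    for scores, weight in zip(system_scores, weights):
        for doc, score in scores.items():
            fused[doc] += weight * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def d_measure(scores, relevant):
    """d: average score on relevant documents minus average score on
    nonrelevant documents, for one system (or one fused system).

    scores: dict mapping doc id -> score.
    relevant: set of doc ids judged relevant.
    """
    rel = [s for doc, s in scores.items() if doc in relevant]
    non = [s for doc, s in scores.items() if doc not in relevant]
    if not rel or not non:
        raise ValueError("need at least one relevant and one nonrelevant document")
    return sum(rel) / len(rel) - sum(non) / len(non)


# Hypothetical scores from two component systems and hypothetical judgments.
system_a = {"doc1": 0.9, "doc2": 0.4, "doc3": 0.1}
system_b = {"doc2": 0.8, "doc3": 0.5, "doc4": 0.3}
relevant_docs = {"doc1", "doc2"}

ranking = lc_fuse([system_a, system_b], [0.5, 0.5])
fused_d = d_measure(dict(ranking), relevant_docs)
```

Here `ranking` holds the fused list and `fused_d` the value of d for the combined system. Comparing `fused_d` across candidate weight vectors is one simple way to explore how the choice of weights affects the separation between relevant and nonrelevant scores.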

Keywords: linear combination, fusion, neural networks, routing, performance evaluation

Copyright information

© Kluwer Academic Publishers 1999
