Fusion Via a Linear Combination of Scores

Vogt, Christopher C.; Cottrell, Garrison W.

doi:10.1023/A:1009980820262

Published: October 1999

Information Retrieval, Volume 1, pages 151–173 (1999)

Abstract

We present a thorough analysis of the capabilities of the linear combination (LC) model for fusion of information retrieval (IR) systems. The LC model combines the result lists of multiple IR systems by scoring each document with a weighted sum of the scores assigned by the component systems. We first present both empirical and analytical justification for the hypotheses that such a model should only be used when the systems involved have high performance, a large overlap of relevant documents, and a small overlap of nonrelevant documents. The empirical approach allows us to predict the performance of a combined system very accurately. We also derive a formula for a theoretically optimal weighting scheme for combining two systems. We introduce d, the difference between the average score on relevant documents and the average score on nonrelevant documents, as a performance measure that not only allows mathematical reasoning about system performance but also allows the selection of weights that generalize well to new documents. We describe a number of experiments involving large numbers of different IR systems that support these findings.
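
The two quantities defined in the abstract, the weighted-sum fusion score and the measure d, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy scores, the weights, and the relevance judgments are all hypothetical, and the sketch assumes that a document missing from a system's result list contributes a score of 0 from that system, a detail the abstract does not specify.

```python
from collections import defaultdict


def linear_combination(runs, weights):
    """Fuse result lists: each document's fused score is the weighted sum
    of its per-system scores. Assumption: a document absent from a run
    contributes 0 from that system."""
    fused = defaultdict(float)
    for run, weight in zip(runs, weights):
        for doc_id, score in run.items():
            fused[doc_id] += weight * score
    return dict(fused)


def rank(scores):
    """Order documents by descending score; ties are broken by document id."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def d_measure(scores, relevant):
    """d as described in the abstract: mean score on relevant documents
    minus mean score on nonrelevant documents."""
    rel = [s for doc_id, s in scores.items() if doc_id in relevant]
    non = [s for doc_id, s in scores.items() if doc_id not in relevant]
    return sum(rel) / len(rel) - sum(non) / len(non)


# Hypothetical toy data: two systems, four documents, two judged relevant.
run_a = {"d1": 1.0, "d2": 0.5, "d3": 0.25}
run_b = {"d1": 0.5, "d2": 1.0, "d4": 0.75}
relevant = {"d1", "d2"}

fused = linear_combination([run_a, run_b], weights=[0.75, 0.25])
ranking = rank(fused)
d_value = d_measure(fused, relevant)
```

Here the weights are fixed by hand. The weighting formula and the weight selection procedure mentioned in the abstract are given in the full article and are not reproduced in this sketch.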


Cite this article

Vogt, C.C., Cottrell, G.W. Fusion Via a Linear Combination of Scores. Information Retrieval 1, 151–173 (1999). https://doi.org/10.1023/A:1009980820262
