Score Distributions in Information Retrieval

  • Conference paper
Advances in Information Retrieval Theory (ICTIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5766)

Abstract

We review the history of modeling score distributions, focusing on the normal-exponential mixture and investigating both the theoretical and the empirical evidence supporting its use. We discuss previously suggested conditions that valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses concerning the component distributions under limiting conditions of the parameter values. Of all the mixtures suggested in the past, the current theoretical argument points to the two-gamma mixture as the most likely universal model, with the normal-exponential being a usable approximation. Beyond this theoretical contribution, we provide new experimental evidence showing that vector space or geometric models, as well as BM25, are “friendly” to the normal-exponential, and that the non-convexity problem of this mixture is not severe in practice.
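A minimal Python sketch of the normal-exponential mixture discussed above, assuming the common parameterization in which relevant scores follow a normal distribution N(μ, σ²), non-relevant scores follow an exponential with rate λ starting at score 0, and G is the fraction of relevant documents. The function names and the parameter values in the usage lines are hypothetical and purely illustrative. The sketch computes the mixture density, the posterior probability of relevance, recall and fallout at a threshold, and the score μ + λσ² beyond which the posterior starts to decrease. Above that score the likelihood ratio falls, so the recall-fallout curve of this mixture is no longer convex there.

```python
import math


def normal_pdf(s, mu, sigma):
    """Density of N(mu, sigma^2) at score s (relevant component)."""
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(s, lam):
    """Density of an exponential with rate lam, supported on s >= 0 (non-relevant component)."""
    return lam * math.exp(-lam * s) if s >= 0 else 0.0


def mixture_pdf(s, g, mu, sigma, lam):
    """Normal-exponential mixture: weight g on relevant, 1 - g on non-relevant."""
    return g * normal_pdf(s, mu, sigma) + (1.0 - g) * exponential_pdf(s, lam)


def prob_relevant(s, g, mu, sigma, lam):
    """Posterior P(rel | score = s) by Bayes' rule; NaN if both densities underflow."""
    rel = g * normal_pdf(s, mu, sigma)
    total = rel + (1.0 - g) * exponential_pdf(s, lam)
    return rel / total if total > 0 else float("nan")


def recall(t, mu, sigma):
    """Fraction of relevant scores above threshold t: 1 - Phi((t - mu) / sigma)."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))


def fallout(t, lam):
    """Fraction of non-relevant scores above threshold t (exponential tail from 0)."""
    return math.exp(-lam * max(t, 0.0))


def turning_point(mu, sigma, lam):
    """Score where P(rel | s) peaks.

    log(normal / exponential) = -(s - mu)^2 / (2 sigma^2) + lam * s + const,
    a downward parabola in s whose maximum is at s = mu + lam * sigma^2.
    """
    return mu + lam * sigma ** 2


if __name__ == "__main__":
    # Hypothetical parameters: 5% relevant, relevant ~ N(2.0, 0.5^2), lambda = 3.0.
    g, mu, sigma, lam = 0.05, 2.0, 0.5, 3.0
    s_star = turning_point(mu, sigma, lam)
    print("P(rel|s) peaks at s =", s_star)
    for s in (1.0, 2.0, s_star, 3.5, 4.0):
        print(f"s={s:.2f}  P(rel|s)={prob_relevant(s, g, mu, sigma, lam):.3f}")
    for t in (1.0, 2.0, 3.0):
        print(f"t={t:.1f}  recall={recall(t, mu, sigma):.3f}  fallout={fallout(t, lam):.4f}")
```

Since the log density ratio is a downward parabola in s, the turning point exists for every parameter setting; how much it matters depends on how many scores actually fall above it.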

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Arampatzis, A., Robertson, S., Kamps, J. (2009). Score Distributions in Information Retrieval. In: Azzopardi, L., et al. Advances in Information Retrieval Theory. ICTIR 2009. Lecture Notes in Computer Science, vol 5766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04417-5_13

  • DOI: https://doi.org/10.1007/978-3-642-04417-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04416-8

  • Online ISBN: 978-3-642-04417-5

  • eBook Packages: Computer Science (R0)
