Modeling the Score Distributions of Relevant and Non-relevant Documents

  • Evangelos Kanoulas
  • Virgil Pavlu
  • Keshi Dai
  • Javed A. Aslam
Conference paper

DOI: 10.1007/978-3-642-04417-5_14

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5766)
Cite this paper as:
Kanoulas E., Pavlu V., Dai K., Aslam J.A. (2009) Modeling the Score Distributions of Relevant and Non-relevant Documents. In: Azzopardi L. et al. (eds) Advances in Information Retrieval Theory. ICTIR 2009. Lecture Notes in Computer Science, vol 5766. Springer, Berlin, Heidelberg

Abstract

Empirical modeling of the score distributions associated with retrieved documents is an essential task for many retrieval applications. In this work, we propose modeling the scores of relevant documents by a mixture of Gaussians and the scores of non-relevant documents by a Gamma distribution. By applying variational inference, we automatically trade off goodness of fit against model complexity. We test our model on traditional retrieval functions and on actual search engines submitted to TREC. We demonstrate the utility of our model in inferring precision-recall curves. In all experiments, our model outperforms the dominant exponential-Gaussian model, which pairs an exponential distribution for non-relevant scores with a single Gaussian for relevant scores.
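To illustrate how a fitted score-distribution model can yield a precision-recall curve, here is a minimal Python sketch. It is not the authors' code and does not perform the variational fitting step. The parameter values (a two-component Gaussian mixture for relevant scores, a Gamma(shape, scale) for non-relevant scores, and 10 relevant versus 990 non-relevant documents) are hypothetical. The sketch also assumes nonnegative scores for the Gamma component. If every document scoring above a threshold t is retrieved, expected recall is the relevant survival function at t. Expected precision is the relevant share of the expected retrieved mass. The Gamma survival function uses the power series for the regularized lower incomplete gamma function, so only the standard library is needed.

```python
import math

# Hypothetical parameters for illustration only; in practice they would be
# fitted to the scores of a single query's ranked list.
REL_MIXTURE = [(0.6, 6.0, 1.0), (0.4, 9.0, 1.5)]  # (weight, mean, std) per Gaussian
NONREL_SHAPE, NONREL_SCALE = 2.0, 1.5             # Gamma(shape, scale), scores >= 0


def gaussian_mixture_sf(t, mixture):
    """P(S > t) for a mixture of Gaussians given as (weight, mean, std) triples."""
    return sum(w * 0.5 * math.erfc((t - mu) / (sd * math.sqrt(2.0)))
               for w, mu, sd in mixture)


def gamma_sf(t, shape, scale):
    """P(S > t) for S ~ Gamma(shape, scale), via the series
    P(a, x) = x^a e^-x / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))."""
    if t <= 0:
        return 1.0  # all Gamma mass lies at or above zero
    x = t / scale
    term = 1.0 / shape
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 10000:
        n += 1
        term *= x / (shape + n)
        total += term
    lower = total * math.exp(shape * math.log(x) - x - math.lgamma(shape))
    return max(0.0, 1.0 - lower)


def precision_recall(thresholds, mixture, shape, scale, n_rel, n_nonrel):
    """Expected (recall, precision) when every document scoring above t is retrieved."""
    curve = []
    for t in thresholds:
        rel_above = n_rel * gaussian_mixture_sf(t, mixture)
        nonrel_above = n_nonrel * gamma_sf(t, shape, scale)
        recall = rel_above / n_rel
        retrieved = rel_above + nonrel_above
        # Convention: precision is 1.0 when (numerically) nothing is retrieved.
        precision = rel_above / retrieved if retrieved > 0 else 1.0
        curve.append((recall, precision))
    return curve


if __name__ == "__main__":
    ts = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
    curve = precision_recall(ts, REL_MIXTURE, NONREL_SHAPE, NONREL_SCALE, 10, 990)
    for t, (r, p) in zip(ts, curve):
        print(f"t={t:5.1f}  recall={r:.3f}  precision={p:.3f}")
```

Sweeping t from high to low traces the curve from the high-precision, low-recall end toward recall 1. At that end, precision falls to the prior fraction of relevant documents (here 10/1000).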

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Evangelos Kanoulas (1)
  • Virgil Pavlu (1)
  • Keshi Dai (1)
  • Javed A. Aslam (1)
  1. College of Computer and Information Science, Northeastern University, Boston, USA