Towards a Better Understanding of the Relationship between Probabilistic Models in IR

  • Robin Aly
  • Thomas Demeester
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6931)

Abstract

Probability of relevance (PR) models are generally assumed to implement the Probability Ranking Principle (PRP) of IR, and recent publications claim that PR models and language models are similar. However, a careful analysis reveals two gaps in the chain of reasoning behind this claim. First, the PRP considers the relevance of particular documents, whereas PR models consider the relevance of any query-document pair. Second, unlike PR models, language models consider draws of terms and documents. We bridge the first gap by showing how the probability measure of PR models can be used to define the probabilistic model of the PRP. Furthermore, we argue that, given the differences between PR models and language models, the second gap cannot be bridged at the level of the probabilistic model. Instead, we define a new PR model based on logistic regression, whose score function is similar to that of the query likelihood model. The performance of the two models is strongly correlated, which provides a bridge for the second gap at the functional and ranking level. Understanding language models in relation to logistic regression models opens up new research directions, which we propose as future work.
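
To make the functional-level comparison in the abstract concrete, the following minimal Python sketch puts a query likelihood score next to a logistic regression score. It is not the authors' model: the Jelinek-Mercer smoothing, the parameter lam, the feature vector, the weights, and all function names are assumptions chosen purely for illustration.

```python
import math
from collections import Counter


def query_likelihood_score(query, doc, collection, lam=0.5):
    """Log query likelihood of `query` under a smoothed document language model.

    Assumption: Jelinek-Mercer smoothing with mixing weight `lam`, and every
    query term occurs at least once in `collection` (otherwise log(0) fails).
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc)            # term probability in the document
        p_coll = coll_tf[term] / len(collection)   # term probability in the collection
        score += math.log(lam * p_doc + (1.0 - lam) * p_coll)
    return score


def logistic_log_odds(features, weights, bias=0.0):
    """Log-odds of relevance under logistic regression: bias + w . x."""
    return bias + sum(w * x for w, x in zip(weights, features))


def probability_of_relevance(features, weights, bias=0.0):
    """Probability of relevance obtained by applying the sigmoid to the log-odds."""
    return 1.0 / (1.0 + math.exp(-logistic_log_odds(features, weights, bias)))
```

Both scores are sums of per-term or per-feature contributions, and the sigmoid is monotonic, so ranking by probability of relevance gives the same order as ranking by log-odds. How the features and weights would have to be chosen for the two rankings to correlate is the subject of the paper itself and is not reproduced here.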

Keywords

Information Retrieval; Language Model; Score Function; Sample Space; Ranking Level
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Robin Aly (1)
  • Thomas Demeester (2)
  1. University of Twente, The Netherlands
  2. Ghent University, Belgium
