Semi-subsumed Events: A Probabilistic Semantics of the BM25 Term Frequency Quantification

  • Hengzhi Wu
  • Thomas Roelleke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5766)

Abstract

Through BM25, the asymptotic term frequency quantification TF = tf/(tf+K), where tf is the within-document term frequency and K is a normalisation factor, became popular. This paper reports a finding regarding the meaning of the TF quantification: in the triangle of independence and subsumption, the TF quantification forms the altitude, that is, the middle between independent and subsumed events. We refer to this new assumption as semi-subsumed. While this finding of a well-defined probabilistic assumption solves the probabilistic interpretation of the BM25 TF quantification, it is also of wider impact regarding probability theory.
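
As a minimal sketch of the quantification named in the abstract, the snippet below evaluates TF = tf/(tf+K) for a few term frequencies. The function name `bm25_tf` and the value K = 1.0 are assumptions chosen for illustration. They are not taken from the paper, and in BM25 K normally depends on document length, which is not modelled here.

```python
def bm25_tf(tf: float, k: float) -> float:
    """Asymptotic TF quantification tf / (tf + K), as stated in the abstract.

    `bm25_tf` is a hypothetical helper name used only for this sketch.
    """
    if tf < 0 or k <= 0:
        raise ValueError("tf must be >= 0 and K must be > 0")
    return tf / (tf + k)


if __name__ == "__main__":
    K = 1.0  # illustrative value only; not taken from the paper
    for tf in (0, 1, 2, 5, 10, 100):
        # TF grows with tf but stays strictly below 1 (the asymptote).
        print(f"tf={tf:>3}  TF={bm25_tf(tf, K):.3f}")
```

With K = 1, a single occurrence already yields TF = 0.5, and each further occurrence adds less than the one before. This saturation is what "asymptotic" refers to.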

Keywords

Term Frequency; Term Probability; Probabilistic Semantic; Probabilistic Assumption; Term Occurrence

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Hengzhi Wu (1)
  • Thomas Roelleke (1)
  1. Queen Mary, University of London, UK
