Divergence-from-Randomness Models

  • Reference work entry

Synonyms

Deviation from randomness

Definition

Divergence-from-randomness (DFR) information retrieval models are term-document matching functions obtained as the product of two divergence functions. An example of a DFR function is the one related to Jensen's information of two probability distributions [9, pp. 26–28]:

$$ \sum_i I_1\left(\hat{p}_i^{+} \,\|\, \hat{p}_i\right) \cdot I_2\left(\hat{p}_i^{+} \,\|\, \hat{p}_i\right) $$

where \( I_1\left(\hat{p}_i^{+} \,\|\, \hat{p}_i\right) = \hat{p}_i^{+} - \hat{p}_i = \Delta\hat{p}_i \) and \( I_2\left(\hat{p}_i^{+} \,\|\, \hat{p}_i\right) = \log_2 \frac{\hat{p}_i + \Delta\hat{p}_i}{\hat{p}_i} \).
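The sum above can be evaluated directly for two discrete distributions. The following Python sketch assumes both distributions are given as lists of strictly positive probabilities over the same index set; the function name `jensen_information` is hypothetical. Because \( \Delta\hat{p}_i \) and \( \log_2(\hat{p}_i^{+}/\hat{p}_i) \) always share the same sign, every term of the sum is non-negative, and the total is zero only when the two distributions coincide.

```python
import math


def jensen_information(p_plus, p):
    """Sum over i of I1 * I2, where (following the definitions above)
    I1 = p_plus[i] - p[i]          (the increment delta p̂_i)
    I2 = log2(p_plus[i] / p[i])    (= log2((p̂_i + delta p̂_i) / p̂_i)).
    Both inputs are assumed to be strictly positive probabilities
    over the same index set."""
    if len(p_plus) != len(p):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for a, b in zip(p_plus, p):
        if a <= 0 or b <= 0:
            raise ValueError("probabilities must be strictly positive")
        delta = a - b                       # I1: delta p̂_i
        total += delta * math.log2(a / b)   # I1 * I2, never negative
    return total


p_hat = [0.5, 0.3, 0.2]
p_hat_plus = [0.4, 0.4, 0.2]
print(jensen_information(p_hat_plus, p_hat))
```

Swapping the two arguments leaves the value unchanged, since \( (a-b)\log_2(a/b) = (b-a)\log_2(b/a) \) for every term.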

The DFR framework generalizes Jensen's information as follows:

$$ \sum_i I_1\left(\hat{p}_i^{+} \,\|\, \hat{p}_i\right) \cdot I_2\left(\hat{p}_i^{+} \,\|\, p_i\right) $$

where

  • \( p \) is a prior probability density function of terms (or documents) in the collection.

  • \( \hat{p} \) is the frequency of the term in a document (or in a subset of documents).
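A minimal sketch of the generalized formula, under stated assumptions: \( \hat{p} \) and the prior \( p \) are estimated as maximum-likelihood relative frequencies (term count divided by token count) in a document and in the whole collection, respectively; \( \hat{p}^{+} \) is supplied directly as hypothetical numbers, since its construction is not specified here; and \( I_2 \) is assumed to keep the log-ratio form of the Jensen case, \( I_2(\hat{p}_i^{+} \,\|\, p_i) = \log_2(\hat{p}_i^{+}/p_i) \). The helper names and the toy vocabulary are hypothetical.

```python
import math
from collections import Counter


def relative_frequencies(tokens, vocabulary):
    """Hypothetical helper: maximum-likelihood estimate of each term's
    probability (count / number of tokens), without smoothing."""
    counts = Counter(tokens)
    return [counts[t] / len(tokens) for t in vocabulary]


def dfr_generalized(p_hat_plus, p_hat, prior):
    """Sum over i of I1(p̂+_i || p̂_i) * I2(p̂+_i || p_i), assuming
    I1 = p̂+_i - p̂_i and I2 = log2(p̂+_i / p_i) (log-ratio form assumed)."""
    if not (len(p_hat_plus) == len(p_hat) == len(prior)):
        raise ValueError("all distributions must have the same length")
    total = 0.0
    for a, b, q in zip(p_hat_plus, p_hat, prior):
        if a <= 0 or q <= 0:
            raise ValueError("p_hat_plus and prior must be strictly positive")
        total += (a - b) * math.log2(a / q)  # I1 * I2 for term i
    return total


vocab = ["random", "model", "term"]
# Toy collection: the prior p is taken as the collection-wide relative frequency.
collection = ["random", "model", "term", "term", "model", "term", "random", "term"]
# Toy document: p̂ is each term's relative frequency in this document.
document = ["term", "term", "term", "model", "random"]

prior = relative_frequencies(collection, vocab)  # [0.25, 0.25, 0.5]
p_hat = relative_frequencies(document, vocab)    # [0.2, 0.2, 0.6]
p_hat_plus = [0.1, 0.2, 0.7]                     # hypothetical p̂+ values

print(dfr_generalized(p_hat_plus, p_hat, prior))
```

Passing \( \hat{p} \) itself as the prior reduces `dfr_generalized` to the Jensen sum, which is one way to check that, under the assumed form of \( I_2 \), the generalization contains the original formula as a special case.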

Recommended Reading

  1. Amati G. Frequentist and Bayesian approach to information retrieval. In: Proceedings of the 28th European Conference on IR Research; 2006. p. 13–24.

  2. Amati G, Carpineto C, Romano G. Query difficulty, robustness, and selective application of query expansion. In: Proceedings of the 26th European Conference on IR Research; 2004. p. 127–37.

  3. Amati G, Van Rijsbergen CJ. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans Inf Syst. 2002;20(4):357–89.

  4. Gärdenfors P. Knowledge in flux. MIT Press; 1988.

  5. Gaussier E, Clinchant S. The BNB distribution for text modeling. In: ECIR, Lecture Notes in Computer Science. Springer; 2008.

  6. Good IJ. A causal calculus I. Br J Phil Sci. 1961;11(44):305–18.

  7. Harter SP. A probabilistic approach to automatic keyword indexing. PhD thesis, Thesis No. T25146. Graduate Library, The University of Chicago; 1974.

  8. He B, Ounis I. On setting the hyper-parameters of the term frequency normalisation for information retrieval. ACM Trans Inf Syst. 2007;25(3).

  9. Kullback S. Information theory and statistics. New York: Wiley; 1959.

  10. Ounis I, Amati G, Plachouras V, He B, Macdonald C, Johnson D. Terrier information retrieval platform. In: Proceedings of the 27th European Conference on IR Research; 2005. p. 517–9.

Author information

Correspondence to Giambattista Amati.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Amati, G. (2018). Divergence-from-Randomness Models. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_924