Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Inverse Document Frequency

  • Iadh Ounis
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_933

Synonyms

IDF

Definition

The inverse document frequency (IDF) is a statistical weight used for measuring the importance of a term in a text document collection. The document frequency (DF) of a term is defined as the number of documents in the collection in which the term appears.
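
As an illustration of the definition of document frequency, the following is a minimal sketch in Python. The whitespace tokenisation, the lowercasing, the sample collection, and the function name `document_frequency` are assumptions made for this example and are not part of the entry.

```python
from collections import Counter


def document_frequency(documents):
    """Map each term to the number of documents that contain it."""
    df = Counter()
    for text in documents:
        # Using a set counts a term once per document, however many
        # times it occurs inside that document.
        df.update(set(text.lower().split()))
    return df


# Hypothetical three-document collection.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang",
]
df = document_frequency(docs)
print(df["the"], df["cat"], df["bird"])
```

Note that "the" occurs four times in the collection but has a document frequency of 2, because DF counts documents rather than occurrences.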

Key Points

Karen Sparck-Jones first proposed that, during retrieval, terms with low document frequency are more valuable than terms with high document frequency [2]. In other words, the underlying idea of IDF is that the more documents in the collection a term appears in, the less informative the term is.

In its simplest form, the IDF weight of a term with document frequency DF in a collection of N documents is assigned as follows [3]:
$$ \mathrm{IDF}={ \log}_2\frac{\mathrm{N}}{\mathrm{DF}} $$
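
The formula can be tried out with a short Python sketch. It assumes that N denotes the total number of documents in the collection (the usual reading of this formula) and that DF is at least 1 and at most N. The function name `idf` and the sample figures are hypothetical.

```python
import math


def idf(n_documents, doc_freq):
    """Return log2(N / DF), with N taken to be the collection size."""
    # DF = 0 would divide by zero; DF > N cannot occur in a real collection.
    if doc_freq <= 0 or doc_freq > n_documents:
        raise ValueError("doc_freq must satisfy 0 < DF <= N")
    return math.log2(n_documents / doc_freq)


# Hypothetical collection of 1024 documents.
rare = idf(1024, 1)           # term found in a single document
common = idf(1024, 512)       # term found in half of the documents
everywhere = idf(1024, 1024)  # term found in every document
print(rare, common, everywhere)
```

Under these assumptions the weights are 10.0, 1.0 and 0.0 respectively: the weight falls as the document frequency rises, and a term that appears in every document receives no weight at all.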

Recommended Reading

  1. Robertson SE, Walker S. On relevance weights with little relevance information. In: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1997. p. 16–24.
  2. Sparck-Jones K. A statistical interpretation of term specificity and its application in retrieval. J Doc. 1972;28(1):11–20.
  3. Sparck-Jones K. Index term weighting. Inf Storage Retr. 1973;9(11):619–33.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Glasgow, Glasgow, UK

Section editors and affiliations

  • Giambattista Amati (1)
  1. Fondazione Ugo Bordoni, Rome, Italy