Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Term Statistics for Structured Text Retrieval

  • Jaap Kamps
  • Mounia Lalmas
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_412

Synonyms

Inverse element frequency; Within-element term frequency

Definition

Classical ranking algorithms in information retrieval make use of term statistics, the most common (and basic) ones being within-document term frequency, tf, and document frequency, df. tf is the number of occurrences of a term in a document and is used to reflect how well a term captures the topic of a document, whereas df is the number of documents in which a term appears and is used to reflect how well a term discriminates between relevant and non-relevant documents. df is commonly applied in its inverse form, inverse document frequency, idf, since the importance of a term is inversely related to the number of documents in which it appears. Both tf and idf are obtained at indexing time. Ranking algorithms for structured text retrieval, and more precisely XML retrieval, require similar term statistics, but with respect to elements.
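
As a minimal illustration of these document-level statistics, the sketch below counts tf and df over a toy collection and derives idf from df. The function name term_statistics, the whitespace tokenization, and the formula idf = log(N / df) are assumptions made for this example; the logarithmic form is only one common variant, and the entry itself does not prescribe a formula.

```python
import math
from collections import Counter


def term_statistics(docs):
    """Return (tf, df, idf) for a list of tokenized documents.

    tf  : one Counter per document, mapping term -> occurrences in that document
    df  : Counter mapping term -> number of documents containing the term
    idf : dict mapping term -> log(N / df), an assumed (common) idf variant
    """
    tf = [Counter(tokens) for tokens in docs]
    df = Counter()
    for counts in tf:
        # Each document contributes at most 1 to the df of a term.
        df.update(counts.keys())
    n_docs = len(docs)
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return tf, df, idf


# Toy collection (hypothetical data, whitespace-tokenized).
docs = [
    "xml retrieval ranks xml elements".split(),
    "document retrieval ranks documents".split(),
    "term statistics".split(),
]
tf, df, idf = term_statistics(docs)
```

In this toy collection, "xml" occurs twice in the first document but in only one document overall, so it receives a higher idf than "retrieval", which occurs in two documents.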

Key Points

To calculate term statistics for elements, one could simply replace documents by elements and calculate so-called...
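
The idea of replacing documents by elements can be sketched as follows, using Python's standard xml.etree.ElementTree module. This is only an illustration under stated assumptions, not the method described in the entry: the function name element_statistics is hypothetical, every element is treated as a retrieval unit, and the text of an element is taken to be all text in its subtree, so nested elements overlap.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_statistics(xml_string):
    """Return (element_tf, element_freq) for one XML document.

    element_tf   : list of (tag, Counter) pairs, one per element in document
                   order; each Counter maps term -> occurrences in the
                   element's subtree text
    element_freq : Counter mapping term -> number of elements containing it
    """
    root = ET.fromstring(xml_string)
    element_tf = []
    element_freq = Counter()
    for elem in root.iter():
        # Assumption: an element's text is all text in its subtree,
        # so a term in a paragraph is also counted for every ancestor.
        tokens = " ".join(elem.itertext()).lower().split()
        counts = Counter(tokens)
        element_tf.append((elem.tag, counts))
        element_freq.update(counts.keys())
    return element_tf, element_freq


# Hypothetical example document.
xml_doc = (
    "<article><title>XML retrieval</title>"
    "<sec><p>ranking XML elements</p><p>term statistics</p></sec>"
    "</article>"
)
element_tf, element_freq = element_statistics(xml_doc)
```

Because of nesting, a term that appears once in a paragraph is counted again in the enclosing section and article, which inflates element-level counts relative to document-level ones.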

Recommended Reading

  1. Clarke CLA. Controlling overlap in content-oriented XML retrieval. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2005. p. 441–48.
  2. Grabs G, Schek H-S. ETH Zürich at INEX: flexible information retrieval from XML with PowerDB-XML. In: Proceedings of the 1st International Workshop of the Initiative for the Evaluation of XML Retrieval; 2002. p. 141–8.
  3. Mass Y, Mandelbrod M. Component ranking and automatic query refinement for XML retrieval. In: Proceedings of the 4th International Workshop of the Initiative for the Evaluation of XML Retrieval; 2005. p. 73–84.
  4. Sigurbjörnsson B, Kamps J, de Rijke M. An element-based approach to XML retrieval. In: Proceedings of the 2nd International Workshop of the Initiative for the Evaluation of XML Retrieval; 2003. p. 19–26.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Amsterdam, Amsterdam, The Netherlands
  2. Yahoo! Inc., London, UK

Section editors and affiliations

  • Jaap Kamps (1)
  1. University of Amsterdam, Amsterdam, The Netherlands