Encyclopedia of Database Systems

2018 Edition
| Editors: Ling Liu, M. Tamer Özsu

Text Indexing and Retrieval

  • Haoda Huang
  • Benyu Zhang
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_417

Synonyms

Document index and retrieval

Definition

Text indexing is a preprocessing step for text retrieval. During text indexing, texts are collected, parsed, and stored to facilitate fast and accurate retrieval. Text retrieval (also called document retrieval) is a branch of information retrieval in which the information is stored primarily as text. It is defined as the matching of a stated user query against a set of texts. As the result of text retrieval, texts are ranked and presented to the user according to their relevance to the query. User queries, which represent the user’s information need, can range from a few words to full multi-sentence descriptions.
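
The two stages defined above can be illustrated with a minimal sketch. It assumes an inverted index as the stored structure and a plain TF-IDF sum as the relevance score; the entry itself does not prescribe either choice, and the names tokenize, build_index, and retrieve, as well as the three sample documents, are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Parse step: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Indexing step: map each term to {doc_id: term frequency}.

    An inverted index is an assumption made for this sketch.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def retrieve(query, index, num_docs):
    """Retrieval step: score matching documents and rank them.

    The score is a simple TF-IDF sum (an assumed relevance measure).
    Ties are broken by document id so the ordering is deterministic.
    """
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # the term occurs in no document
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical three-document collection.
docs = {
    "d1": "text indexing is a preprocessing step for text retrieval",
    "d2": "queries are matched against a set of texts",
    "d3": "ranking orders texts by relevance to the query",
}
index = build_index(docs)
ranked = retrieve("text retrieval", index, len(docs))
```

Here ranked holds (document id, score) pairs in descending order of score. Production systems typically add steps such as stemming, stop-word handling, and index compression, all of which this sketch omits.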

Historical Background

Text indexing is the most fundamental part of a retrieval system. Over the past two decades, the corpus size of a typical retrieval system has increased dramatically. The Text REtrieval Conference (TREC) (http://trec.nist.gov/), which started in 1992, only provides...



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Microsoft Research Asia, Beijing, China

Section editors and affiliations

  • Zheng Chen (1)
  1. Microsoft Research Asia, Microsoft Corporation, Beijing, China