Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Index Creation and File Structures

  • Steven M. Beitzel
  • Eric C. Jensen
  • Ophir Frieder
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_944

Synonyms

Indexing; Inverted indexes

Definition

A core element of modern information retrieval systems is the document index. The index is a set of data structures constructed from a source document collection so that an information retrieval system can respond to search queries quickly and efficiently. Index creation typically involves reading and processing the source document collection, parsing the text of each document, and extracting the features needed to retrieve and rank that document in response to a user query. Indexing systems also often use dimension reduction, compression, and related techniques to drastically reduce the storage footprint of the source collection in its indexed form. Document indexes are frequently stored in file structures that are conducive to rapid retrieval and ranking in response to a query.
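The index creation process described above can be illustrated with a minimal sketch of an inverted index, the structure named in the synonyms. This is an illustration under stated assumptions, not the method of any particular system: the function names (`tokenize`, `build_inverted_index`, `search`), the regular-expression tokenizer, and the sample documents are all hypothetical, and the sketch omits the compression and on-disk file structures that production indexes typically rely on.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms.

    This tokenizer is a simplifying assumption; real systems usually add
    stemming, stop-word handling, and language-specific rules.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, term_frequency) pairs.

    `docs` is a dict of doc_id -> raw text. Posting lists are sorted by
    doc_id, which is the usual precondition for gap-based compression.
    """
    counts = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            counts[term][doc_id] = counts[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in counts.items()}


def search(index, query):
    """Return the sorted doc_ids that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = None
    for term in terms:
        ids = {doc_id for doc_id, _ in index.get(term, [])}
        result = ids if result is None else result & ids
    return sorted(result)


# Hypothetical three-document collection used only for illustration.
docs = {
    1: "Inverted indexes support fast text search.",
    2: "Compression reduces the index storage footprint.",
    3: "Search engines rank documents using index statistics.",
}
index = build_inverted_index(docs)
matches = search(index, "index search")
```

Because no stemming is applied in this sketch, "indexes" in document 1 does not match the query term "index", so `matches` contains only document 3. The stored term frequencies are one example of the features an indexer might extract to support ranking.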

Historical Background

As...


Recommended Reading

  1. Grossman D, Frieder O. Information retrieval: algorithms and heuristics. 2nd ed. Dordrecht: Springer; 2004.
  2. The size of the World Wide Web: http://www.worldwidewebsize.com. Retrieved Mar 2008.
  3. Witten IH, Moffat A, Bell TC. Managing gigabytes: compressing and indexing documents and images. 2nd ed. San Francisco: Morgan Kaufmann; 1999.
  4. Zobel J, Moffat A. Inverted files for text search engines. ACM Comput Surv. 2007;38(2):6.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Steven M. Beitzel (1)
  • Eric C. Jensen (2)
  • Ophir Frieder (3)
  1. Telcordia Technologies, Piscataway, USA
  2. Twitter, Inc., San Francisco, USA
  3. Georgetown University, Washington, USA

Section editors and affiliations

  • Edie Rasmussen (1)
  1. Library, Archival & Inf. Studies, The Univ. of British Columbia, Vancouver, Canada