Tree-Based Indexing

Manolopoulos, Yannis; Theodoridis, Yannis; Tsotras, Vassilis J.

doi:10.1007/978-1-4614-8265-9_755

Tree-Based Indexing

Yannis Manolopoulos³,
Yannis Theodoridis⁴ &
Vassilis J. Tsotras⁵

Reference work entry
First Online: 01 January 2018

132 Accesses
2 Citations

Download reference work entry PDF

Synonyms

B+-tree; Index Sequential Access Method (ISAM); R-tree

Definition

Consider a relation R with some numeric attribute A taking values over an (ordered) domain D. A range query retrieves all tuples in R whose attribute A has values in the interval [low, high]. That is, low ≤ R.A ≤ high. To enable fast processing of range selection queries, an access method that maintains order is needed. Such an index has the form of a tree, where each node corresponds to a page. Leaf nodes contain (or index) the actual values of A, while index nodes provide ordered access to the nodes underneath. Examples of tree-based indexing are the B+-tree and the R-tree (for single- and multidimensional ranges, respectively).

Key Points

A major performance goal of a database management system is to minimize the number of I/Os (i.e., blocks or pages transferred) between the disk and main memory. One way to achieve this goal is to minimize the number of I/Os when answering a query. Consider, for example, relation Employee (ssn, name, salary, dept., address); the query, “Find the employees who reside in Santa Monica,” references only a fraction of Employee records. It would be very inefficient to have the database system sequentially read all the pages of the Employee file and check the address field of each employee record for the value “Santa Monica.” Instead the system should be able to locate the pages with “Santa Monica” employee records directly.

To allow such fast access, additional data structures called access methods (or indices) are designed per database file. The term search-key is used to identify the attribute(s) on which the access method is built. There are two fundamental access methods, namely, tree-based and hash-based indexing. They differ on the kind of queries that they can efficiently address. Tree-based indexing maintains the order of the search-key values. It is thus applicable to attributes that are numeric, and hence it can be used to address range queries (and also equality queries, when the range query interval is reduced to one value). Hash-based indexing on the other hand does not assume any ordering; rather it is based on mapping the search-key values on a collection of buckets. Therefore it can only address equality (or membership) queries. Since a tree-based index maintains the order of the indexed values in its leaf pages, a (one-dimensional) range query is implemented as a search for the leaf page with the lower value of the range interval, followed by the accessing of sibling pages until a page that contains the higher value of the range interval is reached.

Tree-based indices are further categorized by whether their search-key ordering is the same with the file’s logical order (if any). Note that a file may or may not be ordered according to the value of an (a sequence of) attribute(s). A file stored without any logical order is called an unordered file or heap. If the search-key of a tree-based index is the same as the ordering attribute of a (ordered) file, then the index is called primary. Since such an index also clusters the values of the file, the term clustering index has also been used. The search-key of a primary index is usually the file’s primary key; however this is not necessary. An index built on any non-ordering attribute of a file is called secondary. A relation can have several indices, on different search-keys; among them, at most one is primary (clustering) index, and the rest are secondary ones (obviously, a file can have at most a single logical order since it is physically stored once).

Historically, the first tree-based index used in relational databases was the ISAM file (a static tree). Currently, the most widely used tree-based index is the B+-tree which is a dynamic structure (i.e., its size increases or decreases as the size of the indexed file changes by record insertions/deletions). As with any data structure, the performance of an access method is characterized by three costs, namely, query time, update time, and space. Query time corresponds to the number of page accesses (I/Os) needed to answer a query. Update time counts the number of I/Os needed for updating the method when a record is updated, inserted, or deleted. Space measures the number of pages occupied by the method’s data structures. The B+-tree uses logarithmic update and query costs, while its space requirements are linear to the size of the indexed file.

Cross-References

Author information

Authors and Affiliations

Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
University of Piraeus, Piraeus, Greece
Yannis Theodoridis
University of California-Riverside, Riverside, CA, USA
Vassilis J. Tsotras

Authors

Yannis Manolopoulos
View author publications
You can also search for this author in PubMed Google Scholar
Yannis Theodoridis
View author publications
You can also search for this author in PubMed Google Scholar
Vassilis J. Tsotras
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yannis Manolopoulos .

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

University of California-Riverside, Riverside, MA, USA
Vassilis J. Tsotras

Rights and permissions

Reprints and permissions

Copyright information

About this entry

Cite this entry

Manolopoulos, Y., Theodoridis, Y., Tsotras, V.J. (2018). Tree-Based Indexing. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_755

Download citation

DOI: https://doi.org/10.1007/978-1-4614-8265-9_755
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer ScienceReference Module Computer Science and Engineering

Publish with us

Policies and ethics

Tree-Based Indexing

Synonyms

Definition

Key Points

Cross-References

Recommended Reading

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Section Editor information

Rights and permissions

Copyright information

About this entry

Cite this entry

Download citation

Publish with us

Navigation

Synonyms

Definition

Key Points

Cross-References

Recommended Reading

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Section Editor information

Rights and permissions

Copyright information

About this entry

Cite this entry

Download citation

Share this entry

Publish with us

Search

Navigation