Reference Work Entry

Encyclopedia of Database Systems

pp 1435-1438

Indexed Sequential Access Method

  • Alex Delis, affiliated with University of Athens
  • Vassilis J. Tsotras, affiliated with University of California-Riverside

Synonyms

Indexed sequential file; ISAM file; ISAM

Definition

An indexed sequential access method is a static, hierarchical, disk-based index structure that enables both (single-dimensional) range and membership queries on an ordered data file. The records of the data file are stored in sequential order according to some data attribute(s). Since ISAM is static, it does not change its structure if records are added to or deleted from the data file. Should new records be inserted into the data file, they are stored in an overflow area. Deleted records are removed from the file (leaving empty space).

Historical Background

Although transparent to the user of a DBMS, access methods play a key role in database performance. A major performance goal of a DBMS is to minimize the number of I/Os (i.e., blocks or pages transferred) between the disk and main memory. One way to achieve this goal is to minimize the number of I/Os needed to answer queries. Note that many queries reference only a small portion of the records in a database table. For example, the query “find the employees who reside in Santa Monica, CA” references only a fraction of the records in the Employee relation. It would be rather inefficient to have the database system sequentially read all the pages of the Employee file and check the residence field of each employee record for the name ‘Santa Monica’. Instead, the system should be able to locate the pages with ‘Santa Monica’ employee records directly. To allow such fast access, additional disk-resident structures called indices (or access methods) are designed per database relation. One of the first such methods developed was the indexed sequential access method (ISAM). ISAM was developed at IBM in the late 1960s [3] and is essentially the predecessor of the widely used B+-tree index. A major difference between ISAM and the B+-tree [1] is that, instead of using overflow pages, the B+-tree introduces page splitting. ISAM was later replaced by IBM's virtual storage access method (VSAM) [4], which introduced the notion of splitting (data or index pages) when there is not enough space for inserting a new record.

Foundations

The ISAM structure contains three separate storage areas: the data file, the index file and the overflow area. For simplicity, assume that the data file is an Employee relation, ordered according to the social security number or ssn attribute. Moreover, assume that this relation is stored sequentially on the disk, following the logical order of the ssn attribute. If the Employee file has n records and one page can hold B Employee records, the total number of pages in this file is O(n ∕ B). Note that each file page is full of Employee records except possibly the last page. Moreover, given the sequential storage of the file, each page can easily access the next page of the file in ssn order (it is simply the next physical page on the disk).

A straightforward way to build an index on the Employee file is to create a new (much smaller) file that contains one representative record from each Employee file page. The records in this new file are of the form <search_value, ptr>, where ptr is a pointer to an Employee file page (a page-id number uniquely identifying the page on the disk) and search_value is the smallest ssn recorded in that page. Since these records are smaller in size than the Employee file records, each page of the new file will contain many of them. If the Employee file is large, the new file will spread over a number of pages (this number is clearly bounded by O(n ∕ B²)). However, since the new file is also an ordered file (it has two attributes and is ordered according to the search_value attribute), it can be indexed by another (even smaller) level of index pages, and so on. This process continues until the creation of an index layer that consists of a single page. As a result, a multi-way, tree-structured index is created whose nodes correspond to pages (see Fig. 1). It is worth pointing out that all index pages, from possibly multiple index levels, reside in the index file area of the ISAM.
https://static-content.springer.com/image/prt%3A978-0-387-39940-9%2F9/MediaObjects/978-0-387-39940-9_9_Part_Fig1-738_HTML.jpg
Indexed Sequential Access Method. Figure 1

An indexed sequential access method.
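
The level-by-level construction described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not IBM's implementation: pages are plain lists, ptr is the position of the child page in the level below, and the names build_isam_index and fanout (the assumed number of index records per index page) are hypothetical.

```python
def build_isam_index(data_pages, fanout):
    """Build the index levels for an ordered data file.

    data_pages: non-empty list of pages, each a sorted list of keys.
    fanout:     assumed number of <search_value, ptr> records per index page.
    Returns the index levels, lowest level first; the last level is the root.
    """
    if not data_pages or fanout < 2:
        raise ValueError("need at least one data page and fanout >= 2")

    # Lowest index level: one record per data page, keyed by its smallest key.
    entries = [(page[0], i) for i, page in enumerate(data_pages)]
    levels = []
    while True:
        # Pack consecutive records into index pages of at most `fanout` records.
        pages = [entries[i:i + fanout] for i in range(0, len(entries), fanout)]
        levels.append(pages)
        if len(pages) == 1:  # a single page: this is the root
            return levels
        # Next level up: one record per index page that was just created.
        entries = [(page[0][0], i) for i, page in enumerate(pages)]


data_pages = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
levels = build_isam_index(data_pages, fanout=2)
print([len(level) for level in levels])  # pages per index level, lowest first
```

With five data pages and an assumed fan-out of 2, the sketch produces index levels of 3, 2 and 1 pages, mirroring how each level shrinks by roughly the fan-out factor until a single root page remains.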

The ISAM organization is a single-dimensional (as opposed to multi-dimensional) index. It supports searches on the attribute (or collection of attributes) on which the data file is ordered. For example, searching the indexed sequential access method for a given ssn K (i.e., a membership query) is simple. The search starts from the root page, where the record with the largest search_value that is less than or equal to K is located. The search then continues to the page in the next index level that this record points to, until a page of the Employee file is reached. If K is found among the ssn values of that Employee page, the appropriate record is returned as the answer to the query. If K is not found, the answer is empty. It is easy to see that this search takes O(log_B(n ∕ B)) page accesses (I/Os), as this is the height (in pages) of the tree. The reader should note that the logarithm is base B, the size of the page, since this is a multi-way tree where each node has O(B) fan-out.
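
The root-to-leaf descent can be illustrated with a small Python sketch. It assumes a hypothetical in-memory layout in which levels lists the index levels lowest first (root last), every index record is a (search_value, ptr) tuple, and ptr is the position of the child page in the level below; the function name isam_lookup and the read counter are illustrative only.

```python
from bisect import bisect_right


def isam_lookup(levels, data_pages, key):
    """Membership query; returns (found, pages_read)."""
    page_no = 0  # the root level consists of a single page
    pages_read = 0
    for level in reversed(levels):  # visit the root first
        page = level[page_no]
        pages_read += 1
        search_values = [sv for sv, _ in page]
        # Record with the largest search_value <= key; if key is smaller
        # than every search_value, fall back to the first record.
        slot = max(bisect_right(search_values, key) - 1, 0)
        page_no = page[slot][1]
    pages_read += 1  # finally read the data page itself
    return key in data_pages[page_no], pages_read


data_pages = [[10, 20], [30, 40], [50, 60], [70, 80]]
levels = [
    [[(10, 0), (30, 1)], [(50, 2), (70, 3)]],  # lowest index level
    [[(10, 0), (50, 1)]],                      # root
]
print(isam_lookup(levels, data_pages, 60))  # (True, 3)
print(isam_lookup(levels, data_pages, 35))  # (False, 3)
```

Each query reads one page per index level plus one data page, which is the height-bounded cost stated above.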

Range queries (as in: find the Employee records with ssn in the range [25, 100]) are addressed similarly. A search is first performed for the ssn defining the lower limit of the range (in the above query example this would be ssn = 25). This look-up will lead to an appropriate Employee record located in some file page. Records with higher ssn values within this page are accessed until a record with ssn larger than the upper limit of the query range is found. If the upper limit of the query range is higher than the highest ssn in this page, the next page of the file is accessed, and so on (recall that the file is stored sequentially). The search stops when an Employee page is found that contains a record with ssn larger than the upper limit of the query range.

If a denotes the answer size of a range query (the number of Employee records satisfying the query range predicate), ISAM answers a range query in O(log_B(n ∕ B) + a ∕ B) I/Os. Note that the logarithmic part is spent to find the Employee page with the first record that satisfies the query predicate (if any), and the O(a ∕ B) part corresponds to accessing the rest of the Employee pages that contain answers.
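
A minimal sketch of the range scan follows, assuming data pages are sorted Python lists stored in order. To keep the example short, the index descent is replaced by a binary search over the first key of each data page (a stand-in that lands on the same page the index would); isam_range and pages_scanned are hypothetical names.

```python
from bisect import bisect_right


def isam_range(data_pages, low, high):
    """Return (keys in [low, high], number of data pages scanned)."""
    first_keys = [page[0] for page in data_pages]
    # Stand-in for the index descent: page with the largest first key <= low.
    page_no = max(bisect_right(first_keys, low) - 1, 0)

    result = []
    pages_scanned = 0
    while page_no < len(data_pages):
        pages_scanned += 1
        for key in data_pages[page_no]:
            if key > high:  # past the upper limit: stop the scan
                return result, pages_scanned
            if key >= low:
                result.append(key)
        page_no += 1  # the next physical page holds the next keys
    return result, pages_scanned


data_pages = [[10, 20], [30, 40], [50, 60], [70, 80]]
print(isam_range(data_pages, 25, 55))  # ([30, 40, 50], 3)
```

Only the pages that overlap the query range are touched after the initial look-up, which corresponds to the a ∕ B term in the bound.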

While the use of the index greatly reduces query time, there is of course a space overhead, since the access method itself uses pages to store its records. However, this overhead is minimal. The number of pages used by the index structure is still bounded by O(n ∕ B). This is because the first level uses at most O(n ∕ B²) pages, the second at most O(n ∕ B³), and so on.
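
The bound follows from a geometric series. As a sketch, assume every index page holds B records, ignore rounding to whole pages, and let h be the number of index levels (h is a symbol introduced here for illustration):

```latex
\[
\sum_{i=2}^{h+1} \frac{n}{B^{i}}
  \;\le\; \frac{n}{B^{2}} \sum_{j \ge 0} \frac{1}{B^{j}}
  \;=\; \frac{n}{B^{2}} \cdot \frac{B}{B-1}
  \;=\; \frac{n}{B(B-1)}
  \;\le\; \frac{n}{B}
  \qquad (B \ge 2).
\]
```

Under these assumptions the whole index occupies no more pages than the data file itself, and for realistic page sizes far fewer.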

An interesting observation is that an indexed sequential access method imitates binary search in a disk-based environment. However, given that at each node of the index a whole page is accessed, there are O(B) choices (instead of just 2 in binary search) at each node.
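
To make the comparison concrete, consider purely illustrative numbers: a file of n = 10^8 records and pages that hold B = 100 records (the same capacity is assumed for index pages, a simplification).

```latex
\[
\frac{n}{B} = \frac{10^{8}}{10^{2}} = 10^{6} \text{ data pages}, \qquad
\log_{B}\frac{n}{B} = \log_{100} 10^{6} = 3, \qquad
\log_{2}\frac{n}{B} = \log_{2} 10^{6} \approx 19.9 .
\]
```

Under these assumptions the index has three levels, whereas a plain binary search over the data pages would need about twenty page probes.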

The main advantages of the ISAM organization are its simplicity, small space overhead and fast query time. The structure, however, is static. If new records are added to the Employee file, they are handled in an overflow file. Since there is no empty space in the data file, overflow pages are created to store the new records. Such pages are typically chained to the page where a record should have been stored (see Fig. 2). Various proposals exist on how to handle the overflow file [2,4,5,6]. Nevertheless, the structure of the index does not change as the size of the data file changes. This eventually affects query time. The overflow file can be merged periodically with the main Employee file, at which time the index needs to be recreated. Similarly, if records are deleted from the original Employee file, pages may be left containing very few records, which affects both storage and query time. These problems are solved by the B+-tree, which is a dynamic indexing scheme [1].
https://static-content.springer.com/image/prt%3A978-0-387-39940-9%2F9/MediaObjects/978-0-387-39940-9_9_Part_Fig2-738_HTML.jpg
Indexed Sequential Access Method. Figure 2

The indexed sequential access method with overflows.
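
The effect of overflow chains on query cost can be illustrated with a small sketch. It models one possible policy only (unsorted overflow pages chained to a full primary page); the class name IsamWithOverflow, its methods, and page_capacity are hypothetical, and the index is again reduced to the frozen list of first keys.

```python
from bisect import bisect_right


class IsamWithOverflow:
    def __init__(self, data_pages, page_capacity):
        self.capacity = page_capacity
        self.primary = [list(page) for page in data_pages]  # assumed full
        # The index is static: first keys are fixed when the file is built.
        self.first_keys = [page[0] for page in data_pages]
        self.chains = [[] for _ in data_pages]  # overflow pages per primary page

    def _home(self, key):
        """Primary page where `key` belongs according to the static index."""
        return max(bisect_right(self.first_keys, key) - 1, 0)

    def insert(self, key):
        chain = self.chains[self._home(key)]
        if not chain or len(chain[-1]) >= self.capacity:
            chain.append([])  # allocate a new overflow page
        chain[-1].append(key)

    def lookup(self, key):
        """Return (found, data and overflow pages read)."""
        home = self._home(key)
        reads = 1
        if key in self.primary[home]:
            return True, reads
        for page in self.chains[home]:  # walk the overflow chain
            reads += 1
            if key in page:
                return True, reads
        return False, reads


f = IsamWithOverflow([[10, 20], [30, 40]], page_capacity=2)
for k in (11, 12, 13):
    f.insert(k)
print(f.lookup(20))  # (True, 1)
print(f.lookup(13))  # (True, 3)
```

The look-up for key 13 has to read the primary page and two overflow pages, showing how long chains erode the logarithmic search cost until the file is reorganized.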

There are two main differences between ISAM and the B+-tree. First, when a new page is created in the B+-tree, space is left to accommodate future insertions. In practice, a newly created page starts half empty so that it can store many new records before a structural reorganization is needed. If the page becomes full of records and a new record is directed to it, the page is split into two pages (that are half full). Second, a page in the B+-tree is not allowed to become scarce of records (unless it is the tree’s root page). As a result, when a page is accessed, it is guaranteed to contain a minimum number of records. If, due to deletions, a page's record occupancy falls below the threshold (half the page size), the page is merged with another page so that the combination has enough records. Note that leaving pages half empty imposes more space overhead on the B+-tree than on ISAM; however, it results in a very effective dynamic height-balanced indexing scheme.
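
The splitting behavior can be sketched for a single leaf page. This is a deliberately simplified illustration: a complete B+-tree would also insert a separator key into the parent page and may split further up the tree, which is omitted here. The function name insert_into_leaf and the capacity parameter are hypothetical.

```python
from bisect import insort


def insert_into_leaf(leaf, key, capacity):
    """Insert `key` into a sorted leaf page.

    Returns a list with one page if the key fits, or two roughly
    half-full pages if the insertion overflows the page.
    """
    page = list(leaf)
    insort(page, key)  # keep the page sorted
    if len(page) <= capacity:
        return [page]
    mid = len(page) // 2
    return [page[:mid], page[mid:]]  # split instead of chaining an overflow page


print(insert_into_leaf([10, 20], 15, capacity=4))          # [[10, 15, 20]]
print(insert_into_leaf([10, 20, 30, 40], 25, capacity=4))  # [[10, 20], [25, 30, 40]]
```

Where ISAM would chain an overflow page to the full page, the split leaves two pages with free space for further insertions.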

Finally, ISAM can be considered the tree-based index alternative to static external hashing. Both schemes are static, and both use overflow areas for additional records. Their major difference is that ISAM can perform both range and membership queries, while static external hashing is designed only for membership queries.

Key Applications

ISAM has been used in early database management systems as an index method to provide fast access to range and membership queries. It was later replaced by the VSAM structure [4] which introduced the notion of page splitting. Finally, the B+-tree was proposed as a dynamic indexing structure [1] and is now the standard access method in most relational database systems.

Cross-references

B+-Tree

Indexing

Membership Query

Range Query

Copyright information

© Springer Science+Business Media, LLC 2009