The VLDB Journal, Volume 18, Issue 1, pp 157–179

B-tries for disk-based string management

Authors

  • N. Askitis
    • School of Computer Science and Information Technology, RMIT University
  • Justin Zobel
    • NICTA, University of Melbourne
Regular Paper

DOI: 10.1007/s00778-008-0094-1

Cite this article as:
Askitis, N. & Zobel, J. The VLDB Journal (2009) 18: 157. doi:10.1007/s00778-008-0094-1

Abstract

A wide range of applications require that large quantities of data be maintained in sort order on disk. The B-tree and its variants are efficient general-purpose disk-based data structures that are almost universally used for this task. The B-trie has the potential to be a competitive alternative for storing data keyed by strings, but has not previously been thoroughly described or tested. We propose new algorithms for the insertion, deletion, and equality search of variable-length strings in a disk-resident B-trie, as well as novel splitting strategies, which are a critical element of a practical implementation. We experimentally compare the B-trie against variants of the B-tree on several large sets of strings with a range of characteristics. Our results demonstrate that, although the B-trie uses more memory, it is faster, more scalable, and requires less disk space.

Keywords

B-tree, Burst trie, Secondary storage, Vocabulary accumulation, Word-level indexing, Data structures

Copyright information

© Springer-Verlag 2008