The VLDB Journal, Volume 19, Issue 5, pp 633–660

Engineering scalable, cache and space efficient tries for strings

Authors

  • Nikolas Askitis
    • University of Melbourne
  • Ranjan Sinha
    • University of Melbourne
Regular Paper

DOI: 10.1007/s00778-010-0183-9

Cite this article as:
Askitis, N. & Sinha, R. The VLDB Journal (2010) 19: 633. doi:10.1007/s00778-010-0183-9

Abstract

Storing and retrieving strings in main memory is a fundamental problem in computer science. The efficiency of string data structures used for this task is of paramount importance for applications such as in-memory databases, text-based search engines and dictionaries. The burst trie is a leading choice for such tasks, as it can provide fast sorted access to strings. The burst trie, however, uses linked lists as substructures, which can result in poor use of CPU cache and main memory. Previous research addressed this issue by replacing linked lists with dynamic arrays, forming a cache-conscious array burst trie. Though faster, this variant can incur high instruction costs, which can hinder its efficiency. Thus, engineering a fast, compact, and scalable trie for strings remains an open problem. In this paper, we introduce a novel and practical solution that carefully combines a trie with a hash table, creating a variant of the burst trie called the HAT-trie. We provide a thorough experimental analysis which demonstrates that, for large sets of strings and on alternative computing architectures, the HAT-trie (together with two novel variants engineered to achieve further space-efficiency) is currently the leading in-memory trie-based data structure, offering rapid, compact, and scalable storage and retrieval of variable-length strings.
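
The general idea of combining a trie with hash-table buckets can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names, the `burst_threshold` parameter, and the use of Python's built-in `set` as a stand-in for a cache-conscious hash table are all assumptions made for illustration. Each bucket holds string suffixes, and a bucket that grows past the threshold is "burst" into a new trie node with smaller buckets beneath it.

```python
class Node:
    """Trie node: one slot per leading character, plus an end-of-string flag."""

    def __init__(self):
        self.children = {}    # char -> Node, or a set acting as a hash bucket
        self.is_end = False   # True if some stored string ends exactly here


class HatTrieSketch:
    """Illustrative burst-trie variant whose buckets are hash sets (assumed design)."""

    def __init__(self, burst_threshold=4):
        self.root = Node()
        self.burst_threshold = burst_threshold  # hypothetical bucket capacity

    def insert(self, key):
        node, i = self.root, 0
        while i < len(key):
            child = node.children.get(key[i])
            if child is None:
                child = node.children[key[i]] = set()
            if isinstance(child, Node):
                node, i = child, i + 1      # descend one character
                continue
            child.add(key[i + 1:])          # store the remaining suffix
            if len(child) > self.burst_threshold:
                node.children[key[i]] = self._burst(child)
            return
        node.is_end = True                  # key fully consumed by trie nodes

    def _burst(self, bucket):
        # Replace a full bucket by a node; suffixes are redistributed into
        # new buckets keyed by their first character. A new bucket may still
        # be over the threshold; it is burst on the next insert into it.
        new_node = Node()
        for suffix in bucket:
            if suffix == "":
                new_node.is_end = True
            else:
                new_node.children.setdefault(suffix[0], set()).add(suffix[1:])
        return new_node

    def __contains__(self, key):
        node = self.root
        for i, ch in enumerate(key):
            child = node.children.get(ch)
            if child is None:
                return False
            if isinstance(child, set):
                return key[i + 1:] in child
            node = child
        return node.is_end

    def sorted_keys(self, node=None, prefix=""):
        # Sorted access: visit slots in character order, sorting each bucket
        # only when it is traversed.
        node = self.root if node is None else node
        if node.is_end:
            yield prefix
        for ch in sorted(node.children):
            child = node.children[ch]
            if isinstance(child, set):
                for suffix in sorted(child):
                    yield prefix + ch + suffix
            else:
                yield from self.sorted_keys(child, prefix + ch)
```

In this sketch the buckets are unordered, so lookups and inserts touch only one small hash set, while sorted traversal pays the sorting cost lazily per bucket. Whether this matches the trade-offs measured in the paper is not something the sketch can show.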

Keywords

Cache-conscious hash table, Burst trie, Strings, In-memory data structures, Judy trie, Space-efficient, Dynamic array, Scalable

Copyright information

© Springer-Verlag 2010