Engineering a Compressed Suffix Tree Implementation

  • Niko Välimäki
  • Wolfgang Gerlach
  • Kashyap Dixit
  • Veli Mäkinen
Conference paper

DOI: 10.1007/978-3-540-72845-0_17

Volume 4525 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Välimäki N., Gerlach W., Dixit K., Mäkinen V. (2007) Engineering a Compressed Suffix Tree Implementation. In: Demetrescu C. (eds) Experimental Algorithms. WEA 2007. Lecture Notes in Computer Science, vol 4525. Springer, Berlin, Heidelberg

Abstract

The suffix tree is one of the most important data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing those algorithms and applying them to real genomic sequences, main memory size often becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length n from alphabet Σ = {A,C,G,T} can be stored in n log|Σ| = 2n bits, its suffix tree occupies O(n log n) bits. In practice, the size difference easily reaches a factor of 50.
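To make the space comparison concrete, the following minimal sketch evaluates the two quantities for a 10 MB DNA text. It is an illustration only: the function names are hypothetical, n is assumed to be 10 · 2^20 symbols, and the n log n term is taken with constant 1, i.e. a single text position stored per symbol.

```python
def packed_text_bits(n: int, sigma: int) -> int:
    """Bits needed to store n symbols at ceil(log2(sigma)) bits each."""
    bits_per_symbol = (sigma - 1).bit_length()  # ceil(log2(sigma)) for sigma >= 1
    return n * bits_per_symbol


def position_array_bits(n: int) -> int:
    """The n log n term with constant 1: one ceil(log2(n))-bit position per symbol.

    Assumption: a real pointer-based suffix tree stores several such
    fields per node, so this is a lower-end estimate, not a measurement.
    """
    bits_per_position = (n - 1).bit_length()  # ceil(log2(n)) for n >= 1
    return n * bits_per_position


if __name__ == "__main__":
    n = 10 * 2**20  # assumed length of a 10 MB DNA text
    text_bits = packed_text_bits(n, sigma=4)
    index_bits = position_array_bits(n)
    print("packed text:", text_bits / 8 / 2**20, "MiB")
    print("one position per symbol:", index_bits / 8 / 2**20, "MiB")
    print("ratio:", index_bits / text_bits)
```

Even a single array of text positions is already an order of magnitude larger than the packed text here. A pointer-based suffix tree typically keeps several such fields per node, which is one way to read the larger practical factor quoted in the abstract.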

We report on an implementation of the compressed suffix tree very recently proposed by Sadakane (Theory of Computing Systems, in press). The compressed suffix tree occupies space proportional to the text size, i.e. O(n log|Σ|) bits, and supports all typical suffix tree operations with at most a log n factor slowdown. Our experiments show that, e.g. on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of a normal suffix tree. At the same time, a representative algorithm is slowed down by a factor of 30.

Our implementation follows the original proposal in spirit, but some internal parts are tailored towards practical implementation. Our construction algorithm runs in O(n log n log|Σ|) time and uses nearly the same space as the final structure while constructing it: on the 10 MB DNA sequence, the maximum space usage during construction is only 1.4 times the size of the final structure.

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Niko Välimäki
    • 1
  • Wolfgang Gerlach
    • 2
  • Kashyap Dixit
    • 3
  • Veli Mäkinen
    • 1
  1. Department of Computer Science, University of Helsinki, Finland
  2. Technische Fakultät, Universität Bielefeld, Germany
  3. Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India