Compressed Suffix Arrays for Massive Data

  • Jouni Sirén
Conference paper

DOI: 10.1007/978-3-642-03784-9_7

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5721)
Cite this paper as:
Sirén J. (2009) Compressed Suffix Arrays for Massive Data. In: Karlgren J., Tarhio J., Hyyrö H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg

Abstract

We present a fast space-efficient algorithm for constructing compressed suffix arrays (CSA). The algorithm requires O(n log n) time in the worst case, and only O(n) bits of extra space in addition to the CSA. As the basic step, we describe an algorithm for merging two CSAs. We show that the construction algorithm can be parallelized in a symmetric multiprocessor system, and discuss the possibility of a distributed implementation. We also describe a parallel implementation of the algorithm, capable of indexing several gigabytes per hour.
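The abstract gives no details of the merging step, so the following is only a toy illustration of what merging two suffix arrays means, not the paper's algorithm. It works on plain, uncompressed suffix arrays, materializes suffixes as strings, and uses quadratic space. The names `suffix_array` and `merge_suffix_arrays`, the `(text_id, offset)` output format, and the rule that ties go to the first text are all assumptions made for this sketch.

```python
from bisect import bisect_right


def suffix_array(text):
    """Naive suffix array: start offsets of all suffixes, in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def merge_suffix_arrays(a, sa_a, b, sa_b):
    """Merge two suffix arrays into one sorted list of (text_id, offset).

    Hypothetical helper for illustration; ties go to text "a".
    """
    # Materialize the sorted suffixes of `a` (quadratic space; fine for a toy).
    a_suffixes = [a[i:] for i in sa_a]

    # ranks[k] = number of suffixes of `a` that sort at or before the
    # k-th smallest suffix of `b`. Nondecreasing because sa_b is sorted.
    ranks = [bisect_right(a_suffixes, b[j:]) for j in sa_b]

    merged = []
    next_a = 0
    for j, rank in zip(sa_b, ranks):
        # Emit every suffix of `a` that precedes this suffix of `b`.
        while next_a < rank:
            merged.append(("a", sa_a[next_a]))
            next_a += 1
        merged.append(("b", j))
    # Remaining suffixes of `a` are larger than every suffix of `b`.
    merged.extend(("a", i) for i in sa_a[next_a:])
    return merged


a, b = "banana", "ananas"
merged = merge_suffix_arrays(a, suffix_array(a), b, suffix_array(b))
```

Here `merged` lists every suffix of both texts in lexicographic order, tagged with the text it came from. The point of the sketch is the shape of the computation: once the rank of each suffix of one text among the suffixes of the other is known, the two arrays can be interleaved in a single pass.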

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jouni Sirén
    • Department of Computer Science, University of Helsinki, Finland
