Compressed Suffix Arrays for Massive Data
- Cite this paper as:
- Sirén J. (2009) Compressed Suffix Arrays for Massive Data. In: Karlgren J., Tarhio J., Hyyrö H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg
We present a fast, space-efficient algorithm for constructing compressed suffix arrays (CSAs). The algorithm requires O(n log n) time in the worst case and only O(n) bits of extra space in addition to the CSA. As the basic step, we describe an algorithm for merging two CSAs. We show that the construction algorithm can be parallelized in a symmetric multiprocessor system, and discuss the possibility of a distributed implementation. We also describe a parallel implementation of the algorithm, capable of indexing several gigabytes per hour.
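To illustrate the idea of building an index by merging, the following is a minimal sketch that merges two plain (uncompressed) suffix arrays. It is not the paper's algorithm: it assumes direct string comparison and naive sorting, whereas the paper works on compressed structures. The function names `suffix_array` and `merge_suffix_arrays`, the `$` terminator, and the rule that ties go to the first text are all assumptions made for this example.

```python
from bisect import bisect_right


def suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def merge_suffix_arrays(a, sa_a, b, sa_b):
    """Merge the suffix arrays of texts a and b into one sorted list.

    Entries are (text_id, position) pairs, with text_id 0 for a and 1 for b.
    Equal suffixes are ordered with text a first (an assumption of this sketch).
    """
    # Suffixes of a in sorted order, materialized only to keep the sketch short.
    suffixes_a = [a[i:] for i in sa_a]

    # For each suffix of b (in sorted order), count the suffixes of a
    # that are less than or equal to it. These ranks are nondecreasing.
    ranks = [bisect_right(suffixes_a, b[j:]) for j in sa_b]

    # Interleave the two sorted sequences according to the ranks.
    merged = []
    next_a = 0
    for rank, j in zip(ranks, sa_b):
        while next_a < rank:
            merged.append((0, sa_a[next_a]))
            next_a += 1
        merged.append((1, j))
    merged.extend((0, i) for i in sa_a[next_a:])
    return merged


a, b = "banana$", "ananas$"
merged = merge_suffix_arrays(a, suffix_array(a), b, suffix_array(b))
```

The ranks of one text's suffixes among the other's are all that the interleaving step needs, which is why a merge can be organized around computing them. This sketch computes the ranks by binary search over explicit strings, which is far more expensive in space than the O(n) bits of extra space stated in the abstract.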