Constructing Compressed Suffix Arrays with Large Alphabets

  • Wing-Kai Hon
  • Tak-Wah Lam
  • Kunihiko Sadakane
  • Wing-Kin Sung
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2906)

Abstract

Recent research in compressing suffix arrays has resulted in two breakthrough indexing data structures, namely, compressed suffix arrays (CSA) [7] and the FM-index [5]. Either of them makes it feasible to store a full-text index in main memory even for text data with a few billion characters (such as human DNA). However, constructing such indexing data structures with limited working memory (i.e., without first constructing the suffix array) is not a trivial task. This paper addresses this problem. Currently, only CSA admits a space-efficient construction algorithm [15]. For a text T of length n over an alphabet Σ, this algorithm requires O(|Σ| n log n) time and (2H0 + 1 + ε)n bits of working space, where H0 is the 0-th order empirical entropy of T and ε is any non-zero constant. This algorithm is good enough when the alphabet size |Σ| is small. It is not practical for text data containing protein, Chinese or Japanese, where the alphabet may include up to a few thousand characters.
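To make the objects named above concrete, the following is a minimal sketch of a suffix array, the Ψ function that a CSA stores in compressed form, and the Burrows-Wheeler transform underlying the FM-index. It is an illustration only: it builds the full suffix array naively, which is exactly the memory cost the paper sets out to avoid, and it is not the construction algorithm of [15] or of this paper. The function names and the "$" end marker (assumed to be the unique smallest character) are assumptions made for the example.

```python
def suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def psi_function(sa):
    """Psi[i] is the rank of the suffix starting one position after SA[i].

    The last suffix (the end marker alone) wraps around to the whole text.
    """
    n = len(sa)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r
    return [rank[(pos + 1) % n] for pos in sa]


def bwt(text, sa):
    """Burrows-Wheeler transform: the character preceding each sorted suffix."""
    return "".join(text[pos - 1] for pos in sa)


def text_from_psi(psi, first_column):
    """Recover the text from Psi and the sorted characters of the text.

    Assumes the end marker is the unique smallest character, so rank 0 is
    the end marker's suffix and psi[0] is the rank of the whole text.
    """
    out = []
    i = psi[0]
    for _ in range(len(psi)):
        out.append(first_column[i])
        i = psi[i]
    return "".join(out)


text = "banana$"
sa = suffix_array(text)          # [6, 5, 3, 1, 0, 4, 2]
psi = psi_function(sa)           # [4, 0, 5, 6, 3, 1, 2]
print(bwt(text, sa))             # annb$aa
print(text_from_psi(psi, "".join(sorted(text))))  # banana$
```

The last call shows why Ψ together with the sorted character counts can stand in for the text and the suffix array: following Ψ visits the suffixes in text order.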

The main contribution of this paper is a new algorithm which can construct CSA in O(n log n) time using (H0 + 2 + ε)n bits of working space. Note that the running time of our algorithm is independent of the alphabet size, and the space requirement is smaller, since it is likely that H0 > 1. This paper also contributes to the space-efficient construction of the FM-index. We show that the FM-index can indeed be constructed from CSA directly in O(n) time.
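The space comparison can be checked numerically. The two bounds differ by (2H0 + 1 + ε)n − (H0 + 2 + ε)n = (H0 − 1)n bits, so the new bound is smaller exactly when H0 > 1. The sketch below computes H0 using the standard definition of zeroth-order empirical entropy and evaluates both expressions as stated in the abstract; the function names and the choice ε = 0.1 are assumptions for illustration.

```python
import math
from collections import Counter


def empirical_entropy_h0(text):
    """Zeroth-order empirical entropy in bits per character."""
    n = len(text)
    counts = Counter(text)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def working_space_bits(text, eps=0.1):
    """Evaluate both working-space expressions from the abstract.

    Returns (H0, bits for the earlier algorithm [15], bits for the new one).
    eps is a hypothetical choice of the constant epsilon.
    """
    n = len(text)
    h0 = empirical_entropy_h0(text)
    previous = (2 * h0 + 1 + eps) * n
    new = (h0 + 2 + eps) * n
    return h0, previous, new


# A uniform four-letter text has H0 = 2 bits per character.
h0, previous, new = working_space_bits("ACGT" * 4)
print(h0, previous - new)  # the difference equals (H0 - 1) * n
```

For a text with a few thousand roughly equally frequent characters, H0 is around 10 or more bits per character, so the saving of (H0 − 1)n bits grows with the alphabet size.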

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Wing-Kai Hon (1)
  • Tak-Wah Lam (1)
  • Kunihiko Sadakane (2)
  • Wing-Kin Sung (3)

  1. Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong
  2. Department of Computer Science and Communication Engineering, Kyushu University, Japan
  3. School of Computing, National University of Singapore, Singapore
