Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array

  • Kunihiko Sadakane
Conference paper

DOI: 10.1007/3-540-40996-3_35

Part of the Lecture Notes in Computer Science book series (LNCS, volume 1969)
Cite this paper as:
Sadakane K. (2000) Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array. In: Goos G., Hartmanis J., van Leeuwen J., Lee D.T., Teng SH. (eds) Algorithms and Computation. ISAAC 2000. Lecture Notes in Computer Science, vol 1969. Springer, Berlin, Heidelberg

Abstract

A compressed text database based on the compressed suffix array is proposed. The compressed suffix array of Grossi and Vitter occupies only \( O(n) \) bits for a text of length \( n \); however, it also uses the text itself, which occupies \( O(n\log |\Sigma |) \) bits for the alphabet \( \Sigma \). In contrast, our data structure does not use the text itself, and it supports the operations important for text databases: inverse, search and decompress. Our algorithms can find the \( occ \) occurrences of any substring \( P \) of the text in \( O(|P|\log n + occ\log ^\varepsilon n) \) time and decompress a part of the text of length \( l \) in \( O(l + \log ^\varepsilon n) \) time for any given \( 1 \geq \varepsilon > 0 \). Our data structure occupies only \( n\left(\frac{2}{\varepsilon}\left(\frac{3}{2} + H_0 + 2\log H_0\right) + 2 + \frac{4\log^\varepsilon n}{\log^\varepsilon n - 1}\right) + o(n) + O(|\Sigma|\log|\Sigma|) \) bits, where \( H_0 \leq \log|\Sigma| \) is the order-0 entropy of the text. We also show the relationship with the opportunistic data structure of Ferragina and Manzini.
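The idea of searching and decompressing without the text can be illustrated with a small uncompressed sketch. It is not the paper's data structure: the arrays are stored in full rather than succinctly, the suffix array is built naively, and all names (`build_index`, `first_char`, `decompress`, `search`) and the `$` terminator are assumptions made for illustration. It only shows that the successor function Ψ, together with the per-symbol starting ranks, is enough to recover any part of the text and to binary-search for a pattern in \( O(\log n) \) comparisons of up to \( |P| \) characters each.

```python
from bisect import bisect_right


def build_index(text):
    """Toy index; text must end with a unique, smallest terminator such as '$'."""
    n = len(text)
    # Naive suffix array: sa[rank] = start position of the rank-th smallest suffix.
    sa = sorted(range(n), key=lambda i: text[i:])
    # Inverse suffix array: inv[pos] = rank of the suffix starting at pos.
    inv = [0] * n
    for rank, pos in enumerate(sa):
        inv[pos] = rank
    # psi[rank] = rank of the suffix starting one position later (wraps at the end).
    psi = [inv[(sa[rank] + 1) % n] for rank in range(n)]
    # starts[k] = first rank whose suffix begins with symbols[k].
    symbols = sorted(set(text))
    starts, total = [], 0
    for c in symbols:
        starts.append(total)
        total += text.count(c)
    # The text itself is not kept in the index.
    return {"sa": sa, "inv": inv, "psi": psi, "symbols": symbols, "starts": starts}


def first_char(index, rank):
    """First character of the suffix with the given rank, from symbol counts only."""
    return index["symbols"][bisect_right(index["starts"], rank) - 1]


def decode(index, rank, length):
    """Read `length` characters by following psi from `rank`."""
    out = []
    for _ in range(length):
        out.append(first_char(index, rank))
        rank = index["psi"][rank]
    return "".join(out)


def decompress(index, start, length):
    """Recover text[start:start+length] without access to the text."""
    return decode(index, index["inv"][start], length)


def search(index, pattern):
    """Return sorted start positions of pattern (pattern must not contain '$')."""
    n, m = len(index["psi"]), len(pattern)
    lo, hi = 0, n
    while lo < hi:  # first rank whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if decode(index, mid, m) < pattern:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, n
    while lo < hi:  # first rank whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if decode(index, mid, m) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(index["sa"][r] for r in range(left, lo))
```

For example, with `index = build_index("abracadabra$")`, calling `decompress(index, 4, 3)` reads three characters starting at position 4, and `search(index, "abra")` reports the positions where `abra` occurs. The sketch keeps `sa` and `inv` in full, so unlike the structure described in the abstract it makes no attempt to save space.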


Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Kunihiko Sadakane
  1. Department of System Information Sciences, Graduate School of Information Sciences, Tohoku University, Japan
