Succinct Text Indexes on Large Alphabet

  • Meng Zhang
  • Jijun Tang
  • Dong Guo
  • Liang Hu
  • Qiang Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3959)

Abstract

In this paper, we first consider some properties of strings that have the same suffix array. Next, we design a data structure that supports rank and select operations over an alphabet Σ using n log|Σ| + o(n log|Σ|) bits, in O(log|Σ|) time per operation, for a text of length n. It also supports an extended rank, \(\mathrm{rank}^{\leq}\), such that \(\mathrm{rank}^{\leq}_{\alpha}(T,i)\) returns the number of letters in string T that are smaller than α, plus the number of occurrences of α up to position i; this operation also runs in O(log|Σ|) time. Using this structure, we implement the DAWG (directed acyclic word graph) succinctly. The main structure takes only n log|Σ| + o(n log|Σ|) bits and supports the basic operations of the DAWG efficiently.
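
To make the three operations concrete, the following is a minimal reference sketch of their semantics only. It is not the succinct structure described in the abstract: it scans the text in linear time and uses no compressed representation. The function names `rank`, `select`, and `rank_le`, and the choice of 1-based, inclusive positions, are assumptions made for illustration.

```python
def rank(text, alpha, i):
    """Occurrences of alpha among the first i symbols of text.

    Assumption: positions are 1-based and i is inclusive.
    """
    return text[:i].count(alpha)


def select(text, alpha, k):
    """1-based position of the k-th occurrence of alpha, or None if absent."""
    seen = 0
    for pos, ch in enumerate(text, start=1):
        if ch == alpha:
            seen += 1
            if seen == k:
                return pos
    return None


def rank_le(text, alpha, i):
    """Extended rank: letters smaller than alpha anywhere in text,
    plus occurrences of alpha among the first i symbols."""
    smaller = sum(1 for ch in text if ch < alpha)
    return smaller + rank(text, alpha, i)


if __name__ == "__main__":
    T = "abracadabra"
    print(rank(T, "a", 4))
    print(select(T, "a", 3))
    print(rank_le(T, "b", 3))
```

For T = "abracadabra", there are five letters smaller than "b" (the five occurrences of "a") and one "b" among the first three symbols, so the extended rank of "b" at position 3 is 6. A succinct implementation would answer the same queries without scanning the text.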

Keywords

Function Rank, Binary String, Suffix Array, Select Operation, Large Alphabet
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Meng Zhang (1)
  • Jijun Tang (2)
  • Dong Guo (1)
  • Liang Hu (1)
  • Qiang Li (1)

  1. College of Computer Science and Technology, Jilin University, Changchun, China
  2. Department of Computer Science and Engineering, University of South Carolina, USA
