Permuted Longest-Common-Prefix Array

  • Juha Kärkkäinen
  • Giovanni Manzini
  • Simon J. Puglisi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5577)

Abstract

The longest-common-prefix (LCP) array is an adjunct to the suffix array that allows many string processing problems to be solved in optimal time and space. Its construction is a bottleneck in practice, taking almost as long as suffix array construction. In this paper, we describe algorithms for constructing the permuted LCP (PLCP) array in which the values appear in position order rather than lexicographical order. Using the PLCP array, we can either construct or simulate the LCP array. We obtain a family of algorithms including the fastest known LCP construction algorithm and some extremely space efficient algorithms. We also prove a new combinatorial property of the LCP values.
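To make the relationship between the two arrays concrete, here is a minimal sketch of one well-known way to build a PLCP array in position order and then derive the LCP array from it. It is an illustration under stated assumptions, not necessarily the exact algorithm of the paper: the function names (`suffix_array`, `plcp_array`, `lcp_from_plcp`) and the naive suffix sorting are hypothetical choices made for brevity. The sketch relies on the standard fact that the PLCP value at text position i+1 is at least the value at position i minus one, so the scan never re-compares characters it has already matched.

```python
def suffix_array(text):
    # Naive construction by sorting all suffixes; for illustration only,
    # far too slow for large inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def plcp_array(text, sa):
    # plcp[i] = length of the longest common prefix of the suffix starting
    # at text position i and the suffix that precedes it in the suffix array.
    n = len(text)
    # phi[i] = start of the lexicographic predecessor of suffix i,
    # or -1 for the lexicographically smallest suffix.
    phi = [-1] * n
    for rank in range(1, n):
        phi[sa[rank]] = sa[rank - 1]

    plcp = [0] * n
    matched = 0
    for i in range(n):
        j = phi[i]
        if j < 0:
            matched = 0  # no predecessor, so the value is 0 by convention
        else:
            while (i + matched < n and j + matched < n
                   and text[i + matched] == text[j + matched]):
                matched += 1
        plcp[i] = matched
        # The next position's value is at least matched - 1.
        matched = max(matched - 1, 0)
    return plcp


def lcp_from_plcp(plcp, sa):
    # Permute the position-ordered values into lexicographic order.
    return [plcp[pos] for pos in sa]


text = "banana"
sa = suffix_array(text)
plcp = plcp_array(text, sa)
lcp = lcp_from_plcp(plcp, sa)
print(sa, plcp, lcp)
```

Here `lcp[r]` is simply `plcp[sa[r]]`, so the LCP array can either be materialised as above or simulated on demand by looking up `plcp[sa[r]]` whenever a value is needed.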

Keywords

Lexicographical Order, Construction Algorithm, Primary Memory, Extra Space, Sparse Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Juha Kärkkäinen (1)
  • Giovanni Manzini (2)
  • Simon J. Puglisi (3)
  1. Department of Computer Science, University of Helsinki, Finland
  2. Department of Computer Science, University of Eastern Piedmont, Italy
  3. School of Computer Science and Information Technology, Royal Melbourne Institute of Technology, Australia