Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array

  • German Tischler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9954)

Abstract

The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length n can be represented as an array of length n words, or, in the presence of the SA, as a bit vector of 2n bits plus asymptotically negligible support data structures. External memory construction algorithms for the LCP array have been proposed, but those proposed so far have a space requirement of O(n) words (i.e. \(O(n \log n)\) bits) in external memory. This space requirement is in some practical cases prohibitively expensive. We present an external memory algorithm for constructing the 2n-bit version of the LCP array which uses \(O(n \log \sigma )\) bits of additional space in external memory when given a (compressed) BWT with alphabet size \(\sigma \) and a sampled inverse suffix array at sampling rate \(O(\log n)\). This is often a significant space gain in practice, where \(\sigma \) is usually much smaller than n or even constant. The algorithm has average run-time \(O(n\log n\log \sigma )\) and worst-case run-time \(O(n^2\log \sigma )\). It can be improved to \(O(n\log ^2 n\log \sigma )\) worst-case time while keeping the same space bound in external memory if \(O(n / \log n)\) bits of internal memory are available. We also present experimental data showing that our approach is practical.
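
To make the 2n-bit representation concrete, the following is a minimal in-memory sketch in Python. It is not the external memory algorithm of the paper. It relies on the well-known property that the permuted LCP array (LCP values stored in text order rather than suffix array order) satisfies PLCP[i] ≥ PLCP[i−1] − 1, so the values PLCP[i] + 2i are strictly increasing and can be marked as 1 bits in a vector of at most 2n bits. The function names are hypothetical, and the suffix array is built by naive sorting purely for illustration.

```python
def suffix_array(s):
    # Naive construction by sorting suffixes; fine for a small illustration only.
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_length(a, b):
    # Length of the longest common prefix of two strings.
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def plcp_array(s):
    # PLCP[i] is the LCP of suffix i with its predecessor in suffix array order.
    sa = suffix_array(s)
    plcp = [0] * len(s)
    for r in range(1, len(sa)):
        plcp[sa[r]] = common_prefix_length(s[sa[r]:], s[sa[r - 1]:])
    return plcp


def encode_plcp(plcp):
    # Set bit number PLCP[i] + 2i for every i. These positions are strictly
    # increasing, and the last one is at most 2n - 1, so at most 2n bits are used.
    if not plcp:
        return []
    positions = [v + 2 * i for i, v in enumerate(plcp)]
    bits = [0] * (positions[-1] + 1)
    for p in positions:
        bits[p] = 1
    return bits


def decode_plcp(bits):
    # PLCP[i] = select1(i) - 2i; here select is emulated by a linear scan.
    ones = [p for p, b in enumerate(bits) if b]
    return [p - 2 * i for i, p in enumerate(ones)]


text = "banana"
plcp = plcp_array(text)
bits = encode_plcp(plcp)
print(plcp)
print("".join(map(str, bits)))
```

In a succinct implementation the linear scan in `decode_plcp` would be replaced by a constant-time select structure, which is the "asymptotically negligible support data structure" the abstract refers to.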

Keywords

Space Requirement, External Memory, Space Usage, Suffix Tree, Alphabet Size
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
