Extended suffix array construction using Lyndon factors

Sādhanā

Abstract

In this paper, we extend the novel approach of incremental suffix array construction using Lyndon factorization to the construction of the extended suffix array, that is, the suffix array together with its corresponding longest common prefix (LCP) array. The main motivation for constructing the suffix array and the LCP array incrementally and simultaneously is that both involve computing order information from the common prefixes of the suffixes. The local suffixes of a Lyndon factor, once sorted, keep the same relative order when they are merged with the sorted suffixes of another Lyndon factor. Because the local and global sorted orders coincide, merging Lyndon factors reduces to merging two sorted lists of their suffixes. Constructing the LCP array incrementally at the same time saves a lot of computation and hence time. The proposed approach has quadratic run time, and its disk working space requirement is O(n). Experiments also show that our approach is faster than the existing method of incremental construction.
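
The order-preservation property that the abstract relies on can be illustrated with a minimal Python sketch. This is not the authors' incremental algorithm: it computes the Lyndon factorization with Duval's algorithm, builds the suffix array and LCP array naively, and then checks that the suffixes starting inside each Lyndon factor appear in the global suffix array in the same relative order as that factor's local suffixes. All function names are hypothetical and chosen for illustration only.

```python
def lyndon_factors(s):
    """Duval's algorithm: return (start, end) pairs of the Lyndon factors of s."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            # Restart the comparison at i on a strict increase, else advance.
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit every repetition of the Lyndon word of length j - k.
        while i <= k:
            factors.append((i, i + j - k))
            i += j - k
    return factors


def suffix_array(s):
    # Naive construction, for illustration only (assumption: small inputs).
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # lcp[r] is the longest common prefix length of suffixes sa[r-1] and sa[r].
    lcp = [0] * len(sa)
    for r in range(1, len(sa)):
        a, b, h = sa[r - 1], sa[r], 0
        while a + h < len(s) and b + h < len(s) and s[a + h] == s[b + h]:
            h += 1
        lcp[r] = h
    return lcp


def local_order_is_preserved(s):
    """Check that each factor's local suffix order matches the global order."""
    sa = suffix_array(s)
    for start, end in lyndon_factors(s):
        local = [start + i for i in suffix_array(s[start:end])]
        global_order = [p for p in sa if start <= p < end]
        if local != global_order:
            return False
    return True


text = "banana"
sa = suffix_array(text)
print(lyndon_factors(text))   # factors b, an, an, a as index pairs
print(sa, lcp_array(text, sa))
print(local_order_is_preserved(text))
```

Because the local order never changes, an incremental method only needs to interleave two already sorted lists rather than re-sort the suffixes of each factor, which is the merging step the abstract describes.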

Author information

Corresponding author

Correspondence to Sunita.

About this article

Cite this article

Sunita, Garg, D. Extended suffix array construction using Lyndon factors. Sādhanā 43, 133 (2018). https://doi.org/10.1007/s12046-018-0832-z

  • DOI: https://doi.org/10.1007/s12046-018-0832-z
