Abstract
In this paper, we extend the incremental suffix array construction approach based on Lyndon factorization to the construction of the extended suffix array, i.e., the suffix array together with its longest common prefix (LCP) array. The main motivation for building the suffix array and the LCP array incrementally and simultaneously is that both derive ordering information from the common prefixes of suffixes. Once the local suffixes of a Lyndon factor are sorted, they keep the same relative order when merged with the sorted suffixes of another Lyndon factor, so merging two Lyndon factors reduces to merging two sorted lists of their suffixes. Computing the LCP array incrementally, alongside the suffix array, saves considerable computation and hence time. The proposed approach runs in quadratic time and requires O(n) disk working space. Experiments show that it is faster than the existing incremental construction method.
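The merge-based idea can be illustrated with a small Python sketch. It is not the authors' algorithm: it assumes Duval's algorithm for the Lyndon factorization, sorts each factor's positions naively by their full suffixes of the text, and compares full suffixes during every merge instead of exploiting the local-suffix ordering property. The helper names (`lyndon_factorization`, `lcp_direct`, `merge`, `extended_suffix_array`) are hypothetical. What it does show is incremental LCP reuse: when two consecutive entries of the merged list come from the same input list, they were already adjacent there, so their LCP value is copied rather than recomputed.

```python
def lyndon_factorization(w: str) -> list[tuple[int, int]]:
    """Duval's algorithm: half-open (start, end) bounds of the Lyndon factors of w."""
    n, i, factors = len(w), 0, []
    while i < n:
        j, k = i + 1, i
        while j < n and w[k] <= w[j]:
            k = i if w[k] < w[j] else k + 1
            j += 1
        while i <= k:  # emit each copy of the Lyndon word of length j - k
            factors.append((i, i + j - k))
            i += j - k
    return factors


def lcp_direct(w: str, a: int, b: int) -> int:
    """Longest common prefix of the suffixes w[a:] and w[b:], by direct comparison."""
    l = 0
    while a + l < len(w) and b + l < len(w) and w[a + l] == w[b + l]:
        l += 1
    return l


def merge(w, sa1, lcp1, sa2, lcp2):
    """Merge two lists of suffix positions, each sorted by full-suffix order of w.

    Convention: lcpX[0] = 0 and lcpX[t] = LCP(saX[t-1], saX[t]).
    """
    sa, lcp = [], []
    i = j = 0
    last = None  # which input list (1 or 2) produced the previous output entry
    while i < len(sa1) or j < len(sa2):
        if j == len(sa2) or (i < len(sa1) and w[sa1[i]:] < w[sa2[j]:]):
            src, idx, pos, stored = 1, i, sa1[i], lcp1
            i += 1
        else:
            src, idx, pos, stored = 2, j, sa2[j], lcp2
            j += 1
        if not sa:
            lcp.append(0)
        elif src == last:
            lcp.append(stored[idx])  # adjacent in the source list: reuse its LCP
        else:
            lcp.append(lcp_direct(w, sa[-1], pos))  # crossing lists: compute it
        sa.append(pos)
        last = src
    return sa, lcp


def extended_suffix_array(w: str) -> tuple[list[int], list[int]]:
    """Build (SA, LCP) of w by merging in the suffixes of one Lyndon factor at a time."""
    sa, lcp = [], []
    for start, end in lyndon_factorization(w):
        # Naive local step (assumption): sort this factor's positions by full suffix.
        local = sorted(range(start, end), key=lambda p: w[p:])
        local_lcp = [0] + [lcp_direct(w, local[t - 1], local[t])
                           for t in range(1, len(local))]
        sa, lcp = merge(w, sa, lcp, local, local_lcp)
    return sa, lcp


sa, lcp = extended_suffix_array("banana")  # Lyndon factors: b, an, an, a
print(sa)   # [5, 3, 1, 0, 4, 2]
print(lcp)  # [0, 1, 3, 0, 0, 2]
```

The sketch is meant for small inputs only: it keeps everything in memory, uses string slicing for comparisons, and makes no attempt to match the stated time or disk-space bounds.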
Cite this article
Sunita, Garg, D. Extended suffix array construction using Lyndon factors. Sādhanā 43, 133 (2018). https://doi.org/10.1007/s12046-018-0832-z