Extended suffix array construction using Lyndon factors

Sunita; Garg, Deepak

doi:10.1007/s12046-018-0832-z

Published: 05 July 2018

Volume 43, article number 133 (2018)

Sādhanā

Sunita¹ & Deepak Garg²

Abstract

In this paper, we extend the novel approach of incremental suffix array construction using Lyndon factorization to the construction of the extended suffix array, that is, the suffix array together with the corresponding longest common prefix (LCP) array. The main motivation for constructing the suffix array and the LCP array incrementally and simultaneously is that both computations derive order information from the common prefixes of the suffixes. Local suffixes of a Lyndon factor, once sorted, keep the same relative order when they are merged with the sorted suffixes of another Lyndon factor. Because the two sorted orders coincide, merging Lyndon factors reduces to merging two sorted lists of their suffixes. Constructing the LCP array simultaneously saves a substantial amount of computation and hence time. The proposed approach has quadratic running time and requires O(n) disk working space. Experiments also show that our approach runs faster than the existing method of incremental construction.
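To make the general idea concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. It factorizes the text with Duval's algorithm, then processes the Lyndon factors from right to left: the suffixes starting inside each factor are sorted naively and merged with the already sorted suffixes, and each LCP entry is filled in as a suffix is emitted by the merge. All function names (`lyndon_factor_starts`, `lcp_len`, `merge_with_lcp`, `extended_suffix_array`) are hypothetical. Unlike the approach described above, which relies on the sorted order of local suffixes being preserved, this sketch compares global suffixes directly during each merge, so it is only an illustration and makes no claim to the stated complexity or space bounds.

```python
from typing import List, Tuple


def lyndon_factor_starts(s: str) -> List[int]:
    """Duval's algorithm: start index of each Lyndon factor of s, left to right."""
    starts: List[int] = []
    n, i = len(s), 0
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:  # emit each copy of the Lyndon word of length j - k
            starts.append(i)
            i += j - k
    return starts


def lcp_len(s: str, a: int, b: int) -> int:
    """Length of the longest common prefix of suffixes s[a:] and s[b:] (naive)."""
    n, h = len(s), 0
    while a + h < n and b + h < n and s[a + h] == s[b + h]:
        h += 1
    return h


def merge_with_lcp(s: str, left: List[int], right: List[int]) -> Tuple[List[int], List[int]]:
    """Merge two suffix lists, each sorted by global suffix order, filling LCP as we go."""
    n = len(s)
    sa: List[int] = []
    lcp: List[int] = []
    i = j = 0
    while i < len(left) or j < len(right):
        if j == len(right):
            take_left = True
        elif i == len(left):
            take_left = False
        else:
            a, b = left[i], right[j]
            h = lcp_len(s, a, b)
            # s[a:] is smaller if it ends first or has the smaller mismatching character
            take_left = a + h == n or (b + h < n and s[a + h] < s[b + h])
        if take_left:
            x, i = left[i], i + 1
        else:
            x, j = right[j], j + 1
        # lcp[r] = LCP of the suffixes at ranks r-1 and r; lcp[0] = 0 by convention
        lcp.append(lcp_len(s, sa[-1], x) if sa else 0)
        sa.append(x)
    return sa, lcp


def extended_suffix_array(s: str) -> Tuple[List[int], List[int]]:
    """Build (SA, LCP) by merging in the suffixes of one Lyndon factor at a time."""
    starts = lyndon_factor_starts(s)
    ends = starts[1:] + [len(s)]
    sa: List[int] = []
    lcp: List[int] = []
    # Right to left: after each step, (sa, lcp) covers every suffix starting at or after st.
    for st, en in reversed(list(zip(starts, ends))):
        local = sorted(range(st, en), key=lambda p: s[p:])  # naive sort of this factor's suffixes
        sa, lcp = merge_with_lcp(s, local, sa)
    return sa, lcp


print(extended_suffix_array("banana"))  # ([5, 3, 1, 0, 4, 2], [0, 1, 3, 0, 0, 2])
```

For "banana" the Lyndon factors are "b", "an", "an", "a". Processing them right to left means that after each merge the pair (`sa`, `lcp`) is already a valid extended suffix array of the suffix of the text processed so far, which mirrors the incremental flavour of the construction; the naive LCP recomputation after every emitted suffix is the price this sketch pays for simplicity.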

Author information

Authors and Affiliations

Computer Science and Engineering, Thapar University, Patiala, 147004, India
Sunita
Computer Science Engineering Department, Bennett University, Greater Noida, 201310, India
Deepak Garg

Corresponding author

Correspondence to Sunita.

About this article

Cite this article

Sunita, Garg, D. Extended suffix array construction using Lyndon factors. Sādhanā 43, 133 (2018). https://doi.org/10.1007/s12046-018-0832-z

Received: 24 May 2017
Revised: 31 October 2017
Accepted: 04 January 2018
Published: 05 July 2018
DOI: https://doi.org/10.1007/s12046-018-0832-z
