Alphabet-Independent Algorithms for Finding Context-Sensitive Repeats in Linear Time

Ohlebusch, Enno; Beller, Timo

doi:10.1007/978-3-319-11918-2_12

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8799)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

The identification of repetitive sequences (repeats) is an essential component of genome sequence analysis, and there are dozens of algorithms that search for exact or approximate repeats. The notions of maximal and supermaximal (exact) repeats have received special attention, and it is possible to simultaneously compute them on index data structures like the suffix tree or the enhanced suffix array. Very recently, this research has been extended in two directions. Gallé and Tealdi devised an alphabet-independent linear-time algorithm that finds all context-diverse repeats (which subsume maximal and supermaximal repeats as special cases), while Taillefer and Miller gave a quadratic-time algorithm that simultaneously computes and classifies maximal, near-supermaximal, and supermaximal repeats. In this paper, we provide new alphabet-independent linear-time algorithms for both tasks.
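To make the terminology concrete, here is a minimal brute-force sketch of the standard definitions: a repeat is a substring that occurs at least twice, a maximal repeat cannot be extended to the left or to the right without losing an occurrence, and a supermaximal repeat is not a substring of any other repeat. This is not the linear-time method of the paper. It enumerates all substrings and is meant only for tiny inputs. The function names and the example string are assumptions chosen for illustration.

```python
from collections import defaultdict


def repeats(text):
    """Map every substring that occurs at least twice to its start positions."""
    occ = defaultdict(list)
    n = len(text)
    for i in range(n):
        for j in range(i + 1, n + 1):
            occ[text[i:j]].append(i)
    return {s: pos for s, pos in occ.items() if len(pos) >= 2}


def maximal_repeats(text):
    """Repeats with at least two distinct left and two distinct right contexts."""
    result = set()
    for s, pos in repeats(text).items():
        # A string boundary is modelled as a unique sentinel, so an occurrence
        # at the start or end of the text counts as its own distinct context.
        left = {text[i - 1] if i > 0 else ("^", i) for i in pos}
        right = {
            text[i + len(s)] if i + len(s) < len(text) else ("$", i)
            for i in pos
        }
        if len(left) > 1 and len(right) > 1:
            result.add(s)
    return result


def supermaximal_repeats(text):
    """Repeats that are not a proper substring of any other repeat."""
    reps = repeats(text)
    return {s for s in reps if not any(s != t and s in t for t in reps)}


example = "xabcyabczabw"  # hypothetical toy input
print(sorted(maximal_repeats(example)))       # ['ab', 'abc']
print(sorted(supermaximal_repeats(example)))  # ['abc']
```

In the toy input, "ab" occurs three times with differing contexts on both sides, so it is maximal, but it is contained in the repeat "abc" and is therefore not supermaximal.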

References

Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms 2, 53–86 (2004)
Becher, V., Deymonnaz, A., Heiber, P.: Efficient computation of all perfect repeats in genomic sequences of up to half a gigabyte, with a case study on the human genome. Bioinformatics 25(14), 1746–1753 (2009)
Beller, T., Berger, K., Ohlebusch, E.: Space-efficient computation of maximal and supermaximal repeats in genome sequences. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 99–110. Springer, Heidelberg (2012)
Beller, T., Gog, S., Ohlebusch, E., Schnattinger, T.: Computing the longest common prefix array based on the Burrows-Wheeler transform. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 197–208. Springer, Heidelberg (2011)
Bose, R.P.J.C., van der Aalst, W.M.P.: Trace clustering based on conserved patterns: Towards achieving better process models. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. LNBIP, vol. 43, pp. 170–181. Springer, Heidelberg (2010)
Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Research Report 124, Digital Systems Research Center (1994)
Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM Journal on Computing 40(2), 465–492 (2011)
Fischer, J., Heun, V., Kramer, S.: Optimal string mining under frequency constraints. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 139–150. Springer, Heidelberg (2006)
Franěk, F., Smyth, W.F., Tang, Y.: Computing all repeats using suffix arrays. Journal of Automata, Languages and Combinatorics 8(4), 579–591 (2003)
Gallé, M., Tealdi, M.: On context-diverse repeats and their incremental computation. In: Dediu, A.-H., Martín-Vide, C., Sierra-Rodríguez, J.-L., Truthe, B. (eds.) LATA 2014. LNCS, vol. 8370, pp. 384–395. Springer, Heidelberg (2014)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press (1997)
Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing 13, 338–355 (1984)
Hui, L.C.K.: Color set size problem with applications to string matching. In: Apostolico, A., Galil, Z., Manber, U., Crochemore, M. (eds.) CPM 1992. LNCS, vol. 644, pp. 230–243. Springer, Heidelberg (1992)
Kärkkäinen, J., Manzini, G., Puglisi, S.J.: Permuted longest-common-prefix array. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009 Lille. LNCS, vol. 5577, pp. 181–192. Springer, Heidelberg (2009)
Kasai, T., Lee, G.H., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)
Külekci, M.O., Vitter, J.S., Xu, B.: Efficient maximal repeat finding using the Burrows-Wheeler transform and wavelet tree. IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(2), 421–429 (2012)
Lian, C.N., Halachev, M., Shiri, N.: Searching for supermaximal repeats in large DNA sequences. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds.) BIRD 2008. CCIS, vol. 13, pp. 87–101. Springer, Heidelberg (2008)
Narisawa, K., Inenaga, S., Bannai, H., Takeda, M.: Efficient computation of substring equivalence classes with suffix arrays. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 340–351. Springer, Heidelberg (2007)
Ohlebusch, E.: Bioinformatics Algorithms: Sequence Analysis, Genome Rearrangements, and Phylogenetic Reconstruction. Oldenbusch-Verlag (2013)
Prieur, E., Lecroq, T.: On-line construction of compact suffix vectors and maximal repeats. Theoretical Computer Science 407(1-3), 290–301 (2008)
Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Computing Surveys 39(2), article 4 (2007)
Puglisi, S.J., Smyth, W.F., Yusufu, M.: Fast, practical algorithms for computing all the repeats in a string. Mathematics in Computer Science 3(4), 373–389 (2010)
Raffinot, M.: On maximal repeats in strings. Information Processing Letters 80(3), 165–169 (2001)
Taillefer, E., Miller, J.: Exhaustive computation of exact duplications via super and non-nested local maximal repeats. Journal of Bioinformatics and Computational Biology 12(1), article 1350018 (2014)

Author information

Authors and Affiliations

Institute of Theoretical Computer Science, University of Ulm, D-89069, Ulm, Germany
Enno Ohlebusch & Timo Beller

Editor information

Editors and Affiliations

Instituto de Computação, Universidade Federal do Amazonas, 6200, Manaus, Brazil
Edleno Moura
King’s College London, UK
Maxime Crochemore

About this paper

Cite this paper

Ohlebusch, E., Beller, T. (2014). Alphabet-Independent Algorithms for Finding Context-Sensitive Repeats in Linear Time. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_12

DOI: https://doi.org/10.1007/978-3-319-11918-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11917-5
Online ISBN: 978-3-319-11918-2
eBook Packages: Computer Science, Computer Science (R0)
