Skip to main content

Alphabet-Independent Algorithms for Finding Context-Sensitive Repeats in Linear Time

  • Conference paper
String Processing and Information Retrieval (SPIRE 2014)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8799))

Included in the following conference series:

  • 624 Accesses

Abstract

The identification of repetitive sequences (repeats) is an essential component of genome sequence analysis, and there are dozens of algorithms that search for exact or approximate repeats. The notions of maximal and supermaximal (exact) repeats have received special attention, and it is possible to simultaneously compute them on index data structures like the suffix tree or the enhanced suffix array. Very recently, this research has been extended in two directions. Gallé and Tealdi devised an alphabet-independent linear-time algorithm that finds all context-diverse repeats (which subsume maximal and supermaximal repeats as special cases), while Taillefer and Miller gave a quadratic-time algorithm that simultaneously computes and classifies maximal, near-supermaximal, and supermaximal repeats. In this paper, we provide new alphabet-independent linear-time algorithms for both tasks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms 2, 53–86 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  2. Becher, V., Deymonnaz, A., Heiber, P.: Efficient computation of all perfect repeats in genomic sequences of up to half a gigabyte, with a case study on the human genome. Bioinformatics 25(14), 1746–1753 (2009)

    Article  Google Scholar 

  3. Beller, T., Berger, K., Ohlebusch, E.: Space-efficient computation of maximal and supermaximal repeats in genome sequences. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 99–110. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  4. Beller, T., Gog, S., Ohlebusch, E., Schnattinger, T.: Computing the longest common prefix array based on the Burrows-Wheeler transform. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 197–208. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  5. Bose, R.P.J.C., van der Aalst, W.M.P.: Trace clustering based on conserved patterns: Towards achieving better process models. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. LNBIP, vol. 43, pp. 170–181. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  6. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Research Report 124, Digital Systems Research Center (1994)

    Google Scholar 

  7. Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM Journal on Computing 40(2), 465–492 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  8. Fischer, J., Heun, V., Kramer, S.: Optimal string mining under frequency constraints. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 139–150. Springer, Heidelberg (2006)

    Google Scholar 

  9. Franěk, F., Smyth, W.F., Tang, Y.: Computing all repeats using suffix arrays. Journal of Automata, Languages and Combinatorics 8(4), 579–591 (2003)

    MathSciNet  MATH  Google Scholar 

  10. Gallé, M., Tealdi, M.: On context-diverse repeats and their incremental computation. In: Dediu, A.-H., Martín-Vide, C., Sierra-Rodríguez, J.-L., Truthe, B. (eds.) LATA 2014. LNCS, vol. 8370, pp. 384–395. Springer, Heidelberg (2014)

    Chapter  Google Scholar 

  11. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press (1997)

    Google Scholar 

  12. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing 13, 338–355 (1984)

    Article  MathSciNet  MATH  Google Scholar 

  13. Hui, L.C.K.: Color set size problem with applications to string matching. In: Apostolico, A., Galil, Z., Manber, U., Crochemore, M. (eds.) CPM 1992. LNCS, vol. 644, pp. 230–243. Springer, Heidelberg (1992)

    Chapter  Google Scholar 

  14. Kärkkäinen, J., Manzini, G., Puglisi, S.J.: Permuted longest-common-prefix array. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009 Lille. LNCS, vol. 5577, pp. 181–192. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  15. Kasai, T., Lee, G.H., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)

    Chapter  Google Scholar 

  16. Külekci, M.O., Vitter, J.S., Xu, B.: Efficient maximal repeat finding using the Burrows-Wheeler transform and wavelet tree. IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(2), 421–429 (2012)

    Article  Google Scholar 

  17. Lian, C.N., Halachev, M., Shiri, N.: Searching for supermaximal repeats in large DNA sequences. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds.) BIRD 2008. CCIS, vol. 13, pp. 87–101. Springer, Heidelberg (2008)

    Google Scholar 

  18. Narisawa, K., Inenaga, S., Bannai, H., Takeda, M.: Efficient computation of substring equivalence classes with suffix arrays. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 340–351. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  19. Ohlebusch, E.: Bioinformatics Algorithms: Sequence Analysis, Genome Rearrangements, and Phylogenetic Reconstruction. Oldenbusch-Verlag (2013)

    Google Scholar 

  20. Prieur, E., Lecroq, T.: On-line construction of compact suffix vectors and maximal repeats. Theoretical Computer Science 407(1-3), 290–301 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  21. Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Computing Surveys 39(2), article 4 (2007)

    Google Scholar 

  22. Puglisi, S.J., Smyth, W.F., Yusufu, M.: Fast, practical algorithms for computing all the repeats in a string. Mathematics in Computer Science 3(4), 373–389 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  23. Raffinot, M.: On maximal repeats in strings. Information Processing Letters 80(3), 165–169 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  24. Taillefer, E., Miller, J.: Exhaustive computation of exact duplications via super and non-nested local maximal repeats. Journal of Bioinformatics and Computational Biology 12(1), article 1350018 (2014)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ohlebusch, E., Beller, T. (2014). Alphabet-Independent Algorithms for Finding Context-Sensitive Repeats in Linear Time. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-11918-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11917-5

  • Online ISBN: 978-3-319-11918-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics