Encyclopedia of Algorithms

Editors: Ming-Yang Kao

Approximate Tandem Repeats

2001; Landau, Schmidt, Sokol
2003; Kolpakov, Kucherov
  • Gregory Kucherov
  • Dina Sokol
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30162-4_24

Keywords and Synonyms

Approximate repetitions; Approximate periodicities

Problem Definition

Identification of periodic structures in words (variants of which are known as tandem repeats, repetitions, powers or runs) is a fundamental algorithmic task (see entry Squares and Repetitions). In many practical applications, such as DNA sequence analysis, the repetitions considered admit some variation between copies of the repeated pattern. In other words, the repetitions of interest are approximate tandem repeats rather than exact repeats only.

The simplest instance of an approximate tandem repeat is an approximate square. An approximate square in a word w is a subword uv, where u and v are within a given distance k according to some distance measure between words, such as Hamming distance or edit (also called Levenshtein) distance. There are several ways to define approximate tandem repeats as successions of approximate squares, i.e. to generalize to the approximate case...
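The definition of an approximate square can be made concrete with a small sketch. The code below is a brute-force illustration under Hamming distance only, so u and v are assumed to have equal length; it is not the algorithm of either cited paper. The function names `hamming` and `approximate_squares` are hypothetical and chosen for this example.

```python
from typing import Iterator, Tuple


def hamming(u: str, v: str) -> int:
    """Number of mismatching positions between two equal-length words."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs words of equal length")
    return sum(a != b for a, b in zip(u, v))


def approximate_squares(w: str, k: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, period, mismatches) for every subword uv of w
    with |u| = |v| = period and hamming(u, v) <= k.

    Brute force (illustrative assumption, not an efficient method):
    every (start, period) pair is checked directly, which takes
    cubic time in len(w) in the worst case.
    """
    n = len(w)
    for period in range(1, n // 2 + 1):
        for start in range(n - 2 * period + 1):
            u = w[start:start + period]
            v = w[start + period:start + 2 * period]
            d = hamming(u, v)
            if d <= k:
                yield start, period, d


if __name__ == "__main__":
    # "ACGT" followed by "ACGA": one mismatch, so this is an
    # approximate square for k = 1 but not for k = 0.
    for start, period, d in approximate_squares("ACGTACGA", 1):
        print(start, period, d)
```

Note that whenever k is at least the period, every pair of adjacent blocks qualifies, so in practice the threshold is usually tied to the period length or the output is filtered. Under edit distance u and v may differ in length, so the split point of uv is no longer determined by the period alone and this simple enumeration does not carry over directly.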

Notes

Acknowledgments

This work was supported in part by the National Science Foundation Grant DB&I 0542751.

Recommended Reading

  1. Benson, G.: Tandem Repeats Finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999)
  2. Boeva, V.A., Régnier, M., Makeev, V.J.: SWAN: searching for highly divergent tandem repeats in DNA sequences with the evaluation of their statistical significance. Proceedings of JOBIM 2004, Montreal, Canada, p. 40 (2004)
  3. Butler, J.M.: Forensic DNA Typing: Biology and Technology Behind STR Markers. Academic Press (2001)
  4. Crochemore, M.: Recherche linéaire d'un carré dans un mot. Comptes Rendus Acad. Sci. Paris Sér. I Math. 296, 781–784 (1983)
  5. Delgrange, O., Rivals, E.: STAR – an algorithm to Search for Tandem Approximate Repeats. Bioinform. 20, 2812–2820 (2004)
  6. Gelfand, Y., Rodriguez, A., Benson, G.: TRDB – The Tandem Repeats Database. Nucl. Acids Res. 35(suppl. 1), D80–D87 (2007)
  7. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press (1997)
  8. Kolpakov, R., Kucherov, G.: Finding maximal repetitions in a word in linear time. In: 40th Symp. Foundations of Computer Science (FOCS), pp. 596–604. IEEE Computer Society Press (1999)
  9. Kolpakov, R., Bana, G., Kucherov, G.: mreps: efficient and flexible detection of tandem repeats in DNA. Nucl. Acids Res. 31(13), 3672–3678 (2003)
  10. Kolpakov, R., Kucherov, G.: Finding approximate repetitions under Hamming distance. Theoret. Comput. Sci. 303(1), 135–156 (2003)
  11. Kolpakov, R., Kucherov, G.: Identification of periodic structures in words. In: Berstel, J., Perrin, D. (eds.) Applied combinatorics on words. Encyclopedia of Mathematics and its Applications. Lothaire books, vol. 104, pp. 430–477. Cambridge University Press (2005)
  12. Landau, G.M., Vishkin, U.: Fast string matching with k differences. J. Comput. Syst. Sci. 37(1), 63–78 (1988)
  13. Landau, G.M., Myers, E.W., Schmidt, J.P.: Incremental string comparison. SIAM J. Comput. 27(2), 557–582 (1998)
  14. Landau, G.M., Schmidt, J.P., Sokol, D.: An algorithm for approximate tandem repeats. J. Comput. Biol. 8, 1–18 (2001)
  15. Main, M.: Detecting leftmost maximal periodicities. Discret. Appl. Math. 25, 145–153 (1989)
  16. Main, M., Lorentz, R.: An \( O(n \log n) \) algorithm for finding all repetitions in a string. J. Algorithms 5(3), 422–432 (1984)
  17. Messer, P.W., Arndt, P.F.: The majority of recent short DNA insertions in the human genome are tandem duplications. Mol. Biol. Evol. 24(5), 1190–7 (2007)
  18. Rodeh, M., Pratt, V., Even, S.: Linear algorithm for data compression via string matching. J. Assoc. Comput. Mach. 28(1), 16–24 (1981)
  19. Sokol, D., Benson, G., Tojeira, J.: Tandem repeats over the edit distance. Bioinform. 23(2), e30–e35 (2006)
  20. Wexler, Y., Yakhini, Z., Kashi, Y., Geiger, D.: Finding approximate tandem repeats in genomic sequences. J. Comput. Biol. 12(7), 928–42 (2005)

Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • Gregory Kucherov (1)
  • Dina Sokol (2)
  1. LIFL and INRIA, Villeneuve d'Ascq, France
  2. Department of Computer and Information Science, Brooklyn College of CUNY, Brooklyn, USA