Encyclopedia of Algorithms

2008 Edition
Editors: Ming-Yang Kao

Approximate Tandem Repeats

2001; Landau, Schmidt, Sokol
2003; Kolpakov, Kucherov
  • Gregory Kucherov
  • Dina Sokol
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30162-4_24

Keywords and Synonyms

Approximate repetitions; Approximate periodicities

Problem Definition

Identification of periodic structures in words (variants of which are known as tandem repeats, repetitions, powers or runs) is a fundamental algorithmic task (see entry Squares and Repetitions). In many practical applications, such as DNA sequence analysis, the repetitions considered admit a certain variation between copies of the repeated pattern. In other words, the repetitions of interest are approximate tandem repeats, not only exact ones.

The simplest instance of an approximate tandem repeat is an approximate square. An approximate square in a word w is a subword uv, where u and v are within a given distance k according to some distance measure between words, such as Hamming distance or edit (also called Levenshtein) distance. There are several ways to define approximate tandem repeats as successions of approximate squares, i.e. to generalize to the approximate case the...
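
The definition of an approximate square can be illustrated with a minimal Python sketch. It is a naive check written for clarity only and is not the algorithm of Landau, Schmidt and Sokol or of Kolpakov and Kucherov. The function names (hamming, edit_distance, is_edit_square, hamming_squares) are hypothetical, and the sketch assumes unit costs for insertion, deletion and substitution in the edit distance.

```python
def hamming(u: str, v: str) -> int:
    """Number of mismatching positions; defined only for words of equal length."""
    if len(u) != len(v):
        raise ValueError("Hamming distance requires words of equal length")
    return sum(a != b for a, b in zip(u, v))


def edit_distance(u: str, v: str) -> int:
    """Levenshtein distance, assuming unit-cost insertion, deletion and substitution."""
    prev = list(range(len(v) + 1))          # distances from the empty prefix of u
    for i, a in enumerate(u, start=1):
        curr = [i]                          # distance from u[:i] to the empty word
        for j, b in enumerate(v, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # match or substitute
        prev = curr
    return prev[-1]


def is_edit_square(u: str, v: str, k: int) -> bool:
    """True if uv is an approximate square under edit distance with threshold k."""
    return edit_distance(u, v) <= k


def hamming_squares(w: str, k: int):
    """Yield (start, period) for each subword uv of w with |u| = |v| = period
    and hamming(u, v) <= k. Brute force over all periods and positions."""
    n = len(w)
    for p in range(1, n // 2 + 1):
        for i in range(n - 2 * p + 1):
            if hamming(w[i:i + p], w[i + p:i + 2 * p]) <= k:
                yield i, p
```

For example, hamming("ACGT", "ACCT") is 1, so ACGTACCT is an approximate square for k = 1 under Hamming distance, and edit_distance("ACGT", "AGT") is 1, so ACGTAGT is an approximate square for k = 1 under edit distance even though its two halves differ in length. Note that with Hamming distance every subword of even length 2p qualifies trivially once k is at least p, so k is normally taken small relative to the period.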





This work was supported in part by the National Science Foundation Grant DB&I 0542751.

Recommended Reading

  1. Benson, G.: Tandem Repeats Finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999)
  2. Boeva, V.A., Régnier, M., Makeev, V.J.: SWAN: searching for highly divergent tandem repeats in DNA sequences with the evaluation of their statistical significance. Proceedings of JOBIM 2004, Montreal, Canada, p. 40 (2004)
  3. Butler, J.M.: Forensic DNA Typing: Biology and Technology Behind STR Markers. Academic Press (2001)
  4. Crochemore, M.: Recherche linéaire d'un carré dans un mot. Comptes Rendus Acad. Sci. Paris Sér. I Math. 296, 781–784 (1983)
  5. Delgrange, O., Rivals, E.: STAR – an algorithm to Search for Tandem Approximate Repeats. Bioinform. 20, 2812–2820 (2004)
  6. Gelfand, Y., Rodriguez, A., Benson, G.: TRDB – The Tandem Repeats Database. Nucl. Acids Res. 35(suppl. 1), D80–D87 (2007)
  7. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press (1997)
  8. Kolpakov, R., Kucherov, G.: Finding maximal repetitions in a word in linear time. In: 40th Symp. Foundations of Computer Science (FOCS), pp. 596–604. IEEE Computer Society Press (1999)
  9. Kolpakov, R., Bana, G., Kucherov, G.: mreps: efficient and flexible detection of tandem repeats in DNA. Nucl. Acids Res. 31(13), 3672–3678 (2003)
  10. Kolpakov, R., Kucherov, G.: Finding approximate repetitions under Hamming distance. Theoret. Comput. Sci. 303(1), 135–156 (2003)
  11. Kolpakov, R., Kucherov, G.: Identification of periodic structures in words. In: Berstel, J., Perrin, D. (eds.) Applied combinatorics on words. Encyclopedia of Mathematics and its Applications. Lothaire books, vol. 104, pp. 430–477. Cambridge University Press (2005)
  12. Landau, G.M., Vishkin, U.: Fast string matching with k differences. J. Comput. Syst. Sci. 37(1), 63–78 (1988)
  13. Landau, G.M., Myers, E.W., Schmidt, J.P.: Incremental string comparison. SIAM J. Comput. 27(2), 557–582 (1998)
  14. Landau, G.M., Schmidt, J.P., Sokol, D.: An algorithm for approximate tandem repeats. J. Comput. Biol. 8, 1–18 (2001)
  15. Main, M.: Detecting leftmost maximal periodicities. Discret. Appl. Math. 25, 145–153 (1989)
  16. Main, M., Lorentz, R.: An \( O(n \log n) \) algorithm for finding all repetitions in a string. J. Algorithms 5(3), 422–432 (1984)
  17. Messer, P.W., Arndt, P.F.: The majority of recent short DNA insertions in the human genome are tandem duplications. Mol. Biol. Evol. 24(5), 1190–7 (2007)
  18. Rodeh, M., Pratt, V., Even, S.: Linear algorithm for data compression via string matching. J. Assoc. Comput. Mach. 28(1), 16–24 (1981)
  19. Sokol, D., Benson, G., Tojeira, J.: Tandem repeats over the edit distance. Bioinform. 23(2), e30–e35 (2006)
  20. Wexler, Y., Yakhini, Z., Kashi, Y., Geiger, D.: Finding approximate tandem repeats in genomic sequences. J. Comput. Biol. 12(7), 928–42 (2005)

Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • Gregory Kucherov (1)
  • Dina Sokol (2)
  1. LIFL and INRIA, Villeneuve d'Ascq, France
  2. Department of Computer and Information Science, Brooklyn College of CUNY, Brooklyn, USA