Encyclopedia of Algorithms

Living Edition
Editors: Ming-Yang Kao

Approximate Tandem Repeats

Living reference work entry
DOI: https://doi.org/10.1007/978-3-642-27848-8_24-2

Years and Authors of Summarized Original Work

  • 2001; Landau, Schmidt, Sokol

  • 2003; Kolpakov, Kucherov

Problem Definition

Identification of periodic structures in words (variants of which are known as tandem repeats, repetitions, powers, or runs) is a fundamental algorithmic task (see entry Squares and Repetitions). In many practical applications, such as DNA sequence analysis, the repetitions considered admit a certain variation between copies of the repeated pattern. In other words, the repetitions of interest are approximate tandem repeats, not only exact ones.

The simplest instance of an approximate tandem repeat is an approximate square. An approximate square in a word w is a subword uv, where u and v are within a given distance k according to some distance measure between words, such as Hamming distance or edit (also called Levenshtein) distance. There are several ways to define approximate tandem repeats as successions of approximate squares, i.e., to generalize to...
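The definition of an approximate square can be illustrated with a minimal brute-force sketch. It is not the algorithm of either summarized work; the function names (hamming, edit_distance, hamming_squares) and the (start, period) output convention are assumptions made only for this illustration. Under Hamming distance the two halves u and v must have equal length, which the enumeration below relies on; under edit distance the halves may differ in length, so only the distance function itself is shown.

```python
from typing import Iterator, Tuple


def hamming(u: str, v: str) -> int:
    """Number of mismatching positions between two equal-length words."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs words of equal length")
    return sum(a != b for a, b in zip(u, v))


def edit_distance(u: str, v: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        curr = [i]
        for j, b in enumerate(v, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # match or substitute
        prev = curr
    return prev[-1]


def hamming_squares(w: str, k: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, period) for every subword uv of w with
    |u| = |v| = period and hamming(u, v) <= k.

    Illustrative brute force only: cubic time in len(w).
    """
    n = len(w)
    for p in range(1, n // 2 + 1):          # candidate length of u and v
        for i in range(n - 2 * p + 1):      # candidate start of uv
            if hamming(w[i:i + p], w[i + p:i + 2 * p]) <= k:
                yield i, p
```

For example, with w = "ACGTACCT" and k = 1 the whole word is reported as an approximate square, since "ACGT" and "ACCT" differ in a single position; with k = 0 only the exact square "CC" remains.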

Keywords

Approximate repetitions; Approximate periodicities

Notes

Acknowledgements

This work was supported in part by the National Science Foundation Grant DB&I 0542751.

Recommended Reading

  1. Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27:573–580
  2. Boeva VA, Régnier M, Makeev VJ (2004) SWAN: searching for highly divergent tandem repeats in DNA sequences with the evaluation of their statistical significance. In: Proceedings of JOBIM 2004, Montreal, p 40
  3. Butler JM (2001) Forensic DNA typing: biology and technology behind STR markers. Academic Press, San Diego
  4. Crochemore M (1983) Recherche linéaire d’un carré dans un mot. C R Acad Sci Paris Sér I Math 296:781–784
  5. Delgrange O, Rivals E (2004) STAR – an algorithm to search for tandem approximate repeats. Bioinformatics 20:2812–2820
  6. Gelfand Y, Rodriguez A, Benson G (2007) TRDB – the tandem repeats database. Nucleic Acids Res 35(suppl. 1):D80–D87
  7. Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge University Press, Cambridge/New York
  8. Kolpakov R, Kucherov G (1999) Finding maximal repetitions in a word in linear time. In: 40th symposium foundations of computer science (FOCS), New York, pp 596–604. IEEE Computer Society Press
  9. Kolpakov R, Kucherov G (2003) Finding approximate repetitions under Hamming distance. Theor Comput Sci 303(1):135–156
  10. Kolpakov R, Kucherov G (2005) Identification of periodic structures in words. In: Berstel J, Perrin D (eds) Applied combinatorics on words. Encyclopedia of mathematics and its applications. Lothaire books, vol 104, pp 430–477. Cambridge University Press, Cambridge
  11. Kolpakov R, Bana G, Kucherov G (2003) mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res 31(13):3672–3678
  12. Landau GM, Vishkin U (1988) Fast string matching with k differences. J Comput Syst Sci 37(1):63–78
  13. Landau GM, Myers EW, Schmidt JP (1998) Incremental string comparison. SIAM J Comput 27(2):557–582
  14. Landau GM, Schmidt JP, Sokol D (2001) An algorithm for approximate tandem repeats. J Comput Biol 8:1–18
  15. Main M (1989) Detecting leftmost maximal periodicities. Discret Appl Math 25:145–153
  16. Main M, Lorentz R (1984) An O(n log n) algorithm for finding all repetitions in a string. J Algorithms 5(3):422–432
  17. Messer PW, Arndt PF (2007) The majority of recent short DNA insertions in the human genome are tandem duplications. Mol Biol Evol 24(5):1190–1197
  18. Rodeh M, Pratt V, Even S (1981) Linear algorithm for data compression via string matching. J Assoc Comput Mach 28(1):16–24
  19. Sokol D, Benson G, Tojeira J (2006) Tandem repeats over the edit distance. Bioinformatics 23(2):e30–e35
  20. Wexler Y, Yakhini Z, Kashi Y, Geiger D (2005) Finding approximate tandem repeats in genomic sequences. J Comput Biol 12(7):928–942

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. CNRS/LIGM, Université Paris-Est, Marne-la-Vallée, France
  2. Department of Computer and Information Science, Brooklyn College of CUNY, Brooklyn, NY, USA