ESA 2001: Algorithms — ESA 2001 pp 170-181 | Cite as

Finding Approximate Repetitions under Hamming Distance

  • Roman Kolpakov
  • Gregory Kucherov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2161)

Abstract

The problem of computing tandem repetitions with K possible mismatches is studied. Two main definitions are considered, and for both of them an O(nK log K + S) algorithm is proposed (S the size of the output). This improves, in particular, the bound obtained in [17].

Keywords

Tandem Repeat String Match Maximal Repetition Exact Repetition Approximate String Match 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    G. Benson. An algorithm for finding tandem repeats of unspecified pattern size. In S. Istrail, P. Pevzner, and M. Waterman, editors, Proceedings of the 2nd Annual International Conference on Computational Molecular Biology (RECOMB 98), pages 20–29. ACM Press, 1998.Google Scholar
  2. 2.
    G. Benson. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research, 27(2):573–580, 1999.CrossRefMathSciNetGoogle Scholar
  3. 3.
    G. Benson and M. Waterman. A method for fast database search for all k-nucleotide repeats. Nucleic Acids Research, 22:4828–4836, 1994.CrossRefGoogle Scholar
  4. 4.
    R. Cole and R. Hariharan. Approximate string matching: A simpler faster algorithm. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 463–472, San Francisco, California, 25–27 January 1998.Google Scholar
  5. 5.
    M. Crochemore and W. Rytter. Squares, cubes, and time-space efficient string searching. Algorithmica, 13:405–425, 1995.MATHCrossRefMathSciNetGoogle Scholar
  6. 6.
    M. Crochemore. An optimal algorithm for computing the repetitions in a word. Information Processing Letters, 12:244–250, 1981.MATHCrossRefMathSciNetGoogle Scholar
  7. 7.
    M. Crochemore. Recherche linéaire d’un carré dans un mot. Comptes Rendus Acad. Sci. Paris Sér. I Math., 296:781–784, 1983.MATHMathSciNetGoogle Scholar
  8. 8.
    M. Giraud and G. Kucherov. Maximal repetitions and application to DNA sequences. In Proceedings of the Journées Ouvertes: Biologie, Informatique et Mathématiques, pages 165–172, Montpelier, 3–5 mai 2000.Google Scholar
  9. 9.
    Z. Galil and J. Seiferas. Time-space optimal string matching. Journal of Computer and System Sciences, 26(3):280–294, 1983.CrossRefMathSciNetGoogle Scholar
  10. 10.
    D. Gusfield. Algorithms on Strings, Trees, and Sequences. Computer Science and Computational Biology. Cambridge University Press, 1997.Google Scholar
  11. 11.
    C.S. Iliopoulos, D. Moore, and W.F. Smyth. A characterization of the squares in a Fibonacci string. Theoretical Computer Science, 172:281–291, 1997.MATHCrossRefMathSciNetGoogle Scholar
  12. 12.
    R. Kolpakov and G. Kucherov. Finding maximal repetitions in a word in linear time. In Proceedings of the 1999 Symposium on Foundations of Computer Science, New York (USA). IEEE Computer Society, October 17–19 1999.Google Scholar
  13. 13.
    R. Kolpakov and G. Kucherov. On maximal repetitions in words. In Proceedings of the 12-th International Symposium on Fundamentals of Computation Theory, 1999, Iasi (Romania), Lecture Notes in Computer Science, August 30–September 3 1999.Google Scholar
  14. 14.
    R. Kolpakov and G. Kucherov. Finding approximate repetitions under hamming distance. Technical Report RR-4163, INRIA, avril 2001. available at http://www.inria.fr/rrrt/rr-4163.html.
  15. 15.
    S. R. Kosaraju. Computation of squares in string. In M. Crochemore and D. Gusfield, editors, Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching, number 807 in Lecture Notes in Computer Science, pages 146–150. Springer Verlag, 1994.Google Scholar
  16. 16.
    M. Lothaire. Combinatorics on Words, volume 17 of Encyclopedia of Mathematics and Its Applications. Addison Wesley, 1983.Google Scholar
  17. 17.
    G. Landau and J. Schmidt. An algorithm for approximate tandem repeats. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching, number 684 in Lecture Notes in Computer Science, pages 120–133, Padova, Italy, 1993. Springer-Verlag, Berlin.CrossRefGoogle Scholar
  18. 18.
    G. Landau, J. Schmidt, and D. Sokol. An algorithm for approximate tandem repeats. Journal of Computational Biology, 8(1):1–18, 2001.CrossRefGoogle Scholar
  19. 19.
    M. G. Main. Detecting leftmost maximal periodicities. Discrete Applied Mathematics, 25:145–153, 1989.CrossRefMathSciNetMATHGoogle Scholar
  20. 20.
    M.G. Main and R.J. Lorentz. An O(n log n) algorithm for finding all repetitions in a string. Journal of Algorithms, 5(3):422–432, 1984.MATHCrossRefMathSciNetGoogle Scholar
  21. 21.
    E. Rivals, O. Delgrange, J-P. Delahaye, and M. Dauchet. A first step towards chromosome analysis by compression algorithms. In N.G. Bourbakis, editor, Proceedings of the 1st IEEE Symposium on Intelligence in Neural and Biological Systems (INBS), Herndon, VA. IEEE Computer Society, May 29–31 1995.Google Scholar
  22. 22.
    M. Rodeh, V.R. Pratt, and S. Even. Linear algorithm for data compression via string matching. Journal of the ACM, 28(1):16–24, Jan 1981.Google Scholar
  23. 23.
    J. Stoye and D. Gusfield. Linear time algorithms for finding and representing all the tandem repeats in a string. Technical Report CSE-98-4, Computer Science Department, University of California, Davis, 1998.Google Scholar
  24. 24.
    A.O. Slisenko. Detection of periodicities and string matching in real time. Journal of Soviet Mathematics, 22:1316–1386, 1983.MATHCrossRefGoogle Scholar
  25. 25.
    M.-F. Sagot and E.W. Myers. Identifying satellites in nucleic acid sequences. In fS. Istrail, P. Pevzner, and M. Waterman, editors, Proceedings of the 2nd Annual International Conference on Computational Molecular Biology (RECOMB 98), pages 234–242. ACM Press, 1998.Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Roman Kolpakov
    • 1
  • Gregory Kucherov
    • 1
  1. 1.LORIA/INRIA-LorraineVillers-lès-NancyFrance

Personalised recommendations