Simple and flexible detection of contiguous repeats using a suffix tree (Preliminary Version)

  • Jens Stoye
  • Dan Gusfield
Session IV
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1448)

Abstract

We study the problem of detecting all occurrences of (primitive) tandem repeats and tandem arrays in a string. We first give a simple time- and space-optimal algorithm to find all tandem repeats, and then modify it into a time- and space-optimal algorithm that finds only the primitive tandem repeats. Both algorithms are then extended to handle tandem arrays. The contribution of this paper is both pedagogical and practical: it gives simple algorithms and implementations based on a suffix tree, using only standard tree traversal techniques.
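
To make the terms in the abstract concrete, here is a minimal brute-force sketch. It is not the suffix-tree algorithm of the paper and is far from the optimal bounds the abstract refers to; it only illustrates the objects being detected. It assumes the usual definitions: a tandem repeat is a substring of the form αα, and it is primitive when α is not itself a power of a shorter string. The function names are hypothetical and chosen for illustration only.

```python
def tandem_repeats(s):
    """Return all (start, period) pairs where s[start:start + 2*period]
    has the form alpha + alpha with len(alpha) == period.

    Naive enumeration for illustration; NOT the suffix-tree method.
    """
    n = len(s)
    found = []
    for period in range(1, n // 2 + 1):
        for start in range(n - 2 * period + 1):
            first = s[start:start + period]
            second = s[start + period:start + 2 * period]
            if first == second:
                found.append((start, period))
    return found


def is_primitive(w):
    """True if w is not a power u^k (k >= 2) of a shorter string u.

    Uses the standard fact that w is primitive exactly when its first
    occurrence in w + w after position 0 is at position len(w).
    """
    return (w + w).find(w, 1) == len(w)


def primitive_tandem_repeats(s):
    """Keep only the tandem repeats whose repeated unit alpha is primitive."""
    return [(i, p) for (i, p) in tandem_repeats(s) if is_primitive(s[i:i + p])]
```

For the string "aaaa", for instance, the repeat "aaaa" (unit "aa") is a tandem repeat but not a primitive one, because "aa" is itself a power of "a", while each occurrence of "aa" (unit "a") is primitive.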

Keywords

Tandem Repeat, Internal Node, Basic Algorithm, Tandem Array, Adjacent Pair
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Jens Stoye (1)
  • Dan Gusfield (1)
  1. Department of Computer Science, University of California, Davis, Davis
