Finding Maximal Pairs with Bounded Gap

  • Gerth Stølting Brodal
  • Rune B. Lyngsø
  • Christian N. S. Pedersen
  • Jens Stoye
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1645)

Abstract

A pair in a string is the occurrence of the same substring twice. A pair is maximal if the two occurrences of the substring cannot be extended to the left or to the right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this paper we present methods for finding all maximal pairs under various constraints on the gap. In a string of length n we can find all maximal pairs whose gap lies in an interval bounded from above and below in time O(n log n + z), where z is the number of reported pairs. If the upper bound is removed, the time reduces to O(n + z). Since a tandem repeat is a pair where the gap is zero, our methods can be seen as a generalization of finding tandem repeats. The running time of our methods equals the running time of well-known methods for finding tandem repeats.
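To make the definitions concrete, the following is a minimal brute-force sketch in Python. It is not the method of the paper, which achieves O(n log n + z) time; it simply enumerates every pair of start positions and checks maximality directly, so it takes cubic time in the worst case. The function name `maximal_pairs`, the parameters `lower` and `upper`, and the `(i, j, length)` tuple format are assumptions made for illustration. Overlapping occurrences, whose gap would be negative, are excluded by the default lower bound of 0.

```python
def maximal_pairs(s, lower=0, upper=None):
    """Return all maximal pairs (i, j, length) in s with lower <= gap <= upper.

    i and j (i < j) are the 0-based start positions of the two occurrences,
    and gap = j - (i + length) is the number of characters between them.
    upper=None means there is no upper bound on the gap.
    Illustrative brute force only, not the algorithm from the paper.
    """
    n = len(s)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Left-maximality: the characters just before the two
            # occurrences must differ (or the first occurrence starts at 0).
            if i > 0 and s[i - 1] == s[j - 1]:
                continue
            # Extend to the right as far as the two occurrences agree.
            # Stopping at the first mismatch (or the end of the string)
            # makes the pair right-maximal by construction.
            length = 0
            while j + length < n and s[i + length] == s[j + length]:
                length += 1
            if length == 0:
                continue
            gap = j - (i + length)
            if gap >= lower and (upper is None or gap <= upper):
                pairs.append((i, j, length))
    return pairs


# "abab" contains the tandem repeat "ab" + "ab": a maximal pair with gap 0.
print(maximal_pairs("abab", lower=0, upper=0))
# In "xabcyabcz" the two occurrences of "abc" are separated by one character.
print(maximal_pairs("xabcyabcz", lower=1, upper=1))
```

Setting `lower=0` and `upper=0` restricts the output to tandem repeats, which matches the observation in the abstract that tandem repeats are the special case of gap zero.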

Keywords

Tandem Repeat; Internal Node; Annual Symposium; Maximal Pair; Combinatorial Pattern Match
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Gerth Stølting Brodal (1)
  • Rune B. Lyngsø (1)
  • Christian N. S. Pedersen (1)
  • Jens Stoye (2)
  1. Basic Research in Computer Science (BRICS), Centre of the Danish National Research Foundation, Department of Computer Science, University of Aarhus, Århus C, Denmark
  2. Deutsches Krebsforschungszentrum (DKFZ), Theoretische Bioinformatik, Heidelberg, Germany
