Skip to main content

Finding Maximal Pairs with Bounded Gap

  • Conference paper
  • First Online:
Book cover Combinatorial Pattern Matching (CPM 1999)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1645))

Included in the following conference series:

Abstract

A pair in a string is the occurrence of the same substring twice. A pair is maximal if the two occurrences of the substring cannot be extended to the left and right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this paper we present methods for finding all maximal pairs under various constraints on the gap. In a string of length n we can find all maximal pairs with gap in an upper and lower bounded interval in time O(n log n+z) where z is the number of reported pairs. If the upper bound is removed the time reduces to O(n+z). Since a tandem repeat is a pair where the gap is zero, our methods can be seen as a generalization of finding tandem repeats. The running time of our methods equals the running time of well known methods for finding tandem repeats.

Supported by the ESPRIT Long Term Research Programme of the EU under project number 20244 (ALCOM-IT).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. G.M. Adel’son-Vel’skii and Y.M. Landis. An algorithm for the organization of information. Doklady Akademii Nauk SSSR, 146:263–266, 1962. English translation in Soviet Math. Dokl., 3:1259-1262.

    MathSciNet  Google Scholar 

  2. A. Apostolico and F.P. Preparata. Optimal off-line detection of repetitions in a string. Theoretical Computer Science, 22:297–315, 1983.

    Article  MathSciNet  MATH  Google Scholar 

  3. G.S. Brodal, R.B. Lyngsø, C.N.S. Pedersen, and J. Stoye. Finding maximal pairs with bounded gap. Technical Report RS-99-12, BRICS, April 1999.

    Google Scholar 

  4. M.R. Brown and R.E. Tarjan. A fast merging algorithm. Journal of the ACM, 26(2):211–226, 1979.

    Article  MathSciNet  MATH  Google Scholar 

  5. M. Crochemore. An optimal algorithm for computing the repetitions in a word. Information Processing Letters, 12(5):244–250, 1981.

    Article  MathSciNet  MATH  Google Scholar 

  6. M. Crochemore. Tranducers and repetitions. Theoretical Computer Science, 45:63–86, 1986.

    Article  MathSciNet  MATH  Google Scholar 

  7. M. Farach. Optimal sufix tree construction with large alphabets. In Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS), pages 137–143, 1997.

    Google Scholar 

  8. L.J. Guibas and R. Sedgewick. A dichromatic framework for balanced trees. In Proceedings of the 19th Annual Symposium on Foundations of Computer Science (FOCS), pages 8–21, 1978.

    Google Scholar 

  9. D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.

    Google Scholar 

  10. D. Gusfield and J. Stoye. Linear time algorithms for_nding and representing all the tandem repeats in a string. Technical Report CSE-98-4, Department of Computer Science, UC Davis, 1998.

    Google Scholar 

  11. S. Huddleston and K. Mehlhorn. A new data structure for representing sorted lists. Acta Informatica, 17:157–184, 1982.

    Article  MathSciNet  MATH  Google Scholar 

  12. F.K. Hwang and S. Lin. A simple algorithm for merging two disjoint linearly ordered sets. SIAM Journal on Computing, 1(1):31–39, 1972.

    Article  MathSciNet  MATH  Google Scholar 

  13. S. Karlin, M. Morris, G. Ghandour, and M.-Y. Leung. Efficient algorithms for molecular sequence analysis. Proceedings of the National Academy of Science, USA, 85:841–845, 1988.

    Google Scholar 

  14. R. Kolpakov and G. Kucherov. Maximal repetitions in words or how to find all squares in linear time. Technical Report 98-R-227, LORIA, 1998.

    Google Scholar 

  15. S.R. Kosaraju. Computation of squares in a string. In Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 807 of Lecture Notes in Computer Science, pages 146–150, 1994.

    Google Scholar 

  16. G.M. Landau and J.P. Schmidt. An algorithm for approximate tandem repeats. In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science, pages 120–133, 1993.

    Google Scholar 

  17. M.-Y. Leung, B.E. Blaisdell, C. Burge, and S. Karlin. An efficient algorithm for identifying matches with errors in multiple long molecular sequences. Journal of Molecular Biology, 221:1367–1378, 1991.

    Article  Google Scholar 

  18. M.G. Main and R.J. Lorentz. An O(n log n) algorithm for finding all repetitions in a string. Journal of Algorithms, 5:422–432, 1984.

    Article  MathSciNet  MATH  Google Scholar 

  19. M.G. Main and R.J. Lorentz. Linear time recognition of squarefree strings. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, volume F12 of NATO ASI Series, pages 271–278. Springer, Berlin, 1985.

    Chapter  Google Scholar 

  20. E.M. McCreight. A space-economical sufix tree construction algorithm. Journal of the ACM, 23(2):262–272, 1976.

    Article  MathSciNet  MATH  Google Scholar 

  21. K. Mehlhorn. Sorting and Searching, volume 1 of Data Structures and Algorithms. Springer-Verlag, 1994.

    Google Scholar 

  22. K. Mehlhorn and S. Näher. The LEDA Platform of Combinatorial and Geometric Computing. Cambridge University Press, 1999. To appear. See http://www.mpisb.mpg.de/_mehlhorn/LEDAbook.html.

  23. M.-F. Sagot and E.W. Myers. Identifying satellites in nucleic acid sequences. In Proceedings of the 2nd Annual International Conference on Computational Molecular Biology (RECOMB), pages 234–242, 1998.

    Google Scholar 

  24. J. Stoye and D. Gusfield. Simple and flexible detection of contiguous repeats using a sufix tree. In Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 1448 of Lecture Notes in Computer Science, pages 140–152, 1998.

    Google Scholar 

  25. E. Ukkonen. On-line construction of sufix trees. Algorithmica, 14:249–260, 1995.

    Article  MathSciNet  MATH  Google Scholar 

  26. P. Weiner. Linear pattern matching algorithms. In Proceedings of the 14th Symposium on Switching and Automata Theory, pages 1–11, 1973.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brodal, G.S., Lyngsø, R.B., Pedersen, C.N.S., Stoye, J. (1999). Finding Maximal Pairs with Bounded Gap. In: Crochemore, M., Paterson, M. (eds) Combinatorial Pattern Matching. CPM 1999. Lecture Notes in Computer Science, vol 1645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48452-3_11

Download citation

  • DOI: https://doi.org/10.1007/3-540-48452-3_11

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66278-5

  • Online ISBN: 978-3-540-48452-3

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics