Finding Maximal Pairs with Bounded Gap

Brodal, Gerth Stølting; Lyngsø, Rune B.; Pedersen, Christian N. S.; Stoye, Jens

doi:10.1007/3-540-48452-3_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1645)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

A pair in a string is two occurrences of the same substring. A pair is maximal if the two occurrences cannot be extended to the left or to the right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this paper we present methods for finding all maximal pairs under various constraints on the gap. In a string of length n we can find all maximal pairs whose gap lies in an interval bounded from below and above in time O(n log n + z), where z is the number of reported pairs. If the upper bound is removed, the time reduces to O(n + z). Since a tandem repeat is a pair where the gap is zero, our methods can be seen as a generalization of finding tandem repeats. The running times of our methods equal the running times of well-known methods for finding tandem repeats.
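
To make the definitions of pair, maximality and gap concrete, the following is a minimal brute-force sketch in Python. It is not the method of the paper: it takes quadratic time or worse and uses no suffix tree, so it does not reach the O(n log n + z) or O(n + z) bounds stated above. The function name maximal_pairs, the parameters min_gap and max_gap, and the (i, j, length) output format are all hypothetical choices made for illustration. The sketch also assumes that the gap is non-negative, so overlapping occurrences are not reported.

```python
def maximal_pairs(s, min_gap=0, max_gap=None):
    """Naive enumeration of maximal pairs (i, j, length) in s with
    min_gap <= gap <= max_gap, where gap = j - (i + length).

    Illustrative brute force only; names and output format are
    assumptions, not taken from the paper.
    """
    n = len(s)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Left-maximal: the characters just before the two
            # occurrences differ, or the first occurrence starts at 0.
            if i > 0 and s[i - 1] == s[j - 1]:
                continue
            # Extend to the right as far as the occurrences agree; the
            # result is right-maximal by construction.
            length = 0
            while j + length < n and s[i + length] == s[j + length]:
                length += 1
            if length == 0:
                continue
            gap = j - (i + length)  # characters between the occurrences
            if gap < min_gap:
                continue
            if max_gap is not None and gap > max_gap:
                continue
            pairs.append((i, j, length))
    return pairs


print(maximal_pairs("abab"))       # [(0, 2, 2)]  gap 0: a tandem repeat
print(maximal_pairs("xabcyabcz"))  # [(1, 5, 3)]  gap 1
```

Passing max_gap=None corresponds to the case where the upper bound is removed, and min_gap=0 with max_gap=0 restricts the output to tandem repeats that are also maximal pairs.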

Supported by the ESPRIT Long Term Research Programme of the EU under project number 20244 (ALCOM-IT).

Author information

Authors and Affiliations

Basic Research in Computer Science (BRICS), Centre of the Danish National Research Foundation, Department of Computer Science, University of Aarhus, Ny Munkegade, 8000, Århus C, Denmark
Gerth Stølting Brodal, Rune B. Lyngsø & Christian N. S. Pedersen
Deutsches Krebsforschungszentrum (DKFZ), Theoretische Bioinformatik, Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
Jens Stoye

Editor information

Editors and Affiliations

Institute Gaspard-Monge, University of Marne-la-Vallée, F 77454, Marne-la-Vallée Cedex 2, France
Maxime Crochemore
Department of Computer Science, University of Warwick, Coventry, CV4 7AL, England
Mike Paterson

About this paper

Cite this paper

Brodal, G.S., Lyngsø, R.B., Pedersen, C.N.S., Stoye, J. (1999). Finding Maximal Pairs with Bounded Gap. In: Crochemore, M., Paterson, M. (eds) Combinatorial Pattern Matching. CPM 1999. Lecture Notes in Computer Science, vol 1645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48452-3_11

DOI: https://doi.org/10.1007/3-540-48452-3_11
Published: 08 July 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66278-5
Online ISBN: 978-3-540-48452-3
eBook Packages: Springer Book Archive
