Finding Maximal Pairs with Bounded Gap
A pair in a string is the occurrence of the same substring twice. A pair is maximal if the two occurrences of the substring cannot be extended to the left and right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this paper we present methods for finding all maximal pairs under various constraints on the gap. In a string of length n we can find all maximal pairs with gap in an upper and lower bounded interval in time O(n log n+z) where z is the number of reported pairs. If the upper bound is removed the time reduces to O(n+z). Since a tandem repeat is a pair where the gap is zero, our methods can be seen as a generalization of finding tandem repeats. The running time of our methods equals the running time of well known methods for finding tandem repeats.
KeywordsTandem Repeat Internal Node Annual Symposium Maximal Pair Combinatorial Pattern Match
Unable to display preview. Download preview PDF.
- 3.G.S. Brodal, R.B. Lyngsø, C.N.S. Pedersen, and J. Stoye. Finding maximal pairs with bounded gap. Technical Report RS-99-12, BRICS, April 1999.Google Scholar
- 7.M. Farach. Optimal sufix tree construction with large alphabets. In Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS), pages 137–143, 1997.Google Scholar
- 8.L.J. Guibas and R. Sedgewick. A dichromatic framework for balanced trees. In Proceedings of the 19th Annual Symposium on Foundations of Computer Science (FOCS), pages 8–21, 1978.Google Scholar
- 9.D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.Google Scholar
- 10.D. Gusfield and J. Stoye. Linear time algorithms for_nding and representing all the tandem repeats in a string. Technical Report CSE-98-4, Department of Computer Science, UC Davis, 1998.Google Scholar
- 13.S. Karlin, M. Morris, G. Ghandour, and M.-Y. Leung. Efficient algorithms for molecular sequence analysis. Proceedings of the National Academy of Science, USA, 85:841–845, 1988.Google Scholar
- 14.R. Kolpakov and G. Kucherov. Maximal repetitions in words or how to find all squares in linear time. Technical Report 98-R-227, LORIA, 1998.Google Scholar
- 15.S.R. Kosaraju. Computation of squares in a string. In Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 807 of Lecture Notes in Computer Science, pages 146–150, 1994.Google Scholar
- 16.G.M. Landau and J.P. Schmidt. An algorithm for approximate tandem repeats. In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science, pages 120–133, 1993.Google Scholar
- 21.K. Mehlhorn. Sorting and Searching, volume 1 of Data Structures and Algorithms. Springer-Verlag, 1994.Google Scholar
- 22.K. Mehlhorn and S. Näher. The LEDA Platform of Combinatorial and Geometric Computing. Cambridge University Press, 1999. To appear. See http://www.mpisb.mpg.de/_mehlhorn/LEDAbook.html.
- 23.M.-F. Sagot and E.W. Myers. Identifying satellites in nucleic acid sequences. In Proceedings of the 2nd Annual International Conference on Computational Molecular Biology (RECOMB), pages 234–242, 1998.Google Scholar
- 24.J. Stoye and D. Gusfield. Simple and flexible detection of contiguous repeats using a sufix tree. In Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 1448 of Lecture Notes in Computer Science, pages 140–152, 1998.Google Scholar
- 26.P. Weiner. Linear pattern matching algorithms. In Proceedings of the 14th Symposium on Switching and Automata Theory, pages 1–11, 1973.Google Scholar