CPM 2014: Combinatorial Pattern Matching, pp. 212–221

# Searching of Gapped Repeats and Subrepetitions in a Word

• Roman Kolpakov
• Mikhail Podolskiy
• Mikhail Posypkin
• Nickolay Khrapov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8486)

## Abstract

A gapped repeat is a factor of the form uvu where u and v are nonempty words. The period of the gapped repeat is defined as |u| + |v|. A gapped repeat is maximal if it cannot be extended by one letter to the left or to the right while preserving its period. A gapped repeat is called α-gapped if its period is not greater than α|u|. A δ-subrepetition is a factor whose exponent is less than 2 but not less than 1 + δ (the exponent of a factor is the quotient of its length and its minimal period). A δ-subrepetition is maximal if it cannot be extended by one letter to the left or to the right while preserving its minimal period. We show that in a word of length n the number of maximal α-gapped repeats is bounded by $$O(\alpha^2 n)$$ and the number of maximal δ-subrepetitions is bounded by $$O(n/\delta^2)$$. Using these upper bounds, we propose algorithms for finding all maximal α-gapped repeats and all maximal δ-subrepetitions in a word of length n. The algorithm for finding all maximal α-gapped repeats runs in $$O(\alpha^2 n)$$ time for a constant-size alphabet and in $$O(n\log n + \alpha^2 n)$$ time in the general case. For finding all maximal δ-subrepetitions we propose two algorithms. The first runs in $$O(\frac{n\log\log n}{\delta^2})$$ time for a constant-size alphabet and in $$O(n\log n +\frac{n\log\log n}{\delta^2})$$ time in the general case. The second runs in $$O(n\log n+\frac{n}{\delta^2}\log \frac{1}{\delta})$$ expected time.
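As a sanity check of these definitions, and explicitly not the algorithm proposed in the paper, the following Python sketch enumerates both structures by brute force on small words. It relies on an observation that follows from the definitions as stated: for a fixed period p, a maximal gapped repeat uvu corresponds to a maximal run of positions k with s[k] = s[k+p] whose length L satisfies 1 ≤ L < p (u is the run and |v| = p - L), and a maximal δ-subrepetition with minimal period p corresponds to such a run with δp ≤ L < p whose factor of length L + p has minimal period exactly p. All function names are hypothetical. The running time is quadratic for gapped repeats and roughly cubic for subrepetitions, far from the bounds above, so it is only meant for testing.

```python
from typing import Iterator, List, Tuple


def min_period(w: str) -> int:
    """Smallest period of a nonempty word w, via the KMP border (failure) array."""
    n = len(w)
    border = [0] * (n + 1)
    border[0] = -1
    k = -1
    for i in range(n):
        while k >= 0 and w[k] != w[i]:
            k = border[k]
        k += 1
        border[i + 1] = k          # length of the longest proper border of w[:i+1]
    return n - border[n]           # period = length - longest proper border


def equality_runs(s: str, p: int) -> Iterator[Tuple[int, int]]:
    """Yield (a, L) for every maximal run of positions a <= k < a + L
    with s[k] == s[k + p]; such a run cannot be extended in either direction."""
    n = len(s)
    k = 0
    while k + p < n:
        if s[k] != s[k + p]:
            k += 1
            continue
        a = k
        while k + p < n and s[k] == s[k + p]:
            k += 1
        yield a, k - a


def maximal_alpha_gapped_repeats(s: str, alpha: float) -> List[Tuple[int, int, int]]:
    """Brute force: return (start, |u|, p) for each maximal alpha-gapped repeat uvu,
    where u = s[start:start+|u|] and the period is p = |u| + |v|."""
    out = []
    for p in range(2, len(s)):               # |u| >= 1 and |v| >= 1 force p >= 2
        for a, length in equality_runs(s, p):
            # v nonempty needs |u| < p; alpha-gapped needs p <= alpha * |u|
            if length < p and p <= alpha * length:
                out.append((a, length, p))
    return out


def maximal_delta_subrepetitions(s: str, delta: float) -> List[Tuple[int, int, int]]:
    """Brute force: return (start, end, p) for each maximal delta-subrepetition
    s[start:end] with minimal period p and 1 + delta <= (end - start) / p < 2."""
    out = []
    for p in range(2, len(s)):
        for a, length in equality_runs(s, p):
            end = a + length + p             # the factor s[a:end] has period p
            if length >= p:                  # exponent >= 2: a repetition, not a subrepetition
                continue
            if min_period(s[a:end]) != p:    # p is a period but not the minimal one
                continue
            if length >= delta * p:          # exponent (length + p) / p >= 1 + delta
                out.append((a, end, p))
    return out


print(maximal_alpha_gapped_repeats("abaab", 2))    # [(0, 1, 2), (0, 2, 3)]
print(maximal_delta_subrepetitions("abaab", 0.5))  # [(0, 3, 2), (0, 5, 3)]
```

Positions are 0-based and `end` is exclusive, so `(0, 5, 3)` denotes the whole word `abaab`, which has minimal period 3 and exponent 5/3.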

## Copyright information

© Springer International Publishing Switzerland 2014

## Authors and Affiliations

• Roman Kolpakov (1)
• Mikhail Podolskiy (1)
• Mikhail Posypkin (2)
• Nickolay Khrapov (2)

1. Lomonosov Moscow State University, Moscow, Russia
2. Institute for Information Transmission Problems, Moscow, Russia