Searching of Gapped Repeats and Subrepetitions in a Word

  • Roman Kolpakov
  • Mikhail Podolskiy
  • Mikhail Posypkin
  • Nickolay Khrapov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8486)

Abstract

A gapped repeat is a factor of the form uvu, where u and v are nonempty words. The period of the gapped repeat is defined as |u| + |v|. A gapped repeat is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its period. A gapped repeat is called α-gapped if its period is not greater than α|u|. A δ-subrepetition is a factor whose exponent is less than 2 but not less than 1 + δ (the exponent of a factor is the quotient of its length and its minimal period). A δ-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its minimal period. We show that in a word of length n the number of maximal α-gapped repeats is bounded by \(O(\alpha^2 n)\) and the number of maximal δ-subrepetitions is bounded by \(O(n/\delta^2)\). Using these upper bounds, we propose algorithms for finding all maximal α-gapped repeats and all maximal δ-subrepetitions in a word of length n. The algorithm for finding all maximal α-gapped repeats has \(O(\alpha^2 n)\) time complexity for the case of constant alphabet size and \(O(n\log n + \alpha^2 n)\) time complexity for the general case. For finding all maximal δ-subrepetitions we propose two algorithms. The first algorithm has \(O(\frac{n\log\log n}{\delta^2})\) time complexity for the case of constant alphabet size and \(O(n\log n +\frac{n\log\log n}{\delta^2})\) time complexity for the general case. The second algorithm has \(O(n\log n+\frac{n}{\delta^2}\log \frac{1}{\delta})\) expected time complexity.
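
To make the definitions above concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm proposed in the paper and runs in quadratic time; it only illustrates what "maximal α-gapped repeat" and "exponent" mean. The function names and the (start, copy length, period) output format are assumptions chosen for this illustration, and a repeat is read here as a pair of copies of u at distance |u| + |v|.

```python
from fractions import Fraction


def maximal_gapped_repeats(w, alpha):
    """Brute-force list of maximal alpha-gapped repeats uvu in w.

    Each result is (start, copy_len, period): the left copy of u is
    w[start:start + copy_len], the right copy begins at start + period.
    Illustrative only: O(n^2) time, not the paper's algorithm.
    """
    n = len(w)
    found = []
    for p in range(2, n):  # period |u| + |v| is at least 2
        j = 0
        while j + p < n:
            if w[j] != w[j + p]:
                j += 1
                continue
            start = j
            # Extend to the right while letters at distance p agree.
            # The run starts right after a mismatch (or at position 0),
            # so it cannot be extended to the left either.
            while j + p < n and w[j] == w[j + p]:
                j += 1
            c = j - start  # |u| for this maximal run
            # c < p keeps the gap v nonempty; p <= alpha * c is alpha-gapped.
            if c < p and p <= alpha * c:
                found.append((start, c, p))
    return found


def minimal_period(w):
    """Minimal period of a nonempty word via the classical border array."""
    n = len(w)
    border = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and w[i] != w[k]:
            k = border[k - 1]
        if w[i] == w[k]:
            k += 1
        border[i] = k
    return n - border[-1]


def exponent(w):
    """Exponent of w: its length divided by its minimal period."""
    return Fraction(len(w), minimal_period(w))
```

For the word abcab, `maximal_gapped_repeats("abcab", 3)` returns `[(0, 2, 3)]`, that is, u = ab, v = c and period 3, while `exponent("abcab")` is 5/3, so the word is a δ-subrepetition for any δ ≤ 2/3. The square abab yields no gapped repeat, because extending aba to the right closes the gap.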

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Roman Kolpakov (1)
  • Mikhail Podolskiy (1)
  • Mikhail Posypkin (2)
  • Nickolay Khrapov (2)

  1. Lomonosov Moscow State University, Moscow, Russia
  2. Institute for Information Transmission Problems, Moscow, Russia
