Double String Tandem Repeats

Abstract

A tandem repeat is an occurrence of two adjacent identical substrings. In this paper, we introduce the notion of a double string, which consists of two parallel strings, and we study the problem of locating all tandem repeats in a double string. The problem has applications beyond actual double strings, as we illustrate by using the double string tandem repeats algorithm to solve two different problems. The first is finding all corner-sharing tandems in a 2-dimensional text, as defined by Apostolico and Brimkov. The second is finding all scaled tandem repeats in a 1-dimensional text, where a scaled tandem repeat is a string \(UU'\) such that \(U'\) is a discrete scale of \(U\). In addition to the algorithms for exact tandem repeats, we also present algorithms that solve the problem in the inexact sense, allowing up to \(k\) mismatches. We believe that this framework will open a new perspective for other problems in the future.
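
To make the two definitions above concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper, and it handles only an ordinary single string, not a double string. The function names (`tandem_repeats`, `scale`, `is_scaled_tandem`) are hypothetical, and the sketch assumes that a discrete scale of \(U\) by an integer factor \(s\) means repeating each character of \(U\) exactly \(s\) times.

```python
def tandem_repeats(text):
    """Return (start, half_length) for every tandem repeat UU in text.

    Brute force, roughly cubic time in the worst case; for illustration only.
    """
    n = len(text)
    found = []
    for start in range(n):
        # The half length can be at most half of the remaining suffix.
        for half in range(1, (n - start) // 2 + 1):
            first = text[start:start + half]
            second = text[start + half:start + 2 * half]
            if first == second:
                found.append((start, half))
    return found


def scale(u, s):
    """Assumed discrete scaling: repeat each character of u exactly s times."""
    return "".join(ch * s for ch in u)


def is_scaled_tandem(text, start, half, s):
    """Check whether text[start:] begins with U followed by U scaled by s."""
    u = text[start:start + half]
    scaled = scale(u, s)
    end = start + half + len(scaled)
    return half > 0 and text[start + half:end] == scaled
```

For example, `tandem_repeats("abab")` reports the single repeat starting at position 0 with half length 2, and `is_scaled_tandem("abaabb", 0, 2, 2)` holds because `"aabb"` is `"ab"` with each character doubled.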

Notes

  1. In DNA there are specific relationships between corresponding bases, while our definition of a double string does not imply any such relationship.

References

  1. Amir, A., Butman, A., Lewenstein, M.: Real scaled matching. Inf. Process. Lett. 70(4), 185–190 (1999)

  2. Apostolico, A., Brimkov, V.E.: Optimal discovery of repetitions in 2d. Discret. Appl. Math. 151(1–3), 5–20 (2005)

  3. Butman, A., Eres, R., Landau, G.M.: Scaled and permuted string matching. Inf. Process. Lett. 92(6), 293–297 (2004)

  4. Crochemore, M., Ilie, L., Rytter, W.: Repetitions in strings: algorithms and combinatorics. Theoret. Comput. Sci. 410(50), 5227–5235 (2009). Mathematical Foundations of Computer Science (MFCS 2007)

  5. Galil, Z., Giancarlo, R.: Improved string matching with \(k\) mismatches. SIGACT News 17(4), 52–54 (1986)

  6. Geizhals, S.H., Sokol, D.: Finding maximal 2-dimensional palindromes. Inf. Comput. 266, 161–172 (2019)

  7. Gusfield, D.: Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)

  8. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)

  9. Iliopoulos, C.S., Moore, D., Smyth, W.F.: A characterization of the squares in a Fibonacci string. Theoret. Comput. Sci. 172(1), 281–291 (1997)

  10. Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees and arrays. In: Proceedings of the 4th Annual ACM Symposium on Theory of Computing (STOC), pp. 125–136 (1972)

  11. Knuth, D.E., Morris, J.H., Jr., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

  12. Kolpakov, R.M., Kucherov, G.: Finding maximal repetitions in a word in linear time. In: 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, 17–18 October 1999, New York, NY, USA, pp. 596–604. IEEE Computer Society (1999)

  13. Landau, G.M., Schmidt, J.P., Sokol, D.: An algorithm for approximate tandem repeats. J. Comput. Biol. 8, 1–18 (2001)

  14. Landau, G.M., Vishkin, U.: Fast string matching with k differences. J. Comput. Syst. Sci. 37(1), 63–78 (1988)

  15. Landau, G.M., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989)

  16. Liu, J.J., Huang, G.S., Wang, Y.L.: A fast algorithm for finding the positions of all squares in a run-length encoded string. Theoret. Comput. Sci. 410(38), 3942–3948 (2009)

  17. Main, M.G., Lorentz, R.J.: An O(n log n) algorithm for finding all repetitions in a string. J. Algorithms 5(3), 422–432 (1984)

  18. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)

Funding

The authors A. Amir and G. M. Landau have been partially supported by Grant No. 2018141 from the United States-Israel Binational Science Foundation (BSF) and Israel Science Foundation Grant 1475-18. D. Sokol was also partially supported by BSF Grant No. 2018141. S. Marcus was partially supported by the Professional Staff Congress City University of New York Research Award 63164-00 51.

Author information

Corresponding author

Correspondence to Dina Sokol.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Amir, A., Butman, A., Landau, G.M. et al. Double String Tandem Repeats. Algorithmica (2022). https://doi.org/10.1007/s00453-022-01016-9

  • DOI: https://doi.org/10.1007/s00453-022-01016-9

Keywords

  • Double string
  • Tandem repeat
  • 2d corner sharing tandem
  • Scaled tandem repeat