Algorithms for Finding a Minimum Repetition Representation of a String

  • Atsuyoshi Nakamura
  • Tomoya Saito
  • Ichigaku Takigawa
  • Hiroshi Mamitsuka
  • Mineichi Kudo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6393)

Abstract

A string with many repetitions can be written compactly by replacing h-fold contiguous repetitions of substring r with $(r)^h$. We refer to such a compact representation as a repetition representation string or RRS, by which a set of disjoint or nested tandem arrays can be compacted. In this paper, we study the problem of finding a minimum RRS or MRRS, where the size of an RRS is defined to be the sum of its component letter sizes and the sizes needed to describe the repetitions $(\cdot)^h$, which are defined as $w_R(h)$ using a repetition weight function $w_R$. We develop two dynamic programming algorithms to solve the problem. One is CMR, which works for any repetition weight function, and the other is CMR-C, which is faster but can be applied only when the repetition weight function is constant. CMR-C is an $O(w(n + z))$-time algorithm using $O(n + z)$ space for a given string of length $n$, where $w$ and $z$ are the number of distinct primitive tandem repeats and the number of their occurrences, respectively. Since $w = O(n)$ and $z = O(n \log n)$ in the worst case, CMR-C is an $O(n^2 \log n)$-time, $O(n \log n)$-space algorithm, which is faster than CMR by a $((\log n)/n)$-factor.
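To make the objective concrete, the following is a minimal sketch of a naive interval dynamic program that computes the size of a minimum RRS. It is not the CMR or CMR-C algorithm from the paper: it is a plain cubic-time baseline that makes no use of tandem-repeat enumeration. The function name `min_rrs_size`, the parameter `weight` (standing in for the repetition weight function $w_R$), and the choice of size 1 for every letter are assumptions made for illustration only.

```python
def min_rrs_size(s, weight=lambda h: 1):
    """Size of a minimum repetition representation of s (naive sketch).

    Assumptions (not from the paper): every letter has size 1, and
    weight(h) plays the role of the repetition weight function w_R(h).
    """
    n = len(s)
    if n == 0:
        return 0
    # cost[i][j] holds the minimum RRS size for the substring s[i:j].
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                cost[i][j] = 1  # a single letter
                continue
            # Case 1: the representation is a concatenation of two parts.
            best = min(cost[i][k] + cost[k][j] for k in range(i + 1, j))
            # Case 2: the whole substring is written as (r)^h with h >= 2,
            # where r = s[i:i+p] is itself represented optimally.
            for p in range(1, length // 2 + 1):
                if length % p == 0:
                    h = length // p
                    if s[i:i + p] * h == s[i:j]:
                        best = min(best, cost[i][i + p] + weight(h))
            cost[i][j] = best
    return cost[0][n]
```

With the default constant weight of 1, `min_rrs_size("abababab")` evaluates the representation $(ab)^4$ at size 2 + 1 = 3. Passing a different callable, for example `weight=lambda h: h`, changes which repetitions are worth collapsing, which is the situation the general-weight algorithm is described as handling.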

Keywords

tandem repeat, string algorithm

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Atsuyoshi Nakamura (1)
  • Tomoya Saito (1)
  • Ichigaku Takigawa (2)
  • Hiroshi Mamitsuka (2)
  • Mineichi Kudo (1)

  1. Hokkaido University, Sapporo, Japan
  2. Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
