# Algorithms for Finding a Minimum Repetition Representation of a String

## Abstract

A string with many repetitions can be written compactly by replacing each $h$-fold contiguous repetition of a substring $r$ with $(r)^{h}$. We refer to such a compact representation as a *repetition representation string* or RRS; it can compact a set of disjoint or nested tandem arrays. In this paper, we study the problem of finding a *minimum RRS* or MRRS, where the size of an RRS is defined as the sum of the sizes of its component letters and the sizes needed to describe the repetitions $(\cdot)^{h}$, each given by $w_R(h)$ for a repetition weight function $w_R$. We develop two dynamic programming algorithms to solve the problem. The first, CMR, works for any repetition weight function; the second, CMR-C, is faster but applies only when the repetition weight function is constant. CMR-C is an $O(w(n+z))$-time algorithm using $O(n+z)$ space for a given string of length $n$, where $w$ and $z$ are the number of distinct primitive tandem repeats and the number of their occurrences, respectively. Since $w = O(n)$ and $z = O(n \log n)$ in the worst case, CMR-C is an $O(n^{2} \log n)$-time, $O(n \log n)$-space algorithm, which improves on the running time of CMR by a factor of $(\log n)/n$.
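To make the size definition concrete, the following is a minimal sketch of a naive interval dynamic program that computes the size of a minimum RRS for a short string. It is not CMR or CMR-C as developed in the paper; it is a plain cubic-style recurrence written only to illustrate the objective. The function name `min_rrs_size`, the parameter name `w_R`, and the assumption that every letter has size 1 are illustrative choices, not taken from the paper.

```python
from functools import lru_cache


def min_rrs_size(s, w_R=lambda h: 1):
    """Size of a minimum RRS of s under a naive interval DP.

    Assumptions (not from the paper): each letter has size 1, and
    w_R(h) is the cost of writing one repetition (.)^h.
    """
    n = len(s)
    if n == 0:
        return 0

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Minimum RRS size of the substring s[i:j].
        length = j - i
        if length == 1:
            return 1
        # Option 1: write every letter out with no repetition.
        best = length
        # Option 2: concatenate the best representations of two parts.
        for k in range(i + 1, j):
            best = min(best, cost(i, k) + cost(k, j))
        # Option 3: the whole substring is (r)^h with |r| = p, h >= 2;
        # r itself may be compacted further (nested tandem arrays).
        for p in range(1, length // 2 + 1):
            if length % p == 0:
                h = length // p
                if s[i:j] == s[i:i + p] * h:
                    best = min(best, cost(i, i + p) + w_R(h))
        return best

    return cost(0, n)
```

Under these assumptions and a constant weight of 1, `min_rrs_size("abab")` evaluates to 3, corresponding to $(ab)^{2}$: two letters plus one repetition. With a large weight such as `w_R=lambda h: 10`, repetitions no longer pay off and the result for `"aaaa"` is 4, the plain string length.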

## Keywords

tandem repeat, string algorithm
