Algorithms for Finding a Minimum Repetition Representation of a String
A string with many repetitions can be written compactly by replacing h-fold contiguous repetitions of substring r with (r) h . We refer to such a compact representation as a repetition representation string or RRS, by which a set of disjoint or nested tandem arrays can be compacted. In this paper, we study the problem of finding a minimum RRS or MRRS, where the size of an RRS is defined to be the sum of its component letter sizes and the sizes needed to describe the repetitions (·) h which are defined as w R (h) using a repetition weight function w R . We develop two dynamic programming algorithms to solve the problem. One is CMR that works for any repetition weight function, and the other is CMR-C that is faster but can be applied only when the repetition weight function is constant. CMR-C is an O(w(n + z))-time algorithm using O(n + z) space for a given string with length n, where w and z are the number of distinct primitive tandem repeats and the number of their occurrences, respectively. Since w = O(n) and z = O(nlogn) in the worst case, CMR-C is an O(n 2logn)-time O(nlogn)-space algorithm, which is faster than CMR by ((logn)/n)-factor.
Keywordstandem repeat string algorithm
Unable to display preview. Download preview PDF.