SPIRE 2010: String Processing and Information Retrieval, pp. 185-190
Algorithms for Finding a Minimum Repetition Representation of a String
Abstract
A string with many repetitions can be written compactly by replacing each h-fold contiguous repetition of a substring r with (r)^h. We refer to such a compact representation as a repetition representation string, or RRS, by which a set of disjoint or nested tandem arrays can be compacted. In this paper, we study the problem of finding a minimum RRS, or MRRS, where the size of an RRS is defined as the sum of its component letter sizes and the sizes needed to describe the repetitions (·)^h, which are given by w_R(h) for a repetition weight function w_R. We develop two dynamic programming algorithms to solve the problem. One is CMR, which works for any repetition weight function; the other is CMR-C, which is faster but applies only when the repetition weight function is constant. CMR-C is an O(w(n + z))-time algorithm using O(n + z) space for a given string of length n, where w and z are the number of distinct primitive tandem repeats and the number of their occurrences, respectively. Since w = O(n) and z = O(n log n) in the worst case, CMR-C is an O(n^2 log n)-time, O(n log n)-space algorithm, which is faster than CMR by a factor of (log n)/n.
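To make the size definition concrete, the following is a minimal sketch of a naive interval dynamic program that computes the size of a minimum RRS. It is not the CMR or CMR-C algorithm of the paper, whose details the abstract does not give; it is a straightforward cubic-style baseline. The function name `min_rrs_size`, the parameter `w_r`, and the choice of unit size for every letter are assumptions made for illustration.

```python
def min_rrs_size(s, w_r=lambda h: 1):
    """Size of a minimum repetition representation of s (naive sketch).

    Assumptions (not from the paper): every letter has size 1, and
    w_r(h) is the cost of writing one repetition (.)^h.
    """
    n = len(s)
    if n == 0:
        return 0
    # cost[i][j] = minimum RRS size of the substring s[i:j]
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                cost[i][j] = 1  # a single letter
                continue
            # Option 1: concatenate the representations of two parts.
            best = min(cost[i][k] + cost[k][j] for k in range(i + 1, j))
            # Option 2: write s[i:j] as (r)^h with |r| = p and h = length / p.
            for p in range(1, length // 2 + 1):
                if length % p == 0:
                    h = length // p
                    if s[i:i + p] * h == s[i:j]:
                        best = min(best, cost[i][i + p] + w_r(h))
            cost[i][j] = best
    return cost[0][n]


if __name__ == "__main__":
    print(min_rrs_size("abababab"))                  # (ab)^4 under a constant weight of 1
    print(min_rrs_size("aaaa", w_r=lambda h: 5))     # repetition too costly to use
```

With a constant weight of 1, `abababab` is best written as (ab)^4, giving size 2 + 1 = 3. With a weight of 5 per repetition, compressing `aaaa` does not pay off and the plain string of size 4 is kept.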
Keywords
tandem repeat, string algorithm