CPM 2007: Combinatorial Pattern Matching, pp. 307–315

# Fast and Practical Algorithms for Computing All the Runs in a String

• Gang Chen
• Simon J. Puglisi
• W. F. Smyth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4580)

## Abstract

A repetition in a string x is a substring $$\mathbf{w} = \mathbf{u}^e$$ of x, maximum e ≥ 2, where u is not itself a repetition in w. A run in x is a substring $$\mathbf{w} = \mathbf{u}^e\mathbf{u}^{*}$$ of “maximal periodicity”, where $$\mathbf{u}^e$$ is a repetition and $$\mathbf{u}^{*}$$ a maximum-length possibly empty proper prefix of u. A run may encode as many as $$|\mathbf{u}|$$ repetitions. The maximum number of repetitions in any string $$\mathbf{x} = \mathbf{x}[1..n]$$ is well known to be Θ(n log n). In 2000 Kolpakov & Kucherov showed that the maximum number of runs in x is O(n); they also described a Θ(n)-time algorithm, based on Farach’s Θ(n)-time suffix tree construction algorithm (STCA), Θ(n)-time Lempel-Ziv factorization, and Main’s Θ(n)-time leftmost runs algorithm, to compute all the runs in x. Recently Abouelhoda et al. proposed a Θ(n)-time Lempel-Ziv factorization algorithm based on an “enhanced” suffix array (a suffix array together with other supporting data structures). In this paper we introduce a collection of fast space-efficient algorithms for computing all the runs in a string that appear in many circumstances to be superior to those previously proposed.
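To make the definition of a run concrete, here is a minimal brute-force sketch in Python. It is not one of the algorithms introduced in the paper, nor the Kolpakov & Kucherov method: it simply tries every candidate period and takes quadratic time in the worst case. The function name `runs` and the 0-based inclusive `(start, end, period)` output are assumptions made for illustration only (the abstract indexes strings as x[1..n]).

```python
def runs(x):
    """Return every run in x as (start, end, period), 0-based inclusive.

    Illustrative brute force only: for each candidate period p, find the
    maximal stretches where x[k] == x[k + p]; a stretch yields a substring
    with period p that cannot be extended to the left or to the right.
    """
    n = len(x)
    found = {}  # (start, end) -> smallest period seen for that interval
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if x[k] != x[k + p]:
                k += 1
                continue
            start = k
            while k + p < n and x[k] == x[k + p]:
                k += 1
            end = k + p - 1  # x[start..end] is maximal with period p
            # A run must contain at least two full copies of its period.
            if end - start + 1 >= 2 * p:
                # Periods are tried in increasing order, so the first
                # period recorded for an interval is its minimum period.
                found.setdefault((start, end), p)
    return sorted((s, e, p) for (s, e), p in found.items())


if __name__ == "__main__":
    for start, end, period in runs("mississippi"):
        print(start, end, period, "mississippi"[start:end + 1])
```

For "mississippi" this reports the run "ississi" with period 3 (u = "iss", e = 2, and u* = "i") along with the three period-1 runs "ss", "ss" and "pp". Keeping only the first period recorded for each interval discards non-minimal periods, such as period 2 for "aaaa".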

## Keywords

• Practical Algorithm
• Suffix Array
• Array Construction
• Maximal Periodicity
• Large Alphabet

## References

1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algs. 2, 53–86 (2004)
2. Apostolico, A., Preparata, F.P.: Optimal off-line detection of repetitions in a string. Theoret. Comput. Sci. 22, 297–315 (1983)
3. Crochemore, M.: An optimal algorithm for computing the repetitions in a word. Inform. Process. Lett. 12(5), 244–250 (1981)
4. Fan, K., Puglisi, S.J., Smyth, W.F., Turpin, A.: A new periodicity lemma. SIAM J. Discrete Math. 20(3), 656–668 (2006)
5. Farach, M.: Optimal suffix tree construction with large alphabets. In: Proc. 38th FOCS, pp. 137–143 (1997)
6. Franek, F., Holub, J., Smyth, W.F., Xiao, X.: Computing quasi suffix arrays. J. Automata, Languages & Combinatorics 8(4), 593–606 (2003)
7. Franek, F., Simpson, R.J., Smyth, W.F.: The maximum number of runs in a string. In: Miller, M., Park, K. (eds.) Proc. 14th AWOCA, pp. 26–35 (2003)
8. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing & string matching. SIAM J. Computing 35(2), 378–407 (2005)
9. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Proc. 30th ICALP, pp. 943–955 (2003)
10. Karlin, S., Ghandour, G., Ost, F., Tavare, S., Korn, L.J.: New approaches for computer analysis of nucleic acid sequences. Proc. Natl. Acad. Sci. USA 80, 5660–5664 (1983)
11. Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, Springer, Heidelberg (2001)
12. Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Baeza-Yates, R.A., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, Springer, Heidelberg (2003)
13. Kolpakov, R., Kucherov, G.: http://bioinfo.lifl.fr/mreps/
14. Kolpakov, R., Kucherov, G.: On maximal repetitions in words. J. Discrete Algs. 1, 159–186 (2000)
15. Kurtz, S.: Reducing the space requirement of suffix trees. Software Practice & Experience 29(13), 1149–1171 (1999)
16. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Information Theory 22, 75–81 (1976)
17. Lentin, A., Schützenberger, M.P.: A combinatorial problem in the theory of free monoids. In: Bose, R.C., Dowling, T.A. (eds.) Combinatorial Mathematics & Its Applications, pp. 128–144. University of North Carolina Press (1969)
18. Main, M.G.: Detecting leftmost maximal periodicities. Discrete Applied Maths 25, 145–153 (1989)
19. Main, M.G., Lorentz, R.J.: An O(n log n) Algorithm for Recognizing Repetition, Tech. Rep. CS-79–056, Computer Science Department, Washington State University (1979)
20. Main, M.G., Lorentz, R.J.: An O(n log n) algorithm for finding all repetitions in a string. J. Algs. 5, 422–432 (1984)
21. Mäkinen, V., Navarro, G.: Compressed full-text indices. ACM Computing Surveys (to appear)
22. Maniscalco, M., Puglisi, S.J.: Faster lightweight suffix array construction. In: Ryan, J., Dafik (eds.) Proc. 17th AWOCA, pp. 16–29 (2006)
23. Manzini, G.: Two space-saving tricks for linear time LCP computation. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, Springer, Heidelberg (2004)
24. Manzini, G., Ferragina, P.: Engineering a lightweight suffix array construction algorithm. Algorithmica 40, 33–50 (2004)
25. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. Assoc. Comput. Mach. 23(2), 262–272 (1976)
26. Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Computing Surveys (to appear)
27. Rytter, W.: The number of runs in a string: improved analysis of the linear upper bound. In: Durand, B., Thomas, W. (eds.) Proc. 23rd STACS. LNCS, vol. 3884, pp. 184–195. Springer, Heidelberg (2006)
28. Sadakane, K.: Space-efficient data structures for flexible text retrieval systems. In: Bose, P., Morin, P. (eds.) ISAAC 2002. LNCS, vol. 2518, Springer, Heidelberg (2002)
29. Smyth, B.: Computing Patterns in Strings, Pearson Addison-Wesley, p. 423 (2003)
30. Thue, A.: Über unendliche Zeichenreihen. Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania 7, 1–22 (1906)
31. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)
32. Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th Annual IEEE Symp. Switching & Automata Theory, pp. 1–11 (1973)
33. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Information Theory 23, 337–343 (1977)

## Authors and Affiliations

• Gang Chen (1)
• Simon J. Puglisi (2)
• W. F. Smyth (1, 2)

1. Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Ontario, L8S 4K1, Canada
2. Department of Computing, Curtin University, GPO Box U1987, Perth WA 6845, Australia