CPM 2007: Combinatorial Pattern Matching pp 307-315

# Fast and Practical Algorithms for Computing All the Runs in a String

• Gang Chen
• Simon J. Puglisi
• W. F. Smyth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4580)

## Abstract

A repetition in a string x is a substring $$\mathbf{w} = \mathbf{u}^e$$ of x, maximum e ≥ 2, where u is not itself a repetition in w. A run in x is a substring $$\mathbf{w} = \mathbf{u}^e\mathbf{u}^{*}$$ of “maximal periodicity”, where $$\mathbf{u}^e$$ is a repetition and $$\mathbf{u}^{*}$$ is a maximum-length, possibly empty, proper prefix of u. A run may encode as many as $$|\mathbf{u}|$$ repetitions. The maximum number of repetitions in any string $$\mathbf{x} = \mathbf{x}[1..n]$$ is well known to be Θ(n log n). In 2000 Kolpakov & Kucherov showed that the maximum number of runs in x is O(n); they also described a Θ(n)-time algorithm to compute all the runs in x, based on Farach’s Θ(n)-time suffix tree construction algorithm (STCA), Θ(n)-time Lempel-Ziv factorization, and Main’s Θ(n)-time leftmost runs algorithm. Recently Abouelhoda et al. proposed a Θ(n)-time Lempel-Ziv factorization algorithm based on an “enhanced” suffix array, that is, a suffix array together with other supporting data structures. In this paper we introduce a collection of fast, space-efficient algorithms for computing all the runs in a string that appear in many circumstances to be superior to those previously proposed.
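To make the definition of a run concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms introduced in the paper: it runs in O(n²) time, uses no suffix array or Lempel-Ziv factorization, and the function name `naive_runs` and the 0-based `(start, end, period)` output format are assumptions chosen for illustration only.

```python
def naive_runs(x):
    """Return all runs in x as (start, end, period) triples.

    Indices are 0-based and inclusive. Brute-force illustration only:
    for every candidate period p, scan for maximal stretches in which
    x[k] == x[k + p].
    """
    n = len(x)
    seen = set()   # (start, end) pairs already reported with a smaller period
    runs = []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if x[i] != x[i + p]:
                i += 1
                continue
            # Extend the stretch of matches as far right as possible.
            j = i
            while j + p < n and x[j] == x[j + p]:
                j += 1
            # x[i .. j+p-1] has period p and cannot be extended either way.
            start, end = i, j + p - 1
            # Need at least two full copies of the period (length >= 2p).
            # Periods are tried in increasing order, so a segment that was
            # already reported has a smaller period and p is not minimal.
            if j - i >= p and (start, end) not in seen:
                seen.add((start, end))
                runs.append((start, end, p))
            i = j + 1
    return runs


print(naive_runs("mississippi"))
```

On `"mississippi"` this reports three runs of period 1 (`ss`, `ss`, `pp`) and one run of period 3, `ississi`, which is $$\mathbf{u}^2\mathbf{u}^{*}$$ with u = `iss` and the proper prefix `i`.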

## Keywords

• Practical Algorithm
• Suffix Array
• Array Construction
• Maximal Periodicity
• Large Alphabet

## Authors and Affiliations

• Gang Chen (1)
• Simon J. Puglisi (2)
• W. F. Smyth (1, 2)

1. Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Ontario, L8S 4K1, Canada
2. Department of Computing, Curtin University, GPO Box U1987, Perth WA 6845, Australia