# A 2 2/3-approximation algorithm for the shortest superstring problem

## Abstract

Given a collection of strings *S* = {*s*_{1}, ..., *s*_{n}} over an alphabet *Σ*, a *superstring* *α* of *S* is a string containing each *s*_{i} as a substring; that is, for each *i*, 1 ≤ *i* ≤ *n*, *α* contains a block of |*s*_{i}| consecutive characters that match *s*_{i} exactly. The *shortest superstring problem* is the problem of finding a superstring *α* of minimum length.
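To make the definition concrete, the following minimal Python sketch checks the superstring property and finds a shortest superstring by exhaustive search on a tiny input. It is an illustration only, not the G-ShortString algorithm of this paper: the function names `is_superstring`, `overlap`, and `shortest_superstring_bruteforce` are hypothetical, and the search tries every ordering of the strings, so it is usable only for a handful of short strings.

```python
from itertools import permutations


def is_superstring(alpha, strings):
    """Return True if every string occurs as a contiguous block of alpha."""
    return all(s in alpha for s in strings)


def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring_bruteforce(strings):
    """Exhaustive search over orderings (illustrative assumption, exponential time).

    Strings contained in other strings are dropped first; for the remaining
    substring-free set, some ordering merged with maximal overlaps between
    consecutive strings yields a shortest superstring.
    """
    unique = list(dict.fromkeys(strings))  # remove exact duplicates, keep order
    reduced = [s for s in unique
               if not any(s != t and s in t for t in unique)]
    if not reduced:
        return ""
    best = None
    for perm in permutations(reduced):
        merged = perm[0]
        for prev, nxt in zip(perm, perm[1:]):
            merged += nxt[overlap(prev, nxt):]  # append the non-overlapping tail
        if best is None or len(merged) < len(best):
            best = merged
    return best


if __name__ == "__main__":
    S = ["abc", "bcd", "cde"]
    alpha = shortest_superstring_bruteforce(S)
    print(alpha, len(alpha), is_superstring(alpha, S))
```

For *S* = {abc, bcd, cde}, the concatenation abcbcdcde has length 9, whereas merging the strings in the given order with overlaps of length 2 gives a superstring of length 5.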

The shortest superstring problem has applications in both data compression and computational biology. Blum et al. [5] showed that it is MAX SNP-hard and also gave the first *O*(1)-approximation algorithm, which returns a superstring at most 3 times the length of an optimal solution. Several published results have since improved on this approximation ratio; of these, the best is our algorithm ShortString, a 2 3/4-approximation [1].

We present our new algorithm, G-ShortString, which achieves a ratio of 2 2/3. Our approach builds on the work in [1], in which we identified classes of strings that have a nested periodic structure, and which must be present in the worst case for our algorithms. We introduced machinery to describe these strings and proved strong structural properties about them. In this paper we extend this study to strings that exhibit a more relaxed form of the same structure, and we use this understanding to obtain our improved result.

## Keywords

Approximation Algorithm, Full Extension, Distance Graph, Dartmouth College, Cycle Cover

## References

- 1. C. Armen and C. Stein. Improved length bounds for the shortest superstring problem. In *Proceedings of Workshop on Algorithms and Data Structures*, pages 494–505, 1995.
- 2. C. Armen and C. Stein. A 2 2/3-approximation algorithm for the shortest superstring problem. Technical Report PCS-TR95-262, Dartmouth College, June 1995.
- 3. C. Armen and C. Stein. Short superstrings and the structure of overlapping strings. To appear in J. of Computational Biology, 1996.
- 4. Chris Armen. *Approximation Algorithms for the Shortest Superstring Problem*. PhD thesis, Dartmouth College, July 1995.
- 5. A. Blum, T. Jiang, M. Li, J. Tromp, and M. Yannakakis. Linear approximation of shortest superstrings. *Journal of the ACM*, 41(4):630–647, July 1994.
- 6. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. *Introduction to Algorithms*. MIT Press/McGraw-Hill, 1990.
- 7. A. Czumaj, L. Gasieniec, M. Piotrow, and W. Rytter. Parallel and sequential approximations of shortest superstrings. In *Proceedings of Fourth Scandinavian Workshop on Algorithm Theory*, pages 95–106, 1994.
- 8. A. Lesk, editor. *Computational Molecular Biology, Sources and Methods for Sequence Analysis*. Oxford University Press, 1988.
- 9. A.M. Frieze, G. Galbiati, and F. Maffioli. On the worst case performance of some algorithms for the asymmetric travelling salesman problem. *Networks*, 12:23–39, 1982.
- 10. J. Gallant, D. Maier, and J. Storer. On finding minimal length superstrings. *Journal of Computer and System Sciences*, 20:50–58, 1980.
- 11. D. Gusfield, G. Landau, and B. Schieber. An efficient algorithm for the all pairs suffix-prefix problem. *Information Processing Letters*, 41:181–185, March 1992.
- 12. Tao Jiang and Zhigen Jiang. Rotation of periodic strings and short superstrings. Unpublished manuscript, courtesy Tao Jiang, March 1996.
- 13. Tao Jiang and Ming Li. Approximating shortest superstrings with constraints. *Theoretical Computer Science*, 134:473–491, 1994.
- 14. J.D. Kececioglu and E.W. Myers. Combinatorial algorithms for DNA sequence assembly. *Algorithmica*, 13(1/2):7–51, 1995.
- 15. John D. Kececioglu. *Exact and approximation algorithms for DNA sequence reconstruction*. PhD thesis, University of Arizona, 1991.
- 16. R. Kosaraju, J. Park, and C. Stein. Long tours and short superstrings. In *Proceedings of the 35th Annual Symposium on Foundations of Computer Science*, November 1994.
- 17. M. Li. Towards a DNA sequencing theory (learning a string). In *Proceedings of the 31st Annual Symposium on Foundations of Computer Science*, pages 125–134, 1990.
- 18. Christos H. Papadimitriou and Kenneth Steiglitz. *Combinatorial Optimization, Algorithms and Complexity*. Prentice-Hall, Englewood Cliffs, NJ, 1982.
- 19. H. Peltola, H. Soderlund, J. Tarhio, and E. Ukkonen. Algorithms for some string matching problems arising in molecular genetics. In *Proceedings of the IFIP Congress*, pages 53–64, 1983.
- 20. Graham A. Stephen. *String searching algorithms*. World Scientific, 1994.
- 21. J. Storer. *Data compression: methods and theory*. Computer Science Press, 1988.
- 22. Shang-Hua Teng and Frances Yao. Approximating shortest superstrings. In *Proceedings of the 34th Annual Symposium on Foundations of Computer Science*, pages 158–165, November 1993.
- 23. J. Turner. Approximation algorithms for the shortest common superstring problem. *Information and Computation*, 83:1–20, 1989.