A 2 2/3-approximation algorithm for the shortest superstring problem

  • Chris Armen
  • Clifford Stein
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1075)

Abstract

Given a collection of strings S = {s1, ..., sn} over an alphabet Σ, a superstring α of S is a string containing each si as a substring; that is, for each i, 1 ≤ i ≤ n, α contains a block of |si| consecutive characters that match si exactly. The shortest superstring problem is the problem of finding a superstring α of minimum length.
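The definition above can be made concrete with a small sketch. The function names (`is_superstring`, `overlap`, `shortest_superstring_bruteforce`) are illustrative and do not come from the paper. The exhaustive search is exponential in n and is meant only for toy inputs, not as a practical method.

```python
from itertools import permutations


def is_superstring(alpha, strings):
    """Return True if every string in `strings` occurs in `alpha` as a contiguous block."""
    return all(s in alpha for s in strings)


def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring_bruteforce(strings):
    """Try every ordering and merge neighbours with maximal overlap (tiny inputs only)."""
    # Duplicates and strings contained in another string never change the optimum.
    unique = list(dict.fromkeys(strings))
    core = [s for s in unique if not any(s != t and s in t for t in unique)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        merged = order[0]
        for nxt in order[1:]:
            merged += nxt[overlap(merged, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best
```

For example, for S = {abc, bcd, cde} the search returns a superstring of length 5, and `is_superstring` confirms that it contains every member of S.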

The shortest superstring problem has applications in both data compression and computational biology. It was shown by Blum et al. [5] to be MAX SNP-hard. The first O(1)-approximation algorithm also appeared in [5], which returns a superstring no more than 3 times the length of an optimal solution. There have been several published results that improve on the approximation ratio; of these, the best is our algorithm ShortString, a 2 3/4-approximation [1].
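To illustrate what an approximate solution looks like, here is a minimal sketch of the classical greedy merging heuristic, which repeatedly merges the two strings with the largest overlap. It is shown only as an illustration: it is not the algorithm of [5], nor ShortString, nor G-ShortString, and the function names are assumptions made for this sketch.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Repeatedly merge the ordered pair with the largest overlap until one string is left."""
    # Drop duplicates and strings that already occur inside another string.
    unique = list(dict.fromkeys(strings))
    pool = [s for s in unique if not any(s != t and s in t for t in unique)]
    while len(pool) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:  # ties keep the first pair found
                        best_k, best_i, best_j = k, i, j
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)
    return pool[0] if pool else ""
```

The result is always a superstring of the input, but it need not be a shortest one; the approximation ratio measures how far from optimal such an output can be in the worst case.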

We present our new algorithm, G-ShortString, which achieves a ratio of 2 2/3. Our approach builds on the work in [1], in which we identified classes of strings that have a nested periodic structure, and which must be present in the worst case for our algorithms. We introduced machinery to describe these strings and proved strong structural properties about them. In this paper we extend this study to strings that exhibit a more relaxed form of the same structure, and we use this understanding to obtain our improved result.
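The three guarantees mentioned in the abstract can be written side by side. The symbol α* for an optimal superstring and the subscripts naming each algorithm's output are notation introduced here for illustration, not taken from the paper.

```latex
\begin{align*}
|\alpha_{[5]}| &\le 3\,|\alpha^{*}| \\
|\alpha_{\text{ShortString}}| &\le 2\tfrac{3}{4}\,|\alpha^{*}| = \tfrac{11}{4}\,|\alpha^{*}| \\
|\alpha_{\text{G-ShortString}}| &\le 2\tfrac{2}{3}\,|\alpha^{*}| = \tfrac{8}{3}\,|\alpha^{*}|
\end{align*}
```

The step from 11/4 to 8/3 lowers the worst-case bound by 1/12 of the optimal length.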

Keywords

Approximation Algorithm, Full Extension, Distance Graph, Dartmouth College, Cycle Cover

References

  1. C. Armen and C. Stein. Improved length bounds for the shortest superstring problem. In Proceedings of Workshop on Algorithms and Data Structures, pages 494–505, 1995.
  2. C. Armen and C. Stein. A 2 2/3-approximation algorithm for the shortest superstring problem. Technical Report PCS-TR95-262, Dartmouth College, June 1995.
  3. C. Armen and C. Stein. Short superstrings and the structure of overlapping strings. To appear in J. of Computational Biology, 1996.
  4. Chris Armen. Approximation Algorithms for the Shortest Superstring Problem. PhD thesis, Dartmouth College, July 1995.
  5. A. Blum, T. Jiang, M. Li, J. Tromp, and M. Yannakakis. Linear approximation of shortest superstrings. Journal of the ACM, 41(4):630–647, July 1994.
  6. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. MIT Press/McGraw-Hill, 1990.
  7. A. Czumaj, L. Gasieniec, M. Piotrow, and W. Rytter. Parallel and sequential approximations of shortest superstrings. In Proceedings of Fourth Scandinavian Workshop on Algorithm Theory, pages 95–106, 1994.
  8. A. Lesk, editor. Computational Molecular Biology, Sources and Methods for Sequence Analysis. Oxford University Press, 1988.
  9. A.M. Frieze, G. Galbiati, and F. Maffioli. On the worst case performance of some algorithms for the asymmetric travelling salesman problem. Networks, 12:23–39, 1982.
  10. J. Gallant, D. Maier, and J. Storer. On finding minimal length superstrings. Journal of Computer and System Sciences, 20:50–58, 1980.
  11. D. Gusfield, G. Landau, and B. Schieber. An efficient algorithm for the all pairs suffix-prefix problem. Information Processing Letters, 41:181–185, March 1992.
  12. Tao Jiang and Zhigen Jiang. Rotation of periodic strings and short superstrings. Unpublished manuscript, courtesy of Tao Jiang, March 1996.
  13. Tao Jiang and Ming Li. Approximating shortest superstrings with constraints. Theoretical Computer Science, 134:473–491, 1994.
  14. J.D. Kececioglu and E.W. Myers. Combinatorial algorithms for DNA sequence assembly. Algorithmica, 13(1/2):7–51, 1995.
  15. John D. Kececioglu. Exact and approximation algorithms for DNA sequence reconstruction. PhD thesis, University of Arizona, 1991.
  16. R. Kosaraju, J. Park, and C. Stein. Long tours and short superstrings. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, November 1994.
  17. M. Li. Towards a DNA sequencing theory (learning a string). In Proceedings of the 31st Annual Symposium on Foundations of Computer Science, pages 125–134, 1990.
  18. Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization, Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1982.
  19. H. Peltola, H. Soderlund, J. Tarhio, and E. Ukkonen. Algorithms for some string matching problems arising in molecular genetics. In Proceedings of the IFIP Congress, pages 53–64, 1983.
  20. Graham A. Stephen. String searching algorithms. World Scientific, 1994.
  21. J. Storer. Data compression: methods and theory. Computer Science Press, 1988.
  22. Shang-Hua Teng and Frances Yao. Approximating shortest superstrings. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science, pages 158–165, November 1993.
  23. J. Turner. Approximation algorithms for the shortest common superstring problem. Information and Computation, 83:1–20, 1989.

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Chris Armen (1)
  • Clifford Stein (2)
  1. University of Hartford, W. Hartford, USA
  2. Dartmouth College, Hanover, USA
