Algorithms for Three Versions of the Shortest Common Superstring Problem

  • Maxime Crochemore
  • Marek Cygan
  • Costas Iliopoulos
  • Marcin Kubica
  • Jakub Radoszewski
  • Wojciech Rytter
  • Tomasz Waleń
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6129)

Abstract

The input to the Shortest Common Superstring (SCS) problem is a set S of k words of total length n. In the classical version, the output is an explicit word SCS(S) in which each s ∈ S occurs at least once. In this paper we consider two versions with multiple occurrences, in which the input also includes numbers (multiplicities) given in binary. Here the output is the word SCS(S) given implicitly in a compact form, since its actual length could be exponential. We also consider the case in which all input words have length two; there our main algorithmic tool is a compact representation of Eulerian cycles in multigraphs. Because edge multiplicities can be exponential, such cycles can have exponential length, so a compact representation is needed. Other tools used in the paper are a polynomially solvable case of integer linear programming and the min-plus product of matrices.
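
The abstract lists the min-plus product of matrices among the tools but does not say how it is applied. As a generic illustration only, not the authors' construction, the Python sketch below combines it with the usual overlap-based merge cost between words: `min_plus_power(cost, m)[i][j]` is the minimum number of letters added by a chain of exactly m merges that starts with word i and ends with word j, where each merge uses the maximal proper overlap of the two consecutive words. Because the power is computed by repeated squaring, a multiplicity m written in binary costs only O(log m) matrix products rather than m. The function names and the cost matrix are assumptions made for this example.

```python
from math import inf

# A matrix is a list of rows; inf marks "no walk" in the (min, +) semiring.
Matrix = list[list[float]]


def overlap(u: str, v: str) -> int:
    """Length of the longest proper suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def merge_cost_matrix(words: list[str]) -> Matrix:
    """cost[i][j] = letters appended when words[j] is merged right after words[i]."""
    return [[len(v) - overlap(u, v) for v in words] for u in words]


def min_plus(a: Matrix, b: Matrix) -> Matrix:
    """Min-plus product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def min_plus_power(a: Matrix, m: int) -> Matrix:
    """m-th min-plus power of a by repeated squaring: O(log m) products."""
    n = len(a)
    # Identity of the (min, +) semiring: 0 on the diagonal, inf elsewhere.
    result: Matrix = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    base = a
    while m > 0:
        if m & 1:  # current binary digit of m is 1
            result = min_plus(result, base)
        base = min_plus(base, base)
        m >>= 1
    return result
```

For example, with `words = ["ab", "ba"]` the cost matrix is `[[2, 1], [1, 2]]`, and `min_plus_power(cost, 3)[0][1]` is 3: the chain ab, ba, ab, ba merges to "ababa", of length `len("ab") + 3`. The same call with m = 10**18 finishes after about 60 squarings, which is the point of keeping multiplicities in binary.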

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Maxime Crochemore (1, 3)
  • Marek Cygan (2)
  • Costas Iliopoulos (1, 4)
  • Marcin Kubica (2)
  • Jakub Radoszewski (2)
  • Wojciech Rytter (2, 5)
  • Tomasz Waleń (2)
  1. King’s College London, London, UK
  2. Dept. of Mathematics, Computer Science and Mechanics, University of Warsaw, Warsaw, Poland
  3. Université Paris-Est, France
  4. Digital Ecosystems & Business Intelligence Institute, Curtin University of Technology, Perth, Australia
  5. Dept. of Math. and Informatics, Copernicus University, Toruń, Poland