On Algorithmic Complexity of Biomolecular Sequence Assembly Problem

  • Giuseppe Narzisi
  • Bud Mishra
  • Michael C. Schatz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8542)


Because of its connection to the well-known \(\mathcal{NP}\)-complete shortest superstring combinatorial optimization problem, the Sequence Assembly Problem (SAP) has been formulated in simple and sometimes unrealistic string- and graph-theoretic frameworks. This paper revisits the problem by re-examining the relationship between the most common formulations of the SAP and their computational tractability under different theoretical frameworks. For each formulation, we show examples of logically consistent candidate solutions that are nevertheless infeasible in the context of the underlying biological problem. We hope this material will be valuable to theoreticians as they develop new formulations of the SAP, and that it will guide developers of new pipelines and algorithms for sequence assembly and variant detection.
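To illustrate what a "logically consistent but biologically infeasible" solution can look like, here is a minimal sketch (a toy illustration under stated assumptions, not the authors' construction) that computes an exact shortest common superstring by brute force. The genome string, the read length k = 4, and the helper names `overlap` and `shortest_superstring` are hypothetical choices made for this example. The genome contains the dinucleotide tandem repeat ACACACAC; the shortest superstring contains every read, so it is a valid solution of the string problem, yet it is two bases shorter than the true sequence because it collapses one copy of the repeat.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring(reads: list[str]) -> str:
    """Exact shortest common superstring by exhaustive search over read orders.

    Factorial in the number of distinct reads: toy inputs only.
    """
    # Deduplicate, then drop reads contained in another read so the set is
    # substring-free; for such a set, some ordering merged with maximal
    # overlaps attains the optimum, so searching all orderings is exact.
    uniq = sorted(set(reads))
    core = [r for r in uniq if not any(r != s and r in s for s in uniq)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        merged = order[0]
        for r in order[1:]:
            # Append only the part of r not already covered by the overlap.
            merged += r[overlap(merged, r):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


if __name__ == "__main__":
    genome = "GCACACACAT"  # hypothetical genome with a tandem repeat
    k = 4                  # hypothetical read length
    reads = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    scs = shortest_superstring(reads)
    print(scs, len(scs), len(genome))  # GCACACAT 8 10
```

Minimizing length is exactly what drives the repeat collapse here: the optimization objective rewards merging identical repeat copies, which is one reason parsimony-based formulations can disagree with the biological sequence.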


Keywords: Genome Assembly · Sequence Assembly Problem · Optimality · \(\mathcal{NP}\)-complete Problem





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Giuseppe Narzisi (1)
  • Bud Mishra (1, 2)
  • Michael C. Schatz (1)
  1. Cold Spring Harbor Laboratory, Simons Center for Quantitative Biology, USA
  2. Courant Institute of Mathematical Sciences, New York University, New York, USA
