
On Algorithmic Complexity of Biomolecular Sequence Assembly Problem

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNBI, volume 8542)

Abstract

Because of its connection to the well-known \(\mathcal{NP}\)-complete shortest superstring problem from combinatorial optimization, the Sequence Assembly Problem (SAP) has been formulated in simple and sometimes unrealistic string- and graph-theoretic frameworks. This paper revisits the problem by re-examining the relationship between the most common formulations of the SAP and their computational tractability under different theoretical frameworks. For each formulation, we show examples of logically consistent candidate solutions that are nevertheless infeasible in the context of the underlying biological problem. We hope this material will be valuable to theoreticians developing new formulations of the SAP and will serve as guidance for developers of new pipelines and algorithms for sequence assembly and variant detection.
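As a hypothetical illustration of the abstract's central point (not an example taken from the paper), the sketch below computes an exact shortest common superstring by brute force for error-free 4-mer reads sampled from a toy genome containing a tandem repeat, `GACACACACT`. It relies on the standard fact that, for a substring-free set of reads, some optimal superstring is obtained by merging the reads in some order with maximal pairwise overlaps. The genome, the read length `k = 4`, and all function names are assumptions chosen for illustration, and enumerating permutations is only feasible for a handful of reads.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def remove_contained(reads: list[str]) -> list[str]:
    """Deduplicate reads and drop any read that is a substring of another."""
    unique = sorted(set(reads))
    return [r for r in unique if not any(r != o and r in o for o in unique)]


def merge(order: tuple[str, ...]) -> str:
    """Merge reads left to right, overlapping each consecutive pair maximally."""
    merged = order[0]
    for prev, nxt in zip(order, order[1:]):
        # `merged` always ends with `prev`, so skipping the overlap is safe.
        merged += nxt[overlap(prev, nxt):]
    return merged


def shortest_superstring(reads: list[str]) -> str:
    """Exact shortest common superstring by brute force (exponential in #reads)."""
    candidates = remove_contained(reads)
    best = None
    for order in permutations(candidates):
        cand = merge(order)
        # Prefer shorter strings; break ties lexicographically for determinism.
        if best is None or (len(cand), cand) < (len(best), best):
            best = cand
    return best


# Hypothetical toy genome: G, four copies of the dinucleotide AC, then T.
genome = "GACACACACT"
k = 4
reads = [genome[i:i + k] for i in range(len(genome) - k + 1)]

scs = shortest_superstring(reads)
print(scs, len(scs), len(genome))  # GACACACT 8 10
print(all(r in scs for r in reads))  # True
```

The optimum, `GACACACT` (8 bases), contains every read and is therefore a logically consistent solution to the string formulation, yet it collapses one copy of the `AC` repeat present in the 10-base genome: the shortest superstring is not the biologically correct assembly.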

Keywords

  • Genome Assembly
  • Sequence Assembly Problem
  • Optimality
  • \(\mathcal{NP}\)-complete Problem

  • DOI: 10.1007/978-3-319-07953-0_15
  • Chapter length: 13 pages

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Narzisi, G., Mishra, B., Schatz, M.C. (2014). On Algorithmic Complexity of Biomolecular Sequence Assembly Problem. In: Dediu, AH., Martín-Vide, C., Truthe, B. (eds) Algorithms for Computational Biology. AlCoB 2014. Lecture Notes in Computer Science, vol 8542. Springer, Cham. https://doi.org/10.1007/978-3-319-07953-0_15

  • DOI: https://doi.org/10.1007/978-3-319-07953-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07952-3

  • Online ISBN: 978-3-319-07953-0

  • eBook Packages: Computer Science, Computer Science (R0)