Ab Initio Whole Genome Shotgun Assembly with Mated Short Reads

  • Paul Medvedev
  • Michael Brudno
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4955)

Abstract

Next Generation Sequencing (NGS) technologies can read millions of short DNA sequences quickly and cheaply. While these technologies are already used to resequence individuals once a reference genome exists, it has not been shown whether they can be used for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that, by taking advantage of the high coverage provided by NGS, accurately estimates the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy counts with mate-pair data to assemble the reads into contigs. We run our algorithms on simulated read data from E. coli and predict copy counts with extremely high accuracy, while assembling long contigs.
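The intuition behind estimating copy counts from high coverage can be illustrated with a minimal sketch. This is not the network flow algorithm of the paper; it is only a naive depth-ratio estimate. It assumes reads are sampled uniformly, so a sequence present k times in the genome collects roughly k times as many reads as a single-copy sequence of the same length. The function name, the `reads_per_base` parameter (which could, for instance, be estimated from long sequences presumed unique), and the example numbers are all hypothetical.

```python
def estimate_copy_counts(segments, reads_per_base):
    """Naive copy-count estimate from read depth (illustrative only).

    segments: dict mapping a segment name to (length_bp, observed_reads),
              where observed_reads pools reads from all copies of the segment.
    reads_per_base: assumed expected number of read starts per base
                    for single-copy sequence.
    """
    estimates = {}
    for name, (length_bp, observed_reads) in segments.items():
        # Reads we would expect if the segment occurred exactly once.
        expected_single_copy = reads_per_base * length_bp
        # Ratio of observed to expected depth, rounded to the nearest
        # integer; every observed segment occurs at least once.
        estimates[name] = max(1, round(observed_reads / expected_single_copy))
    return estimates


# Hypothetical example: one collapsed repeat flanked by two unique segments.
segments = {
    "unique_A": (5000, 5100),
    "repeat_R": (1000, 2950),
    "unique_B": (8000, 7900),
}
counts = estimate_copy_counts(segments, reads_per_base=1.0)
print(counts)
```

An estimate of this kind treats each segment independently. A flow-based formulation, as described in the abstract, can additionally require the counts to be consistent across neighbouring sequences in the assembly graph.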

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Paul Medvedev (1)
  • Michael Brudno (1, 2)
  1. Department of Computer Science, University of Toronto, Canada
  2. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada
