Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers

  • Paul Medvedev
  • Son Pham
  • Mark Chaisson
  • Glenn Tesler
  • Pavel Pevzner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6577)

Abstract

The recent proliferation of next-generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One key advance that has improved contig lengths is the use of mate pairs, which facilitate the assembly of repeated regions. Most next-generation assemblers incorporate mate pairs algorithmically through various heuristic post-processing steps that correct the assembly graph or link contigs into scaffolds. Such methods yield longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future.

Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs in a post-processing step. This graph could be used in place of the de Bruijn graph in any de Bruijn graph-based assembler, leaving all other assembly steps, such as error correction and repeat resolution, unchanged. Using assembly results on simulated error-free data, we argue that this approach can effectively improve contig sizes.
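As a rough illustration of the idea, and not the authors' implementation, the Python sketch below contrasts an ordinary de Bruijn graph with a paired one on a toy genome. It assumes error-free sequence and that every paired k-mer (a|b) has b starting exactly d bases after a; real mate-pair libraries have variable insert sizes, which this sketch ignores. All function names (`de_bruijn`, `paired_de_bruijn`, `count_branching`, `walk_unique_path`, `spell_paired_path`) are hypothetical. In the ordinary graph the repeated 3-mer GCT creates a branch at node CT, whereas in the paired graph each k-mer is glued only to k-mers with the same mate, so the graph stays a single unbranched path that spells the whole genome.

```python
"""Toy comparison of a de Bruijn graph and a paired de Bruijn graph.

Hypothetical assumptions, for illustration only: error-free sequence, and
every paired k-mer (a|b) has b starting exactly d bases after a.
"""


def de_bruijn(genome, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph = {}
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        graph.setdefault(kmer[:-1], set()).add(kmer[1:])
    return graph


def paired_de_bruijn(genome, k, d):
    """Nodes are pairs of (k-1)-mers; each paired k-mer (a|b) adds an edge
    (prefix(a), prefix(b)) -> (suffix(a), suffix(b))."""
    graph = {}
    for i in range(len(genome) - k - d + 1):
        a, b = genome[i:i + k], genome[i + d:i + d + k]
        graph.setdefault((a[:-1], b[:-1]), set()).add((a[1:], b[1:]))
    return graph


def count_branching(graph):
    """Number of nodes with more than one distinct successor."""
    return sum(1 for succ in graph.values() if len(succ) > 1)


def walk_unique_path(graph):
    """Start at the unique source node and follow edges while each node
    has exactly one successor; stop at a branch, a sink, or a revisit."""
    indegree = {}
    for succ in graph.values():
        for v in succ:
            indegree[v] = indegree.get(v, 0) + 1
    sources = [u for u in graph if indegree.get(u, 0) == 0]
    if len(sources) != 1:
        return None
    path, seen = [sources[0]], {sources[0]}
    while len(graph.get(path[-1], ())) == 1:
        (nxt,) = graph[path[-1]]
        if nxt in seen:
            break
        path.append(nxt)
        seen.add(nxt)
    return path


def spell_paired_path(path, d):
    """Spell the top (first mate) and bottom (second mate) strings of a
    paired path and merge them; requires the two strings to overlap."""
    top = path[0][0] + "".join(node[0][-1] for node in path[1:])
    bottom = path[0][1] + "".join(node[1][-1] for node in path[1:])
    if len(top) < d:
        raise ValueError("top and bottom strings do not overlap")
    # bottom[j] sits at genome position d + j, so skip what top already covers.
    return top + bottom[len(top) - d:]


if __name__ == "__main__":
    genome = "AAGCTCCGCTTA"  # toy genome; the 3-mer GCT occurs twice
    k, d = 3, 5
    dbg = de_bruijn(genome, k)
    pdbg = paired_de_bruijn(genome, k, d)
    print(count_branching(dbg), count_branching(pdbg))  # 1 0
    print(walk_unique_path(dbg))  # ['AA', 'AG', 'GC', 'CT']
    print(spell_paired_path(walk_unique_path(pdbg), d))  # AAGCTCCGCTTA
```

In this toy case the walk through the ordinary graph stops at the branch after CT, while the paired walk recovers the genome. Sufficiently long repeats can still create branches in a paired graph, so the sketch shows only why pairing can help, not that it always suffices.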

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Paul Medvedev (1)
  • Son Pham (1)
  • Mark Chaisson (2)
  • Glenn Tesler (3)
  • Pavel Pevzner (1)
  1. Department of Computer Science and Engineering, Univ. of California, San Diego, USA
  2. Pacific Biosciences of California, Menlo Park, USA
  3. Department of Mathematics, Univ. of California, San Diego, USA