Skip to main content

Pathset Graphs: A Novel Approach for Comprehensive Utilization of Paired Reads in Genome Assembly

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2012)

Abstract

One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been utilization of paired reads (mate-pairs). While in most assemblers, mate-pair information is used in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates the mate-pair information directly in the assembly graph structure. However, the PDBG approach faces difficulties when the variation in the insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms that allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.

This work was supported by grants from the National Institutes of Health, USA (NIH grant 3P41RR024851-02S1) and the Government of the Russian Federation (grant 11.G34.31.0018).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bankevich, A., Nurk, S., Antipov, D., Gurevich, A., Dvorkin, M., Kulikov, A., Lesin, V., Nikolenko, S., Pham, S., Prjibelski, A., Pyshkin, A., Sirotkin, A., Vyahhi, N., Tesler, G., Alekseyev, M., Pevzner, P.: SPAdes: a New Genome Assembler and its Applications to Single Cell Sequencing (submitted, 2012)

    Google Scholar 

  2. Butler, J., MacCallum, I., Kleber, M., Shlyakhter, I., Belmonte, M., Lander, E., Nusbaum, C., Jaffe, D.: ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Research 18(5), 810 (2008)

    Article  Google Scholar 

  3. Chaisson, M., Brinza, D., Pevzner, P.: De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Research 19(2), 336 (2009)

    Article  Google Scholar 

  4. Chaisson, M., Pevzner, P.: Short read fragment assembly of bacterial genomes. Genome Research 18(2), 324 (2008)

    Article  Google Scholar 

  5. Chen, K., Wallis, J., McLellan, M., Larson, D., Kalicki, J., Pohl, C., McGrath, S., Wendl, M., Zhang, Q., Locke, D., et al.: BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nature Methods 6(9), 677–681 (2009)

    Article  Google Scholar 

  6. Chikhi, R., Lavenier, D.: Localized Genome Assembly from Reads to Scaffolds: Practical Traversal of the Paired String Graph. In: Przytycka, T.M., Sagot, M.-F. (eds.) WABI 2011. LNCS, vol. 6833, pp. 39–48. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  7. Donmez, N., Brudno, M.: Hapsembler: An Assembler for Highly Polymorphic Genomes. In: Bafna, V., Sahinalp, S.C. (eds.) RECOMB 2011. LNCS, vol. 6577, pp. 38–52. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  8. Kelley, D., Schatz, M., Salzberg, S.: Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11(11), R116 (2010)

    Google Scholar 

  9. Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., Li, Y., Li, S., Shan, G., Kristiansen, K., et al.: De novo assembly of human genomes with massively parallel short read sequencing. Genome Research 20(2), 265 (2010)

    Article  Google Scholar 

  10. Medvedev, P., Pham, S., Chaisson, M., Tesler, G., Pevzner, P.: Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers. In: Bafna, V., Sahinalp, S.C. (eds.) RECOMB 2011. LNCS, vol. 6577, pp. 238–251. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  11. Moitra, A., Valiant, G.: Settling the polynomial learnability of mixtures of gaussians. In: 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 93–102. IEEE (2010)

    Google Scholar 

  12. Peng, Y., Leung, H.C.M., Yiu, S.M., Chin, F.Y.L.: IDBA – A Practical Iterative de Bruijn Graph De Novo Assembler. In: Berger, B. (ed.) RECOMB 2010. LNCS, vol. 6044, pp. 426–440. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  13. Pevzner, P., Tang, H.: Fragment assembly with double-barreled data. Bioinformatics 17(suppl. 1), S225 (2001)

    Article  Google Scholar 

  14. Pevzner, P., Tang, H., Waterman, M.: An Eulerian path approach to DNA fragment assembly. PNAS 98(17), 9748 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  15. Simpson, J., Wong, K., Jackman, S., Schein, J., Jones, S., Birol, İ.: ABySS: a parallel assembler for short read sequence data. Genome Research 19(6), 1117 (2009)

    Article  Google Scholar 

  16. Young, S., Barthelson, R., McFarlin, A., Rounsley, S.: Plantagora toolset (2011), http://www.plantagora.org

  17. Zerbino, D., Birney, E.: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18(5), 821 (2008)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pham, S.K., Antipov, D., Sirotkin, A., Tesler, G., Pevzner, P.A., Alekseyev, M.A. (2012). Pathset Graphs: A Novel Approach for Comprehensive Utilization of Paired Reads in Genome Assembly. In: Chor, B. (eds) Research in Computational Molecular Biology. RECOMB 2012. Lecture Notes in Computer Science(), vol 7262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29627-7_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-29627-7_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29626-0

  • Online ISBN: 978-3-642-29627-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics