dipSPAdes: Assembler for Highly Polymorphic Diploid Genomes

  • Yana Safonova
  • Anton Bankevich
  • Pavel A. Pevzner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8394)

Abstract

While the number of sequenced diploid genomes of interest has been steadily increasing in the last few years, assembly of highly polymorphic (HP) diploid genomes remains challenging. As a result, there is a shortage of tools for assembling HP genomes from next-generation sequencing (NGS) data. The initial approaches to assembling HP genomes were proposed in the pre-NGS era and are not well suited for NGS projects. We present dipSPAdes, the first de Bruijn graph assembler for HP genomes, and demonstrate that it significantly improves on the state of the art in HP genome assembly.

Keywords

genome assembly, polymorphism, de Bruijn graph, SPAdes

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yana Safonova (1)
  • Anton Bankevich (1, 2)
  • Pavel A. Pevzner (1, 3)
  1. Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia
  2. Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
  3. Dept. of Computer Science and Engineering, University of California, San Diego, La Jolla, USA
