Current Microbiology, Volume 70, Issue 3, pp 338–344

Comparison of Genome Sequencing Technology and Assembly Methods for the Analysis of a GC-Rich Bacterial Genome



Improvements in technology and decreases in price have made de novo bacterial genomic sequencing a reality for many researchers, but they have also created a need to evaluate the methods for generating a complete and accurate genome assembly. We sequenced the GC-rich Caulobacter henricii genome using the Illumina MiSeq, Roche 454, and Pacific Biosciences RS II sequencing systems. To generate a complete genome sequence, we performed assemblies using eight readily available programs and found that builds using the Illumina MiSeq and the Roche 454 data produced accurate yet numerous contigs. SPAdes performed best, followed by PANDAseq. In contrast, the Celera assembler produced a single genomic contig using the Pacific Biosciences data after error correction with the Illumina MiSeq data. In addition, we duplicated this build using the Pacific Biosciences data with HGAP2.0. The accuracy of these builds was verified by pulsed-field gel electrophoresis of genomic DNA cut with restriction enzymes.
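Comparing builds by how many contigs they produce, and describing a genome as GC-rich, both rest on a few simple summary statistics. The following is a minimal Python sketch of how such statistics could be computed from a list of contig sequences. It is an illustration only: the function names (`gc_content`, `n50`, `summarize`) and the toy contigs are hypothetical, and it is not the code of QUAST or of any assembler named above.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def n50(lengths: list[int]) -> int:
    """Largest length L such that contigs of length >= L cover
    at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def summarize(contigs: list[str]) -> dict:
    """Collect basic statistics for one assembly (hypothetical helper)."""
    lengths = [len(c) for c in contigs]
    return {
        "contigs": len(contigs),
        "total_length": sum(lengths),
        "largest_contig": max(lengths, default=0),
        "n50": n50(lengths),
        "gc": gc_content("".join(contigs)),
    }


if __name__ == "__main__":
    # Toy sequences, not data from the study.
    fragmented = ["GGCCGC", "GCGA", "CCT"]
    single = ["GGCCGCGCGACCT"]
    print(summarize(fragmented))
    print(summarize(single))
```

Under these assumptions, a build that resolves into a single contig reports a contig count of 1 and an N50 equal to the total assembly length, while a fragmented build of the same sequence reports more contigs and a smaller N50.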


Keywords: Genome Assembly, Illumina MiSeq, Pacific Bioscience, Genome Fraction, Celera Assembler
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was funded in part by a fellowship from The Southern Region Educational Board (SREB) to DS and NIH grant GM076277 to BE. We would like to thank Nicole Rapicavoli at Pacific Biosciences for her assistance with the HGAP2 assembly, Alexey Gurevich and Anton Korobeynikov at the Algorithmic Biology Lab, St. Petersburg, Russia for their support with the SPAdes and QUAST programs, and special thanks to Nathan Elger and Paul Sagona who are a part of the Research Cyberinfrastructure at The University of South Carolina.

Supplementary material

Supplementary material 1: 284_2014_721_MOESM1_ESM.doc (DOC, 56 kb)



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Biological Sciences, University of South Carolina, Columbia, USA
