Metagenome Assembly and Contig Assignment

  • Qingpeng Zhang
Part of the Methods in Molecular Biology book series (MIMB, volume 1849)


Metagenomic assembly has recently transformed metagenomic data analysis, driven by improved sequencing techniques, more powerful computational infrastructure, and novel algorithms and methods. Using longer assembled contigs rather than raw reads significantly improves metagenomic binning and annotation, ultimately yielding a deeper understanding of the microbial dynamics of the samples being analyzed. In this chapter, we demonstrate a typical metagenomic analysis pipeline, including raw read quality evaluation and trimming, assembly, and contig binning. Alternative tools for each step are also discussed.

Key words

Metagenomics, Assembly, Binning, Quality evaluation, Annotation



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Qingpeng Zhang
    • Department of Energy Joint Genome Institute, Walnut Creek, USA
