Metagenome Assembly and Contig Assignment

  • Qingpeng Zhang
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1849)

Abstract

Recent developments in metagenomic assembly have revolutionized metagenomic data analysis, thanks to improved sequencing techniques, more powerful computational infrastructure, and novel algorithms and methods. Using longer assembled contigs rather than raw reads significantly improves metagenomic binning and annotation, ultimately giving a deeper understanding of the microbial dynamics of the samples being analyzed. In this chapter, we demonstrate a typical metagenomic analysis pipeline comprising raw read quality evaluation and trimming, assembly, and contig binning. Alternative tools that can be used for each step are also discussed.

Key words

Metagenomics, Assembly, Binning, Quality evaluation, Annotation

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Qingpeng Zhang (1)
  1. Department of Energy Joint Genome Institute, Walnut Creek, USA
