Metagenomics Using Next-Generation Sequencing

Part of the book series: Methods in Molecular Biology (MIMB, volume 1096)

Abstract

Traditionally, microbial genome sequencing has been restricted to the small number of species that can be grown in pure culture [1]. The progressive development of culture-independent methods over the last 15 years now allows researchers to sequence microbial communities directly from environmental samples. This approach is commonly referred to as “metagenomics” or “community genomics”. However, the term metagenomics is applied liberally in the literature to describe any culture-independent analysis of microbial communities. Here, we define metagenomics as shotgun (“random”) sequencing of the genomic DNA of a sample taken directly from the environment. The metagenome can be thought of as a sampling of the collective genome of the microbial community. We outline the considerations and analyses that should be undertaken to ensure the success of a metagenomic sequencing project, including the choice of sequencing platform and methods for assembly, binning, annotation, and comparative analysis.

References

  1. Amann R, Ludwig W, Schleifer K (1995) Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev 59:143–169

  2. Breitbart M et al (2002) Genomic analysis of uncultured marine viral communities. Proc Natl Acad Sci U S A 99:14250–14255

  3. Venter JC et al (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66–74

  4. Breitbart M et al (2003) Metagenomic analyses of an uncultured viral community from human feces. J Bacteriol 185:6220–6223

  5. Hallam SJ et al (2004) Reverse methanogenesis: testing the hypothesis with environmental genomics. Science 305:1457–1462

  6. Gill SR et al (2006) Metagenomic analysis of the human distal gut microbiome. Science 312:1355–1359

  7. Warnecke F et al (2007) Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature 450:560–565

  8. Tringe SG et al (2005) Comparative metagenomics of microbial communities. Science 308:554–557

  9. Tyson GW et al (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428:37–43

  10. Béjà O et al (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289:1902–1906

  11. Hess M et al (2011) Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331:463–467

  12. Hemme CL et al (2010) Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME J 4:660–672

  13. Pagani I et al (2012) The Genomes OnLine Database (GOLD) v. 4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 40:D571–D579

  14. Peterson J et al (2009) The NIH human microbiome project. Genome Res 19:2317–2323

  15. Kroeber M et al (2009) Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing. J Biotechnol 142:38–49

  16. Boetius A et al (2000) A marine microbial consortium apparently mediating anaerobic oxidation of methane. Nature 407:623–626

  17. DeAngelis KM et al (2011) Characterization of trapped lignin-degrading microbes in tropical forest soil. PLoS ONE 6:e19306

  18. Ding H, Valentine DL (2008) Methanotrophic bacteria occupy benthic microbial mats in shallow marine hydrocarbon seeps, Coal Oil Point, California. J Geophys Res 113:G01015

  19. Edwards R et al (2006) Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics 7:57

  20. Havelsrud O et al (2011) A metagenomic study of methanotrophic microorganisms in Coal Oil Point seep sediments. BMC Microbiol 11:221

  21. Poinar HN et al (2006) Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311:392–394

  22. Turnbaugh PJ et al (2006) An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444:1027–1031

  23. Coetzee B et al (2010) Deep sequencing analysis of viruses infecting grapevines: virome of a vineyard. Virology 400:157–163

  24. Lazarevic V et al (2009) Metagenomic study of the oral microbiota by Illumina high-throughput sequencing. J Microbiol Meth 79:266–271

  25. Qin J et al (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59–65

  26. Sorek R et al (2007) Genome-wide experimental determination of barriers to horizontal gene transfer. Science 318:1449–1452

  27. Huse SM et al (2007) Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol 8:R143

  28. Gilles A et al (2011) Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics 12:245

  29. Bordoni R et al (2008) Evaluation of human gene variant detection in amplicon pools by the GS-FLX parallel pyrosequencer. BMC Genomics 9:464

  30. Moore M et al (2006) Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biol 6:17

  31. Hornshøj H et al (2009) Transcriptomic and proteomic profiling of two porcine tissues using high-throughput technologies. BMC Genomics 10:30

  32. Jiménez DJ et al (2012) Structural and functional insights from the metagenome of an acidic hot spring microbial planktonic community in the Colombian Andes. PLoS ONE 7(12):e50269

  33. Kunin V et al (2010) Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118–123

  34. Dohm JC et al (2008) Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 36:e105

  35. Hillier LW et al (2008) Whole-genome sequencing and variant discovery in C. elegans. Nat Meth 5:183–188

  36. Aird D et al (2011) Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol 12:R18

  37. Quail MA et al (2008) A large genome center’s improvements to the Illumina sequencing system. Nat Meth 5:1005–1010

  38. Kozarewa I et al (2009) Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G + C)-biased genomes. Nat Meth 6:291–295

  39. Dohm JC et al (2007) SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res 17:1697–1706

  40. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829

  41. DiGuistini S et al (2009) De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data. Genome Biol 10:R94

  42. Reinhardt JA et al (2009) De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. Genome Res 19:294–305

  43. Whiteford N et al (2005) An analysis of the feasibility of short read sequencing. Nucleic Acids Res 33:e171

  44. Kassai-Jáger E et al (2008) Distribution and evolution of short tandem repeats in closely related bacterial genomes. Gene 410:18–25

  45. Rothberg JM et al (2011) An integrated semiconductor device enabling non-optical genome sequencing. Nature 475:348–352

  46. Bragg LM et al (2013) Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data. PLoS Comput Biol 9(4):e1003031

  47. Quail MA et al (2012) A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics 13:341

  48. Loman NJ et al (2012) Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotech 30(5):434–439

  49. Jünemann S et al (2012) Bacterial community shift in treated periodontitis patients revealed by Ion Torrent 16S rRNA gene amplicon sequencing. PLoS ONE 7(8):e41606

  50. Yergeau E et al (2012) Next-generation sequencing of microbial communities in the Athabasca river and its tributaries in relation to oil sands mining activities. Appl Environ Microbiol 78(21):7626–7637

  51. Solonenko SA et al (2013) Sequencing platform and library preparation choices impact viral metagenomes. BMC Genomics 14:320

  52. Whiteley AS et al (2012) Microbial 16S rRNA Ion Tag and community metagenome sequencing using the Ion Torrent (PGM) Platform. J Microbiol Meth 91:80–88

  53. Seshadri R et al (2007) CAMERA: a community resource for metagenomics. PLoS Biol 5:e75

  54. Markowitz VM et al (2006) An experimental metagenome data management and analysis system. Bioinformatics 22:e359–e367

  55. Meyer F et al (2008) The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386

  56. The Hannon Lab FASTX toolkit. http://hannonlab.cshl.edu/fastx_toolkit/index.html

  57. Babraham Bioinformatics FastQC. http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

  58. Blanca J et al (2011) ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence. BMC Genomics 12:285

  59. Quinlan AR et al (2008) Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Nat Meth 5:179–181

  60. Ossowski S et al (2008) Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res 18:2024–2033

  61. Balzer S et al (2010) Characteristics of 454 pyrosequencing data-enabling realistic simulation with flowsim. Bioinformatics 26:i420–i425

  62. Quince C et al (2009) Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Meth 6:639–641

  63. Bragg LM et al (2012) Fast, accurate error-correction of amplicon pyrosequences using Acacia. Nat Meth 9(5):425–426

  64. Salzberg SL et al (2008) Gene-boosted assembly of a novel bacterial genome from very short reads. PLoS Comput Biol 4:e1000186

  65. Simpson JT et al (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117–1123

  66. MacCallum I et al (2009) ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol 10:R103

  67. Chaisson MJ, Pevzner PA (2008) Short read fragment assembly of bacterial genomes. Genome Res 18:324–330

  68. Pop M et al (2004) Comparative genome assembly. Brief Bioinform 5(3):237–248

  69. Peng Y et al (2011) Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics 27(13):i94–i101

  70. Ye Y, Tang H (2009) An ORFome assembly approach to metagenomics sequences analysis. J Bioinform Comput Biol 7:455–471

  71. Namiki T et al (2012) MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res 40(20):e155

  72. Treangen TJ et al (2011) Next generation sequence assembly with AMOS. Curr Protoc Bioinform 33:11.8.1–11.8.18

  73. Chevreux B, Wetter T, Suhai S (1999) Genome sequence assembly using trace signals and additional sequence information. Computer Sci Biol 99:45–56

  74. Boisvert S et al (2012) Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biol 13:R122

  75. Morowitz MJ et al (2011) Strain-resolved community genomic analysis of gut microbial colonization in a premature infant. Proc Natl Acad Sci U S A 108:1128–1133

  76. Bonfield JK, Whitwham A (2010) Gap5—editing the billion fragment sequence assembly. Bioinformatics 26:1699–1703

  77. Boetzer M et al (2011) Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27:578–579

  78. Salmela L et al (2011) Fast scaffolding with small independent mixed integer programs. Bioinformatics 27:3259–3265

  79. Koren S, Treangen TJ, Pop M (2011) Bambus 2: scaffolding metagenomes. Bioinformatics 27:2964–2971

  80. Eppley J et al (2007) Strainer: software for analysis of population variation in community genomic datasets. BMC Bioinformatics 8:398

  81. Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26:589–595

  82. Langmead B et al (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25

  83. Cole JR et al (2009) The ribosomal database project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37:D141–D145

  84. DeSantis TZ et al (2006) Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72:5069–5072

  85. Pruesse E et al (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35:7188–7196

  86. Huang Y, Gilna P, Li W (2009) Identification of ribosomal RNA genes in metagenomic fragments. Bioinformatics 25:1338–1340

  87. Brady A, Salzberg SL (2009) Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Meth 6:673–676

  88. Teeling H et al (2004) TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics 5:163

  89. McHardy AC et al (2007) Accurate phylogenetic classification of variable-length DNA fragments. Nat Meth 4:63–72

  90. Mrázek J (2009) Phylogenetic signals in DNA composition: limitations and prospects. Mol Biol Evol 26:1163–1169

  91. Albertsen M et al (2013) Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotech 31:533–538

  92. Gerlach W, Stoye J (2011) Taxonomic classification of metagenomic shotgun sequences with CARMA3. Nucleic Acids Res 39:e91

  93. Huson DH et al (2011) Integrative analysis of environmental sequences using MEGAN4. Genome Res 21:1552–1560

  94. Chatterji S et al (2008) CompostBin: a DNA composition-based algorithm for binning environmental shotgun reads. Res Comput Mol Biol Proc 4955:17–28

  95. Patil KR et al (2011) Taxonomic metagenome sequence assignment with structured output models. Nat Meth 8:191–192

  96. Chan C-K et al (2008) Binning sequences using very sparse labels within a metagenome. BMC Bioinformatics 9:215

  97. Diaz N et al (2009) TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics 10:56

  98. Weber M et al (2011) Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics. ISME J 5:918–928

  99. Meinicke P, Aßhauer KP, Lingner T (2011) Mixture models for analysis of the taxonomic composition of metagenomes. Bioinformatics 27:1618–1624

  100. Schreiber F et al (2010) Treephyler: fast taxonomic profiling of metagenomes. Bioinformatics 26:960–961

  101. Besemer J, Borodovsky M (1999) Heuristic approach to deriving models for gene finding. Nucleic Acids Res 27:3911–3920

  102. Noguchi H, Park J, Takagi T (2006) MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res 34:5623–5630

  103. Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 38:e191

  104. Hoff K et al (2008) Gene prediction in metagenomic fragments: a large scale machine learning approach. BMC Bioinformatics 9:217

  105. Kanehisa M, Goto S (2000) KEGG: Kyoto Encyclopedia of genes and genomes. Nucleic Acids Res 28:27–30

  106. Karp PD et al (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res 33:6083–6089

  107. Overbeek R et al (2005) The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 33:5691–5702

  108. Finn RD et al (2010) The Pfam protein families database. Nucleic Acids Res 38:D211–D222

  109. Tatusov R et al (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41

  110. Ye Y, Doak TG (2009) A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLoS Comput Biol 5:e1000465

  111. Huson DH et al (2007) MEGAN analysis of metagenomic data. Genome Res 17:377–386

  112. Lozupone C, Knight R (2005) UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71:8228–8235

  113. Kristiansson E, Hugenholtz P, Dalevi D (2009) ShotgunFunctionalizeR: an R-package for functional comparison of metagenomes. Bioinformatics 25:2737–2738

  114. Rodriguez-Brito B, Rohwer F, Edwards RA (2006) An application of statistics to comparative metagenomics. BMC Bioinformatics 7:162

  115. Segata N et al (2011) Metagenomic biomarker discovery and explanation. Genome Biol 12:R60

  116. Parks DH, Beiko RG (2010) Identifying biologically relevant differences between metagenomic communities. Bioinformatics 26:715–721

Copyright information

© 2014 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Bragg, L., Tyson, G.W. (2014). Metagenomics Using Next-Generation Sequencing. In: Paulsen, I., Holmes, A. (eds) Environmental Microbiology. Methods in Molecular Biology, vol 1096. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-712-9_15

  • DOI: https://doi.org/10.1007/978-1-62703-712-9_15

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-711-2

  • Online ISBN: 978-1-62703-712-9

  • eBook Packages: Springer Protocols
