Metagenomics and CAZyme Discovery

  • Benoit J. Kunath
  • Andreas Bremges
  • Aaron Weimann
  • Alice C. McHardy
  • Phillip B. Pope
Part of the Methods in Molecular Biology book series (MIMB, volume 1588)


Microorganisms play a primary role in regulating biogeochemical cycles and are a valuable source of enzymes with biotechnological applications, such as carbohydrate-active enzymes (CAZymes). However, most microorganisms in natural ecosystems cannot be cultured with common culture-dependent techniques, which restricts access to potentially novel cellulolytic bacteria and beneficial enzymes. Molecular, culture-independent methods such as metagenomics enable researchers to study microbial communities directly from environmental samples and provide a platform from which enzymes of interest can be sourced. We outline the key methodological stages required and describe specific protocols currently used in metagenomic projects dedicated to CAZyme discovery.

Key words

Metagenomics; Carbohydrate-active enzymes; Microbial communities; Assembly; Binning



Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  • Benoit J. Kunath (1)
  • Andreas Bremges (2, 3)
  • Aaron Weimann (2)
  • Alice C. McHardy (2)
  • Phillip B. Pope (1; corresponding author)
  1. Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
  2. Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  3. German Center for Infection Research (DZIF), Braunschweig, Germany
