Abstract
Metagenomics, the genomic analysis of DNA from environmental samples that contain multiple genomic components, is attracting growing interest because of its wide applications in microbial, cancer, and immunology research. This chapter provides an overview of the topic, covering the major steps in data collection, processing, and analysis. We describe and discuss experimental design, sample processing and quality control, sequencing and assembly, annotation, and downstream analyses. For each step, we summarize current viewpoints, key issues, and popular tools. A step-by-step tutorial then applies the popular QIIME pipeline to a bacterial 16S rRNA case study, which should help scientists new to the field start a successful metagenome project.
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Chandramohan, R., Yang, C., Cai, Y., Wang, M.D. (2017). Metagenomics for Monitoring Environmental Biodiversity: Challenges, Progress, and Opportunities. In: Xu, D., Wang, M., Zhou, F., Cai, Y. (eds) Health Informatics Data Analysis. Health Information Science. Springer, Cham. https://doi.org/10.1007/978-3-319-44981-4_5
DOI: https://doi.org/10.1007/978-3-319-44981-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44979-1
Online ISBN: 978-3-319-44981-4
eBook Packages: Computer Science, Computer Science (R0)