Metagenomics for Monitoring Environmental Biodiversity: Challenges, Progress, and Opportunities

Chandramohan, Raghu; Yang, Cheng; Cai, Yunpeng; Wang, May D.

doi:10.1007/978-3-319-44981-4_5

Chapter
First Online: 10 September 2017

Part of the book series: Health Information Science (HIS)

Abstract

Metagenomics, the genomic analysis of DNA from environmental samples that contain multiple genomic components, is attracting growing interest because of its wide applications in microbial, cancer, and immunology research. This chapter gives an overview of the topic, covering the major steps of data collection, processing, and analysis. We describe and discuss experimental design, sample processing and quality control, sequencing and assembly, annotation, and downstream analyses. For each step, we summarize current viewpoints, key issues, and popular tools. A step-by-step tutorial then applies the popular QIIME pipeline to a bacterial 16S rRNA case study, which should help scientists new to the field start a successful metagenome project.

Author information

Authors and Affiliations

The Joint Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, USA
Raghu Chandramohan, Cheng Yang & May D. Wang
Peking University, Beijing, China
Raghu Chandramohan, Cheng Yang & May D. Wang
Research Center for Biomedical Informatics, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China
Yunpeng Cai

Corresponding author

Correspondence to May D. Wang.

Editor information

Editors and Affiliations

Digital Biology Laboratory, Computer Science Department, University of Missouri-Columbia, Columbia, Missouri, USA
Dong Xu
Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA
May D. Wang
College of Computer Science and Technology, Jilin University, Changchun, China
Fengfeng Zhou
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
Yunpeng Cai

Cite this chapter

Chandramohan, R., Yang, C., Cai, Y., Wang, M.D. (2017). Metagenomics for Monitoring Environmental Biodiversity: Challenges, Progress, and Opportunities. In: Xu, D., Wang, M., Zhou, F., Cai, Y. (eds) Health Informatics Data Analysis. Health Information Science. Springer, Cham. https://doi.org/10.1007/978-3-319-44981-4_5

DOI: https://doi.org/10.1007/978-3-319-44981-4_5
Published: 10 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44979-1
Online ISBN: 978-3-319-44981-4
eBook Packages: Computer Science, Computer Science (R0)
