Metagenomics for Monitoring Environmental Biodiversity: Challenges, Progress, and Opportunities

Part of the book series: Health Information Science (HIS)

Abstract

Metagenomics, the genomic analysis of DNA from environmental samples containing multiple genomic components, is attracting growing interest because of its wide applications in microbial, cancer, and immunology research. This chapter gives an overview of the topic, covering the major steps in data collection, processing, and analysis. We describe and discuss experimental design, sample processing and quality control, sequencing and assembly, annotation, and downstream analyses. For each step, we summarize current points of view, key issues, and popular tools. We then give a step-by-step tutorial that applies the popular QIIME pipeline to a bacterial 16S rRNA case study, which should help scientists new to the field start a successful metagenome project.

Author information

Corresponding author

Correspondence to May D. Wang.

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Chandramohan, R., Yang, C., Cai, Y., Wang, M.D. (2017). Metagenomics for Monitoring Environmental Biodiversity: Challenges, Progress, and Opportunities. In: Xu, D., Wang, M., Zhou, F., Cai, Y. (eds) Health Informatics Data Analysis. Health Information Science. Springer, Cham. https://doi.org/10.1007/978-3-319-44981-4_5

  • DOI: https://doi.org/10.1007/978-3-319-44981-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44979-1

  • Online ISBN: 978-3-319-44981-4

  • eBook Packages: Computer Science (R0)
