Abstract
Advances in next-generation sequencing technologies allow comparative analyses of the diversity and abundance of whole microbial communities, and of important ecosystem functional genes, at far greater depth than ever before. The major current challenge, however, is how to convert this immense amount of genetic information into sound biological conclusions. Addressing it requires a set of complex computational and statistical analyses, which can themselves be a barrier for most researchers in the biological sciences. In this chapter, we outline the main approaches applied in microbiome studies based on high-throughput sequencing technologies, and we introduce the most commonly used strategies for data handling, sequence clustering, taxonomic and functional assignment, and microbial community comparisons. We also draw readers’ attention to recent advances in the microbiome research field, using the Brazilian case as an illustration.
Acknowledgments
Leandro Nascimento Lemos and Victor Satler Pylro received fellowships from FAPESP (São Paulo Research Foundation): processes 2016/18215-1 and 2014/50320-4, and processes 2016/02219-8 and 2014/50320-4, respectively. All authors are supported by the Brazilian Microbiome Project (http://www.brmicrobiome.org/).
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Lemos, L.N., Morais, D.K., Tsai, S.M., Roesch, L., Pylro, V. (2017). Bioinformatics for Microbiome Research: Concepts, Strategies, and Advances. In: Pylro, V., Roesch, L. (eds) The Brazilian Microbiome. Springer, Cham. https://doi.org/10.1007/978-3-319-59997-7_7
DOI: https://doi.org/10.1007/978-3-319-59997-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59995-3
Online ISBN: 978-3-319-59997-7
eBook Packages: Biomedical and Life Sciences (R0)