Bioinformatics for Microbiome Research: Concepts, Strategies, and Advances

The Brazilian Microbiome

Abstract

Advances in next-generation sequencing technologies allow comparative analyses of the diversity and abundance of whole microbial communities, and of important ecosystem functional genes, at far greater depth than ever before. The major challenge now is how to convert this immense amount of genetic information into sound biological conclusions. Addressing it requires a set of complex computational and statistical analyses, which can themselves be a barrier for most researchers in the biological sciences. In this chapter, we outline the main approaches applied in microbiome studies based on high-throughput sequencing technologies and introduce the most commonly used strategies for data handling, sequence clustering, taxonomic and functional assignment, and microbial community comparisons. We also draw readers’ attention to recent advances in the microbiome research field, illustrated by the Brazilian case.

Acknowledgments

Both Leandro Nascimento Lemos and Victor Satler Pylro received fellowships from FAPESP (São Paulo Research Foundation) (Processes 2016/18215-1 and 2014/50320-4 and Processes 2016/02219-8 and 2014/50320-4, respectively). All authors are supported by the Brazilian Microbiome Project (http://www.brmicrobiome.org/).

Author information

Corresponding author

Correspondence to Victor Pylro.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Lemos, L.N., Morais, D.K., Tsai, S.M., Roesch, L., Pylro, V. (2017). Bioinformatics for Microbiome Research: Concepts, Strategies, and Advances. In: Pylro, V., Roesch, L. (eds) The Brazilian Microbiome. Springer, Cham. https://doi.org/10.1007/978-3-319-59997-7_7