Applied Microbiology and Biotechnology, Volume 103, Issue 1, pp 69–82

Bioinformatics tools to assess metagenomic data for applied microbiology

  • Otávio G. G. Almeida
  • Elaine C. P. De Martinis


Abstract

The falling cost of DNA sequencing has produced large data sets that must be handled and analyzed, especially for microbial ecosystems, which are characterized by high taxonomic and functional diversity. Assessing the properties of these complex ecosystems requires a conceptual background in the application of next-generation sequencing (NGS) technology and bioinformatics analysis to metagenomics. Accordingly, this article presents an overview of how knowledge of microbial ecology has evolved from traditional culture-dependent methods to culture-independent methods and, most recently, to metagenomics. Topics covered include sample preparation for NGS, starting with total DNA extraction and library preparation, followed by a brief discussion of the chemistry of NGS to help readers judge which bioinformatics pipeline may suit their goals. The importance of selecting appropriate sequencing coverage and depth to obtain a suitable measure of microbial diversity is discussed. Because all DNA sequencing processes produce base-calling errors that compromise data analysis, including genome assembly and microbial functional analysis, dedicated software is presented and discussed conceptually with regard to its potential applications in the general field of microbial ecology.


Keywords

Metagenomics, NGS, Applied bioinformatics, Microbial diversity



Acknowledgments

ECP De Martinis is a fellow of the National Council for Scientific and Technological Development, Brazil (grant #6762/2006-4), and is grateful for a research grant from the São Paulo Research Foundation (FAPESP), Brazil (grant #2017/18928-0). OGG Almeida is grateful to the São Paulo Research Foundation (FAPESP), Brazil, for a Ph.D. fellowship (grant #2017/13759-6).

Funding information

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Research involving human participants and/or animals

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
