16S rRNA gene high-throughput sequencing data mining of microbial diversity and interactions

  • Mini-Review

Applied Microbiology and Biotechnology

Abstract

The ubiquitous occurrence of microorganisms raises continuing public concern about their pathogenicity and the threats they pose to the human environment, while also offering potential engineering benefits in biotechnology. The development and wide application of environmental biotechnology, for example in bioenergy production, wastewater treatment, bioremediation, and drinking water disinfection, have brought both environmental and economic benefits. Notably, the extensive application of microscopic and molecular techniques since the 1990s has allowed engineers to look inside the “black box” of engineered microbial communities in biotechnological processes, providing guidelines for process design and optimization. More recently, revolutionary advances in DNA sequencing technologies and rapidly decreasing costs have begun to transform conventional approaches to microbiology and ecology research, launching the era of next-generation sequencing (NGS). The principal research burden is now shifting from traditional labor-intensive wet-lab experiments to the analysis of large, information-rich NGS datasets, which is computationally expensive and bioinformatically challenging. This study discusses state-of-the-art bioinformatics and statistical analyses of 16S ribosomal RNA (rRNA) gene high-throughput sequencing (HTS) data from prevalent NGS platforms, with the aim of promoting their application in exploring the diversity of functional and pathogenic microorganisms, and their interactions, in biotechnological processes.
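
The abstract does not name specific algorithms, so the following is an illustrative sketch only, not the authors' pipeline. It assumes an OTU table has already been produced by upstream 16S rRNA processing (quality filtering, chimera removal, OTU clustering) and uses hypothetical toy counts. It shows two downstream steps the review's scope covers: per-sample alpha diversity (Shannon and Gini-Simpson indices) and a simple co-occurrence network built from Spearman rank correlations with an arbitrary |rho| ≥ 0.8 cutoff. The function names and the threshold are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2); higher values mean a more diverse sample."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cooccurrence_edges(otu_table, threshold=0.8):
    """Return (otu_a, otu_b, rho) for every OTU pair with |rho| >= threshold.

    otu_table maps an OTU id to its abundances across the same ordered samples.
    """
    edges = []
    for a, b in combinations(sorted(otu_table), 2):
        rho = spearman(otu_table[a], otu_table[b])
        if abs(rho) >= threshold:
            edges.append((a, b, round(rho, 3)))
    return edges


if __name__ == "__main__":
    # Hypothetical toy OTU table: 4 OTUs x 5 samples (e.g. a time series).
    otus = {
        "OTU_1": [120, 80, 60, 30, 10],
        "OTU_2": [100, 70, 50, 20, 5],
        "OTU_3": [5, 15, 30, 60, 90],
        "OTU_4": [40, 10, 50, 20, 30],
    }
    # Alpha diversity per sample: each sample is one column of the table.
    for s, sample in enumerate(zip(*otus.values()), start=1):
        print(f"sample {s}: H'={shannon_index(sample):.3f}, "
              f"Gini-Simpson={simpson_index(sample):.3f}")
    print(cooccurrence_edges(otus))
    # -> [('OTU_1', 'OTU_2', 1.0), ('OTU_1', 'OTU_3', -1.0), ('OTU_2', 'OTU_3', -1.0)]
```

Rank correlation is used here because OTU abundances are rarely normally distributed. Correlations computed on relative-abundance data can still be spurious because of compositional effects, so in practice a compositionality-aware method such as SparCC, together with significance testing and multiple-comparison correction, would normally replace the fixed cutoff shown here.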

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Acknowledgments

The authors wish to thank the Hong Kong General Research Fund (7195/06E, 7197/08E, 7202/09E, 7198/10E, 7201/11E, 7190/12E, and 172099/14E) for the financial support of this study.

Author information

Correspondence to Tong Zhang.

About this article

Cite this article

Ju, F., Zhang, T. 16S rRNA gene high-throughput sequencing data mining of microbial diversity and interactions. Appl Microbiol Biotechnol 99, 4119–4129 (2015). https://doi.org/10.1007/s00253-015-6536-y

  • DOI: https://doi.org/10.1007/s00253-015-6536-y
