Improved pipeline for reducing erroneous identification by 16S rRNA sequences using the Illumina MiSeq platform

  • Systems and Synthetic Microbiology and Bioinformatics
Journal of Microbiology

Abstract

The cost of DNA sequencing has decreased due to advancements in next-generation sequencing. Because the Illumina platform yields a large number of sequences, it can reduce costs further than the 454 pyrosequencer. However, the Illumina platform poses other challenges, including the bioinformatic analysis of large numbers of sequences and the need to reduce erroneous nucleotides generated at the 3′-ends of the sequences. These erroneous sequences can lead to errors in the analysis of microbial communities, so correcting them is necessary for accurate taxonomic identification. Several studies that used the Illumina platform for metagenomic analyses have proposed curation pipelines to increase accuracy. In this study, we evaluated the likelihood of obtaining an erroneous microbial composition using the MiSeq 250 bp paired-end sequencing platform and improved the pipeline to reduce erroneous identifications. Using a mock community sample composed of known sequences, we compared different sequencing conditions by varying the percentage of PhiX control added, the concentration of the sequencing library, and the 16S rRNA gene target region. Our recommended method corrected erroneous nucleotides and improved identification accuracy. Overall, 99.5% of the total reads shared at least 95% similarity with the corresponding template sequences, and 93.6% of the total reads shared over 97% similarity. This indicates that the MiSeq platform can be used to analyze microbial communities at the genus level with high accuracy. The improved analysis method recommended in this study can be applied to amplicon studies in various environments using high-throughput reads generated on the MiSeq platform.
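The accuracy figures above are proportions of reads whose similarity to a known template meets a cutoff. As a minimal sketch of how such a summary could be computed, assuming reads have already been aligned to their mock-community templates, the Python below calculates percent identity and the share of reads at or above a threshold. The function names and the toy sequences are hypothetical and are not taken from the study's pipeline.

```python
from typing import Iterable


def percent_identity(read: str, template: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumption: alignment was done upstream; gaps, if any, are
    represented by '-' and simply count as mismatches here.
    """
    if len(read) != len(template):
        raise ValueError("sequences must be aligned to the same length")
    if not read:
        return 0.0
    matches = sum(a == b for a, b in zip(read.upper(), template.upper()))
    return 100.0 * matches / len(read)


def percent_reads_at_or_above(identities: Iterable[float], threshold: float) -> float:
    """Percentage of reads whose identity is >= threshold."""
    values = list(identities)
    if not values:
        return 0.0
    passing = sum(v >= threshold for v in values)
    return 100.0 * passing / len(values)


if __name__ == "__main__":
    # Toy example: one template and three hypothetical aligned reads.
    template = "ACGTACGTACGTACGTACGT"
    reads = [
        "ACGTACGTACGTACGTACGT",  # identical to the template
        "ACGTACGTACGTACGTACGA",  # one mismatch at the 3' end
        "ACGTACGTACGTACGTTTTT",  # several mismatches at the 3' end
    ]
    identities = [percent_identity(r, template) for r in reads]
    for cutoff in (95.0, 97.0):
        share = percent_reads_at_or_above(identities, cutoff)
        print(f"reads with identity >= {cutoff}%: {share:.1f}%")
```

In practice the identities would come from an aligner or search tool rather than a position-by-position comparison, but the thresholding step is the same.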

Author information

Corresponding author

Correspondence to Bong-Soo Kim.

Additional information

Supplemental material for this article may be found at http://www.springerlink.com/content/120956.

About this article

Cite this article

Jeon, YS., Park, SC., Lim, J. et al. Improved pipeline for reducing erroneous identification by 16S rRNA sequences using the Illumina MiSeq platform. J Microbiol. 53, 60–69 (2015). https://doi.org/10.1007/s12275-015-4601-y

  • DOI: https://doi.org/10.1007/s12275-015-4601-y
