A comprehensive evaluation of single-end sequencing data analyses for environmental microbiome research

Abstract

Illumina sequencing platforms are widely used for amplicon-based environmental microbiome research. In amplicon data from environmental samples generated on the Illumina MiSeq platform, the reverse (R2) reads of paired-end (PE) datasets show low quality towards the 3’ end. This reduces the sequencing depth of the samples and, ultimately, the sample size, which may alter the outcome of a study. This study evaluates the usefulness of single-end (SE) sequencing data in microbiome research when an Illumina MiSeq PE dataset contains a large number of low-quality reverse reads. Amplicon data (V1V3, V3V4, V4V5 and V6V8) from 128 environmental (soil) samples, downloaded from SRA, were used to demonstrate the efficiency of SE sequencing data analyses. The SE datasets inferred a core microbiome structure comparable to that of the PE datasets. Notably, the forward (R1) datasets inferred a higher number of taxa than the PE datasets for most amplicon regions, except V3V4. Thus, analyzing SE sequencing data, especially R1 reads, in environmental microbiome studies could mitigate the loss of sample size caused by low-quality reverse reads. However, care must be taken when interpreting the microbiome structure, as a few taxa observed in the PE datasets were absent from the SE datasets. In conclusion, this study demonstrates that alternative options exist for analyzing amplicon data without the need to remove samples with low-quality reverse reads.

Availability of data and material

Downloaded from SRA.

Code availability

Not applicable.

References

  1. Bharti R, Grimm DG (2021) Current challenges and best-practice protocols for microbiome analysis. Brief Bioinform 22:178–193. https://doi.org/10.1093/bib/bbz155

  2. Bižić M, Klintzsch T, Ionescu D et al (2020) Aquatic and terrestrial cyanobacteria produce methane. Sci Adv 6:eaax5343. https://doi.org/10.1126/sciadv.aax5343

  3. Callahan BJ, McMurdie PJ, Rosen MJ et al (2016) DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods 13:581–583. https://doi.org/10.1038/nmeth.3869

  4. Caruso V, Song X, Asquith M, Karstens L (2019) Performance of microbiome sequence inference methods in environments with varying biomass. mSystems 4:e00163-18. https://doi.org/10.1128/mSystems.00163-18

  5. Chen X, Johnson S, Jeraldo P et al (2018) Hybrid-denovo: a de novo OTU-picking pipeline integrating single-end and paired-end 16S sequence tags. Gigascience 7:1–7. https://doi.org/10.1093/gigascience/gix129

  6. Fuks G, Elgart M, Amir A et al (2018) Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling. Microbiome 6:17. https://doi.org/10.1186/s40168-017-0396-x

  7. Gilbert JA, Jansson JK, Knight R (2014) The Earth Microbiome project: successes and aspirations. BMC Biol 12:69. https://doi.org/10.1186/s12915-014-0069-1

  8. Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ (2017) Microbiome datasets are compositional: and this is not optional. Front Microbiol 8:2224. https://doi.org/10.3389/fmicb.2017.02224

  9. Johnson JS, Spakowicz DJ, Hong B-Y et al (2019) Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat Commun 10:5029. https://doi.org/10.1038/s41467-019-13036-1

  10. Liu T, Chen C-Y, Chen-Deng A et al (2020) Joining Illumina paired-end reads for classifying phylogenetic marker sequences. BMC Bioinform 21:105. https://doi.org/10.1186/s12859-020-3445-6

  11. McMurdie PJ, Holmes S (2013) phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8:e61217. https://doi.org/10.1371/journal.pone.0061217

  12. Murali A, Bhargava A, Wright ES (2018) IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences. Microbiome 6:140. https://doi.org/10.1186/s40168-018-0521-5

  13. Oliverio AM, Geisen S, Delgado-Baquerizo M et al (2020) The global-scale distributions of soil protists and their contributions to belowground systems. Sci Adv 6:eaax8787. https://doi.org/10.1126/sciadv.aax8787

  14. On behalf of the REHAB consortium, Gweon HS, Shaw LP et al (2019) The impact of sequencing depth on the inferred taxonomic composition and AMR gene content of metagenomic samples. Environ Microbiome 14:7. https://doi.org/10.1186/s40793-019-0347-1

  15. Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290. https://doi.org/10.1093/bioinformatics/btg412

  16. Pereira-Marques J, Hout A, Ferreira RM et al (2019) Impact of host DNA and sequencing depth on the taxonomic resolution of whole metagenome sequencing for microbiome analysis. Front Microbiol 10:1277. https://doi.org/10.3389/fmicb.2019.01277

  17. Pollock J, Glendinning L, Wisedchanwet T, Watson M (2018) The madness of microbiome: attempting to find consensus “best practice” for 16S microbiome studies. Appl Environ Microbiol 84:e02627-17. https://doi.org/10.1128/AEM.02627-17

  18. Ramakodi MP (2021) Effect of amplicon sequencing depth in environmental microbiome research. Curr Microbiol 78:1026–1033. https://doi.org/10.1007/s00284-021-02345-8

  19. Schirmer M, Ijaz UZ, D’Amore R et al (2015) Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res 43:e37. https://doi.org/10.1093/nar/gku1341

  20. Singer GAC, Fahner NA, Barnes JG et al (2019) Comprehensive biodiversity analysis via ultra-deep patterned flow cell technology: a case study of eDNA metabarcoding seawater. Sci Rep 9:5991. https://doi.org/10.1038/s41598-019-42455-9

  21. Soriano-Lerma A, Pérez-Carrasco V, Sánchez-Marañón M et al (2020) Influence of 16S rRNA target region on the outcome of microbiome studies in soil and saliva samples. Sci Rep 10:13637. https://doi.org/10.1038/s41598-020-70141-8

  22. Susin A, Wang Y, Lê Cao K-A, Calle ML (2020) Variable selection in microbiome compositional data analysis. NAR Genom Bioinform 2:lqaa029. https://doi.org/10.1093/nargab/lqaa029

  23. Tan G, Opitz L, Schlapbach R, Rehrauer H (2019) Long fragments achieve lower base quality in Illumina paired-end sequencing. Sci Rep 9:2856. https://doi.org/10.1038/s41598-019-39076-7

  24. Thompson LR, Sanders JG, McDonald D et al (2017) A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551:457–463. https://doi.org/10.1038/nature24621

  25. Wen C, Wu L, Qin Y et al (2017) Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform. PLoS ONE 12:e0176716. https://doi.org/10.1371/journal.pone.0176716

  26. Werner JJ, Zhou D, Caporaso JG et al (2012) Comparison of Illumina paired-end and single-direction sequencing for microbial 16S rRNA gene amplicon surveys. ISME J 6:1273–1276. https://doi.org/10.1038/ismej.2011.186

  27. Wickham H (2016) ggplot2: elegant graphics for data analysis, 2nd edn. Springer International Publishing, Cham

  28. Wickham H, Averick M, Bryan J et al (2019) Welcome to the Tidyverse. J Open Source Softw 4:1686. https://doi.org/10.21105/joss.01686

  29. Wright ES (2016) Using DECIPHER v2.0 to analyze big biological sequence data in R. R J 8:352. https://doi.org/10.32614/RJ-2016-025

  30. Zaheer R, Noyes N, Ortega Polo R et al (2018) Impact of sequencing depth on the characterization of the microbiome and resistome. Sci Rep 8:5890. https://doi.org/10.1038/s41598-018-24280-8

Acknowledgements

I would like to thank Dr. Bhawna Dubey, Chief Scientific Officer, Reprocell Bioserve Biotechnologies Pvt. Ltd., Hyderabad, for reviewing the manuscript. The effort of Soriano-Lerma et al. (2020) in making the data publicly available on SRA is highly appreciated. CSIR-NEERI is acknowledged for providing the necessary support to carry out the analyses. The manuscript draft has been submitted to the Institute Repository under KRC No.: CSIR-NEERI/KRC/2021/MAY/HZC/3.

Funding

Not applicable.

Author information

Contributions

Single author.

Corresponding author

Correspondence to Meganathan P. Ramakodi.

Ethics declarations

Conflict of interest

Not applicable.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Erko Stackebrandt.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 1600 KB)

About this article

Cite this article

Ramakodi, M.P. A comprehensive evaluation of single-end sequencing data analyses for environmental microbiome research. Arch Microbiol 203, 6295–6302 (2021). https://doi.org/10.1007/s00203-021-02597-9

Keywords

  • Environmental microbiome
  • Amplicon sequencing
  • Illumina MiSeq platform
  • Single-end sequencing data
  • Low quality reads
  • Taxonomy inference