Illumina sequencing platforms are widely used for amplicon-based environmental microbiome research. Analyses of amplicon data from environmental samples generated on the Illumina MiSeq platform show that the reverse (R2) reads in paired-end (PE) datasets have low quality towards the 3′ end. This reduces the sequencing depth of samples and ultimately the sample size, which may alter the outcome of a study. This study evaluates the usefulness of single-end (SE) sequencing data in microbiome research when an Illumina MiSeq PE dataset contains a large number of low-quality reverse reads. Amplicon data (V1V3, V3V4, V4V5 and V6V8) from 128 environmental (soil) samples, downloaded from the Sequence Read Archive (SRA), were used to demonstrate the efficiency of SE sequencing data analyses in microbiome research. The SE datasets inferred a core microbiome structure comparable to that inferred from the PE datasets. Notably, the forward (R1) datasets inferred a higher number of taxa than the PE datasets for most amplicon regions, with V3V4 as the exception. Thus, analyzing SE sequencing data, especially R1 reads, in environmental microbiome studies could mitigate the loss of sample size caused by low-quality reverse reads in the dataset. However, care must be taken when interpreting the microbiome structure, because a few taxa observed in the PE datasets were absent from the SE datasets. In conclusion, this study shows that amplicon data can be analyzed in alternative ways without removing samples that have low-quality reverse reads.
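The 3′ quality drop in reverse reads described above can be checked on any FASTQ file before deciding between a PE and an SE analysis. The following is a minimal illustrative sketch, not the pipeline used in this study. It assumes Phred+33 encoded quality strings; the function names, the toy quality strings, the tail length of 3 bases and the threshold of 20 are all hypothetical choices made for the example.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def phred_scores(quality_line, offset=PHRED_OFFSET):
    """Decode one FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality_by_position(quality_lines):
    """Mean Phred score at each read position, across reads of any length."""
    decoded = [phred_scores(q) for q in quality_lines]
    longest = max(len(scores) for scores in decoded)
    profile = []
    for i in range(longest):
        # Only reads long enough to cover position i contribute to its mean.
        profile.append(mean(s[i] for s in decoded if len(s) > i))
    return profile


def low_quality_tail_fraction(quality_lines, tail=3, threshold=20):
    """Fraction of reads whose mean score over the last `tail` bases is below `threshold`."""
    flagged = 0
    for q in quality_lines:
        if mean(phred_scores(q)[-tail:]) < threshold:
            flagged += 1
    return flagged / len(quality_lines)


# Toy quality strings (made up for illustration): 'I' = Q40, 'H' = Q39, '5' = Q20, '#' = Q2.
r1_quals = ["IIIIIIII", "IIIIIIIH"]  # forward reads: high quality throughout
r2_quals = ["IIII55##", "IIIH5###"]  # reverse reads: quality collapses at the 3' end

r1_profile = mean_quality_by_position(r1_quals)
r2_profile = mean_quality_by_position(r2_quals)
r1_bad = low_quality_tail_fraction(r1_quals)
r2_bad = low_quality_tail_fraction(r2_quals)

print("R1 mean quality by position:", r1_profile)
print("R2 mean quality by position:", r2_profile)
print("Fraction of reads with a low-quality 3' tail: R1 =", r1_bad, "R2 =", r2_bad)
```

In practice the quality strings would be read from the fourth line of each FASTQ record. A high flagged fraction for R2 alongside a low one for R1 is the situation in which analyzing R1 as an SE dataset might be considered instead of discarding samples.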
Availability of data and material
Downloaded from SRA.
Bharti R, Grimm DG (2021) Current challenges and best-practice protocols for microbiome analysis. Brief Bioinform 22:178–193. https://doi.org/10.1093/bib/bbz155
Bižić M, Klintzsch T, Ionescu D et al (2020) Aquatic and terrestrial cyanobacteria produce methane. Sci Adv 6:eaax5343. https://doi.org/10.1126/sciadv.aax5343
Callahan BJ, McMurdie PJ, Rosen MJ et al (2016) DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods 13:581–583. https://doi.org/10.1038/nmeth.3869
Caruso V, Song X, Asquith M, Karstens L (2019) Performance of microbiome sequence inference methods in environments with varying biomass. mSystems 4:e00163-18. https://doi.org/10.1128/mSystems.00163-18
Chen X, Johnson S, Jeraldo P et al (2018) Hybrid-denovo: a de novo OTU-picking pipeline integrating single-end and paired-end 16S sequence tags. Gigascience 7:1–7. https://doi.org/10.1093/gigascience/gix129
Fuks G, Elgart M, Amir A et al (2018) Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling. Microbiome 6:17. https://doi.org/10.1186/s40168-017-0396-x
Gilbert JA, Jansson JK, Knight R (2014) The Earth Microbiome project: successes and aspirations. BMC Biol 12:69. https://doi.org/10.1186/s12915-014-0069-1
Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ (2017) Microbiome datasets are compositional: and this is not optional. Front Microbiol 8:2224. https://doi.org/10.3389/fmicb.2017.02224
Johnson JS, Spakowicz DJ, Hong B-Y et al (2019) Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat Commun 10:5029. https://doi.org/10.1038/s41467-019-13036-1
Liu T, Chen C-Y, Chen-Deng A et al (2020) Joining Illumina paired-end reads for classifying phylogenetic marker sequences. BMC Bioinform 21:105. https://doi.org/10.1186/s12859-020-3445-6
McMurdie PJ, Holmes S (2013) phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8:e61217. https://doi.org/10.1371/journal.pone.0061217
Murali A, Bhargava A, Wright ES (2018) IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences. Microbiome 6:140. https://doi.org/10.1186/s40168-018-0521-5
Oliverio AM, Geisen S, Delgado-Baquerizo M et al (2020) The global-scale distributions of soil protists and their contributions to belowground systems. Sci Adv 6:eaax8787. https://doi.org/10.1126/sciadv.aax8787
Gweon HS, Shaw LP et al, on behalf of the REHAB consortium (2019) The impact of sequencing depth on the inferred taxonomic composition and AMR gene content of metagenomic samples. Environ Microbiome 14:7. https://doi.org/10.1186/s40793-019-0347-1
Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290. https://doi.org/10.1093/bioinformatics/btg412
Pereira-Marques J, Hout A, Ferreira RM et al (2019) Impact of host DNA and sequencing depth on the taxonomic resolution of whole metagenome sequencing for microbiome analysis. Front Microbiol 10:1277. https://doi.org/10.3389/fmicb.2019.01277
Pollock J, Glendinning L, Wisedchanwet T, Watson M (2018) The madness of microbiome: attempting to find consensus “best practice” for 16S microbiome studies. Appl Environ Microbiol 84:e02627-17. https://doi.org/10.1128/AEM.02627-17
Ramakodi MP (2021) Effect of amplicon sequencing depth in environmental microbiome research. Curr Microbiol 78:1026–1033. https://doi.org/10.1007/s00284-021-02345-8
Schirmer M, Ijaz UZ, D’Amore R et al (2015) Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res 43:e37. https://doi.org/10.1093/nar/gku1341
Singer GAC, Fahner NA, Barnes JG et al (2019) Comprehensive biodiversity analysis via ultra-deep patterned flow cell technology: a case study of eDNA metabarcoding seawater. Sci Rep 9:5991. https://doi.org/10.1038/s41598-019-42455-9
Soriano-Lerma A, Pérez-Carrasco V, Sánchez-Marañón M et al (2020) Influence of 16S rRNA target region on the outcome of microbiome studies in soil and saliva samples. Sci Rep 10:13637. https://doi.org/10.1038/s41598-020-70141-8
Susin A, Wang Y, Lê Cao K-A, Calle ML (2020) Variable selection in microbiome compositional data analysis. NAR Genom Bioinform 2:lqaa029. https://doi.org/10.1093/nargab/lqaa029
Tan G, Opitz L, Schlapbach R, Rehrauer H (2019) Long fragments achieve lower base quality in Illumina paired-end sequencing. Sci Rep 9:2856. https://doi.org/10.1038/s41598-019-39076-7
Thompson LR, Sanders JG, McDonald D et al (2017) A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551:457–463. https://doi.org/10.1038/nature24621
Wen C, Wu L, Qin Y et al (2017) Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform. PLoS ONE 12:e0176716. https://doi.org/10.1371/journal.pone.0176716
Werner JJ, Zhou D, Caporaso JG et al (2012) Comparison of Illumina paired-end and single-direction sequencing for microbial 16S rRNA gene amplicon surveys. ISME J 6:1273–1276. https://doi.org/10.1038/ismej.2011.186
Wickham H (2016) ggplot2: elegant graphics for data analysis, 2nd edn. Springer International Publishing, Cham
Wickham H, Averick M, Bryan J et al (2019) Welcome to the Tidyverse. J Open Source Softw 4:1686. https://doi.org/10.21105/joss.01686
Wright ES (2016) Using DECIPHER v2.0 to analyze big biological sequence data in R. R J 8:352. https://doi.org/10.32614/RJ-2016-025
Zaheer R, Noyes N, Ortega Polo R et al (2018) Impact of sequencing depth on the characterization of the microbiome and resistome. Sci Rep 8:5890. https://doi.org/10.1038/s41598-018-24280-8
I would like to thank Dr. Bhawna Dubey, Chief Scientific Officer, Reprocell Bioserve Biotechnologies Pvt. Ltd., Hyderabad, for reviewing the manuscript. The effort of Soriano-Lerma et al. (2020) in making the data publicly available on SRA is highly appreciated. CSIR-NEERI is acknowledged for providing the support necessary to carry out the analyses. The manuscript draft was submitted to the Institute Repository under KRC No. CSIR-NEERI/KRC/2021/MAY/HZC/3.
Conflict of interest
Consent to participate
Consent for publication
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Communicated by Erko Stackebrandt.
Below is the link to the electronic supplementary material.
About this article
Cite this article
Ramakodi, M.P. A comprehensive evaluation of single-end sequencing data analyses for environmental microbiome research. Arch Microbiol 203, 6295–6302 (2021). https://doi.org/10.1007/s00203-021-02597-9
- Environmental microbiome
- Amplicon sequencing
- Illumina MiSeq platform
- Single-end sequencing data
- Low quality reads
- Taxonomy inference