Background

The human gut hosts 100 trillion microorganisms, encompassing thousands of species collectively, weighing an average 1.5 kg per person [1, 2]. The human microbiota is important because of its metagenomic repertoire, which is estimated to be 100 times larger than the human genome and encodes a vast array of functionality critical for host physiology and metabolism [2]. Differences in human gut microbiome composition have been linked to metabolic diseases such as type 2 diabetes (T2D) and obesity [3,4,5,6,7]. Identifying specific bacterial biomarkers within the microbiome could help predict the occurrence of T2D or tailor treatments in high-risk subjects to prevent or delay the onset of metabolic diseases. The molecular mechanisms through which the intestinal microbiota play a key role in metabolic diseases are linked to an increased energy harvesting and the triggering of the low-grade inflammatory status characterizing insulin resistance and obesity [8, 9].

The prevalence of T2D is increasing worldwide, with current data indicating that at least 8.5% of the world’s population is affected, with the worldwide prevalence expected to reach 12% by 2025 [10, 11]. T2D is mainly caused by insulin resistance and relative insulin deficiency [12]. Saudi Arabia, with a total population of over 20 million, has an estimated T2D constituting 25% of the total population [13]. The rapid rate of increase of T2D disease in some areas of Saudi Arabia, which increased from 16% in 2005 to over 25% in 2011, is thought to be due to rapid lifestyle changes such as diet and sedentary lifestyle, as well as adverse environmental factors [13].

We analyzed the composition of 16S rRNA from the stool samples collected from Saudi Arabian participants residing in the Eastern Province and quantified the abundance of microbial communities to determine significant differences between subpopulations of samples based on diabetes status and glucose level. We assessed alpha diversity between the subpopulations to measure species richness and evenness among samples noting that an increased Firmicutes:Bacteriodetes ratio has previously been observed in the microbiota of obese/diabetic individuals compared to the microbiota of healthy individuals [14, 15]. Furthermore, individuals with diabetes were tracked for high glucose level (> 126 mg/dL) as it is an indicator of fasting hyperglycemia, which could potentially lead to severe long-term complications including cardiovascular disease, neuropathy and kidney failure.

Results

Principal coordinate analysis (PCoA) of the generated 16S datasets is shown in Supplementary Fig. S1a and b. The first and second principal coordinated explained 23% and 14%; 23% and 12% of the Diabetes Status and sex variance, respectively. Levels of the 150 most abundance microbial genera within T2D and non-T2D participants were observed to differ significantly in stool microbiota abundance derived from 16S sequencing (Supplementary Fig. S2a and b).

Figure 1a and b shows the rank abundant curve and Permutational Multivariate Analysis of Variance (PERMANOVA) cloud, respectively for Saudi T2D and control 16S stool microbiota datasets. These show that the microbiome communities differ globally between T2D and non-T2D subjects at statistical significance, p = 0.01. The abundance of Taxonomic Composition in males and females is clearly evident in both females (Supplementary Fig. S3a and b) and in males (Supplementary Fig. S4a and b). We also compared Saudi T2D participants with higher glucose > 126 mg/dL versus lower glucose strata <  = 126 mg/dL glucose using the top 150 genera. Amongst the 298 samples with glucose data, n = 193 were in the higher glucose strata and n = 105 were in the lower strata (Supplementary Fig. S5). Unlike previous studies conducted on Western populations, the Saudi participants with T2D and higher glucose levels showed a trend toward increased diversity, a result that is similar to another recently reported study from a United Arab Emirates cohort [3, 4, 16].

Fig. 1
figure 1

Rank abundance curve (a) and permutational multivariate analysis of variance (PERMANOVA) (b) for Saudi T2Ds and control 16S stool microbiota datasets. This figure shows the rank abundant curve and PERMANOVA respectively comprised of Saudi T2D and control 16S stool microbiota samples. These show that the microbiome communities differ globally between T2D and non-T2D subjects at statistical significance, p = 0.01

Alpha diversity was compared in males versus females (n = 204 and 226, respectively) with no significant differences observed using various different classifications: ACE (Abundance-based Coverage Estimator) and Chao1 indices to estimate richness (measurement of OTUs expected in samples given all the bacterial species identified in the samples); Shannon–Weaver, Simpson and Inverse Simpson to define different levels of resolution (phylum, class, order, family, genus, and species); and Fisher (Supplementary Fig. S6). Alpha diversity of T2D versus non-T2D participants revealed statistically significant enrichment of the Shannon–Weaver and Simpson metrics (Supplementary Figs. S7 and S8) (p < 2.26 × 10–10 (CI: -0.392 to -0.718)) and p < 4.63 × 10–7 (CI: -0.049 to -0.108) for Shannon and Simpson diversity, respectively. Saudi T2D cases versus controls showed an association with an elevated Bacteroidetes/Firmicutes ratio, p = 2.1 × 10–5 t-test (Supplementary Fig. S8).

We observed an overall positive enrichment of microbiota genus/families for diabetics compared to healthy individuals. In addition, among T2D patients, those with high glucose levels exhibited slightly more positive enrichment compared to those at lower risk of fasting hyperglycemia (Fig. 2a and b and Table S1). In particular, the Akkermansia, Acidaminococcus, Megamonas, Dialister, Lactobacillus and Paraprevotella genus were enriched at p < 1 × 10–9 in T2D versus non-T2D. The Bacteroides, Dialister, Akkermansia and Prevotella genus were enriched in low versus high-risk T2D using a fasting glucose cutoff of 126 mg/dL.

Fig. 2
figure 2

Fold change plots of enriched OTUs for: T2D vs control (a) and glucose levels for high vs low T2D status (b). An overall positive enrichment of microbiota phylum/genus for diabetics compared to non-T2D individuals and amongst diabetic participants was observed. Those with high glucose levels exhibited slightly more positive enrichment compared to those as lower risk of fasting hyperglycemia

Discussion

In this study we performed the largest microbiome study ever conducted in Saudi Arabia, as well as the first-ever characterization of gut microbiota T2D versus non-T2D in this population. We used 16 s rRNA metagenomic sequencing to reads identifiable down to genus level from the stool samples of 461 T2D and 119 non-T2D Saudi individuals from the Eastern Province of Saudi Arabia, a region particularly affected by T2D [17]. We assessed the microbiota abundance based on diabetes status and glucose levels and examined community diversity patterns to compare with other T2D microbiota studies from around the globe. These efforts are important and warranted given the scarcity of microbiome data in Middle Eastern populations, and these results provide a useful addition to the global microbiome reference dataset in an under-examined community. Saudi Arabian T2D costs have risen over 500% in two decades with 10 million individuals estimated to be diabetic or pre-diabetic, therefore comparing overlapping and varying patterns in gut microbiota with other studies is critical to assessing novel treatment options in light of a rapidly growing T2D health epidemic [17, 18].

Community level differences are evident in the Saudi population between T2D and non-T2D individuals, and diversity patterns appear to vary from well-characterized microbiota from Western cohorts. Indeed, in contrast to Western cohorts that often show associations between decreased gut microbiota diversity and insulin resistance, here we show that Saudi participants with T2D exhibited higher relative diversity in comparison to normal metabolic counterparts [19]. These results are similar to a recent report from Al Bataineh and colleagues who characterized microbiomes in a cohort of 50 T2D and non-T2D individuals from the United Arab Emirates, though higher diversity in that smaller T2D cohort was determined to be insignificant when controlling for age [20]. Sex was not found to play a role in community structural differences, and results were independently validated between females and males. The role of overall community diversity decreasing in T2D populations has been widely cited in early studies on Western populations, yet larger meta-analyses involving global populations have distorted this pattern and highlight the importance of locally representative studies [21, 22].

We observe significant differences between T2D and non-T2D individuals for many microbial taxa, as well as between T2D individuals with high and low fasting blood glucose levels. Concordant with studies conducted on Western populations is the association of increasing Bacteroidetes/Firmicutes ratio with T2D and in our overweight and obese T2D cohort, increased Bacteroidetes may be functionally related to metabolism of branched chain amino acids which has been linked to obesity-related metabolic phenotypes [3, 23]. Among OTUs assigned at the genus taxonomic level, Prevotella and Bacteroides OTUs showed some of the most significant log-fold increases in abundance for diabetics (over four-fold increases in abundance), species of which have been functionally associated with the development of insulin resistance and glucose intolerance [24]. Among Firmicutes however, levels of Acidaminococcus and Megasphaera were positively correlated with T2D, as has been previously observed, and could functionally relate with increases to Bacteroidetes through complementary amino acid metabolism [24,25,26]. We observed higher levels of Akkermansia in the Saudi T2D group, despite potential protective effects for obesity and metabolic disease. Associations of levels of Akkermansia, a mucus-consuming taxon, have been observed to be associated with health and with ethnicity in Western populations and may represent an impact of dietary and lifestyle effects on microbiota composition, as this microbe is rarely observed in more traditional cultures across large geographic regions [27]. It should be noted however that Akkermansia levels are also often increased in response to metformin intake in T2D subjects (metformin use metadata is not known for the current cohort) [28]. Taxonomic differences associated with T2D likely reflect shared or complementary functional and metabolic traits but may be regionally specific based on dietary and environmental variations known to influence the microbiome [27,28,29].

Based on diabetes status and quantified glucose levels of Middle Eastern participants, relatively stable differences in stool composition were observed by differential abundance and alpha diversity measures. Many studies have examined T2D associations with gut microbiota in populations around the globe, and while some patterns generally validate across studies such as individual taxon abundance variation, others such as overall community diversity do not replicate consistently. Obesity, diet, lifestyle and ancestry are all factors that influence T2D and each varies significantly from culture to culture around the globe, meaning that the patterns in T2D development and roles of the microbiome likely vary as well. As a rapidly emerging chronic condition in Saudi Arabia and the Middle East, T2D burdens have grown more quickly and affect larger proportions of the population than any other global region, making a regional reference T2D-microbiome dataset critical to understanding the nuances of disease development on a global scale.

Conclusions

This is the largest microbiome study ever conducted in Saudi Arabia, as well as the first-ever characterization of gut microbiota T2D versus non-T2D in this population. In addition, it has shown that community level differences are evident in the Saudi population between T2D and non-T2D individuals, and diversity patterns appear to vary from well-characterized microbiota from Western cohorts. Comparing overlapping and varying patterns in gut microbiota with other studies is critical to assessing novel treatment options in light of a rapidly growing T2D health epidemic in the region. As a rapidly emerging chronic condition in Saudi Arabia and the Middle East, T2D burdens have grown more quickly and affect larger proportions of the population than any other global region, making a regional reference T2D-microbiome dataset critical to understanding the nuances of disease development on a global scale.

Methods

Study populations

Between 2015–2019, stool samples and data were collected from 461 consecutive diabetic patients attending the Diabetic Clinics, King Fahd Hospital of the University, Al-Khobar, Saudi Arabia and from 119 healthy controls. Controls were selected from the general population with age ranged from 30–75 years and had a body mass index (BMI) ranging from 22 to 33 kg/m2 and had no diabetes or family history of diabetes. The T2D patients had a minimum disease duration of 5 years. Table 1 outlines the patient demographics and clinical characteristics. Baseline measurements included anthropometric measurements, physical examinations and in-person surveys. Participants who had been treated with antibiotics in the previous three months, were pregnant or lactating, or had inflammatory bowel disease were excluded from the study. Blood and stool samples were collected from participants and were stored immediately after collection at − 80 °C.

Table 1 Clinical and demographic characteristics for Saudi Arabian Type 2 diabetes cases (n-461) and controls (n = 119)

Methods for DNA library preparation and sequencing

Stool samples were taken from T2D (n = 461) and from healthy (n = 119) participants and were immediately stored at − 80 °C. Bacterial DNA extraction from stool samples was performed using QIAamp Fast DNA Stool Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Three independent extractions were performed from each sample to ensure robust representation of all microbial content.

Sequencing was performed using either the Swift Amplicon 16S panel (Swift Biosciences) or a custom protocol. For the Swift protocol, 20 ng of stool-derived DNA was used for 16S sequencing library preparation using the 16S Primer Panel v2, the Swift Normalase Amplicon Panels (SNAP) Core Kit, and the SNAP Combinatorial Dual Index Primer Kit (Sets 1A and 1B) (Swift Biosciences, CA). The indexed libraries were on average 620 base pairs (bp) in length, and individual DNA libraries were diluted to 2.5 nM, pooled in equimolar proportion, and sequenced on a NovaSeq 6000 SP flow cell (Illumina, CA) using 250 bp paired end reads. For the custom approach, PCR was performed on each sample using the 515F primer (forward primer) and one of the 100 806rcbc primers (reverse primer). These primers contained: sequence homologous to region V4 of the 16S rRNA in forward and reverse; Illumina adaptors; and the reverse primers contained indexing sequences. Taq PCR Master Mix from Qiagen was used to prepare the PCR master mix. A PCR reaction was performed on each extracted DNA sample, i.e. each stool sample had three PCR reactions. The PCR product was run on 1% agarose gel. The band of expected size (381 bp) was excised from gel and purified with gel purification kit from Qiagen. The three PCR products from each sample were pooled together. The pooled and purified PCR product was quantified with NanoDrop 2000 (Thermo Sciences, USA). Equal concentrations of DNA from each sample (5 ng of DNA) were pooled together. For each sequencing run, DNA from 50 samples was pooled to make the DNA library for each batch.

The final concentration of the DNA library was quantified with real time PCR using the Kapa library quantification kit (Roche, USA) according to the manufacturer’s instructions. The DNA library of each batch was sequenced using the MiSeq platform from Illumina (Illumina, USA) using the MiSeq reagent V2 500 cycles Kit from Illumina and the custom read1 (TATGGTAATTGTGTGCCAGCMGCCGCGGTAA), read2 (AGTCAGTCAGCCGGACTACH VGGGTWTCTAAT) and index (ATTAGAWACCCBDGTAGTCCGGCTGACTGACT) sequencing primers. PhiX DNA (Illumina, USA) was used as a control library.

Analysis

Supplementary Fig. S9 overviews the analytical pipeline and workflow employed for these analyses. 16S rRNA (V4 region) sequences were used in this study and sequenced with Illumina software which handled the initial primer and barcode processing of all raw sequences. Raw sequences were demultiplexed with Illumina’s bcl2fastq2 v2.20 [30]. FastQC was then used for further processing to remove samples with low quality scores across the majority of bases [31]. After de-multiplexing the raw sequences and screening via FastQC, the majority of data processing was executed in QIIME2. Paired-end reads were joined using VSEARCH. Chimera amplicon removal and abundance filtering were processed using Deblur [20]. Amplicon sequences were clustered and assembled into Operational Taxonomical Units (OTUs) using closed reference clustering against the Greengenes 13_8 database via VESEARCH. Taxonomic assignment was performed using a pre-trained Naïve Bayes classifier with Greengenes OTU database. The abundance tables and data obtained from QIIME2 were combined into a Phyloseq object and further analyzed in R with custom scripts [32]. Briefly, we removed taxa with less than 5% prevalence across all samples as well as set a cutoff that samples must have 1000 OTU counts (default phyloseq setting) to be carried into downstream analyses to filter out noisy samples. No samples were removed. Principle Coordinate Analysis (PCA) using Bray–Curtis dissimilarity was performed on relative abundance using phyloseq ordinate and plotordination functions. Heatmaps of top OTUs were generated using relative abundance as input to phyloseq plot_heatmap [1]. Alpha diversity was performed on counts using phyloseq estimate_richness and plot_richness functions [32]. Taxonomic bar plots generated on relative abundance aggregated to family level using phyloseqplot_composition [32]. Differentially expressed OTUs calculated using DESeq2 package [33]. Phyloseq object with counts was transformed using phyloseq_to_deseq2 and DESeqwith the default Wald test used to identify significant OTUs for the contrast of interest [34]. Bacteroidetes-Firmicutes ratio calculated using the microbiome package bfratio on relative abundance aggregated to the phylum level [35]. Visualization of population density and microbiome variation generated using microbiome plot landscape (https://github.com/microbiome/microbiome/blob/master/R/plot_landscape.R) with Non-Metric Multidimensional Scaling (NMDS) and Bray–Curtis dissimilarity metrics. PERMANOVA performed using vegan package and adonis function with default parameters. Rank abundance input is counts into Biodiversity R rankabundance [36]. The resulting output from rankabundance was visualized using BiodiversityR rankabuncomp and ggplot2 (https://ggplot2.tidyverse.org).