Background

Type 2 diabetes mellitus (T2D) is a prevalent disease characterized by dysregulated blood glucose together with abnormalities in blood lipid levels, platelet aggregation, and blood pressure [1,2,3]. Multiple gene variants associated with T2D have been identified, partly explaining the heritability of the disorder [4]. The genetic susceptibility to overt diabetes appears to be triggered by numerous environmental risk factors, including unhealthy diet, sedentary lifestyle, and smoking. Several of these diabetes-related environmental risk factors may mediate part of their diabetogenic impact through changes in the intestinal microbiota [5]. Accordingly, aberrant composition and function of the intestinal microbiota have recently been implicated in the pathogenesis of T2D as well as several other metabolic disorders [6,7,8,9,10,11].

The T2D phenotype of Asian Indians differs from that of Europeans and is characterized by a unique fat distribution, as well as altered blood lipid composition and inflammatory markers [12, 13]. Several earlier studies have reported subclinical inflammation in the general Indian population in the context of insulin resistance and prediabetes [12, 14,15,16,17]. Even in comparison with other Asian populations, general adiposity could explain the differences in insulin resistance among Chinese and Malays, whereas abdominal fat distribution and inflammation were the significant factors contributing to the excess insulin resistance in Asian Indians [17]. This characteristic phenotype of Asian Indians could possibly be linked to the gut microbiota through the unique dietary patterns of the Indian population. A distinctive feature of the gut microbiota of healthy Indian subjects is the predominance of the genera Prevotella, Faecalibacterium, Collinsella, and Megasphaera [18,19,20,21]. Furthermore, Asian Indians with T2D have been reported to show altered abundances of microbes spanning Eubacteria, Archaea, and eukaryotes [22]. An elevated abundance of specific bacterial genera such as Escherichia has also been reported in Indians with T2D compared with healthy individuals [22, 23].

The natural history of T2D includes a stage of prediabetes (PD), in which blood glucose levels are higher than normal but not high enough to warrant a diagnosis of diabetes [24]. Prevention of disease progression is possible at this stage [25,26,27,28]. A few studies with limited sample sizes have reported a possible association between gut microbiome composition and prediabetes [10, 29,30,31,32,33]. However, no study has yet compared different ethnicities in search of a prediabetes signature in the gut microbiota. The current study investigates the gut microbiota of Indian and Danish adults with normoglycemia and compares it with the microbiota of individuals with prediabetes in the two countries. Besides genetic differences between Indian and Danish individuals, Denmark and India have entirely different cultural, climatic, socio-demographic, and dietary patterns. This study is intended to serve as a unique resource in the search for specific microbiome signatures of prediabetes, which could improve understanding of the disease pathophysiology and may be explored further as potential early indicators or biomarkers of dysglycemia risk across populations of different ethnicities.

Methods

Participant enrollment and sample collection in Denmark

A total of 259 Danish volunteers [138 normoglycemic (NG) controls and 121 with prediabetes (PD)] were recruited from the DanFund [34] and ADDITION-PRO [35] cohorts and through advertisements in local newspapers. All Danish subjects were of White European ethnicity, aged 35 to 74 years, with a body mass index (BMI) of 20 to 40 kg/m2. Individuals who had known diabetes of any kind, had been treated with antibiotics within the preceding 4 months, were pregnant or lactating, or were unable to give informed consent were ineligible for inclusion.

Individuals with HbA1c below 5.7% (39 mmol/mol) and fasting plasma glucose below 6.1 mmol/L at screening were eligible for inclusion as normoglycemic controls, unless they had a history of gestational diabetes. Individuals with fasting plasma glucose of 6.1 to 6.9 mmol/L or HbA1c of 5.7 to 6.4% (39 to 47 mmol/mol) were eligible for inclusion as individuals with prediabetes.
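The eligibility rules above can be summarized as a small decision function. This is a minimal Python sketch, not the study's screening software; the function name `classify_danish` is hypothetical, and treating a value above the prediabetes range on either marker (for example, screen-detected diabetes) as ineligible is our assumption, since the text only excludes known diabetes.

```python
from typing import Optional

# Screening thresholds from the Danish inclusion criteria.
NG_HBA1C_BELOW = 5.7          # HbA1c, % (39 mmol/mol)
NG_FPG_BELOW = 6.1            # fasting plasma glucose, mmol/L
PD_HBA1C_RANGE = (5.7, 6.4)   # HbA1c, %
PD_FPG_RANGE = (6.1, 6.9)     # fasting plasma glucose, mmol/L


def classify_danish(hba1c_pct: float, fpg_mmol: float,
                    gestational_diabetes: bool = False) -> Optional[str]:
    """Return "NG", "PD", or None if the person fits neither group (hypothetical helper)."""
    # Normoglycemic controls: both markers below threshold, no gestational diabetes.
    if hba1c_pct < NG_HBA1C_BELOW and fpg_mmol < NG_FPG_BELOW:
        return None if gestational_diabetes else "NG"
    # Assumption: a value above the prediabetes range on either marker makes the person ineligible.
    if hba1c_pct > PD_HBA1C_RANGE[1] or fpg_mmol > PD_FPG_RANGE[1]:
        return None
    # Prediabetes: either marker inside its range is sufficient.
    in_fpg = PD_FPG_RANGE[0] <= fpg_mmol <= PD_FPG_RANGE[1]
    in_hba1c = PD_HBA1C_RANGE[0] <= hba1c_pct <= PD_HBA1C_RANGE[1]
    return "PD" if (in_fpg or in_hba1c) else None


print(classify_danish(5.4, 5.6))   # NG
print(classify_danish(6.0, 5.6))   # PD
```

Note that the prediabetes rule is an "or" over the two markers, whereas the normoglycemic rule requires both markers to be below threshold.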

Fecal samples were collected by the participants following standardized procedures: home sampling with immediate freezing at −18 °C, followed by transport in an insulated polystyrene container with dry ice or cooling elements to final storage at −80 °C within 48 h.

Participant enrollment and sample collection in India

The Indian cohort comprised 278 individuals [137 with normal glucose tolerance (NG) and 141 with prediabetes (PD)] attending a tertiary care center for diabetes between April 2014 and April 2016. Normal and impaired glucose tolerance were diagnosed using a standard oral glucose tolerance test (OGTT) with an 82.5 g oral glucose load (equivalent to 75 g of anhydrous glucose). Study subjects were adults of either sex aged 35 to 65 years. Individuals with chronic and severe illnesses (such as cancer and tuberculosis) and those who had used medications such as dipeptidyl peptidase-4 inhibitors, acarbose, glucagon-like peptide-1 receptor agonists, or orlistat were excluded from the study. Each subject was given a kit containing the collection tubes, bed-pan liner, and dry ice required for fecal sample collection. The fecal samples were frozen at −20 °C within 1 h and then transferred to a −80 °C freezer.
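The 82.5 g and 75 g figures are consistent if the glucose load was glucose monohydrate, whose molar mass is about 10% higher than that of anhydrous glucose because of the water of crystallization. The short Python check below assumes that, using standard molar masses; the helper name is hypothetical.

```python
# Standard molar masses (g/mol).
MOLAR_MASS_ANHYDROUS = 180.16     # glucose, C6H12O6
MOLAR_MASS_MONOHYDRATE = 198.17   # glucose monohydrate, C6H12O6·H2O


def monohydrate_mass(anhydrous_g: float) -> float:
    """Mass of glucose monohydrate that supplies `anhydrous_g` grams of glucose."""
    return anhydrous_g * MOLAR_MASS_MONOHYDRATE / MOLAR_MASS_ANHYDROUS


print(f"{monohydrate_mass(75.0):.1f} g")  # prints "82.5 g"
```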

Although the current report concerns microbial signatures associated with prediabetes (PD), cohort recruitment in India and Denmark was carried out as part of a larger research project, “MicrobDiab - Studies of interactions between the gut Microbiome and the human host biology to elucidate novel aspects of the pathophysiology and pathogenesis of type 2 Diabetes”. The NG samples from India and Denmark reported in this work also form the basis of a related MicrobDiab study aimed at deciphering trans-ethnic microbial signatures associated with T2D.

Phenotyping of study participants

Phenotyping of study participants from both India and Denmark included basic physical measurements (height, weight, waist circumference, BMI, and blood pressure), a wide range of biochemical tests, and serum levels of 11 inflammation biomarkers (details in Additional file 1). In addition, a structured questionnaire was used to obtain information on age, gender, duration of prediabetes, family history of diabetes, food habits, physical activity patterns, smoking, allergic conditions, gastrointestinal diseases, and other factors.

Microbiome sequencing

To minimize confounding effects of technical procedures, the standard operating procedures for recruitment of study participants, biological sample processing, and microbial DNA extraction from stool were harmonized across the two countries. Furthermore, DNA sequencing of all samples was performed together at a single sequencing center, the Translational Health Science and Technology Institute, India. Similarly, inflammation biomarkers for all samples were profiled in the same laboratory (protocol details in Additional file 1).

DNA was extracted from 200 mg of stool from each participant using a standard INRA protocol [36]. The V1–V5 variable regions of the 16S rRNA gene were amplified using the 27F(C1) and 926R(C5) primers, and the equimolar libraries were sequenced on a 454 GS FLX+ pyrosequencing platform (details in Supplementary Methods in Additional file 1). In addition to the samples from volunteers recruited in this study, 16S rRNA gene sequencing was performed on microbiome samples from Indian and Danish volunteers with T2D for an allied study, using the same protocols and the same multiplexed sequencing runs. Sequence data for all microbiome samples have been deposited in NCBI SRA under accession PRJNA517829 [37].

Sequence analysis

The sequenced reads were demultiplexed using the sequencing barcodes (Additional file 2: Table S1) and then quality filtered (average PHRED score > 20). After requiring a minimum sequencing depth of 5000 high-quality reads per sample, a total of 18,380,379 sequences spanning the V1–V5 region of the 16S rRNA gene were obtained from 864 microbiome samples. The V3–V5 regions of all reads (which varied in length) were then extracted using V-Xtractor 2.0 [38], and any read that did not span the complete V3–V5 region was discarded. A total of 17,030,870 quality-checked and trimmed reads from the 864 samples were used for the downstream OTU picking step (average sequencing depth of 19,712 ± 7774 SD reads/sample). Many contemporary studies prefer exact sequence variant-based analysis [39, 40] of amplicon sequencing data over OTU picking; however, these methods were mostly designed for data generated on the Illumina platform, and to the best of our knowledge, the utility of exact sequence variants versus OTUs has not been validated for 454 single-end sequencing data. Further, resolving exact variants at 100% sequence identity may exceed the accuracy that the sequencing technology allows, so a more conservative OTU-based approach, more tolerant of noise arising from sequencing errors, intra-genomic heterogeneity, and similar sources, was preferred for the current study. OTUs were picked using the “open reference OTU picking” approach implemented in the QIIME pipeline v1.9.1 [41]. Greengenes OTUs clustered at 97% identity (Greengenes version 13_8) were used as the reference OTU database [42], and UCLUST v1.2.22q [43] was chosen as the OTU picking method (“uclust_ref” run with default parameters to cluster sequences at 97% identity).
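The sketch below illustrates two of the steps described above in simplified form: discarding reads whose average Phred score is not above 20, then open-reference OTU picking, in which reads are first matched to reference OTUs at ≥ 97% identity and the remaining reads are clustered de novo. It is a toy model under stated assumptions (pre-aligned, equal-length sequences compared by position-wise identity, first-hit greedy assignment) and is not QIIME's or UCLUST's implementation, which use heuristic database search and pairwise alignment. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Read = Tuple[str, List[int]]  # (sequence, per-base Phred scores)


def mean_phred(quals: List[int]) -> float:
    return sum(quals) / len(quals)


def identity(a: str, b: str) -> float:
    """Position-wise identity; assumes pre-aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference_pick(reads: List[Read], reference: Dict[str, str],
                        threshold: float = 0.97,
                        min_mean_q: float = 20.0) -> Dict[str, int]:
    """Toy open-reference OTU picking; returns read counts per OTU."""
    counts: Dict[str, int] = {}
    centroids: List[Tuple[str, str]] = []  # de novo OTUs: (id, centroid sequence)
    for seq, quals in reads:
        # Quality filter: keep reads with average Phred score > 20.
        if mean_phred(quals) <= min_mean_q:
            continue
        # Step 1 (closed reference): first reference OTU at >= 97% identity.
        otu = next((oid for oid, ref in reference.items()
                    if identity(seq, ref) >= threshold), None)
        # Step 2 (de novo): unmatched reads join the first existing centroid
        # within the threshold, or seed a new de novo OTU.
        if otu is None:
            otu = next((oid for oid, cen in centroids
                        if identity(seq, cen) >= threshold), None)
        if otu is None:
            otu = f"denovo{len(centroids)}"
            centroids.append((otu, seq))
        counts[otu] = counts.get(otu, 0) + 1
    return counts


reference = {"GG_1": "A" * 100}
reads = [("A" * 100, [30] * 100),          # exact reference match
         ("A" * 98 + "CC", [30] * 100),    # 98% identity -> GG_1
         ("C" * 100, [30] * 100),          # no reference hit -> denovo0
         ("C" * 99 + "A", [30] * 100),     # 99% identical to denovo0
         ("G" * 100, [10] * 100)]          # mean Phred 10 -> discarded
print(open_reference_pick(reads, reference))  # {'GG_1': 2, 'denovo0': 2}
```

The two-step design is what makes the approach "open": reads without a reference match are kept as de novo OTUs rather than discarded, which is how the final table can contain both Greengenes-matched and de novo OTUs.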
Representative sequences of each OTU were used to assign taxonomic lineages with dada2 [39], using SILVA database version 132 [44] as the reference. Sparse OTUs containing < 0.002% of the total number of high-quality reads sequenced were removed. The final OTU abundance table comprised 1897 OTUs from 864 samples, including 1471 OTUs corresponding to OTUs already cataloged in the Greengenes database and 426 de novo OTUs. A subset of 537 microbiome samples from normoglycemic and prediabetic individuals in India and Denmark, corresponding to 10,647,149 quality-checked and trimmed reads, was used for the downstream taxonomic analyses in this study. The functional potential of the gut microbiomes was estimated from the taxonomic distribution using PICRUSt v1.1.0 [45] and Vikodak [46]. Although functional potential should ideally be estimated from shotgun metagenomic data, this amplicon sequencing-based study is limited in this respect and therefore relied on these tools, which have been reported to provide reliable estimates from taxonomic abundance data.
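The sparse-OTU filter is a simple threshold on each OTU's total read count. A minimal Python sketch follows; the helper name is hypothetical, and using the 17,030,870 reads that entered OTU picking as the denominator is an assumption (the text could also refer to the 18,380,379 high-quality V1–V5 reads). Under that assumption, an OTU needs at least 341 reads to be retained.

```python
from typing import Dict


def drop_sparse_otus(otu_totals: Dict[str, int], total_reads: int,
                     min_percent: float = 0.002) -> Dict[str, int]:
    """Keep OTUs holding at least `min_percent` % of all high-quality reads (hypothetical helper)."""
    cutoff = total_reads * min_percent / 100.0
    return {otu: n for otu, n in otu_totals.items() if n >= cutoff}


TOTAL_READS = 17_030_870  # assumed denominator: reads entering OTU picking
print(round(TOTAL_READS * 0.002 / 100.0, 1))  # 340.6
print(drop_sparse_otus({"otu_a": 341, "otu_b": 340}, TOTAL_READS))  # {'otu_a': 341}
```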

Statistical analysis

Alpha diversity metrics (Shannon diversity, Simpson index, and OTU richness) were calculated using the R vegan package v2.5.2 [47]. Because uneven sequencing depth across samples can bias alpha diversity measures such as OTU richness, these metrics were computed on abundance data rarefied to the minimum sequencing depth among the samples (~ 4500 reads/sample). t-tests were used to assess differences in alpha diversity between samples from different geographies or health status. Differences in measured phenotypic traits between subjects of different countries or health status were evaluated using Wilcoxon tests. P values were corrected for multiple testing using the Benjamini-Hochberg (BH) procedure. PCoA plots based on taxonomic profiles (relative OTU abundance) of the microbiome samples were generated using the R phyloseq package v1.22.3 [48], with weighted UniFrac as the distance metric. A similar PCoA plot was generated from the imputed functional profiles of the microbiome samples (KEGG functional modules), with Jensen-Shannon divergence (JSD) as the distance metric. The proportion of variation explained by geography and disease status was tested with permutational multivariate analysis of variance (PERMANOVA), using the adonis2 function of the R vegan package v2.5.2. Dispersion of the country- and disease status-specific clusters was evaluated using the betadisper function of the same package. A negative binomial Wald test implemented in the R package DESeq2 v1.10.1 [49] was used to identify taxonomic groups (at all levels of the taxonomic hierarchy) that were differentially abundant between NG and PD samples (BH-corrected p ≤ 0.05), separately for the Indian and Danish cohorts.
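As an illustration of the alpha diversity step, the Python sketch below rarefies one sample's OTU counts by random subsampling without replacement (the idea behind vegan's `rrarefy()`) and then computes Shannon diversity with the natural logarithm, the Gini-Simpson index (1 − Σp²), and observed OTU richness, which as we understand them correspond to vegan's `diversity(..., index = "shannon")`, `diversity(..., index = "simpson")`, and `specnumber()`. The function names are hypothetical; this is a sketch, not the authors' R code.

```python
import math
import random
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Randomly subsample `depth` reads without replacement from one sample."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the sample's read count")
    out: Dict[str, int] = {}
    for otu in random.Random(seed).sample(pool, depth):
        out[otu] = out.get(otu, 0) + 1
    return out


def alpha_diversity(counts: Dict[str, int]) -> Dict[str, float]:
    """Shannon (natural log), Gini-Simpson, and observed richness for one sample."""
    total = sum(counts.values())
    props = [n / total for n in counts.values() if n > 0]
    return {
        "shannon": -sum(p * math.log(p) for p in props),
        "simpson": 1.0 - sum(p * p for p in props),
        "richness": float(len(props)),
    }


sample = {"otu1": 3000, "otu2": 2000, "otu3": 1500}
rarefied = rarefy(sample, 4500)             # same depth for every sample
print(alpha_diversity(rarefied))
print(alpha_diversity({"a": 50, "b": 50}))  # shannon ~0.693 (ln 2), simpson 0.5, richness 2.0
```

Rarefying every sample to the same depth matters mostly for richness, which grows with the number of reads observed; Shannon and Simpson are dominated by abundant OTUs and are less sensitive to depth.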
PD-specific microbiome abundance signatures were also evaluated after pooling the Indian and Danish cohorts, with the negative binomial Wald test corrected for the anticipated geography-specific cohort effect. A forest-plot-based meta-analysis of the differentially abundant factors identified in the pooled analysis was then performed to place the effect sizes (log2 fold enrichment of mean abundances in PD relative to NG) and effect directions in context for the individual geographies. Additional negative binomial Wald tests (using DESeq2) were performed separately on Indian and Danish subjects to identify discriminating OTUs while correcting for observed covariates of glycemic status that might also influence microbiome structure, viz., waist-to-hip ratio, systolic BP, IL6, TNFα, LBP, and IAP. These tests were also corrected for age and gender, because the age and gender distributions of the normoglycemic and prediabetic groups varied somewhat in the two countries. Spearman correlations between the abundances of microbial OTUs that were differentially abundant between NG and PD subjects and the measured phenotypic traits were calculated separately for the Indian and Danish cohorts. Because HbA1c levels were used to define the NG and PD groups, one might expect the identified correlations to be artifacts of this partitioning. However, HbA1c levels of all subjects taken together followed a normal distribution in both geographies, so partitioning the subjects into NG and PD groups based on clinically prescribed HbA1c thresholds is not expected to confound the computed correlations. Heatmaps depicting the significant correlations were generated using the R “gplots” package v3.0.1.
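For the correlation step, Spearman's ρ is the Pearson correlation of ranks, with tied values given their average rank, and the resulting p values can be adjusted with the Benjamini-Hochberg step-up procedure. The plain-Python sketch below implements both; it is intended to mirror R's `cor(method = "spearman")` and `p.adjust(method = "BH")`, but the function names are hypothetical, the input values are invented for illustration, and computing the raw p values themselves is omitted.

```python
from typing import List, Sequence


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """BH step-up adjustment, intended to match R's p.adjust(method = "BH")."""
    m = len(pvals)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for pos, i in enumerate(sorted(range(m), key=lambda k: pvals[k], reverse=True)):
        rank = m - pos  # rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


otu_abundance = [0, 2, 5, 9, 14]   # hypothetical OTU counts in five subjects
hba1c = [34, 36, 36, 40, 42]       # hypothetical HbA1c, mmol/mol
print(spearman_rho(otu_abundance, hba1c))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approx. [0.02, 0.04, 0.04, 0.02]
```

Rank-based correlation suits this setting because OTU abundances are skewed and zero-inflated, and it only assumes a monotonic, not linear, relationship with the phenotypic trait.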
Random forest (RF) classifiers were constructed to classify PD samples based on gut microbiome composition using the R randomForest package (v4.6-12). Detailed methods are provided in Additional file 1.

Results

Distinct phenotypes and inflammation marker levels in Indian and Danish cohorts

Table 1 (and Additional file 3: Table S2) shows the clinical and biochemical characteristics of the Danish (normoglycemic = 138, prediabetic = 121) and Indian (normoglycemic = 137, prediabetic = 141) individuals participating in the study. Among Danish participants, individuals with prediabetes were significantly older and had significantly higher waist-to-hip ratios and systolic blood pressure than normoglycemic participants (Wilcoxon test, padj < 0.05). While these clinical differences between Danish normoglycemic individuals and individuals with prediabetes are as expected [50, 51], no significant differences in these respects were observed between Indian participants with prediabetes and normoglycemic volunteers. Between countries, Danish subjects with prediabetes were significantly older, taller, and heavier, and had higher systolic blood pressure, than their Indian counterparts. Similar differences in height, weight, and systolic blood pressure were observed between normoglycemic individuals from the two countries. Another intriguing observation concerned the magnitude of the difference in HbA1c levels between prediabetic and normoglycemic individuals in the two countries. Normoglycemic individuals from India had higher overall HbA1c levels (median = 37 mmol/mol) than Danish normoglycemic participants (median = 33 mmol/mol). In effect, the difference in HbA1c levels between normoglycemic and prediabetic individuals appeared to be much larger among Danes than among Indians.

Table 1 Differences in phenotypic traits in Danish and Indian cohorts

Results for a panel of 11 fasting serum inflammatory biomarkers are also presented in Table 1. Among Danes, individuals with prediabetes had significantly higher levels (Wilcoxon test, padj < 0.05) of interleukin 6 (IL6), tumor necrosis factor α (TNFα), lipopolysaccharide-binding protein (LBP), and intestinal alkaline phosphatase (IAP) than normoglycemic individuals. In the Indian cohort, none of the circulating inflammatory markers differed significantly between individuals with prediabetes and those with normoglycemia. Interestingly, irrespective of glycemic status, overall levels of high-sensitivity C-reactive protein (hsCRP), TNFα, and LBP were significantly higher among Indians than among Danes (Additional file 4: Table S3), whereas overall levels of interleukin 13 (IL13) and monocyte chemoattractant protein 1 (MCP1) were higher among Danes. Given the inter-individual variation in the biomarkers, we also reanalyzed the data from normoglycemic and prediabetic subjects after stratifying biomarker levels into tertiles (Additional file 5: Table S4). Among Danes, in the tertile-based analysis (particularly at tertile 2 and/or tertile 3 biomarker levels), most inflammatory markers were significantly higher in individuals with prediabetes than in normoglycemic individuals. The only exceptions were interleukin 10 (IL10) and interleukin 17A (IL17A), which were significantly lower in individuals with prediabetes. A similar analysis in Indians showed significantly higher levels of hsCRP, IL1β, IL13, IL17A, IL6, TNFα, and IAP in individuals with prediabetes than in normoglycemic individuals.

Dominant and core bacterial taxa in Danish and Indian gut microbiota

While the gut microbiota of Danish participants was significantly more diverse (t-test, p < 0.05) than that of Indian volunteers (Additional file 6: Figure S1), alpha diversity did not differ significantly between the normoglycemic and prediabetic groups within either cohort. Firmicutes, followed by Bacteroidetes, were the dominant phyla across all samples in both populations (Additional file 6: Figure S2, Additional file 7: Table S5A). Actinobacteria, Proteobacteria, and Elusimicrobia were present in significantly higher proportions in the Indian cohort (negative binomial Wald test; Benjamini-Hochberg-corrected padj < 0.05), whereas Bacteroidetes, Tenericutes, Verrucomicrobia, and Synergistetes were significantly enriched in the Danish subjects. At the family level (Additional file 6: Figure S3, Additional file 7: Table S5B), Ruminococcaceae, Bacteroidaceae, Rikenellaceae, and Christensenellaceae were among the major families showing more than twofold enrichment in Danes compared with Indians (negative binomial Wald test; Benjamini-Hochberg-corrected padj < 0.05). In contrast, Prevotellaceae, Veillonellaceae, Erysipelotrichaceae, Lactobacillaceae, Coriobacteriaceae, Streptococcaceae, and Atopobiaceae were significantly enriched (over twofold) in the Indian cohort.

A search for core genera (present in at least 80% of the subjects at a minimum abundance of 0.1%) in the gut microbiota of normoglycemic and prediabetic individuals showed Dorea, Agathobacter, Collinsella, Lachnoclostridium, Lachnospira, Blautia, Faecalibacterium, Roseburia, and Subdoligranulum to be ubiquitous in subjects from both ethnic groups (Fig. 1). Although Megasphaera and Lactobacillus were identified as core genera in the gut of Indian subjects, their prevalence was very low in the Danish population. Conversely, Parabacteroides and Alistipes were present in only a small fraction of the Indian samples but were identified as core genera in the Danish population. Strong geography-specific patterns were also identified in the distribution of core OTUs (Additional file 8: Table S6). While 32 OTUs were ubiquitous across samples from both geographies at a normalized abundance > 0.01%, OTUs specific to the Danish (29 OTUs) and Indian (16 OTUs) participants could also be identified. Of the 29 core OTUs specific to the Danish samples, 17 were Firmicutes and 11 belonged to the phylum Bacteroidetes, including 6 from the genus Bacteroides. In contrast, the Indian-specific core OTUs included only 3 Bacteroidetes OTUs, all from the genus Prevotella9, along with 12 Firmicutes OTUs and a single OTU from the genus Senegalimassilia (phylum Actinobacteria).
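The core-genus definition reduces to a prevalence count per taxon. A minimal Python sketch, assuming each sample is given as a mapping from genus to relative abundance on a 0–1 scale; the function name and example values are hypothetical.

```python
from typing import Dict, List


def core_taxa(samples: List[Dict[str, float]], min_abundance: float = 0.001,
              min_prevalence: float = 0.80) -> Dict[str, float]:
    """Return {taxon: ubiquity in %} for taxa meeting the core definition.

    samples: one dict per sample, taxon -> relative abundance on a 0-1 scale.
    A taxon counts as present in a sample if its abundance is >= 0.1%.
    """
    n = len(samples)
    taxa = sorted({t for s in samples for t in s})
    core: Dict[str, float] = {}
    for t in taxa:
        present = sum(1 for s in samples if s.get(t, 0.0) >= min_abundance)
        if present / n >= min_prevalence:
            core[t] = 100.0 * present / n
    return core


group = ([{"Blautia": 0.02, "Alistipes": 0.01}] * 3
         + [{"Blautia": 0.015, "Alistipes": 0.0005}, {"Blautia": 0.0004}])
print(core_taxa(group))  # {'Blautia': 80.0}
```

The ubiquity value returned for each taxon is the same quantity shown in the Fig. 1 heatmap; running the function per group (NG/PD, India/Denmark) gives the group-specific cores.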

Fig. 1

Core genera in different groups. Core genera identified in normoglycemic (NG) and prediabetic (PD) groups of samples from the Indian (IN) and Danish (DK) cohorts. The core is defined as the genera present in at least 80% of the samples in a particular group at a minimum (normalized) abundance of 0.1%. Values in the heatmap represent the ubiquity of a taxon, i.e., the percentage of samples in the respective group in which the taxon is present at a relative abundance of ≥ 0.1%.

Gut microbiome composition in individuals with prediabetes

Principal coordinate analysis (PCoA) based on OTU abundance using weighted UniFrac distances (see Supplementary Methods in Additional file 1) did not reveal any prediabetes-specific patterns when the Danish and Indian samples were combined (Fig. 2a). Instead, the distinct clustering of Indian and Danish samples revealed a strong country-specific effect on the gut microbiota, which was confirmed by a PERMANOVA test (R2 = 11.2%; p = 0.001). A negative binomial Wald test, after correcting for the country-specific cohort effect, identified 160 OTUs that were differentially abundant (padj < 0.05) according to glycemic status (Additional file 9: Table S7). OTUs belonging to Prevotella9, Phascolarctobacterium faecium, Barnesiella intestinihominis, Flavonifractor plautii, Tyzzerella nexilis, Bacteroides nordii, Faecalibacterium, and Agathobacter were among those enriched twofold or more in normoglycemic subjects (Table 2). In addition, three OTUs from the family Ruminococcaceae and one OTU each from the families Muribaculaceae and Christensenellaceae were more than twofold enriched in normoglycemic subjects. In contrast, OTUs enriched twofold or more in subjects with prediabetes included those belonging to Megasphaera, Streptococcus, Prevotella9, Alistipes, Mitsuokella, Escherichia/Shigella, Prevotella2, Vibrio cholerae, Lactobacillus, Alloprevotella, Rhodococcus, and Klebsiella, as well as two more belonging to the family Ruminococcaceae. A meta-analysis of the differentially abundant factors presented in Table 2 is provided in Additional file 6: Figure S4; the forest plot depicts the effect sizes and directions of the factors in the individual geographies, as well as the combined effect size.
For almost all OTUs identified by the negative binomial Wald test on the pooled data (after correcting for the country-specific cohort effect), the direction of the microbial association with dysglycemia was the same in both geographies. However, effect sizes showed geography-specific trends and in many cases did not reach statistical significance in one of the geographies. OTUs with different effect directions included those belonging to Phascolarctobacterium faecium, Tyzzerella_4 nexilis, Escherichia/Shigella, Prevotella2, and Alloprevotella, and one de novo OTU belonging to Lactobacillus. In most of these cases, the effect was significant and strong in one geography, which drove the combined effect in the pooled analysis. Further, for one OTU belonging to Flavonifractor plautii, contrasting effects were observed in the pooled (cohort-effect corrected) analysis and the meta-analysis, which can probably be attributed to differences in fitting its abundance data to negative binomial distributions, once for the pooled data and again for the geography-specific data.
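To make the PERMANOVA result reported above (R2 = 11.2%; p = 0.001) concrete: for a one-factor design, the total sum of squares of a distance matrix is split into within-group and between-group parts, R² is the between-group share, and significance comes from permuting group labels. The plain-Python sketch below follows that construction (Anderson's one-way PERMANOVA); it is illustrative, much slower than vegan's `adonis2`, and uses hypothetical names and data. If `adonis2`'s default of 999 permutations was used, p = 0.001 is the smallest attainable p value.

```python
import random
from typing import List, Sequence, Tuple


def _pseudo_f(dist: List[List[float]], labels: Sequence[str]) -> Tuple[float, float]:
    """Pseudo-F and R^2 for one grouping of a square distance matrix."""
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    groups = sorted(set(labels))
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2
                         for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
    ss_between = ss_total - ss_within
    k = len(groups)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, ss_between / ss_total


def permanova(dist: List[List[float]], labels: Sequence[str],
              permutations: int = 999, seed: int = 1) -> Tuple[float, float, float]:
    """One-factor PERMANOVA: returns (pseudo-F, R^2, permutation p value)."""
    f_obs, r2 = _pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if _pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    # The observed labelling counts as one permutation, so p >= 1 / (permutations + 1).
    return f_obs, r2, (hits + 1) / (permutations + 1)


# Hypothetical 4-sample example: two tight "country" clusters far apart.
d = [[0.0, 1.0, 3.0, 3.0],
     [1.0, 0.0, 3.0, 3.0],
     [3.0, 3.0, 0.0, 1.0],
     [3.0, 3.0, 1.0, 0.0]]
f_stat, r2, p = permanova(d, ["DK", "DK", "IN", "IN"])
print(round(f_stat, 3), round(r2, 3))  # 17.0 0.895
```

With hundreds of samples this quadratic-per-permutation loop is slow in pure Python; the sketch is meant only to show what R² and the permutation p value measure.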

Fig. 2

Taxonomic and functional diversity of microbiomes. PCoA plots based on (a) OTU abundance using weighted UniFrac distances and (b) KEGG functional modules in the microbiome samples (as inferred with PICRUSt) using JSD distances. The microbiome samples are plotted along the first two principal coordinates.

Table 2 Differentially abundant OTUs between NG and PD subjects from both Danish and Indian cohorts

When the Indian and Danish cohorts were considered separately, additional OTUs discriminating between the normoglycemic and prediabetic groups could be identified (Tables 3, 4, Additional file 10: Table S8, Additional file 11: Table S9). In Indian subjects, 89 OTUs were differentially abundant (padj < 0.05) in either the normoglycemic or the prediabetic group (Additional file 10: Table S8A); in the Danish cohort, 56 OTUs were differentially abundant (padj < 0.05) in either group (Additional file 11: Table S9A). Normoglycemic subjects from India were characterized by an overabundance (twofold or more) of multiple OTUs belonging to the Prevotella9 group (which includes Prevotella copri), along with a few OTUs from the family Ruminococcaceae, including the short-chain fatty acid (SCFA)-producing Faecalibacterium [52] (Table 3). Danish normoglycemic subjects also showed enriched abundance (twofold or more) of OTUs from the family Ruminococcaceae, along with a few OTUs from the genera Phascolarctobacterium and Oscillibacter and three OTUs belonging to Prevotella9 (Table 4). Indian participants with prediabetes, on the other hand, were enriched in OTUs belonging to the genera Lactobacillus, Megasphaera, Subdoligranulum, Escherichia/Shigella, Dialister, Vibrio, Streptococcus, Achromobacter, and Blautia; overall, an enrichment of Firmicutes OTUs was apparent in Indian prediabetics. In Danish subjects with prediabetes, multiple OTUs belonging to the genus Bacteroides and the family Lachnospiraceae were enriched. Interestingly, two OTUs belonging to the Prevotella2 group (which includes Prevotella stercorea) were several-fold enriched in Danish subjects with prediabetes. However, most of the prominent cohort-specific associations with glycemic status listed in Tables 3 and 4 did not show a similar significant trend in the other cohort.
In some cases, certain discriminating OTUs (e.g., those belonging to Prevotella9 specific to the Indian cohort) were absent in the other cohort.

Table 3 Differentially abundant OTUs between NG and PD subjects (Indian cohort)
Table 4 Differentially abundant OTUs between NG and PD subjects (Danish cohort)

These observations present an overall view of microbial associations that may be related directly to glycemic status, to associated comorbidities, or to other intrinsic or extrinsic host factors relevant to the studied cohorts. For the Indian subjects, none of the measured physical or biochemical parameters (other than glucose levels or HbA1c) or inflammation markers reported in Table 1 differed significantly between the normoglycemic and prediabetic groups. The Danish subjects, in contrast, differed in multiple parameters, including waist-to-hip ratio, systolic BP, and IL6, TNFα, LBP, and IAP levels, as well as in the age and gender distributions of the normoglycemic and prediabetic volunteers who could be recruited for the study. Negative binomial Wald tests were therefore repeated on the data from the Indian and Danish cohorts while correcting for these covariates (see Additional file 1). Interestingly, after correcting for covariates, 129 differentially abundant OTUs (padj < 0.05) were associated with either the normoglycemic or the prediabetic group in the Indian cohort (Additional file 10: Table S8B). As expected for the Indian cohort, most of the OTUs (64 of 89) found to be differentially abundant between the normoglycemic and prediabetic groups before covariate correction remained significant (padj < 0.05) discriminating factors. Of the 25 OTUs with twofold or greater differential abundance (depicted in Table 3), 19 remained significantly discriminating after correcting for the covariates. In the Danish cohort, however, the number of differentially abundant OTUs (padj < 0.05) in either the normoglycemic or the prediabetic group decreased to 39 after covariate correction (Additional file 11: Table S9B).
Out of these, only 11 OTUs were in common with the earlier obtained list (Additional file 11: Table S9A) of differentially abundant OTUs between Danish normoglycemic and prediabetic groups. Out of 26 OTUs from the Danish cohort exhibiting differential abundance of two folds or more (depicted in Table 4), only 5 OTUs were identified to be significantly discriminating after correcting for the covariates. This set included one OTU belonging to the order Mollicutes and another belonging to the family Ruminococcaceae, which were enriched in the Danish normoglycemic subjects, as well as two OTUs belonging to the genus Lachnoclostridium and one OTU belonging to the genus Bacteroides, which were enriched in the Danish prediabetic subjects.

Differences in gut microbiomes pertaining to normoglycemic and prediabetic individuals were also apparent at higher taxonomic ranks (Additional file 12: Table S10). A negative binomial Wald test, after correcting for country-specific cohort effect, indicated that the families Enterobacteriaceae, Enterococcaceae, Vibrionaceae, and Burkholderiaceae, all from the phylum Proteobacteria; Streptococcaceae from phylum Firmicutes; and Nocardiaceae from phylum Actinobacteria had relatively higher abundances (Benjamini-Hochberg-corrected padj < 0.05) in the PD samples. However, at the phylum level, no significant variations could be observed.

Inferred functional profiles of gut microbiome

Principal coordinate analysis (PCoA) of predicted functional profiles (KEGG functional modules) based on Jensen-Shannon distances (see Supplementary Methods in Additional file 1) did not reveal any prediabetes-specific signatures (Fig. 2b), in line with the results obtained using taxonomic profiles (Fig. 2a). Intriguingly, and in contrast with the taxonomy-based PCoA, no country-specific separation was apparent. However, the dispersion of predicted functional profiles of Indian gut microbiomes was significantly higher than that of Danish functional profiles (Additional file 6: Figure S5). No such difference in dispersion was observed when the taxonomic compositions of Danish and Indian gut microbiota were tested.
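The Jensen-Shannon divergence used for the functional PCoA compares each of two profiles with their average. A minimal Python sketch, assuming profiles are renormalized to sum to 1 and base-2 logarithms are used (so values lie between 0 and 1); implementations differ in log base and in whether the square root of the divergence is used as the distance, and the text does not state which variant was applied. The function name and profiles are hypothetical.

```python
import math
from typing import Sequence


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD (base-2 logs) between two non-negative profiles, renormalized to sum to 1."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i = 0 contribute nothing; b_i > 0 whenever a_i > 0 because b = m.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical KEGG-module abundance profiles for three samples.
s1 = [120, 80, 0, 40]
s2 = [110, 90, 5, 35]
s3 = [0, 10, 200, 5]
print(jensen_shannon_divergence(s1, s2), jensen_shannon_divergence(s1, s3))
```

Unlike weighted UniFrac, JSD ignores any relationship between features, which is appropriate for functional modules that have no phylogeny.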

Negative binomial Wald tests identified certain predicted functional pathways and modules that discriminated between the normoglycemic and prediabetic subjects (Additional file 13: Table S11, Additional file 14: Table S12, Additional file 15: Table S13). However, the fold enrichments of these predicted pathways and modules, in either the normoglycemic or the prediabetic group, were small in most cases (average log2 fold change = 0.06). The predicted pathways enriched in subjects with prediabetes included tyrosine metabolism, ascorbate and aldarate metabolism, and multiple xenobiotic degradation pathways (Additional file 13: Table S11). In contrast, D-glutamine and D-glutamate metabolism showed the opposite trend and was depleted in prediabetic subjects.
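To put the average log2 fold change of 0.06 in perspective, it can be converted back to a ratio of mean abundances. The result corresponds to a difference of roughly 4% between groups, compared with a doubling for log2 FC = 1:

```latex
\log_2 \mathrm{FC} = 0.06
\quad\Longrightarrow\quad
\mathrm{FC} = 2^{0.06} = e^{0.06 \ln 2} \approx e^{0.0416} \approx 1.042
```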

Investigating the predicted functional profile at the module level provided further insights (Additional file 14: Table S12, Additional file 15: Table S13). Multiple modules related to sugar transport and the phosphotransferase system (PTS) were enriched in the gut microbiome of individuals with prediabetes, in line with previous observations [6, 53]. In addition, several predicted functional modules related to drug resistance and efflux pumps were enriched in the microbiome of prediabetic subjects, possibly reflecting increased exposure to antibiotics or other xenobiotics. Notably, modules for the metabolism of the neurotransmitter gamma-aminobutyric acid (GABA shunt and GABA biosynthesis) were predicted to be enriched in prediabetic subjects after correcting for the country-specific cohort effect. This enrichment was more prominent in the Indian cohort and is of interest in light of previous studies indicating effects of GABA on islet beta cells [54].

Association between gut microbiota and clinical biomarkers

In both the Indian and Danish cohorts, a relatively small proportion of the OTUs enriched in normoglycemic subjects correlated with clinical variables and inflammatory biomarkers (Additional file 6: Figure S6, Additional file 16: Table S14). In the Indian subjects, these OTUs were predominantly from the genus Prevotella9 (4 OTUs), along with one OTU each from the genera Faecalibacterium, Agathobacter, and Alloprevotella and one OTU from the family Muribaculaceae. All other OTUs showing significant correlations with one or more phenotypic variables were enriched in prediabetic samples. Notably, 7 of these 10 de novo OTUs belonged to the genus Megasphaera. Most of them showed significant positive correlations with fasting plasma glucose and HbA1c levels, and weak negative correlations with HDL cholesterol and with inflammation markers such as TNFα and LBP. Another intriguing observation concerns two OTUs from the family Burkholderiaceae, one from the lymphoid tissue-resident commensal (LRC) genus Achromobacter and another from the GKS98 freshwater group. Both showed significant positive correlations with the inflammatory biomarkers IL10 and IL17A. An OTU belonging to Faecalibacterium (OTU 319275) was also positively correlated with IL10 and IL6 levels in the Indian cohort, in line with previous reports suggesting anti-inflammatory and IL10-inducing roles of some Faecalibacterium strains [55, 56]. The heatmap for the Danish subjects showed a small but coherent group of OTUs enriched in normoglycemic participants. This group comprised four OTUs from the family Ruminococcaceae (including one each from the genera Ruminococcus1 and Oscillibacter), an OTU from the species Phascolarctobacterium faecium, and two further OTUs, one from the order Mollicutes RF39 and one from the Family XIII AD3011 group of the order Clostridiales.
Almost all OTUs in this heatmap that were associated with Danish prediabetic subjects belonged to the order Clostridiales. Many of these were from the family Lachnospiraceae, followed by the family Ruminococcaceae, both of which are ubiquitous in the Danish gut microbiota. This prediabetes-associated group of OTUs in the Danish population also included a couple of Prevotella2 OTUs showing modest negative correlations with HDL cholesterol.
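The text does not name the correlation method used for the heatmap. Spearman rank correlation is a common choice for skewed OTU abundances, so the sketch below assumes it. It shows how an OTU-by-clinical-variable correlation table can be built with SciPy's `spearmanr`. The OTU names, variable names, and values are hypothetical, and multiple-testing correction is left out.

```python
import numpy as np
from scipy.stats import spearmanr

def otu_clinical_correlations(otu_table, clinical):
    """Spearman rho and p-value for every OTU x clinical-variable pair.

    otu_table: dict of OTU id -> abundances across samples
    clinical:  dict of variable name -> values for the same samples, same order
    """
    rho, pval = {}, {}
    for otu, abundances in otu_table.items():
        for var, values in clinical.items():
            r, p = spearmanr(abundances, values)
            rho[(otu, var)] = float(r)
            pval[(otu, var)] = float(p)
    return rho, pval

# Hypothetical data for six samples (illustrative values only)
otus = {
    "Megasphaera_OTU": [0, 3, 5, 9, 12, 20],
    "Prevotella9_OTU": [30, 25, 22, 10, 8, 2],
}
clinical = {
    "FPG_mmol_L": [4.9, 5.2, 5.6, 5.9, 6.1, 6.4],
    "HDL_mmol_L": [1.6, 1.5, 1.1, 1.3, 1.0, 0.9],
}
rho, pval = otu_clinical_correlations(otus, clinical)
```

Arranging `rho` as an OTU-by-variable matrix, with rows clustered, gives the kind of heatmap described above. The hypothetical Megasphaera OTU here rises with fasting glucose (rho = 1) and falls with HDL (rho ≈ -0.94).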

Microbiome signature-based classifiers for indicating predisposition to dysglycemia

Random forest (RF) classifiers were constructed to assess how well the abovementioned microbiome signatures distinguish normoglycemic from prediabetic subjects (see Supplementary Methods in Additional file 1). An RF model trained on the taxonomic data (1897 OTUs as features) of all Indian and Danish gut microbiota samples achieved an area under the receiver operating characteristic curve (AUC) of 62.7% and an out-of-bag (OOB) error rate of 40.04%. However, given the expected effect of extraneous predictors on predictive models [57] and earlier reports of RF classifiers built on microbiome data [58], an additional feature selection step was adopted. For feature selection, the whole dataset was randomly split (stratified by the proportions of NG and PD samples) into a training set and an independent test set in the ratio 66:34. After the feature selection step described in Additional file 1, a bagged RF model with 76 selected features (Additional file 17: Table S15) achieved an improved training AUC of 77.54% and a test AUC of 66.86% (Fig. 3). Although the clinical relevance of these RF models may be limited, the results support the existence of distinct gut microbiome signatures in prediabetic subjects from India and Denmark. Furthermore, the set of OTUs chosen in the feature selection step may be useful for future studies in this direction.
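The first, unfiltered model reports an OOB error and an AUC. The sketch below uses scikit-learn and shows how both can be obtained from one forest trained on all samples, with every OTU used as a feature. It assumes a simulated count table with a weak injected signal in place of the study's data. That the reported AUC was computed from OOB predictions is also an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical OTU count table: 200 samples x 50 OTUs; labels 0 = NG, 1 = PD
rng = np.random.default_rng(42)
X = rng.poisson(5, size=(200, 50)).astype(float)
y = rng.integers(0, 2, size=200)
X[y == 1, :3] += 4  # inject a weak signal into three OTUs

# All samples and all OTUs as features; performance estimated out-of-bag
model = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
model.fit(X, y)
oob_error = 1.0 - model.oob_score_  # fraction of OOB predictions that are wrong
oob_auc = roc_auc_score(y, model.oob_decision_function_[:, 1])  # AUC from OOB class probabilities
```

Each tree is evaluated only on the samples left out of its bootstrap draw. As a result, the OOB error behaves like a built-in validation estimate without a separate test set.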

Fig. 3

Microbiome-based random forest classifier for prediabetes. Performance of the RF classifier (trained on OTU abundances) in distinguishing between NG and PD samples from India and Denmark. [Note: The data were split into training and independent test sets (in the ratio 66:34), and a feature selection step was adopted while training the model with 10-fold cross-validation (× 10 replicates). The top 10 features were selected from each cross-validation fold and ranked by their cumulative importance (Gini score). The final “bagged” RF model was built using the set of features providing the best training AUC, selected by progressively adding the ranked features to the model (up to a maximum of 100) while evaluating the training AUC.]
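The procedure in the Fig. 3 note can be sketched with scikit-learn using simulated data. The sketch uses a stratified 66:34 split and repeated 10-fold cross-validation. In each fold it keeps the top 10 features by Gini importance and accumulates their importances for ranking. It then adds ranked features one at a time (up to 100) and keeps the set with the best training AUC. Several details are assumptions. Only 2 repeats are used instead of 10, to keep the example fast. "Training AUC" is taken here to mean a 5-fold cross-validated AUC on the training set. The final model is a plain random forest rather than the study's "bagged" RF.

```python
import numpy as np
from collections import defaultdict
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score, train_test_split

rng = np.random.default_rng(1)
X = rng.poisson(5, size=(150, 30)).astype(float)  # hypothetical OTU counts
y = rng.integers(0, 2, size=150)                  # 0 = NG, 1 = PD (hypothetical)
X[y == 1, :4] += 3                                # weak signal in the first four OTUs

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.34, stratify=y, random_state=0)

# 1) Rank features by the cumulative Gini importance of each fold's top 10
cumulative = defaultdict(float)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=2, random_state=0)  # study: 10 replicates
for fold_train, _ in cv.split(X_train, y_train):
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train[fold_train], y_train[fold_train])
    top10 = np.argsort(rf.feature_importances_)[::-1][:10]
    for j in top10:
        cumulative[j] += rf.feature_importances_[j]
ranked = sorted(cumulative, key=cumulative.get, reverse=True)

# 2) Progressively add ranked features, keeping the set with the best training AUC
MAX_FEATURES = 100
best_auc, best_k = -1.0, 0
for k in range(1, min(len(ranked), MAX_FEATURES) + 1):
    auc = cross_val_score(RandomForestClassifier(n_estimators=50, random_state=0),
                          X_train[:, ranked[:k]], y_train, cv=5, scoring="roc_auc").mean()
    if auc > best_auc:
        best_auc, best_k = auc, k
selected = ranked[:best_k]

# 3) Final model on the selected features, evaluated once on the held-out test set
final = RandomForestClassifier(n_estimators=500, random_state=0)
final.fit(X_train[:, selected], y_train)
test_auc = roc_auc_score(y_test, final.predict_proba(X_test[:, selected])[:, 1])
```

Feature ranking and selection use only the training split, so `test_auc` remains an independent estimate. This is also why the test AUC is typically lower than the best training AUC, as seen in Fig. 3.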

Discussion

Recent evidence of causal or consequential relationships between the gut microbiota and metabolic phenotypes suggests the need to study the two in each other’s context. The Danish and Indian cohorts differed significantly in multiple phenotypic aspects. Intriguingly, signs of metabolic syndrome such as higher waist-to-hip ratios and systolic blood pressure were more apparent in the Danish prediabetic subjects. Similar patterns were observed in the levels of inflammatory biomarkers. The Danish prediabetic subjects had higher levels of several inflammatory biomarkers, such as IL6, TNFα, LBP, and IAP, than normoglycemic individuals, whereas there were no such differences between the Indian prediabetic and normoglycemic subjects. Notably, fasting serum levels of most inflammatory markers were higher in Indian participants than in Danish participants. The only inflammatory markers with higher levels in the Danish participants were IL13 and MCP1, which have roles in allergic inflammation [59,60,61]. Several inflammatory markers have known associations with T2D and the metabolic syndrome [62,63,64,65,66]. An earlier study by Cappuccio and Miller [67] also indicated ethnic differences in the levels of circulating inflammatory markers, which may be partly related to demographic, lifestyle, genetic, or gut microbiome factors. On the one hand, our observations suggest a proinflammatory state as early as prediabetes. On the other hand, the characteristic pattern of inflammatory markers in the Indian cohort probably indicates a population-wide prevalence of systemic and chronic intestinal inflammation. Higher levels of IL23, TNFα, and LBP have been reported to be associated with intestinal inflammation as well as with systemic inflammation triggered by LPS and other bacterial products [68,69,70].
Recent studies also imply a role of IL-23/IL-17 pathway alterations in several disease states, including T2D [71, 72], and our study supports the presence of these alterations as early as prediabetes. In this context, the higher IAP levels in Danish prediabetic subjects contrast subtly with earlier reports on the role of IAP deficiency in metabolic syndrome [73]. However, this could reflect a mounting adaptive response to inflammation.

Alpha diversity measures indicated higher gut microbial diversity in the Danish cohort than in the Indian cohort. This observation is intriguing in light of earlier studies reporting higher alpha diversity of the gut microbiota in many non-western populations [74,75,76]. However, reduced gut microbial diversity is known to be associated with systemic inflammation [77, 78]. The relatively lower gut microbial diversity in the Indian subjects may therefore be related to their observed levels of inflammatory biomarkers. Beta diversity analyses partly echoed earlier findings on the gut microbiota of Indian subjects, in which the phylum Actinobacteria and the families Prevotellaceae, Veillonellaceae, and Streptococcaceae were enriched compared with Americans [18]. In contrast, the Danish gut microbiota profiled in the current study was quite similar to that of Americans and harbored relatively larger proportions of families such as Ruminococcaceae, Bacteroidaceae, and Rikenellaceae. The observed distribution of core OTUs is also in line with our expectations regarding the characteristic features of Indian and Danish microbiota. Examples include the higher number of Bacteroides OTUs in the Danish samples and the ubiquitous presence of Prevotella OTUs in the Indian samples [79,80,81]. The presence of a Megasphaera OTU in the Indian core set also concurs with observations made in recent Indian studies [18, 82].
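The section does not say which alpha diversity indices were compared. Assuming two common ones, the Shannon index and observed OTU richness, the sketch below shows why a community dominated by a few taxa scores lower even when the number of OTUs is the same. The count vectors are hypothetical.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def observed_otus(counts):
    """Richness: number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)

# Hypothetical samples with the same richness but different evenness
even = [25, 25, 25, 25]
skewed = [85, 5, 5, 5]

h_even, h_skewed = shannon(even), shannon(skewed)  # ln(4) ≈ 1.386 vs ≈ 0.588
```

Both samples contain four OTUs. The Shannon index also rewards evenness, so the skewed sample is less diverse by this measure.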

Despite the strong country effect on the gut microbiota, certain taxonomic groups associated with prediabetes could be identified when the microbiome data from India and Denmark were pooled. Additional taxonomic groups were identified when the data from the two countries were analyzed separately. Both the Danish and Indian normoglycemic subjects were enriched in multiple OTUs from the Prevotella9 group and from the family Ruminococcaceae. Depletion of the butyrate-producing family Ruminococcaceae has been reported earlier in Indian T2D subjects [22] and in Finnish prediabetic subjects [83]. On the other hand, the enrichment of OTUs from genera with pathogenic members, such as Vibrio and Streptococcus, in subjects with prediabetes is of interest given the role of inflammation in diabetes. A recent study on Danish individuals with prediabetes indicated significant enrichment of the genus Streptococcus. That study suggested that the associated gut microbial alterations may be a signature of low-grade inflammation [30]. The enrichment of certain Blautia OTUs and the depletion of bacteria from the Clostridiales vadinBB60 group in prediabetic subjects in the current study also appear coherent with earlier observations on the gut microbiota of Danish prediabetic subjects. However, the depletion of Akkermansia muciniphila reported in that earlier study for Danish prediabetic individuals was not apparent in the current study population. An increased abundance of the genus Lactobacillus in prediabetic subjects, which was more prominent in the Indian population, is consistent with earlier reports of the genus’s association with T2D [22]. In contrast, the significant enrichment of Megasphaera OTUs in Indian prediabetic subjects is a novel and particularly intriguing observation.
Although Megasphaera has been reported to be a core gut microbe in the Indian population [18], its association with impaired glucose tolerance has not been reported previously in any country or ethnicity. Multiple Megasphaera OTUs identified in samples from Indian prediabetic subjects also showed significant positive correlations with fasting plasma glucose and HbA1c levels, and weak negative correlations with HDL cholesterol and with inflammation markers such as TNFα and LBP. Apart from a couple of recent studies on the Indian gut microbiome [18, 22], Megasphaera has not been reported as a prevalent gut microbial taxon, particularly in Caucasians. However, its role in lactate fermentation has been reported [84]. So has its positive association with Lactobacillus ruminis, particularly in cases of intestinal malabsorption or increased availability of dietary sugars in the large intestine [84]. The association of Megasphaera with Indian prediabetic subjects is noteworthy in this context. Two other OTUs associated with Indian prediabetic subjects, from the genus Achromobacter and the GKS98 freshwater group (both in the family Burkholderiaceae), showed significant positive correlations with the inflammation biomarkers IL10 and IL17A. Interestingly, many members of the family Burkholderiaceae, e.g., the genus Achromobacter, are known to constitute the group of lymphoid tissue-resident commensal (LRC) bacteria, which colonize the intestinal lymphoid tissue of healthy mammals [85]. LRCs play a major role in intestinal immunity and are known to induce interleukins such as the anti-inflammatory IL10 as well as IL6, IL1β, and IL17A. Another interesting observation was the enrichment of OTUs from Prevotella9 (which includes Prevotella copri) in both Indian and Danish normoglycemic subjects. OTUs from Prevotella2 (which includes Prevotella stercorea), by contrast, were enriched in Danish prediabetic subjects.
A couple of Prevotella2 OTUs identified in the Danish population also showed modest negative correlations with HDL cholesterol. These observations probably reflect distinct roles of different Prevotella species in the gut and are in line with earlier findings indicating both beneficial and pathogenic effects of members of the genus Prevotella [86,87,88]. Microbiome composition is influenced by a multitude of factors. Although the current study set out to find associations between microbial taxa and glycemic status, several measured covariates could have influenced the observed microbiome state, including the subjects’ physical and biochemical parameters and the inflammation markers. As discussed earlier, the phenotypic traits of normoglycemic and prediabetic subjects contrasted more strongly in Danes than in Indians. Consequently, correcting for the measured covariates using linear modeling yielded fewer taxa discriminating between the normoglycemic and prediabetic gut microbiomes in the Danish cohort. In sharp contrast, correcting for covariates in the Indian cohort yielded a higher number of discriminating taxonomic groups between the normoglycemic and prediabetic subjects. The literature suggests that microbiome signatures of different diseases and physiological conditions often overlap and can reflect mixed effects of host-extrinsic and host-intrinsic factors [89]. The resulting microbiome shifts are also seldom unidirectional: the microbiome sends feedback to the host and, in certain instances, modulates host factors. Not all underlying medical conditions can be identified, nor all potential confounders measured. Confident assertions about the disease-microbiome association (in this case with dysglycemia) therefore remain difficult.
For this reason, the lists of microbial taxa associated with the studied prediabetic and normoglycemic cohorts, both before and after correcting for the measured covariates, are presented in this report. Some of these associations, despite being statistically significant, are likely not a direct outcome of glycemic status and may instead be related to associated comorbidities.
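The reasoning above, that a stronger phenotypic contrast between groups leaves fewer taxa discriminating after covariate adjustment, can be illustrated with a toy ordinary least squares model. The data are entirely hypothetical, with BMI as the measured covariate. The study's actual linear models are not specified here. By construction, the simulated log abundance depends only on BMI, and BMI differs between the groups. The unadjusted model therefore attributes a spurious effect to glycemic status, and the adjusted model removes it.

```python
import numpy as np

# Hypothetical data: 8 subjects, group 0 = NG, 1 = PD, with BMI as a measured covariate
group = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
bmi = np.array([22, 23, 24, 25, 26, 27, 28, 29], dtype=float)
log_abundance = 0.5 * bmi  # driven by BMI only, not by glycemic status

def group_coefficient(outcome, *covariates):
    """OLS coefficient of the group indicator, optionally adjusted for covariates."""
    design = np.column_stack([np.ones_like(group), group, *covariates])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]

unadjusted_effect = group_coefficient(log_abundance)     # 2.0: absorbs the BMI difference
adjusted_effect = group_coefficient(log_abundance, bmi)  # about 0 once BMI is modeled
```

When a covariate differs sharply between groups, as the phenotypes did between Danish normoglycemic and prediabetic subjects, adjusting for it can remove apparent group differences in taxon abundance.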

The functional potential of gut microbiomes inferred from 16S taxonomic profiles may be less accurate than that obtained with shotgun metagenomics or metatranscriptomics data. However, some of our observations can be placed in the context of earlier findings related to dysglycemia. These include the predicted enrichment of tyrosine metabolism, xenobiotic degradation, and ascorbate and aldarate metabolism in prediabetes-associated gut microbiota. Higher tyrosine levels have been associated with the risk of T2D [90]. A previous study highlighted a higher proportion of bacterial genes related to xenobiotic degradation pathways in the gut microbiome of Chinese subjects with T2D [6]. Gut bacteria of leptin-deficient transgenic mice with metabolic syndrome have been reported to show enrichment of ascorbate and aldarate metabolism [91]. Another interesting insight was the inferred depletion of D-glutamine and D-glutamate metabolism and enrichment of GABA metabolism functional modules in prediabetic gut microbiota. Previous studies in mice have indicated potential protective and regenerative effects of GABA on islet beta cells [54], as well as a role of the microbiota in modulating GABA and glutamate circuits [92]. Another study reported relatively higher GABA levels in subjects with T2D and their possible impact on cognitive abilities [93]. Our observations hint at a probable association between the gut microbiota and GABA level modulation in early prediabetic stages. However, understanding the effects of this modulation on insulin production or on progression to diabetic neuropathy requires further research.

The above observations, together with the phenotypic data and the levels of inflammatory biomarkers, indicate that the role of the gut microbiome in the pathophysiology of prediabetes in Indian subjects differs from that in Europeans. Chronic systemic inflammation appears to be characteristic of the Indian population in general. Nevertheless, the anti-inflammatory and protective effects apparently induced by various factors in the Indian gut microbiome may play key roles in defining gut health status and in modulating the onset and progression of diabetes.

Conclusions

In complex metabolic disorders, identifying biological signatures at the onset of disease is crucial to slow or prevent its rapid progression. The alterations in gut microbiota composition and functional potential, together with the proinflammatory state, observed in prediabetic subjects in the present study represent an important advance. Indeed, the importance of sub-clinical detection of gut microbial biomarkers of obesity and T2D has recently been emphasized by others [94]. The microbial abundance patterns and distinct levels of inflammatory markers identified in this study appear to be robust sub-clinical signatures of prediabetes. They may be explored further as potential early indicators for individuals at risk of dysglycemia.