Introduction

According to an analysis of global data through years 1990–2018, diabetes was prevalent in almost half a billion people, a number expected to rise by 25% and 51%, respectively, by 2030 and 2045 (Saeedi et al. 2019). Type 2 diabetes (T2D) accounts for over 90% of all diabetes cases (Zheng et al. 2018), affecting nearly 6.8% (537 million) of the world population in year 2021 (Sun et al. 2022). Current evidence suggests a complex and multifactorial etiology of T2D characterized by genetic and environmental interactions (Arroyo et al. 2021), although T2D pathogenesis is not yet fully elucidated. There are likely varying degrees of shared genetic origins in the pathogenesis of T2D and other diabetes phenotypes such as type 1 diabetes (T1D) (Aylward et al. 2018), latent autoimmune diabetes in adults (Basile et al. 2014), and maturity-onset diabetes of the young (MODY) (Bonnefond et al. 2020). Recent studies also support the deconstruction of T2D heterogeneity to define T2D sub-types (Udler et al. 2018) and the delineation of a continuum of diabetes sub-types (Flannick et al. 2016) instead of the status quo characterized by a few distinct diabetes phenotypes. Understanding the genetic basis of T2D is fundamental to precision medicine approaches striving for impeccable matching and an individualized level of T2D care (Prasad and Groop 2019).

Cardinal tissues of the body impacted by heightened insulin resistance and diminished insulin secretion in T2D include the pancreas, liver, skeletal muscle, and adipose tissue (Batista et al. 2021). Exploration of gene perturbations in these tissues can deepen our understanding of the molecular etio-pathology of T2D. Highly up- and downregulated genes expressed consistently across different tissue types may uncover potential genome-wide biomarkers or “gene signatures,” which are integral to achieving precision diagnostic, prognostic, monitoring, and treatment approaches. Topologically, hub genes are defined as highly and tightly connected nodes in typically scale-free gene regulatory networks (GRN) (Buchberger et al. 2021). As such, criteria such as high correlation in candidate modules (Liu et al. 2019a) and above-average betweenness, closeness, and degree (Liu et al. 2019b) in GRN have been used to demarcate hub genes in previous studies. Functionally, they perform critical regulatory roles in biological processes interacting with many other genes in associated pathways. Given their crucial structural and functional characteristics, hub genes are highly sought-after in precision medicine approaches as plausible niches for developing drug and treatment targets (Liu et al. 2021). In this context, the importance of identifying highly perturbed genes and hub genes associated with T2D for the purpose of individualizing T2D care is unequivocal.

Downstream analyses of gene sets provide invaluable insights into associated core biological functions, pathways, diseases, drugs, and many other aspects. Frequently used gene ontology (GO) analysis provides evidence as a snapshot of contemporary biological knowledge related to a given gene including its function at the molecular level, the cellular location(s) it functions at, and the biological processes reliant on it (Hill et al. 2008). Pathway analyses such as Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto 2000) entail derivation of coherent and meaningful biological phenomena attributable to input genes (Nguyen et al. 2019). Additional downstream analyses of diseases, drugs, metabolomes, and tissue enrichment are also available and can provide valuable insights into associated genes. Taken together, downstream analysis of highly perturbed and hub genes associated with T2D may render valuable information on aspects such as affected biological processes, dysregulated pathways, related diseases, and metabolomic biomarkers.

Advances in high-throughput technologies have generated a wealth of gene expression data, while the availability of open-source platforms such as the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) (Edgar et al. 2002; Barrett et al. 2013) and simultaneous advent in big data and bioinformatics analytic tools such as microarray and RNA-seq meta-analysis and gene–gene interaction network analysis strategies have offered unprecedented opportunities for high-level evidence synthesis from a multitude of gene expression datasets. Such approaches are likely to render new knowledge on complex diseases like T2D and acquire adequate statistical power to identify genes associated with a disease that may not have been evoked via prior analysis of a single or a few datasets.

Yet, to date, no comprehensive evidence synthesis study has been performed to identify highly perturbed genes and hub genes associated with T2D in human adults using an extensive bioinformatics analytic pipeline. Prior studies were limited to identifying hub genes in a few (n = 3) microarrays from a single tissue type (pancreatic islets) (Lin et al. 2020) or performing an ad hoc gene expression meta-analysis of all diabetes phenotypes (n = 13) (Mei et al. 2017). In this study, we aimed to identify highly perturbed genes and hub genes associated with T2D in different tissues of adult humans via a pre-defined and extensive bioinformatics analytic workflow consisting of systematic review, meta-analysis, identification and classification of differentially expressed genes (DEGs), network analysis, and downstream analysis.

Methods

The methodological approach consisted of five sequential steps: (1) systematic review of NCBI GEO expression data and related publications, (2) analysis of microarrays to identify DEGs, (3) meta-analysis of DEGs to identify highly perturbed genes in T2D, (4) network analysis of gene–gene/protein–protein interactions to identify hub genes in T2D, and (5) downstream analysis of highly perturbed genes and hub genes associated with T2D.

Systematic review

A preliminary search on the NCBI GEO database was first run on 1st February 2021 using a pre-defined search string: “(“diabetes mellitus, type 2” [MeSH Terms] OR type 2 diabetes[All Fields]) AND “Homo sapiens” [porgn] AND “Expression profiling by array” [Filter].” The resulting microarrays and related publications were further screened against pre-defined eligibility criteria. Microarrays of conditions other than T2D, all diabetes phenotypes other than T2D, and early dysglycemic conditions such as impaired glucose tolerance, insulin resistance, and prediabetes were excluded. Studies with non-human specimens, children, without healthy controls, or with notable comorbidity in control samples were also excluded. We also omitted studies involving long non-coding RNA (lncRNA), micro RNA (miRNA), samples subject to drug treatments and other interventions, pluripotent stem cells, xenografts, transfected or transgenic tissues, undifferentiated tissues, and sub-samples in super-series. Microarrays passing these eligibility criteria were selected for manual curation which were then further screened along with the full texts of related publications (where available) for the presence of adequate information such as clinical diagnosis (healthy vs T2D) and gene symbol/Entrez ID. Following this, all microarrays with sufficient information were selected for subsequent analyses.

Identification of DEGs

All microarrays selected from the systematic review were imported to R using getGEO function of GEOquery package (Davis and Meltzer 2007). In each expression set, phenoData component was examined to determine the number of eligible samples and confirm the presence of outcome (T2D vs controls) variable, while featureData component was explored to verify the presence of gene annotation information. Where required, non-normalized gene expression matrices were log2 transformed in order to alleviate skewness and create symmetric distributions (Le et al. 2020). Samples were assessed for the presence of any batch effects between the two groups by running principal components analysis on transposed expression matrices and were rectified using removeBatchEffect function in limma package (Ritchie et al. 2015). Relevant features were annotated with expression matrices to generate curated data for running differential gene expression analysis. Samples were ascribed to the relevant group (T2D/control) using model.matrix function of limma package (Ritchie et al. 2015) producing a binary design matrix. As the detection of DEGs can be enhanced by filtering genes with a low expression level, we assumed a median (50%) cut-off for the gene expression level. A uniform analytic pipeline consisting of the following sequential steps was applied to each microarray: (1) median expression levels were calculated, and those above the median were retained. (2) From the resulting genes, those expressed in more than two samples were retained, while the others were removed. (3) Model fitting was performed using lmFit function of limma (Ritchie et al. 2015) to enumerate expression levels of T2D and control groups. (4) Contrasts were defined as “T2D, control,” and empirical Bayes step was run to derive differential expression results. (5) The DEGs, defined as those with log2FC > 1 and Benjamini–Hochberg (BH)-adjusted p < 0.05 for upregulated genes and log2FC <  − 0.5 and BH-adjusted p < 0.05 for downregulated genes, were identified for each microarray.

Microarrays with no DEGs were excluded, and those with non-zero DEGs were visualized with a clustered bar chart. Furthermore, DEGs were classified by tissue types and visualized as a Venn diagram. Information on clinical and other features of the microarrays with non-zero DEGs, individual DEGs identified by each dataset, and tissue-based classification of DEGs were also summarized.

Meta-analysis of DEGs: identification of highly perturbed genes

In order to identify highly perturbed genes associated with T2D, we conducted meta-analysis of DEGs using MetaVolcanoR package (Prada et al. 2020). We implemented all 3 meta-analytic strategies incorporated in this package, namely, random effects model (REM), vote-counting approach (VC), and p value combining approach (CA).

In brief, REM synthesizes a summary fold-change of multiple microarrays based on variance, producing a summary p value which indicates the probability that the summary fold-change is not different to zero. The metathr parameter can be specified to filter the desired percentage of the top-most consistently perturbed genes. Gene perturbation is ranked as per the topconfects approach (Harrison et al. 2019). The VC algorithm produces highly perturbed genes according to user-specified p values and fold-change cut-off levels, taking into account both the number of studies in which a DEG appeared and its gene fold-change sign consistency. Here also, metathr parameter can be defined to extract the required percentage of highly perturbed genes. Meta-synthesis of gene perturbation by CA algorithm is at the mean or median level along with p values derived by Fisher method. A required proportion of top-most DEGs can be identified by specifying metathr parameter with CA as well (Prada et al. 2020).

As required by the package, all microarray datasets with non-zero DEGs; each consisting of the columns gene name (symbol), fold-change (log2FC), and p value; and confidence intervals of the fold-change (CI.L and CI.R) were merged to build a list item. For all 3 meta-analytic models, metathr was set at 0.01. For VC, p value and absolute fold-change thresholds were set at 0.05 and 0, respectively.

Highly perturbed genes identified by each model as well as the compiled list of all highly perturbed genes were presented in tabular format. Volcano plots were drawn to illustrate the top-most perturbed genes identified by REM, VC, and CA methods. Inverse cumulative distribution of consistently differentially expressed genes as per VC was plotted to demonstrate the number of genes with perturbed expression in ≥ 1 studies. Detailed meta-analytic outputs from all 3 approaches and the highly perturbed genes identified by each method were also compiled.

Network analysis: identification of hub genes

The list of highly perturbed genes was fed into GENEMANIA (Warde-Farley et al. 2010) to determine the gene–gene interaction network. The interaction network formulated by the GENEMANIA gene function prediction program, based on the multiple association network integration algorithm (MANIA), incorporates a multitude of functional associations including co-expression, pathways, physical interactions, co-localization, genetic interactions, and protein domain similarity. It has been found more accurate and computationally efficient than other gene function prediction methods (Mostafavi et al. 2008; Peña-Castillo et al. 2008). The gene–gene interaction network was first constructed and visualized in GENEMANIA. Next, these interactions were imported to visualize the protein–protein interaction (PPI) network in STRING version 11.0 (Szklarczyk et al. 2019). We used the Centiscape application (Scardoni et al. 2009) in the Cytoscape software (Shannon et al. 2003) to analyze the PPI network and determine the hub nodes. After removing nodes with a connection number < 2, the network was visualized in Cytoscape. Hub genes were defined as those nodes in the network with betweenness, closeness, and degree higher than their mean values. A similar approach has been previously used to demarcate hub genes (Liu et al. 2019b). Topological features of the hub nodes and details of the PPI network derived by Centiscape were summarized.

Downstream analysis of highly perturbed genes and hub genes

Using Enrichr (Chen et al. 2013; Kuleshov et al. 2016; Xie et al. 2021) platform, we ran a series of downstream analyses for both highly perturbed genes and hub genes as outlined below:

  • Ontologies: GO Biological Process 2018 (Hill et al. 2008)

  • Pathways: KEGG 2019 Human (Kanehisa and Goto 2000)

  • Diseases/drugs: COVID-19-related gene sets

  • Cell types: GTEx tissue sample gene expression profiles up and GTEx tissue sample gene expression profiles down

  • Miscellaneous: HMDB metabolites

The Genotype-Tissue Expression (GTEx) portal contains tissue-specific gene expression and regulation data (GTEx Consortium 2013), whereas the Human Metabolome Database (HMDB) records human metabolomics data (Wishart et al. 2018).

Results

The bioinformatic analytic workflow is summarized in Online Resource 1.

Systematic review outputs

The preliminary search resulted in 178 eligible microarrays, while 45 of these were selected for manual curation. We selected 27 microarrays with sufficient information for subsequent analyses, the details of which are provided in Online Resource 2.

Differential gene expression analysis outputs

The number of DEGs identified by each dataset is shown in Table 1. There were 11 microarrays with no DEGs. In Fig. 1, microarrays with non-zero DEGs (n = 16) are visualized as a clustered bar chart. The identified DEGs belonged to four tissue types (i.e., circulatory; adipose; digestive; skeletal muscle) as visualized in the Venn diagram (Fig. 2). Clinical and other information of the 16 microarrays with non-zero DEGs is presented in Online Resource 3. Details of DEGs identified by each dataset are presented in Online Resource 4, while DEGs classified by tissue type are presented in Online Resource 5.

Table 1 Number of differentially expressed genes in type 2 diabetes identified from different tissues of adult humans (n = 27)
Fig. 1
figure 1

Clustered bar chart depicting the number of differentially expressed (up- and downregulated) genes in the 16 datasets

Fig. 2
figure 2

Venn diagram depicting the number of differentially expressed genes by the 4 tissue types

A significantly larger proportion of genes associated with T2D is downregulated

Of all the DEGs identified by different tissues (n = 6284), the proportion of downregulated genes (n = 4592) was significantly higher than the proportion of upregulated genes (n = 1692) (p < 0.00000001). At the level of tissue type, circulatory (255 downregulated vs 94 upregulated; p < 0.000001), adipose (1426 downregulated vs 1212 upregulated; p = 0.000017), and digestive (2664 downregulated vs 135 upregulated; p < 0.000001) tissues revealed a similar pattern, while the skeletal muscle (247 downregulated vs 251 upregulated; p = 0.44653) showed no such over-expression of downregulated genes. Of the 27 microarrays, 11 contained no DEGs, while the two datasets containing the highest number of DEGs were from pancreatic islets (GSE25724; 2164 DEGs) and visceral adipose tissue (GSE29231; 1655 DEGs) (Table 1, Fig. 1).

Tissue-specific DEGs are predominant compared to DEGs shared between tissues in T2D

Proportions of tissue-specific DEGs were significantly higher than the proportions shared with one or more other tissue types in circulatory (242 specific vs 89 non-specific; p < 0.000001), adipose (1989 specific vs 342 non-specific; p < 0.000001), and digestive (1875 specific vs 409 non-specific; p < 0.000001) groups, while skeletal muscle (244 specific vs 217 non-specific; p = 0.112959) had no such over-expression (Fig. 2).

Meta-analytic outputs

As shown in Table 2, the three meta-analytic algorithms REM, VC, and CA identified 49, 27, and 8 highly perturbed genes, respectively. The compiled list, after removing redundancies, comprised 79 highly perturbed genes. Volcano plots illustrating the top-most perturbed genes identified by REM, VC, and CA are given in Figs. 3, 4, 5, respectively. Figure 6 presents the inverse cumulative distribution of consistently differentially expressed genes as per VC, plotted to demonstrate the number of genes with perturbed expression in ≥ 1 studies. We present detailed meta-analytic outputs from all 3 approaches in Online Resource 6 and highly perturbed genes identified by each method in Online Resource 7.

Table 2 Highly perturbed genes (n = 79) associated with type 2 diabetes identified by meta-analysis of microarray (n = 16)
Fig. 3
figure 3

Highly perturbed genes (n = 49) identified by random effects model meta-analysis in MetaVolcanoR package with metathr set at 0.01. Consistently upregulated genes appear in red and consistently downregulated genes appear in blue

Fig. 4
figure 4

Highly perturbed genes (n = 27) identified by vote-counting approach meta-analysis in MetaVolcanoR package with metathr set at 0.01. Consistently upregulated genes appear in red and consistently downregulated genes appear in blue

Fig. 5
figure 5

Highly perturbed genes (n = 8) identified by p value combining approach meta-analysis in MetaVolcanoR package with metathr set at 0.01. Consistently upregulated genes appear in red and consistently downregulated genes appear in blue

Fig. 6
figure 6

Inverse cumulative distribution of the consistently differentially expressed genes as per vote-counting approach

Highly perturbed genes associated with T2D comprise both up- and downregulated genes

There was no significant difference (p = 0.410983) between the proportions of up- (38/79) and downregulated (41/79) genes that constituted the highly perturbed gene set (n = 79) identified by all 3 meta-analytic algorithms (Table 2, Figs. 3, 4, 5, 6). The 38 upregulated genes were ALDOB; BCL3; CNTFR; CNTNAP2; CRTAC1; DYRK3; EGR2; ELFN1; ERAP2; ESPNL; ISLR; LOC100008589; LOC644422; LOC649456; MCL1; MGRN1; MMP9; NPTX2; PCOLCE2; PHACTR3; PHLDA1; PI3; POMZP3; PRIMA1; PVRL2; RASL11B; RNF19B; SCN1B; SLC9A3R2; SOD3; TAPBP; TNFAIP6; U2AF2; VNN2; XYLT1; ZBTB16; ZNF423; and ZNF75. The 41 downregulated genes were API5; APOL4; ARG2; ASCL2; C14orf132; C19orf33; COG2; CTSC; DHRS2; DYRK2; ENPP2; ENTPD3; FUT11; HADH; HLA-DRB4; HLA-DRB5; IAPP; IFNA7; KIAA1279; KIAA1984; LARP4; LOC389286; LOC650885; LOC731682; MARK1; MCOLN3; MTRR; NAALAD2; NMNAT2; OR8B12; PAAF1; POPDC3; PPM1K; PPP1R1A; RPL14; SLC2A2; SNAP25; STMN2; TAP2; TMEM37; and UGT2B7.

Network analysis outputs

Network analysis identified 28 hub genes, the topological features of which are summarized in Table 3. Figure 7 presents the gene–gene interaction network produced on GENEMANIA, and details of this network are given in Online Resource 8. The PPI network visualized on STRING version 11.0 (Szklarczyk et al. 2019) is given in Fig. 8. The network created by Cytoscape is provided in Fig. 9. Details of the PPI network derived by Centiscape are provided in Online Resource 9.

Table 3 Topological characteristics of the hub genes (n = 28) identified via network analyses
Fig. 7
figure 7

Gene–gene interactions network visualized in GENEMANIA

Fig. 8
figure 8

Protein–protein interactions network visualized in STRING

Fig. 9
figure 9

Two different presentations of the protein–protein interactions network in Centiscape

Hub genes associated with T2D also comprise both up- and downregulated genes

The 28 hub genes consisted of 13 upregulated, 9 downregulated, and 6 predicted genes. There was no significant difference (p = 0.261216) between the proportions of up- (13/22) and downregulated (9/22) genes constituting the hub gene set. The 13 upregulated hub genes were CNTFR; CNTNAP2; ISLR; MCL1; MMP9; PHLDA1; RASL11B; SCN1B; SOD3; TNFAIP6; XYLT1; ZBTB16; and ZNF423. The 9 downregulated genes were CTSC; ENPP2; HADH; PPP1R1A; RPL14; SLC2A2; SNAP25; STMN2; and UGT2B7. The 6 hub genes predicted by GENEMANIA were ARG1; CD226; DPEP1; PPP1R15A; TAP1; and ZMIZ1 (Table 3).

Downstream functional analysis outputs

Findings from downstream analyses of highly perturbed genes are summarized in Tables 4, 5, 6 and illustrated in Figs. 10, 11, 12 with details in Online Resource 10. Results from downstream analyses of hub genes are presented in Tables 7, 8, 9 and visualized in Figs. 13, 14, 15 with details in Online Resource 11.

Table 4 Downstream analyses of highly perturbed genes (n = 79) associated with type 2 diabetes in different tissues of human adults: GO biological processes and KEGG pathways
Table 5 Downstream analyses of highly perturbed genes (n = 79) associated with type 2 diabetes in different tissues of human adults: COVID-19-related gene sets and HMDB metabolites
Table 6 Downstream analyses of highly perturbed genes (n = 79) associated with type 2 diabetes in different tissues of human adults: GTEx tissue sample gene expression profiles
Fig. 10
figure 10

Downstream analyses of highly perturbed genes (n = 79) associated with type 2 diabetes in different tissues of human adults: a GO biological processes, b KEGG pathways

Fig. 11
figure 11

Downstream analyses of highly perturbed genes (n = 79) associated with type 2 diabetes in different tissues of human adults: a COVID-19 related gene sets, b HMDB metabolites

Fig. 12
figure 12

Downstream analyses of highly perturbed genes (n = 79) associated with type 2 diabetes in different tissues of human adults: a GTEx tissue sample gene expression profiles down, b GTEx tissue sample gene expression profiles up

Table 7 Downstream analyses of hub genes (n = 28) associated with type 2 diabetes in different tissues of human adults: GO biological processes and KEGG pathways
Table 8 Downstream analyses of hub genes (n = 28) associated with type 2 diabetes in different tissues of human adults: COVID-19-related gene sets and HMDB metabolites
Table 9 Downstream analyses of hub genes (n = 28) associated with type 2 diabetes in different tissues of human adults: GTEx tissue sample gene expression profiles
Fig. 13
figure 13

Downstream analyses of hub genes (n = 28) associated with type 2 diabetes in different tissues of human adults: a GO biological processes, b KEGG pathways

Fig. 14
figure 14

Downstream analyses of hub genes (n = 28) associated with type 2 diabetes in different tissues of human adults: a COVID-19-related gene sets, b HMDB metabolites

Fig. 15
figure 15

Downstream analyses of hub genes (n = 28) associated with type 2 diabetes in different tissues of human adults: a GTEx tissue sample gene expression profiles down, b GTEx tissue sample gene expression profiles up

Enriched biological processes associated with highly perturbed genes and hub genes of T2D

Ontology analysis of highly perturbed genes identified 89 enriched biological processes (Online Resource 10). The top 10 enriched biological processes were intrinsic apoptotic signaling pathway in response to DNA damage (GO:0,008,630); intrinsic apoptotic signaling pathway in response to DNA damage by p53 class mediator (GO:0,042,771); antigen processing and presentation of peptide antigen via MHC class I (GO:0,002,474); cellular response to oxidative stress (GO:0,034,599); plasma membrane bounded cell projection organization (GO:0,120,036); intrinsic apoptotic signaling pathway by p53 class mediator (GO:0,072,332); myeloid cell differentiation (GO:0,030,099); nicotinamide nucleotide metabolic process (GO:0,046,496); regulation of intrinsic apoptotic signaling pathway (GO:2,001,242); and myeloid leukocyte differentiation (GO:0,002,573). Three of these, antigen processing and presentation of peptide antigen via MHC class I (GO:0,002,474); myeloid cell differentiation (GO:0,030,099), and regulation of intrinsic apoptotic signaling pathway (GO:2,001,242), were associated with upregulated genes in T2D (Table 4).

Ontological analysis of hub genes identified 179 enriched biological processes (Online Resource 11). Top 10 enriched biological processes were neutrophil degranulation (GO:0,043,312); neutrophil activation involved in immune response (GO:0,002,283); neutrophil-mediated immunity (GO:0,002,446); regulation of intrinsic apoptotic signaling pathway (GO:2,001,242); negative regulation of intrinsic apoptotic signaling pathway (GO:2,001,243); positive regulation of cell projection organization (GO:0,031,346); cellular response to reactive oxygen species (GO:0,034,614); negative regulation of apoptotic signaling pathway (GO:2,001,234); negative regulation of cysteine-type endopeptidase activity involved in apoptotic process (GO:0,043,154); and regulation of peptide hormone secretion (GO:0,090,276). Of these, three (regulation of intrinsic apoptotic signaling pathway (GO:2,001,242); negative regulation of intrinsic apoptotic signaling pathway (GO:2,001,243); negative regulation of apoptotic signaling pathway (GO:2,001,234)) were associated with upregulated genes in T2D, whereas one (regulation of peptide hormone secretion (GO:0,090,276)) was associated with downregulated genes in T2D (Table 6).

Enriched KEGG pathways associated with highly perturbed genes and hub genes of T2D

As per KEGG analysis of highly perturbed genes, 21 pathways were enriched (Online Resource 10). The top 10 pathways were Epstein–Barr virus infection; antigen processing and presentation; autoimmune thyroid disease; maturity-onset diabetes of the young; asthma; allograft rejection; graft-versus-host disease; type 1 diabetes mellitus; intestinal immune network for IgA production; and cell adhesion molecules (CAMs). Seven of these (autoimmune thyroid disease; maturity-onset diabetes of the young; asthma; allograft rejection; graft-versus-host disease; type 1 diabetes mellitus; intestinal immune network for IgA production) were associated with downregulated genes in T2D (Table 4).

According to KEGG analysis of hub genes, 12 pathways were enriched (Online Resource 11). The top 10 pathways were insulin secretion; apoptosis; adrenergic signaling in cardiomyocytes; cell adhesion molecules (CAMs); JAK-STAT signaling pathway; transcriptional mis-regulation in cancer; arginine biosynthesis; maturity-onset diabetes of the young; ascorbate and aldarate metabolism; and fatty acid elongation. Four of these (insulin secretion; maturity-onset diabetes of the young; ascorbate and aldarate metabolism; fatty acid elongation) were associated with downregulated genes in T2D, while two pathways (JAK-STAT signaling pathway and transcriptional mis-regulation in cancer) were associated with upregulated genes in T2D (Table 6).

COVID-19 related gene sets associated with highly perturbed genes and hub genes of T2D

Downstream analyses revealed 20 COVID-19-related gene sets associated with the highly perturbed genes of T2D (Online Resource 10), and the top 10 of these are visualized in Fig. 11a. There were 23 COVID-19-related gene sets associated with the hub genes of T2D (Online Resource 11), the top 10 of which are visualized in Fig. 14a.

HMDB metabolites associated with highly perturbed genes and hub genes of T2D

Four HMDB metabolites (zinc (HMDB01303), manganese (HMDB01333), magnesium (HMDB00547), C10H13N2O7P (HMDB01570)) were associated with the highly perturbed genes of T2D (Online Resource 10). Of these, Zinc (HMDB01303) was associated with upregulated genes in T2D, while C10H13N2O7P (HMDB01570) was associated with downregulated genes in T2D (Table 5).

There were 45 HMDB metabolites associated with the hub genes of T2D (Online Resource 11). The top 10 metabolites were zinc (HMDB01303); ethyl glucuronide (HMDB10325); 3-acetoacetyl-CoA (HMDB01484); (S)-methylmalonate semialdehyde (HMDB02217); C18H31NO14S (HMDB00632); ornithine (HMDB00214); 17beta-estradiol glucuronide (HMDB10317); (3alpha,5beta,20S)-20-hydroxypregnan-3-yl beta-D-glucopyranosiduronic acid (HMDB10318); 3,17-androstanediol glucuronide (HMDB10321); and 17alpha-estradiol-3-glucuronide (HMDB10322). Seven of these (Ethyl glucuronide (HMDB10325); 3-acetoacetyl-CoA (HMDB01484); (S)-methylmalonate semialdehyde (HMDB02217); 17beta-estradiol glucuronide (HMDB10317); (3alpha,5beta,20S)-20-hydroxypregnan-3-yl beta-D-glucopyranosiduronic acid (HMDB10318); 3,17-androstanediol glucuronide (HMDB10321); and 17alpha-estradiol-3-glucuronide (HMDB10322)) were associated with downregulated genes in T2D. One metabolite (C18H31NO14S (HMDB00632)) was associated with upregulated genes of T2D (Table 8).

Gene expression in different cell types associated with highly perturbed genes and hub genes of T2D

There were 168 downregulated GTEx profiles (Online Resource 10) associated with highly perturbed genes of T2D, the top 10 of which consisted of one brain, two thyroids, and seven blood tissue samples (Table 6 and Fig. 12). There were 77 upregulated GTEx profiles (Online Resource 10) associated with highly perturbed genes of T2D, the top 10 of which consisted of six adipose and one each of lung, breast, heart, and blood vessel tissue samples (Table 6 and Fig. 12).

There were 223 downregulated GTEx profiles (Online Resource 11) associated with hub genes of T2D, the top 10 of which consisted of five nervous system (three brain, one nerve, one pituitary), three blood, one breast, and one bladder tissue samples (Table 9 and Fig. 15). There were 178 upregulated GTEx profiles (Online Resource 11) associated with hub genes of T2D, the top 10 of which consisted of six adipose, two blood, and two skin tissue samples (Table 9 and Fig. 15).

Discussion

In this study, we identified highly perturbed genes and hub genes associated with T2D in different tissues of adult humans, via an extensive bioinformatics analytic workflow. Downstream analyses revealed valuable insights on T2D pathogenesis, including associations with other diabetes phenotypes and COVID-19 and patterns of tissue-specific and tissue non-specific differential gene expression as well as pathophysiological manifestations such as those related to insulin action, immunity, and apoptosis. Salient findings of the study which contribute towards the understanding of the genetic basis of T2D are further discussed below. The comprehensive evidence synthesis approach with open-source gene expression data exemplified in this study could be replicated to gain high-level evidence synthesis for other clinical conditions.

Patterns of differential gene expression in T2D

Our findings indicate that T2D seems rather a disorder of gene downregulation than upregulation, when the whole genome is considered. This is consistent with previous studies where a preponderance of gene downregulation was associated with T2D (Takematsu et al. 2020; Palsgaard et al. 2009). Also, hyperglycemia-induced global downregulation of gene expression in adipose and skeletal muscle tissues have been documented previously (Meugnier et al. 2007). A similar pattern has been observed in T1D (Yip et al. 2020) and other endocrine disorders such as polycystic ovary syndrome (Idicula-Thomas et al. 2020). In contrast, highly perturbed genes and hub genes associated with T2D, which might together constitute the candidate gene set critical for pathogenicity of T2D, were found to contain both up- and downregulated genes. This presentation suggests a more complex dysregulation at the crux of the GRN of T2D, involving actions and interactions between both repressed and augmented genes.

Tissue-specific and tissue non-specific DEGs associated with T2D

Results of the present study indicate the predominance of tissue-specific DEGs in T2D. This supports the use of target tissue gene expression analysis as a viable avenue for identifying tissue-specific T2D biomarkers. A previous analysis integrating multiple tissue transcriptomics and PPI data to explore molecular biomarkers of T2D confirmed the presence of common signatures (Calimlioglu et al. 2015). We also observed common DEGs across different tissue types which can act as confluent molecular signatures of T2D. Identification of tissue-specific and non-specific molecular gene signatures of T2D facilitates downstream exploration of key pathways amenable to therapeutic targeting and drug repurposing efforts.

Shared gene enrichment across diabetes phenotypes

As revealed by KEGG pathway analyses, both MODY and T1D were enriched pathways associated with highly perturbed genes of T2D, while MODY was also an enriched pathway associated with hub genes of T2D. Specifically, SLC2A2 (GLUT2), and IAPP genes were commonly enriched in MODY, while HLA-DRB4 and HLA-DRB5 genes were underlying the enrichment with T1D. A gene expression meta-analysis also revealed the existence of possible pleiotropic mechanisms manifest via common gene signatures (PGRMC1 and HADH) across different diabetes phenotypes (Mei et al. 2017).

Downregulation of SLC2A2 is associated with not only T2D (Solimena et al. 2018) but also neonatal diabetes (Sansbury et al. 2012) and early childhood diabetes (Alhaidan et al. 2020), suggesting a likely role in insulin secretion. Amylin (IAPP), a gluco-modulatory hormone co-expressed with insulin by pancreatic β cells, is downregulated in both T1D and advanced T2D (Abedini et al. 2013), while amylin agonists are considered novel therapeutic agents for treating diabetes (Sonne et al. 2021). Moreover, human amylin plays a protective role against autoimmune diabetes inducing CD4 + Foxp3 + regulatory T cells (Zhang et al. 2018). Downregulation of HLA-DRB4 in peripheral blood mononuclear cells is associated with T2D as well as dyslipidemia and periodontitis (Corbi et al. 2020), while a meta-analysis revealed that the lack of HLA-DRB5 increased T2D risk (Jacobi et al. 2020). Of note, both HLA-DRB4 and HLA-DRB5 are associated with β cell autoantibodies and T1D (Zhao et al. 2016), with previous studies reporting that both T1D and T2D share HLA class II locus components (Jacobi et al. 2020). Interestingly, two of the hub genes of T2D found in our study (MMP9, ARG1) have been found as hub genes of T1D in a previous analysis (Yang et al. 2020). Collectively, these findings support some degree of shared genetic architecture between T2D and other diabetes pathologies.

T2D as a disorder of insulin secretion and action

Downstream analyses provided insights into the characterization of T2D as a disorder of insulin secretion and action. We found that insulin secretion was the most significant KEGG pathway associated with hub genes, whereby two downregulated hub genes (SLC2A2, SNAP25) in T2D were underlying this enrichment. Zinc, the most significant HMDB metabolite associated with both highly perturbed genes and hub genes of T2D, is an essential element with key regulatory roles in insulin synthesis, storage, and secretion (Kim and Lee 2012). Other metabolomic markers associated with highly perturbed genes included magnesium which is necessary for insulin signaling (Piuri et al. 2021) as well as manganese which is involved in insulin synthesis and secretion (Chen et al. 2018). Together, these findings underscore the effects on insulin production and action as pivotal to T2D pathogenesis.

Pathophysiological manifestations of T2D

Apoptosis

Downstream analyses revealed that multiple GO and KEGG pathways associated with apoptosis, including intrinsic apoptotic signaling pathway, were enriched in T2D. It is known that hyperglycemia-induced β cell apoptosis, a hallmark in T2D progression, occurs via intrinsic pathways causing reduced islet mass and metabolic abnormalities (Wali et al. 2013). Hyperglycemia-induced apoptosis has been reported to occur in other sites such as renal cells (Jung et al. 2012) and coronary arteries (Kageyama et al. 2011), indicating a possible role in disease progression and the onset of complications.

Immunity

Downstream analyses also revealed that multiple immunity-related GO and KEGG pathways, encompassing both innate and humoral immune responses, were enriched in T2D. These included multiple ontologies involving neutrophils and antigen processing and presentation as well as severe immune reactions such as graft-versus-host disease and allograft rejection. Impaired immunity in T2D and consequent susceptibility to infections and complications is frequently observed (Berbudi et al. 2020). A deeper understanding of the genomics underlying impaired immunity in T2D might provide opportunities to personalize the management of comorbidities and pharmacotherapy.

COVID-19 and T2D

Epidemiological studies strongly suggest poorer prognosis of COVID-19 among people with T2D (Selvin and Juraschek, 2020), although underlying mechanisms are not well-understood (Apicella et al. 2020). Downstream analysis of highly perturbed genes and hub genes of T2D in the present study revealed a large number of enriched COVID-19-related gene sets, providing support for this putative link at a more granular level.

Over- and under-expression of genes in different tissues associated with T2D

Downstream analysis of GTEx profiles identified tissues that are likely to demonstrate under- and over-expression of DEGs associated with T2D. Findings indicate that adipose tissue tends to over-express marker genes of T2D, while these might be under-expressed in other tissues such as those of the nervous system. These findings have implications for biomarker discovery and can guide further research on tissues which should be explored for identifying DEGs.

Conclusions

Findings of this study contribute towards the understanding of the genetic basis of T2D, and further research is warranted to substantiate the molecular mechanisms underlying these findings which is fundamental to establishing precision T2D medicine initiatives. The proposed bioinformatics pipeline may have broader use as a judicious strategy to identify gene perturbations and pathophysiological mechanisms of other clinical conditions beyond T2D which ought to be validated in future research. Finally, this study describes an exemplary approach to applying comprehensive evidence synthesis using existing open-source gene expression data. Other researchers are encouraged to apply this methodology to obtain high-level evidence from existing multiple datasets, thereby getting the most value from existing bioinformatics sources.