Introduction

The comorbidity of coronary artery disease and rheumatoid arthritis (CAD&RA) can lead to higher mortality rates than either disease alone [1], but the biological mechanisms connecting the two remain unclear. Patients with rheumatoid arthritis (RA) have a markedly higher incidence and mortality of cardiovascular disease than the general population [2, 3]. Patients with RA accompanied by coronary artery disease (CAD) who undergo revascularization by percutaneous coronary intervention (PCI) have a significantly higher risk of long-term major adverse cardiac events [4]. Meanwhile, RA has been considered an independent risk factor for CAD development [5, 6]. Patients with RA have a larger coronary plaque and inflammation burden than patients without RA [7,8,9]. Some CAD-related risk factors, such as dyslipidemia, type 2 diabetes mellitus and hypertension, could also contribute to the CAD risk of patients with RA [10, 11]. Drugs used to treat RA, such as corticosteroids, might increase cardiovascular risk factors and aggravate heart disease [12].

The underlying mechanisms of CAD-associated progression of RA are not fully elucidated. Studies have attributed the high mortality of CAD&RA to endothelial dysfunction and circulating acute-phase reactants such as C-reactive protein [13, 14]. Inflammation can promote coronary atherosclerosis and induce coronary microvascular dysfunction in patients with RA, leading to an inadequate supply of myocardial oxygen, with endothelial dysfunction and immune system dysregulation being the primary initiating processes of both changes [15, 16]. The neutrophil activation-related genes S100A8 and S100A12 are under investigation as therapeutic targets for both RA and CAD, hinting at common pathogenic mechanisms of CAD&RA [17].

The conventional single-target paradigm is insufficient to illuminate the complex molecular basis of CAD&RA, and a novel systematic paradigm is urgently required. Understanding the complex phenome-genome relationships of diseases and their comorbidities requires network perspectives rather than a simple view of disease as the result of individual genomic variations. Network medicine is thought to be capable of uncovering complex disease relationships using disease modules and network-based approaches, which may help to discover the shared biological mechanisms of associated diseases [18, 19]. A complex disease is rarely the direct consequence of a single gene alteration; rather, it is the result of the interaction of multiple molecular processes. Disease genes usually interact with each other and form closely connected subgraphs, i.e. disease modules, which play important roles in disease–disease relationships [20]. The identification of precise disease modules may help us understand the molecular interactions of complex diseases. Understanding comorbidities can also help physicians evaluate disease progression and improve treatment. Disease-related genes have been used to assess the similarity between different diseases [21, 22]. Accordingly, module-based strategies, rather than single-gene and single-target strategies, are becoming increasingly important for revealing the relationship between multiple gene interactions and disease mechanisms [23, 24].

In this study, the gene expression array profile of CAD patients with and without RA was used to construct gene co-expression networks by weighted gene co-expression network analysis (WGCNA). Network modularization analyses were performed to identify the characteristic modules and susceptibility hub genes for CAD&RA to reveal the potential molecular mechanisms of the comorbid presence of CAD and RA. The workflow is shown in Fig. S1.

Materials and methods

Gene expression profile data and differentially expressed genes analysis

The CAD and CAD&RA dataset GSE110008 was downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/). The dataset included eight CAD&RA samples and eight control CAD samples obtained from biopsies of the ascending aorta, and the platform was the Affymetrix Human Genome U133A 2.0 Array (HG-U133A_2). The primary data were annotated to form an expression matrix, each probe was matched to its corresponding gene symbol, and repeated gene symbols in the matrix were excluded.

Differentially expressed genes (DEGs) between CAD&RA and CAD patients were identified using the limma package in R (version 4.1.1). Genes with a false discovery rate (FDR)-adjusted p < 0.05 were considered DEGs. Then, the DEGs were compared with the CAD-related and RA-related genes. To obtain CAD-related and RA-related genes, data were retrieved using the keywords “coronary artery disease” and “rheumatoid arthritis” from the HPO (https://hpo.jax.org/app/), OMIM (https://omim.org/) and dbSNP (https://www.ncbi.nlm.nih.gov/snp/) databases in October 2021.
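The FDR-based filtering step can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure (the usual default adjustment in limma, although the text does not name the method), and the gene names and p-values are hypothetical placeholders rather than values from GSE110008.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustrative only).
raw_p = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039,
         "geneD": 0.041, "geneE": 0.60}
adj_p = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))

# Keep genes whose adjusted p-value is below the 0.05 cut-off used in the text.
degs = [gene for gene, q in adj_p.items() if q < 0.05]
print(degs)
```

In this toy example geneC and geneD have raw p-values below 0.05 but do not pass after adjustment, which is the practical effect of controlling the FDR.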

WGCNA network construction and clinical traits analysis

The R package WGCNA was applied to construct the DEG co-expression network. The DEG dataset was checked with the goodSamplesGenes function in WGCNA [25] to remove genes with missing values in multiple samples. The co-expression network of DEGs was constructed using an appropriate soft-threshold power β. The topological overlap measure (TOM) and the Dynamic Hybrid Tree Cut algorithm were used to perform hierarchical clustering and to partition the branches of the dendrogram into modules with the following parameters: minModuleSize = 3, mergeCutHeight = 0.25 and verbose = 3. Then, the correlation coefficient between the expression level of each module and the different disease traits was analyzed.
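As an illustration of the two quantities named above, the following sketch computes a soft-threshold adjacency and the topological overlap for a toy expression matrix. It assumes an unsigned network (adjacency = |Pearson correlation|^β), which the text does not specify, and it is a plain-Python teaching aid, not the WGCNA implementation; the expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)|^beta with a zero diagonal."""
    n = len(expr)
    return [[0.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta
             for j in range(n)] for i in range(n)]


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j]
                             for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Hypothetical expression matrix: three genes (rows) across four samples.
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
adj = adjacency(expr, beta=7)  # beta = 7 is the power reported in the Results
overlap = topological_overlap(adj)
```

Hierarchical clustering is then run on the dissimilarity 1 - TOM; raising correlations to the power β suppresses weak correlations, which is why the third gene (correlation -0.4 with the first) ends up with a near-zero adjacency.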

Analysis of module preservation using Zsummary statistic

To quantitatively analyze whether modules varied significantly between the disease groups, the Zsummary statistic [26] was calculated to screen the differentiated modules between CAD and CAD&RA. Modules with a Zsummary ≥ 2 were regarded as preserved common modules, and modules with a Zsummary < 2 were defined as differentiated characteristic modules for CAD&RA. Each identified module was visualized with Cytoscape (version 3.7.2) to display the overall gene relationships within the module [27].
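The thresholding rule above can be written as a short sketch. The Zsummary values are assumed to have been computed beforehand (for example with the modulePreservation function of WGCNA); the function name and all example scores are hypothetical except the paleturquoise value of -0.21, which is reported in the Results.

```python
def classify_modules(zsummary, threshold=2.0):
    """Split modules into preserved (Z >= threshold) and differentiated (Z < threshold)."""
    preserved = sorted(m for m, z in zsummary.items() if z >= threshold)
    differentiated = sorted(m for m, z in zsummary.items() if z < threshold)
    return preserved, differentiated


# Precomputed Zsummary scores; module_x/y/z are illustrative placeholders.
example_scores = {"paleturquoise": -0.21, "module_x": 1.4,
                  "module_y": 2.0, "module_z": 7.3}
preserved, differentiated = classify_modules(example_scores)
print(preserved, differentiated)
```

A module sitting exactly on the threshold (module_y) counts as preserved, matching the "≥ 2" wording used here.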

Functional enrichment analysis

Genes in the selected modules and all DEGs were uploaded separately for functional enrichment analysis on the Metascape website (https://metascape.org/). The website is an open tool that helps the biomedical research community analyze gene/protein lists and make better data-driven decisions. A Gene Ontology (GO) function enrichment analysis and a Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis [28] were conducted to identify the functions and pathways correlated with these modules (Count = 2; EASE = 0.01; and species = Homo sapiens). P < 0.05 was considered the cutoff criterion. DiNGO [29] software was used to perform the HPO functional enrichment analysis.
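Over-representation analyses of this kind are commonly based on a hypergeometric (one-sided Fisher) test. The sketch below assumes that model to show how an enrichment p-value arises from gene counts; it is not taken from the source and does not reproduce the exact scoring of Metascape or DiNGO, and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background, annotated, selected, hits):
    """P(X >= hits) when `selected` genes are drawn from `background` genes,
    of which `annotated` belong to the term of interest."""
    upper = min(annotated, selected)
    favourable = sum(comb(annotated, i) * comb(background - annotated, selected - i)
                     for i in range(hits, upper + 1))
    return favourable / comb(background, selected)


# Toy numbers: 10 background genes, 3 annotated to a term,
# 4 genes in the query list, 2 of which carry the annotation.
p = enrichment_p_value(background=10, annotated=3, selected=4, hits=2)
print(round(p, 4))
```

With real data the background would be all genes on the array and the resulting p-values would still need multiple-testing correction across terms.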

Identification of hub genes

The module membership (MM) was defined as the correlation between a gene and a given module [30]. The gene significance (GS) of a gene in the module represented the correlation of that gene with clinical traits [31]. Genes with GS ≥ 0.5 and MM ≥ 0.8 in the clinically related gene module networks were defined as hub genes. Then, the expression levels of the hub genes with the highest GS and MM ranks were compared between the CAD and CAD&RA groups. A t-test was conducted to compare the expression levels, with p < 0.05 indicating statistical significance. To validate whether these hub genes can classify patients into CAD or CAD&RA, three models were applied: logistic regression (LR) [32], K-nearest neighbor (KNN) [33] and support vector machine (SVM) [34]. Moreover, we used two other data-driven methods to screen featured genes for CAD&RA, i.e. the homogeneity of variance test in machine learning and the Chi-square test (χ2) in statistics. Two further measures, i.e. Sab [20] and shortest distance [35], were used to calculate the proximity between the screened genes and CAD/RA-related disease genes from a network perspective. Smaller Sab and shortest distance values indicate closer proximity between the gene and diseases.
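The network proximity idea can be sketched as follows, assuming the commonly used separation definition Sab = <d_AB> - (<d_AA> + <d_BB>)/2, where each term averages the shortest distance from a node to the nearest node of the relevant set. The toy chain graph and the gene sets are hypothetical, and the source does not state which interactome or implementation was used.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def nearest_distances(graph, sources, targets, skip_self):
    """For each source, the distance to its closest target (optionally not itself)."""
    result = []
    for s in sorted(sources):
        dist = bfs_distances(graph, s)
        reachable = [dist[t] for t in targets
                     if t in dist and not (skip_self and t == s)]
        if reachable:
            result.append(min(reachable))
    return result


def mean(values):
    return sum(values) / len(values)


def separation(graph, set_a, set_b):
    """Sab = <d_AB> - (<d_AA> + <d_BB>) / 2; each set needs at least two connected genes."""
    d_aa = mean(nearest_distances(graph, set_a, set_a, skip_self=True))
    d_bb = mean(nearest_distances(graph, set_b, set_b, skip_self=True))
    d_ab = mean(nearest_distances(graph, set_a, set_b, skip_self=False)
                + nearest_distances(graph, set_b, set_a, skip_self=False))
    return d_ab - (d_aa + d_bb) / 2


# Hypothetical interaction network: a simple chain g1 - g2 - ... - g6.
chain = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"],
         "g4": ["g3", "g5"], "g5": ["g4", "g6"], "g6": ["g5"]}
far_apart = separation(chain, {"g1", "g2"}, {"g5", "g6"})
overlapping = separation(chain, {"g1", "g2"}, {"g2", "g3"})
print(far_apart, overlapping)
```

Gene sets at opposite ends of the chain give a positive value, while sets that share a gene give a negative one, which matches the reading that smaller values mean closer proximity.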

Results

DEGs between CAD and CAD&RA

Among the 22,215 genes in GSE110008, 278 DEGs (Fig. 1A, Table S1) were ultimately obtained after duplicate genes were removed. Among the up-regulated genes, XIST, DEFA1, ACTA1 and FAM118A showed the most significant differential expression; among the down-regulated genes, DDX3Y, RPS4Y1, TXLNGY and KDM5D were the most significant.

CAD-related and RA-related genes were obtained from the HPO, OMIM and dbSNP databases (Table S2). Nine DEGs overlapped with both CAD-related and RA-related genes, and 54, 33 and 485 overlapping genes were detected between DEGs and CAD-related genes, between DEGs and RA-related genes, and between CAD-related and RA-related genes, respectively (Fig. 1B).

Fig. 1 DEGs between CAD and CAD&RA. (A) Volcano plot of the gene expression levels in GSE110008. Purple dots represent down-regulated genes (log2 fold change < -1, p < 0.05), and red dots represent up-regulated genes (log2 fold change > 1, p < 0.05). (B) The overlapping genes between the DEGs and known CAD-related and RA-related genes

Co-expression modules identification and their correlation to clinical traits

The one-step network construction function in WGCNA was used to construct the gene co-expression network based on the DEGs of CAD&RA. According to the scale-free independence and mean connectivity of the gene matrix, the soft thresholding power β was set at 7. As a result, 41 modules were obtained, ranging in size from 3 to 24 genes (Table S3). The cluster dendrogram of the module distribution and the co-expression network heatmap are shown in Fig. 2A-B.

Fig. 2 Identification of gene co-expression networks, modules and the correlation with clinical traits. (A) Cluster dendrogram of 278 DEGs based on the topological overlap. Each branch of the cluster tree with a certain color represents a co-expression module. (B) Heatmap of the topological overlap matrix (TOM) among all 41 modules of DEGs. (C) Heatmap of module-trait relationships. Each row represents a module, and each column represents a trait. Each cell contains the corresponding correlation coefficient

To confirm the modules’ preservation and reproducibility, 1/2 and 3/4 of the samples were selected as testing sets and the Zsummary value [26] for each module was calculated. In the 1/2 sample testing set, 100% of modules had a Zsummary score ≥ 0 and 78.05% had a Zsummary score ≥ 2; in the 3/4 sample testing set, 100% of modules had a Zsummary score ≥ 0 and 92.68% had a Zsummary score ≥ 2. These values support the robustness of our identified modules (Fig. S2).

To define the modules’ clinical characteristics, the correlation coefficient between each module’s expression and the disease clinical traits was calculated. Of the 41 modules, 35 (85.37%) had an absolute correlation coefficient with CAD or CAD&RA of over 0.6 (Fig. 2C). Of the 41 modules, 17 were positively correlated with CAD&RA and negatively correlated with CAD. Conversely, 24 modules were positively correlated with CAD and negatively correlated with CAD&RA. Among the modules positively correlated with CAD&RA, the turquoise module had the largest correlation coefficient (0.77, p = 5e-04). Among the modules negatively correlated with CAD&RA, the orange module (-0.71, p = 0.002), yellow module (-0.71, p = 0.002) and magenta module (-0.7, p = 0.002) had the strongest correlations (Fig. 2C).

Identification of the characteristic differentiated modules of CAD&RA

Based on the Zsummary, the preserved modules (Zsummary ≥ 2) and differentiated modules (Zsummary < 2) for CAD&RA were identified (Fig. 3A). Finally, 13 modules were selected as the characteristic modules for CAD&RA, i.e., the blue, darkmagenta, lightcyan, lightgreen, lightsteelblue1, mediumpurple3, paleturquoise, plum1, royalblue, saddlebrown, skyblue, skyblue3 and yellowgreen modules. In particular, the paleturquoise module (OXSR1, ZNF141, CACNA1A, IL19) had a Zsummary below 0 (Zsummary = -0.21), which indicates clear differentiation between the two groups. All 13 selected modules are shown in Fig. 3(B-N). In addition to the Zsummary, the overall expression values of the 13 differentiated modules were compared between CAD&RA and CAD using a t-test. Six modules differed significantly between the two groups (Fig. S3).

Fig. 3 The identified differentiated modules for CAD&RA. (A) Preservation analysis of defined modules using Zsummary. The x-axis represents module size; the y-axis represents the Zsummary value. Each labeled color represents a module. The dashed blue line indicates the threshold Zsummary = 2. (B-N) Networks of the 13 characteristic modules. The genes marked in yellow are hub genes

Functional and pathway enrichment analysis

GO function enrichment analysis and KEGG pathway enrichment analysis were performed on the 13 differentiated modules and on all DEGs, respectively. In the GO functional enrichment analysis of all 278 DEGs, the top 20 terms in each category were selected by p value. For biological processes (Fig. 4A), the genes were mainly enriched in cation homeostasis, positive regulation of cellular component movement, and nucleosome organization. For molecular functions (Fig. 4B), the genes were mainly enriched in cell adhesion molecule binding, chromatin binding and cAMP-dependent protein kinase activity. For cellular components (Fig. 4C), the genes were mainly enriched in lytic vacuole, distal axon and postsynapse. For the 13 differentiated modules, there were 14 enriched GO functions (Fig. 4D), mainly cAMP-dependent protein kinase activity, demethylase activity, and regulation of calcium ion import.

Fig. 4 The enriched GO terms and KEGG pathways. (A-C) Biological process, molecular function, cellular component in GO function for all 278 DEGs. (D) Enriched GO functions of 13 selected modules. On the left of each figure are the on-target numbers of the enriched genes in certain GO terms. (E) Enriched KEGG pathways of 278 DEGs

Among the top 20 GO terms, some are common to multiple modules and some are unique. For example, cAMP-dependent protein kinase activity and heart process are common terms of the blue and skyblue modules; positive regulation of tyrosine phosphorylation of STAT protein and response to inorganic substance are common terms of the yellow, plum3 and royalblue modules. In addition, the term translation is unique to the blue module (Table S4).

Two overlapping functions were found between the biological processes of the DEGs and those of the 13 modules, namely tissue migration and response to inorganic substances. Similarly, two overlapping molecular functions were found between the DEGs and the 13 modules, namely cAMP-dependent protein kinase activity and demethylase activity.

In the KEGG pathway enrichment analysis, the DEGs were enriched in multiple pathways (Fig. 4E). The top three pathways were transcriptional misregulation in cancer, cell adhesion molecules, and mineral absorption. Furthermore, HPO functional enrichment analysis revealed that the 13 differentiated modules were significantly enriched in two HPO terms. One of these terms was Y-linked inheritance and the other was gonosomal inheritance, which contains the genes DDX3Y, KDM5D, CDKL5, USP9Y, ROM1 and KDM6A. Moreover, among the 16 enriched pathways, 12 genes were enriched in transcriptional misregulation in cancer, accounting for 4.4%; 9 genes were enriched in the neuroactive ligand-receptor interaction pathway, accounting for 3.3%; and 8 genes were enriched in the cell adhesion molecules pathway, accounting for 2.93% (Table S5).

The identified hub genes for CAD&RA

The 13 selected modules contained 68 genes. With a GS over 0.5 and an MM over 0.8 as cut-off criteria, 49 genes were identified as hub genes (Table 1). Seven of these 49 hub genes had a GS greater than 0.6 and an MM greater than 0.9, i.e., POT1, ADO, ABCA11P, GALC, ZNF141, GPATCH8 and ATF6. In a comparison with the known CAD-related and RA-related genes, 9, 2 and 485 overlapping genes were found between CAD-related and hub genes, between RA-related and hub genes, and between CAD-related and RA-related genes, respectively (Fig. 5A). Interestingly, the hub gene MIA3 was related to both CAD and RA.
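The cut-off logic can be expressed as a small filter. The sketch assumes that GS and MM have already been computed for each gene and, as is common in WGCNA workflows, compares their absolute values with the cut-offs; the gene names and scores below are hypothetical placeholders, not values from Table 1.

```python
def select_hub_genes(stats, gs_cutoff=0.5, mm_cutoff=0.8):
    """Return genes whose |GS| and |MM| both reach the given cut-offs.

    `stats` maps a gene name to a (GS, MM) tuple.
    """
    return sorted(gene for gene, (gs, mm) in stats.items()
                  if abs(gs) >= gs_cutoff and abs(mm) >= mm_cutoff)


# Hypothetical (GS, MM) values for four genes.
example_stats = {
    "geneA": (0.65, 0.93),
    "geneB": (-0.55, 0.85),
    "geneC": (0.70, 0.60),   # fails the MM cut-off
    "geneD": (0.30, 0.95),   # fails the GS cut-off
}

hubs = select_hub_genes(example_stats)                   # GS 0.5 / MM 0.8
top_hubs = select_hub_genes(example_stats, 0.6, 0.9)     # stricter GS 0.6 / MM 0.9
print(hubs, top_hubs)
```

Tightening the cut-offs, as done for the seven representative genes, simply narrows the same filter.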

Based on their significance, the top five up-regulated genes were XIST, DEFA1, ACTA1, FAM118A and C10orf10, of which XIST was also a hub gene. The top five down-regulated genes were EIF1AY, KDM5D, TXLNGY, RPS4Y1 and DDX3Y, all of which were hub genes. The expression levels of representative hub genes were significantly different between CAD and CAD&RA (Fig. 5C-I). The results showed that ZNF141 was down-regulated in CAD&RA, while the other genes were up-regulated.

Table 1 Hub genes of the differentiated modules for CAD&RA (GS > 0.5 and MM > 0.8)

Fig. 5 (A) The overlapping genes among hub genes and known CAD-related and RA-related genes. (B) Gene expression difference significance ranking. The x-axis represents the rank of DEGs; the y-axis represents log2 fold change. (C-I) The expression levels of representative hub genes in CAD and CAD&RA. (J) Classification ability of the three models based on the seven representative hub genes

Moreover, the area under the curve (AUC) values of the 7 hub genes under the three models were all at least 0.88 (Fig. 5J). Compared with the top 7 featured genes based on the homogeneity of variance test (TMX1, TCF7L2, CDC6, ZNF157, HIST3H3, COQ7 and CLDN18) and the χ2 test (XIST, DDX3Y, TXLNGY, RPS4Y1, KDM5D, USP9Y and EIF1AY), our 7 susceptible genes (POT1, ADO, ABCA11P, MIA3, ZNF141, GPATCH8 and ATF6) based on GS/MM yielded the best results in the three models, with AUCs of 1.00, 1.00 and 0.88 for LR, KNN and SVM, respectively (Table S6), which indicated the excellent classification performance of the 7 hub genes. In addition, our 7 hub genes had smaller Sab and shortest distance values than the genes identified by the other two methods, indicating the superiority of the susceptible genes identified by our module-based analysis (Table S7).

Discussion

Considering the increased mortality associated with the comorbid presence of CAD and RA [1,2,3], it is essential to uncover the underlying mechanisms of CAD&RA. Given the complexity of CAD and RA, a network modularization approach was used to identify the characteristic modules and susceptibility genes for CAD&RA. As a result, 13 modules and 49 hub genes related to CAD&RA were screened, and further functional enrichment analysis revealed their potential mechanisms.

Among the identified hub genes, several were reportedly related to CAD or RA. A study demonstrated that the IL19 risk allele was associated with stroke/MI in SLE and RA, but not in the general population, suggesting that shared immune pathways may be involved in the pathogenesis of cardiovascular disease and inflammatory rheumatic diseases [36]. The expression of UTY and PRKY was found to be associated with the risk of CAD [37, 38]. Studies have shown that the SNP rs17465637 in the MIA3 gene is associated with the risk of CAD and RA [39, 40]. Another study found that high-intensity interval training (HIIT) could improve the expression of the skeletal muscle gene BCKDHB in RA, which can increase amino acid catabolism and interconversion [41]. FYN is one of the genes likely to play a significant role in the maintenance and functioning of several of the replicated pathways of CAD [42]. In addition, FYN is a diagnostic biomarker and one of the key driver genes in RA synovial tissue subtypes C1 and C3 [43,44,45]. The ATF6 gene also plays an important role in both CAD and RA [46, 47]. Additionally, a study reported that AKAP13 was one of the hub genes unique to CAD [48], a finding consistent with our study. Another study showed that the CYP1A2 genotype can modify the risk of RA and that the CYP1A2*1F allele may be related to leflunomide toxicity during the treatment of patients with RA [49, 50]. A previous study showed that INSL6, which is produced by TNF-polarized macrophages, can stimulate bone formation in mice with RA [51]. In addition, the binding of XIST to GATA1 can promote RA [52]. The genes LIPT1 [53, 54] and TMEM40 [55] were reported to be susceptibility genes for RA. Moreover, POT1 expression levels are significantly lower in RA than in the control group in vitro [56]. Mass spectrometry results revealed that GALC expression levels were significantly increased in patients with atherosclerosis [57]. In samples collected from male patients with new-onset heart failure, RPS4Y1 was overexpressed [58].

In addition, coffee intake is correlated with the risk of nonfatal myocardial infarction; this correlation is believed to be influenced by CYP1A2, which is also related to the development of RA in Korea [59, 60]. The expression of CACNA1A can enhance the cardiac differentiation of brown adipose-derived stem cells to regenerate the myocardium after infarction [61]. Furthermore, using advanced technologies for lncRNA subcellular localization and silencing, lnc-KDM5D-4 expression was shown to be associated with atherosclerosis and CAD in men [62].

A total of 14 GO function terms were enriched in the differentiated modules; the top three terms were cAMP-dependent protein kinase activity, demethylase activity, and regulation of calcium ion import. A related study found that a cAMP pulse during cardiac preservation could reduce the incidence and severity of transplant-related CAD [63]. Vasoactive intestinal peptide (VIP) may be an effective anti-RA treatment because it leads to the elevation of intracellular cAMP, which can inhibit TNF-α production in macrophages [64]. Another study indicated that a combination of cilostazol and MTX can activate the cAMP-dependent protein kinase pathway in synovial fibroblasts, resulting in the suppression of inflammation in RA [65]. Fibroblast-like synoviocytes (FLSs) are involved in RA joint destruction and pathologic processes, and elevated JMJD3 promotes the proliferation and migration of FLSs [66]. A study found that the Janus kinase-signal transducer and activator of transcription (JAK-STAT) pathway is an emerging target in inflammation, mainly in RA, and that it heightens cardiovascular risk [67]. Overexpression of the histone demethylase KDM4B could boost cell growth, migration and invasion, and inhibit apoptosis of FLSs in RA by activating STAT3 signaling [68]. Basal intracellular calcium ion concentrations in patients with inactive RA were significantly higher than in healthy individuals, which in turn were higher than in the active RA group, showing the important role of calcium ions in the pathological process of RA [69].

Among the top three of the 16 KEGG pathways, cell adhesion molecules and mineral absorption were associated with both CAD and RA. Regarding the cell adhesion molecules pathway, the expression levels of cell adhesion molecules were increased in patients with RA and were associated with disease activity, oxidative stress, and inflammatory markers; targeting the expression of these molecules is an important therapeutic strategy for RA [70, 71]. Moreover, the expression of both CDC42 and microRNA-34a was correlated with that of cell adhesion molecules in patients with CAD [72, 73]. Regarding the mineral absorption pathway, clinical trials revealed that copper concentrations were higher in patients with RA than in healthy people [74], and zinc and selenium levels in patients with CAD admitted for coronary artery bypass grafting were reduced compared with those before surgery [75].

Although we have found several related modules and susceptible genes, our study has certain limitations. Because datasets and samples involving the comorbidity of CAD and RA are limited, cross validation could not be performed. With further clinical sequencing and updated cardiovascular disease and RA-related databases, investigations should continue to validate the modular mechanism of CAD&RA. In addition, the proposed susceptible genes need further experimental and clinical validation.

In conclusion, 13 characteristic modules and 49 susceptible hub genes for CAD&RA, including ADO, ABCA11P, GALC, ZNF141, GPATCH8, ATF6 and MIA3, were identified by network modularization analysis. These hub genes and their corresponding molecular functions may reflect the underlying mechanism of CAD&RA and can provide novel perspectives for clinical therapy strategies and precise drug discovery.