Background

Ankylosing spondylitis (AS) is a rheumatic disorder of the axial skeleton that is characterized by inflammation and osteoproliferation, leading to a gradual loss of spinal mobility [1]. First identified in the late 1600s, AS is a type of spondyloarthropathy that predominantly affects young men, with a higher prevalence among white Europeans and a lower prevalence among sub-Saharan Africans [2,3,4].

AS is a highly heritable condition, and familial history has been strongly correlated with its pathogenesis in a number of twin studies [5,6,7]. Since 1973, the HLA-B27 gene, a major histocompatibility complex (MHC) class I allele, has been recognized as a major genetic risk factor for AS [8]. In fact, HLA-B27 frequency in a certain population is correlated with the prevalence of spondyloarthropathies, including AS, among its individuals [9, 10]. However, the disease is far from being monogenic in nature. In fact, the more than 100 genetic variants associated with AS account for no more than 30% of its heritability, indicating that much research needs to be carried out in this regard [11].

Despite being associated with several environmental and genetic factors, AS genetics and pathogenesis are poorly understood. The main aim of the current study was to improve the understanding of AS on the genetic level. Thus, in the present study, publicly available gene expression profiles of AS patients were analyzed using several bioinformatics techniques. A gene network was constructed that may serve as an AS gene signature for disease diagnosis and therapeutic target identification.

Methods

Data acquisition

The dataset used in the current study was obtained from The National Center for Biotechnology Information’s (NCBI) Gene Expression Omnibus (GEO) repository (accession number GSE25101) [12].

It is worthwhile to note here that this was the only dataset available for ankylosing spondylitis at the time of writing this paper. The data contained peripheral blood samples from active AS patients (n = 16) and gender- and age-matched controls (n = 16) that were profiled on the Illumina HumanHT-12 V3.0 Expression BeadChip platform. As per the original study by Pimentel-Santos et al., total RNA was extracted from whole blood samples according to standard PAXgene protocol, PAXgene Blood RNA System® tubes (Qiagen, Doncaster, VIC, Australia), was quantified and the integrity assessed by Agilent 2100 BioAnalyser (Agilent, Santa Clara, CA, USA). Only samples with RNA integrity number above 7.5 were used [12].

Patients were Portuguese, aged 21 to 64, and had Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) scores of > 4 and Bath Ankylosing Spondylitis Functional Index (BASFI) scores of > 4. They were diagnosed using the updated New York criteria. Patients were receiving NSAIDs and/or sulphasalazine only. No patients treated with TNF, corticosteroids, or methotrexate were included [12]. For further details of the study subjects and complete microarray processing, please refer to the study by Pimentel-Santos et al. [12].

Data analysis and pre-processing

Over 10,000 genes were identified as differentially expressed (DE) genes after using the GEO2R software that is available on NCBI’s website (https://www.ncbi.nlm.nih.gov/geo/geo2r/). GEO2R facilitates the R-based analysis of GEO data and presents it as a list of genes that is sorted by significance [13]. Further case-control comparison analysis of this gene list was carried out using Microsoft Excel in order to identify the most DE genes in AS, yielding a total of 114 genes. After applying a strict p-value cut-off (p < 0.05), the final gene list consisted of 97 genes, 5 genes were removed for lack of identifier, and hence the final cured list of genes was 92. One of the genes was the HLA gene locus that contained 6 genes, and these 6 genes were each entered separately instead of a one locus. The final gene list contained 97 genes which was used for the final analysis and network construction.

Functional enrichment analysis

Following that, the 97 genes were entered into three databases for gene-gene network and pathway analysis: the Molecular Signatures Database (MSigDB), the geneMANIA database, and the Reactome Pathway database.

The MSigDB is a collection of annotated gene sets where a predefined gene set can be examined for overlap with the previously annotated genes in this database https://www.gsea-msigdb.org/gsea/msigdb/index.jsp. It is used with the Gene set enrichment analysis software (GSEA). GSEA is an analytical software that extracts insights from RNA expression analysis using gene sets, which are groups of genes with common function, location, or regulation (https://www.gsea-msigdb.org/gsea/index.jsp). For this first tool, the 97 genes were compared for overlap with three main gene set categories in MSigDB, the Gene Ontology (GO), Hallmark collection, and the canonical pathways (CP).

Similarly, GeneMANIA is an online tool that is used to predict the function of poorly understood genes as well as the genes that are most likely to be functionally similar (http://genemania.org/). For this second tool, the top 20 DE genes were input, in addition to the genes that were most downregulated (fold change < 0.6 and p-value < 0.05) and the genes that were most upregulated (fold change > 1.24 and p-value < 0.05).

For the next tool, Reactome which is an open-source and peer-reviewed database that facilitates stable over-representation analysis of genome-wide datasets (https://reactome.org/) [14,15,16,17,18], the full list of 97 genes were input for analysis.

Lastly, the Search Tool for the Retrieval of Interacting Genes (STRING) database, v. 11.0, was used to analyze high-confidence (> 0.7) protein-protein network interactions among the 97 DE genes [19]. The STRING database scores protein-protein interaction information from public sources and complements this data with computational predictions (https://string-db.org/).

Results

Identification of differentially expressed (DE) genes

Scrutinization of the 10,000 DE genes yielded 97 genes with a significant level of DE. All genes were deemed significant based on the parametric p-value only, as none were significant in terms of the adjusted p-value. Genes with no identifier were not included in the analysis. The most significantly DE genes were the ITM2A, ICOS, VSIG10L, CD59, TRAC, and CTLA-4 genes, among others (Supplementary file 1). However, the most significantly downregulated genes (fold change [FC] < 0.5 and p-value < 0.05) were the HLA histocompatibility gene locus: HLA-DRB6, HLA-DRB5, HLA-DRB4, HLA-DRB3, HLA-DRB1, HLA-DQB1, the integral membrane protein 2A gene (ITM2A), and cytotoxic T-lymphocyte associated protein 4 gene (CTLA-4).

Pathway analysis (MSigDB overlaps)

Figure 1 illustrates the number of genes in each of the results of the overlap with the MSigDB genesets. Tables 1, 2, and 3 specify the genes in the overlaps. Namely, overlaps were most common for the CD3E, IL2RA, CD27, CTLA-4, and ICOS genes. It is interesting to note here that the top overlapping genes were associated with regulation of the immune system processes, parts of cell membrane, and signaling related to T cell receptor and antigen receptor, in addition to some overlaps related to the IL2 STAT signaling, as well as the androgen response.

Fig. 1
figure 1

Analysis of the overlap of the cured genes with the three annotated sets in MsigDB Gene ontology BP and CC (GO), b Hallmark, and c canonical pathways (CP). The y-axis depicts the number of overlapping genes and the x-axis lists the pathway terms. Blue indicates cellular component (CC); red, biological process (BP); orange, Hallmark; and green, canonical pathways (CP)

Table 1 Analysis of the overlap of the cured genes with GO: biological processes and cellular components annotated sets in MsigDB
Table 2 Analysis of the overlap of the cured genes with Hallmark annotated sets in MsigDB
Table 3 Analysis of the overlap of the cured genes with canonical pathways annotated sets in MsigDB

Gene network analysis (GeneMania)

Three gene networks were constructed: one for the top 20 DE genes (Fig. 2), one for the most significantly downregulated genes (fold change < 0.6 and p-value < 0.05) (Fig. 3), and one for the genes that were most upregulated (fold change > 1.24 and p-value < 0.05) (Fig. 4). For the top 20 DE genes, the genes had 57.5% co-expression, 29.91% physical interactions, 7.58% co-localization. 2.51% of the gens had predicted interactions, 2.01% had shared protein domains, and only 0.49% were part of a pathway. For the downregulated genes, they had 78.04% physical interactions, 7.3% co-localization. 7.27% of the gens had predicted interactions, 4.95% were part of a pathway, 1.68% had genetic interactions, and 0.67% shared protein domains. Finally, for the upregulated genes, they had 67.64% physical interactions, 13.5% co-expression, 6.35% were predicted, 6.17% had co-localization, 4.35% were part of a pathway, 1.4% had genetic interactions, and 0.59% had shared protein domains. Functions of the genes are presented in each figure.

Fig. 2
figure 2

Construction of a gene-gene network using the top 20 genes with the most significant differential expression. Black circles with white stripes indicate genes that were entered as query terms, while black circles indicate the associated genes. The size of the circle corresponds with the number of correlations with other genes in the network

Fig. 3
figure 3

Construction of a gene-gene network using the most significantly downregulated genes (fold change < 0.6 and p-value < 0.05). Black circles with white stripes (inside circle) indicate genes that were entered as query terms, while black circles indicate the associated genes. The size of the circle corresponds with the number of correlations with other genes in the network

Fig. 4
figure 4

Construction of a gene-gene network using the most significantly upregulated genes (fold change > 1.24 and p-value < 0.05). Black circles with white stripes (inside circle) indicate genes that were entered as query terms, while black circles indicate the associated genes. The size of the circle corresponds with the number of correlations with other genes in the network

REACTOME pathway investigation

The most significantly over-represented pathways in the AS samples were the “RUNX1 and FOXP3 which control the development of regulatory T lymphocytes (Tregs)” and the “GABA receptor activation” pathways (Fig. 5).

Fig. 5
figure 5

Results of query of Reactome. Top 25 most over-represented pathways in terms of the false discovery rate (FDR)

Protein-protein interaction (PPI) network

The STRING database, v. 11.0, was utilized to identify high-confidence protein-protein interactions (PPI) of the most DE genes. With a significantly low PPI enrichment value (4.83e−6), it can be ascertained that the submitted query items are at least partially biologically connected as a group (Fig. 6).

Fig. 6
figure 6

High-confidence protein-protein interactions of the significantly differentially expressed genes. Disconnected network nodes are not shown

Discussion

HLA-B*27 is a major susceptibility allele for ankylosing spondylitis (AS), with over 95% of AS patients expressing this genotype. However, AS heritability is not fully understood, as the vast majority of individuals with the HLA-B*27 genotype do not develop AS [20]. An interplay between genetics and the environment is believed to be involved in AS, and its environmental trigger is thought to be ubiquitous [5]. In addition, the HLA-B*27 accounts for only around 20% of AS heritability, indicating the need for further scientific elucidation [20, 21]. In this study, we identified several DE genes, upregulated genes, and downregulated genes in AS. The significantly downregulated genes (p < 0.05; FC < 0.6) in AS were namely several HLA class II genes, the ITM2A gene, and the CTLA-4 gene.

The most downregulated genes in AS in the present study were HLA-DQ B1 and HLA-DR (B1, B3, B4, B5) genes as well as the HLA-DRB6 pseudogene. These genes were previously related to the inheritance in multiple sclerosis. For instance, epistasis between the HLA-DRB1 and HLA-DQB1 alleles was found to determine the susceptibility to multiple sclerosis [22]. The HLA-DQB1 gene is responsible for exogenous peptide presentation and encodes one of the two components of the HLA-DQ molecule, which regulates T cell response to islet cell autoantigens [23, 24]. In fact, HLA-DQB1 alleles have been identified as susceptibility loci for type I diabetes, celiac disease, multiple sclerosis, and narcolepsy [22, 25,26,27,28]. With regard to AS, HLA-DQB1 was previously associated with the age of onset and severity of AS in a Han Chinese population [29]. However, in Tunisian AS patients, HLA-DRB1 and HLA-DQB1 were not directly linked to the disease [30]. Among HLA-B27-negative AS patients, negative associations were observed for the HLA-DRB1*15:01, HLA-DQB1*02:01, and HLA-DQB1*06:02 alleles, but a positive association was seen for the HLA-DRB1*11 allele [31]. In one large previous study, there was an independent association with variants in the HLA-A, HLA-DPB1, and HLA-DRB1 loci with AS [32]. It is likely that the role of these genes in AS is more complex and population related, and we must keep in mind that our dataset is derived from a Portuguese population.

The second most downregulated gene was the integral membrane protein 2A (ITM2A) gene. ITM2A is primarily involved in the cellular differentiation of chondrocytes, myocytes, and osteoblasts as well as cellular autophagy [33,34,35]. In fact, enhanced ITM2A expression inhibits the chondrogenic differentiation of mesenchymal stem cells [36]. Also, ITM2A downregulation has been associated with decreased autophagy of breast and ovarian cancer cells [37, 38]. ITM2A was reported to be consistently downregulated in a meta-analysis of four AS case-control datasets [39]. Our results confirm this.

The cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) gene, also known as CD152, was the third most significantly downregulated gene in AS patients. This gene encodes the CTLA-4 surface glycoprotein, an inhibitory molecule that is constitutively expressed by T cells and which maintains tolerance to self-antigens [40]. Although the precise mechanisms are not fully understood, interactions of ligands with CTLA-4 serve to inhibit T cell responses [41]. Klocke et al. created a murine model for CTLA-4 deficiency by breeding mice in which CTLA-4 can be conditionally deleted in adulthood using tamoxifen treatment. The mice developed abnormal immune activation, multiorgan lymphocyte infiltration, and autoantibodies in a short period of time [42]. CTLA-4 polymorphisms have been significantly associated with autoimmune disorders, namely AS, Graves’ disease, and rheumatoid arthritis, in British Caucasian, Irish Caucasian, Iranian, West Algerian, and West Mexican populations but not in a Turkish population [43,44,45,46,47]. It is interesting that the HLA genes and the CTLA-4 proteins were connected through the CD3E protein (Fig. 6).

In the present study, the genes with the most overlaps in AS patients were the CD3E, IL2RA, CD27, CTLA-4, ICOS, ITGA6, SERPINE2, and CD2 genes. The CD3E gene encodes a part of the T cell receptor-CD3 (TCR/CD3) complex, which is essential in linking antigen recognition to signal transduction. CD3E has been previously associated with type I diabetes in women, but it did not show a significant association with susceptibility to coeliac disease [48, 49]. Moreover, a reduced number of immunoreceptor tyrosine-based activation motifs (ITAMs) in CD3E, among the other TCR/CD3 complex subunits, was associated with a breakdown in central tolerance in a murine model [50]. To our knowledge, CD3E was not previously related to AS.

Similarly, the interleukin 2 receptor subunit alpha (IL2RA) gene, also known as CD25, encodes a part of the IL-2 receptor, a cytokine signaling molecule that is important for microbial defense and self-recognition. In one study, IL2RA polymorphisms did not confer the risk of acute anterior uveitis with AS, while another study found that the IL2RA rs2104286 polymorphism was associated with intermediate uveitis but not HLA-B27-associated acute anterior uveitis [51, 52]. Transcriptome analysis identified IL2RA as a potential candidate gene for AS, but genotyping assays found no association between IL2RA polymorphisms and non-anterior uveitis susceptibility [53].

The CD27 gene encodes a tumor necrosis factor (TNF) receptor that is thought to act as a co-stimulatory immune checkpoint for T cell generation and maintenance [54]. The CD70 molecule belongs to the tumor necrosis factor ligand superfamily, and its transient availability on lymphocytes and dendritic cells activates B or T cells that express CD27 [55, 56]. As part of a surface ligand-receptor pair with CD70, CD27 dysregulation has been linked to inflammatory and autoimmune disease [57]. Clonally expanded T cells belonged to the CD27−/CD28− populations in 2 AS patients, while 18 newly diagnosed AS patients were found to exhibit a decreased frequency of CD27+ B cells compared to controls [58, 59].

The inducible T cell co-stimulator (ICOS) gene, also known as CD278, is part of the CD28 superfamily receptors, along with CD28 and CTLA-4. ICOS is essential for T cell function and activation, and it is expressed on activated CD4 and CD8 T cells [60, 61]. It was discovered to have important roles in T cell proliferation and cytokine secretion after it was first identified as a marker of T cell activation [62]. ICOS superinduces IL10 and co-induces a number of other cytokines [63,64,65]. ICOS dysregulation has been previously implicated in a murine model of spondyloarthritis, where it was found to be essential to maintaining the balance between IL10 and IL17 expression [66]. B cells express the ICOS ligand (ICOSL) constitutively, which is inducible in monocytes, dendritic cells, fibroblasts, and endothelial cells. Blocking of the ICOS-ICOSL interactions inhibits both T-helper (Th) type 2 and Th1-dependent responses [60]. In an AS context, conflicting findings were reported with regard to CD4+/CXCR5+/ICOS+ T cell percentages in the peripheral blood of AS patients [67,68,69,70].

The integrin alpha-6 (ITGA6) gene, which encodes a transmembrane receptor, mediates adhesion on the cell-to-cell and cell-to-matrix levels, and it is commonly upregulated in a wide range of tumors [71]. In fact, inhibition of the ITGA6 gene led to the suppression of tumor angiogenesis and gallbladder cancer growth [72]. ITGA6 has been previously identified as part of the differential expression profile of rheumatoid arthritis, and it was among the most differentially expressed genes in a murine model of AS [73, 74]. Similarly, the CD2 gene encodes a cell adhesion molecule that is expressed on the surfaces of natural killer cells and T cells as well as some sub-populations of normal B cells [75]. In rheumatoid arthritis, high levels of CD2 activation epitope T11-3 were reported on blood and synovial fluid T cells [76].

The serpin family E member 2 (SERPINE2) gene encodes a glycoprotein that is expressed on a diverse range of cell types, but it has a key role in the pathophysiology of cardiac fibrosis [77]. Little information is available on the role of the SERPINE2 gene in AS, but, in osteoarthritis, this gene has been reported to prevent cartilage catabolism via the inhibition of MMP-13 expression [78].

There are some limitations to the present study. Genes were only found to be significantly expressed when considering the parametric p-value and not the adjusted p-value. Moreover, the analyzed dataset was obtained from a Portuguese population, which might limit the extrapolation of these findings to other populations. We could assume that these patients are ethnically homogenous, as the study by Pimentel-Santos did, but we also do not have information about other diseases that these patients may have had that may have interfered with the results. Thus, the results of our analysis need to be further validated in a different set of patients and their matched controls. Since the research was retrospective in nature, we were unable to collect any additional information about the patients or the microarray characteristics. This research discovered many new genes and networks. However, further studies may be needed to validate the effect of these genes/networks in a clinical setting.

Conclusion

We have described the differential expression profile of AS patients from a previously published dataset using recent analysis techniques. Our data pointed to the involvement of immune system processes and inflammatory processes, as well as other processes and pathways that were not described in the original article. These pathways and networks once validated might serve as biomarkers for AS and can potentially help in diagnosing the disease and identifying future targets for treatment.