Background

Systemic lupus erythematosus (SLE) is a disabling multisystem rheumatic disease with substantial epidemiological variation [1]. SLE is characterized by the dysregulation of the immune system and high phenotypical diversity [2]. This phenotypical heterogeneity includes a wide range of clinical manifestations that are exemplified by the current use of multiple clinical phenotypes as criteria to diagnose the disease [3]. So far, however, little is known about the causes of this phenotypic variation. Understanding the molecular mechanisms associated with the pathogenesis of SLE phenotypes could therefore be of high relevance to develop more efficient therapeutic approaches and preventive strategies.

SLE is characterized by a strong genetic component, with a sibling recurrence rate (λs) of 8–29 and estimated heritability of approximately 66% [4]. To date, nine genome-wide association studies (GWAS) of SLE risk have been performed in European and Asian populations [5]. Together these studies have led to the identification of >40 loci associated with SLE susceptibility. Despite this extraordinary success, there is still a lack of understanding of the genetic variation that is relevant for the development of specific phenotypes within the disease. There is evidence, however, that the main clinical phenotypes in SLE aggregate in families [6], suggesting a genetic basis underlying disease heterogeneity.

To date, only a few candidate gene studies have been performed to uncover the genetics of clinical heterogeneity in SLE [7]. These studies have identified immunity-related genes associated with clinically relevant SLE phenotypes [8]. Of these, the most significant associations have been detected between renal disorder and genetic variation in the ITGAM and STAT4 genes, which have also been associated with discoid rash and oral ulceration, respectively [9, 10]. Other significant findings include the associations between renal disorder and TNFSF4, between malar rash and FCGR2A, and between hematological disorder and variation in the IL21 gene. So far, however, the genetic component for most SLE phenotypes has been only partially explained. Therefore, the analysis of genetic variation at a genome-wide scale is needed to identify additional variation associated with SLE clinical heterogeneity.

Complex traits like disease risk or clinical phenotypes have been shown to be caused by multiple genes of small effect size [11]. The identification of these small-effect genes is currently one of the major challenges in the characterization of the genetic background for disease phenotypes [12]. Importantly, single-marker GWAS do not allow the identification of genetic variants with small effect sizes, unless extremely large sample sizes are used [13]. In this common type of GWAS, a large number of markers are tested for association and, consequently, stringent significance thresholds are applied, which makes the identification of small-effect variants very difficult [14]. In addition, single-marker GWAS ignore the joint contribution of multiple genes that act coordinately in the same biological process [15]. The characterization of the genetic basis of many complex traits will therefore require the development of new powerful methods that are able to leverage biological knowledge and efficiently integrate the evidence from multiple loci with moderate to small effect sizes.

Recently, novel statistical methodologies that are able to test genetic risk associations at the pathway level have been developed [16]. Pathway-based approaches test whether sets of functionally related genes are jointly associated with a particular phenotype [17]. This methodology strongly reduces the number of association tests and, therefore, it can substantially increase the power to identify new genetic variation compared to single-marker GWAS [18]. The genome-wide pathway analysis (GWPA) has been recently used to characterize the genetic basis of several complex diseases like cancer [19]. Very recently, using the GWPA approach we have identified new genetic variation associated with psoriasis, an autoimmune disease of the skin [20]. This result confirms the utility of GWPA in the study of the genetics of autoimmune diseases.

To gain a better understanding of the genetic basis underlying phenotype heterogeneity in SLE we have performed, for the first time, a GWAS of clinical phenotypes using the GWPA approach. In this study we have analyzed a discovery cohort of 482 SLE patients of European ancestry to determine the association between 798 reference biological pathways and the main clinical phenotypes of SLE that are used as diagnostic criteria. Using an independent cohort of 425 SLE patients of the same ancestry, we have then performed a validation study of the most significant genetic pathways. Based on these results, we have performed an in silico validation analysis to evaluate the functional impact of drugs commonly used to treat the associated phenotype. Our findings provide new insights into the biological mechanisms associated with clinical phenotypes of SLE.

Methods

Study population

In the discovery stage, a total of 482 SLE patients were recruited. SLE patients were recruited from the outpatient clinics of the rheumatology departments of 15 Spanish University Hospitals belonging to the Immune-Mediated Inflammatory Disease (IMID) Consortium [21]. All patients were diagnosed by a rheumatologist. Only those patients with SLE who fulfilled ≥4 of the 1982 revised American College of Rheumatology (ACR) diagnosis criteria were included in the present study [3]. All patients included in this study were >16 years old at the time of sample collection and had >3 years of evolution from the diagnosis date. SLE patients with psoriasis, inflammatory bowel disease (Crohn’s disease or ulcerative colitis), multiple sclerosis or other rheumatic diseases like rheumatoid arthritis were excluded from the study. All SLE patients were Caucasian European with all four grandparents born in Spain.

In the validation stage, an independent cohort of 425 SLE patients was used to replicate the genetic pathways that were significantly associated with the SLE phenotypes in the discovery stage. All patients from the validation cohort fulfilled the ACR diagnostic criteria for SLE and were also collected from the IMID Consortium, following the same inclusion and exclusion criteria as for the discovery cohort. All the procedures were followed in compliance with the principles of the Declaration of Helsinki.

The main epidemiological and clinical variables of the discovery and validation cohorts are summarized in Table 1. The distribution of each variable was compared between the discovery and validation cohorts using Fisher’s exact test or Student’s t test for categorical and quantitative variables, respectively.

Table 1 Main epidemiological and clinical features of the discovery and validation patient cohorts

SLE phenotypes

The diagnosis of SLE is of major importance to guide both the disease classification and the patient therapy [22]. Given the high phenotypic heterogeneity of SLE, in order to analyze the most relevant clinical manifestations for disease diagnosis we defined the SLE phenotypes according to the established ACR diagnostic criteria for SLE [3]. Consequently, the 11 SLE phenotypes represented by the ACR diagnostic criteria were analyzed using the GWPA approach. These criteria include malar rash, discoid rash, photosensitivity, oral ulcers, arthritis, serositis, renal disorder, neurologic disorder, hematologic disorder, immunologic disorder and antinuclear antibodies. The distribution of each clinical phenotype in the discovery and replication cohorts is shown in Table 1.

DNA extraction and genome-wide genotyping in the discovery and validation patients

In the discovery stage, the genome-wide genotyping of the 482 SLE patients was performed using the Illumina Quad610 Beadchips (Illumina, San Diego, CA, USA) at the Centro Nacional de Genotipado (CeGen, Madrid, Spain). The genotyping quality control analysis was performed using PLINK software (Additional file 1: Figure S1) [23]. To evaluate the presence of potential population stratification in the SLE patient cohorts, we used the principal component analysis (PCA) implemented in the EIGENSOFT (v4.2) software [24]. Using the first 10 PCs of variation over 10 iterations, we identified 14 samples showing an outlier genetic background, which were excluded from downstream analysis (Additional file 1: Figure S1). After the quality control analysis, a final dataset of 507,051 single nucleotide polymorphisms (SNPs) and 395 SLE patients was available for the GWPA.

The validation of the two genetic pathways associated with SLE phenotypes in the discovery stage required the genotyping and analysis of a total of 1347 SNPs. Given the large number of variants to be tested and the utility of genome-wide data for accurate genetic ancestry identification, the 425 SLE patients in the validation cohort were genotyped using the same microarray platform. Genotyping for the validation stage was performed at the HudsonAlpha Institute for Biotechnology (Huntsville, AL, USA). The same quality control analysis as in the discovery stage was performed (Additional file 1). A total of 394 SLE patients and all 1347 SNPs from the two genetic pathways passed the quality control and were available for the pathway-based analysis of the validation stage.

Analysis of association between established SLE risk SNPs and SLE phenotypes

Genetic variants associated with SLE risk

The list of established genetic variants (P < 5e-8) for SLE risk was obtained from a recent GWAS meta-analysis in a case-control cohort of European ancestry [5]. A total of 43 genetic variants associated with SLE risk were identified and selected for the analysis of association with SLE clinical phenotypes (Additional file 1: Table S6).

Imputation of genetic variants associated with SLE risk

From the established autosomal SLE risk SNPs (N = 41 SNPs), the genetic variants that were not directly genotyped by the GWAS Quad610 genotyping array (N = 17 SNPs) were imputed (Additional file 1). Those SNPs that did not pass the stringent imputation quality control filter (N = 1 SNP, information quality metric <0.8) were excluded from the study. Therefore, after excluding two non-autosomal variants and a low-quality imputed SNP, a total of 40 from the initial 43 established SLE risk SNPs were finally available for analysis of association with SLE phenotypes in the discovery cohort. In the validation cohort, the same procedure was followed to obtain the genotypes of the risk variants to be tested for replication.

Statistical association analysis

The statistical association analysis between the allele dosage of the established SLE risk SNPs and the SLE clinical phenotypes was performed using the logistic regression model implemented in the SNPTEST v2 software (Oxford, UK) [25]. In this model, the allele dosage was defined as follows:

$$ \mathrm{Allele}\ {\mathrm{Dosage}}_{\mathrm{i}}={\displaystyle \sum_{\mathrm{g}=0}^2} \Pr \left(\mathrm{G}=\mathrm{g}\right)*\mathrm{g} $$

where g represents each possible genotype (0, 1 or 2 copies of the tested allele) of a particular genetic variant i and Pr(G = g) is the marginal posterior probability of that genotype obtained by imputation. The allele dosage takes values between 0 and 2. SLE patients without phenotypical data available for the phenotype analyzed were excluded from the association analysis. Finally, the P values obtained from the discovery and replication stages were combined using the METAL software [26].
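As an illustration of the allele dosage definition above, the following minimal Python sketch computes the expected allele count from three posterior genotype probabilities. The function name and the example probabilities are hypothetical and are not taken from the study; the actual dosages were computed by the imputation and association software.

```python
def allele_dosage(genotype_probs):
    """Expected allele count for one SNP in one patient.

    genotype_probs: (Pr(G=0), Pr(G=1), Pr(G=2)), the posterior genotype
    probabilities produced by imputation (hypothetical input format).
    """
    if len(genotype_probs) != 3:
        raise ValueError("expected three genotype probabilities")
    if abs(sum(genotype_probs) - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    # Sum over g = 0, 1, 2 of Pr(G = g) * g
    return sum(g * p for g, p in enumerate(genotype_probs))


# Hypothetical imputed genotype that is most likely heterozygous.
example_dosage = allele_dosage((0.10, 0.80, 0.10))
```

A directly genotyped SNP corresponds to the special case in which one probability equals 1, so that the dosage is exactly 0, 1 or 2.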

GWPA

GWPA method

Gene-set analysis, also referred to as pathway analysis, is a very powerful methodology to analyze the genetic architecture of complex diseases using GWAS data [27, 28]. An important advantage of this approach is that the hypothesis space is significantly reduced compared to single-marker GWAS. While an extremely large number of markers (>500,000 to several millions) are independently tested for association in single-marker GWAS, the number of simultaneous tests is several orders of magnitude lower in pathway-based GWAS (typical range 500–2000 pathways). Consequently, the threshold for significance is much less stringent than the consensus threshold used for single-marker GWAS (P < 5e-8) [18, 27]. In addition, pathway-based studies integrate the effects of multiple genetic risk variants that participate in the same biological processes. For these reasons, pathway-based studies can have high statistical power to discover new susceptibility genetic variants provided that they operate within the analyzed pathways.
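The difference in stringency can be illustrated with a simple Bonferroni calculation. This is only a sketch for intuition: the present study corrects for multiple testing with the false discovery rate, not with Bonferroni, and the figure of one million independent tests is the usual assumption behind the 5e-8 consensus threshold rather than a value from this study.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumption: about one million independent common-variant tests genome-wide.
single_marker = bonferroni_threshold(1_000_000)

# 798 pathways were tested per phenotype in the present GWPA.
pathway_level = bonferroni_threshold(798)

# How many times less stringent the pathway-level threshold is.
fold_difference = pathway_level / single_marker
```

Under these assumptions the pathway-level threshold is roughly three orders of magnitude less stringent than the single-marker one.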

In the present study, the GWPA was performed using the set-based test implemented in the PLINK software as described previously [20, 23]. Unlike other methods, this set-based test uses genotype data, rather than single-marker P values, to estimate pathway association. Importantly, this approach accounts for the linkage disequilibrium between SNPs and therefore avoids an increase in false positive results due to genes with multiple, highly correlated markers. For each pathway, independent SNPs are first identified (linkage disequilibrium of r² < 0.2 here), and from these an average statistic is calculated. Finally, the statistical significance of the pathway is computed using permutation, thereby efficiently correcting for the number of SNPs within the pathway (Additional file 1). As described for the analysis of association with established SLE risk SNPs, those patients with missing data for the phenotype tested for association were excluded from the analysis. In order to account for multiple testing, the false discovery rate (FDR) method was used. The corrected empirical P values from the discovery and validation stages were combined using Fisher’s method.
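Fisher’s method for combining independent P values can be sketched as follows. The statistic is -2 times the sum of the log P values, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The function name and the example values are hypothetical; the sketch relies on the closed-form chi-square tail for even degrees of freedom and therefore needs only the Python standard library.

```python
import math


def fisher_combined_p(p_values):
    """Combine k independent P values with Fisher's method."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must lie in the interval (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function for even degrees of freedom 2k:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical discovery and validation P values for one pathway.
combined = fisher_combined_p([0.04, 0.03])
```

For two P values the result reduces to p1 * p2 * (1 - ln(p1 * p2)), which offers a quick manual check.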

Gene set definition

The reference biological pathway annotation databases BioCarta, Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome were used for the present study [29]. A total of 217, 186 and 674 curated biological pathways from the BioCarta, KEGG and Reactome databases, respectively, were included (5th October 2015). Very small uninformative pathways (i.e. ≤15 genes) were excluded from the analysis. As described previously, we also excluded large genetic pathways (i.e. >300 genes) [17]. The SNP-gene mapping was performed using the NCBI RefSeq database release 63 (12th October 2015) and an SNP-gene distance window of 20 Kb [30]. The final gene set used for the present GWPA was composed of 211,724 SNPs mapping to 798 different pathways. The list of genetic pathways included in the GWPA is shown in Additional file 1: Table S7.
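The pathway size filter and the SNP-gene mapping described above can be sketched in a few lines of Python. The data structures, function names and toy coordinates are hypothetical, and treating the 20 Kb window as inclusive on both flanks of the gene is an assumption; the actual mapping used the RefSeq gene coordinates.

```python
WINDOW_BP = 20_000   # SNP-gene distance window stated in the text
MAX_GENES = 300      # pathways with more than 300 genes are excluded
SMALL_CUTOFF = 15    # pathways with 15 genes or fewer are excluded


def keep_pathway(genes):
    """Return True if the pathway size passes both filters."""
    return SMALL_CUTOFF < len(genes) <= MAX_GENES


def snps_for_gene(gene, snps, window=WINDOW_BP):
    """Names of SNPs lying within the window around one gene.

    gene: (chromosome, start, end); snps: {name: (chromosome, position)}.
    Assumption: the window extends the gene on both sides, inclusive.
    """
    chrom, start, end = gene
    return sorted(
        name
        for name, (snp_chrom, pos) in snps.items()
        if snp_chrom == chrom and start - window <= pos <= end + window
    )


# Toy example with made-up coordinates.
toy_gene = ("1", 100_000, 110_000)
toy_snps = {
    "rs_inside": ("1", 105_000),
    "rs_upstream": ("1", 85_000),
    "rs_far": ("1", 50_000),
    "rs_other_chrom": ("2", 105_000),
}
mapped = snps_for_gene(toy_gene, toy_snps)
```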

In silico analysis of VEGF pathway genes after treatment with topical immunotherapies for cutaneous SLE

In the GWPA, we identified significant genetic association between oral ulceration and the VEGF pathway. The VEGF pathway plays a crucial role in angiogenesis, and there is increasing evidence supporting the implication of this biological process in the pathogenesis of SLE cutaneous phenotypes [31]. These disease phenotypes are commonly treated with steroid and non-steroid topical immunotherapies in SLE [32, 33]. Consequently, we hypothesized that the topical immunotherapies prescribed for cutaneous SLE mediate their therapeutic effect in this tissue through the VEGF pathway and, therefore, should induce significant transcriptional changes in the pathway genes. In order to test this hypothesis, we used transcriptional data from microarray experiments in the NCBI Gene Expression Omnibus microarray database (GEO, https://www.ncbi.nlm.nih.gov/geo/). In this database, we searched for whole genome expression profiling datasets generated from cutaneous/mucocutaneous human samples or cell cultures (5 November 2015). From these, we looked for tissue or cell cultures treated with any of the steroid and non-steroid topical immunotherapies most widely used in SLE (Additional file 1). We found a total of three datasets analyzing the transcriptional variation after treatment with four common immunotherapies: betamethasone valerate and pimecrolimus (GSE32473), diphencyprone (GSE52360) and imiquimod (GSE68182). The first two transcriptional datasets (i.e. GSE32473 and GSE52360) were obtained from skin biopsies and the latter (GSE68182) from an in vitro study on vaginal mucosal cells (i.e. cell line Vk2/E6E7). For each gene expression dataset, we performed quality control analysis and subsequent normalization on the log2 scale using the quantile normalization method. The analysis of differential expression of the VEGF pathway genes between treated and non-treated samples was performed using Student’s t test. The statistical significance of the global perturbation of the VEGF pathway was assessed using the binomial test. All analyses were performed using the R statistical software [34].
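The binomial test for pathway perturbation can be sketched as follows. The sketch asks how likely it is to observe at least k significant genes among n pathway genes if each gene had only a 5% chance of reaching P < 0.05 under the null hypothesis. The null probability of 0.05 and the one-sided upper tail are assumptions, since the exact settings of the test are not stated in the text; the counts (16 of 29 genes for imiquimod) are those reported in the Results.

```python
import math


def binomial_upper_tail(k, n, p_null):
    """P(X >= k) for X ~ Binomial(n, p_null)."""
    return sum(
        math.comb(n, i) * p_null**i * (1.0 - p_null) ** (n - i)
        for i in range(k, n + 1)
    )


# Assumption: each gene is a false positive with probability 0.05 under the null.
p_imiquimod = binomial_upper_tail(16, 29, 0.05)
```

Under these assumptions the tail probability is of the same order of magnitude as the P value reported for imiquimod, although the sketch is not guaranteed to reproduce the published figure exactly.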

Results

Phenotypic characterization of the studied cohorts

The epidemiological and phenotypical characteristics of the discovery and replication SLE populations are shown in Table 1. There were no significant differences between the discovery and replication cohorts in the distribution of the epidemiological and phenotypical variables (P > 0.05, Table 1).

Identification of SLE risk genetic variants associated with SLE phenotypes

In the discovery stage, we found that 19 out of the 43 SNPs previously associated with SLE risk were also significantly associated with one or more clinical phenotypes (P Discovery < 0.05, Table 2). The association results between each established genetic variant and SLE phenotype are summarized in Additional file 1: Table S1. However, in the independent validation cohort, only the associations between PTPN22 and hematologic disorder (P Replication = 0.043, P Combined = 8.25e-4) and between PTPN22 and the production of antinuclear antibodies (P Replication = 0.028, P Combined = 0.001) were significantly replicated. Combining the statistical evidence from the two cohorts, an additional seven loci were found to be associated with SLE phenotypes at the nominal level (P Combined < 0.05, Table 2).

Table 2 SLE risk SNPs association with clinical phenotypes

Identification of genetic pathways associated with SLE phenotypes

In the GWPA, two genetic pathways were found to be significantly associated with an SLE phenotype after multiple test correction (P value for false discovery rate (P FDR) < 0.05, Table 3). The VEGF pathway was associated with the presence of oral ulcers (P FDR = 0.044) and the RIG-I/MDA5 negative regulation signaling pathway was associated with the production of antinuclear antibodies (P FDR = 0.016). The results for analysis of association between each genetic pathway and SLE phenotype are shown in Additional file 1: Table S2.

Table 3 Genetic pathways associated with the SLE phenotypes in the discovery stage

Using the independent validation cohort, the association between the VEGF genetic pathway and oral ulcers in SLE was significantly replicated (P FDR = 0.026, Table 3). The details of association between the VEGF pathway and oral ulcers are shown in Additional file 1: Table S3.

Perturbation of the VEGF genetic pathway by current therapies for cutaneous SLE

Topical immunotherapies are drugs commonly used to treat cutaneous phenotypes of SLE like oral ulceration. Given the observed genetic association between oral ulcers and VEGF pathway, we performed an in silico analysis to evaluate the effect of four current topical immunotherapies on the expression of its constituent genes. Using whole genome expression datasets from patients and relevant cell types treated with these therapies, we found that three out of the four analyzed drugs significantly perturb the expression of VEGF pathway genes (Fig. 1, Additional file 1: Table S4). A total of 16, 12 and 7 genes out of the 29 genes from the VEGF pathway were significantly differentially expressed after imiquimod (P = 5.38e-14), betamethasone valerate (P = 5.69e-9) and diphencyprone (P = 4.59e-4) treatment, respectively (Additional file 1: Table S5).

Fig. 1 Vascular endothelial growth factor (VEGF) pathway perturbation after topical immunotherapy. Network representation of VEGF genes according to the differential gene expression after treatment with common topical immunotherapies: imiquimod (a), betamethasone valerate (b), diphencyprone (c) and pimecrolimus (d). Genes are represented as nodes and are connected by edges according to experimental or computational evidence of interaction between their encoded proteins. The diameter of each node is proportional to the significance of differential expression, with significant genes (P < 0.05) in red and non-significant genes (P ≥ 0.05) in blue

Discussion

One of the major challenges in the pathogenesis of SLE is to understand the biological mechanisms responsible for its phenotypic heterogeneity. Although single-marker GWAS have successfully identified a large number of genetic variants associated with SLE risk, the genetic basis of SLE phenotypes has so far been analyzed only in a few candidate gene studies. In order to identify new genetic variation, we have here performed the first GWAS on SLE phenotypes using a pathway-based approach. Using a discovery cohort of individuals with European ancestry and a validation cohort with the same ancestry, we have identified and validated the association between VEGF genetic pathway and oral ulcers, a common manifestation of SLE. The results of this study provide new insights into the genetic basis and the biological mechanisms associated with the clinical heterogeneity of SLE.

The VEGF pathway is a network of genes that are involved in the transduction of different intracellular signals and act coordinately to modulate inflammatory and angiogenic processes [35, 36]. The dysregulation of angiogenesis has been described as an important biological mechanism in the pathogenesis of SLE [37]. In particular, there is growing evidence that angiogenesis is also involved in the development of skin manifestations in SLE patients [38, 39]. The serum levels of VEGFA protein itself have been suggested as a useful marker for disease activity monitoring in SLE patients [40]. Importantly, the serum levels of VEGFA have also been found to be significantly elevated in SLE patients with cutaneous manifestations [41]. Despite this clinical evidence, the VEGF pathway had not been previously associated with SLE at the genetic level. Our study provides the first evidence of a genetic association between the VEGF pathway and oral ulceration in SLE.

Oral ulcers are frequently chronic mucocutaneous lesions that affect up to 54% of patients with SLE [42, 43]. This clinical manifestation is characterized by high angiogenic activity and a loss of the epithelial and connective tissue in the oral mucosa [44, 45]. Accordingly, anti-angiogenic therapies like thalidomide have been suggested to promote oral ulcer healing and to control ulcer recurrence [46, 47]. From a clinical perspective, oral ulceration has been associated with an increase in the disease activity and a worse prognosis in SLE [48, 49]. The early detection of oral ulcers is therefore highly valuable as it contributes to an earlier diagnosis of SLE and, consequently, to faster initiation of treatment.

The genetic association identified in this study is consistent with previous evidence from other ulcer-related diseases like Behçet disease (BD), recurrent aphthous ulceration (RAU) and gastroduodenal ulcers. BD is an inflammatory disorder characterized by an extremely high frequency of oral ulcers (>95%) [50]. Clinical evidence suggests that the VEGF cytokine could be directly implicated in the formation of oral ulcers in BD [51]. In RAU, the most common oral mucosal disease, the salivary levels of VEGF have also been associated with oral ulceration [52]. Finally, genetic variation in the VEGF gene has also been associated with the risk of developing gastroduodenal ulcers [53]. Evidence from these studies implicates angiogenesis in the pathophysiological development of oral ulcers, at both the genetic and the functional level. Accordingly, the genes in the VEGF pathway are strong candidates for susceptibility in diseases with a high prevalence of oral ulcers like BD or RAU. Future studies aimed at testing the association between VEGF pathway genes and these diseases are therefore warranted.

Topical steroid and non-steroid immunotherapies have been successfully used for the treatment of the cutaneous manifestations in SLE. However, topical immunotherapies are not exempt from side effects, including skin atrophy or telangiectasias with corticosteroids and the exacerbation of the inflammatory processes with non-steroid immunotherapies like imiquimod [32]. In this study, we have shown that topical steroid and non-steroid immunomodulators significantly affect the expression of the VEGF pathway genes. The results of this in silico analysis indicate that the VEGF genetic pathway could be a key mediator of the benefits of topical immunotherapy to reduce oral ulceration. Therefore, our findings suggest that the VEGF pathway is a relevant source of new drug targets for oral ulceration that could allow more specific treatment while reducing the undesirable side effects of current therapies. In order to confirm the VEGF pathway as a new source for drug discovery in the treatment of SLE oral ulceration, further prospective studies using oral ulcer samples from SLE patients are clearly needed.

In the present study, we have also identified and validated the association of the SLE risk locus PTPN22 with the production of antinuclear antibodies and with the presence of a hematologic disorder. It is the first time that this coding SNP (rs2476601) has been associated with the development of hematologic disease in SLE. A previous study reported a non-significant trend for association between PTPN22 and antinuclear antibody positivity [54]. The results from our study provide the evidence to confirm the association between this susceptibility gene and the production of a common autoantibody in SLE. Also, the previously identified association between genetic variation in the TNFSF4 gene and renal disorder was replicated in the discovery cohort [9]. Conversely, the reported association between renal disorder and the ITGAM and STAT4 genes was not replicated. The renal disorder phenotype encompasses different clinical manifestations (e.g. persistent proteinuria or cellular casts) and, therefore, differences in frequencies in any of these sub-phenotypes could have prevented the replication. Additional studies performing specific analysis of association for each renal disease subtype could help to further refine this genetic association. Oral ulceration was also previously found to be associated with variation in STAT4. This association was not replicated in the present single-marker analysis. The lack of replication could be explained by the comparatively smaller sample size of our study cohorts and by the small effect size reported for the association (OR ~ 1.12). For example, in the European cohort, the STAT4 association was not statistically significant despite analyzing >4,000 individuals [9]. Finally, after combining both patient cohorts, we also found seven other SLE risk loci (IL10, IKZF2, MHC class III, UHRF1BP1, ETS1-FLI1, SH2B3 and SLC15A4) to be nominally associated with different phenotypes. These also represent new genetic associations with SLE heterogeneity. Further studies using independent cohorts of phenotypically well-characterized SLE patients like the present one are needed to corroborate these associations.

GWPA represents a new and powerful approach to identify genetic variation associated with complex phenotypes. However, this study has limitations. First, the sample size of the discovery and replication cohorts is moderate compared to recent case-control GWAS and this could have led to missed pathway associations with SLE clinical phenotypes. Second, the statistical power to detect significant associations is lower for those clinical phenotypes with more extreme frequencies (e.g. 7% of SLE patients have a neurologic disorder). Therefore, larger cohorts of well-characterized SLE patients will be needed to identify additional genetic variation associated with clinical phenotypes. Finally, the GWPA methodology also has intrinsic limitations, mainly related to the current knowledge of biological processes and subsequent definition of the pathways [55]. For many human genes, the functional annotation is far from complete, which precludes their mapping to reference biological pathways. Also, genomic variation located far from the transcribed region itself could be relevant for the regulation of the gene expression. Conversely, variants within genes could be influencing the activity of other distant genes or genes in other chromosomes (i.e. trans-regulation). With the increase in knowledge of genomic regulation [56], the mapping of SNPs to their functionally related genes will clearly improve. Consequently, the integration of this knowledge into GWPA will likely increase the power of this approach to identify new pathways associated with human diseases.

Conclusions

In conclusion, we have performed, for the first time, a genome-wide association analysis to identify genetic risk factors for the main phenotypes in SLE. To do this, we have used a pathway-based analysis approach. Using this approach, we have identified and validated the association between the VEGF genetic pathway and the presence of oral ulcers in SLE. These findings show the existence of a genetic basis underlying SLE heterogeneity that is independent from the genetic component associated with disease risk. The results of the present study could contribute to the development of more efficient therapies to treat cutaneous manifestations of SLE in the near future.