Introduction

Scleroderma (systemic sclerosis, SSc) is a chronic connective tissue disease characterised by fibrosis of the skin and internal organs, vascular damage and immune dysregulation [1]. It shows marked heterogeneity in clinical manifestations and disease course, and its pathogenesis remains poorly understood. SSc carries one of the highest mortality rates among connective tissue diseases, with interstitial lung disease (ILD) being the leading cause of death [2]. Although the majority of SSc-ILD patients have relatively mild and/or stable lung disease, a substantial minority have progressive lung fibrosis [2]. The molecular pathways underlying the development and progression of SSc-ILD are currently unknown, but are likely to be driven by an interaction between predisposing genetic factors and environmental triggers. Identification of the genetic determinants of lung fibrosis in SSc could improve understanding of pivotal molecular pathways, potentially leading to better prognostic and therapeutic tools for SSc-ILD.

Evidence for a genetic predisposition to SSc as a whole includes a higher prevalence in first-degree relatives and variation in prevalence among different ethnic groups. Twin studies have revealed a strong genetic influence on antinuclear antibody status, which is in turn linked with internal organ involvement: anti-topoisomerase antibodies (ATA) are strongly associated with development of SSc-ILD, whereas anti-centromere antibodies (ACA) are protective against ILD. A number of genes have been consistently associated with SSc as a whole. As in other autoimmune diseases, there is a strong effect of the HLA (human leukocyte antigen) region, mainly with specific autoantibodies [3]. Immune response–related genes are among the most consistently replicated non-HLA associations, including interferon regulatory factor 5 (IRF5) [4, 5], signal transducer and activator of transcription 4 (STAT4) [6, 7], and cell receptor CD3ζ (CD247) [8].

A smaller number of studies have looked at SSc-ILD specifically, although conflicting evidence is reported, with only a few associations replicated in more than one study [9]. Genetic associations reported as specific to SSc-ILD in more than one cohort include IRF5, STAT4, DNAX accessory molecule 1 (CD226) and interleukin-1 receptor-associated kinase-1 (IRAK1). A number of single nucleotide polymorphisms (SNPs) in IRF5 have been associated with SSc-ILD, including in a French [10, 11] and a Han Chinese [12] population. IRF5 SNP rs4728142 was associated with improved survival in SSc [4]. A SNP in STAT4, rs7574865, has also been associated with SSc-ILD in a French [13] and a Han Chinese [7] population. CD226 SNP rs763361 was significantly associated with SSc-ILD in a meta-analysis of three European populations, with a trend towards significance when each population was analysed separately [14]. The minor allele of rs1059702 in IRAK1, on the X chromosome, results in increased NF-κB activity. In two studies, each comprising a meta-analysis of three European populations, rs1059702 was associated with SSc-ILD [15, 16].

In this study, we focused on genes previously reported as risk factors for SSc-ILD in more than one population, and selected the SNPs which had been most consistently associated [9]. We also sought to determine their association with mortality and ILD progression.

Materials and methods

Study populations

DNA samples were collected from consecutive, unrelated SSc patients attending clinics at the Royal Brompton and Royal Free Hospitals, London. Diagnoses were made according to well-defined criteria for SSc [17]. Only individuals of European descent were included. The control population (n = 503) comprised individuals of European descent from the publicly available 1000 Genomes Project [18].

Clinical assessment

ILD was defined as the presence of fibrosis on chest imaging (chest X-ray or HRCT) and/or a forced vital capacity (FVC) < 75%. Pulmonary function tests (expressed as percent predicted) from the time of first presentation at the Royal Brompton Hospital were available for 578 patients. As a marker of ILD severity which adjusts for the extent of emphysema, the composite physiological index (CPI) was calculated as CPI = 91.0 − (0.65 × DLCO% predicted) − (0.53 × FVC% predicted) + (0.34 × FEV1% predicted) [19]. Time to decline was quantified using serial pulmonary functional indices starting from first visit. Significant functional deterioration was defined as a decline (quantified as percentage change from baseline) of ≥ 10% in FVC and/or of ≥ 15% in DLCO. To allow for possible response to treatment or spontaneous fluctuations, time to irreversible decline was used, defined as time to first significant change observed on at least two consecutive occasions. Data at a sufficient number of time points was available to calculate time to decline in 374 patients. All-cause mortality was also analysed (n = 553).
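As an illustration of the definitions above, the following minimal Python sketch computes the CPI from percent-predicted values and finds the time to irreversible decline from a series of visits. The function names and the example values are hypothetical, and the reading of "two consecutive occasions" (the threshold is met at one visit and again at the next) is an assumption rather than the study's documented procedure.

```python
from typing import Optional, Sequence


def composite_physiological_index(dlco_pct: float, fvc_pct: float, fev1_pct: float) -> float:
    """CPI from percent-predicted DLCO, FVC and FEV1, using the coefficients quoted in the text."""
    return 91.0 - 0.65 * dlco_pct - 0.53 * fvc_pct + 0.34 * fev1_pct


def time_to_irreversible_decline(
    times: Sequence[float], values: Sequence[float], threshold_pct: float
) -> Optional[float]:
    """Time of the first visit at which the relative decline from baseline
    reaches threshold_pct and is still present at the following visit.

    Assumption: values[0] is the baseline measurement, and "two consecutive
    occasions" means two successive visits. Returns None if no such visit exists.
    """
    baseline = values[0]
    declined = [(baseline - v) / baseline * 100.0 >= threshold_pct for v in values]
    for i in range(1, len(values) - 1):
        if declined[i] and declined[i + 1]:
            return times[i]
    return None


# Hypothetical patient: months since first visit and FVC (% predicted).
months = [0, 6, 12, 18, 24]
fvc = [80, 71, 78, 70, 69]
cpi = composite_physiological_index(dlco_pct=50, fvc_pct=70, fev1_pct=75)
fvc_decline_month = time_to_irreversible_decline(months, fvc, threshold_pct=10)
```

In this hypothetical series the 10% threshold is first crossed at 6 months, but the value recovers at 12 months, so the decline is only counted from 18 months, when it is confirmed at the next visit. The same function would be called with a threshold of 15 for DLCO.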

Genotyping

The gene locations of the seven SNPs selected for testing are shown in Fig. 1. DNA was extracted from blood using Gentra PureGene DNA kits (Qiagen). Genotyping was carried out according to the manufacturer’s instructions using a commercially available TaqMan® assay and TaqMan® universal PCR master mix, no AmpErase® UNG (Applied Biosystems), on a Rotor-Gene 6000 real-time PCR machine (Qiagen). Quality control and genotype determination were performed using the Rotor-Gene 6000 Series Software 1.7 (Corbett Research).

Fig. 1

Location of the studied SNPs in IRF5, CD226, STAT4 and IRAK1. Shown are the locations of the SNPs tested in this study in relation to the gene exons of IRF5, CD226, STAT4 and IRAK1. rs4728142 is located in the promoter region of IRF5 and rs10488631 in the downstream region of IRF5. (IRF5) Interferon regulatory factor 5, (CD226) DNAX accessory molecule 1, (STAT4) signal transducer and activator of transcription 4, (IRAK1) interleukin-1 receptor-associated kinase-1

Statistical analysis

To test for deviation from Hardy-Weinberg equilibrium (HWE), genotype frequencies were determined by direct counting, and the chi-square statistic or Fisher’s exact test was used as appropriate. Chi-square analyses for association were carried out in Unphased v 3.1. To assess the most appropriate genetic model for significant signals, logistic regression analysis was applied in Stata v 15. Only female patients (n = 483) and controls (n = 263) were included in the analysis of the X chromosome IRAK1 SNP. Bonferroni correction was applied to correct for multiple testing of seven SNPs. A corrected p value (pcorr) < 0.05 was considered significant. The current study had 80% power to detect an association with SSc as a whole with an OR of at least 1.5 for a SNP with a minor allele frequency of 0.5 at pcorr < 0.05, and 80% power to detect an association with SSc-ILD with an OR of at least 1.6. Cox proportional hazards analysis was used to evaluate time to decline in FVC, time to decline in DLCO, and mortality, as implemented in Stata v 15.1 (Computing Resource Centre).
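The Hardy-Weinberg check described above can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. This is a generic textbook formulation in Python, not the software actually used in the study; the function name and the genotype counts are hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Takes observed genotype counts (AA, AB, BB) and returns (chi2, p_value).
    Assumes both alleles are present; a monomorphic SNP gives zero expected
    counts and would raise ZeroDivisionError.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A by direct counting
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for one SNP in a control sample.
chi2, p_value = hwe_chi_square(n_aa=180, n_ab=240, n_bb=83)
```

When any expected genotype count is small, the chi-square approximation becomes unreliable, which is the situation in which the text switches to Fisher’s exact test.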

Results

A total of 612 patients were included in the study, of whom 394 had ILD. Patient demographic and clinical characteristics are shown in Table 1. The genotyping success rate for all seven SNPs was ≥ 97.7%. All seven SNPs conformed to Hardy-Weinberg equilibrium in the control population.

Table 1 Patient characteristics

As shown in Table 2, a total of three of the tested SNPs were significantly associated with SSc compared with controls. IRF5 rs2004640 T allele (OR 1.30 (95% CI 1.10–1.54), pcorr = 0.015), IRF5 rs10488631 C allele (OR 1.48 (95% CI 1.14–1.92), pcorr = 0.022) and STAT4 rs7574865 T allele (OR 1.43 (95% CI 1.18–1.73), pcorr = 0.0015) were risk factors for SSc.
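For readers unfamiliar with how an allelic odds ratio and its confidence interval are obtained, the sketch below computes both from a 2 × 2 table of allele counts. The allele counts are hypothetical (they are not taken from Table 2), and the Woolf logit interval used here is an assumption: the text does not state which interval method the association software applies.

```python
import math
from typing import Tuple


def allelic_odds_ratio(
    case_minor: int, case_major: int, ctrl_minor: int, ctrl_major: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Odds ratio for carrying the minor allele in cases versus controls,
    with a Woolf (logit) confidence interval; z = 1.96 gives about 95%.

    Assumes all four allele counts are non-zero.
    """
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    se_log_or = math.sqrt(
        1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts: each genotyped individual contributes two alleles.
odds_ratio, lower, upper = allelic_odds_ratio(
    case_minor=60, case_major=40, ctrl_minor=40, ctrl_major=60
)
```

An interval that excludes 1, as in this made-up example, corresponds to a nominally significant association before any correction for multiple testing.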

Table 2 Allele frequency in control, SSc, SSc-ILD and SSc-non ILD cohorts

No significant difference in allele frequency of the tested SNPs was observed between patients with SSc-ILD and controls (Table 2). However, the minor alleles of two SNPs in IRF5 were more frequent in SSc patients without ILD than in controls: rs2004640 T allele (OR 1.39 (95% CI 1.11–1.75), pcorr = 0.03) and rs10488631 C allele (OR 1.72 (95% CI 1.24–2.39), pcorr = 0.0098). STAT4 SNP rs7574865 T allele was also significantly associated with SSc-non ILD (OR 1.86 (95% CI 1.45–2.38), pcorr = 6.6 × 10⁻⁶). STAT4 rs7574865 T allele was significantly less frequent in SSc patients with ILD than in those without (OR 0.66 (95% CI 0.51–0.85), pcorr = 0.0084) (Table 2). A logistic regression analysis using an additive model provided comparable results (Supplementary table 1).

Given the higher proportion of females in the patient cohort, 80.9% compared with 52.3% in the control population, we performed a logistic regression with sex as a covariate and observed no change to the significance of the association with any of the variants (Supplementary table 2).

None of the seven tested SNPs were associated with mortality (Table 3). An association was seen between IRF5 rs10488631 and time to decline in FVC by ≥ 10% (OR 1.42 (95% CI 1.08–1.87), p = 0.012) and with time to decline in DLCO by ≥ 15% (OR 1.32 (95% CI 1.02–1.71), p = 0.038), although neither remained significant when Bonferroni correction was applied (pcorr = 0.084 and pcorr = 0.27 respectively). Both of these associations were present on multivariate analysis correcting for age at baseline, gender, smoking status, and disease severity (CPI) (FVC decline by ≥ 10% OR 1.34 (95% CI 1.02–1.85), p = 0.04 and DLCO decline by ≥ 15% OR 1.32 (95% CI 1.01–1.74), p = 0.044), although again neither remained significant following Bonferroni correction (pcorr = 0.28 and pcorr = 0.31, respectively) (Table 3).
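The corrected values quoted above follow from multiplying each uncorrected p value by the number of SNPs tested (seven). The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and capping the result at 1 is standard Bonferroni practice rather than something stated in the text.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float], n_tests: int) -> List[float]:
    """Bonferroni-corrected p values: p multiplied by n_tests, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


# Uncorrected p values reported above for IRF5 rs10488631:
# univariate FVC, univariate DLCO, multivariate FVC, multivariate DLCO.
reported = [0.012, 0.038, 0.04, 0.044]
corrected = bonferroni(reported, n_tests=7)
significant = [p < 0.05 for p in corrected]
```

The products (0.084, 0.266, 0.28 and 0.308) agree with the reported pcorr values after rounding, and all exceed the 0.05 threshold.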

Table 3 Relationship between individual SNPs, ILD progression and survival

Discussion

A number of genetic associations with SSc-ILD have been reported. However, conflicting evidence exists, with only a few associations replicated in more than one study [9]. We selected seven SNPs across four genes for which the most robust evidence of an association with SSc-ILD had been reported, with the aim of testing these associations in our UK-based SSc cohort of patients of European descent.

In this study, three of the tested SNPs were significantly associated with SSc as a whole, confirming previous findings [9]. However, we found no evidence that any of the seven SNPs are associated specifically with the presence of ILD. By contrast, we report the novel finding that the STAT4 rs7574865 T allele may be protective against the development of lung fibrosis in SSc patients.

Although well-replicated associations of the IRF5 and STAT4 SNPs with SSc as a whole have been reported [4, 5, 6], conflicting results exist for genetic associations specifically with SSc-ILD. A meta-analysis of five European populations found all three IRF5 SNPs to be associated with all of the tested SSc subtypes, including, as in our study, SSc without ILD [20], suggesting that the IRF5 association is with SSc as a whole, rather than specifically with ILD. Similarly, although the STAT4 SNP rs7574865 association with SSc-ILD has been reported in both a French [13] and a Han Chinese [7] population, a study of six European populations found no significant association in any of the populations individually, nor in the meta-analysis [21]. In fact, rs7574865 was associated with limited cutaneous disease but not with diffuse cutaneous disease, the phenotype more frequently associated with ILD [21]. Although a meta-analysis of three European populations found CD226 SNP rs763361 to be associated with SSc-ILD [14], a larger study, comprising patients from seven European cohorts, did not confirm an association with the individual SNP, while reporting an association with a haplotype [22]. The IRAK1 SNP rs1059702, which we found to be associated with neither SSc-ILD nor SSc as a whole, has been associated with SSc-ILD in two meta-analyses of multiple European populations, although not in the individual populations [15, 16].

The observation that the STAT4 variant is significantly less frequent in patients with SSc-ILD compared with SSc patients without ILD is interesting. However, the findings of this study will need to be replicated in independent populations. No individual SNP is currently sufficiently strongly associated with either SSc or SSc-ILD for use in clinical diagnostics. It is possible that in the future, a panel of genetic variations, possibly combined with other biomarkers, could be utilised in the clinic to aid diagnostics or prognostics, but currently no test is sufficiently powered to provide information for an individual patient.

The current study has some limitations. Although our SSc-ILD cohort was fairly large for a relatively infrequent entity, and compares favourably with published single-cohort studies reporting an association with SSc-ILD [7, 10, 12], it may have been underpowered to detect small genetic effects. This is particularly true for the IRAK1 SNP, which lies on the X chromosome, such that only female patients (n = 483) and controls (n = 263) were included in this analysis. With these sample sizes, the study had 80% power to detect an association with an OR of at least 1.97 at pcorr < 0.05 for the IRAK1 SNP, which has a minor allele frequency of 0.16. SSc-ILD is a complex disease, and a number of genetic susceptibility loci, each with modest effect, are expected to contribute to the risk of lung fibrosis. We may therefore have been unable to detect associations of small effect size, and we acknowledge that larger patient samples derived from multicentre collaborations are required to definitively characterise the genetic risk of SSc-ILD. Another limitation is the lack of matching between the patient and control populations. While all patients and controls included in the study were of European descent, we were unable to account for fine-scale population structure as we did not have genome-wide data. Information on age and smoking history of the control population was not available. Although we do not expect age to influence any of the described genetic associations, smoking history could affect our interpretation, as an apparent SNP association could reflect differences in smoking behaviour between cases and controls. As we did not have smoking information for our control population, we performed a GWAS Catalog search and verified that none of the SNPs included in this study is associated with smoking behaviour.

The biological pathways involved in the susceptibility to ILD in patients with SSc remain largely unknown. Immunosuppression has been associated with modest improvement in SSc-ILD [23, 24], suggesting that immune-mediated pathways are key drivers of lung fibrosis in this disease, and nintedanib, a multi-tyrosine kinase inhibitor, has recently been approved as treatment for SSc-ILD based upon a reduction in the decline in lung function in the large SENSCIS clinical trial [25]. The genes investigated in this study are involved in the functioning of the immune system, with most having also been associated with other autoimmune diseases. Whether genes involved in immune pathways contribute to the genetic predisposition to SSc-ILD specifically, rather than to SSc as a whole, remains to be determined. On the other hand, the genetic risk may lie within genes involved in aberrant wound healing/pro-fibrotic pathways. A progressive fibrotic phenotype despite immunosuppression is observed in a subset of patients with SSc-ILD, and anti-fibrotic treatments currently used in idiopathic pulmonary fibrosis (IPF) have recently shown promise in SSc-ILD as well [25]. However, despite some similarities with IPF and other idiopathic interstitial pneumonias (IIPs), SSc-ILD appears to be genetically distinct. Loci associated with IIPs tend to be involved with host defence, epithelial injury/dysfunction and wound healing [26]. The gain-of-function mucin 5B (MUC5B) promoter variant, the most consistent common genetic risk factor for IIPs, is not associated with SSc-ILD [27]. Similarly, other genetic susceptibility loci identified in recent genome-wide association studies in IIPs have not been confirmed in SSc-ILD cohorts [26]. By contrast, IIP-associated variants have also been found in rheumatoid arthritis-ILD and, for the MUC5B variant at least, specifically with an underlying usual interstitial pneumonia (UIP) pattern [28]. As SSc-ILD is most frequently characterised by a fibrotic non-specific interstitial pneumonia (NSIP) pattern, one can speculate that the genetic architecture may differ between these two histological patterns. However, there may instead be disease-specific genetic risks for SSc-ILD, which will require an approach focused on the development and progression of ILD in SSc patients.

The majority of previous studies have focused on SSc as a whole, with SSc-ILD investigated as a post hoc sub-analysis. These studies are therefore often underpowered to detect associations with SSc-ILD specifically. Furthermore, few have sought to detect genetic risk factors for significant ILD outcomes, including lung function decline. With the availability of large cohorts with adequate long-term lung function follow-up, it should be possible to detect specific genetic associations with a progressive fibrotic phenotype and/or potential gene variants associated with increased likelihood of response to anti-inflammatory or anti-fibrotic agents. This study highlights the need for more adequately powered studies addressing the specific question of genetic susceptibility to SSc-ILD. This will require international collaborations aimed at performing hypothesis-free genome-wide association studies specifically targeted at well-defined SSc-ILD, both cross-sectionally and longitudinally.