Defining genetic risk factors for scleroderma-associated interstitial lung disease

Although several genetic associations with scleroderma (SSc) are defined, very little is known about genetic susceptibility to SSc-associated interstitial lung disease (SSc-ILD). A number of common polymorphisms have been associated with SSc-ILD, but most have not been replicated in separate populations. Four SNPs in IRF5, and one in each of STAT4, CD226 and IRAK1, selected as having been previously the most consistently associated with SSc-ILD, were genotyped in 612 SSc patients of European descent, of whom 394 had ILD. The control population (n = 503) comprised individuals of European descent from the 1000 Genomes Project. After Bonferroni correction, two of the IRF5 SNPs, rs2004640 (OR (95% CI) 1.30 (1.10–1.54), pcorr = 0.015) and rs10488631 (OR 1.48 (1.14–1.92), pcorr = 0.022), and the STAT4 SNP rs7574865 (OR 1.43 (1.18–1.73), pcorr = 0.0015) were significantly associated with SSc compared with controls. However, none of the SNPs were significantly different between patients with SSc-ILD and controls. Two SNPs in IRF5, rs10488631 (OR 1.72 (1.24–2.39), pcorr = 0.0098) and rs2004640 (OR 1.39 (1.11–1.75), pcorr = 0.03), showed a significant difference in allele frequency between controls and patients without ILD, as did STAT4 rs7574865 (OR 1.86 (1.45–2.38), pcorr = 6.6 × 10⁻⁶). A significant difference between SSc with and without ILD was only observed for STAT4 rs7574865, which was less frequent in patients with ILD (OR 0.66 (0.51–0.85), pcorr = 0.0084). In conclusion, IRF5 rs2004640 and rs10488631, and STAT4 rs7574865 were significantly associated with SSc as a whole. Only STAT4 rs7574865 showed a significant difference in allele frequency in SSc-ILD, with the T allele being protective against ILD.

Key points
• We confirm the associations of the IRF5 SNPs rs2004640 and rs10488631, and the STAT4 SNP rs7574865, with SSc as a whole.
• None of the tested SNPs were risk factors for SSc-ILD specifically.
• The STAT4 rs7574865 T allele was protective against the development of lung fibrosis in SSc patients.
• Further work is required to understand the genetic basis of lung fibrosis in association with scleroderma.

Electronic supplementary material
The online version of this article (10.1007/s10067-019-04922-6) contains supplementary material, which is available to authorized users.


Introduction
Scleroderma (SSc) is a chronic connective tissue disease characterised by fibrosis of the skin and internal organs, vascular damage and immune dysregulation [1]. SSc is characterised by marked heterogeneity of clinical manifestations and disease course, and its pathogenesis remains poorly understood. SSc carries one of the highest mortality rates among connective tissue diseases, with interstitial lung disease (ILD) being the leading cause of death [2]. Although the majority of SSc-ILD patients have a relatively mild and/or stable lung disease, a substantial minority have progressive lung fibrosis [2]. The molecular pathways which underlie development and progression of SSc-ILD are currently unknown, but are likely to be driven by an interaction between predisposing genetic factors and environmental triggers. Identification of the genetic determinants of lung fibrosis in SSc could improve understanding of pivotal molecular pathways, potentially leading to better prognostic and therapeutic tools for SSc-ILD.
Evidence for a genetic predisposition to SSc as a whole includes a higher prevalence in first degree relatives, and variation in prevalence among different ethnic groups. Twin studies have revealed a strong genetic influence on antinuclear antibody status, in turn linked with internal organ involvement, with ATA antibodies strongly associated with development of SSc-ILD, and ACA antibodies protective for ILD. A number of genes have been consistently associated with SSc as a whole. Similarly to other autoimmune diseases, there is a strong effect of the HLA (human leukocyte antigen) region, mainly with specific autoantibodies [3]. Immune response-related genes are among the most consistently replicated non-HLA associations, including interferon regulatory factor 5 (IRF5) [4,5], signal transducer and activator of transcription 4 (STAT4) [6,7], and cell receptor CD3ζ (CD247) [8].
A smaller number of studies have looked at SSc-ILD specifically, although conflicting evidence is reported, with only a few associations replicated in more than one study [9]. Genetic associations reported as specific to SSc-ILD in more than one cohort include IRF5, STAT4, DNAX accessory molecule 1 (CD226) and interleukin-1 receptor-associated kinase-1 (IRAK1). A number of SNPs in IRF5 have been associated with SSc-ILD, including in a French [10,11] and a Han Chinese [12] population. IRF5 SNP rs4728142 was associated with improved survival in SSc [4]. A SNP in STAT4, rs7574865, has also been associated with SSc-ILD in a French [13] and a Han Chinese [7] population. CD226 SNP rs763361 was significantly associated with SSc-ILD in a meta-analysis of three European populations, with a trend towards significance when each population was analysed separately [14]. The minor allele of rs1059702 in IRAK1, on the X chromosome, results in increased NF-κB activity. In two reported studies, each comprising a meta-analysis of three European populations, rs1059702 was associated with SSc-ILD [15,16].
In this study, we focused on genes previously reported as risk factors for SSc-ILD in more than one population, and selected the SNPs which had been most consistently associated [9]. We also sought to determine their association with mortality and ILD progression.

Study populations
DNA samples were collected from consecutive, unrelated SSc patients attending clinics at the Royal Brompton and Royal Free Hospitals, London. The diagnoses were made from well-defined criteria for SSc [17]. Only individuals of European descent were included. The control population (n = 503) comprised individuals of European descent from the publicly available 1000 Genomes Project [18].

Clinical assessment
ILD was defined as the presence of fibrosis on chest imaging (chest X-ray or HRCT) and/or a forced vital capacity (FVC) < 75%. Pulmonary function tests (expressed as percent predicted) from the time of first presentation at the Royal Brompton Hospital were available for 578 patients. As a marker of ILD severity which adjusts for the extent of emphysema, the composite physiological index (CPI) was calculated as CPI = 91.0 − (0.65 × DLCO% predicted) − (0.53 × FVC% predicted) + (0.34 × FEV1% predicted) [19]. Time to decline was quantified using serial pulmonary functional indices starting from first visit. Significant functional deterioration was defined as a decline (quantified as percentage change from baseline) of ≥ 10% in FVC and/or of ≥ 15% in DLCO. To allow for possible response to treatment or spontaneous fluctuations, time to irreversible decline was used, defined as time to first significant change observed on at least two consecutive occasions. Data at a sufficient number of time points was available to calculate time to decline in 374 patients. All-cause mortality was also analysed (n = 553).
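The CPI formula and the definition of irreversible decline given above can be illustrated with a minimal Python sketch. The function names, the input layout (time, value pairs) and the example values are hypothetical and are not taken from the study. The sketch assumes that "percentage change from baseline" means relative change, and it reads "two consecutive occasions" as two successive visits both meeting the threshold, with the event dated to the first of them. Both points are interpretations rather than confirmed details of the authors' analysis.

```python
from typing import Optional, Sequence, Tuple


def composite_physiological_index(dlco_pct: float, fvc_pct: float, fev1_pct: float) -> float:
    """CPI = 91.0 - 0.65*DLCO% - 0.53*FVC% + 0.34*FEV1% (all percent predicted)."""
    return 91.0 - 0.65 * dlco_pct - 0.53 * fvc_pct + 0.34 * fev1_pct


def time_to_irreversible_decline(
    visits: Sequence[Tuple[float, float]],
    threshold_pct: float,
) -> Optional[float]:
    """Time from baseline to the first of two consecutive visits that both show
    a relative decline from baseline of at least threshold_pct.

    visits: (time, value) pairs ordered by time; the first pair is the baseline.
    Returns None if no such pair of consecutive visits exists.
    """
    baseline_time, baseline_value = visits[0]
    # For each follow-up visit, flag whether the relative decline meets the threshold.
    flagged = [
        (t, 100.0 * (baseline_value - v) / baseline_value >= threshold_pct)
        for t, v in visits[1:]
    ]
    # Look for two successive flagged visits (assumed reading of "irreversible").
    for (t, hit), (_, next_hit) in zip(flagged, flagged[1:]):
        if hit and next_hit:
            return t - baseline_time
    return None


# Hypothetical patient: time in months, FVC in percent predicted.
fvc_series = [(0, 80.0), (6, 71.0), (12, 75.0), (18, 70.0), (24, 69.0)]
cpi_example = composite_physiological_index(dlco_pct=50.0, fvc_pct=70.0, fev1_pct=80.0)
fvc_decline_time = time_to_irreversible_decline(fvc_series, threshold_pct=10.0)
```

In this made-up series the dip at 6 months is not sustained at 12 months, so it is not counted; the decline is dated to 18 months because it is confirmed at 24 months. For DLCO the same function would be called with `threshold_pct=15.0`.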

Genotyping
The gene locations of the seven SNPs selected for testing are shown in Fig. 1. DNA was extracted from blood using Gentra PureGene DNA kits (Qiagen). Genotyping was carried out according to manufacturer's instructions using a commercially available TaqMan® assay and TaqMan® universal PCR master mix, no AmpErase® UNG (Applied Biosystems), on a Rotor-Gene 6000 real-time PCR machine (Qiagen). Quality control and genotype determination were performed using the Rotor-Gene 6000 Series Software 1.7 (Corbett Research).

Statistical analysis
To test for deviation from Hardy-Weinberg equilibrium (HWE), genotype frequencies were determined by direct counting, and the chi square statistic or Fisher's exact test was used as appropriate. Chi square analyses for association were carried out in Unphased v 3.1. To assess the most appropriate genetic model for significant signals, logistic regression analysis was applied in STATA v 15. Only female patients (n = 483) and controls (n = 263) were included in the analysis of the X chromosome IRAK1 SNP. Bonferroni correction was applied to correct for multiple testing of seven SNPs. A corrected p value (pcorr) < 0.05 was considered significant. The current study had 80% power to detect an association with SSc as a whole with an OR of at least 1.5 for a SNP with a minor allele frequency of 0.5 at pcorr < 0.05, and 80% power to detect an association with SSc-ILD with an OR of at least 1.6. Cox proportional hazards analysis was used to evaluate time to decline in FVC, time to decline in DLCO, and mortality, as implemented in Stata v 15.1 (Computing Resource Centre).
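As an illustration of the quantities named in this section, the following Python sketch computes a Hardy-Weinberg chi square statistic, an allelic odds ratio with a 95% confidence interval, and a Bonferroni-corrected p value for seven tests. It is a standard-library stand-in, not the Unphased or Stata procedures used in the study. The function names and all counts are hypothetical, and the confidence interval uses the Woolf (log odds ratio) approximation, which is an assumption and may differ from the method used by the authors' software.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi square (1 d.f.) of observed genotype counts against HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A by direct counting
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_1df_p_value(chi2: float) -> float:
    """Upper-tail p value of a chi square statistic with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def allelic_odds_ratio(
    case_minor: int, case_major: int, ctrl_minor: int, ctrl_major: int
) -> Tuple[float, float, float]:
    """Odds ratio for the minor allele with a Woolf 95% confidence interval."""
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    se_log_or = math.sqrt(
        1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major
    )
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


def bonferroni(p_value: float, n_tests: int = 7) -> float:
    """Bonferroni-corrected p value, capped at 1 (seven SNPs were tested)."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts, for illustration only.
hwe_stat = hwe_chi_square(n_aa=25, n_ab=50, n_bb=25)
hwe_p = chi2_1df_p_value(hwe_stat)
or_est, or_low, or_high = allelic_odds_ratio(30, 70, 20, 80)
p_corr = bonferroni(0.002)
```

With seven tests, an uncorrected p value must fall below 0.05 / 7 (about 0.0071) for pcorr to be below 0.05.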

Results
A total of 612 patients were included in the study, of whom 394 had ILD. Patient demographic and clinical characteristics are shown in Table 1. The genotyping success rate for all seven SNPs was ≥ 97.7%. All seven SNPs conformed to Hardy-Weinberg equilibrium in the control population.
Given the higher proportion of females in the patient cohort, 80.9% compared with 52.3% in the control population, we performed a logistic regression with sex as a covariate and observed no change to the significance of the association with any of the variants (Supplementary table 2).

Discussion
A number of genetic associations with SSc-ILD have been reported. However, conflicting evidence exists, with only a few associations replicated in more than one study [9]. We selected seven SNPs across four genes, for which the most robust evidence of an association with SSc-ILD had been reported, with the aim of testing these associations in our UK-based SSc cohort of patients of European descent.
In this study, three of the tested SNPs were significantly associated with SSc as a whole, confirming previous findings [9]. However, we found no evidence that any of the seven SNPs are associated specifically with the presence of ILD. By contrast, we report the novel finding that the STAT4 rs7574865 T allele may be protective against the development of lung fibrosis in SSc patients.
Although well-replicated associations of the IRF5 and STAT4 SNPs with SSc as a whole are reported [4–6], conflicting results exist for genetic associations specifically with SSc-ILD. A meta-analysis of five European populations found all three IRF5 SNPs to be associated with all of the tested SSc subtypes, including, as we found in our study, no ILD [20], suggesting that the IRF5 association is with SSc as a whole, rather than specifically with ILD. Similarly, although the STAT4 SNP rs7574865 association with SSc-ILD has been reported in both a French [13] and a Han Chinese [7] population, a study of six European populations found no significant association in any of the populations individually, nor in the meta-analysis [21]. In fact, rs7574865 was associated with limited but not diffuse cutaneous skin disease, the phenotype more frequently associated with ILD [21]. Although a meta-analysis of three European populations found CD226 SNP rs763361 to be associated with SSc-ILD [14], a larger study, comprising patients from seven European cohorts, did not confirm an association with the individual SNP, while reporting an association with a haplotype [22]. The IRAK1 SNP rs1059702, which we did not find to be associated with SSc-ILD or with SSc as a whole, has been found to be associated with SSc-ILD in two meta-analysis studies of multiple European populations, although not in the individual populations [15,16].
The observation that the STAT4 variant is significantly less frequent in patients with SSc-ILD compared with SSc patients without ILD is interesting. However, the findings of this study will need to be replicated in independent populations. No individual SNP is currently sufficiently strongly associated with either SSc or SSc-ILD for use in clinical diagnostics. It is possible that in the future, a panel of genetic variations, possibly combined with other biomarkers, could be utilised in the clinic to aid diagnostics or prognostics, but currently no test is sufficiently powered to provide information for an individual patient.
The current study has some limitations. Even though our SSc-ILD cohort was fairly large for a relatively infrequent entity, being favourably comparable with published single cohort studies reporting an association with SSc-ILD [7,10,12], it may have been underpowered to detect small genetic effects. This is particularly true for the IRAK1 SNP, which lies in a gene on the X chromosome, such that only female patients (n = 483) and controls (n = 263) were included in this analysis. With these sample sizes, the study had 80% power to detect an association with an OR of at least 1.97 at pcorr < 0.05 for the IRAK1 SNP, which has a minor allele frequency of 0.16. SSc-ILD is a complex disease, and it is expected that there will be a number of genetic susceptibility loci, each with modest effect, contributing to increasing the risk of lung fibrosis. We may therefore have been unable to detect associations of small effect size and acknowledge that larger patient sample sizes derived from multicentre collaborations are required to definitively characterise the genetic risk of SSc-ILD. Another limitation is the lack of matching between the patient and the control population. While all patients and controls included in the study were of European descent, we were unable to account for fine-scale population structure as we did not have genome-wide data. Information on age and smoking history of the control population was not available. Although we do not expect age to influence any of the described genetic associations, smoking history could affect our interpretation, as a SNP association could reflect differences in smoking behaviour between cases and controls. As we did not have smoking information for our control population, we performed a GWAS Catalogue search and verified that none of the SNPs included in this study are associated with smoking behaviour. The biological pathways involved in the susceptibility to ILD in patients with SSc remain largely unknown.
Immunosuppression has been associated with modest improvement in SSc-ILD [23,24], suggesting that immune-mediated pathways are key drivers of lung fibrosis in this disease, and nintedanib, a multi-tyrosine kinase inhibitor, has recently been approved as treatment for SSc-ILD based upon reduction in decline in lung function in the large SENSCIS clinical trial [25]. The genes investigated in this study are involved in the functioning of the immune system, with most having also been associated with other autoimmune diseases. Whether genes encoding for immune pathways are involved in the genetic predisposition to SSc-ILD uniquely, rather than with SSc as a whole, remains to be determined. On the other hand, the genetic risk may lie within genes involved in aberrant wound healing/pro-fibrotic pathways. A progressive fibrotic phenotype despite immunosuppression is observed in a subset of patients with SSc-ILD, and anti-fibrotic treatments currently used in IPF have recently shown promise also in SSc-ILD [25]. However, despite some similarities with IPF and other idiopathic interstitial pneumonias (IIPs), SSc-ILD appears to be genetically distinct. Loci associated with IIPs tend to be involved with host defence, epithelial injury/dysfunction and wound healing [26]. The gain-of-function mucin 5B (MUC5B) promoter variant, the most consistent common genetic risk factor for IIPs, is not associated with SSc-ILD [27]. Similarly, other genetic susceptibility loci identified in recent genome-wide association studies in IIPs have not been confirmed in SSc-ILD cohorts [26]. By contrast, IIP-associated variants have also been found in rheumatoid arthritis-ILD, and for the MUC5B variant at least, specifically with an underlying usual interstitial pneumonia (UIP) pattern [28]. As SSc-ILD is most frequently characterised by a fibrotic non-specific interstitial pneumonia (NSIP) pattern, one can speculate that the genetic architecture may differ between these two histological patterns.
However, there may instead be disease-specific genetic risks for SSc-ILD, which will require an approach focused on the development and progression of ILD in SSc patients.
The majority of previous studies have focused on SSc as a whole, with SSc-ILD investigated as a post hoc sub-analysis. These studies are therefore often underpowered to detect associations with SSc-ILD specifically. Furthermore, few have sought to detect genetic risk factors for significant ILD outcomes, including lung function decline. With the availability of large cohorts with adequate long-term lung function follow-up, it should be possible to detect specific genetic associations with a progressive fibrotic phenotype and/or potential gene variants associated with increased likelihood of response to anti-inflammatory or anti-fibrotic agents. This study highlights the need for more, adequately powered, studies addressing the specific question of the genetic susceptibility to SSc-ILD. This will require international collaborations aimed at performing hypothesis-free genome-wide association studies specifically targeted at well-defined SSc-ILD cross-sectionally and longitudinally.

Table 2 Allele frequency in control, SSc, SSc-ILD and SSc-non ILD cohorts

Compliance with ethical standards
Disclosures None.
Ethical standards All participants gave written informed consent, and the Ethics Committees of the Royal Brompton Hospital and of the Royal Free Hospital gave authorisation for the study (REC 13/LO/0857).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
SNP single nucleotide polymorphism, CI confidence interval, FVC forced vital capacity, DLCO diffusing capacity of the lung for carbon monoxide, IRF5 interferon regulatory factor 5, CD226 DNAX accessory molecule 1, STAT4 signal transducer and activator of transcription 4, IRAK1 interleukin-1 receptor-associated kinase-1