Introduction

Interstitial lung disease (ILD) is a collection of diseases that lead to varying degrees of inflammation and fibrosis of the pulmonary parenchyma [1]. Common fibrotic ILDs include idiopathic pulmonary fibrosis (IPF), connective tissue disease-associated ILD (CTD-ILD), fibrotic hypersensitivity pneumonitis (HP), and unclassifiable ILD. Comorbidities are prevalent in patients with ILD and occur at different frequencies among the various ILD subtypes [2]. The most common comorbidities found in patients with fibrotic ILD include gastroesophageal reflux disease, chronic obstructive pulmonary disease, and diabetes [2].

Comorbidities have been associated with reduced quality of life, lower functional capacity, increased hospitalization rates, and mortality in patients with ILD [3,4,5,6]. However, it is unclear whether comorbidities lead to negative outcomes only through their direct health effects or whether they also influence ILD progression. The latter possibility is suggested by the variable impact of comorbidities on all-cause mortality across different ILD subtypes [7], although limited evidence is currently available from large prospective cohorts. To address this uncertainty, we sought to identify and characterize clusters of patients with fibrotic ILD based on the presence of similar comorbidities. We further investigated the association between these clusters and the rate of lung function decline and overall mortality.

Methods

Study population and overview

The CAnadian REgistry for Pulmonary Fibrosis (CARE-PF) is a prospective cohort of patients with all subtypes of fibrotic ILD who are over age 18, are able to provide informed consent, and can complete questionnaires in English or French [8]. At the time of data export, there were eight participating centres located in five of the six most populous Canadian provinces. Patients with IPF, CTD-ILD, fibrotic HP, and unclassifiable ILD were included, representing the four most common fibrotic ILD subtypes in CARE-PF [2]. Diagnoses were made at ILD centres with access to multidisciplinary teams composed of ILD clinicians, chest radiologists, and lung pathologists. There were no exclusion criteria for this sub-study. Ethics approval for this project was obtained at all CARE-PF centres (coordinating centre: University of British Columbia #H19-01989).

Measurements

Baseline demographic data and smoking history were obtained from patient-completed questionnaires and chart reviews. Patient-completed surveys and clinical records from the date of the first ILD clinic visit were used to record comorbidities as defined by the Charlson Comorbidity Index (CCI), which is associated with mortality in multiple patient populations and diseases [9, 10]. The presence or absence of each of the 19 major CCI comorbidities was verified by trained research personnel, rather than by using International Classification of Diseases (ICD) codes, in order to minimize diagnostic inaccuracies from coding errors. Patients attending the ILD clinic were typically seen every 3–6 months, with pulmonary function tests (PFTs) performed at each visit.

Outcomes

The primary outcome was the annual change in forced vital capacity (FVC) %-predicted, which was calculated based on serial PFTs performed from the time of initial consultation in the ILD clinic using established standards at all sites [11]. The secondary outcome was time to death or lung transplantation, measured from the time of diagnosis.
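As a rough illustration of what an annual change in FVC %-predicted means for a single patient, the sketch below fits an ordinary least-squares line through serial measurements. It is not the method used in this study, which estimated rates with the mixed models described under Statistical analyses; the function name and the example values are hypothetical.

```python
def annual_fvc_slope(years, fvc_pct):
    """Least-squares slope of FVC %-predicted against time in years.

    A negative value means FVC is declining. This is a per-patient
    illustration only, not the model-based estimate used in the study.
    """
    n = len(years)
    if n < 2 or n != len(fvc_pct):
        raise ValueError("need at least two paired measurements")
    mean_t = sum(years) / n
    mean_y = sum(fvc_pct) / n
    # Sum of squared deviations in time; zero if all tests share one date.
    sxx = sum((t - mean_t) ** 2 for t in years)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, fvc_pct))
    return sxy / sxx


# Hypothetical patient: five PFTs over two years, losing 2 points per visit.
visit_years = [0.0, 0.5, 1.0, 1.5, 2.0]
visit_fvc = [80.0, 78.0, 76.0, 74.0, 72.0]
slope = annual_fvc_slope(visit_years, visit_fvc)  # change per year
```

For this made-up patient the slope is a loss of 4 percentage points of FVC %-predicted per year.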

Statistical analyses

Within each fibrotic ILD subtype, hierarchical agglomerative clustering [12] was conducted to identify clusters of patients based on the presence of similar comorbidities, age, sex, and smoking pack-years. These variables were standardized to a unit interval ranging from 0 to 1. The hierarchical agglomerative clustering method began with each patient within his or her own cluster. Two clusters were merged if their combination resulted in a new cluster with the lowest distance (intracluster variation) between observations [12]. The distance between clusters was then re-calculated and the next pair of clusters was merged. This process was repeated until a single cluster that contained all patients remained. The final number of clusters for each ILD subtype was not pre-specified, with the optimal number of clusters determined using the average silhouette method [13]. This method measures how well observations are clustered by estimating the average distance (width) between clusters. An average silhouette width is calculated for each cluster option and ranges from -1 to 1, with 1 representing very well clustered observations, 0 meaning observations lie between two clusters, and negative values representing observations that are likely placed in the wrong cluster [14]. A sensitivity analysis was conducted using the gap statistic method to ensure that the optimal number of clusters was similar using different techniques [15]. The gap statistic is the difference between the intracluster variation and the maximum variation between observations when there are no clusters (reference). The gap statistic is calculated for each cluster option. The optimal number of clusters will have a larger gap statistic, which means that the intracluster variation is lower and further away from the reference.
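The clustering and silhouette steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the study code (the analyses were run in R): the Ward-style merge cost is an assumption about the linkage, the function names are hypothetical, the six toy patients are invented, and the gap statistic sensitivity analysis is not shown.

```python
from itertools import combinations
from math import dist


def min_max_scale(rows):
    """Rescale every column to the unit interval [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # constant column -> zeros
    return [[(v - lo) / sp for v, lo, sp in zip(r, lows, spans)] for r in rows]


def centroid(pts):
    return [sum(c) / len(pts) for c in zip(*pts)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b merge (assumed linkage)."""
    return len(a) * len(b) / (len(a) + len(b)) * dist(centroid(a), centroid(b)) ** 2


def agglomerate(points, k):
    """Start with one cluster per patient and merge until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(
                [points[p] for p in clusters[ij[0]]],
                [points[p] for p in clusters[ij[1]]],
            ),
        )
        clusters[i] += clusters.pop(j)  # j > i, so index i is unaffected
    return clusters


def mean_silhouette(points, clusters):
    """Average silhouette width over all observations (requires >= 2 clusters)."""
    widths = []
    for ci, members in enumerate(clusters):
        for p in members:
            if len(members) == 1:
                widths.append(0.0)  # convention for singleton clusters
                continue
            a = sum(dist(points[p], points[q]) for q in members if q != p)
            a /= len(members) - 1
            b = min(
                sum(dist(points[p], points[q]) for q in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            denom = max(a, b)
            widths.append((b - a) / denom if denom else 0.0)
    return sum(widths) / len(widths)


def best_k(points, candidates):
    """Return the candidate k with the largest average silhouette width."""
    scores = {k: mean_silhouette(points, agglomerate(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Invented toy data. Columns: age, female, pack-years, GERD, OSA.
patients = [
    [62, 1, 0, 0, 0], [65, 1, 2, 0, 0], [63, 1, 1, 0, 0],
    [71, 0, 30, 1, 1], [69, 0, 35, 1, 1], [72, 0, 28, 1, 1],
]
scaled = min_max_scale(patients)
chosen_k, widths = best_k(scaled, range(2, 6))
final_clusters = agglomerate(scaled, chosen_k)
```

On this toy input the two obviously different groups (females without comorbidities, male smokers with GERD and OSA) are recovered as two clusters. The naive merge loop recomputes centroids at every step, which is acceptable for a sketch but would be slow for a registry-sized cohort.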

The annual rate of change in FVC was compared between clusters within each ILD subtype using linear mixed-effects models. The models were adjusted for age, sex, body mass index (BMI), and smoking pack-years since clustering may not fully address confounding by these covariates. Comparisons between all combinations of clusters were performed. The models included a random intercept and random slope to account for between-patient variability over and beyond the variability induced by included covariates. Kaplan-Meier curves were used to assess differences in survival between clusters. All statistical analyses were performed using R (version 3.5.1) [16]. A two-sided p-value < 0.05 was considered statistically significant.
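For readers less familiar with survival curves, the Kaplan-Meier (product-limit) estimate can be sketched as follows. This is a standard-library illustration under stated assumptions, not the study code: the function name and the six follow-up records are hypothetical, an event stands for death or lung transplantation, and no between-cluster comparison is included.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient (e.g., years from diagnosis)
    events : 1 if the patient died or was transplanted at that time,
             0 if follow-up ended without an event (censored)
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        n_leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if n_events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical cluster of six patients; two are censored (event = 0).
follow_up_years = [1, 2, 2, 3, 4, 5]
had_event = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(follow_up_years, had_event)
```

Censored patients reduce the number at risk without lowering the curve, which is why the estimate drops only at years 1, 2, 3, and 5 in this invented example.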

Results

The study cohort had a total of 1,480 patients (Table 1), including 330 with IPF, 672 with CTD-ILD, 135 with fibrotic HP, and 343 with unclassifiable ILD. Median follow-up was 3.3 years (IQR 1.9–5.5 years) and was similar across ILD subtypes. There were a total of 147 deaths, with 57 in patients who had IPF, 47 in CTD-ILD, 10 in fibrotic HP, and 33 in unclassifiable ILD. These populations had a total of 10,005 PFTs, with 1,843, 5,288, 963, and 1,911 in IPF, CTD-ILD, fibrotic HP, and unclassifiable ILD, respectively. The median number of PFTs per patient was 5 (IQR 3–9). On average, the IPF cohort was older, had a higher proportion of males, and had a greater number of smoking pack-years compared to the other ILD subtypes. Baseline disease severity was mild-to-moderate for all ILD subtypes, with mean FVC ranging from 75 to 81%-predicted and diffusing capacity of the lung for carbon monoxide from 60 to 65%-predicted.

Table 1 Baseline patient demographics

Frequency of comorbidities

The frequency of comorbidities in each ILD subtype is shown in Fig. 1. The most prevalent comorbidities among all ILD subtypes were diabetes, gastroesophageal reflux disease (GERD), and obstructive sleep apnea (OSA). Myocardial infarction occurred in 7% of all patients, with the highest prevalence in IPF (39/330, 12%). A history of malignancy occurred in 7.5% of all patients and was the highest for patients with unclassifiable ILD (31/343, 9%). Congestive heart failure (CHF), cerebrovascular disease, liver disease, and renal disease occurred in < 5% of patients for all ILD subtypes.

Fig. 1 Prevalence of baseline comorbidities among ILD subtypes. Abbreviations: MI myocardial infarction; CHF congestive heart failure; CVD cerebrovascular disease; GERD gastroesophageal reflux disease; OSA obstructive sleep apnea

Baseline characteristics of each cluster

Using hierarchical agglomerative clustering, the optimal numbers of clusters for IPF, CTD-ILD, fibrotic HP, and unclassifiable ILD were 4, 4, 2, and 3, respectively (Fig. 2). A sensitivity analysis using the gap statistic method yielded a similar optimal number of clusters. Mean age and smoking pack-years were similar among clusters within each ILD subtype (Additional file 1: Table S1), while sex and the presence of specific comorbidities varied and were the main features that distinguished clusters (Table 2). The distribution of major CTD subtypes among the CTD clusters is shown in Additional file 1: Table S2.

Fig. 2 Cluster composition among different ILD subtypes. Clusters are represented by different color bars, with IPF, CTD-ILD, fibrotic HP, and unclassifiable ILD having 4, 4, 2, and 3 clusters, respectively. The x-axis labels show the total number of patients with a given feature within each ILD subtype. The percentage of females and the percentage of patients with each comorbidity are shown for each cluster on the y-axis. For example, of the 39 patients with IPF who had an MI, 15% were in cluster 2 and 85% were in cluster 3. *Denotes patients who did not have any comorbidities of interest. Abbreviations: MI myocardial infarction; CHF congestive heart failure; CVD cerebrovascular disease; GERD gastroesophageal reflux disease; OSA obstructive sleep apnea

Table 2 Rate of FVC decline between ILD clusters

Change in FVC over time across clusters

Baseline FVC was similar among all clusters within each ILD subtype, with the exception that patients in HP cluster 1 (predominantly males) had a 26% higher absolute baseline FVC %-predicted compared to HP cluster 2 (predominantly females). Patients with IPF had the greatest rate of FVC decline compared to other ILD subtypes (Fig. 3), after adjusting for prespecified covariates. Within each ILD subtype, the only significant difference across clusters in rate of FVC change was between IPF clusters 1 and 4. The rate of FVC decline in IPF cluster 4 (males with OSA) was 11.9% per year (95% CI 8.5–15.3) compared to IPF cluster 1 (females without comorbidities), which was 8.1% per year (95% CI 2.7–13.6; p = 0.03). The mean FVC decline for IPF clusters 2 and 3 was 8.8% and 9% annually, respectively. There were no significant differences in the rate of FVC change across clusters in CTD-ILD, fibrotic HP, or unclassifiable ILD.

Fig. 3 Rate of FVC decline over 2 years from the time of ILD diagnosis by cluster. IPF cluster 4 had a significantly greater rate of lung function decline compared to cluster 1 (p = 0.03). There was no statistically significant difference in rate of lung function decline between clusters in other ILD subtypes. Abbreviations: FVC forced vital capacity

Differences in overall survival across clusters

Patients with IPF had the highest mortality compared to the other ILD subtypes. In IPF, females without comorbidities (cluster 1) had lower mortality compared to males with no comorbidities (cluster 2), GERD or history of MI (cluster 3), and OSA (cluster 4), with over 90% of patients alive at 8 years after diagnosis (p = 0.005 for IPF cluster 1 vs cluster 2, and p = 0.007 for IPF cluster 1 vs clusters 3 or 4). The lowest probability of survival occurred in IPF cluster 4 (males with OSA) with less than 30% of patients still alive at 8 years after diagnosis. There were no differences in survival among the clusters in CTD-ILD, fibrotic HP, and unclassifiable ILD (Fig. 4).

Fig. 4 Survival by clusters among ILD subtypes. IPF cluster 1 had a significantly higher probability of survival over 8 years compared to the other clusters (p ≤ 0.007). There was no significant difference in survival between clusters in the other ILD subtypes

Discussion

In this study, we used an unsupervised approach to identify clusters based on age, sex, smoking pack-years, and comorbidities in an attempt to identify distinct phenotypes of fibrotic ILD that may be associated with a poor prognosis. The main differences between clusters were patient sex and comorbidities for all ILD subtypes. Among patients who had IPF, we found that a cluster of patients predominantly characterized by males with OSA had a significantly greater rate of FVC decline and worse survival compared to females without comorbidities. There were no significant differences in these outcomes between clusters in other ILD subtypes. These findings suggest the need for further studies exploring potential underlying mechanisms that connect certain comorbidities to ILD progression, and particularly the potential role of OSA.

Cluster analysis is an established approach that has not been widely applied in cohorts of patients with ILD. We used this approach as a novel way to predict outcomes in this population. Cluster analysis offered an unbiased hypothesis-generating approach that allowed the data to identify potentially important relationships between predictor variables, rather than restricting the analysis to preconceived notions held by investigators (e.g., only including interaction terms that investigators believe exist). Through cluster analysis, we were able to explore the complex interplay between several biologic (age, sex, comorbidities) and environmental features (smoking pack-years) that can be used to inform future studies.

Males with OSA had the greatest rate of lung function decline and the worst survival in IPF after adjusting for age, sex, BMI, and smoking pack-years. This finding is similar to a previous study of 92 patients with ILD that showed 65% had at least mild OSA diagnosed by nocturnal polysomnogram, with the presence of OSA similarly being associated with worse progression-free survival [17]. It is unclear whether OSA is a risk factor for IPF or vice versa and the reason for these shared findings requires further investigation. One possibility is that OSA results in hypoxia that disrupts cell and tissue homeostasis, thus resulting in more rapid progression of IPF. A second possibility is that large swings in pleural pressure resulting from obstructive events could stretch alveolar walls and lead to repetitive alveolar epithelial cell injury [18]. These repeated injuries, along with aberrant remodeling of the extracellular matrix, could then lead to the development and progression of interstitial lung disease [19]. Additional studies are needed to confirm and further explore the potential link and physiologic impact between OSA and adverse outcomes in IPF. There have been conflicting data on the impact BMI has on survival in ILD. While some studies have shown an increased risk of acute ILD exacerbations with higher BMI [20], others have shown no association of BMI with survival or even an improved survival in patients with a higher baseline BMI [21, 22]. Therefore, although obesity is a risk factor for OSA [23], it is unclear whether an increased BMI impacts disease progression and survival in patients with ILD.

The other main finding from our study is the minimal difference in outcomes between clusters within each ILD subtype despite previous studies suggesting an association between various comorbidities and reduced survival. Arteriosclerosis, other cardiovascular diseases (e.g., valvular heart disease), malignancy, and GERD have all been associated with reduced survival in IPF [24]. For patients with ILD, the presence of additional comorbidities including renal failure, hypothyroidism, and connective-tissue disorders was also associated with reduced survival [7]. However, our results only identified associations of males with GERD, history of MI, and OSA with reduced survival in IPF. This may be due to differences in comorbidity prevalence between study cohorts. For example, we did not identify a cluster characterized by diabetes in IPF. However, the prevalence of diabetes in our cohort (14%) was lower compared to other cohorts (33%) [7]. In addition, different methods of diagnosing comorbidities (e.g., diagnosis by clinical evaluation versus ICD codes) and the use of diagnostic thresholds to classify disease in previous studies (e.g., hemoglobin A1c > 6% required to diagnose diabetes) may contribute to these heterogeneous study findings. This heterogeneity emphasizes the need for ILD registries to prospectively collect comorbidity data and ideally use the same method of classification in order for studies to be comparable.

This study has several limitations. Although we applied a unique statistical approach to a large prospective cohort, the main findings are exploratory and need to be further studied and externally validated. The ability to analyze big data in growing ILD registries around the world is an important need. We believe cluster analysis will play an important role in ILD research as it allows patterns and relationships to be identified in large data sets, which may not have otherwise been apparent. Our findings support the hypothesis that hypoxia and/or exaggerated swings in intrathoracic pressure related to OSA may be an underlying mechanism for disease initiation, progression, and ultimately death in some patients with ILD. This highlights the need for randomized controlled trials to test potential treatments of OSA in patients with IPF. Our analysis was also limited to the comorbidities available in the Charlson comorbidity index and CARE-PF patient surveys, with the possibility that a more comprehensive and robust assessment for specific comorbidities may have yielded additional findings. In addition, an older patient cohort may have increased comorbidities which affect outcomes. However, if there was a causal link between comorbidities and disease progression, we would have expected our study to show further associations. Regardless, this analysis should be conducted in other patient populations, including those who are older and may have more severe comorbidities. Finally, we were unable to account for varying treatment approaches across different ILD subtypes and patient populations, as well as whether comorbidities were treated or not, given the challenges in identifying on- and off-treatment periods in a diverse real-world population.

Conclusion

In this large prospective multicenter cohort, we show that GERD, diabetes, and OSA are the most prevalent comorbidities across the major fibrotic ILD subtypes. In IPF, males with OSA had significantly greater lung function decline and worse survival compared to females without comorbidities. Although they should be validated, these findings identify important research questions including whether comorbidities influence underlying pro-fibrotic pathways and whether outcomes improve when comorbidities are treated. We hope that the novel approach used in this analysis will support these future studies in an attempt to better understand causes of disease progression in fibrotic ILD and to develop new therapeutic strategies for these patients.