Hypersensitivity pneumonitis (HP) is a subtype of interstitial lung disease (ILD) that can be inflammatory and/or fibrotic in nature. HP is typically caused by the inhalation of an overt or occult antigen, resulting in an immune-mediated reaction affecting the lung parenchyma and small airways in susceptible individuals [1]. Multiple studies have shown that the disease course, treatment response, and prognosis are determined by the identification and eradication of the eliciting antigen and by the presence of fibrosis [2,3,4,5,6]. Non-fibrotic HP may be self-limiting upon removal of the offending antigen and has a better prognosis than fibrotic HP (fHP), in which some patients show an irreversible progressive fibrosing phenotype similar to idiopathic pulmonary fibrosis (IPF) [2]. Risk factors for a progressive fibrosing phenotype are not well defined but include continued exposure to the eliciting antigen, the extent of fibrosis, and the type of fibrosis; for example, honeycombing on high-resolution computed tomography (HRCT) and in histopathology is associated with a worse prognosis [3,4,5]. The incidence of HP has been shown to increase, especially in patients older than 55 years [3]. Given the heterogeneity of HP presentation and progression, it could be valuable to determine whether specific phenotypes are related to disease course and outcome.

Comorbidities are presumably common in most fibrotic ILDs [7], but have been best studied in IPF, where the number and specific type of comorbidities are associated with a worse outcome and poorer health-related quality of life [8,9,10]. Similarly, in HP the burden of comorbidities is high, and comorbidities are associated with mortality [11]. In a recent study, the most common comorbidities were arterial hypertension, gastro-esophageal reflux disease, diabetes, and coronary heart disease. However, there was no association between the absolute number of comorbidities and mortality. Pulmonary hypertension, diastolic dysfunction, and cerebrovascular disease were among the comorbidities most commonly encountered in non-survivors [11]. Cluster analysis has become an interesting strategy for identifying groups or clusters of patients with a homogeneous presentation with respect to clinical characteristics, comorbidities, and prognosis [12]. A previous analysis in IPF identified four clusters of comorbidities that may represent specific phenotypes with different outcomes [8]. It is not known whether specific combinations or clusters of comorbidities can help predict disease outcome in fHP. Therefore, the aim of the present study was to identify clusters of comorbidities in patients with fHP and examine their prognosis.


Study subjects

Patients diagnosed with fHP between June 1995 and November 2017 at the tertiary referral center for ILD, Heidelberg, Germany, were included in the study. All clinical diagnoses were based on multidisciplinary team discussions including pulmonologists, radiologists, and pathologists experienced in ILD. HRCT scans were available for all patients, and histopathological samples were available for most patients (79%). The diagnostic process and the associations between individual comorbidities and survival have been described in a previous paper [11].

Study measures

Data regarding comorbidities, age, gender, smoking history, pulmonary function tests (forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), FEV1/FVC ratio, total lung capacity (TLC), and diffusing capacity of the lung for carbon monoxide (DLCO)), long-term oxygen therapy (LTOT), 6-min walk test distance (6MWD), lymphocyte count in bronchoalveolar lavage (BAL), and respiratory hospitalizations were extracted from the database.

Registration of comorbidities was based on patient interviews, a standardized questionnaire for ILD, medical records, and current medications [13]. The following comorbidities were assessed: airway obstruction, pulmonary hypertension, obstructive sleep apnea, arterial hypertension, ischemic heart disease, heart failure, heart valve disease, atrial fibrillation, other arrhythmias, cerebrovascular disease, vascular disease, thromboembolic disorders, peripheral artery disease, chronic kidney disease, diabetes mellitus, osteoporosis, hypothyroidism, obesity, lung cancer, other cancers, anemia, liver disease, gastro-esophageal reflux disease, depression, and anxiety.

Statistical analyses

Discrete variables are presented as frequencies, and continuous variables are presented as median with interquartile range (IQR) or mean with standard deviation (SD).

Clusters of comorbidities were determined by self-organizing maps (SOMs), also known as Kohonen maps, using Viscovery SOMine 7.2 (Viscovery Software GmbH, Vienna, Austria). This technique uses non-parametric regression analyses to transform multidimensional data into lower-dimensional representations. Data were analyzed for similarity using the SOM-Ward cluster algorithm, and homogeneous groups were visualized in attribute maps [11]. In these maps, the average frequency of each comorbidity was indicated by a fitted color scale. Each cluster was compared against the two other clusters in combination: continuous data with a normal distribution were analyzed by the two-sided t-test (95% confidence level) and otherwise by the Wilcoxon–Mann–Whitney U test. Binary data were compared using the chi-squared test or Fisher’s exact test as appropriate.
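To illustrate the principle behind a self-organizing map, the following is a minimal NumPy sketch and not the Viscovery SOMine implementation used in the study. The function names (`best_matching_unit`, `train_som`), the 4 × 4 grid, the learning-rate and neighbourhood schedules, and the toy binary comorbidity matrix are all assumptions chosen for illustration. The subsequent SOM-Ward clustering of the map nodes is not reproduced here.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the map node whose weight vector is closest to x."""
    dist2 = np.sum((weights - x) ** 2, axis=-1)
    return np.unravel_index(np.argmin(dist2), dist2.shape)


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, needed for the neighbourhood function
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = np.array(best_matching_unit(weights, x))
            grid_dist2 = np.sum((grid - bmu) ** 2, axis=-1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the winning node and its grid neighbours towards the patient
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


# Hypothetical toy data: 40 patients, 4 binary comorbidity indicators
toy = np.array([[1, 1, 0, 0]] * 20 + [[0, 0, 1, 1]] * 20, dtype=float)
som_weights = train_som(toy)
```

After training, each patient can be assigned to a node with `best_matching_unit(som_weights, patient_vector)`, and each component of `som_weights` can be drawn as a heat map over the grid, which corresponds conceptually to one attribute map per comorbidity.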

The gender, age, physiology index for ILD (ILD-GAP) was calculated based on gender, age, and pulmonary function and adjusted for HP [14].
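As a rough illustration of how such an index is assembled, the sketch below sums points for gender, age, and physiology and applies a subtype adjustment. The function name `ild_gap_score` is hypothetical, and the cut-offs and point values are assumptions based on the commonly cited ILD-GAP scoring; they are not stated in this paper and should be verified against reference [14] before any use.

```python
def ild_gap_score(male, age, fvc_pct, dlco_pct, subtype_points=-2):
    """Sum ILD-GAP points for one patient.

    male           -- True for male gender
    age            -- age in years
    fvc_pct        -- FVC in % of predicted
    dlco_pct       -- DLCO in % of predicted, or None if it could not be performed
    subtype_points -- ASSUMED adjustment for the ILD subtype (-2 assumed for HP)
    """
    points = subtype_points

    # Gender (assumed: female 0, male 1)
    points += 1 if male else 0

    # Age (assumed: <=60 -> 0, 61-65 -> 1, >65 -> 2)
    if age > 65:
        points += 2
    elif age > 60:
        points += 1

    # FVC % predicted (assumed: >75 -> 0, 50-75 -> 1, <50 -> 2)
    if fvc_pct < 50:
        points += 2
    elif fvc_pct <= 75:
        points += 1

    # DLCO % predicted (assumed: >55 -> 0, 36-55 -> 1, <=35 -> 2, not performed -> 3)
    if dlco_pct is None:
        points += 3
    elif dlco_pct <= 35:
        points += 2
    elif dlco_pct <= 55:
        points += 1

    return points


# Hypothetical patients, not taken from the study cohort
score_a = ild_gap_score(male=True, age=70, fvc_pct=60, dlco_pct=40)
score_b = ild_gap_score(male=False, age=55, fvc_pct=80, dlco_pct=60)
```

Under these assumed point values, an older male patient with moderately reduced FVC and DLCO scores higher than a younger female patient with preserved lung function, which is the sense in which a higher index reflects more severe disease.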

Kaplan–Meier curves, the log-rank test, and univariate and multivariate Cox regression were used for analyses of all-cause mortality. Specific causes of death were compared between the clusters using Fisher’s exact test. Cox regression analyses were adjusted for the ILD-GAP index and pack years. Changes in FVC% and DLCO% predicted during follow-up in the three comorbidity clusters were estimated by linear mixed effects models. Data were analyzed using Stata 14.2 (StataCorp, College Station, Texas).
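The Kaplan–Meier (product-limit) estimate underlying the survival curves can be illustrated with a short pure-Python sketch. The function name `kaplan_meier` and the toy follow-up data are hypothetical; the analyses in the study itself were run in Stata, and this sketch covers neither the log-rank test nor the Cox models.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times  -- follow-up time for each patient (e.g. in years)
    events -- 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data: six patients, three deaths, three censored
toy_times = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
toy_events = [1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the number at risk up to their last follow-up but do not cause a step in the curve, which is why short follow-up in part of a cohort still yields usable survival estimates.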


Characteristics of the comorbidity clusters

The study population comprised 211 patients with fHP (Table 1). The cohort had a slight majority of male patients (53.6%). The mean (SD) age at diagnosis was 63.0 (13.3) years. FVC and DLCO were reduced. Half of the cohort consisted of never smokers. Median (IQR) follow-up time was 1.8 (0.7–3.9) years.

Table 1 Baseline characteristics of the entire cohort and the three clusters

Three clusters with distinct comorbidity profiles were identified (Tables 1, 2, Fig. 1). Patients in the first cluster were younger and had slightly higher TLC and DLCO and longer 6MWD. Fewer patients received LTOT, and the lymphocyte count in BAL was higher compared to the two other clusters in combination. Patients in cluster 1 had a lower total number of comorbidities, and a wide range of specific comorbidities were less prevalent.

Table 2 Prevalence of comorbidities in the entire cohort and the three clusters
Fig. 1 Attribute self-organizing maps for each comorbidity and cluster borders

The first map shows the location of the three clusters. The location of each patient on the map and the cluster borders (marked with black lines) are constant across maps. The frequency of an individual comorbidity (one map per comorbidity) among patients in a given part of the map is indicated by a fitted color scale (red: high frequency; green: moderate frequency; blue: low frequency). GERD: gastro-esophageal reflux disease; C1: cluster 1; C2: cluster 2; C3: cluster 3.

In cluster 2, patients were older, with a majority of males and a larger proportion on LTOT, and more often had higher ILD-GAP scores (i.e., more severe disease in addition to male gender and older age). The total number of comorbidities was highest in this cluster, and patients suffered more frequently from cardiovascular diseases, diabetes, and renal insufficiency.

Patients in the third cluster were predominantly women with shorter 6MWD. They had more comorbidities than the two other clusters in combination. Cardiac diseases were less prevalent, whereas hypothyroidism, osteoporosis and depression were more frequent.

Longitudinal analyses

Mortality and changes in pulmonary function are presented in Table 3 and Fig. 2. The best survival was observed in cluster 3, whereas patients in cluster 2 had the worst prognosis, also after adjustment for the ILD-GAP index and pack years. No significant difference in cause of death was observed between the three clusters (p = 0.17), potentially because of the low numbers.

Table 3 Mortality analyses and changes in pulmonary function during follow-up
Fig. 2 Survival in the three clusters

A small decline in FVC% predicted per year was observed in all three clusters, but no difference between clusters was found (p = 0.94). In addition, patients in cluster 2 had a small decline in DLCO% predicted per year, but no difference in slopes between clusters could be demonstrated (p = 0.49). Likewise, 6MWD declined in clusters 2 and 3, but the three slopes were not significantly different (p = 0.91).

Patients in cluster 1 had fewer respiratory hospitalizations than patients in the rest of the cohort (p = 0.021), whereas patients in cluster 2 had more admission days due to HP exacerbations compared to the other two clusters (p = 0.036).


In this study, we report the associations between comorbidities in patients with fHP using an unsupervised machine learning technique to identify clusters of comorbidities which could represent distinct phenotypes of fHP with diverging prognoses. We identified three new clusters of patients with fHP based on specific comorbidity profiles and found a higher mortality and more respiratory hospitalizations among patients in cluster 2, but no difference between the three clusters in pulmonary function or exercise capacity trajectories was shown.

The worse prognosis for patients in cluster 2 could be related to their comorbidity profile with more cardiovascular diseases. The cluster was dominated by older males and their survival was worse even after adjustment for the higher ILD-GAP index and pack years. Also, the increased rate of long-term oxygen therapy in this cluster possibly indicates disease progression. They might also have a more fibrotic pathology compared to patients in cluster 1, who had the highest lymphocyte count in BAL, which could indicate a more mixed inflammatory and fibrotic pathology [15, 16].

Cluster analysis is a well-established method which can be used to analyze associations between variables in complex data sets. In the field of ILD, cluster analyses have only been used in a limited number of studies, and we chose this novel approach to further explore the complex relationship between multiple comorbidities and their prognostic impact on patients with HP. The advantage of this unsupervised method is that no pre-defined associations are incorporated into the model allowing for more unbiased results. Furthermore, clusters of comorbidities could identify distinct phenotypes in fHP with different treatable traits. This approach brings focus on diagnosis and treatment of specific comorbidities in patients from different clusters, thus potentially improving prognosis and quality of life.

Cluster analyses including comorbidities or solely based on comorbidities have been reported in other ILD cohorts. Wong et al. identified two clusters in patients with fHP based on a combination of comorbidities, age, gender, and smoking pack years, but the clusters were distinguished primarily by comorbidities and gender [17]. Similar to our results, one cluster was dominated by males and the other by females with gastro-esophageal reflux disease and obstructive sleep apnea, and no difference in disease progression was seen. However, survival was similar in both clusters, whereas we found increased mortality in cluster 2. This difference could be caused by dissimilar cluster algorithms and by differences in the registration of comorbidities: in the study by Wong et al., only comorbidities included in the Charlson comorbidity index were used, which limited the comorbidities available for cluster analysis. This approach excluded arterial and pulmonary hypertension, which are important prognostic factors in fHP [11, 18]. These comorbidities were more prevalent in cluster 2 in the present study and could to some extent explain the higher mortality in this cluster. This emphasizes the importance of selecting comorbidities for analysis based on the risk profile of the specific disease rather than on a more generalized approach.

In patients with IPF, four clusters of comorbidities have been reported [8], and in unclassifiable ILD, three comorbidity clusters were identified [19]. Similar to our results, a cluster of patients with few comorbidities and a cluster dominated by cardiovascular diseases and male patients were found in both cohorts. This supports the robustness of these clusters. Emphysema was prevalent in the last clusters in IPF and unclassifiable ILD, but emphysema was not registered in the present study. The proportion of smokers and the number of pack years in IPF and unclassifiable ILD were much higher than in our cohort, and thus emphysema is probably less prevalent in patients with fHP such as those in the present study. On the other hand, a larger proportion of women with fHP compared to the two other types of ILD led to a separate comorbidity profile dominated by conditions more prevalent in women. In contrast to our findings, no difference in mortality was found in the two other studies, which could be explained by different follow-up times and mortality rates, as the impact of comorbidities on mortality has been shown in IPF [9, 20].

A strength of this study is the structured registration of comorbidities. Furthermore, all patients were diagnosed in a specialized ILD center, suggesting a diagnosis with moderate to high confidence [21]. However, a limitation is the risk of missing data, misclassification, and underreporting of comorbidities in this retrospective study, which could influence the results. Furthermore, treatment of HP and comorbidities was not accounted for in this study, and these interventions might affect both disease course and prognosis. Cluster analyses are well suited for revealing relationships in large, complex data sets that would not otherwise have been evident. Still, such analyses are exploratory and should be confirmed in future studies.


We identified three clusters with distinct comorbidities which could represent phenotypes in fHP not previously recognized. Mortality and respiratory hospitalizations were higher in the cluster dominated by cardiovascular diseases, but no differences in pulmonary function or exercise capacity trajectories were found. These clusters could reflect phenotypes in fHP with different treatable traits. This approach brings focus on diagnosis and treatment of specific comorbidities in patients from different clusters, thus potentially improving prognosis and quality of life.