Background

Sarcoidosis is a heterogeneous disease that can affect any organ and has a variable natural history [1, 2]. Clinical phenotyping in complex diseases such as sarcoidosis can help define subpopulations with similar clinical and biological characteristics. Most importantly, phenotyping may differentiate disease courses, identifying individuals with a worse prognosis who require long-term treatment and follow-up [3]. Previous studies have identified several characteristics that portend a worse prognosis in sarcoidosis, including race, Scadding stage, BMI, treatment status and lung function [4,5,6,7]. However, transforming these characteristics into discrete, validated sarcoidosis phenotypes, especially ones with clinical implications for disease status and prognostication, has proved challenging.

Several phenotyping classifications have been proposed in sarcoidosis. Most rely on expert opinion, which may introduce bias and thereby limit agreement between experts and consistency of application. An example of expert-opinion-based phenotyping was proposed by Wasfi et al., who derived a disease severity score from subjective assessments by sarcoidosis experts [8]. A benefit of the Wasfi score is that its inputs can be obtained at a single clinic visit to determine phenotype/severity. A limitation is that it has not been externally validated, although it was internally validated by an independent panel of international experts within the study. Additionally, several studies have used the Wasfi score to measure sarcoidosis severity [9, 10]. Recently, cluster analysis has been used to determine phenotypes in many complex diseases. Cluster analysis employs multivariate algorithms to organize individuals into subgroups based on similarities [11, 12]. The clustering methodology is considered relatively unbiased because it uses objective statistical methods, rather than expert opinion, to group individuals; however, selection of input variables is still a subjective process. Schupp et al., Rubio-Rivas et al. and Lhote et al. have used cluster analysis to subgroup organ involvement in sarcoidosis [13,14,15]. However, these phenotypes do not necessarily provide information on disease severity or prognosis and can be difficult to apply at a single clinic visit, as opposed to across multiple visits over time.

We propose that cluster analysis can be used to identify clinical phenotypes of sarcoidosis, including severe and less severe forms of the disease. In this study we applied this technique to clinical variables that have influenced prognosis in previous studies. We also examined the association between the resultant phenotypes and the Wasfi severity score to assess differences in disease severity between clusters. Some of the results of this study have been previously reported in the form of an abstract [16].

Methods

Study population

This was a cross-sectional, retrospective study of sarcoidosis cases seen in the Division of Occupational and Environmental Health Sciences at National Jewish Health (NJH) from 2008 to 2015 and enrolled in a substudy of an NIH-funded genetic study (R01HL11487, manuscript in preparation). All subjects provided written informed consent to participate in this study. The study was approved by the NJH Institutional Review Board (HS 2458).

All sarcoidosis subjects met the American Thoracic Society/European Respiratory Society criteria for the diagnosis of sarcoidosis, including tissue biopsy confirmation [2]. Medical charts were reviewed to confirm eligibility and extract clinical data. All subject information was collected at the reference enrollment date, defined as the date of spirometry and chest x-ray, except for treatment as noted below. If only spirometry was available, its date was used as the enrollment date.

Gender, race, BMI and smoking status were collected at enrollment. FVC % predicted (FVCpp), FEV1 % predicted (FEV1pp) and the FEV1/FVC ratio (%) were included in the analysis. Because lower limits of normal were not available for all participants, we considered spirometry normal if FEV1pp and FVCpp were ≥ 80% and FEV1/FVC was ≥ 70% [17]. Scadding stages were determined by the interpreting radiologist from the chest x-ray closest to enrollment. Biopsy dates were recorded when available and used to determine duration of disease and age at diagnosis.
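As an illustration of the fixed cutoffs above, the following Python sketch flags a spirometry result against the 80% and 70% thresholds. The function name `spirometry_pattern` and the mapping to "obstructive", "restrictive pattern" and "mixed" labels are assumptions based on a common spirometric convention; the study does not state per-patient rules for these labels, and spirometry alone cannot confirm true restriction.

```python
# Fixed cutoffs from the text (lower limits of normal were unavailable).
PP_CUTOFF = 80.0     # FEV1pp and FVCpp: normal if >= 80
RATIO_CUTOFF = 70.0  # FEV1/FVC (%): normal if >= 70


def spirometry_pattern(fev1pp: float, fvcpp: float, fev1_fvc: float) -> str:
    """Label one spirometry result (hypothetical helper, assumed convention)."""
    low_ratio = fev1_fvc < RATIO_CUTOFF  # suggests airflow obstruction
    low_fvc = fvcpp < PP_CUTOFF          # reduced FVC; spirometry alone cannot confirm restriction
    if low_ratio and low_fvc:
        return "mixed"
    if low_ratio:
        return "obstructive"
    if low_fvc:
        return "restrictive pattern"
    if fev1pp < PP_CUTOFF:
        return "other abnormal"          # isolated low FEV1pp
    return "normal"
```

Values exactly at a cutoff (for example FVCpp = 80) are treated as normal, matching the "≥" wording in the text.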

Organ involvement was determined using the WASOG Sarcoidosis Organ Assessment Instrument [18]. Our sarcoidologists (NH, LAM, SYL, CIR) reviewed all cases and assigned sarcoidosis involvement for organs that met the “highly probable” and “at least probable” classifications outlined in the WASOG instrument. Individuals presenting with classic signs of Lofgren’s syndrome were noted.

Treatment was defined as receiving non-corticosteroid immunosuppressive therapy, including methotrexate, azathioprine, mycophenolate mofetil, leflunomide, infliximab or adalimumab. Hydroxychloroquine was not considered systemic treatment given its nonspecific indications. Corticosteroid treatment (e.g., prednisone) was not included because some individuals are started on steroids at diagnosis without a clinical indication. A dichotomous variable indicating the presence or absence of therapy up to 5 years after the enrollment date was included; this time frame was chosen to approximate ever versus never treated status.
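A minimal sketch of the dichotomous treatment variable, assuming each individual's treatment history is available as (drug name, start date) pairs. The helper names and the rule that a qualifying drug started on or before the enrollment date plus 5 years counts as "treated" are hypothetical interpretations of the definition above, not the study's code.

```python
from datetime import date

# Non-corticosteroid immunosuppressants listed in the study's treatment definition.
# Prednisone and hydroxychloroquine are deliberately absent.
QUALIFYING_DRUGS = {
    "methotrexate", "azathioprine", "mycophenolate mofetil",
    "leflunomide", "infliximab", "adalimumab",
}


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def ever_treated(enrollment: date,
                 exposures: list[tuple[str, date]],
                 window_years: int = 5) -> int:
    """Return 1 if any qualifying drug started on or before enrollment + window.

    Hypothetical rule: treatment begun before enrollment also counts, so the
    flag approximates "ever vs never treated" up to the end of the window.
    """
    cutoff = add_years(enrollment, window_years)
    return int(any(
        drug.strip().lower() in QUALIFYING_DRUGS and start <= cutoff
        for drug, start in exposures
    ))
```

For example, `ever_treated(date(2010, 3, 1), [("Methotrexate", date(2014, 6, 1))])` returns 1, whereas a history containing only prednisone returns 0.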

Wasfi severity score

The sarcoidosis severity score, adapted from Wasfi et al. [8], was calculated for each individual using the following equation:

$$\text{Severity score} = 11.46 + 3.9(\text{C}) + 2.56(\text{N}) + 1.56(\text{IS}) - 0.051(\text{FVC\% predicted}) + 1.75(\text{AA}) - 0.054(\text{FEV}_1/\text{FVC})$$

C = 1 for cardiac involvement; N = 1 for neurological involvement; IS = 1 if the individual received non-corticosteroid immunosuppression within 30 days of the enrollment date; AA = 1 for African American. Missing data were coded as 0.
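The score can be computed directly from the equation; this Python sketch transcribes the coefficients term by term and, following the text, codes missing inputs (`None`) as 0. The function and argument names are hypothetical.

```python
from typing import Optional


def _or_zero(x: Optional[float]) -> float:
    """Code a missing value as 0, as stated in the text."""
    return 0.0 if x is None else float(x)


def wasfi_score(cardiac: Optional[int], neuro: Optional[int],
                immunosuppressed_30d: Optional[int], fvc_pp: Optional[float],
                african_american: Optional[int], fev1_fvc: Optional[float]) -> float:
    """11.46 + 3.9C + 2.56N + 1.56IS - 0.051 FVCpp + 1.75AA - 0.054 FEV1/FVC."""
    # Missing-as-0 is applied literally to every input; note that a missing
    # FVCpp or FEV1/FVC coded as 0 raises the score.
    c, n, is_, fvc, aa, ratio = map(_or_zero, (
        cardiac, neuro, immunosuppressed_30d, fvc_pp, african_american, fev1_fvc))
    return (11.46 + 3.9 * c + 2.56 * n + 1.56 * is_
            - 0.051 * fvc + 1.75 * aa - 0.054 * ratio)
```

For example, an individual who is not African American, with no cardiac or neurological involvement, no recent immunosuppression, FVCpp of 100 and FEV1/FVC of 80 scores 11.46 - 5.1 - 4.32 = 2.04.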

Statistical analysis

All statistical analyses were performed using R (R Core Team, 2020) [19]. Model-based clustering was used to identify sarcoidosis phenotypes based on the features shown in Table 1. Variations of the model included either a single dichotomous extrapulmonary variable (absence or presence of extrapulmonary disease) or individual organs. Clustering was performed using the VarSelLCM R package [20]. We chose VarSelLCM because it supports mixed feature types, missing values, and variable selection to identify important clustering features [21]. VarSelLCM handles missing values using an expectation-maximization algorithm; simulations by Marbac et al. show that the method works well even when variables have up to 20% missing values [20]. The Integrated Completed Likelihood (ICL) criterion was used to estimate the number of clusters [22]. To identify features associated with cluster membership, variables were ranked by their variable importance scores from the VarSelLCM model, and additional univariate tests (Fisher’s exact test (FET) and one-way ANOVA) were performed. Pairwise comparisons between clusters used 2-sample t-tests for quantitative features and logistic regression for categorical features (FET as appropriate). To account for multiple testing, the Benjamini–Hochberg method was used to calculate false-discovery-rate (FDR) adjusted p-values, hereafter referred to as ‘q-values’ [23]. Results with q-values < 0.05 were considered statistically significant.
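The Benjamini–Hochberg adjustment used to obtain q-values can be sketched as follows. The analyses were run in R, where `p.adjust(p, method = "BH")` provides this adjustment; the Python function below is an illustrative re-implementation of the same step-up procedure, not the authors' code.

```python
def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down: q_(k) = min over j >= k of p_(j) * m / j,
    # which keeps the adjusted values monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

Significance at the threshold used in the study is then `[q < 0.05 for q in bh_qvalues(p)]`; for p = [0.01, 0.04, 0.03, 0.005] the q-values are approximately [0.02, 0.04, 0.04, 0.02].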

Table 1 Characteristics of the Study Population

Results

Characteristics of study population

The study population consisted of 554 individuals (Table 1), with a slight female majority and predominantly White individuals, although the percentage of Black individuals was greater than would be expected from the racial composition of Colorado. The lungs were the most commonly involved organ (96.4%), and Scadding stage 2 was the most prevalent stage. The next most frequently involved organs were the heart (12.8%), skin (12.3%) and eye (10.5%). Most individuals had only one organ involved (54.7%), and most (68.9%) were treated with non-corticosteroid immunosuppression within 5 years of enrollment.

Cluster analysis defines six phenotypes

Six clusters were identified by model-based clustering. Based on the variable importance scores from the VarSelLCM model, the six variables most important for clustering, in descending order, were FEV1pp, FVCpp, duration of disease, FEV1/FVC, Scadding stage and treatment status. The distributions of these variables are presented in Figs. 1–4, and differences across clusters in these variables are shown in Table 2. Specific abnormalities by cluster are described in Fig. 6a, b.

Fig. 1

Comparison of lung function parameters among clusters. For each cluster, median and IQR are shown by boxplots and means by an x in the center of each boxplot for (a) FEV1pp, (b) FVCpp and (c) FEV1/FVC. Potential outliers are indicated by distinct points.

Fig. 2

Distribution of Scadding stages 0–4 in each cluster. The representation of each Scadding stage in a cluster by percent is shown for all six clusters

Fig. 3

Comparison of duration of disease in years among clusters. For each cluster, median and IQR are shown by boxplots and means are shown by x in the center of boxplots. Potential outliers are indicated by distinct points

Fig. 4

Distribution of cases treated with non-corticosteroid immunosuppression in each cluster. The percentage of individuals who ever received immunosuppressive treatment is shown in dark gray, and the percentage of individuals who never received immunosuppressive treatment is shown in light gray.

Table 2 Differences in clinical characteristics across phenotypes

For lung function (Fig. 1), mean FEV1pp and FVCpp were highest in cluster 1 (104.4 and 104.3, respectively) and cluster 2 (105.7 and 107.9). The highest mean FEV1/FVC ratio was in cluster 3 (82.5). The lowest mean FEV1pp and FVCpp were in cluster 5 (71.2 and 72.2) and cluster 6 (53.2 and 66.3). Clusters 4 (70.3) and 6 (63.5) had significantly lower mean FEV1/FVC ratios. Overall, cluster 6 had the lowest spirometry values of all clusters, although its interquartile ranges (IQRs) were broad: FEV1pp (46–62), FVCpp (55–76) and FEV1/FVC (55–72).

Although each cluster included all five Scadding stages (Fig. 2), the distributions differed. Scadding stage 2 was the most common stage in cluster 1 (41.5%), whereas stage 0 was the most common in cluster 2 (48.9%). Cluster 5 was mostly Scadding stage 2 (60.3%), while a majority of cluster 6 was Scadding stage 4 (51.2%). Clusters 3 and 4 had no single predominant Scadding stage.

Duration of disease also differed across clusters (Fig. 3): clusters 1 and 5 had significantly shorter mean durations of disease (2 years each). Cluster 2 (16.5 years) and cluster 6 (11.1 years) had the longest mean durations, although the IQRs for these clusters were broad: cluster 2 (8.4–21.9) and cluster 6 (3.4–15.1). Clusters 3 (4.2 years) and 4 (6.3 years) had intermediate mean durations. Significantly more individuals were treated in clusters 4, 5 and 6 than in clusters 1, 2 and 3 (Fig. 4).

Clinical characteristics differ between phenotypes

We evaluated the other variables entered in the cluster analysis to determine differences between clusters (Table 2; expanded table in Additional file 1: Table E1). Figure 6a, b represents specific abnormalities by cluster. In addition to the six variables mentioned above, BMI, age at diagnosis, gender, Lofgren’s syndrome, smoking status and race differed significantly. Specifically, clusters 3 and 5 had a higher average BMI than clusters 1, 2 and 4, and clusters 2 and 3 had more females, whereas clusters 4 and 6 had more males. Cluster 6 contained more smokers than clusters 3 and 4. More Black individuals were in clusters 2, 4 and 5 than in cluster 1. Finally, individuals in clusters 2 and 6 were younger at diagnosis than those in clusters 1, 3, 4 and 5. Interestingly, specific extrapulmonary organ involvement did not differ across clusters, although cardiac involvement showed a trend toward significance. When the cluster analysis was rerun using a “yes/no” variable for extrapulmonary involvement, there were still no differences across clusters, and the analysis yielded the same six clusters.

Wasfi score association with phenotypes

We evaluated the association of the Wasfi severity score with each cluster. The mean Wasfi score differed significantly across the six clusters (q < 0.001, Fig. 5), with clusters 4, 5 and 6 (mean scores 5, 5.2 and 6.5, respectively) scoring significantly higher than clusters 1, 2 and 3 (mean scores 2.6, 3.2 and 3.8).

Fig. 5

Wasfi Scores by Cluster. For each cluster, median and IQR are shown by boxplots and means are shown by x in the center of boxplots. Higher scores indicate greater severity

Phenotypes of sarcoidosis disease severity

Based on the cluster analyses and the associations of the clusters with clinical variables and Wasfi scores, we categorized the clusters by disease severity and other disease findings. Overall, the clusters appear to reflect less severe (clusters 1, 2, 3) and severe (clusters 4, 5, 6) pulmonary disease manifestations (Fig. 6a, b). Specifically, individuals in clusters 4, 5 and 6 had at least one lung function parameter below normal and were more often treated than those in clusters 1, 2 and 3. Clusters 4, 5 and 6 also had distinct patterns of lung function abnormality: obstructive, restrictive and mixed, respectively. Scadding stage was less distinctly distributed between the severe and less severe clusters, although the severe phenotypes had less stage 0/1 disease and cluster 6 had more stage 4 disease, consistent with a fibrotic phenotype. Severe clusters 4 and 6 had more males than less severe clusters 2 and 3. The remaining variables did not clearly distinguish severe from less severe clusters. Based on lung function and radiological differences, we named the clusters as shown in Fig. 6a, b.

Fig. 6

a Cluster Descriptions by Less Severe Disease Features. b Cluster Descriptions by More Severe Disease Features. The first column describes the cluster number, and the second column describes the cluster name. The third column includes significant differences in the six most important variables for clustering; arrows indicate a significant difference between less severe clusters (1, 2, 3) and more severe clusters (4, 5, 6) (q < 0.05). The fourth column shows which severe disease features are present in clusters; shading in the Venn diagram indicates that the majority of individuals had that particular disease feature; partial shading indicates half of individuals had the disease feature. The fifth column describes significant pairwise differences between clusters (q < 0.05). Finally, the sixth column describes the mean Wasfi score for that cluster

Discussion

There is a pressing need for sarcoidosis phenotypes that can identify individuals with, or at risk for, severe disease and classify them for research studies. We used cluster analysis of clinical characteristics to define sarcoidosis phenotypes and found that common clinical variables contributed most to the clustering, including spirometry, disease duration, Scadding stage and immunosuppressive treatment. Unexpectedly, we defined six distinct pulmonary phenotypes that included severe and less severe disease manifestations but did not differ in extrapulmonary organ involvement. The three less severe phenotypes were classified as supranormal lung function with parenchymal disease, supranormal lung function with no parenchymal disease, and normal lung function. The three severe phenotypes were obstructive physiology with parenchymal disease, restrictive physiology with non-fibrotic parenchymal disease, and mixed physiology with fibrotic lung disease. Interestingly, males predominated in two of the severe clusters, while females predominated in two of the less severe clusters. Unsurprisingly, Black individuals made up a greater proportion of two of the severe clusters, and a similar proportion of one of the less severe clusters. Finally, we compared our cluster phenotypes with the Wasfi score, a previously developed assessment of disease severity from our group, and found that the less severe clusters had lower scores and the more severe clusters had higher scores.

Our clusters describe pulmonary disease phenotypes despite the inclusion of other organ involvement. These phenotypes suggest subgroups of pulmonary sarcoidosis defined by different lung function and radiographic abnormalities. Our severe phenotypes (clusters 4, 5 and 6) had reduced lung function with obstructive, restrictive and mixed patterns, respectively, and were associated with different Scadding stages. Various lung function abnormalities have been associated with worse outcomes in sarcoidosis, specifically FVC < 80%, FEV1 < 50% and a vital capacity less than 1.5 L [5, 24,25,26]. Previous studies have shown limited correlation between initial Scadding stage and subsequent clinical recovery or lung function [4, 26,27,28], except for Scadding stages 0 and 4, which have been associated with good and poor prognosis, respectively. Indeed, we found more Scadding stage 4 disease in the cluster with the worst lung function and more Scadding stage 0/1 disease in the cluster with supranormal lung function. However, Scadding stages 2/3 were represented in both severe and less severe clusters, supporting the view that Scadding stage is a poor predictor of disease course except at the extremes. This is consistent with other studies in which the extremes of Scadding stage, but not stages 2/3, were more predictive of disease course and severity, likely because stages 2/3 encompass a broad spectrum of disease abnormalities. The need for treatment is often associated with chronic respiratory impairment [24, 29]. Those who are initially treated are more likely to require treatment at follow-up and to relapse after treatment cessation [7, 27]. We found a clear association between treatment and cluster severity, with more individuals in the severe clusters receiving non-corticosteroid immunosuppressive treatment. Individuals who were diagnosed at earlier ages had the longest durations of disease, which did not correlate with disease severity.
Clusters that share a similar duration of disease allow distinct phenotypes to be identified at a similar point in time without longitudinal data. For instance, clusters 1 and 5 share a similar short disease duration (mean 2 years), yet they are clearly distinct phenotypes, with cluster 1 exhibiting less severe disease than cluster 5. Determining how the clusters change over time would require longitudinal data, which were not included in this study; it is possible that individuals move from one phenotype to another at different time points.

While some of our findings support prior studies, others were unexpected. For example, males were the majority in our severe clusters 4 and 6, while females were the majority in the less severe clusters 2 and 3. These findings are somewhat at odds with prior studies in which women, especially Black women, had higher mortality [30, 31] and more severe organ involvement [32, 33]. However, they may support studies suggesting that males have a more chronic course than females [25]. Unexpectedly, our less severe cluster 2 had a greater frequency of Black individuals than the other less severe clusters, although severe clusters 4 and 5 also had more Black individuals. There is substantial literature indicating that Black individuals have more severe disease requiring treatment and higher associated mortality [4, 7, 30, 31, 34]. Most of the participants in this study were White, which may have affected the results, although the results may also suggest that severe sarcoidosis affects all races. It is well documented that sarcoidosis is less prevalent among smokers [35,36,37,38]. However, our most severe cluster, cluster 6, had the highest percentage of smokers, suggesting that disease severity may be worse in smokers. A similar pattern is seen in other pulmonary granulomatous diseases such as chronic beryllium disease and hypersensitivity pneumonitis, in which smokers have worse pulmonary function and require more treatment than never smokers despite a lower prevalence of disease [39, 40]. Individuals in cluster 6 were also younger at diagnosis and had longer disease duration, consistent with the association between fibrosis and prolonged disease duration [31]. Interestingly, this is not the case for cluster 2, which also had a younger age at diagnosis and longer disease duration; this may reflect that these two clusters represent two different phenotypes.
Additionally, cluster 6 had the highest percentage of males. This is an interesting observation, as males are often diagnosed at a younger age and may have more chronic disease, more Scadding stage 4 fibrotic disease and higher mortality from fibrosis [32, 41, 42].

We compared our phenotypes with another phenotyping method developed by our group: the Wasfi severity score, a numerical severity index developed to codify expert opinion [8]. Our three severe disease clusters were associated with higher Wasfi severity scores. This was not surprising, as the features that characterized our severe phenotypes, abnormal spirometry and treatment, are part of the Wasfi score. Unlike in the Wasfi score, non-pulmonary organ involvement, including cardiac and neurological involvement, did not contribute to our clusters/phenotypes. Our treatment variable used a different timeframe from that of the Wasfi severity score: we used a 5-year timeframe to approximate ever treatment in our cohort rather than the 30-day timeframe used in the Wasfi score; however, the Wasfi 30-day treatment variable was highly correlated with our 5-year treatment variable. Other studies have used cluster analyses to define phenotypes in sarcoidosis [13,14,15, 43]. However, in contrast to our study, they used organ-specific variables to produce organ-based phenotypes. Unexpectedly, extrapulmonary organ variables did not contribute to the clustering of our phenotypes; although cardiac involvement trended toward a significant difference between clusters, we did not see more cardiac involvement in our severe phenotypes, as we had anticipated. This might be due to low extrapulmonary organ frequencies in our cohort, although performing the cluster analysis using only the presence/absence of extrapulmonary disease did not change our results. Additionally, Schupp et al. developed organ-based phenotypes in a large European cohort with similarly low extrapulmonary organ frequencies [13].
Our results may suggest that extrapulmonary organ involvement does not define a predominant phenotype when clinically relevant pulmonary variables are included; the lung is overwhelmingly the most commonly involved organ in sarcoidosis, and pulmonary involvement results in significant morbidity and mortality [44]. Rodrigues et al. used factor analysis with clinical input variables similar to ours to analyze a Brazilian cohort and found four phenotypes [45]. Similar to our results, they found a phenotype characterized by fibrosis/Scadding stage 4 and decreased lung function, as well as one marked by airflow obstruction. Despite the differences in statistical methods, the similarities in the resultant phenotypes suggest that the results are consistent and underscore the importance of including clinically relevant variables.

Because we are a major sarcoidosis referral center, we often see more complicated cases, including severe pulmonary and extrapulmonary disease; this may have biased our cohort toward more severe disease. Although this could have affected extrapulmonary disease severity, our rates of other organ involvement were similar to those of other studies [13]. Our study emphasizes the need to include clinically relevant measures of extrapulmonary disease severity, such as arrhythmias or ejection fraction for cardiac disease, if the goal is to evaluate organ-specific or overall severe disease. Additionally, other clinical markers of disease severity, such as lymphopenia, were not available in our cohort. We did not have patient-reported outcomes or symptoms, which could provide information missed by objective measurements in a phenotyping paradigm designed to assess disease course and therapeutic intervention; organ-specific phenotypes may or may not be helpful for these applications. We did not have longitudinal data to test the stability of our clusters over time, although we were able to infer some longitudinal information from duration of disease as described above; in addition, previous studies have noted stability in FVC, FEV1 and Scadding stage over a 2-year period, suggesting that many of our clusters may remain stable over this time frame [28]. We intentionally did not include corticosteroid therapy as a variable because we find that many individuals with sarcoidosis are over-treated with corticosteroids; however, we cannot completely rule out that including corticosteroid therapy would have changed our clustering results. In future work, a larger cohort followed longitudinally would allow deeper analysis of treatment types, treatment failure, the effects of age and sex, and extrapulmonary organ severity based on objective measures, and would allow validation of the findings reported here.
Finally, given the inherent uncertainty in statistical techniques we cannot say that there are definitively only six sarcoidosis clusters, which is an issue that applies to all forms of cluster analyses.

Conclusion

In conclusion, this study is novel in that it uses the objective method of cluster analysis to clinically phenotype sarcoidosis patients using easily obtained clinical characteristics beyond organ involvement. It demonstrates the importance of clinical variables in defining clinically relevant phenotypes and suggests that adding longitudinal data, which we plan for future work, may improve the model. These pulmonary phenotypes were further categorized into less severe and severe phenotypes. Specifically, these phenotypes may help clinicians identify individuals who are more likely to have severe disease (phenotypes 4, 5 and 6) while offering reassurance to those in phenotypes 1–3. Phenotypes 1 and 3 had a shorter time since diagnosis, and important differences among the less severe phenotypes could be elucidated through longitudinal follow-up in future studies. The methods of this study suggest an approach for other organ-specific phenotyping. Finally, these phenotypes have the potential to help identify subgroups in this heterogeneous disease, with possible implications for follow-up, prognosis and interventions.