Introduction

Systemic sclerosis (SSc; scleroderma) is a multiorgan disease with a complex interplay of diverse pathological processes involving inflammation, fibrosis, and vasculopathy. While organ involvement in SSc varies, skin involvement is almost universal [1]. The modified Rodnan skin score (mRSS), a measure of skin thickness, is the primary outcome measure in the majority of clinical trials of diffuse cutaneous SSc (dcSSc). Skin thickness is used as a surrogate measure of disease severity and mortality in patients with dcSSc; specifically, increasing skin thickening is associated with internal organ involvement and increased mortality [1]. It is generally accepted that the mRSS tends to worsen early in the disease and to improve in late disease, although the time of peak involvement is poorly defined. The mRSS is feasible, reliable, valid, and sensitive to change in multicenter clinical trials [2].

To interpret change in the mRSS within a group of participants with SSc over time, or differences in the mRSS between two groups, it is important to first define whether the change or difference is clinically meaningful. The minimal clinically important difference (MCID) is defined as the smallest difference in a measure or instrument that patients consider “worthwhile or important” [3]. For clinicians, the MCID helps guide treatment. Of the several methods for calculating the MCID, estimation using an external anchor is often preferred [4]. Khanna et al. [5] previously published MCID estimates for the mRSS using data from the D-penicillamine trial, which enrolled patients with early dcSSc, and reported MCID estimates of 3.2–5.3 units of improvement. However, they used physician assessment of change over time as the anchor, whereas it is preferable for this information to come directly from patients [4]. In this article, we therefore analyzed data from two clinical trials in SSc-related interstitial lung disease, the Scleroderma Lung Studies I and II (SLS-I and SLS-II), to calculate MCID estimates for the mRSS using patient-reported anchors.

Methods

Participants

All participants with any outcome data in SLS-I and SLS-II were evaluated in this post-hoc analysis. The study protocols for both SLS-I and SLS-II were approved by the local Institutional Review Boards, and written informed consent was obtained from all participants. The trial designs for both SLS-I and SLS-II have been published elsewhere [6, 7]. Briefly, participants meeting the 1980 SSc classification criteria were included. Participants in SLS-I were randomized to 1 year of oral placebo or oral cyclophosphamide, with the primary endpoint being change in FVC percent predicted (FVC%) at 1 year. Participants in SLS-II were randomized to 2 years of mycophenolate mofetil or 1 year of oral cyclophosphamide followed by 1 year of placebo. The primary endpoint for SLS-II was the course of the FVC% from baseline to 24 months using a joint model, which examined the repeated measurements of FVC%. The mRSS was captured as a secondary outcome measure in both trials [8] and was assessed by experienced rheumatologists at Scleroderma Centers for Excellence throughout the trials. We did not perform mRSS training sessions before each trial.

Methods and procedures

Participants’ clinical data included age, gender, race, disease duration (from first non-Raynaud’s symptom attributable to SSc), skin subtype of SSc (dcSSc or limited cutaneous SSc (lcSSc)), presence of tendon friction rubs (captured as yes or no at baseline and month 12), presence of small and large joint contractures (captured as yes or no at baseline and month 12), and the mRSS. In both studies, the patient-reported outcome measures included the Mahler Baseline and Transition Dyspnea Indexes (BDI and TDI), the Health Assessment Questionnaire (HAQ-DI), the Medical Outcomes Short Form-36 (SF-36), and the Patient Global Assessment of Disease Activity (PtGA) [9].

The Mahler Baseline and Transition Dyspnea Indexes (BDI and TDI) [10] measured patient dyspnea: the BDI at baseline and the TDI as the change in dyspnea from baseline to 12 months. Both indices comprise three domains. BDI domain scores range from 0 to 4 (total 0 to 12), and TDI domain scores range from − 3 to + 3 (total − 9 to + 9). Higher scores denote less dyspnea (BDI) or an improvement in dyspnea (TDI). Both indices were administered as paper questionnaires in SLS-I and in a self-administered, computerized format in SLS-II.

The Health Assessment Questionnaire Disability Index (HAQ-DI) is a 20-item questionnaire assessing functional disability in eight domains, with scores ranging from 0.0 (best) to 3.0 (worst) [11]. It has been fully validated in SSc [12].
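To make the scoring concrete, the following is a minimal Python sketch of the conventional HAQ-DI calculation, in which each of the eight categories takes its highest (worst) item score and the index is the mean of the category scores. It is a simplification under stated assumptions: the aids-and-devices adjustment is omitted, the minimum of six answered categories follows common scoring guidance, and the function name, category labels, and example responses are hypothetical, not taken from the SLS-I or SLS-II analysis code.

```python
from statistics import mean
from typing import Dict, List, Optional


def haq_di(responses: Dict[str, List[Optional[int]]], min_categories: int = 6) -> Optional[float]:
    """Simplified HAQ-DI (hypothetical helper; aids/devices adjustment omitted).

    Each item is scored 0 (without any difficulty) to 3 (unable to do);
    None marks an unanswered item.
    """
    category_scores = []
    for items in responses.values():
        answered = [score for score in items if score is not None]
        if any(score not in (0, 1, 2, 3) for score in answered):
            raise ValueError("HAQ-DI items must be scored 0-3")
        if answered:
            # A category is scored by its worst (highest) answered item.
            category_scores.append(max(answered))
    if len(category_scores) < min_categories:
        return None  # too few categories answered to compute the index
    # Averaging category scores keeps the index on the 0.0 (best) to 3.0 (worst) scale.
    return mean(category_scores)


# Hypothetical responses: 20 items across the eight categories.
example = {
    "dressing": [1, 0],
    "arising": [0, 0],
    "eating": [2, 1, 0],
    "walking": [1, 1],
    "hygiene": [0, 0, 1],
    "reach": [3, 2],
    "grip": [1, 0, 0],
    "activities": [0, 0, 0],
}
print(haq_di(example))  # prints 1.125 (category maxima sum to 9, divided by 8)
```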

The Medical Outcomes Short Form-36 (SF-36) version 2 is a self-administered survey of generic health-related quality of life. It yields a physical component summary and a mental component summary [13], each scored on a T-score metric with a US population mean of 50 (SD 10); higher scores denote better health-related quality of life. One item is a health transition question that asks whether the patient's health has become better or worse, as described in more detail below. The SF-36 has been previously validated in SSc [14].
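The T-score metric is a linear rescaling of a standardized score. As a generic sketch (the published SF-36v2 summary scoring additionally applies component-specific weights to the domain z-scores, which are not reproduced here), with $x$ a score and $\mu$, $\sigma$ the US general-population mean and SD:

$$
z = \frac{x - \mu}{\sigma}, \qquad T = 50 + 10z
$$

For example, a score one population SD below the mean ($z = -1$) corresponds to $T = 40$.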

Statistical analysis

Summary statistics were calculated for all demographic and clinical variables. Continuous variables are reported as mean and standard deviation (SD), and frequencies are reported for categorical variables.

Anchors to assess the MCID

To determine the MCID, we used three anchors reported directly by participants: the SF-36 health transition question answered at 12 months, the change in HAQ-DI score from baseline to 12 months, and the change in PtGA score from baseline to the 12-month visit. We chose these anchors because of their relationships with the mRSS in previous studies [15] and because all of this information is provided directly by the patient. Additionally, experts recommend using multiple anchors to obtain robust estimates [4]. The SF-36 health transition question asks: “Compared to one year ago, how would you rate your health in general now?” We used the “somewhat better” and “somewhat worse” responses as the anchors for calculating the MCID [16,17,18]. For the HAQ-DI, we used a previously published MCID estimate for improvement of 0.14 units in SSc [5] (and 0.22 units in other arthritides), and an arbitrary cutoff point of 0.48 units to denote a large improvement exceeding the MCID. For the PtGA (scored 0–100), we used a cutoff point of 20 units as the MCID and an arbitrary cutoff point of 50 units to denote a change exceeding the MCID.
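The anchor-based approach can be sketched as follows: participants are grouped by their change on an anchor, and the MCID estimate is the mean change in the mRSS within the minimally improved group. This Python sketch rests on assumptions not stated in the text: a decrease in HAQ-DI or PtGA is treated as improvement, the “improved” group is taken to span the MCID cutoff up to (but not including) the larger cutoff, worsening is defined symmetrically as an increase of at least the MCID cutoff, and the function names and example data are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def anchor_category(change: float, mcid: float, large: float) -> str:
    """Classify a baseline-to-12-month anchor change (negative = improvement, assumed)."""
    improvement = -change
    if improvement >= large:
        return "large_improvement"
    if improvement >= mcid:
        return "improved"  # the group whose mean mRSS change is the MCID estimate
    if improvement <= -mcid:
        return "worse"  # assumed symmetric definition of worsening
    return "no_change"


def mean_mrss_change_by_category(
    anchor_changes: List[float], mrss_changes: List[float], mcid: float, large: float
) -> Dict[str, float]:
    """Mean 12-month mRSS change within each anchor category."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for anchor_change, mrss_change in zip(anchor_changes, mrss_changes):
        groups[anchor_category(anchor_change, mcid, large)].append(mrss_change)
    return {category: mean(values) for category, values in groups.items()}


# Hypothetical data for five participants, using the HAQ-DI cutoffs (0.14 and 0.48).
haq_changes = [-0.20, -0.15, 0.00, -0.60, 0.30]
mrss_changes = [-4.0, -2.0, -1.0, -8.0, 1.0]
print(mean_mrss_change_by_category(haq_changes, mrss_changes, mcid=0.14, large=0.48))
```

Here the hypothetical “improved” group has a mean mRSS change of −3.0. For the PtGA, the same functions would be called with `mcid=20` and `large=50`; the SF-36 health transition anchor needs no cutoffs, because participants are grouped directly by their response category.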

We assessed the appropriateness of the anchors by calculating Spearman correlations between each anchor (change in HAQ-DI score, change in PtGA score, and the SF-36 health transition response at 12 months) and the change in the mRSS from baseline to 12 months. A correlation coefficient of ≥ 0.30 is generally considered acceptable [4, 19]; if this threshold was not met, we considered p < 0.05 an acceptable alternative. We also examined whether meeting the MCID estimates for change in the mRSS was associated with changes in several patient-reported outcomes (the SF-36 physical and mental component summary scores and the TDI) and with changes on physical examination (improvement in tendon friction rubs and joint contractures), given their known relationship with the mRSS [20].
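The anchor-appropriateness check can be sketched in Python as below. Spearman's coefficient is computed as the Pearson correlation of average ranks (tied values share the mean of their positions), and an anchor passes if |ρ| ≥ 0.30 or, failing that, p < 0.05. The sketch uses only the standard library and takes the p-value as an input; in practice it would come from a statistics package (for example, `scipy.stats.spearmanr` returns both the coefficient and a p-value). Using the absolute value of ρ is an assumption, since the sign depends on how each anchor is coded; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def anchor_is_acceptable(rho: float, p_value: float,
                         threshold: float = 0.30, alpha: float = 0.05) -> bool:
    """Primary criterion |rho| >= threshold; fallback criterion p < alpha."""
    return abs(rho) >= threshold or p_value < alpha
```

For instance, `anchor_is_acceptable(0.20, 0.01)` returns `True` only through the fallback criterion, whereas `anchor_is_acceptable(0.20, 0.20)` returns `False`.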

Student’s t tests were used to compare mean changes in patient-reported outcomes, and chi-square tests to compare proportions, between participants whose mRSS improved by at least the MCID estimate and those whose mRSS did not. We calculated the effect size as the mean change in the mRSS divided by the SD of the mRSS at baseline. p < 0.05 was considered statistically significant, and no adjustment was made for multiple testing.
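Both calculations can be sketched with the Python standard library. The effect size is the mean mRSS change divided by the baseline SD (reported as a magnitude, matching the positive effect sizes in the Results), and the Student's t statistic pools the two groups' variances. Only the statistic is computed; a p-value would additionally require the t distribution (for example, `scipy.stats.ttest_ind` with `equal_var=True` performs the pooled-variance test). The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def effect_size(mean_change: float, baseline_sd: float) -> float:
    """Magnitude of the mean mRSS change in units of the baseline SD."""
    return abs(mean_change) / baseline_sd


def student_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Overall baseline SD (10.72) with the mean mRSS changes reported for the improved groups.
for anchor, change in [("SF-36 transition", -2.86), ("HAQ-DI", -3.56), ("PtGA", -4.06)]:
    print(anchor, round(effect_size(change, 10.72), 2))
```

With the overall baseline SD of 10.72, this reproduces the effect sizes of 0.27, 0.33, and 0.38 reported for the three improvement anchors in the overall group.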

Results

Baseline characteristics and changes in mRSS and patient-reported outcomes

We evaluated data from 300 participants in SLS-I and SLS-II combined (158 in SLS-I and 142 in SLS-II; Table 1). In the pooled cohort, mean (SD) age was 50.3 (11.3) years, mean (SD) disease duration was 2.9 (2.0) years, mean (SD) FVC% at baseline was 67.4% (10.8%), and 59% of participants across both trials had dcSSc (Table 1). Although the two studies were comparable in most baseline characteristics, participants in SLS-II were older (p = 0.004), had shorter mean disease duration (p = 0.01), and had less baseline dyspnea as assessed by the BDI (7.2 vs 5.7, p < 0.001; data not shown).

Table 1 Baseline demographics for all participants

The mean (SD) mRSS at baseline for all participants was 14.75 (10.72), but the mRSS was higher for those with dcSSc and lower for those with lcSSc as expected (Table 1). The change in mRSS was − 2.84 (5.91) in the overall group, − 4.49 (6.75) in dcSSc participants, and − 0.37 (3.00) in lcSSc participants.

Correlation coefficients to assess the appropriateness of MCID anchors

For our MCID analysis, the correlation coefficients between the anchors (SF-36 health transition at 12 months and changes in HAQ-DI and PtGA from baseline to 12 months) and the change in the mRSS from baseline to 12 months did not meet the 0.30 threshold (Table 2). However, the correlations for the overall group were statistically significant, meeting our alternative criterion and indicating that the anchors were acceptable. To explore this further, we calculated the correlations separately for participants with dcSSc and those with lcSSc. For participants with dcSSc, the coefficients were also below the 0.30 threshold, but all three were statistically significant. For participants with lcSSc, the coefficients were very small and not statistically significant.

Table 2 Spearman correlation coefficients between change in mRSS and the three patient-reported anchors

mRSS MCID estimates for reported improvement in the overall group

We provide unadjusted mRSS MCID estimates for both the improvement and no-change categories of all three anchors (Tables 3, 4, and 5). For the SF-36 health transition question, the mean mRSS MCID estimate for reported improvement (defined as answering “somewhat better”) was − 2.86, with an effect size of 0.27. This estimate was similar to that of the group reporting no change on the question (− 2.72, effect size 0.25; Table 3). Using the HAQ-DI cutoff points described earlier, the mRSS MCID estimate for HAQ-DI improvement was − 3.56, with an effect size of 0.33, numerically larger than in the HAQ-DI no-change group (mean change − 2.55, effect size 0.24; Table 4). Using the PtGA cutoff points described earlier, the mRSS MCID estimate for PtGA improvement was − 4.06, with an effect size of 0.38, numerically larger than in the PtGA no-change group (mean change − 2.94, effect size 0.27; Table 5).

Table 3 Estimation of MCID estimates in mRSS using SF-36 health transition anchor for all participants
Table 4 Estimation of MCID estimates in mRSS using HAQ-DI anchor for all participants
Table 5 Estimation of MCID estimates in mRSS using Patient Global Assessment anchor for all participants

mRSS MCID estimates for reported improvement in those with dcSSc

For those participants with dcSSc, using the SF-36 health transition question, the mean mRSS MCID estimate for those who reported improvement was − 4.70 with an effect size of 0.49. This was similar to the group reporting no change, which had an estimate of − 4.61 with an effect size of 0.48 (Table 3). Using the HAQ-DI, the mean mRSS MCID estimate for HAQ-DI improvement was − 4.64 with an effect size of 0.48, and this was similar to the group with no change in HAQ-DI (mean change of − 4.8 and effect size of 0.50; Table 4). Using the PtGA, the mean mRSS MCID estimate for PtGA improvement was − 5.12 with an effect size of 0.53, and this was similar to the group with no change in PtGA (mean change of − 5.16 and effect size of 0.54; Table 5).

mRSS MCID estimates for reported improvement in those with lcSSc

For those participants with lcSSc, the mRSS MCID estimates were small and within the measurement error of the mRSS (Tables 3, 4, and 5). Specifically, the mRSS MCID estimate was 0.67 for improvement on the SF-36 health transition question, − 1.58 for improvement in HAQ-DI scores, and − 1.0 for improvement in PtGA scores.

mRSS MCID estimates for reported worsening in the cohort

On average, the mRSS improved even in participants who rated their health as “somewhat worse” on the SF-36 health transition question and in those who met the definitions of worsening on the HAQ-DI and PtGA (Tables 3, 4, and 5). The same pattern was seen in participants with dcSSc.

Relationship between the mRSS MCID estimates, patient-reported outcomes, and musculoskeletal examination

We explored whether participants whose mRSS improved by ≥ MCID over 12 months had parallel changes in patient-reported outcome scores (Table 6). We used improvements of > 3 and > 4 mRSS units for the overall group and > 5 units for participants with dcSSc. Participants who met these mRSS MCID improvement criteria also had significantly greater improvements in the SF-36 physical component summary and the TDI than those whose mRSS changes did not meet the MCID estimates (p < 0.05 for all comparisons), both in the overall group and in those with dcSSc. Among participants with dcSSc, a greater proportion of those who met the mRSS MCID criterion had improvement in small and large joint contractures than those who did not (p = 0.008).
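The proportion comparison can be sketched as a Pearson chi-square test on a 2 × 2 table of “met the mRSS MCID” versus “contractures improved”. With one degree of freedom, the chi-square survival function equals erfc(√(χ²/2)), so the Python standard library suffices. This is an illustration under assumptions: the counts below are hypothetical (not the trial data), no continuity correction is applied (the text does not say whether one was used), and the function names are hypothetical.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def meets_mcid(mrss_change: float, threshold: float) -> bool:
    """True when the mRSS fell by more than `threshold` units (e.g., > 5 in dcSSc)."""
    return -mrss_change > threshold


def two_by_two(met: Sequence[bool], improved: Sequence[bool]) -> List[List[int]]:
    """Rows: met / did not meet the MCID; columns: improved / not improved."""
    table = [[0, 0], [0, 0]]
    for met_mcid, contracture_improved in zip(met, improved):
        table[0 if met_mcid else 1][0 if contracture_improved else 1] += 1
    return table


def chi_square_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square without continuity correction, with its df = 1 p-value."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return chi2, p_value


# Hypothetical counts: 20 of 30 who met the MCID improved vs 10 of 30 who did not.
chi2, p = chi_square_2x2([[20, 10], [10, 20]])
print(round(chi2, 2), round(p, 4))
```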

Table 6 Change in patient-reported outcome measures and musculoskeletal involvement by the MCID estimates for all participants and for diffuse cutaneous SSc

Discussion

The mRSS is a surrogate of skin and internal organ involvement in SSc and is feasible, reliable, valid, and sensitive to change in multicenter clinical trials [2]. Since the mRSS serves as the primary outcome measure in recent clinical trials, it is important to define which changes in the mRSS between two groups are clinically meaningful (not just statistically significant). Additionally, for the clinician, the MCID provides information on treatment response and can help guide treatment. Our data suggest that an improvement of 3–4 units in all SSc patients (an improvement of 20–27% from the baseline score) and an improvement of 5 units in dcSSc patients (an improvement of 24% from baseline scores) are appropriate MCID estimates.
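The percentages quoted for the overall group follow from dividing the proposed thresholds by the pooled mean baseline mRSS of 14.75 units:

$$
\frac{3}{14.75} \approx 0.203 \;(20\%), \qquad \frac{4}{14.75} \approx 0.271 \;(27\%)
$$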

Our data align with the mRSS MCID estimates from the D-penicillamine trial (based on physician report), in which an improvement of 3.2–5.3 units was considered the MCID [5]. Another study that included a physician consensus exercise also defined improvements of 3.0–7.5 units as MCID estimates for the mRSS [21]. Additionally, improvements of > 5 units and ≥ 25% have recently been considered clinically important for the mRSS in dcSSc [22]. Thus, our findings are consistent with published estimates. In addition, meeting our MCID estimates was associated with statistically significant improvements in the SF-36 physical component summary and TDI, as well as with improvement in small and large joint contractures in dcSSc, suggesting that these MCID estimates reflect how a patient feels and functions [23].

MCID estimates are calculated at the group level and should not be confused with change in a measure for an individual patient. At the individual level, a larger change is required for it to be considered statistically significant, because individual change is influenced by both measurement error and normal biologic variability over time [4]. It may also be influenced by disease severity and the extent of skin involvement.

Our data highlight several important points about assessing MCID estimates. First, despite the large number of participants in the two trials, the weak correlations between the anchors and the mRSS indicate uncertainty in the point estimates. This is illustrated by the similar mean changes in the mRSS in the MCID-improved and no-change groups among participants with dcSSc. Second, participants who worsened on the SF-36 health transition, HAQ-DI, and PtGA anchors still had, on average, an overall improvement in the mRSS (with wide CIs that crossed zero). Although surprising, this likely reflects the poor correlations between the anchors and the mRSS: the SF-36 health transition question and the PtGA assess overall change in health (beyond improvement in skin), and the HAQ-DI asks about daily functional activities [15]. In previous analyses of a dcSSc cohort, the mRSS correlated more strongly with the physician global assessment than with the PtGA [15]. Anchors that focus on change in skin involvement and its impact on function and other daily activities may be more appropriate and should be considered in future trials. Third, as expected, participants with lcSSc had minimal change in the mRSS during the two SSc trials; in our analyses, the change in the mRSS for those with lcSSc was ≤ 1.0 units in a majority of the subgroups.

Our study has several strengths. First, we used prospective data from two large SSc randomized controlled trials (SLS-I and SLS-II) to determine MCID estimates. Although the mRSS was not the primary outcome measure, it was captured at expert centers in the USA and assessed by the same investigator for a given patient. Second, we have validated MCID estimates for the mRSS using patient-reported anchors, whereas all previous estimates were based on physician anchors [21] or consensus agreement. The effect sizes for the MCID groups among participants with dcSSc in this analysis (0.48–0.53) are very similar to those reported by Khanna et al. (0.40–0.66 for the MCID group) [5]. Third, MCID estimates are approximations, and experts have suggested using multiple anchors to define a range for these estimates [4]; we therefore included three anchors. Fourth, our MCID estimates corresponded to changes in patient-reported outcomes over time, supporting their validity.

Our study is not without limitations. First, the analysis was post hoc rather than a priori. Second, the correlations between the anchors and the mRSS were below the proposed cutoff point of 0.30, adding to the uncertainty of these estimates. For example, the relationship of the SF-36 health transition and PtGA anchors with the mRSS may have been influenced by changes unrelated to the skin, such as lung and gastrointestinal involvement and other disease manifestations. Future analyses of ongoing clinical trials should explore anchors that focus on skin involvement rather than global disease and that are more strongly associated with the mRSS.

Conclusion

We report patient-based MCID estimates for the mRSS using data from two large randomized controlled trials. These mRSS MCID estimates can be used for interpretation of ongoing clinical trials in SSc and interstitial lung disease, and for sample size estimation in future trials.