Introduction

As the number and diversity of therapeutic options for patients with rheumatoid arthritis (RA) increase, the need has grown for easily applicable clinical outcome measures that represent valid, stable improvements, are relevant to individual patients, and accurately discriminate between different active therapies in clinical trials and real-world settings. Ideally, these measures could also help guide therapeutic modifications in treat-to-target strategies during routine clinical care. A European League Against Rheumatism (EULAR) good response, which requires a decrease in Disease Activity Score-28 joints (DAS28) of > 1.2 and a current DAS28 ≤ 3.2 [1], is one possible option for comparing active agents and guiding treatment. However, a study of test–retest reliability over 7 days found that a DAS28 change of 1.2 was lower than the smallest detectable difference [2], suggesting that some EULAR responses are due to inherent variability in DAS28 values rather than to stable therapy-associated changes.

Assessment of individual therapeutic responses based on the critical difference for improvement in DAS28, referred to as the DAS28-dcrit, provides an alternative to EULAR responses. The DAS28-dcrit is a validated, statistically derived criterion defined as the minimum DAS28 change exceeding random fluctuation during stable therapy, and is a robust indicator of clinical response to RA treatment [3]. In RA patients receiving stable therapy with conventional or biologic DMARDs during routine clinical care, the DAS28-dcrit corresponded to a DAS28 decrease (improvement) ≥ 1.8 [3]; this value was confirmed in patients on tocilizumab monotherapy [4] in the non-interventional ICHIBAN study [5]. DAS28-dcrit responses are more closely correlated with patient-reported improvements in functional capacity than EULAR responses [3] and are associated with significant improvements in patient-reported outcomes (PROs) [6], thereby demonstrating the clinical relevance of this response criterion. The validation study focused on change from baseline in an observational cohort initiating treatment with a biologic therapy [3]. To date, no studies have explored the potential utility of DAS28-dcrit in evaluating the comparative effectiveness of two active treatment arms in clinical studies.

Tocilizumab is an interleukin-6 receptor inhibitor with efficacy in treating RA in placebo-controlled randomized trials as monotherapy [7] and in combination with conventional disease-modifying antirheumatic drugs (DMARDs) [8, 9]. This agent is effective and well tolerated during routine clinical care as intravenous (IV) or subcutaneous formulations [10, 11]. Tocilizumab therapy results in superior improvements in disease activity compared with anti-tumor necrosis factor (TNF) agents [12, 13]. In addition to improving objective clinical outcomes, tocilizumab results in significant improvements in PROs, including pain, fatigue, and function, compared with anti-TNF drugs [14].

The goals of this study were (1) to evaluate the ability of DAS28-dcrit responses to differentiate between the effectiveness of active RA therapies in randomized and observational clinical studies through retrospective analysis of data from clinical trials of tocilizumab and comparator active agents and (2) to further explore the suitability of DAS28-dcrit as a statistical tool for evaluating individual responses to RA therapy across different agents and settings.

Methods

Overview of clinical studies

The work reported here is a retrospective analysis of data derived from two previously published clinical studies of tocilizumab in patients with active RA: ACT-iON (clinicaltrials.gov identifier NCT01543503) [13] and ADACTA (clinicaltrials.gov identifier NCT01119859) [12]. ACT-iON was an international, multicenter, non-interventional study of biologic-naïve patients treated with IV tocilizumab (initiated at a dose of 8 mg/kg every 4 weeks) or anti-TNF therapy as their first biologic for 52 weeks [13]. ADACTA was a 24-week randomized, double-blind, phase 4 trial of 8 mg/kg IV tocilizumab monotherapy every 4 weeks versus 40 mg subcutaneous adalimumab monotherapy every 2 weeks in biologic-naïve patients [12]. These previously published studies were conducted in accordance with the Declaration of Helsinki and International Conference on Harmonisation Guidelines for Good Clinical Practice and were approved by an ethics committee or institutional review board at each center (ACT-iON: North Wales Research Ethics Committee, Nov 25, 2011, REC reference 11/WA/0355; ADACTA: University Hospital of Geneva Ethics Committee, Oct 12, 2011, protocol WA19924). All patients provided written informed consent. Because this was a retrospective analysis of previously collected anonymous data, additional ethics approval and consent were not required.

Study design

The previously published DAS28-dcrit [decrease ≥ 1.8 from baseline in DAS28- erythrocyte sedimentation rate (ESR)] [3] was applied retrospectively to evaluate the proportion of patients achieving an individual therapeutic response in the ACT-iON and ADACTA studies. The DAS28-dcrit is a valid and stable measure of intraindividual change in disease activity beyond that due to random fluctuation. In a previous study, this outcome measure was found to be sensitive to therapy-related changes in disease activity and more highly correlated with functional improvement than EULAR responses [3].

DAS28-ESR data were obtained at each visit. The major DAS28-dcrit analyses involved change between baseline and week 52 for the ACT-iON study and between baseline and week 24 for the ADACTA study (with additional analyses at weeks 8, 12, 16, and 20). PROs included pain and patient global health, reported on a 100-mm visual analog scale (VAS) ranging from 0 (best) to 100 (worst), the Health Assessment Questionnaire-Disability Index (HAQ-DI) on a scale of 0 (best) to 3 (worst), and the Functional Assessment of Chronic Illness Therapy (FACIT) fatigue score, which ranges from 0 (worst) to 52 (best) [15]. EULAR and American College of Rheumatology (ACR) response assessments used previously published criteria [1]. For all outcomes, data were compared between different treatment arms within either ADACTA or ACT-iON. No cross-study comparisons were conducted.

Statistical analysis

Descriptive statistics were used for demographic and baseline characteristics. Stability of responses was evaluated by examining the percentage of patients with at least one response who recorded additional responses at other study visits. p values for comparisons of response rates in different treatment arms were calculated by a logistic regression model with DAS28-dcrit as the dependent variable adjusted for treatment arm. PRO outcomes in responders and non-responders were compared using an ANCOVA model with the given PRO as the dependent variable adjusted for DAS28-dcrit response and baseline PRO value. ACT-iON p value calculations were also adjusted for monotherapy vs combination therapy, and ADACTA calculations were adjusted for RA duration and region. All derived p values were nominal p values unadjusted for multiple comparisons; p < 0.05 indicated statistical significance.
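
The stability measure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the published analyses: the function name stability_rate, the dictionary layout, and the five example patients are hypothetical, and the function simply counts, among patients with a response at one or more visits, those who responded at a chosen minimum number of visits.

```python
from typing import Dict, List, Optional


def stability_rate(responses_by_patient: Dict[str, List[bool]],
                   min_visits: int) -> Optional[float]:
    """Percentage of ever-responders who responded at >= min_visits visits.

    responses_by_patient maps a patient ID to one boolean per study visit
    (True = response criterion met at that visit). Patients who never
    responded are excluded from the denominator. Returns None when no
    patient ever responded.
    """
    ever_responders = [v for v in responses_by_patient.values() if any(v)]
    if not ever_responders:
        return None
    stable = sum(1 for visits in ever_responders if sum(visits) >= min_visits)
    return 100.0 * stable / len(ever_responders)


# Hypothetical example with six post-baseline visits (not study data).
example = {
    "p1": [True, True, True, True, True, True],        # 6 of 6 visits
    "p2": [False, True, True, True, True, True],       # 5 of 6 visits
    "p3": [False, False, True, False, True, False],    # 2 of 6 visits
    "p4": [False, False, False, False, False, False],  # never responded
    "p5": [True, False, False, False, False, False],   # 1 of 6 visits
}

example_rate = stability_rate(example, min_visits=5)
print(f"Response at >= 5 of 6 visits: {example_rate:.1f}% of ever-responders")
```

For a study with two post-baseline visits, the same function with min_visits=2 gives the proportion of ever-responders with a response at both visits.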

Results

Individual therapeutic response to tocilizumab

We first analyzed the DAS28-dcrit response rate using data from ACT-iON, in which patients were treated with tocilizumab or anti-TNF agents at the discretion of their clinicians during routine clinical care. The groups were generally well matched (see Table 1 in supplementary information) [13]. Concomitant therapy with methotrexate was used in 85.9% of tocilizumab-treated patients and 88.3% of anti-TNF-treated patients.

Analyses of DAS28-dcrit response rates showed a clear difference between the tocilizumab and anti-TNF treatment arms. At week 24, 118/155 (76.1%) of tocilizumab-treated patients and 191/314 (60.8%) of patients treated with anti-TNFs achieved an individual therapeutic response as indicated by a DAS28 decrease of ≥ 1.8 (p = 0.0008; Fig. 1). The corresponding numbers at week 52 were 78.2% and 58.2% (p < 0.0001). Differences were also observed in EULAR good responses (65.2% for tocilizumab versus 41.4% for TNF inhibitors at week 24; 67.6% versus 42.9% at week 52) and ACR50 responses (51.5% for tocilizumab versus 37.1% for TNF inhibitors at week 24; 62.1% versus 38.4% at week 52).
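
As an arithmetic illustration of the week 24 figures above, the following Python sketch recomputes the response rates from the reported counts and derives an unadjusted odds ratio. The function names are hypothetical. For a logistic model containing only a binary treatment-arm term, the fitted odds ratio equals the cross-product ratio of the 2 × 2 table, which is what is computed here; the published p values come from models with additional adjustment, so this sketch does not reproduce them.

```python
# Week 24 DAS28-dcrit responders / evaluable patients in ACT-iON,
# taken from the counts reported in the text.
tcz_responders, tcz_total = 118, 155
tnf_responders, tnf_total = 191, 314


def response_rate(responders: int, total: int) -> float:
    """Response rate as a percentage."""
    return 100.0 * responders / total


def unadjusted_odds_ratio(resp_a: int, total_a: int,
                          resp_b: int, total_b: int) -> float:
    """Odds of response in arm A divided by odds of response in arm B.

    Unadjusted: no covariates such as monotherapy vs combination therapy.
    """
    odds_a = resp_a / (total_a - resp_a)
    odds_b = resp_b / (total_b - resp_b)
    return odds_a / odds_b


rate_tcz = response_rate(tcz_responders, tcz_total)
rate_tnf = response_rate(tnf_responders, tnf_total)
odds_ratio = unadjusted_odds_ratio(tcz_responders, tcz_total,
                                   tnf_responders, tnf_total)

print(f"Tocilizumab: {rate_tcz:.1f}%  anti-TNF: {rate_tnf:.1f}%")
print(f"Unadjusted odds ratio (tocilizumab vs anti-TNF): {odds_ratio:.2f}")
```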

Fig. 1 DAS28-dcrit individual therapeutic response rates with tocilizumab and anti-TNF agents in the ACT-iON study. *p < 0.05; **p < 0.0001. DAS28 Disease Activity Score-28 joints, TNF tumor necrosis factor

Although observational studies provide important information on therapeutic responses during “real-world” clinical use, randomized trials are the recognized standard for head-to-head comparisons of active agents. We therefore applied the DAS28-dcrit to data from the ADACTA study, a randomized, double-blind, phase 4 trial of tocilizumab monotherapy versus adalimumab monotherapy. Treatment groups were well matched (see Table 1 in supplementary information) [12]. DAS28-dcrit responses differentiated between the effectiveness of tocilizumab and adalimumab at all visits; DAS28-dcrit response rates at week 24 were 118/131 (90.1%) for tocilizumab versus 75/127 (59.1%) for adalimumab (p < 0.0001; Fig. 2). In comparison, EULAR good response rates for tocilizumab and adalimumab at week 24 were 60.4% versus 23.5%, and ACR50 response rates were 58.3% versus 35.2%.

Fig. 2 DAS28-dcrit individual therapeutic response rates with tocilizumab and comparators in the ADACTA study. *p < 0.05; **p < 0.0001 for tocilizumab versus adalimumab. ADA adalimumab, DAS28 Disease Activity Score-28 joints, TCZ tocilizumab

Stability of responses

Both DAS28-dcrit and EULAR responses are based on the DAS28, so we assessed which individual response measure showed greater stability over time (Table 1). In the ACT-iON study, which included two post-baseline visits at 24 and 52 weeks, DAS28-dcrit responses were more stable than EULAR good responses. Of patients who achieved a response at either visit, 54.7% of tocilizumab-treated patients and 47.9% of anti-TNF-treated patients had a DAS28-dcrit response at both visits; the corresponding rates for a EULAR good response were 47.0% and 37.2%. The ADACTA study, which had patient visits every 4 weeks for 24 weeks, provided a more rigorous test of response stability. DAS28-dcrit responses showed greater stability over time than EULAR good responses (Table 1). During the study, 65.5% of tocilizumab-treated patients who achieved a DAS28-dcrit response at any visit had a response for at least five of the six visits versus 33.3% of patients with a EULAR good response. Response stability rates were markedly lower in patients treated with adalimumab. Of adalimumab-treated patients who achieved a response at any study visit, 40.9% of DAS28-dcrit responders and 21.1% of EULAR good responders recorded a response on at least five of the six visits.

Table 1 Stability of DAS28-dcrit response and EULAR good response in patients with the specified response at any time point during the study. Data are presented as n (%)

Association between DAS28-dcrit response and PROs

For a therapeutic assessment to be relevant to individual patients and useful in guiding therapy, patients with a response should also experience improvements in PROs, regardless of the chosen therapy. We therefore examined mean values for HAQ-DI, patient global health, pain, and fatigue in patients with or without a DAS28-dcrit response in the ADACTA trial [Fig. 3 and Table 2 (supplementary information)]. A DAS28-dcrit response was associated with significant improvements in PROs in responders compared with non-responders (p < 0.05 for all PROs at all time points). Improvements in PROs associated with a DAS28-dcrit response were consistent regardless of the agent used for treatment (tocilizumab or adalimumab). Similar results were observed in the ACT-iON study (Fig. 4).

Fig. 3 Patient-reported outcomes (mean values) in DAS28-dcrit responders and non-responders in the ADACTA study. a Patient global health VAS, b pain VAS, c HAQ-DI, d FACIT fatigue. For patient sample sizes, please refer to Supplementary Information. ADA adalimumab, DAS28 Disease Activity Score-28 joints, HAQ Health Assessment Questionnaire-Disability Index, PtGH patient global health, TCZ tocilizumab, VAS visual analog scale

Fig. 4 Patient-reported outcomes (mean values) in DAS28-dcrit responders and non-responders in the ACT-iON study. a Patient global health VAS, b pain VAS, c HAQ-DI, d FACIT fatigue. In this study, the FACIT fatigue scale was reversed (0 = best; 52 = worst) so that lower scores were better, in keeping with other patient-reported outcomes. DAS28 Disease Activity Score-28 joints, HAQ Health Assessment Questionnaire-Disability Index, PtGH patient global health, TCZ tocilizumab, TNF tumor necrosis factor, VAS visual analog scale

Discussion

In this study, we evaluated the ability of the DAS28-dcrit criterion to differentiate between the effectiveness of distinct active RA therapies and to provide a robust and stable assessment of individual responses to RA therapy across different agents and settings. DAS28 is the continuous variable most closely linked to the rheumatologist’s decision to modify treatment [16]. However, a DAS28 change of 1.2, the criterion for a EULAR good response, is too low to be a reliable assessment of therapeutic response [2]. We therefore evaluated the statistically determined DAS28-dcrit to assess individual therapeutic responses during tocilizumab clinical studies.

The DAS28-dcrit, which is based on DAS28 changes that exceed random fluctuation, is more stable and more closely linked to functional improvement than EULAR responses [3]. In addition, it only involves one calculation and is therefore less cumbersome to apply in routine daily practice. Both DAS28-dcrit and EULAR response criteria require calculation of DAS28, but the DAS28-dcrit simplifies assessments of therapeutic response by requiring only two pieces of information: the baseline DAS28 and the current DAS28. In contrast, for a EULAR good or moderate response, the category of the current DAS28 (≤ 3.2, > 3.2 to ≤ 5.1, or > 5.1) and the extent of improvement (> 1.2 or > 0.6 to ≤ 1.2) need to be taken into account [1]. In a recent US study, the most common reason for not collecting RA metrics routinely was “takes too much of my time” [17], suggesting that more easily applied assessments may help improve disease tracking and patient care. The DAS28-dcrit criterion has been used to evaluate patient responses in observational studies [3], but had not previously been applied to randomized trial data or to studies comparing different active agents.
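
The difference in complexity between the two criteria can be made concrete with a short Python sketch. It is illustrative only: the function names are hypothetical, the EULAR branch follows the standard published categories [1], and rounding the change to two decimal places is an assumption made here to guard against floating-point representation error rather than part of either criterion.

```python
DCRIT_THRESHOLD = 1.8  # DAS28-dcrit: minimum decrease from baseline [3]


def dcrit_response(baseline: float, current: float) -> bool:
    """DAS28-dcrit response: a single comparison of the change from baseline."""
    # Rounding to 2 decimals is an assumption (avoids 5.0 - 3.2 = 1.7999...).
    improvement = round(baseline - current, 2)
    return improvement >= DCRIT_THRESHOLD


def eular_response(baseline: float, current: float) -> str:
    """EULAR response category: depends on both the change and the current level."""
    improvement = round(baseline - current, 2)
    if improvement > 1.2:
        # Large improvement: good only if current activity is low.
        return "good" if current <= 3.2 else "moderate"
    if improvement > 0.6:
        # Intermediate improvement: moderate unless current activity is high.
        return "moderate" if current <= 5.1 else "none"
    return "none"


# Hypothetical patients (baseline DAS28, current DAS28); not study data.
for baseline, current in [(5.0, 3.2), (4.5, 3.2), (6.5, 5.2), (6.0, 5.2)]:
    print(baseline, current,
          "dcrit:", dcrit_response(baseline, current),
          "EULAR:", eular_response(baseline, current))
```

The second hypothetical patient (4.5 to 3.2) meets the EULAR good response definition with an improvement of 1.3 but does not reach the DAS28-dcrit threshold of 1.8, which is the type of case in which the two criteria diverge.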

In this study, we used the previously published value for DAS28-dcrit (DAS28 improvement from baseline ≥ 1.8) [3], which was confirmed in patients on tocilizumab monotherapy [4], to retrospectively evaluate therapeutic responses in observational and randomized trials of tocilizumab versus comparator agents. We found that DAS28-dcrit response rates effectively discriminated between active treatments: DAS28-dcrit response rates were significantly higher in patients treated with tocilizumab than in those treated with anti-TNF comparators, thereby supporting the previously reported improved effectiveness of tocilizumab compared with anti-TNF drugs [12, 13]. In addition to using DAS28-based outcome measures, previous reports from the ADACTA and ACT-iON studies found that the significantly greater effectiveness of tocilizumab compared with anti-TNF agents was retained in assessments of the Clinical Disease Activity Index (CDAI), which does not contain an inflammatory marker [12, 13], thereby suggesting that the differences observed between therapeutic agents represent true clinical differences and not simply an effect of tocilizumab on acute-phase reactants.

As in the DAS28-dcrit validation study, therapeutic responses as determined by the DAS28-dcrit were more stable over time than EULAR responses during continuing, consistent active therapy. In ADACTA, stable responses (at least 5 of the 6 visits) in patients with a response at any time point occurred approximately twice as often in patients who achieved a DAS28-dcrit response compared with a EULAR good response. We acknowledge the possibility that these data reflect a greater sensitivity to change with EULAR good responses compared with DAS28-dcrit responses. However, we consider it unlikely that 30% of patients on stable treatment had clinically significant differences in disease activity over a period of 24 weeks. Accordingly, we believe the more likely interpretation to be that DAS28-dcrit responses have greater stability over time versus EULAR good responses because these responses represent change that exceeds random fluctuations.

For both response measures, tocilizumab responses showed greater stability over time than responses to TNF inhibitors, suggesting that therapeutic responses to tocilizumab may be better maintained than therapeutic responses to anti-TNF agents. These findings are consistent with other reports of the long-term effectiveness of tocilizumab [10, 11, 13, 18].

Previous studies have found that DAS28-dcrit responses are associated with improvements in function as assessed by the Funktionsfragebogen Hannover patient questionnaire [3] and that patients with DAS28-dcrit responses are more likely to achieve individual responses in PROs than patients who do not achieve a DAS28-dcrit response [6]. In this study, we extend these observations by showing that DAS28-dcrit responses were closely associated with improvements in mean values for PROs, regardless of the drug involved. Patient global health is one of the parameters used to derive the DAS28, so the association between a DAS28-dcrit response and global health is perhaps expected. However, the strong association with pain is more surprising and supports the clinical relevance of a DAS28-dcrit response as a surrogate for outcomes important to patients. The mean change in pain during tocilizumab and anti-TNF treatment exceeded the individual pain-dcrit response value of 30 mm (corresponding to 3 points on a scale of 0 to 10 as calculated by Scharbatke et al. [6]) in the DAS28-dcrit responder group, but not in the group that did not achieve a DAS28-dcrit response, suggesting that changes in pain associated with a DAS28-dcrit response were clinically meaningful to patients. Patients with a DAS28-dcrit response also showed strong improvements in patient-reported function as assessed by the HAQ-DI. As has been reported previously [6], fatigue was the PRO most refractory to improvement in patients achieving a therapeutic response.

Limitations of this study include the use of post hoc analyses conducted retrospectively. In addition, although statistical assessments were adjusted for potential confounding factors, there were no adjustments for multiplicity, so the nominal p values should be interpreted with caution. The analyses reported here are specific to tocilizumab and TNF inhibitors and may not apply to conventional DMARDs or other biologic agents. The DAS28 assessments used in this study utilized ESR as the acute phase reactant; DAS28 values based on C-reactive protein may have varied slightly.

Because of the strong effect of tocilizumab on acute phase reactants, DAS28 evaluations may somewhat overestimate the therapeutic effects of this drug [19]. However, compared with other included variables, ESR has been shown to make the smallest independent contribution to DAS28 [20]. A retrospective analysis of data from the SATORI study, a double-blind comparative study of tocilizumab versus methotrexate, found that the coefficient of correlation between DAS28-ESR and the CDAI, which does not contain an inflammatory marker, was high (> 0.8) at baseline and throughout the study; the correlation did not decrease following improvements in ESR after tocilizumab initiation [21]. As mentioned previously, CDAI assessments have shown improved disease activity with tocilizumab compared with anti-TNF comparators in previous reports [12, 13]. However, CDAI-dcrit evaluations are not possible because, unlike the DAS28, the CDAI does not have a normal distribution over the span of disease activity [22]. Because the CDAI is a nonlinear measure, it is not appropriate for a statistical tool such as dcrit which requires equidistance between score intervals to provide a valid outcome.

In conclusion, our study demonstrates that the DAS28-dcrit provides a robust statistical tool for evaluating individual responses to RA therapy with wide applicability across different active agents and clinical settings. In analyses of clinical trial data, the DAS28-dcrit criterion effectively discriminates between active agents, while its close association with PROs makes it a suitable and easy-to-use tool for treat-to-target treatment decisions. We believe a more stable response metric has the potential to improve outcome assessments, and we encourage the prospective use of the DAS28-dcrit response criterion in future RA studies to further examine its utility in both clinical trials and routine clinical care.