Background

Remission or at least low disease activity (LDA) is a major treatment outcome for patients with rheumatoid arthritis (RA) [1, 2]. Different composite measures with specific thresholds are available to measure the effects of treatment on these outcomes [2,3,4,5,6]. The best-established measure, the modified disease activity score including 28 joint counts (DAS 28 [4]), was developed in the 1990s and includes counts for swollen and tender joints, a patient global assessment, and an acute phase reactant (APR), either the C-reactive protein level or the erythrocyte sedimentation rate [7]. The composite score is calculated using a complex formula with weighting and/or transformation of the individual elements (Table 1). Besides its complexity, a limitation of the DAS 28 is its remission cut-off of < 2.6, below which patients may still have residual swollen joints and thus remain at risk of progression to joint damage and permanent functional disability [2, 8,9,10].

Table 1 Comparison of composite measures for assessment of remission and disease activity in rheumatoid arthritis*

Simpler composite measures with more stringent cut-offs were therefore developed for use in clinical practice (Table 1): the simplified disease activity index (SDAI) in 2003 [5], the clinical disease activity index (CDAI), which does not include an APR, in 2005 [6], as well as newer remission criteria by the American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) in 2011 [2]. These criteria comprise either a Boolean approach or an index-based definition (cut-offs for remission: ≤ 3.3 for SDAI and ≤ 2.8 for CDAI) [11]. Compared with the original DAS 28 cut-off, patients fulfilling the newer cut-offs were found to have less residual disease activity as well as less functional disability and joint damage [12,13,14].
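To make the structural differences between the indices concrete, the following Python sketch computes the DAS 28, SDAI and CDAI and applies the remission cut-offs quoted above. The DAS 28 coefficients and the Boolean thresholds are the commonly published ones; they are not stated in this text, so they should be read as assumptions here, and all function and parameter names are hypothetical.

```python
import math

def das28_crp(tjc28, sjc28, gh_mm, crp_mg_l):
    """DAS 28 with CRP (assumed published formula); gh_mm is patient global health, 0-100 mm."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.36 * math.log(crp_mg_l + 1) + 0.014 * gh_mm + 0.96)

def das28_esr(tjc28, sjc28, gh_mm, esr_mm_h):
    """DAS 28 with ESR (assumed published formula); esr_mm_h must be > 0."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h) + 0.014 * gh_mm)

def cdai(tjc28, sjc28, ptga_cm, phga_cm):
    """CDAI: plain sum of joint counts and global assessments (0-10 cm), no APR."""
    return tjc28 + sjc28 + ptga_cm + phga_cm

def sdai(tjc28, sjc28, ptga_cm, phga_cm, crp_mg_dl):
    """SDAI: CDAI plus CRP in mg/dL."""
    return cdai(tjc28, sjc28, ptga_cm, phga_cm) + crp_mg_dl

def boolean_remission(tjc28, sjc28, ptga_cm, crp_mg_dl):
    """ACR/EULAR Boolean definition (assumed thresholds: every component <= 1)."""
    return all(v <= 1 for v in (tjc28, sjc28, ptga_cm, crp_mg_dl))

def remission_by_measure(tjc28, sjc28, ptga_cm, phga_cm, crp_mg_l):
    """Remission status under each measure, using the cut-offs quoted in the text."""
    crp_mg_dl = crp_mg_l / 10.0   # unit conversion mg/L -> mg/dL
    gh_mm = ptga_cm * 10.0        # 0-10 cm scale -> 0-100 mm scale
    return {
        "DAS 28": das28_crp(tjc28, sjc28, gh_mm, crp_mg_l) < 2.6,
        "SDAI": sdai(tjc28, sjc28, ptga_cm, phga_cm, crp_mg_dl) <= 3.3,
        "CDAI": cdai(tjc28, sjc28, ptga_cm, phga_cm) <= 2.8,
        "Boolean": boolean_remission(tjc28, sjc28, ptga_cm, crp_mg_dl),
    }

# Hypothetical patient: 3 swollen joints, no tender joints, low CRP.
print(remission_by_measure(tjc28=0, sjc28=3, ptga_cm=1.0, phga_cm=1.0, crp_mg_l=1.0))
```

Under these assumptions, the hypothetical patient with three residual swollen joints is classified as being in remission by the DAS 28 only, which illustrates the limitation of the < 2.6 cut-off described above.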

Modern RA therapy is characterized by a “treat-to-target” approach with regular assessment of disease activity using the composite measures mentioned above and, if the target is not achieved within a particular timeframe, subsequent therapeutic adaptation with the goal of reducing disease activity as early as possible [1]. Newer treatments, such as interleukin (IL)-6 and Janus kinase (JAK) inhibitors, directly inhibit APR production and may thus lead to better DAS 28 scores that do not reflect clinical improvement [9, 15, 16]. The use of the DAS 28 may therefore lead to a higher proportion of patients fulfilling remission and LDA criteria than the use of other composite measures, and its results may be misleading when RA treatments with different modes of action are compared. To date, no systematic comparison of all four composite measures (DAS 28, CDAI, SDAI and Boolean approach) for measuring the effects of biologics and JAK inhibitors has been performed.

To inform the discussion on the choice of composite measure, we thus performed such an investigation. For this purpose, we largely used data generated for a systematic review of biologics in RA conducted by the German health technology assessment (HTA) agency, the Institute for Quality and Efficiency in Health Care (IQWiG) as well as data from systematic reviews of newer biologics and JAK inhibitors (see below for details). The studies included in these reviews investigated treatment effects on remission and LDA in patients with RA using different composite measures. We aimed to quantify the impact of the choice of composite measure. Furthermore, we discuss the consequences of potential differences in the results of the various composite measures in the studies analysed.

Methods

Study design and data sources

Data on remission and LDA assessed via the DAS 28, SDAI, CDAI and the Boolean approach (remission only) were included from a systematic review in an HTA report conducted by IQWiG on biologics in RA therapy. The full report is only available in German [17]; the core report [18] and a journal article on a network analysis based on the HTA report [19] are available in English. Clinical studies conducted up to 2017 on biologics approved up to 2016 by the European Medicines Agency (EMA) were included. These studies investigated head-to-head comparisons of biologics and comparisons of biologics with placebo. Results after 6 months of treatment were included in the analyses. Study sponsors reanalysed and provided the proportion of patients in remission or with LDA for the different composite measures used in the studies.

In addition, for newer biologics and JAK inhibitors approved by the EMA between 2017 and 2020, data on remission and LDA were considered that had been provided by study sponsors for inclusion in 5 early benefit assessments (also called dossier assessments) [20,21,22,23,24]. In Germany, this type of assessment is conducted within 3 months of market entry of a new drug based on a dossier submitted by the study sponsor and contains a systematic review of the evidence on a new drug versus standard care. The studies included in the assessments investigated head-to-head comparisons of biologics and comparisons of biologics with JAK inhibitors. Results after 6 to 12 months of treatment were included in the analyses.

According to the therapeutic indications specified in the summaries of product characteristics for biologics and JAK inhibitors approved in the European Union, the studies included in the systematic reviews above considered methotrexate (MTX) naïve patients, patients after MTX failure, patients after biologic failure and/or patients with MTX intolerance. The treatments were administered either in combination with MTX or as monotherapy in patients intolerant to MTX. In placebo-controlled studies, the placebo was also administered in combination with MTX.

Statistical analysis

Treatment effects for the outcomes of LDA and remission were estimated by odds ratios (ORs) for each of the composite measures. For studies comparing biologics with placebo, an OR > 1 indicates a beneficial effect of the biologic. For studies comparing biologics with each other and studies comparing JAK inhibitors and biologics, an OR > 1 indicates a beneficial effect for the first treatment mentioned.
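As a minimal sketch of this step, the OR for one composite measure can be computed from the numbers of patients reaching the outcome in each arm. The source does not state how confidence intervals were derived, so the Wald-type interval on the log scale used here is an assumption, as are the function name and the example counts; zero cells are not handled.

```python
import math

def odds_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """OR of arm a (first treatment) versus arm b (comparator) with a Wald-type 95% CI.

    Assumption: no cell of the 2x2 table is zero.
    """
    a, b = events_a, n_a - events_a   # arm a: with / without the outcome
    c, d = events_b, n_b - events_b   # arm b: with / without the outcome
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts: 40/100 patients with LDA on the biologic, 20/100 on placebo.
or_est, lower, upper = odds_ratio(40, 100, 20, 100)
print(f"OR = {or_est:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An OR above 1 in this sketch favours arm a, matching the convention described above.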

Within each study, we quantified the differences between the effect estimates obtained with the composite measures used for the assessment of remission and LDA by calculating the ratio of ORs (ROR) for each comparison (e.g. ROR = OR_{DAS 28 < 3.2} / OR_{CDAI ≤ 10} for LDA). An estimate of ROR > 1 thus indicates larger effect estimates for remission or LDA for the first composite measure versus the second one. For the main analyses, calculations considering data dependency were conducted (see Additional file 1 for more details on statistical methods). Sensitivity analyses not considering data dependency were also conducted (see Additional file 1: Tables 5 to 10).
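The ROR calculation can be sketched as follows. Because both ORs come from the same patients, they are correlated; the method used in the main analyses to account for this dependency is described in Additional file 1 and is not reproduced here. The `cov` argument below is therefore only a hypothetical placeholder for the covariance of the two log ORs, and `cov=0` corresponds to an analysis that does not consider data dependency.

```python
import math

def ratio_of_odds_ratios(log_or_1, se_1, log_or_2, se_2, cov=0.0, z=1.96):
    """ROR of measure 1 versus measure 2 with a 95% CI on the log scale.

    cov is the covariance of the two log ORs (hypothetical input);
    cov=0.0 treats the two estimates as independent.
    """
    log_ror = log_or_1 - log_or_2
    variance = se_1 ** 2 + se_2 ** 2 - 2.0 * cov
    if variance <= 0:
        raise ValueError("covariance too large for the given standard errors")
    se = math.sqrt(variance)
    return math.exp(log_ror), math.exp(log_ror - z * se), math.exp(log_ror + z * se)

# Hypothetical study: OR of 3.0 with the DAS 28 and 1.5 with the CDAI.
independent = ratio_of_odds_ratios(math.log(3.0), 0.2, math.log(1.5), 0.2)
dependent = ratio_of_odds_ratios(math.log(3.0), 0.2, math.log(1.5), 0.2, cov=0.03)
print(independent)
print(dependent)
```

With a positive covariance, the point estimate is the same in both calls and only the width of the confidence interval changes, so ignoring the dependency yields a more conservative interval.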

RORs were calculated within each study for each possible comparison of composite measures and subsequently combined for each treatment comparison, using inverse variance weighted fixed-effect model meta-analyses for the whole patient population and, if possible, for the different subpopulations with available data (MTX naïve, after MTX failure, after biologic failure, with MTX intolerance). If separate data for different subpopulations were available from one study, both data sets were included in the analysis separately. Heterogeneity was assessed using the Q test [25] between all data sets on a treatment comparison. If data for different subpopulations were available for a treatment comparison, heterogeneity was also tested between the data set pools for different subpopulations. In the case of relevant heterogeneity (p < 0.05), no combined estimate was calculated. We used the statistical software R 4.1.1 [26] for all analyses on the study level and SAS 9.4 (SAS Institute, Cary NC) for meta-analyses. Data for the outcomes in the individual studies are included in Additional file 1: Tables 11 to 20.
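The pooling step can be illustrated with a short Python sketch of an inverse variance weighted fixed-effect meta-analysis with a Q test for heterogeneity. This is not the SAS code used for the analyses; the function names and example inputs are hypothetical, and the chi-square tail probability is computed with a standard-library recursion to keep the example self-contained.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution (integer df >= 1)."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                 # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))      # Q(1/2, z) = erfc(sqrt(z))
    while a < df / 2.0:                          # Q(a+1, z) = Q(a, z) + z^a e^-z / Gamma(a+1)
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def pool_log_rors(log_rors, ses, alpha=0.05, z=1.96):
    """Fixed-effect pooling of log RORs; no pooled estimate if the Q test gives p < alpha."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rors)) / total
    q_stat = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rors))
    df = len(log_rors) - 1
    p_het = chi2_sf(q_stat, df) if df > 0 else 1.0
    result = {"Q": q_stat, "df": df, "p_heterogeneity": p_het, "ROR": None, "CI": None}
    if p_het >= alpha:
        se = math.sqrt(1.0 / total)
        result["ROR"] = math.exp(pooled)
        result["CI"] = (math.exp(pooled - z * se), math.exp(pooled + z * se))
    return result

# Hypothetical log RORs and standard errors from three data sets.
print(pool_log_rors([math.log(2.1), math.log(1.8), math.log(2.4)], [0.25, 0.30, 0.35]))
```

Returning no pooled estimate when p < 0.05 mirrors the rule stated above for relevant heterogeneity.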

Results

Placebo-controlled studies

An overview of results is provided in Table 2 and details are provided in Additional file 1: Tables 1 and 2.

Table 2 Overview of results on RORs for assessment of low disease activity and remission using the DAS 28, SDAI, CDAI and the Boolean approach (remission only), placebo-controlled studies

We considered results from 49 placebo-controlled studies identified in the previous systematic review [17]. The studies included 9 different biologics: 48 studies (16,233 patients) investigated LDA and 49 (16,338 patients) investigated remission. About 65% of the patients were included after MTX or biologic failure and about 35% were MTX-naïve. Nine combinations of the 4 composite measures were compared (3 for LDA, 6 for remission) resulting in a total of 81 comparisons, of which 3 (all SDAI vs. CDAI) were not interpretable due to relevant heterogeneity. 78 comparisons were thus included in the analysis (25 for LDA and 53 for remission).

Statistically significantly larger treatment effects versus placebo were observed when using certain composite measures in 16 of the 78 comparisons (20.5%): 7 out of 25 (28.0%) for LDA and 9 out of 53 (17.0%) for remission. 11 of these 16 comparisons (68.8%) showed these effects in the DAS 28 (6 vs. CDAI, 3 vs. SDAI, 2 vs. Boolean approach): 5 for the IL-6 inhibitor tocilizumab (all with RORs > 2), 3 for the IL-1 inhibitor anakinra (2 with RORs > 2), 2 for the tumour necrosis factor (TNF)α-inhibitor adalimumab, and 1 for the TNFα-inhibitor certolizumab pegol. Four of the 16 comparisons showed statistically significantly larger treatment effects in the SDAI (3 vs. CDAI, 1 vs. Boolean approach): 2 for tocilizumab and anakinra and 2 for the TNFα inhibitors golimumab and etanercept. One of the 16 comparisons showed a statistically significantly larger treatment effect in the CDAI (vs. Boolean approach) for certolizumab pegol. To visualize the larger treatment effects measured with the DAS 28 versus the CDAI for tocilizumab in the single studies, please see the forest plot in Fig. 1a as an example.

Fig. 1 Forest plots of RORs for DAS 28 and CDAI for the assessment of low disease activity for comparisons of IL-6 inhibitors versus placebo (A) or active controls (B)

No statistically significant differences in treatment effects were shown in 59 of the 78 comparisons and statistically significantly smaller treatment effects were shown in 3 comparisons: the SDAI or CDAI versus a Boolean approach for 1 comparison including adalimumab (ROR < 1) and 2 comparisons including anakinra (ROR < 0.5). However, at least for anakinra, these findings should be interpreted with caution due to the very small number of patients in remission using these composite measures (1 to 5 patients per treatment arm, see Additional file 1: Table 13), which probably contributed to the extreme ROR estimates.

Active-controlled studies

An overview of results is provided in Table 3 and details are provided in Additional file 1: Tables 3 and 4.

Table 3 Overview of results on ratios of odds ratios for assessment of low disease activity and remission using the DAS 28, SDAI, CDAI and the Boolean approach (remission only), active-controlled studies

The 11 active-controlled studies investigated both LDA and remission and included 5 different head-to-head comparisons of biologics and 5 different comparisons (6 studies) of biologics with JAK inhibitors. A total of 5996 patients were included. About 95% of the patients were included after MTX or biologic failure and 5% were intolerant to MTX.

The same 9 combinations of composite measures were compared as in the placebo-controlled trials, resulting in 90 comparisons (30 for LDA and 60 for remission) of which all were interpretable.

Statistically significantly larger treatment effects versus the active control were observed when using certain composite measures in 11 of the 90 comparisons (12.2%); 9 out of 30 (30.0%) for LDA and 2 out of 60 (3.3%) for remission. 8 of the 11 comparisons (72.7%) showed these effects in the DAS 28 (5 vs. CDAI, 3 vs. SDAI): 6 for the IL-6 inhibitors tocilizumab and sarilumab (of which 4 showed RORs > 2) and 2 for the JAK inhibitor upadacitinib. The other 3 of the 11 comparisons showed statistically significantly larger treatment effects in the SDAI (all vs. CDAI) for tocilizumab (2 comparisons) and the JAK inhibitor filgotinib. To visualize the larger treatment effects measured with the DAS 28 versus the CDAI for tocilizumab and sarilumab in the single studies, please see the forest plot in Fig. 1b as an example.

No statistically significant differences in treatment effects were shown in 77 of the 90 comparisons and statistically significantly smaller treatment effects were shown in 2 comparisons (DAS 28 vs. CDAI for the JAK inhibitor tofacitinib and SDAI vs. CDAI for certolizumab pegol).

Sensitivity analyses

The only statistically significant results that were robust in the sensitivity analyses were those on comparisons of composite measures including the DAS 28 for placebo-controlled trials with tocilizumab (Additional file 1: Table 5). This can be explained by the large sample size available (about 3300 patients from 9 studies) and the substantial treatment effects observed (all RORs > 2). However, all ROR estimates remained unchanged in the sensitivity analyses; only the confidence intervals were wider.

Discussion

Our study provides the first systematic comparison of differences between the estimated treatment effects of biologics and JAK inhibitors on remission and LDA recorded with 4 composite measures. In the overall patient population and in all subpopulations, statistically significantly larger treatment effects compared to the other measures were most frequently observed if the DAS 28 was used. To a lesser extent, such larger effects were also shown in a further composite measure including an APR, the SDAI. Interestingly, statistically significant differences in treatment effects for SDAI versus CDAI were observed for the assessment of LDA, but not for remission. This may be due to smaller differences in treatment effects measured by SDAI versus CDAI than in DAS 28 versus CDAI, where differences in treatment effects might therefore be detected more easily. In addition, less precise effect estimates were observed for remission than for LDA (e.g. SDAI vs. CDAI for tocilizumab: ROR [95% CI] of 1.10 [1.05 to 1.15] for LDA and 1.06 [0.96 to 1.17] for remission). The difference in treatment effects was found to be similar for both outcomes; however, statistically significant differences in ROR were not shown for remission. This may be due to lower numbers of patients with events for this outcome in the single studies using both composite measures (see, e.g. Additional file 1: Tables 13 and 14). With regard to the treatments affected, the larger effects were most common and most pronounced for the IL-6 inhibitor tocilizumab. Smaller differences were also shown for the IL-1 inhibitor anakinra, other biologics and JAK inhibitors, although these findings were not confirmed in sensitivity analyses. Hence, the DAS 28 and SDAI in particular make it difficult to interpret comparative effectiveness studies of treatments with different modes of action (e.g. IL inhibitors vs. TNFα inhibitors), as the apparently larger treatment effects in favour of IL-1 or IL-6 inhibitors may not accurately reflect clinical improvement.
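The difference in precision between the two tocilizumab estimates quoted above (SDAI vs. CDAI) can be made explicit by back-calculating approximate standard errors from the reported confidence intervals. The sketch below assumes that the intervals are symmetric on the log scale (Wald-type), which the text does not state; the function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error of log(ROR) from a 95% CI, assuming log-scale symmetry."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

def excludes_one(lower, upper):
    """True if the confidence interval does not contain ROR = 1."""
    return lower > 1.0 or upper < 1.0

se_lda = se_from_ci(1.05, 1.15)   # LDA: ROR 1.10 [1.05 to 1.15]
se_rem = se_from_ci(0.96, 1.17)   # remission: ROR 1.06 [0.96 to 1.17]

print(f"SE(log ROR) LDA: {se_lda:.3f}, significant: {excludes_one(1.05, 1.15)}")
print(f"SE(log ROR) remission: {se_rem:.3f}, significant: {excludes_one(0.96, 1.17)}")
print(f"ratio of standard errors: {se_rem / se_lda:.2f}")
```

Under this assumption, the standard error for remission is roughly twice that for LDA, which is consistent with similar point estimates but a statistically significant difference for LDA only.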

Consequences of using inappropriate composite measures

Even though the deficits of the DAS 28 (and to a lesser extent the SDAI) have been known for several years, composite measures including an APR are still being used in primary studies [10]. This might be partly due to the fact that they are still recommended in official guidance: the current EMA guideline for the design of clinical studies on RA still mentions the DAS 28 as a validated composite measure to assess LDA and remission [27]. In addition, the ACR recommendations on RA disease activity measures, which were updated in 2019, still recommend the DAS 28, among others [28]. These guidelines should be updated to ensure study results that reflect clinical benefits for patients, rather than differences in the mode of action of treatments. The replacement of the DAS 28 and SDAI with the CDAI is all the more important because various newer treatments (e.g. JAK inhibitors such as upadacitinib and filgotinib or IL-6 inhibitors such as sarilumab) directly inhibit APR production.

Furthermore, composite measures including an APR still influence the conclusions of systematic reviews and HTA reports [29,30,31,32,33,34,35,36] and resulting documents such as clinical guidelines. In consequence, decisions based on these documents, such as reimbursement or treatment decisions, may be biased. Systematic reviewers and HTA bodies should thus avoid using the DAS 28 and SDAI, specifically for comparative effectiveness research. If only DAS 28 or SDAI results are available, HTA bodies should require study sponsors to provide CDAI results. With the present publication, we provide these results for a large number of RA studies (see Additional file 1: Tables 11 to 20), demonstrating that these important data can be generated by re-analysis of available study data, even if the original study analysis did not include CDAI results [19].

Previous research

Our findings confirm and supplement previous research on composite measures in RA. As early as 2005, Aletaha et al. indicated in their validation of the CDAI that composite measures including APRs were dispensable [6]. Moreover, Schoels et al. (2017) found that even if lower cut-offs were used for the DAS 28, a considerable proportion of patients were classified as being in remission, despite the presence of a significant swollen joint count [37]. This is in line with Futó et al., whose visualization of the DAS 28, SDAI and CDAI showed that APRs overshadowed changes in clinical outcomes; the authors described APRs as “major confounding factors” [38]. In an exposure–response modelling of tocilizumab in RA using the DAS 28, SDAI and CDAI, Bastida et al. [39] found that APRs decrease faster than clinical outcomes and concluded that the “CDAI is a better option than the DAS 28 and SDAI to assess disease activity in tocilizumab-treated patients”.

Strengths and limitations of our analysis

The major strength of our analysis is the systematic approach and the broad evidence base retrieved from several systematic reviews allowing consideration of all 4 composite measures, a broad range of biologics and JAK inhibitors, and a broad range of patients, i.e. MTX-naïve patients as well as patients after MTX or biologic failure. A limitation is the relatively small number of studies available for the direct comparisons.

Conclusions

The use of composite measures including an APR to measure treatment effects in patients with RA leads to overestimation of the treatment effects of drugs with direct inhibitory effects on APRs and thus to inaccurate classification of the main treatment outcomes compared to other RA treatments. Our findings underline the need for the use of the CDAI as the composite measure of choice in clinical studies, in particular to enable unbiased results in comparative effectiveness research.