Background

Rheumatoid arthritis (RA) is a common chronic inflammatory joint disorder. Without treatment, most patients with RA become severely disabled. The goals of RA treatment are to reduce disease activity, reduce or inhibit the rate of joint damage and, if possible, achieve remission. Current pharmacologic therapies include traditional disease-modifying anti-rheumatic drugs (DMARDs) and biologic agents [1–3].

Biologic agents have been shown to inhibit radiographic joint destruction in patients with an inadequate response to non-biologic DMARDs. However, because of recent expectations that patients randomized to placebo in clinical trials should be ‘rescued’ with active therapy within 6 months of starting treatment, the relative benefit of arresting joint damage with biologic agents beyond this period is unclear. With longer-term evidence on the rate of joint deterioration under placebo or minimal treatment, the efficacy of biologic agents and novel treatments could be projected beyond the placebo-controlled phase of clinical trials.

The objective of the current study was to estimate radiographic joint destruction over time with minimal treatment among the following populations of biologic DMARD-naïve RA patients: (1) moderate-to-severe RA patients with a history of inadequate response to non-biologic DMARDs who were treated with one (other) non-biologic DMARD; and (2) moderate-to-severe RA patients without a history of inadequate response to a DMARD, who received palliative care (non-steroidal anti-inflammatory drugs [NSAIDs], analgesics, low-dose glucocorticoids) or were being minimally treated with one non-biologic DMARD. The first population was termed the “DMARD-IR population” and the second population the “non-DMARD-IR population”. The evidence for this analysis was obtained by means of a systematic literature review.

Methods

Study identification and selection

A systematic literature search was performed to identify studies that provided information on joint structural deterioration among minimally treated RA patients. The MEDLINE® and EMBASE® databases were searched simultaneously with a predefined search strategy for articles published in English, French, or German from 1970 to October 2009. Search terms included a combination of free text and thesaurus terms related to RA, NSAIDs, glucocorticoids, non-biologic DMARDs, clinical trials, and observational studies. (See Additional file 1 for details of the search strategy.) The relevance of each citation identified from the databases was assessed on the basis of its title and abstract, according to the predefined selection criteria outlined below:

Populations of interest

DMARD-IR, i.e., adult RA patients naïve to biologic DMARDs with a history of inadequate response to one or more non-biologic DMARDs; and non-DMARD-IR, i.e., adult RA patients naïve to biologic DMARDs without a history of inadequate response to a non-biologic DMARD. The non-DMARD-IR population could include both non-biologic DMARD-naïve (completely DMARD-naïve) and non-naïve (non-biologic DMARD-experienced) patients.

Interventions

NSAIDs, glucocorticoids, and single non-biologic DMARDs, including methotrexate (MTX), azathioprine (AZA), sulfasalazine (SSZ), leflunomide (LEF), ciclosporin A (CSA), hydroxychloroquine (HCQ), minocycline, D-penicillamine, and gold salts.

Outcomes

Radiographic measures of joint deterioration: Larsen score (range 0–200) [4, 5] and Total Sharp Score (TSS) (range 0–448), plus two TSS subscores, the Erosion Subscore (ES) (range 0–280) and the Joint Space Narrowing (JSN) Subscore (range 0–168) [6–10].

Study design

Randomized controlled trials (RCTs), and prospective and retrospective observational cohort studies. Only study arms concerning the interventions of interest were included.

Full-text publications were obtained, where available, for all abstracts that potentially met the selection criteria. Based on these full-text reports, two reviewers evaluated whether each study met the selection criteria, and any disagreements were resolved in a consensus meeting.

Data extraction

For each of the selected studies that reported sufficient follow-up data, details were extracted from the relevant study arms on study design, population characteristics, interventions, and the outcomes of interest, i.e., the (modified) TSS and its two subscores (ES and JSN). Data were extracted into a study database according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) 2009 [11]. Mean change from baseline (CFB) in the outcomes of interest was extracted from tables, text, or graphs. If not reported, CFBs were calculated as the difference between reported follow-up and baseline values. Corresponding standard errors were extracted directly or calculated indirectly based on the following data (if available): reported standard deviation (SD) with sample size, 95 % confidence interval, or p-values (in this order of preference).
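The order of preference for deriving standard errors can be sketched as follows. This is a minimal Python illustration, not the authors' extraction code: it assumes a normal approximation, a symmetric 95 % confidence interval, and a two-sided p-value, none of which is specified in the text, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

# 97.5th percentile of the standard normal distribution (about 1.96).
Z_975 = NormalDist().inv_cdf(0.975)


def se_from_sd(sd, n):
    """SE of a mean change from its standard deviation and sample size."""
    return sd / math.sqrt(n)


def se_from_ci(lower, upper):
    """SE from a symmetric 95 % confidence interval (normal approximation)."""
    return (upper - lower) / (2 * Z_975)


def se_from_p(estimate, p):
    """SE from a two-sided p-value for the test of estimate == 0 (0 < p < 1)."""
    z = NormalDist().inv_cdf(1 - p / 2)
    return abs(estimate) / z


def standard_error(estimate, sd=None, n=None, ci=None, p=None):
    """Use the sources in the stated order: SD with n, then 95 % CI, then p-value."""
    if sd is not None and n is not None:
        return se_from_sd(sd, n)
    if ci is not None:
        return se_from_ci(*ci)
    if p is not None:
        return se_from_p(estimate, p)
    return None  # nothing usable was reported


# Hypothetical arm: mean CFB of 5.0 with SD 10.0 in 100 patients.
print(standard_error(5.0, sd=10.0, n=100))
```

A p-value reported only as an inequality (for example, p < 0.05) bounds the SE rather than determining it, so such cases would need a separate rule.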

Larsen scores were not consistently evaluated or reported. Different numbers and sets of joints were evaluated in the various studies, including hands and feet, or hands, feet, and wrists, and many studies did not report which or how many joints were evaluated. Moreover, some studies reported the total scores and some reported an average of scores per joint. Consequently, the analyses were based on standardized mean CFB in Larsen score, calculated as the reported CFB divided by the corresponding SD of this change.
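Dividing the change by its own SD removes the dependence on how the score was expressed. The sketch below, using hypothetical numbers, shows that a total score and a per-joint average over the same fixed set of joints give the same standardized change. It does not solve the other problem noted above, namely that studies scored different sets of joints.

```python
def standardized_cfb(cfb, sd_change):
    """Standardized mean change from baseline: mean CFB divided by the SD of the change."""
    if sd_change <= 0:
        raise ValueError("SD of the change must be positive")
    return cfb / sd_change


# Hypothetical study reporting a total Larsen score over a fixed set of joints.
n_joints = 40
total_cfb, total_sd = 6.0, 12.0

# The same data expressed as an average per joint: the mean change and its SD
# are both divided by the fixed number of joints, so their ratio is unchanged.
per_joint_cfb, per_joint_sd = total_cfb / n_joints, total_sd / n_joints

z_total = standardized_cfb(total_cfb, total_sd)
z_per_joint = standardized_cfb(per_joint_cfb, per_joint_sd)
print(z_total, z_per_joint)
```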

Meta-analysis of joint structural deterioration over time

Mean CFB in TSS, ES, JSN, and the standardized Larsen scores obtained from the selected studies were combined with Bayesian random-effects meta-analysis models to estimate joint deterioration over time for the DMARD-IR and non-DMARD-IR populations [12]. Any study that did not explicitly state whether or not patients had previously shown an inadequate response to DMARDs was assigned to the non-DMARD-IR population. Depending on the availability of data by endpoint and population, two series of analyses were performed. In the first series, all non-biologic DMARDs were considered as one group and the development of the outcomes of interest over time was estimated; studies evaluating only NSAIDs were not combined with studies evaluating DMARDs. In the second series, the development of the outcomes over time was compared among individual DMARDs (e.g., MTX, LEF, and AZA) using only data from comparative studies. All analyses provided curves reflecting the pooled mean CFB in TSS, ES, JSN, and the standardized Larsen score over time, along with their respective 95 % credible intervals (95 % CrIs).

Within the Bayesian framework, each analysis consisted of data, a likelihood, parameters, and a model. Bayesian methods formally combine a prior probability distribution (which reflects prior beliefs about the possible values of the parameters of interest) with a likelihood based on the observed data to obtain a posterior probability distribution of the parameters of interest [13]. A normal likelihood was assumed.

We used statistical models that assume that outcomes develop linearly over time, as well as models that allow outcomes to develop non-linearly over time [14–16]. The advantage of these meta-analysis models is that all available data points from each included study are captured, even if time points differ across studies, and (non-)linear trends in the development of outcomes over time are estimated [12]. Details of the meta-analysis models are provided in Additional file 2. Models 1 and 2 were used to estimate the development over time with all non-biologic DMARDs grouped together; Models 3 and 4 were used for the comparative analysis of different DMARDs. The deviance information criterion (DIC), a measure of model fit that penalizes model complexity, was used to compare the different models [17, 18]. The model with the lowest DIC, and therefore the “best fit”, was considered the most appropriate.
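DIC-based model selection can be sketched as below, using the standard definition DIC = D̄ + pD, where D̄ is the posterior mean deviance and pD = D̄ minus the deviance at the posterior means of the parameters (the quantity WinBUGS reports). The deviance values are hypothetical placeholders, not results from the linear or fractional-polynomial models in Additional file 2.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD) from posterior deviance draws and D(theta_bar)."""
    d_bar = sum(deviance_draws) / len(deviance_draws)  # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean           # effective number of parameters
    return d_bar + p_d, p_d


def select_model(candidates):
    """candidates maps a model name to (deviance_draws, deviance_at_posterior_mean).

    Returns the name of the model with the lowest DIC, plus all DIC values.
    """
    scores = {name: dic(*values)[0] for name, values in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical MCMC output for two competing time-trend models.
candidates = {
    "linear": ([52.0, 54.0, 56.0], 51.0),                 # D_bar 54, pD 3, DIC 57
    "fractional polynomial": ([49.0, 50.0, 54.0], 46.0),  # D_bar 51, pD 5, DIC 56
}
best, scores = select_model(candidates)
print(best, scores)
```

In this made-up example the more complex model is preferred despite its larger pD, because its lower mean deviance more than offsets the complexity penalty.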

To avoid prior beliefs influencing the results, non-informative prior distributions were used. The prior distributions of all model parameters were normal distributions with a mean of 0 and a variance of 10⁴, except for the heterogeneity parameter, which had a uniform prior distribution over the range 0–10. With such a “flat” prior, it is assumed that, before the data are observed, any parameter value is “equally” likely. As a consequence, the posterior results are driven by the data rather than by the prior distribution. The result of the Bayesian analysis is a (joint) posterior distribution for the model parameters of interest, which was estimated using Markov chain Monte Carlo methods as implemented in the WinBUGS software package [19].
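To make the prior specification concrete, the sketch below fits a normal-normal random-effects model to arm-level changes at a single time point, using the priors described above: N(0, 10⁴) for the pooled mean and Uniform(0, 10) for the heterogeneity parameter, which is assumed here to be the between-study standard deviation. It is a deliberate simplification of the time-trend models in Additional file 2, and it replaces WinBUGS MCMC with a grid approximation of the heterogeneity posterior followed by exact conditional draws of the pooled mean. All input values are hypothetical.

```python
import math
import random

PRIOR_VAR_MU = 1e4  # N(0, 10^4) prior on the pooled mean
TAU_MAX = 10.0      # Uniform(0, 10) prior on the between-study SD (assumption)


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def mu_given_tau(y, se, tau):
    """Conjugate posterior mean and variance of the pooled mean for a fixed tau."""
    precision = 1.0 / PRIOR_VAR_MU
    weighted_sum = 0.0  # the prior mean is 0, so it contributes nothing here
    for yi, si in zip(y, se):
        w = 1.0 / (si ** 2 + tau ** 2)
        precision += w
        weighted_sum += w * yi
    var = 1.0 / precision
    return weighted_sum * var, var


def log_post_tau(y, se, tau):
    """log p(tau | y) up to a constant, via p(mu) p(y | mu, tau) / p(mu | y, tau)."""
    mean, var = mu_given_tau(y, se, tau)
    lp = sum(log_normal_pdf(yi, mean, si ** 2 + tau ** 2) for yi, si in zip(y, se))
    lp += log_normal_pdf(mean, 0.0, PRIOR_VAR_MU)
    lp += 0.5 * math.log(2 * math.pi * var)  # subtracts log p(mu = mean | y, tau)
    return lp  # the uniform prior on tau is constant over the grid


def pooled_mean_posterior(y, se, n_grid=1000, n_draws=20000, seed=1):
    """Posterior mean and 95 % credible interval of the pooled mean."""
    taus = [(k + 0.5) * TAU_MAX / n_grid for k in range(n_grid)]
    log_w = [log_post_tau(y, se, t) for t in taus]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    conditionals = [mu_given_tau(y, se, t) for t in taus]

    rng = random.Random(seed)
    draws = []
    for k in rng.choices(range(n_grid), weights=weights, k=n_draws):
        mean, var = conditionals[k]
        draws.append(rng.gauss(mean, math.sqrt(var)))
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return sum(draws) / n_draws, lower, upper


# Hypothetical arm-level mean changes and their SEs at one follow-up time.
arm_cfb = [8.1, 10.5, 9.2, 11.8, 7.6]
arm_se = [1.2, 1.5, 0.9, 2.0, 1.1]
mean, lower, upper = pooled_mean_posterior(arm_cfb, arm_se)
print(f"pooled mean CFB {mean:.2f} (95 % CrI {lower:.2f} to {upper:.2f})")
```

Extending the single time point to a linear or fractional-polynomial function of time, as in the published models, would add correlated parameters, which is where MCMC sampling of the kind WinBUGS provides becomes the practical route.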

Results

Study selection

The study selection process, including the reasons for exclusion, is summarized in Fig. 1. The literature search identified 2076 potentially relevant studies, of which 1892 (91 %) were excluded at the first (title and abstract) review. The full-text review of the 184 remaining studies excluded another 111 studies. Of the 73 articles meeting the selection criteria, another 29 were excluded because of insufficient follow-up data on the outcomes of interest. Overall, 44 studies were included [20–63].
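The counts reported for the selection process are internally consistent, as a quick arithmetic check shows:

```python
identified = 2076
excluded_title_abstract = 1892
excluded_full_text = 111
excluded_insufficient_data = 29

full_text_reviewed = identified - excluded_title_abstract       # 184
meeting_criteria = full_text_reviewed - excluded_full_text      # 73
included = meeting_criteria - excluded_insufficient_data        # 44
pct_excluded_first = round(100 * excluded_title_abstract / identified)  # 91

print(full_text_reviewed, meeting_criteria, included, pct_excluded_first)
```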

Fig. 1

Flow diagram of study selection

Study characteristics

Key study and patient characteristics are presented in Tables 1 and 2. All but three of the 44 studies were RCTs (the exceptions were one retrospective [31] and two prospective cohort studies [33, 52]); the studies were published between 1982 and 2009, with follow-up periods ranging from 24 weeks to 2 years. Twelve studies concerned the DMARD-IR population [20–31] and the remaining 32 concerned the non-DMARD-IR population [32–63]. Only two studies provided data on the Larsen score for the DMARD-IR population, while 10 provided data on the (modified) TSS in this population. For the non-DMARD-IR population, 17 studies provided data on the Larsen score only, 14 provided data on the (modified) TSS, and one provided data on both. Because some studies contributed data from more than one arm (three studies in the DMARD-IR population [21, 22, 31] and 13 studies in the non-DMARD-IR population [35, 36, 38, 41, 48, 50, 51, 54, 57, 59, 61–63]), the total number of treatment arms included in the analyses was 63: 16 for DMARD-IR and 47 for non-DMARD-IR. Of the 47 arms that formed the non-DMARD-IR population, only 12 included patients who had previously been exposed to DMARDs. Hence, the majority of patients in this population could be considered DMARD-naïve.
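The study and arm counts given above add up. The sketch below also derives the share of non-DMARD-IR arms without prior DMARD exposure, which underlies describing that population as mostly DMARD-naïve; note that it counts arms, not patients.

```python
studies = {"DMARD-IR": 12, "non-DMARD-IR": 32}
# Studies by radiographic endpoint reported, as listed in the text.
dmard_ir_endpoints = {"Larsen": 2, "TSS": 10}
non_dmard_ir_endpoints = {"Larsen only": 17, "TSS": 14, "both": 1}
arms = {"DMARD-IR": 16, "non-DMARD-IR": 47}
non_dmard_ir_arms_exposed = 12  # arms including patients previously exposed to DMARDs

total_studies = sum(studies.values())
total_arms = sum(arms.values())
naive_arms = arms["non-DMARD-IR"] - non_dmard_ir_arms_exposed
naive_share = naive_arms / arms["non-DMARD-IR"]

print(total_studies, total_arms, naive_arms, f"{naive_share:.0%}")
```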

Table 1 Key study characteristics
Table 2 Patient characteristics at baseline

The patients received the following treatments: MTX (in 12 treatment arms across the studies); AZA (2); SSZ (1); and gold salts (1) for the DMARD-IR population; and MTX (16); gold salts (9); SSZ (5); LEF (4); CSA (4); HCQ (1); D-penicillamine (1); antimalarials, D-penicillamine, SSZ, or gold salts (2); any one of a list of non-biologic DMARDs (4); and NSAIDs (4) for the non-DMARD-IR population. The number of patients included ranged from 29 to 228 for DMARD-IR arms and from 20 to 501 for non-DMARD-IR arms.

Patient characteristics

The DMARD-IR and non-DMARD-IR populations showed comparable distributions of gender, age, baseline C-reactive protein (CRP) level, and erythrocyte sedimentation rate (ESR). On average, 73 % of the patients were women in the DMARD-IR studies versus 70 % in the non-DMARD-IR studies; the average ages were 54 and 52 years, respectively. The median of the reported baseline CRP levels was 2.2 mg/dL across the DMARD-IR studies and 2.6 mg/dL across the non-DMARD-IR studies. The median of the reported baseline ESR values was 43 mm/h across the DMARD-IR studies and 44 mm/h across the non-DMARD-IR studies. Baseline CRP level and ESR varied widely across the non-DMARD-IR studies. As expected, disease duration had a skewed distribution and was longer among the DMARD-IR patients (median 88 months) than among the non-DMARD-IR patients (median 15 months). For the DMARD-IR population, the median of the reported baseline Health Assessment Questionnaire (HAQ) scores across studies was 1.7 (the HAQ scale ranges from 0, no disability, to 3, completely disabled), and the median baseline TSS was 53. For the non-DMARD-IR population, the median of the reported HAQ scores across studies was 1.0 and the median TSS was 11.9.

Joint structural deterioration over time in the DMARD-IR population

The mean CFB in TSS within the DMARD-IR population, as obtained from the individual studies, is presented in Fig. 2. These results were combined with a random-effects meta-analysis (Additional file 2, Model 1), in which the change in TSS increased linearly over time, from 1.14 at Week 12 to 9.84 at Week 104 (Table 3). There was a high probability that continued treatment with any one non-biologic DMARD in the setting of inadequate response would result in deterioration of joint structure over time.

Fig. 2

Mean change from baseline in TSS in the DMARD-IR population. Data are as observed in individual studies and estimated with meta-analysis. Solid line represents the mean estimate for a given treatment arm and the dashed lines show the corresponding 95 % credible interval. AZA: azathioprine; DMARD-IR: patient population with moderate-to-severe rheumatoid arthritis with a history of inadequate response to disease-modifying anti-rheumatic drugs (DMARDs) who are currently treated with one (other) non-biologic DMARD; MTX: methotrexate; TSS: modified Total Sharp Score

Table 3 Mean change from baseline in TSS and subscores in the DMARD-IR population as estimated with meta-analysis

Table 3 also presents the results of the analysis (Additional file 2, Model 3) that compared joint deterioration as observed with MTX and AZA. Continuation of treatment with AZA was associated with greater joint deterioration than continuation of treatment with MTX in this DMARD-IR population.

The progression in ES extracted from the individual studies and the pooled results (0.51 at Week 12 and 4.43 at Week 104) obtained with the meta-analysis (Additional file 2, Model 1) are presented in Fig. 3. There was a 98 % to 100 % chance that ES would deteriorate over time when DMARD-IR patients received minimal treatment with a non-biologic DMARD (Table 3). As inferred from the comparative analysis, a greater rate of deterioration was expected with AZA than with MTX.

Fig. 3

Mean change from baseline in ES in the DMARD-IR population. Data are as observed in individual studies and estimated with meta-analysis. Solid line represents the mean estimate for a given treatment arm and the dashed lines show the corresponding 95 % credible interval. AZA: azathioprine; DMARD-IR: patient population with moderate-to-severe rheumatoid arthritis with a history of inadequate response to disease-modifying anti-rheumatic drugs (DMARDs) who are currently treated with one (other) non-biologic DMARD; ES: Erosion Subscore; MTX: methotrexate

When DMARD-IR patients were treated with MTX alone, the mean changes in JSN were 0.36 at Week 12 and 3.14 at Week 104 (Fig. 4; Table 3).
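Because the DMARD-IR estimates for TSS, ES, and JSN come from a linear model, the Week 12 value should be close to 12/104 of the Week 104 value if the fitted line passes through zero change at baseline. That zero-intercept form is an assumption here (the model details are in Additional file 2), but the reported values are consistent with it, as this check shows:

```python
WEEK_EARLY, WEEK_LATE = 12, 104

# Pooled mean CFB in the DMARD-IR population as reported in the text (Table 3).
reported = {
    "TSS": (1.14, 9.84),
    "ES": (0.51, 4.43),
    "JSN": (0.36, 3.14),
}

for outcome, (early, late) in reported.items():
    slope = late / WEEK_LATE              # implied change per week, zero intercept assumed
    predicted_early = slope * WEEK_EARLY  # linear prediction at Week 12
    print(f"{outcome}: {slope:.4f} per week; Week 12 predicted "
          f"{predicted_early:.2f}, reported {early:.2f}")
```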

Fig. 4

Mean change from baseline in JSN Subscore in the DMARD-IR population. Data are as observed in individual studies and estimated with meta-analysis. Solid line represents the mean estimate for a given treatment arm and the dashed lines show the corresponding 95 % credible interval. DMARD-IR: patient population with moderate-to-severe rheumatoid arthritis with a history of inadequate response to disease-modifying anti-rheumatic drugs (DMARDs) who are currently treated with one (other) non-biologic DMARD; JSN: Joint Space Narrowing; MTX: methotrexate

For joint deterioration as measured with the Larsen score, only two studies with sufficient data were available for the DMARD-IR population [27, 31]. As neither study reported repeated intermediate observations, no meta-analysis model for change over time was estimated. At 24 weeks, a deterioration of 0.52 was observed with MTX [27]. In the other study the deterioration varied from 0.53 points with SSZ to 1.06 with gold salts at 52 weeks [31].

Joint structural deterioration over time in the non-DMARD-IR population

The rate of deterioration in the non-DMARD-IR population was not as great as in the DMARD-IR population. The progression of TSS in the non-DMARD-IR population is presented in Fig. 5 and Table 4. Individual study results were combined with a random-effects meta-analysis model in which the change in TSS from baseline developed non-linearly (fractional polynomial with p1 = p2 = 1; Additional file 2, Model 2); this model showed an increase in TSS from 1.56 at Week 12 to 5.13 at Week 104. Up to Week 104, there was at least a 94 % chance that continuing treatment with one DMARD would result in deterioration of joint structure in the non-DMARD-IR population (Table 4). An analysis (Additional file 2, Model 4) comparing LEF and MTX based on two head-to-head RCTs indicated a similar rate of deterioration with LEF and MTX [37, 59].

Fig. 5

Mean change from baseline in TSS in the non-DMARD-IR population. Data are as observed in individual studies and estimated with meta-analysis. Solid line represents the mean estimate for a given treatment arm and the dashed lines show the corresponding 95 % credible interval. AZA: azathioprine; DMARD: disease-modifying anti-rheumatic drugs; DMARD-IR: patient population with moderate-to-severe rheumatoid arthritis with a history of inadequate response to DMARDs who are currently treated with one (other) non-biologic DMARD; LEF: leflunomide; MTX: methotrexate; SSZ: sulfasalazine; TSS: modified Total Sharp Score

Table 4 Mean change from baseline in TSS and subscores plus Larsen Score in the non-DMARD-IR population as estimated with meta-analysis

For ES, the individual study results were also combined with a random-effects non-linear meta-analysis model (fractional polynomial with p1 = 1 and p2 = 0.5; Additional file 2, Model 2), which showed that ES worsened over time, from 0.69 at Week 12 to 2.93 at Week 104, when non-DMARD-IR patients continued to receive one traditional DMARD (Fig. 6; Table 4). The comparative analysis (Additional file 2, Model 4) indicated that no difference in the rate of deterioration was expected between LEF and MTX.
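Under the usual Royston–Altman convention for fractional polynomials, powers p1 = p2 = 1 give the terms t and t·ln(t), whereas p1 = 1 and p2 = 0.5 give t and √t. The sketch below builds these terms and, purely for illustration, solves for the two coefficients that reproduce the reported Week 12 and Week 104 estimates, assuming time in weeks and no intercept. These back-calculated coefficients are not the posterior estimates from Additional file 2 and should not be used for extrapolation.

```python
import math


def fp_term(t, p):
    """Single fractional-polynomial term; by convention power 0 means ln(t)."""
    return math.log(t) if p == 0 else t ** p


def fp_basis(t, p1, p2):
    """Two-term FP basis; a repeated power multiplies the second term by ln(t)."""
    first = fp_term(t, p1)
    second = fp_term(t, p2) * math.log(t) if p1 == p2 else fp_term(t, p2)
    return first, second


def solve_two_points(points, p1, p2):
    """Coefficients (b1, b2) of b1*x1 + b2*x2 passing through two (t, y) points."""
    (t_a, y_a), (t_b, y_b) = points
    a11, a12 = fp_basis(t_a, p1, p2)
    a21, a22 = fp_basis(t_b, p1, p2)
    det = a11 * a22 - a12 * a21
    b1 = (y_a * a22 - a12 * y_b) / det  # Cramer's rule
    b2 = (a11 * y_b - a21 * y_a) / det
    return b1, b2


def fp_curve(t, b1, b2, p1, p2):
    x1, x2 = fp_basis(t, p1, p2)
    return b1 * x1 + b2 * x2


# Illustrative only: curves through the reported non-DMARD-IR estimates.
tss_coef = solve_two_points([(12, 1.56), (104, 5.13)], 1, 1)   # TSS, p1 = p2 = 1
es_coef = solve_two_points([(12, 0.69), (104, 2.93)], 1, 0.5)  # ES, p1 = 1, p2 = 0.5
print("TSS:", tss_coef, "ES:", es_coef)
```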

Fig. 6

Mean change from baseline in ES in the non-DMARD-IR population. Data are as observed in individual studies and estimated with meta-analysis. Solid line represents the mean estimate for a given treatment arm and the dashed lines show the corresponding 95 % credible interval. DMARD: disease-modifying anti-rheumatic drugs; DMARD-IR: patient population with moderate-to-severe rheumatoid arthritis with a history of inadequate response to DMARDs who are currently treated with one (other) non-biologic DMARD; ES: Erosion Subscore; LEF: leflunomide; MTX: methotrexate; SSZ: sulfasalazine

For the JSN and the Larsen scores, linear meta-analysis models were appropriate to reflect the deterioration up to 104 weeks (Fig. 7 and Table 4; Additional file 2, Model 1). One study evaluated treatment with NSAIDs only and reported a mean standardized change from baseline in the Larsen score of 0.01 up to 52 weeks [54].

Fig. 7

Mean change from baseline in JSN Subscore in the non-DMARD-IR population. Data are as observed in individual studies and estimated with meta-analysis. Solid line represents the mean estimate for a given treatment arm and the dashed lines show the corresponding 95 % credible interval. DMARD: disease-modifying anti-rheumatic drugs; DMARD-IR: patient population with moderate-to-severe rheumatoid arthritis with a history of inadequate response to DMARDs who are currently treated with one (other) non-biologic DMARD; JSN: Joint Space Narrowing; LEF: leflunomide; MTX: methotrexate; SSZ: sulfasalazine

Discussion

In this study, the development of joint structural deterioration among minimally treated patients with moderate-to-severe RA was estimated on the basis of currently available published data. Estimates were obtained for two populations: a DMARD-IR population of patients who had previously shown an inadequate response to non-biologic DMARDs, and a non-DMARD-IR population of both non-biologic DMARD-naïve and non-biologic DMARD-experienced patients without an inadequate response to any DMARD. In the identified studies, the minimally treated DMARD-IR patients were receiving monotherapy with MTX, AZA, SSZ, or gold salts, with most patients receiving MTX. In the included non-DMARD-IR studies, DMARD treatment consisted of MTX, gold salts, SSZ, LEF, CSA, HCQ, or D-penicillamine. Only one study was identified in which patients were treated with NSAIDs only, and this study was not included in the meta-analysis. For both populations, treatment with one DMARD resulted in deterioration of joint structure over a 2-year period, as measured with the TSS, ES, JSN, and Larsen scores. Under the assumption that the minimal clinically important difference is about 1 % of the maximum possible TSS or Larsen score, the estimated 2-year changes in TSS can be considered relevant, in particular for the DMARD-IR population [64, 65]. Depending on the time point and the measure examined, the rate of deterioration in the DMARD-IR population was about 1.5 to 2 times the rate in the non-DMARD-IR population. Based on RCT evidence, the rate of deterioration with AZA was greater than with MTX in the DMARD-IR population. For the non-DMARD-IR population, LEF and MTX showed similar progression over time.
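The relevance threshold and the relative rates quoted above can be reproduced from the Week 104 estimates reported in the Results. The 1 % rule is the assumption stated in the text; the sketch compares TSS with that threshold and computes DMARD-IR to non-DMARD-IR ratios for TSS and ES at Week 104 only (the non-DMARD-IR JSN estimates appear only in Table 4 and are not repeated here).

```python
MAX_TSS, MAX_LARSEN = 448, 200
MCID_FRACTION = 0.01  # "about 1 % of the maximum of the possible score", per the text

mcid_tss = MCID_FRACTION * MAX_TSS        # 4.48 points
mcid_larsen = MCID_FRACTION * MAX_LARSEN  # 2.0 points

# Pooled mean CFB at Week 104 as reported in the Results.
week104 = {
    "DMARD-IR": {"TSS": 9.84, "ES": 4.43},
    "non-DMARD-IR": {"TSS": 5.13, "ES": 2.93},
}

exceeds_mcid = {pop: v["TSS"] >= mcid_tss for pop, v in week104.items()}
ratios = {m: week104["DMARD-IR"][m] / week104["non-DMARD-IR"][m] for m in ("TSS", "ES")}

print(f"TSS threshold {mcid_tss:.2f}, Larsen threshold {mcid_larsen:.2f}")
print(exceeds_mcid)
print({m: round(r, 2) for m, r in ratios.items()})
```

The Week 104 ratios come out at about 1.9 for TSS and 1.5 for ES. At Week 12 the ordering for TSS is reversed (1.14 in DMARD-IR versus 1.56 in non-DMARD-IR), which is why the comparison depends on the time point assessed.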

The greater rate of deterioration observed in the DMARD-IR population than in the non-DMARD-IR population is plausible, given the negative impact that a history of non-biologic DMARD failure would be expected to have on the effectiveness of continued treatment with a non-biologic DMARD. Related underlying causes of the difference in progression rates may include differences in disease duration, rheumatoid factor status, and disease activity.

However, it is important to note that for a subset of the identified studies it was not clear whether the patients were exclusively DMARD-IR. These studies were assigned to the non-DMARD-IR group to keep the DMARD-IR group as homogeneous as possible. As a result, the defined non-DMARD-IR population may partly consist of patients with a history of failed treatment with a DMARD. This possible misclassification may have led to overestimation of deterioration in this group and should be kept in mind when comparing the degree of joint deterioration in the DMARD-IR and non-DMARD-IR populations.

The relevant studies were identified by means of a systematic search of the literature and included both RCTs and observational designs. Given the objective of the meta-analysis, only those arms of the comparative studies were selected in which patients were treated with NSAIDs or a single DMARD (with or without additional NSAIDs or corticosteroid use). Although many RCTs were included, often only one treatment arm (e.g., MTX-only arm from biologic trials in DMARD-IR populations) was used. As such, there was no difference in the way evidence obtained from observational studies and RCTs was handled. RCTs in which different single non-biologic DMARDs were compared were included and provided the evidence to allow comparisons between DMARDs. Comparative analyses were only possible for AZA versus MTX and for LEF versus MTX. Although RCTs provided comparative data for some other DMARDs, these were not part of a connected network of RCTs and could not, therefore, be used in the planned analyses.

Many of the MTX treatment arms comprising the DMARD-IR population were obtained from RCTs in which a biologic DMARD was evaluated. In the included studies the patients in these MTX arms were not assigned to biologic treatment within the study time horizon; only the MTX dose could be increased in case of non-response. Hence, the observed structural deterioration is a reflection of the limitations of MTX in this population.

The included studies reported joint deterioration at different time points, with up to 2 years of follow-up. With the meta-analysis models used, all available time points were analyzed simultaneously to estimate a curve reflecting joint structural deterioration over time. It cannot be assumed that extrapolating these curves beyond this 2-year period gives a valid representation of joint structure deterioration over the longer term. The vast majority of studies used the modified Sharp score to assess erosions and joint space narrowing. The modified score includes the feet in the radiographic assessment, in addition to the wrists and hands scored in the original Sharp score [6–10]. The study by Hamdy [21] used the earlier version of the Sharp score. Despite the difference in total score, we included the study by Hamdy in the analysis of the DMARD-IR population. We do not expect this variation in total score to cause large between-study heterogeneity in the development of TSS over time. In fact, the observed TSS reported by Hamdy is very consistent with the other studies included in that analysis (Fig. 2).

The included studies were characterized by variability in patient characteristics, especially among the non-DMARD-IR studies. As a result, heterogeneity in joint structural deterioration over time was observed. Random-effects models were used to capture this heterogeneity; however, these models do not explain it. In the future, it will be of interest to evaluate whether certain patient characteristics are associated with differences in joint deterioration. However, meta-regression analysis, in which study-level data are used to evaluate the impact of patient characteristics on outcomes or treatment effects, can be prone to ecological bias [66, 67]. Such an evaluation would preferably be based on patient-level data. In this context, it would be of interest to evaluate, for example, the independent effects of steroid use, disease duration, rheumatoid factor status, and disease activity on joint deterioration.

There is evidence that the patients treated in biologic trials have changed substantially over the past decade [68]. This is important to consider when using the findings of this meta-analysis to help interpret the results of new placebo-controlled trials of biologics. In this context, a limitation of the current analysis is that potentially relevant studies published after the October 2009 search cut-off were not included.

Conclusions

Based on the currently available published evidence, it can be concluded that minimal treatment of RA with one non-biologic DMARD results in a high risk of deterioration of joint structure among patients who have shown an inadequate response to non-biologic DMARDs, as well as among patients who have not (yet) shown an inadequate response. This finding is relevant when assessing the relative benefit of arresting joint damage with new biologic agents based on placebo-controlled trials in which patients randomized to placebo are ‘rescued’ with active therapy within 6 months.

Abbreviations

AZA: azathioprine; CFB: change from baseline; CrI: credible interval; CRP: C-reactive protein; CSA: ciclosporin A; DIC: deviance information criterion; DMARD: disease-modifying anti-rheumatic drugs; DMARD-IR: inadequate responder to DMARD therapy; ES: Erosion Subscore; ESR: erythrocyte sedimentation rate; HAQ: Health Assessment Questionnaire; HCQ: hydroxychloroquine; JSN: Joint Space Narrowing; LEF: leflunomide; MTX: methotrexate; NSAID: non-steroidal anti-inflammatory drug; RA: rheumatoid arthritis; RCT: randomized controlled trial; SD: standard deviation; SSZ: sulfasalazine; TSS: Total Sharp Score