Background

Methotrexate (MTX) is now the conventional synthetic disease-modifying antirheumatic drug (csDMARD) of first choice, either as monotherapy or combination therapy, for most patients with rheumatoid arthritis (RA) [1]. This is emphasised in a number of international and national guidelines [2,3,4,5]. However, response to MTX, although better than to most other csDMARDs, is not universal. In observational studies approximately 30% of patients discontinue MTX in the medium term, with around half doing so due to inefficacy and half due to adverse events [6, 7]. Patient-related factors such as female gender and current smoking are associated with MTX non-response [6, 7]. Disease-related factors such as disease duration, disease activity, rheumatoid factor (RF) and anti-citrullinated protein antibody (ACPA) status are moderately predictive of inefficacy [6, 7]. Psychosocial factors may also be important but have received little attention to date [8]. Response to treatment may be influenced by the patient’s social background [9, 10], by their existing beliefs about their illness and the likely efficacy of the drug, and by whether they have actually taken the medication (adherence) [11]. Genetic or other biological factors may also influence drug response [6, 7].

Many previous studies have attempted to identify independent predictors of response to MTX, although frequently without assessing the ability to assign probabilities of response to individual patients [12,13,14]. Those prediction models that have been developed have used data from the restricted populations and rigid treatment regimens of clinical trials [15, 16]; have used only small numbers of participants (fewer than 100) from observational studies [17, 18]; or have analysed the outcome of treatment discontinuation rather than a broader assessment of patient condition [19]. The value of predictions from such models for “real world” patients with RA about to start MTX for the first time is uncertain. More likely to be of use is a model developed in an observational study including patients seen in routine clinical practice using readily available or easily measurable demographic, clinical and psychosocial factors. If such a model could identify those unlikely to respond to MTX prior to starting therapy, with sufficient accuracy to be clinically useful, it could enable earlier access to alternative medications such as biologic therapy and the avoidance of disease progression for some patients.

The objectives of this study were, in a large national multi-centre observational study of patients with RA or undifferentiated polyarthritis (UP) commencing MTX for the first time, to (1) describe the pattern of 6-month treatment response, (2) identify patient-specific, disease-specific and psychosocial predictors of primary non-response to MTX, (3) combine predictors of non-response in a model that could be used to assign probability of non-response at the individual patient level and (4) test the accuracy of the model.

Methods

Study design and study population

The Rheumatoid Arthritis Medication Study (RAMS) is a large national (UK) multi-centre (n = 38 centres) study. To be eligible for RAMS, patients had to (1) be aged 18 years or over, (2) have a physician diagnosis of RA or UP and (3) be about to start MTX for the first time, either as monotherapy or in combination with other csDMARDs, including oral steroids. Patients were not eligible if they had current or previous exposure to a biological DMARD (bDMARD). RAMS was approved by the Central Manchester NHS Research Ethics Committee (reference 08/H1008/25) and all patients provided written consent.

The decision to start MTX, the dosage and mode of administration, and whether to use MTX as monotherapy or in combination were made by the patient’s rheumatologist based on clinical need, local practice and national guidelines [3]. Patients were generally recruited following the drug education visit and prior to taking their first dose of MTX.

Baseline assessments

Demographic and lifestyle data collected at baseline, and relevant to this analysis, included age, gender, height and weight to calculate body mass index (BMI), smoking status (current/former/never), current alcohol intake (units/fortnight) and current caffeinated tea and coffee consumption (cups/day). Socio-economic status was assigned using the Index of Multiple Deprivation (IMD) 2010 based on the patient’s postcode, where a higher IMD score represents a more deprived area [20].

Disease-specific data were collected from the patient by a research nurse and supplemented with information obtained from medical records, including symptom duration; 28 tender and swollen joint count; individual 1987 American College of Rheumatology (ACR) classification criteria for RA [21]; previous csDMARD history; current oral steroid use; intramuscular or intra-articular steroid injections in the past week; current use of non-steroidal anti-inflammatory drugs (NSAIDs); duration of morning stiffness; and serum creatinine. Self-reported comorbidities were selected from a list of predefined conditions (high blood pressure, angina, heart attack, transient ischaemic attack, stroke, epilepsy, asthma, chronic bronchitis/emphysema, bronchiectasis, peptic ulcer disease, liver disease, renal disease, tuberculosis, diabetes mellitus, hyperthyroidism, depression and cancer).

Patients also completed a questionnaire, including pain, fatigue and general well-being visual analogue scales (VAS) (0–100 mm, with 100 mm the worst score); the British version of the Health Assessment Questionnaire (HAQ) (score range 0–3) [22]; the Hospital Anxiety and Depression Scale (HADS) (score ranges 0–21) [23] and the Beliefs about Medicines Questionnaire (BMQ) (score ranges 5–25) [24], with higher values of the HAQ, HADS and BMQ representing reduced physical function, greater indication of anxiety or depression and stronger beliefs about medication necessity or concerns, respectively. The brief Illness Perception Questionnaire (IPQ-brief) [25] was used to categorise patients’ illness representations as positive or negative [26].

Blood samples were taken at baseline and sent to the UK Biobank, Stockport, UK for the measurement of C-reactive protein (CRP) (Beckman Coulter AU5400, CRP assay OSR6147; mg/l) and RF (Beckman Coulter AU5400, RF latex assay OSR61105; IU/ml). RF values in excess of 14 IU/ml were taken to indicate RF positivity. If blood samples were not available to measure CRP, recorded CRP values from medical notes were used. The DAS28 was calculated using the CRP, 28-joint counts and VAS for general well-being [27].
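The DAS28-CRP and RF-positivity rules described above can be sketched in a few lines. This is a minimal illustration, assuming the widely published four-variable DAS28-CRP formula (whose constant of 0.96 matches the value quoted in the sensitivity analysis); the function and parameter names are hypothetical and are not taken from the study's own analysis code.

```python
import math


def das28_crp(tjc28: int, sjc28: int, crp_mg_l: float, gh_vas_mm: float) -> float:
    """Four-variable DAS28-CRP (assumed standard published formula).

    tjc28, sjc28 : tender and swollen joint counts (0-28)
    crp_mg_l     : C-reactive protein in mg/l
    gh_vas_mm    : general well-being VAS, 0-100 mm (100 = worst)
    """
    return (
        0.56 * math.sqrt(tjc28)
        + 0.28 * math.sqrt(sjc28)
        + 0.36 * math.log(crp_mg_l + 1)
        + 0.014 * gh_vas_mm
        + 0.96
    )


def is_rf_positive(rf_iu_ml: float) -> bool:
    """RF positivity as defined in the text: values in excess of 14 IU/ml."""
    return rf_iu_ml > 14


if __name__ == "__main__":
    # Illustrative (made-up) patient: 6 tender joints, 4 swollen joints,
    # CRP 12 mg/l, general well-being VAS 55 mm.
    print(round(das28_crp(6, 4, 12.0, 55.0), 2))
```

With all four inputs at zero the function returns the constant 0.96, which is the lowest score the formula can produce.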

Follow-up assessments

Patients were followed up at 3 and 6 months. Changes in DMARD therapy, including MTX, were recorded and the DAS28-CRP was measured at each visit. If MTX therapy had been stopped, the reason for stopping and whether treatment would be restarted were also recorded.

Outcome: European League Against Rheumatism (EULAR) non-response

Non-response to treatment at 6 months was defined as “no response” using the EULAR response criteria [28], i.e. Disease Activity Score in 28 joints (DAS28) improvement ≤ 0.6, or DAS28 improvement > 0.6 but ≤ 1.2 and 6-month DAS28 > 5.1. In addition, patients who had discontinued MTX by 6 months, i.e. had stopped MTX and did not plan to restart, due to inefficacy were classified as non-responders, as were patients who commenced bDMARD treatment by 6 months. “Moderate” or “good” responders by the EULAR criteria were considered responders, as were patients who discontinued MTX by 6 months due to remission.
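The outcome definition above can be written as a small decision rule. The following is a sketch only: the function names, the string labels for discontinuation reasons, the rounding of the improvement to two decimal places and the order in which the rules are checked are all assumptions made for illustration. Patients excluded from the analysis (for example, those stopping MTX due to adverse events) are deliberately not handled.

```python
from typing import Optional


def eular_response(baseline_das28: float, six_month_das28: float) -> str:
    """Classify EULAR response as 'good', 'moderate' or 'none'."""
    # Rounding avoids floating-point artefacts at the 0.6 and 1.2 boundaries
    # (assumption: DAS28 is recorded to two decimal places).
    improvement = round(baseline_das28 - six_month_das28, 2)
    if improvement > 1.2:
        return "good" if six_month_das28 <= 3.2 else "moderate"
    if improvement > 0.6:
        return "moderate" if six_month_das28 <= 5.1 else "none"
    return "none"


def is_non_responder(
    baseline_das28: float,
    six_month_das28: Optional[float],
    stopped_mtx_reason: Optional[str] = None,  # hypothetical labels: "inefficacy", "remission"
    started_bdmard: bool = False,
) -> bool:
    """Apply the study's 6-month non-response definition to one patient."""
    # Starting a bDMARD or stopping MTX for inefficacy counts as non-response.
    if started_bdmard or stopped_mtx_reason == "inefficacy":
        return True
    # Stopping MTX because of remission counts as response.
    if stopped_mtx_reason == "remission":
        return False
    if six_month_das28 is None:
        raise ValueError("6-month DAS28 is required for patients still on MTX")
    return eular_response(baseline_das28, six_month_das28) == "none"
```

For example, a fall from 6.5 to 5.5 is an improvement of 1.0 but leaves the 6-month score above 5.1, so the patient is classified as a non-responder.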

Participant selection

To allow for sufficient follow-up time, the current analysis included RAMS participants recruited by 30 September 2015. Patients without a 6-month follow-up record were excluded, unless they had discontinued MTX by their 3-month follow up and so could be classified as non-responders. Also excluded were patients with unknown MTX exposure status at 6 months, those who had discontinued MTX by 6 months for reasons other than inefficacy or remission (e.g. adverse events), and those who had not discontinued MTX by 6 months but for whom the 6-month EULAR response was unavailable (see Fig. 1). If MTX or restart status at 6 months was unknown, 3-month records were checked for evidence of having discontinued at this time point before excluding a patient.

Fig. 1 Flow diagram of participant inclusion. RAMS, Rheumatoid Arthritis Medication Study; MTX, methotrexate; EULAR, European League Against Rheumatism

Statistical analysis

All variables were assessed for their univariable and multivariable association with non-response to MTX at 6 months using logistic regression. Backwards selection was used to successively remove non-significant terms (p ≥ 0.05) from a full multivariable model containing all variables. Forwards selection was also used to successively add significant predictors (p < 0.05) into an empty model to validate the list of predictors derived by backwards selection. If the backwards and forwards selection processes delivered different sets of predictors, backwards selection was applied again on the pooled set of predictors derived from both approaches to produce a final model. The ability of the final model to discriminate between responders and non-responders was assessed using the area under the receiver operating characteristic curve (AUC). Agreement between predicted probabilities and observed outcomes was assessed using a calibration plot of the observed proportions of non-responders in each decile of predicted probability of non-response plotted against the mean predicted probabilities for the deciles. As performance was assessed using the same data used to build the model, an estimate of the optimism in the AUC value was produced using 200 bootstrapped datasets. The modelling procedure was followed afresh in each bootstrap dataset and the estimate of optimism was calculated as the average difference between the AUC achieved by a model in its own bootstrap dataset and the AUC achieved by that model in the original dataset [29]. How the model might perform in clinical practice was explored by calculating the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) when using different cut-offs of predicted probabilities of non-response as thresholds for classifying individuals as having a high risk of non-response.
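The AUC and its bootstrap optimism correction, as described above, can be illustrated with a short sketch. The analysis itself was run in Stata; the Python below is not the authors' code, and the names `auc`, `model_auc`, `optimism` and the `fit` callable are hypothetical. The `fit` argument stands in for the whole modelling procedure (including variable selection) and must return a function that scores one patient's predictor values.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (predictor values, outcome: 1 = non-responder)
Scorer = Callable[[Sequence[float]], float]


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random non-responder outscores a random responder.

    Ties count as one half. Equivalent to the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def model_auc(scorer: Scorer, rows: Sequence[Row]) -> float:
    """AUC of a fitted scorer evaluated on a dataset."""
    return auc([scorer(x) for x, _ in rows], [y for _, y in rows])


def optimism(
    rows: List[Row],
    fit: Callable[[List[Row]], Scorer],
    n_boot: int = 200,
    seed: int = 0,
) -> float:
    """Average of (AUC in own bootstrap sample) minus (AUC in original data)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        boot = [rng.choice(rows) for _ in rows]  # resample with replacement
        if len({y for _, y in boot}) < 2:
            continue  # AUC is undefined without both outcome classes
        scorer = fit(boot)  # repeat the full modelling procedure afresh
        diffs.append(model_auc(scorer, boot) - model_auc(scorer, rows))
    return sum(diffs) / len(diffs)
```

The optimism-corrected AUC is then the apparent AUC minus the value returned by `optimism`; a correction of about 0.03 would be consistent with the fall from 0.77 to 0.74 reported in the Results.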

Rates of missing data were calculated for all potential predictor variables and an analysis using multiple imputation with chained equations to impute missing values in all candidate predictor variables in 50 imputed datasets was performed. All analyses were performed in Stata 13.1 [30].

Results

Of 1656 patients recruited by 30 September 2015, 1050 were included in the current analysis (Fig. 1): 707 (67%) female, median age 59 (IQR 49–68) years and median symptom duration 9 (IQR 4–28) months (Table 1). Of the patients, 66% (584/889) were RF positive and 82% (787/962) satisfied the 1987 ACR criteria for RA at baseline; 77% of patients were starting MTX as their first csDMARD; 18% were currently on another csDMARD; and 4% had prior but not current exposure to one or more csDMARDs (Table 1). In addition, 41% were taking oral corticosteroids and/or had recently (within the last week) received an intramuscular corticosteroid injection, and 3% of patients had received an intra-articular steroid injection in the previous week (Table 1). Almost all participants (1003/1005) were starting folic acid at baseline. Starting doses of MTX were recorded at baseline for 1042 (99%) participants and ranged from 2.5 to 25 mg/week, with plans to incrementally increase the dose in 43% (422/984) of cases; 98% (963/978) of participants were starting orally administered MTX.

Table 1 Baseline characteristics of the whole cohort and divided by responder status

Outcome: non-response

At 6 months 449/1050 patients (43%) were classified as non-responders. Table 1 gives baseline characteristics stratified by response status at 6 months. In the univariable analysis, significant predictors of MTX non-response (OR (95% CI)) included higher BMI (1.02 (1.00, 1.05) per kg/m²), current smoking (1.78 (1.28, 2.48) compared to never smoking), longer symptom duration (1.00 (1.00, 1.00) per month), not being RF positive (0.66 (0.50, 0.88) for RF positive compared to not), not satisfying the 1987 ACR criteria (0.59 (0.43, 0.83) for satisfying the 1987 ACR criteria compared to not), lower HAQ score (0.76 (0.64, 0.90) per unit increase in HAQ) and lower DAS28 (0.54 (0.48, 0.60) per unit increase in DAS28) (Table 2). In the multivariable model, not being RF positive (0.62 (0.45, 0.86) for RF positive compared to not), higher HAQ score (1.64 (1.25, 2.15) per unit increase in HAQ), higher tender joint count (1.06 (1.02, 1.10) per additional tender joint), lower DAS28 score (0.29 (0.23, 0.39) per unit increase in DAS28) and higher HADS anxiety score (1.07 (1.03, 1.12) per unit increase in HADS anxiety) were independent predictors of MTX non-response (Table 2). The results using 50 multiple imputation datasets to account for missing values were almost identical (results not shown).

Table 2 Univariable and multivariable analysis of predictors of MTX non-response in all subjects and excluding those in remission at baseline

Sensitivity analysis

The most surprising result was the relationship between a lower DAS28 score and non-response. This may be an inevitable consequence of the method of calculating non-response. In order to be classified as a responder to MTX by achieving a moderate or good EULAR response, it is necessary for a patient’s DAS28 score to fall by at least 0.6. If patients start with a relatively low DAS28 score they have less potential for achieving such a response. Indeed, since the DAS28-CRP(4) formula includes a constant of 0.96, a DAS28 of 1.56 is the minimum that can fall by 0.6 or more and be classified as response. At baseline, 226 (22%) patients had low disease activity (LDA) (DAS28-CRP ≤ 3.2), including 102 (10%) patients in remission (DAS28-CRP ≤ 2.6). Of these, 72% (162/226) and 88% (90/102), respectively, were non-responders, compared to 43% of all patients (Table 1). 40% (88/219) of those with LDA and 38% (38/99) of those in remission were on oral corticosteroids at baseline or had received an intramuscular corticosteroid injection in the past week, similar to the 41% of the whole cohort.
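The arithmetic behind this floor effect can be made explicit with a short sketch. It assumes only the two numbers quoted in the text (the constant of 0.96 and the improvement threshold of 0.6); the function names are hypothetical, and the baseline values used in the loop are the remission, LDA and EULAR 5.1 cut-points rather than patient data.

```python
DAS28_CRP_FLOOR = 0.96        # constant term of the DAS28-CRP(4) formula
EULAR_MIN_IMPROVEMENT = 0.6   # improvement threshold quoted in the text


def headroom(baseline_das28: float) -> float:
    """Largest fall in DAS28-CRP that is arithmetically possible."""
    return baseline_das28 - DAS28_CRP_FLOOR


def share_of_headroom_needed(baseline_das28: float) -> float:
    """Fraction of the available headroom used up by a 0.6-point fall."""
    room = headroom(baseline_das28)
    if room <= 0:
        raise ValueError("baseline must exceed the formula's minimum of 0.96")
    return EULAR_MIN_IMPROVEMENT / room


if __name__ == "__main__":
    for label, baseline in [
        ("remission cut-point", 2.6),
        ("LDA cut-point", 3.2),
        ("EULAR 5.1 cut-point", 5.1),
    ]:
        print(
            f"{label}: headroom {headroom(baseline):.2f}, "
            f"share needed {share_of_headroom_needed(baseline):.0%}"
        )
```

A patient starting at the remission cut-point of 2.6 has only 1.64 points of possible improvement, so a 0.6-point fall uses more than a third of it, whereas a patient starting at 5.1 needs well under a fifth.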

To further explore the role of baseline DAS28, we conducted two sensitivity analyses, first excluding patients in remission and second excluding patients with LDA but otherwise following the modelling procedure described above. The multivariable models excluding those in remission and those with LDA contained the same predictors as the main model except for the addition of BMI and the removal of HAQ score and TJC28 (Table 2 and Additional file 1: Table S1). Hence the predictors common to all three multivariable models were RF status, DAS28-CRP and HADS anxiety score.

Additionally, a model was developed to predict failure to achieve LDA (i.e. DAS28-CRP > 3.2) at 6 months (not requiring a minimum improvement in DAS28). Higher BMI, higher HAQ score, higher TJC28 and higher HADS anxiety score, but not lower baseline DAS28, were predictive of failing to achieve LDA at 6 months (Additional file 1: Table S2).

Model assessment

The model predicting non-response in all patients had an AUC of 0.77 (95% CI (0.73, 0.80)) (Table 2), reduced to 0.74 when correcting for optimism. Excluding those in remission or with LDA reduced the AUC to 0.72 (0.68, 0.76) (Table 2) and 0.71 (0.66, 0.75) (Additional file 1: Table S1), respectively. Calibration plots in Fig. 2 and Additional file 1: Figures S1 and S2 indicate that the models have similar properties across the deciles of predicted probabilities. For the outcome of “failure to achieve LDA at 6 months”, the AUC was 0.73 (0.70, 0.77) (Additional file 1: Table S2) and the calibration plot is shown in Additional file 1: Figure S3.

Fig. 2 Calibration plot for multivariable prediction model for non-response to methotrexate

Table 3 shows the sensitivity, specificity, PPV and NPV when predicted probability cut-offs ranging from 0.5 to 0.9 are used as thresholds for classifying individuals as being at high risk of non-response. Using a cut-off of 0.8 to indicate high risk, 49 patients (6%) would be predicted to be at high risk of non-response and 98% of these would indeed fail to respond (PPV); 61% of those predicted to respond would do so (NPV). On average the model would need to be applied to 17 patients to identify one individual at high risk of non-response (“number needed to test”).
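The quantities reported for each cut-off can be computed as in the following sketch. The function name, the dictionary keys and the toy inputs are hypothetical; a patient is treated as high risk when the predicted probability is at or above the cut-off, and the "number needed to test" is taken as the number of patients assessed per patient flagged as high risk.

```python
from typing import Dict, Sequence


def threshold_metrics(
    probs: Sequence[float], outcomes: Sequence[int], cutoff: float
) -> Dict[str, float]:
    """Classification metrics when probs >= cutoff is labelled high risk.

    outcomes: 1 = non-responder, 0 = responder.
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probs, outcomes):
        high_risk = p >= cutoff
        if high_risk and y == 1:
            tp += 1
        elif high_risk and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    total = tp + fp + tn + fn
    flagged = tp + fp
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, flagged),
        "npv": ratio(tn, tn + fn),
        "flagged_fraction": ratio(flagged, total),
        "number_needed_to_test": ratio(total, flagged),
    }


if __name__ == "__main__":
    # Toy data for illustration only, not study data.
    toy_probs = [0.95, 0.85, 0.6, 0.4, 0.3, 0.1]
    toy_outcomes = [1, 1, 0, 1, 0, 0]
    print(threshold_metrics(toy_probs, toy_outcomes, cutoff=0.8))
```

Applied to the figures in the text, flagging 6% of patients corresponds to a number needed to test of roughly 1/0.06 ≈ 17.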

Table 3 Sensitivity (Sen), specificity (Spec), positive predictive value (PPV) and negative predictive value (NPV) for a range of probability cut-offs for classifying those at high risk of non-response (main model)

Discussion

In this large observational study investigating response to MTX among patients with RA in the current era, 43% of patients were classified as non-responders by 6 months after starting treatment, with those discontinuing MTX due to adverse events excluded from the analysis. Baseline predictors of non-response in a multivariable logistic regression model were RF negativity, higher HAQ score, higher tender joint count, higher HADS anxiety score and lower disease activity. The AUC was 0.77 (0.74 optimism-corrected). The AUC was lower in models that excluded either all those in remission or all those with LDA at baseline when attempting to address the fact that it was harder for those with lower baseline DAS28 scores to achieve the definition of response. All models included RF negativity, lower baseline DAS28-CRP and higher HADS anxiety score as predictors of non-response. This is the first study to explore potential psychological factors in prediction of individual MTX non-response and all models retained HADS anxiety score as an independent predictor. Hence this psychological predictor added significant additional predictive information once the clinical predictors in the multivariable model had been accounted for. While the design of the current study does not allow us to examine the mechanism by which anxiety is associated with non-response, and this relationship requires further research, anxiety could be considered a modifiable risk factor, suggesting that the shared decision-making between patient and rheumatologist [2] should be mindful of anxiety issues, and that patient education prior to starting MTX should address anxiety. Although the literature on the association between patient anxiety and response to treatment in RA is limited, a recent study [31] did report that depression and anxiety (without differentiating between the two) may reduce likelihood of remission in those treated with MTX or tumour necrosis factor inhibitors.

While previous attempts have been made to develop models to predict response to MTX [15,16,17,18,19], this is the first model to be developed using a large cohort of patients with RA starting MTX for the first time and recruited from routine clinical care with few exclusions, i.e. representative of the setting in which such a model might be applied. That is, our model is designed to be applied in the real-world population of those about to commence MTX, which includes individuals with disease activity lower than might be expected. Final sets of predictor variables vary between published models, with only some measure of patient condition at baseline, be it TJC [15], DAS [16, 17] or HAQ [19], common to most studies. While lower baseline DAS28 was found to be associated with non-response to MTX when defined primarily using the EULAR response criteria, higher baseline DAS28 was associated with failure to achieve LDA (although significant only univariably, with higher TJC being retained in the multivariable model), which matches findings elsewhere [16]. This is a reminder that models are outcome-specific and that the clinical relevance of outcomes should be considered.

If a predicted probability of 0.9 or above from the model was used to identify patients at high risk of non-response, 100% of those meeting this criterion would go on to be non-responders. However, only 4% of non-responders would be identified in this way. Reducing the cut-off to 0.8 would identify 14% of non-responders, but at the expense of 2% of those labelled as high risk actually being patients who would respond to MTX. The trade-off between the delay in accessing alternative medications for those who will not respond to MTX but are predicted to do so, and the over-treatment with alternative medications of those who would respond to MTX but are predicted not to, is unlikely to be an equally weighted one. Deciding where to draw an appropriate threshold for a label of high risk for clinical practice requires consideration of the treatment options available in a particular setting, their benefits and risks for individual patients, and their health economic implications. Of course, even with perfect prediction of non-response to MTX therapy, there is no guarantee that a better response would be achieved with alternative treatments. Truly informed decision-making by clinicians and their patients would require personalised predictions of patient outcomes for a range of treatment options, a scenario which is still some way off.

The strengths of this analysis include the large sample size and the fact that it reflects use of MTX according to current guidelines and so supersedes earlier work [12,13,14,15,16,17,18,19]. The definition of non-response at 6 months embraced those who remained on the drug but had not exhibited enough improvement to be classified as moderate or good EULAR responders and also those who had discontinued the drug due to inefficacy or started a bDMARD. Other strengths were the inclusion of potential psychological predictors and using multiple imputation to provide reassurance of the robustness of the results to missing predictor data.

This study also has some limitations. The non-response rate was high. This may, in part, be due to suboptimal dosing or route of administration. Although the study did not dictate the treatment protocol, all patients should have been managed according to national guidelines [3] in which escalation of MTX is permitted and combination therapy is encouraged. We do not know the reasons for any deviations from treatment guidelines, but this prediction model may help guide clinicians as to which patients are less likely to respond as they are currently practising and encourage them to consider treating more intensively. As we have attempted to predict non-response using only information available before the commencement of MTX, we have not considered the relationship between time-varying MTX dose and non-response. Titration is highly influenced by starting dose and patient response to treatment. The goal of the current work is to try to predict response prior to the start of MTX and, although maximum MTX dose and rate of titration may also be associated with response, that information would not be available pre-treatment. Further research is required to specifically investigate how rate and characteristics of MTX titration may also influence response, taking early response and adverse events into account. It seems likely that the observed association between lower DAS28 at baseline and subsequent non-response is explained by less scope for improvement (the key component of response) for those with lower disease activity. Since patients were recruited from routine clinical care, a high proportion of patients (41%) received oral or intramuscular steroids between the decision to prescribe MTX and the baseline assessment. We therefore performed a sensitivity analysis, stratifying by steroid use, and the results were very similar (results not shown).
The aim of this study was to use demographic, clinical and psychosocial variables that are readily available or easily measurable to predict non-response. In future we aim to add genetic and metabolic predictors with the hope of improving the accuracy and clinical applicability of the model. Finally, these models were only validated internally. RAMS continues to recruit new patients so there will be an opportunity for further internal validation (“temporal validation”) in the future. However, external validation in an independent dataset that includes information on the relevant predictor variables would be needed before the models could be considered for clinical use.

It might be reasonable to consider bypassing MTX therapy altogether in patients predicted to be very unlikely to respond. This model assigns ≥ 90% probability of non-response to only a tiny proportion (2%) of patients. Would it be reasonable to accept a lower probability of non-response as a guide to MTX prescription, or as a guide to starting combination therapy? This depends to some extent on the alternative forms of treatment, their efficacy in individuals predicted not to respond to MTX, and their cost. In the current situation it seems reasonable to continue to prescribe MTX for most patients in whom it is not contraindicated but with a low threshold to move on to stronger combination or biological therapy if there is non-response at 6 months.

Conclusions

We have developed a model to predict non-response to MTX using data from a large contemporary observational study of patients with RA or UP commencing MTX for the first time. This is the first such model to consider patient-specific, disease-specific and psychosocial predictors. Using a high predicted probability to classify patients as at high risk of non-response would identify a small proportion of such individuals with perfect specificity. Patient anxiety was a multivariable predictor of non-response to MTX, a relationship that requires further research and which could be addressed prior to treatment commencement.