Introduction

Trials in rheumatoid arthritis (RA) traditionally classify patients into responders and non-responders. This approach reflects European League Against Rheumatism (EULAR) and American College of Rheumatology (ACR) response criteria [1]. It identifies effective new treatments and compares efficacies of existing treatments. However, wider categorisation of responses may be preferable for individualising care using treat-to-target [2] approaches.

Latent class models have long been used to identify sub-groups of patients in long-term observational studies of RA. They have evaluated changes in disability [3, 4], psychological distress [5] and disease activity [6]; between three and six distinct trajectories were reported in these studies. More recently, Bykerk et al. extended the use of disease trajectories to identify different patient groups in a clinical trial [7]. Their post hoc analysis of patients receiving tofacitinib over 2 years in a phase III trial identified five different trajectories in the disease activity score for 28 joints using the erythrocyte sedimentation rate (DAS28-ESR). When latent class models were applied to patients with early RA managed by treat-to-target approaches enrolled in the DREAM and BARFOT registries [3, 8], three distinct responder groups were identified from changes in DAS28-ESR scores. The best responders achieved remissions whilst the worst responders had persistently active disease. There was a similar pattern of three responder groups in treated early RA patients assessed with the Simplified Disease Activity Index [9].

Experience in these observational studies and the analysis of existing clinical trial data suggest that when patients with RA are managed using treat-to-target approaches, identifying groups with different outcome trajectories gives useful information about the benefits of active treatment. We examined this concept in a post hoc analysis of patients randomised to receive intensive treatment in the TITRATE trial [10], which individualised treatments with the goal of achieving remission at 12 months. Our analysis addressed three issues: first, the practicality of using latent class modelling in trial patients followed over 12 months and the number of distinct trajectories identified; second, baseline factors influencing membership of the different trajectories; and third, the effects of different components of intensive management on the membership of different trajectories.

Methods

Patients studied

The TITRATE trial [10] enrolled 335 patients with RA from 39 UK centres; 168 were randomised to receive intensive management. The aim of the original trial was to test the hypothesis that intensive management resulted in more remissions at 12 months than standard care in patients with moderate RA. The trial confirmed this hypothesis [10].

TITRATE enrolled patients aged ≥ 18 years who met the 1987 American College of Rheumatology (ACR) or 2010 European League Against Rheumatism (EULAR)/ACR classification criteria for RA [11, 12]. They had received at least 6 months of conventional disease-modifying anti-rheumatic drugs (DMARDs), were currently receiving at least one DMARD, and had moderate/intermediate disease activity (DAS28-ESR 3.2–5.1). Patients were excluded if they had co-morbidities making intensive treatment inadvisable, had failed five or more conventional DMARDs, had taken biologics, or had extensive joint damage. Of the 168 patients randomised to intensive management, 5 withdrew after screening and randomisation assessments; we therefore analysed the 163 with at least one follow-up visit.

Patients enrolled in the comparator standard care arm are not included in the analysis for several reasons. First, they only had DAS28-ESR scores measured every six months, and latent class trajectories constructed from six-monthly data are not comparable to trajectories based on monthly changes. Second, the main trial paper compared the two treatment groups at the trial endpoint, showing significant differences between groups; there are cogent reasons not to continually reanalyse trial data using different approaches to compare treatments. Third, our research question is whether the simple approach of classifying treatment effects into responders and non-responders is ideal, or whether using latent class trajectory modelling identifies more groups in patients receiving active treatment. Investigating this aim only requires evaluating the actively treated patients.

Intensive treatment

Intensive management was delivered by rheumatology nurses or comparable healthcare professionals who had all been specifically trained to deliver the management regimen. Decisions about treatments were made by the whole clinical team looking after the patients. A wide range of considerations were involved in management decisions and the DAS28-ESR was used as part of this process. Treatment with conventional DMARDs and biologics was optimised following a treatment algorithm, which also included intra-muscular (IM) steroid injections. The nurses also provided supportive management for pain and fatigue. HAQ, pain and fatigue were assessed 6 monthly; they changed significantly between baseline and 12 months. PHQ-9 was only measured at baseline. These assessments were not used for decisions about intensive management. Full details are provided in the trial report [10].

Outcome measures

DAS28-ESR and its components were measured monthly. This index is calculated from four standard clinical assessments: tender joint counts for 28 joints, swollen joint counts for 28 joints, the patient global assessment on a 100 mm visual analogue scale and the ESR. Further details of the DAS28-ESR score and its variants are outlined by Van Riel and Renskers [13]. C-reactive protein (CRP), assessor global assessments, pain and fatigue (on 100 mm visual analogue scales) and function measured by the Health Assessment Questionnaire (HAQ) were assessed 6 monthly. Demographic details, smoking habits, body mass index (BMI) and mood (Patient Health Questionnaire-9 (PHQ-9)) were assessed at baseline.
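The calculation from the four components can be sketched as follows. This is a minimal illustration using the standard published DAS28-ESR formula and the conventional activity cut-offs; the function names and the example patient are hypothetical and are not taken from the trial.

```python
import math


def das28_esr(tender28, swollen28, esr_mm_per_hr, patient_global_mm):
    """Standard published DAS28-ESR formula from its four components."""
    if not (0 <= tender28 <= 28 and 0 <= swollen28 <= 28):
        raise ValueError("joint counts must lie between 0 and 28")
    if esr_mm_per_hr <= 0:
        raise ValueError("ESR must be positive to take its logarithm")
    return (0.56 * math.sqrt(tender28)
            + 0.28 * math.sqrt(swollen28)
            + 0.70 * math.log(esr_mm_per_hr)
            + 0.014 * patient_global_mm)


def activity_category(score):
    """Conventional cut-offs; handling of exact boundary values is an assumption."""
    if score < 2.6:
        return "remission"
    if score <= 3.2:
        return "low"
    if score <= 5.1:
        return "moderate"
    return "high"


# Hypothetical patient (not trial data): 4 tender joints, 3 swollen joints,
# ESR 20 mm/h, patient global assessment 40 mm.
example_score = das28_esr(4, 3, 20, 40)
example_category = activity_category(example_score)
```

For this hypothetical patient the score falls in the moderate band (3.2–5.1), the range required for entry to TITRATE.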

One assessment approach that was not used in the TITRATE trial was the American College of Rheumatology responder criteria. The main reason for not evaluating these is that they are not employed in routine clinical settings in England. In addition, the trial assessed patients with moderate disease activity and the value of these criteria in such patients is uncertain.

Statistical methods

Baseline information was summarised using means with accompanying standard deviations (SD) for continuous variables, while binary or categorical variables were summarised using frequencies and percentages.

Group-based trajectory models (GBTMs) were used to identify clusters of disease activity trajectories over 12 months following the commencement of treatment [14]. Latent class analysis (LCA) is a statistical measurement model in which individuals are classified into mutually exclusive and exhaustive groups, or latent classes, based on their pattern of responses on a variable or a set of variables. In our model, longitudinal DAS28-ESR data were used to generate the groups; this differs from simply dividing patients into groups based on their 12-month assessment or the assessment at any other specific time point. The primary outcome measure used to derive the trajectories was DAS28-ESR and no other covariates were included. GBTMs are likelihood-based methods, which are valid using only observed data under a missing-at-random assumption. The model took account of all the repeated measurements rather than a single time point, whether one chosen during follow-up or the final time point at the end of the study.
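For orientation, the general form of a group-based trajectory model can be written as below. This is a generic sketch rather than the specification fitted in this analysis: the quadratic polynomial in time, the density f with common scale σ, and all symbols (K classes, mixing proportions π_k, T_i observed visits for patient i) are assumptions introduced for illustration.

```latex
% Generic GBTM form (assumed for illustration; polynomial order and
% error distribution are not stated in the text).
\begin{align}
\mu_{k}(t) &= \beta_{0k} + \beta_{1k}\, t + \beta_{2k}\, t^{2}, \\
P(Y_i) &= \sum_{k=1}^{K} \pi_k \prod_{t=1}^{T_i}
          f\!\left(y_{it};\, \mu_{k}(t), \sigma\right), \\
P(k \mid Y_i) &= \frac{\pi_k \prod_{t=1}^{T_i} f\!\left(y_{it};\, \mu_{k}(t), \sigma\right)}
                      {\sum_{j=1}^{K} \pi_j \prod_{t=1}^{T_i} f\!\left(y_{it};\, \mu_{j}(t), \sigma\right)}.
\end{align}
```

The product runs only over the visits actually observed for each patient, which is why the likelihood remains valid under a missing-at-random assumption. Each patient is assigned to the class with the highest posterior probability P(k | Y_i).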

The best number of latent classes (3 classes vs. 4 classes) was chosen using three criteria: the Bayesian Information Criterion (BIC); an average posterior probability of class membership exceeding 0.7; and entropy, which indexes classification accuracy, with values closer to 1 indicating greater precision [15]. In addition, judgement that the classes were clinically meaningful and represented distinct features was taken into consideration, supporting the formal statistical tests [16].
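These criteria can be illustrated with a short sketch. It assumes a matrix of posterior class-membership probabilities (one row per patient, one column per class) and uses the common relative-entropy definition and the "smaller is better" form of BIC; some trajectory software reports BIC on a different scale, so the sign convention here is an assumption. The function names and the example probabilities are hypothetical, not trial output.

```python
import math


def average_posterior_probability(posteriors):
    """Mean posterior probability of the assigned class, reported per class."""
    n_classes = len(posteriors[0])
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for row in posteriors:
        assigned = max(range(n_classes), key=lambda j: row[j])
        sums[assigned] += row[assigned]
        counts[assigned] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def relative_entropy(posteriors):
    """1 minus normalised classification uncertainty; 1 means perfect separation."""
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * math.log(k))


def bic(log_likelihood, n_parameters, n_observations):
    """Schwarz form of BIC, where lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)


# Hypothetical posterior probabilities for four patients and three classes.
example_posteriors = [
    [0.90, 0.08, 0.02],
    [0.10, 0.85, 0.05],
    [0.05, 0.15, 0.80],
    [0.20, 0.75, 0.05],
]
example_app = average_posterior_probability(example_posteriors)
example_entropy = relative_entropy(example_posteriors)
```

In this toy example every class has an average posterior probability above the 0.7 threshold, which is the condition described in the text.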

Associations of trajectory classes with outcomes, baseline covariates, RA medication prescribing, alcohol and smoking status were assessed using either analysis of variance or Fisher’s exact test, as appropriate for the variable being compared. We used Spearman’s correlation to test the association between the variables before fitting the multivariable model. We used the change in each outcome of interest at 12 months, rather than the actual score at 12 months, to adjust for baseline values. As expected, change in DAS28-ESR was associated with all three endpoints: the strongest correlation was with change in pain (correlation coefficient (r) = 0.49), followed by change in HAQ (r = 0.33) and change in fatigue (r = 0.28). Since DAS28-ESR was used to derive the trajectory grouping, it was not included in the multivariable model. This model included demographic variables (age, gender, ethnicity, smoking history and disease duration), baseline body mass index (BMI), baseline PHQ-9 and change in HAQ, pain and fatigue. Forward stepwise multinomial logistic regression with trajectory groups as the dependent variable was used; the p-value threshold was set to 5%, which meant that variables with a significance level p ≥ 0.05 were removed from the model. A multinomial logit model was chosen because the dependent variable is categorical with more than two levels. In addition, a sensitivity analysis was done in which the p-value threshold was set to 15%; the results were the same.
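To show how a multinomial logit model with a reference category turns covariates into class probabilities, a minimal sketch follows. Good responders are taken as the reference group, as in the comparisons reported in the Results. All coefficients and the example patient are invented for illustration and are not the fitted values from Table 3; the function names are hypothetical.

```python
import math


def linear_predictor(coefficients, covariates):
    """Intercept plus the sum of coefficient times covariate value."""
    return coefficients["intercept"] + sum(
        coefficients[name] * value for name, value in covariates.items()
    )


def class_probabilities(etas, reference="good"):
    """Multinomial logit: the reference class has its linear predictor fixed at 0."""
    exp_terms = {name: math.exp(eta) for name, eta in etas.items()}
    denominator = 1.0 + sum(exp_terms.values())
    probabilities = {reference: 1.0 / denominator}
    for name, value in exp_terms.items():
        probabilities[name] = value / denominator
    return probabilities


# Invented coefficients (NOT estimates from the trial).
hypothetical_coefficients = {
    "moderate": {"intercept": -0.5, "bmi": 0.03, "phq9": 0.02},
    "poor": {"intercept": -3.0, "bmi": 0.08, "phq9": 0.10},
}
hypothetical_patient = {"bmi": 32.0, "phq9": 10.0}

example_etas = {
    group: linear_predictor(coefs, hypothetical_patient)
    for group, coefs in hypothetical_coefficients.items()
}
example_probabilities = class_probabilities(example_etas)

# exp(coefficient) is the relative risk ratio versus the reference class
# for a one-unit increase in that covariate.
example_rrr_bmi_poor = math.exp(hypothetical_coefficients["poor"]["bmi"])
```

The three probabilities always sum to one, and each exponentiated coefficient is read as a relative risk ratio against the reference group.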

All analyses were carried out using STATA (StataCorp. 2017. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC). P-values < 0.05 were considered statistically significant.

Results

Trajectory groups

The 163 patients receiving intensive management were categorised into three groups based on changes in monthly DAS28-ESR scores over 12 months analysed using GBTMs (Fig. 1). These groups were termed good responders (n = 40), moderate responders (n = 76) and poor responders (n = 47). Figure 1 displays the course of DAS28-ESR scores over 12 months of follow-up for the three groups.

Fig. 1

Mean Disease Activity Profiles Over 12 Months In The Three Group Based Trajectory Models (Good, Moderate And Poor Responders)

Means and 95% confidence intervals are shown.

Changes in the individual components of DAS28-ESR scores showed a broadly similar pattern to changes in the composite score (Fig. 2). The largest falls were in the good responders, with the exception of the ESR, which was consistently lower in good responders but changed relatively little over time.

Fig. 2

Mean 12-Month Profiles Stratified In The Three Group Based Trajectory Models (Good, Moderate And Poor Responders)

Good responders – red (26%); moderate responders – blue (45%); poor responders – green (29%). Black circles are mean scores.

Baseline data

Table 1 shows that overall characteristics, including ethnicity, age and disease duration, were similar between the groups; there was a small, non-significant excess of males in good responders.

Table 1 Baseline Data For The Three Responder Groups

Good responders had a significantly lower mean DAS28-ESR with relatively few scores above 4.5. The only individual component of the DAS28-ESR which was significantly different between groups was the ESR; it was lower in good responders. Other assessments of disease activity, such as C-reactive protein and assessors’ global rating, did not differ between groups.

There was a significant difference in BMI; it was highest in the poor responders. Obesity (BMI > 30) was present in 4/40 (10%) good responders, 30/76 (38%) moderate responders and 20/47 (43%) poor responders (P = 0.002 on Chi-Square testing). Three other measures were significantly different between groups. HAQ scores, fatigue and PHQ-9 scores were all lower in good responders. Depression (PHQ-9 ≥ 15) occurred in 3/39 (8%) good responders, 11/76 (14%) moderate responders and 18/47 (38%) poor responders (P < 0.001 on Chi-Square testing).
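The Chi-Square comparisons above can be checked from the reported counts. The sketch below assumes a plain Pearson Chi-Square test on a 2 × 3 table without continuity correction, which may differ in detail from the procedure actually used; the function names are hypothetical. With two degrees of freedom the upper-tail probability has the closed form exp(−χ²/2), so no statistics library is needed.

```python
import math


def chi_square_2xk(events, totals):
    """Pearson Chi-Square statistic for events versus non-events across k groups."""
    non_events = [total - event for event, total in zip(events, totals)]
    n = sum(totals)
    event_total = sum(events)
    statistic = 0.0
    for observed_row, row_total in ((events, event_total),
                                    (non_events, n - event_total)):
        for observed, group_total in zip(observed_row, totals):
            expected = row_total * group_total / n
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(totals) - 1


def p_value_two_df(statistic):
    """Exact upper-tail probability of a Chi-Square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


# Counts as reported in the text: good, moderate and poor responders.
obesity_stat, obesity_df = chi_square_2xk([4, 30, 20], [40, 76, 47])
depression_stat, depression_df = chi_square_2xk([3, 11, 18], [39, 76, 47])

obesity_p = p_value_two_df(obesity_stat)
depression_p = p_value_two_df(depression_stat)
```

Under these assumptions the obesity comparison gives a statistic of roughly 12.9 and a P value that rounds to 0.002, and the depression comparison gives a P value below 0.001, both consistent with the values reported.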

Drug treatments

There was no evidence that baseline drug treatments were different between groups; all patients were taking at least one initial conventional DMARD.

In total, 137/163 (84%) had one additional conventional DMARD, 63/163 (39%) had two additional conventional DMARDs and 4/163 (2%) had three additional conventional DMARDs (Table 2), and there were no significant differences among the groups. Similarly, there were no differences between groups in the numbers of patients increasing or decreasing doses of conventional DMARDs. In total, 69/163 (42%) increased their dose and 15/163 (9%) reduced their dose. Finally, there was also no difference in oral steroid use between groups; 15/163 (9%) received oral steroids. There was a trend for poor responders to receive more steroid injections.

Table 2 Treatments And Outcomes For The Three Responder Groups

There was a significant difference in biologic use. A first biologic was received by 2/40 (5%) of good responders, 23/76 (30%) moderate responders and 24/47 (51%) poor responders (P < 0.001). Only a few patients received second or third biologics with no significant differences between groups.

Management approaches

Table 2 shows there were no differences between groups in patients’ attendance for monitoring visits; 139/163 (86%) patients attended eight or more visits. However, there were differences in adherence to the treatment algorithm: it was followed in 330/373 (88%) visits in good responders, 519/681 (76%) moderate responders and 273/415 (66%) poor responders (P < 0.001). There were no differences between groups in the reasons for these decisions. Overall, 48% of decisions not to follow the algorithm involved patient choice, 5% involved adverse events, 4% involved inter-current illness and 43% involved clinical discretion.

There was no difference between groups in fatigue management approaches; 52–55% of visits involved advice on fatigue. However, there were significant differences in advice on pain: 249/387 (62%) visits in good responders involved advice on pain management compared with 501/708 (71%) in moderate responders and 314/438 (72%) in poor responders (P = 0.004).

End-point outcomes

Table 2 shows that mean DAS28-ESR changes and the numbers of patients in DAS28-ESR remission differed between groups with good responders achieving greater changes and more remissions than other groups. These differences reflect the construction of the groups.

There was a trend for mean HAQ changes to be greatest in the good responders, but this difference was not significant. However, significantly more good responders achieved HAQ scores < 1.0 than in other groups (66% vs. 41% and 26%).

Changes in pain and fatigue and the numbers of patients achieving end-point pain and fatigue scores < 20 were significantly different between groups. In each case the end-point outcomes were best in the good responders and worst in the poor responders.

Multivariate modelling

The multinomial logistic model showed only some predictors of response groups were independent of each other. A model incorporating demographic variables (age, gender, ethnicity, smoking history and disease duration), baseline body mass index (BMI), baseline PHQ-9 and change in HAQ, pain, and fatigue showed only three factors acted independently (Table 3). In moderate responders, only changes in pain at 12-months were significant at the 5% level when compared to good responders. In poor responders there were significant associations with changes in pain at 12 months, and also baseline BMI and PHQ-9.

Table 3 Multinomial Logistic Model For Response Groups

Discussion

Our analysis shows that it is practical and relevant to use latent class modelling to classify patients receiving intensive management for moderately active RA into different groups. When this approach is taken patients do not form two clearly defined groups of responders and non-responders; instead GBTMs identified three different developmental trajectories over the 12 months of the trial. These groups are likely to provide approximations for a more complex reality of clusters of patients following similar trajectories over time. It is probable that there are clinically important subpopulations of patients with RA that characterise longitudinal changes in disease activity. About one quarter of patients responded very well; another quarter did not respond at all; and about half showed moderate responses. Most good responders had endpoint remissions with low disability, pain and fatigue scores. Few poor responders achieved any favourable outcomes. There were important differences between groups both in their baseline predictors and in their relationship to different components of intensive management.

The good responders had the lowest baseline mean DAS28-ESR, ESR and BMI scores and only 10% were obese and 8% had depression. Only a few (5%) of the good responders required biologic treatments, suggesting patient-related factors are most important in determining response to treatment. Additional biologics were not needed in most good responders and the use of biologics was, consequently, not related to achieving remission. These findings reflect trial evidence about the value of intensive combination DMARDs provided by trials such as RACAT and TACIT [17, 18]. Although good responders followed the treatment algorithm most closely, this probably reflects the relative ease of doing so when patients responded to treatment. The decline in DAS28-ESR in these patients was apparent by three months. The TITRATE trial used DAS28-ESR to assess response because it was undertaken in English centres which traditionally use this assessment. It is possible that using C-reactive protein (CRP) in place of the ESR and constructing the DAS28-CRP may have given somewhat different findings. However, baseline and endpoint comparisons of DAS28-ESR and DAS28-CRP in the TITRATE trial did not show any clinically relevant differences between these measures [10].

The poor responders, who failed to show any reduction in DAS28-ESR and rarely achieved remission, reflect the concept of difficult-to-treat RA proposed by de Hair et al. [19]. However, as they had yet to fail two biologics, they could not meet the most recent EULAR criteria for this classification [20]. A recent systematic review by Roodenrijs et al. [21] concluded that the heterogeneity between individual patients with difficult-to-treat RA suggests a range of different pathogenic mechanisms are involved. Obesity was one factor, and it has been identified in a number of previous studies [22, 23]. High initial HAQ scores have also been associated with poor responses and more flares [24, 25]: Goetz et al. [24] reviewed 30 studies and found baseline HAQ scores were consistently associated with treatment responses; Bechman et al. [25] evaluated a single study and found the association between high HAQ scores and subsequent flares persisted when baseline DAS scores were taken into account in an adjusted analysis. Another factor which we found predicted poor responses was baseline depression; this has been identified previously in trials and observational studies [26, 27, 28]. Currently depression is rarely measured in either routine care or research settings.

Many trials compare treatments using a binary endpoint outcome, such as whether patients achieved remission. Using trajectories provides the opportunity to undertake a more nuanced assessment. Increasing the numbers of both good and moderate responders identified using GBTMs may be as clinically relevant as simply increasing the numbers of patients achieving remission. Consequently, using GBTMs to assess response may identify differences between treatments which show superficially similar efficacy using binary outcome assessments.

Our study has a number of strengths. First, it involved patients from multiple English centres, and its findings are therefore likely to be generalisable. Secondly, it adopted a novel model-based approach which allowed formal identification of homogeneous subgroups and is more flexible than the traditional binary division of response/no-response. Thirdly, the large number of repeated measures for each participant, and the use of an approach that met all the statistical criteria proposed for the assessment of good classification, show the analysis was robust. Fourthly, the technical evidence provided by the statistics used, such as BIC, entropy and the posterior group membership probabilities, supported the classification achieved. Finally, the characteristics of the three groups indicate differences in baseline risks that are likely to have predicted the response to treatment at follow-up.

The study also has a number of limitations. Firstly, it had a relatively small sample size. However, there was substantial follow-up data at 12 months and assessments of DAS28-ESR were made monthly with minimal missing data; the extensive repeated-measures data partially compensate for the relatively small number of participants [14]. Secondly, although the TITRATE trial had a control group, its patients were not assessed monthly and therefore GBTMs could not be used to assess their responses. If trajectories are to be used to test hypotheses in clinical trials, both active and control groups will need to be followed with similar regular assessments, such as monthly DAS28-ESR scores. Finally, our results will need replicating in additional, potentially larger, studies to validate using GBTMs to assess trial outcomes in RA.

Conclusions

We conclude that GBTMs identified three trajectories of disease progression in patients with moderate RA treated intensively. Several baseline variables, such as depression and obesity, influenced membership of different trajectories. These findings raise the possibility that patients with RA with co-morbid depression and obesity may need additional or different treatment approaches. Such a possibility merits further research in larger groups of patients. It would also be relevant to explore the relationship of other patient-related outcomes to GBTMs.