Background

Anti-tumour necrosis factor alpha (anti-TNF) agents, including infliximab, etanercept, adalimumab, golimumab, and certolizumab pegol, significantly reduce disease activity and improve functional ability among patients with spondyloarthritis (SpA), including ankylosing spondylitis (AS) and non-radiographic axial SpA (nr-axSpA) patients [1,2,3]. However, because of their similarly high cost and potential side-effects, most health systems worldwide restrict access to all anti-TNF agents to SpA patients meeting specific clinical criteria. Van den Berg et al. described 23 different criteria sets from various international settings that designate which SpA patients are eligible to receive anti-TNF therapy [4]. Some of these criteria sets represent clinical recommendations, while others are reimbursement criteria. All of the criteria sets differ in terms of the diagnosis, disease activity level, and history of treatment failure required to begin anti-TNF therapy. Some criteria sets limit anti-TNF agents to patients with AS, a severe form of SpA in which bone damage is visible on X-ray; others approve anti-TNF use among patients with nr-axSpA, the term for SpA prior to the development of this radiographic damage. Many criteria sets that allow anti-TNF use by nr-axSpA patients require them to have sacroiliitis or spinal inflammation visible on MRI and/or elevated acute-phase reactants, such as C-reactive protein (CRP) or erythrocyte sedimentation rate (ESR), while others do not incorporate these additional markers.

The variation in these criteria sets means that patient access to anti-TNF therapy is more difficult in some settings than others. For example, only an estimated 50% or less of all SpA patients have AS [5], meaning far fewer SpA patients will be treated with anti-TNF therapy in settings that require radiographic damage. The prevalence of other clinical criteria commonly cited in anti-TNF access criteria also varies: elevated acute-phase reactants such as CRP or ESR are present in only approximately 40–50% of patients with AS [6], while sacroiliitis visible on MRI appears to be present in less than half of patients with nr-axSpA [7, 8]. Currently, there is a lack of evidence to indicate how many SpA patients possess the unique combinations of clinical characteristics demanded by different sets of anti-TNF criteria across various settings. However, it is clear that by requiring anti-TNF users to meet clinical criteria present in only a portion of SpA patients, fewer individuals will be treated with anti-TNF therapy than if it were available to all. Importantly, the burden of SpA in terms of disease activity and impairment appears to be comparable among AS and nr-axSpA patients [9,10,11], indicating the need to treat both populations.

By limiting the number of patients treated, anti-TNF access criteria may be seen as a means of reducing the total budget impact [12] of anti-TNF agents. However, the cost-effectiveness of limiting anti-TNF therapy to patients meeting any particular set of clinical criteria has not been demonstrated. To date, some attention has been paid to the relative cost-effectiveness of anti-TNF agents in AS patients versus nr-axSpA patients [1], although the results are considered inconclusive. This is due in part to heterogeneity in the probability and magnitude of anti-TNF response observed across the small number of anti-TNF trials in nr-axSpA [13,14,15,16,17], which, notably, have included patients with different clinical characteristics. Although a meta-analysis indicates a slightly lower effect of anti-TNF therapy in the nr-axSpA population compared to AS [1], evidence from certain trials that have included both populations suggests the effect may be the same if patients are similar in terms of CRP levels, human leukocyte antigen (HLA)-B27 positivity, and presence of MRI inflammation [17, 18]. Unique combinations of clinical characteristics, such as those cited in anti-TNF access criteria, have not been studied in terms of their influence on the estimated cost-effectiveness of anti-TNF therapy.

The DESIR cohort is a longitudinal study of early SpA that provides clinical and cost data on a clinically heterogeneous population of both AS and nr-axSpA patients in France. Our objective was to explore how many DESIR patients would possess the unique clinical characteristics required to receive anti-TNF therapy in select settings and to examine costs and health outcomes in each of these subsets of patients. We then aimed to estimate anti-TNF cost-effectiveness over one year within each subset, with the goal of determining whether the current French restrictions on anti-TNF access [19] are the most cost-effective in that setting relative to other potential restrictions.

Methods

Study setting and data source

The current study was an analysis of data from the DESIR cohort, a 10-year prospective study of 708 early SpA patients recruited from 25 centres across France between October 2007 and April 2010 [20]. The DESIR cohort is a clinically heterogeneous SpA population whose characteristics have been extensively described [21,22,23]. At study entry, all patients were aged 18–50 and had symptoms of inflammatory back pain [24, 25] that had lasted >3 months and <3 years and were suggestive of SpA according to a rheumatologist’s assessment. Follow-up visits occurred every 6 months in the first 2 years and every year thereafter. Data from the first 3 years of DESIR follow-up, i.e., the baseline visit (n = 708) plus follow-up visits at months 6 (n = 704), 12 (n = 698), 18 (n = 691), 24 (n = 692), and 36 (n = 631), were available for this analysis.

The DESIR database contains clinical, quality of life, and cost data. The clinical data include many of the parameters commonly cited in access criteria for anti-TNF agents [4], including diagnosis; disease activity according to the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) and Physician Global Assessment (PhGA); X-ray, magnetic resonance imaging (MRI), and computerized tomography (CT) findings; acute-phase reactants (e.g., CRP); and treatment history. The quality of life data collected in DESIR are derived from the Short Form 36 Health Survey (SF-36).

The DESIR cost data were derived from a recent cost-of-illness study, for which detailed costing methods, unit costs, and data sources have been described [26]. In summary, costing was conducted from a limited societal perspective, including all-cause direct medical costs (i.e., health resource use) and indirect costs (i.e., productivity losses), but excluding direct non-medical costs (e.g., transportation, devices, caregiver expenses), and expressed in 2013 Euros. Direct medical costs were grouped into health practitioner visits, hospitalizations including emergency room visits and surgeries, medical workups, and medications. Total direct medical costs were calculated as the reported number of consumed units of each cost component, multiplied by the corresponding unit cost, and summed across all categories and patients. Indirect costs were valued by multiplying the reported number of work days lost by a daily estimated wage per patient in 2013 Euros, which was based on reported professional category and average population wage data [27]. The age and sex distribution of the DESIR cohort was compared to that of the population of French workers from which average population wages were obtained, and wages were not further adjusted for age and sex. Missing cost and clinical data were imputed using Markov chain Monte Carlo multiple imputation, last observation carried forward, probabilistic imputation, or with negative values based on clinical expertise, as appropriate [26].
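The costing arithmetic described above can be illustrated with a minimal sketch. The component names, unit counts, unit costs, and daily wage below are hypothetical placeholders, not DESIR values; only the structure (units multiplied by unit cost, plus work days lost multiplied by a daily wage) follows the text.

```python
from typing import Dict


def direct_cost(units: Dict[str, float], unit_costs: Dict[str, float]) -> float:
    """Sum over cost components of (consumed units x unit cost)."""
    return sum(n * unit_costs[item] for item, n in units.items())


def indirect_cost(work_days_lost: float, daily_wage: float) -> float:
    """Productivity loss: work days lost valued at an estimated daily wage."""
    return work_days_lost * daily_wage


# Hypothetical one-patient example (all figures invented for illustration).
units = {"rheumatologist_visit": 4, "mri": 1}
unit_costs = {"rheumatologist_visit": 28.0, "mri": 200.0}

total = direct_cost(units, unit_costs) + indirect_cost(10, 90.0)
print(total)
```

Per-patient totals computed this way would then be summed or averaged across patients, depending on the quantity being reported.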

Selection of anti-TNF access criteria

Most patient characteristics cited in anti-TNF access criteria [4] are routinely collected in clinical practice for multiple purposes. Using DESIR clinical data, it is possible to assess patient satisfaction of the anti-TNF access criteria in place in France [19] and numerous other settings. For the purpose of the analysis, we sought to select a practical number of sets of access criteria with clinically meaningful differences between them and the French criteria, i.e., sets citing different markers of disease severity whose prevalence would vary within the DESIR cohort. The selection process was undertaken by the research team, which included a rheumatologist (BF), epidemiologist (SH), and biostatistician (DG) with knowledge of the DESIR cohort and database. By consensus, four sets of access criteria were selected: those from Canada [28], Germany [29], Hong Kong [30], and the United Kingdom (UK) [31]. Based on their respective criteria, these sets were anticipated to result in multiple, distinct (though potentially overlapping) subsets of DESIR patients defined as eligible for anti-TNF therapy.

Creation of ‘study population’ datasets

We created five separate datasets containing the DESIR patients who satisfied the diagnosis and disease severity criteria for anti-TNF access in France [19], Canada [28], Germany [29], Hong Kong [30], and UK [31], respectively. These datasets were created to represent five separate ‘study populations’ of patients, each comprised of anti-TNF users and non-users who satisfied the same set of anti-TNF access criteria. As patients could satisfy multiple criteria sets, unique patients could appear in more than one study population dataset. However, as the five study population datasets were separate, only anti-TNF users and non-users who satisfied the same criteria could be compared to each other. This was done to help limit confounding by indication, as patients satisfying the same anti-TNF access criteria have comparable disease severity on a number of specific measures. Satisfaction of the treatment failure criterion, i.e., insufficient response to non-steroidal anti-inflammatory drugs (NSAIDs), was assumed for all patients.

In creating the five study population datasets, specific rules were applied in a basecase analysis and subsequently varied in sensitivity analyses. In all analyses, patients were required to satisfy the relevant criteria set no later than month 24. In the basecase analysis, anti-TNF use (yes/no) was defined based on the patient’s experience in the 1 year following the date of criteria satisfaction, which was taken as the index date for all patients. In the sensitivity analyses, anti-TNF use (yes/no) was defined over the entire study period, with date of criteria satisfaction taken as the index date for anti-TNF non-users and date of anti-TNF initiation taken as the index date for anti-TNF users. In all analyses, outcomes were observed in the 1 year following the index date. Because the start point for the 1 year observation period was defined differently in the basecase and sensitivity analyses, patients who were classed as anti-TNF non-users in the basecase analysis could be classed as anti-TNF users in the sensitivity analyses.

To be included in the basecase analysis, anti-TNF users were required not to have initiated anti-TNF therapy prior to criteria satisfaction (rule 1). Anti-TNF users were further required to have initiated the anti-TNF <6 months after criteria satisfaction (rule 2). Consequently, anti-TNF users who initiated anti-TNF before criteria satisfaction or >6 months after criteria satisfaction were excluded from the basecase analysis. These rules were applied in order to include only anti-TNF users with a similar length of anti-TNF exposure in the basecase analysis, in which outcomes were observed following the date of criteria satisfaction rather than therapy initiation.
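The basecase inclusion rules can be sketched as a small classification function. This is an illustrative reading of the rules above rather than the study's code: the function name, the 183-day approximation of "6 months", the 365-day approximation of "1 year", and the handling of the exact boundary days are all assumptions.

```python
from datetime import date
from typing import Optional

SIX_MONTHS_DAYS = 183  # assumption: "6 months" approximated as 183 days
ONE_YEAR_DAYS = 365    # assumption: "1 year" approximated as 365 days


def basecase_status(criteria_date: date, anti_tnf_start: Optional[date]) -> str:
    """Classify a patient for the basecase analysis.

    The index date is the date of criteria satisfaction. Returns
    "user", "non-user", or "excluded".
    """
    if anti_tnf_start is None:
        return "non-user"
    if anti_tnf_start < criteria_date:
        return "excluded"  # rule 1: therapy initiated before criteria satisfaction
    delay = (anti_tnf_start - criteria_date).days
    if delay < SIX_MONTHS_DAYS:
        return "user"      # rule 2: initiated within 6 months of the index date
    if delay <= ONE_YEAR_DAYS:
        return "excluded"  # initiated 6-12 months after the index date
    # Initiated after the 1-year observation window: no anti-TNF use is
    # observed in the basecase window, so the patient counts as a non-user.
    return "non-user"


print(basecase_status(date(2008, 1, 1), date(2008, 3, 1)))
```

Lifting rule 1, rule 2, or both, as in the sensitivity analyses, would amount to replacing the corresponding "excluded" branches with "user" and moving the index date for users to the date of therapy initiation.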

In the first sensitivity analysis, anti-TNF users were permitted to receive the anti-TNF prior to criteria satisfaction (rule 1 lifted). In the second sensitivity analysis, anti-TNF users were permitted to receive the anti-TNF >6 months after criteria satisfaction (rule 2 lifted). In the third sensitivity analysis, anti-TNF users were permitted to receive the anti-TNF at any time point (rules 1 and 2 lifted). A separate sensitivity analysis was conducted to explore the impact of simulating a 24-week stopping rule for anti-TNF non-responders, defined as patients who did not achieve a 50% relative change or absolute change of 2 on the BASDAI scale [32] one visit post-therapy initiation. In this analysis, anti-TNF costs accumulated by non-responders after 24 weeks of therapy were excluded. Additional sensitivity analyses were conducted to explore the impact of excluding indirect costs in all scenarios.
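As a rough illustration of the simulated stopping rule, the sketch below applies the response definition given above (a 50% relative or 2-unit absolute BASDAI improvement) and truncates non-responder drug costs at 24 weeks. The function names are hypothetical, and the flat weekly cost of €250 is an assumption obtained by dividing an annual cost of €13,000 evenly over 52 weeks.

```python
from typing import Sequence


def is_basdai_responder(baseline: float, follow_up: float) -> bool:
    """Response: BASDAI improves by >= 50% relative or >= 2 units absolute."""
    change = baseline - follow_up
    if change >= 2.0:
        return True
    return baseline > 0 and change / baseline >= 0.5


def anti_tnf_cost(weekly_costs: Sequence[float], responder: bool,
                  stop_week: int = 24) -> float:
    """Drug cost over the observation period; for non-responders, costs
    accumulated after `stop_week` weeks of therapy are excluded."""
    if responder:
        return sum(weekly_costs)
    return sum(weekly_costs[:stop_week])


# Assumption: a flat weekly cost of 250 EUR (13,000 EUR / 52 weeks).
weekly = [250.0] * 52
print(anti_tnf_cost(weekly, is_basdai_responder(6.0, 5.0)))
```

In this hypothetical example, a patient whose BASDAI falls from 6.0 to 5.0 is a non-responder, so only the first 24 weeks of drug cost are counted.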

Descriptive statistics

Sociodemographic and clinical characteristics at baseline and at time of criteria satisfaction among patients in each of the five basecase study population datasets were described in terms of mean (SD) and frequency (%) as appropriate. Descriptive statistics were also produced to describe the number of anti-TNF users in the DESIR cohort who did not satisfy any of the selected criteria sets (and were therefore excluded from analysis) as well as the number of anti-TNF non-responders in the basecase study population datasets and their total time on anti-TNF therapy.

Adjustment of costs and QALYs

To control for differences between anti-TNF users and non-users, we used linear regression models to estimate adjusted total costs (i.e., direct medical plus indirect costs) in the 1 year post-index. Independent variables considered to be potential confounders of the relationship between anti-TNF use and costs were first tested in univariate models of costs, specifically age, sex, education, marital status, disease duration, smoking (yes vs. no/do not know), HLA-B27 status and presence of peripheral arthritis at baseline, and the Bath Ankylosing Spondylitis Functional Index (BASFI), BASDAI, PhGA, CRP, and SF-36 at the patient’s index date. The same variables were then each tested in preliminary multivariate models of costs that included BASFI (the strongest predictor of costs in univariate analyses) and anti-TNF use. Independent variables that changed the coefficient for anti-TNF use by more than 10% in the preliminary multivariate models were included in the final multivariate costs model.
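The change-in-estimate selection step can be sketched as follows. This is a simplified, dependency-free illustration: the ordinary least squares solver, the function names, the toy data, and the choice to express the change relative to the coefficient from the model without the candidate are all assumptions, not the study's implementation.

```python
from typing import List, Sequence


def ols(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def treatment_coef(treat, covariates, y) -> float:
    """Coefficient for anti-TNF use in a model with an intercept and covariates."""
    X = [[1.0, float(t)] + [float(v) for v in c] for t, c in zip(treat, covariates)]
    return ols(X, y)[1]


def changes_estimate(treat, base_cov, candidate, y, threshold=0.10) -> bool:
    """True if adding `candidate` shifts the anti-TNF coefficient by more
    than `threshold` (relative to the model without the candidate)."""
    b0 = treatment_coef(treat, base_cov, y)
    extended = [list(c) + [z] for c, z in zip(base_cov, candidate)]
    b1 = treatment_coef(treat, extended, y)
    return abs(b1 - b0) / abs(b0) > threshold


# Toy data (invented): costs depend on anti-TNF use, BASFI, and CRP.
treat = [0, 0, 0, 0, 1, 1, 1, 1]
basfi = [[1], [2], [3], [4], [1], [2], [3], [4]]
crp = [1, 2, 1, 2, 5, 6, 5, 6]       # strongly associated with treatment
other = [1, 2, 2, 1, 1, 2, 2, 1]     # unrelated to treatment and BASFI
cost = [1000 * t + 500 * b[0] + 300 * c for t, b, c in zip(treat, basfi, crp)]

print(changes_estimate(treat, basfi, crp, cost))
print(changes_estimate(treat, basfi, other, cost))
```

In the toy data, CRP is imbalanced between users and non-users and therefore shifts the anti-TNF coefficient substantially, whereas the unrelated variable leaves it essentially unchanged and would not be retained.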

Total QALYs in the one year post-index were calculated using SF6D utility weights derived from SF-36 health states, following the area under the curve method [33, 34]. Again to control for differences between anti-TNF users and non-users, we used linear regression models to derive adjusted mean QALYs. Independent variables as above were first tested in univariate models then in preliminary multivariate models that included SF-36 at time of criteria satisfaction (the strongest predictor of QALY in univariate analyses) and anti-TNF use. Independent variables that changed the coefficient for anti-TNF use by more than 10% in preliminary multivariate models were included in the final multivariate QALY model.
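The area under the curve calculation can be sketched with the trapezoidal rule. The sketch assumes linear interpolation of utility between visits; the visit times and utility values are hypothetical, not DESIR data.

```python
from typing import Sequence


def qalys_auc(times_years: Sequence[float], utilities: Sequence[float]) -> float:
    """QALYs as the area under the utility curve (trapezoidal rule),
    assuming utility changes linearly between consecutive visits."""
    total = 0.0
    for i in range(1, len(times_years)):
        width = times_years[i] - times_years[i - 1]
        total += width * (utilities[i - 1] + utilities[i]) / 2.0
    return total


# Hypothetical SF-6D utilities at the index date, month 6, and month 12.
print(qalys_auc([0.0, 0.5, 1.0], [0.60, 0.70, 0.70]))
```

With utilities measured at the index date and at 6 and 12 months, the one-year total is the sum of two half-year trapezoids.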

Cost-effectiveness analysis using adjusted costs and QALYs

For each of the five study population datasets, we calculated the incremental cost-effectiveness ratio (ICER) comparing the costs and QALYs of anti-TNF users versus non-users, i.e., the incremental cost per additional QALY gained by treating with an anti-TNF, using the standard formula: ICER = (Cost anti-TNF - Cost no anti-TNF) / (QALYs anti-TNF - QALYs no anti-TNF). To explore the range of uncertainty around mean costs and QALYs, we used non-parametric bootstrapping [35], repeating the same procedures for each study population dataset (i.e., group of patients satisfying a given criteria set). Specifically, 10,000 bootstrap samples were generated (i.e., by sampling with replacement), stratified by anti-TNF users and non-users. For each bootstrapped sample, linear regression models were fitted for costs and QALYs; although the models were fitted separately, the data used were from the same samples, meaning the interdependence of costs and QALYs was accounted for. Adjusted mean costs and QALYs and hence the incremental costs and QALYs were then estimated from the models. The 2.5th and 97.5th percentiles of the bootstrapped distribution were used to estimate 95% confidence intervals (CI) for the incremental costs and QALYs.
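A minimal sketch of the ICER calculation and the stratified bootstrap follows. It is deliberately simplified: it uses unadjusted group means instead of refitting regression models in each bootstrap sample, the percentile helper uses a nearest-rank rule, and all patient data are invented. Costs and QALYs are resampled together as pairs, which preserves their interdependence within each sample.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Patient = Tuple[float, float]  # (one-year cost in EUR, one-year QALYs)


def icer(cost_t: float, cost_c: float, qaly_t: float, qaly_c: float) -> float:
    """Incremental cost per additional QALY gained."""
    return (cost_t - cost_c) / (qaly_t - qaly_c)


def percentile(sorted_vals: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted sequence."""
    idx = round(p / 100.0 * (len(sorted_vals) - 1))
    return sorted_vals[idx]


def bootstrap_ci(users: List[Patient], non_users: List[Patient],
                 n_boot: int = 10_000, seed: int = 0):
    """95% percentile CIs for incremental cost and incremental QALYs,
    resampling users and non-users separately (stratified bootstrap)."""
    rng = random.Random(seed)
    d_cost, d_qaly = [], []
    for _ in range(n_boot):
        u = rng.choices(users, k=len(users))          # sample with replacement
        c = rng.choices(non_users, k=len(non_users))
        d_cost.append(mean(x[0] for x in u) - mean(x[0] for x in c))
        d_qaly.append(mean(x[1] for x in u) - mean(x[1] for x in c))
    d_cost.sort()
    d_qaly.sort()
    return ((percentile(d_cost, 2.5), percentile(d_cost, 97.5)),
            (percentile(d_qaly, 2.5), percentile(d_qaly, 97.5)))


# Invented example data, not DESIR values.
users = [(14000.0, 0.66), (16000.0, 0.72), (15000.0, 0.70), (17000.0, 0.69)]
non_users = [(4000.0, 0.65), (6000.0, 0.70), (5000.0, 0.68), (5500.0, 0.66)]
cost_ci, qaly_ci = bootstrap_ci(users, non_users, n_boot=2000)
print(cost_ci, qaly_ci)
```

A closer analogue of the procedure described above would refit the cost and QALY regression models inside the loop and take the difference in adjusted means for each bootstrap sample.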

Results

Table 1 shows the diagnosis and disease severity criteria for anti-TNF access in France, Canada, Germany, Hong Kong, and the UK, as well as the number of DESIR patients who satisfied the respective sets. The criteria sets from the UK and Hong Kong both required a diagnosis of AS, while those from Canada, France, and Germany were inclusive of nr-axSpA patients. Anti-TNF access criteria from France were satisfied by the largest number of DESIR patients (197/708; 27.8%), followed by Germany (175/708; 24.7%), Canada (169/708; 23.8%), the UK (86/708; 12.1%), and Hong Kong (61/708; 8.6%).

Table 1 Selected criteria sets and satisfaction at baseline among 708 DESIR patients

Table 2 shows the characteristics of anti-TNF users and non-users in each of the basecase study population datasets. The proportion of anti-TNF users was highest among patients who met the Hong Kong criteria (32/61; 52.5%), followed by the UK (40/86; 46.5%), Canada (71/169; 42.0%), France (80/197; 40.6%), and Germany (67/175; 38.3%). Among a total of 225 anti-TNF users in the DESIR cohort, 107 (47.6%) never satisfied the French anti-TNF access criteria, while 94 (41.8%) never satisfied any of the selected criteria sets and were thus excluded from the analysis. The characteristics of excluded anti-TNF users are shown in Additional file 1: Table S1.

Table 2 Characteristics of DESIR patients satisfying selected criteria sets

Table 3 shows the unadjusted and adjusted costs of patients in the five basecase study population datasets. In final multivariate models, costs were adjusted for smoking and HLA-B27 status at baseline, and BASFI, PhGA, and CRP at date of criteria satisfaction; QALYs were adjusted for age, sex, education, smoking, HLA-B27 status and peripheral arthritis at baseline, and SF-36, PhGA, and CRP at date of criteria satisfaction. Table 4 shows the incremental costs and QALYs and ICERs over one year for each of the five study populations in the basecase analysis. The most favourable cost-effectiveness point estimate was derived from the study population satisfying the Hong Kong criteria (ICER €456,850), followed by Germany (€545,808), the UK (€766,217), and Canada (€818,186). The highest ICER was derived from the study population satisfying the French criteria (€1,105,859). However, as shown in Fig. 1, the confidence intervals surrounding the point estimates for the incremental costs and QALYs derived from each of the five study populations were overlapping, indicating uncertainty in the results of the analysis.

Table 3 Unadjusted and adjusted costs, SF6D utility scores and QALYs among DESIR patients satisfying selected criteria sets
Table 4 Comparative estimates of costs, QALYs, and ICERs: basecase analysis
Fig. 1 Confidence intervals around ICERs from each of the five study populations

A positive anti-TNF response one visit post-therapy initiation was achieved by approximately half of anti-TNF users who satisfied the criteria from Canada (n = 39; 54.9%), France (n = 42; 52.5%), and Germany (n = 35; 51.5%), respectively, and by approximately 40% of anti-TNF users who satisfied criteria from the UK (n = 17; 42.5%) and Hong Kong (n = 13; 40.6%). In each of the five study populations, 90% or more of non-responders continued anti-TNF therapy for one or more years (Additional file 2: Table S2). In the sensitivity analysis that examined the effect of excluding costs accumulated past 24 weeks by anti-TNF non-responders, the incremental cost per QALY was reduced by approximately 25% (France: €857,992 vs. €1,105,859; Canada: €626,459 vs. €818,186; Germany: €422,568 vs. €545,808; UK: €578,899 vs. €766,217; Hong Kong: €335,418 vs. €456,850) (Table 5). Consistent with this finding, utility gain was observed to be lower among anti-TNF non-responders compared to responders (Table 6).

Table 5 Comparative estimates of costs, QALYs, and ICERs: sensitivity analysis excluding non-responder anti-TNF costs past 24 weeks
Table 6 Utility gain 6 and 12 months post-therapy initiation in anti-TNF responders and non-responders

In the sensitivity analysis using the basecase study population, but excluding indirect costs, all ICERs became more favourable (Additional file 3: Table S3). In all additional sensitivity analyses, i.e., including anti-TNF users who initiated therapy prior to and/or 6–12 months after criteria satisfaction, anti-TNF agents were dominated in all study populations (Additional file 3: Table S3); this finding did not change upon the exclusion of indirect costs (data not shown).

Discussion

To our knowledge, this is the first study to explore what proportion of SpA patients in a single cohort possesses the unique combination of clinical characteristics demanded by select sets of anti-TNF access criteria. We found that the proportion of DESIR patients eligible to receive anti-TNF therapy ranged from 9 to 28%, depending on the criteria set considered. For illustrative purposes, we note that assuming a SpA prevalence of 0.43% in France [36], this may translate to as few as 39 or as many as 120 people per 100,000 population per year being recommended anti-TNF therapy. At an estimated cost of €13,000 for a full year of anti-TNF therapy [26], the additional 81 people treated under the less restrictive access conditions would have an annual budget impact of €1.05 million. One of the contributions of the present study is in highlighting the potential role of anti-TNF access criteria, as at the current cost of anti-TNF therapy even a small number of additional patients treated will correspond to a large increase in health budgets, which may or may not represent good value for the public.
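The illustrative budget arithmetic above can be reproduced in a few lines. The inputs (prevalence of 0.43%, eligibility shares of 9% and 28%, and €13,000 per treated patient per year) come from the text; the function and variable names are hypothetical.

```python
def eligible_per_100k(prevalence: float, eligible_share: float) -> float:
    """People per 100,000 population who would be eligible for anti-TNF therapy."""
    return 100_000 * prevalence * eligible_share


SPA_PREVALENCE = 0.0043      # 0.43% SpA prevalence in France
ANNUAL_COST_EUR = 13_000     # estimated cost of a full year of anti-TNF therapy

low = round(eligible_per_100k(SPA_PREVALENCE, 0.09))    # most restrictive set
high = round(eligible_per_100k(SPA_PREVALENCE, 0.28))   # least restrictive set
additional = high - low
budget_impact = additional * ANNUAL_COST_EUR

print(low, high, additional, budget_impact)
```

Note that the resulting figure of roughly €1.05 million is expressed per 100,000 population, so the national impact scales with population size.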

This study focused on the comparative cost-effectiveness of selected criteria sets in the French setting, and the absolute ICER values generated here should be interpreted with caution. The ICERs were produced using real-world data over a single year, and it should be stressed that these cannot be compared to ICERs from models that employ a lifetime horizon, estimate treatment effectiveness using RCT data, or assume that non-responders will be withdrawn from treatment. Lifetime cost-effectiveness models have the important capability of acknowledging that not all benefits of anti-TNF therapy will be realized within a short time frame; in general, anti-TNF agents appear more cost-effective in models with longer time horizons [37]. Recently, the UK’s National Institute for Health and Care Excellence (NICE) [1] reported upper-bound ICERs of £66,529 per QALY for AS patients and £34,232 per QALY for nr-axSpA patients based on lifetime cost-effectiveness models. These are vastly more favourable than the ICERs estimated here, reflecting, in part, the impact of including the latent benefits of anti-TNF therapy. However, including these predicted benefits required extrapolating outcomes beyond periods for which observed data are available. In contrast, the present analysis has provided important observed data on the costs and benefits associated with anti-TNF use in a real-world setting. Importantly, the NICE models assume that all non-responders will discontinue therapy at 12 weeks [1], yet we found that the vast majority of non-responders continued therapy for a year or more. The continuation of therapy among non-responders appears to be one reason that ICERs estimated in the basecase analysis here are less favourable than those estimated by NICE: in a sensitivity analysis, we found that ICERs were reduced by approximately 25% by simulating a 24-week stopping rule.
We also found only modest QALY gains associated with anti-TNF use over 1 year, though utility gains were up to 0.03 units higher among anti-TNF responders compared to non-responders. Few studies have directly reported utility gain associated with anti-TNF use, and the NICE cost-effectiveness models predicted utility from BASDAI and BASFI scores using an algorithm that has not been externally evaluated [1]. It is difficult to determine whether the utility gain associated with anti-TNF use among DESIR patients is similar to what was predicted by NICE, and there is an outstanding need for studies to report observed utility gain among patients using anti-TNF therapy.

In this study, we were unable to confirm whether France’s restrictions are the most cost-effective in that setting relative to other potential restrictions over the short term; the uncertainty around the results in the basecase analysis indicates all of the criteria sets compared here may be equally cost-effective. However, the study makes a number of observations that highlight the potential for anti-TNF access regulations to shape the therapy’s cost-effectiveness, in part by defining the target population for initiation, which both influences the likelihood of anti-TNF response and determines the appropriate population of non-users for comparison. In this study, we found that between 41% and 55% of anti-TNF users across the five study population datasets achieved a BASDAI 50 response, and the mean SF6D utility gain one year following anti-TNF initiation was higher among responders compared to non-responders. At the same time, cost-effectiveness estimates did not vary directly in accordance with the proportion of BASDAI 50 responders: the most favourable ICER was derived from the Hong Kong criteria dataset, though it had the lowest proportion of responders. This discrepancy appears to result from the lower utility among the anti-TNF non-users in the Hong Kong criteria dataset, which translated to a larger incremental difference in QALYs compared to other study population datasets. These findings underscore that, to maximize cost-effectiveness, anti-TNF therapy must be directed to patients most likely to experience substantial improvement in quality of life when compared to conventional care, and there is a strong need to inform anti-TNF access criteria with evidence to characterize this patient population. To date, a good deal of research has demonstrated predictors of anti-TNF response, both among AS patients [38,39,40,41] and nr-axSpA patients [9, 13,14,15, 42].
However, there are shortcomings in this evidence base, with more data derived from RCT populations [13,14,15, 18, 43] than observational cohorts [9, 38, 39, 42] and more evidence on certain markers (e.g., CRP [15, 39, 43]) than others (e.g., HLA-B27 [42]). Furthermore, few studies have modelled anti-TNF response based on combinations of clinical characteristics, which should be more useful for decision-making than single predictors [41]. Perhaps most importantly, most studies have defined anti-TNF response in binary terms using clinical measures such as the BASDAI [32], ASAS40 [44], or ASDAS [45] and the effect on quality of life of achieving a response as defined by these measures has not been established [46].

The results of this study suggest that one crucial strategy to improve anti-TNF cost-effectiveness is to ensure treatment discontinuation by anti-TNF users not experiencing clinical benefit. To implement this strategy, it would be useful to confirm minimally important differences on common quality of life measures [47, 48], to encourage clinicians to measure the benefits of anti-TNF therapy in terms of quality of life, and to help patients and providers engage in a shared decision-making process around discontinuation. We acknowledge that enforcement of regulations surrounding anti-TNF therapy is challenging, as reflected by the fact that 47.6% of anti-TNF users in DESIR never satisfied the French anti-TNF access criteria. However, the potential for anti-TNF access criteria to shape the cost-effective use of these agents should not be ignored, and the rationale for initiating therapy only among patients likely to benefit, and for discontinuing therapy when appropriate, should be known by patients and providers.

Certain limitations of this study should be noted. For one, results differed depending on which anti-TNF users were included in the analysis: when anti-TNF users who received therapy prior to criteria satisfaction or 6–12 months after criteria satisfaction were included, anti-TNF therapy was dominated in all scenarios. This finding points to a possible role for timing of anti-TNF initiation in determining the therapy’s cost-effectiveness; however, the results could also be explained by unmeasured, time-variant confounders. In general, the data analyzed here were derived from a non-randomized study, meaning all results are subject to confounding by indication. To help control for this, we made comparisons only between anti-TNF users and non-users who satisfied the same set of access criteria and we further adjusted costs and QALYs for known confounders, though residual confounding cannot be ruled out. In terms of other study limitations, we assumed that all patients met treatment failure criteria, which were defined differently in the selected criteria sets. It should be noted that up to a third of SpA patients may achieve clinical remission with NSAIDs alone [49] and anti-TNF therapy will necessarily be more cost-effective if used only by patients who have failed this less costly treatment. The present study did not evaluate the number of NSAIDs that should be tried before anti-TNF therapy in order to maximize cost-effectiveness, which is a limitation.

Despite its limitations, this study is unique in having used data from a heterogeneous, real-world population of SpA patients to demonstrate the influence of patient characteristics on anti-TNF cost-effectiveness estimates. In line with the initiative to incorporate economic evidence into clinical guidelines [50, 51], future research should focus on confirming what combination of patient characteristics best predicts quality of life improvement following anti-TNF therapy and informing anti-TNF access criteria with this evidence. As a substantial number of anti-TNF users in the DESIR cohort did not satisfy any of the selected criteria sets, and as discontinuation of anti-TNF therapy following non-response was infrequently observed, this study further calls for a discussion as to the practical application of regulations surrounding anti-TNF therapy.