Key Points for Decision Makers

Treatment sequences, where previous treatment and patient characteristics can affect both the choice and effectiveness of subsequent treatments, are increasingly common in chronic conditions and represent complex treatment pathways. Methods for evidence synthesis that produce the least biased estimates of treatment sequencing effects are required to inform reliable clinical and policy decision making.

Randomised controlled trials (RCTs) of treatment sequences are limited; the use of RCTs of discrete treatments may not provide good evidence on treatment sequencing effects, and observational studies are susceptible to confounding and bias.

The inclusion of discrete treatments used at different points in the treatment pathway may bias a network meta-analysis. Meta-regression needs to account for both previous treatment and duration of disease.

Modelling studies of treatment sequences often apply simplifying assumptions due to the absence of sequencing trials. This can lead to misrepresentation of the true level of uncertainty, potential bias in estimating the effectiveness and cost-effectiveness of treatments, and the wrong decision.

1 Introduction

The availability of multiple interventions for the same condition or indication is increasingly common [1]. To optimise treatment outcomes and value for money, a sequence of treatments is likely to be used in such contexts. Policy and clinical decisions based on the optimum sequence rather than the effectiveness or cost-effectiveness of discrete treatments are becoming increasingly important [2,3,4,5]. This is especially true for chronic diseases, such as depression, diabetes, and cancer [5,6,7], and some infectious diseases where treatment resistance can limit effectiveness, for example human immunodeficiency virus (HIV) [8]. However, synthesising and interpreting the evidence base to inform such decisions is not straightforward. Treatment sequencing represents a complex intervention pathway where treatment history and patient characteristics may influence both the choice and the effectiveness of subsequent treatments. Treatment history represents multiple factors, including the number and type of previous treatments [9, 10], carry-over effects of prior treatments [11,12,13], type, level and duration of response to previous treatment [14,15,16], time on treatment [17], intolerance or toxicity [16, 18], development of disease resistance [19, 20], and burden of preceding treatments that can impact subsequent adherence [7, 21]. Time and disease trajectory are also important factors that can influence the effectiveness of subsequent treatment, the impact of which can be both dependent on and independent of previous treatments [9, 10, 22, 23]. Subsequent treatment choices include dose escalation, add-on therapy, a completely new treatment, or re-use of a previously effective treatment. In some instances, for example relapsing-remitting multiple sclerosis, previous treatments can restrict the choice of allowable follow-on drugs [24].

Randomised controlled trials (RCTs) provide the most robust estimates of treatment effects to inform policy and clinical decision making. However, RCTs of treatment sequences are few in number and do not cover the breadth of decision making needed. As the number of available treatments increases, the number of unique sequences will increase geometrically [4, 25], making it impractical and prohibitively costly to evaluate all conceivable sequences in RCTs. The time-varying adaptive nature of many sequences also means that innovative and novel approaches, such as sequential multiple assignment randomised trials (SMARTs), are required for developing the dynamic treatment regimens [26,27,28]. RCTs of discrete treatments, used at single points in the treatment pathway, provide robust estimates of effectiveness for their specific context, but may not provide representative estimates for these treatments when used in different contexts, such as the later stages of sequences. Participants who enrol into clinical trials and are adherent to discrete treatments may also be quite different from subjects in trials of treatment sequences where alternative, subsequent treatment options are available [7, 29,30,31]. In sequential treatment studies, participants’ decision to end first-line treatment may be influenced by the knowledge that a second-line treatment is readily available [21]. Alternative data sources, which can potentially provide context-specific estimates of treatment effects in different sequences, are longitudinal observational studies, but these are subject to selection bias and confounding.
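To give a sense of how quickly the number of candidate sequences grows, the following minimal sketch counts ordered sequences under two simplifying assumptions that are ours rather than the reviewed studies': every sequence has a fixed number of treatment lines, and no treatment is re-used. Because re-use of a previously effective treatment is possible in practice, the counts are best read as a lower bound. The function name `count_sequences` is hypothetical.

```python
from math import perm


def count_sequences(n_treatments: int, n_lines: int) -> int:
    """Count ordered sequences of `n_lines` distinct treatments
    drawn from `n_treatments` available options (no re-use)."""
    return perm(n_treatments, n_lines)


# Illustrative only: number of three-line sequences as options increase.
for n in (3, 5, 8, 10):
    print(f"{n} treatments -> {count_sequences(n, 3)} three-line sequences")
```

Even with only three lines of therapy, the count rises steeply with each additional treatment option, which illustrates why comparing every conceivable sequence in an RCT is impractical.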

Evidence synthesis methods that produce the least biased estimates of treatment-sequencing effects are required to inform reliable clinical and policy decision making. Due to the limitations of primary data sources outlined above, this is likely to require advanced meta-analytic techniques [32,33,34,35,36] or mathematical modelling [37]. There is no current guidance for best practice in this context. The Decision Support Unit (DSU), which is commissioned by the National Institute for Health and Care Excellence (NICE) to provide a research resource to support the institute’s Technology Appraisal Programme, has developed a briefing document on reviewing sequential treatments and downstream costs [38]. This was part of a series of briefing papers and reports developed to inform the 2013 update of the NICE methods guide. The updated methods guide highlighted the fact that some technology appraisals may need to consider the comparison of treatment sequences. However, neither the updated methods guide nor the DSU’s briefing document provided guidance on evaluating the clinical effectiveness or modelling of treatment sequences. We did not find any other health technology assessment (HTA) guidance that provided information on evaluating treatment sequences. Our paper provides a first step in addressing this limitation.

As a step towards informing best practice, a comprehensive review of reported quantitative evidence synthesis methods was conducted to establish what existing methods are available and outline the assumptions they make and any shortcomings. It is also hoped that this review will draw attention to this increasingly important area and encourage future methods development.

The review of methods was conducted with the aim of providing guidance for undertaking HTA or similar processes, including comparative effectiveness research and evidence-based guideline development. We did not aim to assess the effectiveness or cost-effectiveness of treatment sequences here, rather the methods used to develop summary treatment effect estimates of whole sequences or discrete treatments conditional on their positioning in the treatment pathway. The review considered methods applied within both clinical and economic evaluation; however, our focus is on the estimation of clinical effectiveness and does not consider the impact of treatment sequencing on the estimation of costs or utility values.

2 Methods

2.1 Literature Search

The intention was to identify the breadth of methods developed for evaluating treatment sequences and not to identify every study that used each method.

The breadth of our review, the recognised challenges of identifying and selecting methodological research using reference databases [39,40,41], and the fact that the majority of relevant literature would likely be studies reporting applicable methods or methodological developments as part of a wider applied study, rather than being primarily methodological studies [41], meant that a conventional systematic search of reference databases was considered insufficient for the current review. A number of approaches and sources were therefore used to identify relevant methodological studies. The following bibliographic databases were searched from inception to August 2013: MEDLINE, Embase, and the Cochrane Library. The search strategy is provided in Online Resource 1 (see the electronic supplementary material). This was supplemented by hand-searching the following: internet search engines; the websites of specific organisations, including NICE; electronic journals; the agendas of online conference proceedings; the references of existing reviews (listed in Online Resource 1) and relevant papers; known author searches; and forward citation tracking. The reference database searches were not updated, but iterative and purposeful hand searches, including the PubMed related citations function, were continued throughout the review process. An in-depth review was conducted of relevant studies identified during the initial searches. Potential new studies were then cross-referenced with a list of included studies and recorded methods. More recent studies were only included if they contributed to new methods or knowledge. The searches were deemed to be complete when further efforts to identify information did not add to the analysis [42] (with the most recent study published in 2016). This is analogous to reaching the point of ‘saturation’ in qualitative research [42, 43].
The searches have since been supplemented by a recent purposeful and targeted search, which incorporated scanning studies included in a recent systematic review of economic evaluations in rheumatoid arthritis by Ghabri et al. [44].

2.2 Assessing Study Relevance

The review included any disease condition and sequence of any type of treatment. It did not consider decision problems relating to prevention, screening/prognostic, diagnostics, or treatment monitoring. It focused on treatment switching based on a clinical assessment. Studies evaluating the effectiveness of planned sequential administration of combined therapy were excluded, as this represented a different type of decision framework.

The review included studies that applied or developed quantitative evidence synthesis methodology as part of secondary research. Studies that used qualitative or narrative evidence synthesis and primary research evaluating treatment sequences were excluded. Any type of meta-analytic technique was considered, incorporating, but not limited to, pairwise meta-analysis, meta-regression, network meta-analysis (NMA), and any meta-analysis based on individual patient data (IPD). Decision-analytic modelling techniques developed to evaluate treatment sequencing, whether conducted as part of an economic evaluation or not, were included. Modelling studies that aimed to evaluate the effectiveness of discrete treatments and incorporated the impact of downstream treatments were only included if they specifically modelled sequencing effects. Studies published in abstract form were excluded, as were economic evaluations based on a single RCT.

3 Results

3.1 Overview of Included Studies

Database searches, after de-duplication, identified 752 references, of which 94 were deemed potentially relevant after screening titles and abstracts. Twenty-six of these could not be further assessed as they were unavailable (n = 2), could not be translated (n = 2), or were only published as conference abstracts (n = 22). A further 28 of those retrieved in full were excluded as they were not relevant (Fig. 1). After collating studies published in more than one publication, the remaining 40 references yielded 36 studies of interest. These were included in the review, along with a further 53 studies identified via internet and hand searches. Recent supplementary targeted searches identified two studies [45, 46] that contributed a new modelling technique. There were 91 studies in all.

Fig. 1 Flow diagram showing the number of references identified, publications retrieved, and studies included in the methodology review

Forty-nine (54%) studies investigated the use of disease-modifying antirheumatic drugs (DMARDs), including biological agents (or biologics), for the treatment of inflammatory arthritis, including rheumatoid arthritis, psoriatic arthritis, and ankylosing spondylitis. Fourteen (15%) related to oncology. The remainder assessed treatments for epilepsy (n = 4; 5%), psoriasis (n = 4), depression (n = 3; 3%), glaucoma (n = 2; 2%), schizophrenia (n = 2), type 2 diabetes mellitus (n = 2), HIV (n = 2), neuropathic pain (n = 1), postherpetic neuralgia (n = 1), sciatica (n = 1), fibromyalgia (n = 1), chronic hepatitis B infection (n = 1), Crohn’s disease (n = 1), onychomycosis (n = 1), and spasticity (n = 1). The majority involved sequences of drug treatments, but some also considered other interventions, for example, surgery for sciatica. Only two studies were primarily methodological [14, 47].

Meta-analysis and decision-analytic modelling were reviewed as two distinct categories of quantitative evidence synthesis methods.

3.2 Meta-Analytic Methods

Twenty-three studies were included in the evaluation of meta-analytic approaches [9,10,11, 16, 23, 47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64]. However, some of these studies were considered relevant in fairly broad terms, such as providing examples of how the limited evidence base precluded the evaluation of treatment sequencing, or representing the use of stratified analysis by line of therapy, which could potentially provide a building block for future methods development. These approaches were initially not considered pertinent to the review but because of the dearth of relevant methods identified, a post hoc decision was made to include them as examples of simplifying methods. This provided a more comprehensive list of the approaches pragmatically used for evaluating treatment sequencing in general, rather than limited to novel methods for developing sequence-specific summary effect estimates. An overview of the studies, including their aims, approaches used, and the data sources, is presented in Table 1.

Table 1 Overview of the meta-analytic approaches used by included studies (studies are ordered according to the methodological approach used)

The evidence to inform treatment sequencing was broadly considered in two ways: a one-step-at-a-time evaluation based on a series of discrete treatments and a comparison of whole sequences. No novel meta-analytic methods (beyond the use of conventional pairwise meta-analysis [32]) were identified for evaluating treatment sequences, and none directly aimed at developing a summary estimate of effect conditional on positioning in the sequence. Most approaches were developed for addressing excessive heterogeneity or specific gaps in the RCT evidence when evaluating discrete treatments at single points in the pathway. For example, in rheumatoid arthritis, RCTs of initial biological therapy investigated the use of these drugs in both early disease, where patients have not previously received any DMARD therapy, and as add-on therapy for established disease in patients with an inadequate response to previous conventional DMARDs, representing a heterogeneous patient population. The first-generation biologics include tumour necrosis factor-α (TNF) inhibitors. Most RCTs of second-line biologics investigated other types of biologics in participants with an inadequate response to previous TNF inhibitors; few RCTs evaluated the sequential use of first-generation TNF inhibitors, whilst registry data show that these are often used in practice as second- or subsequent-line therapy [47]. The current meta-analytic approaches, which can potentially be used in a clinical evaluation of a health technology, are summarised below.

3.2.1 Meta-Analysis of Studies Evaluating Whole Sequences

This approach is hampered by the limited number of available RCTs of treatment sequences, which also makes it difficult to establish a closed network for implementing NMA [56]. Observational studies can be used as alternative data sources, but are subject to confounding and bias. The type of observational studies used included the comparison of participants who had received a predefined sequence of two drugs [11], the evaluation of second-line treatment where generic first-line treatment is used [52], and the comparison of the outcomes of first- and second-line treatments [9, 47]. The comparison of treatments used during an earlier versus a later part of the treatment pathway ignores the likely effect of disease trajectory, issues relating to treatment choice, changes in pathophysiology with time, and other confounding factors. The types of bias and limitations of non-randomised studies that are specific to the evaluation of treatment sequences, and identified as part of the review, are listed in Box 1.

Box 1 Potential bias or limitation in non-randomised, real-world observational studies that are specific to the evaluation of treatment sequences

3.2.2 Subgroup Analyses to Explore the Impact of Treatment History when Evaluating Treatment Sequences in a Piecemeal Fashion

The subgroups can be defined in two ways: by splitting all studies into two or more groups, also referred to as stratified analysis (e.g. early- vs late-stage disease, or failed previous TNF inhibitor ‘yes’ vs ‘no’) [63, 64], or by taking partial data from included studies (e.g. participants switching to a second TNF inhibitor due to intolerance, lack of efficacy, or loss of efficacy) [58]. A summary of the methods used is provided in Online Resource 2 (see the electronic supplementary material). Stratified analysis is also applied when conducting a separate meta-analysis for each line of therapy (e.g. first- or second-line biologics) [62, 64] or for different patient populations (e.g. participants with no previous history of biologic therapy or participants with an inadequate response to previous TNF inhibitors) [10, 60]. The main limitation of using subgroup analyses is that they only allow for the comparison of two subgroups at a time, with or without one covariate. All other covariates are pooled, and each analysis is confounded by other variables [65].
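A stratified analysis of the kind described above can be sketched as follows. This is a minimal illustration, assuming inverse-variance fixed-effect pooling of study-level log odds ratios; the reviewed studies used a variety of pooling models, and the study data, the field names (`prior_tnf_failure`, `log_or`, `se`), and the function names are all hypothetical.

```python
import math
from collections import defaultdict


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def pool_by_stratum(studies, key):
    """Pool study-level effects separately within each level of `key`."""
    groups = defaultdict(list)
    for study in studies:
        groups[study[key]].append(study)
    return {
        level: pool_fixed_effect([s["log_or"] for s in members],
                                 [s["se"] for s in members])
        for level, members in groups.items()
    }


# Hypothetical study-level log odds ratios, for illustration only.
studies = [
    {"id": "S1", "prior_tnf_failure": False, "log_or": 1.0, "se": 0.2},
    {"id": "S2", "prior_tnf_failure": False, "log_or": 0.8, "se": 0.2},
    {"id": "S3", "prior_tnf_failure": True, "log_or": 0.5, "se": 0.3},
    {"id": "S4", "prior_tnf_failure": True, "log_or": 0.3, "se": 0.3},
]

for level, (pooled, se) in pool_by_stratum(studies, "prior_tnf_failure").items():
    print(f"prior TNF failure={level}: pooled log OR={pooled:.2f} (SE {se:.2f})")
```

Each stratum is pooled in isolation, so any other study characteristic that differs between the strata (disease duration, for instance) remains mixed into the comparison, which is the confounding limitation noted above.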

3.2.3 Meta-Regression to Adjust for the Previous Treatment

This approach was not generally used for the sole purpose of evaluating treatment sequences, but was used to account for the heterogeneity within the meta-analysis or NMA. A summary of the methods used is provided in Online Resource 2. The covariate representing previous treatment was often dropped from the final analysis due to non-significant findings [54, 61], possibly due to lack of power, as previous treatment was generally poorly reported in primary studies [10, 54]. However, lack of variability between studies can also contribute to non-significant findings, especially when the meta-analysis is used to compare treatments applied at a single point in the pathway, or where the ordering of treatments is much the same in a given disease. To avoid problems with insufficient power, only a limited number of covariates were incorporated in the meta-regression. These frequently included disease duration. For example, a study that combined NMA and meta-regression to account for the significant heterogeneity between studies of biologics for rheumatoid arthritis included only two study-level covariates in the meta-regression: disease duration and a measure of baseline disability [57]. The analysis included RCTs of participants who were DMARD naive and RCTs of participants with an inadequate response to these drugs lumped together. Disease duration could potentially be considered as a proxy for previous treatment use, as the likelihood of failing prior treatments will increase with increasing duration. However, there is also justification for including treatment history as a covariate, especially when pooling (lumping together) studies across different treatment lines [10, 23]. The inclusion of both covariates could help to disentangle whether long-standing disease per se is associated with a poor response to treatment, or whether failure on previous treatments predicts response to subsequent treatment [22].
The use of IPD is likely to enhance the application of this approach [10], but studies that used such data were still hampered by the poor reporting of previous treatment [23].
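As a hedged illustration of the two-covariate meta-regression discussed above, a generic random-effects formulation is given below. The notation is an assumption introduced here for clarity and is not taken from any of the reviewed studies: $y_i$ is the observed effect in study $i$ with standard error $s_i$, $d_i$ is the mean disease duration, and $p_i$ is a study-level measure of previous treatment (for example, the proportion of participants who failed a prior treatment).

```latex
\begin{align}
  y_i &= \theta_i + \varepsilon_i, & \varepsilon_i &\sim N(0,\, s_i^2), \\
  \theta_i &= \mu + \beta_1 d_i + \beta_2 p_i + u_i, & u_i &\sim N(0,\, \tau^2).
\end{align}
```

Under this sketch, $\beta_1$ captures the association between disease duration and treatment effect after adjusting for previous treatment, and $\beta_2$ captures the association with previous treatment after adjusting for duration. Separating the two requires that $d_i$ and $p_i$ vary across studies and are not too strongly correlated, which is why poor reporting of previous treatment limits the approach.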

A further limitation of conducting an NMA of all discrete treatments irrespective of where they are used in the pathway is that previous treatment(s) can both modify the treatment effect, acting as an effect modifier and resulting in heterogeneity, and be associated with the type of treatment comparison, acting as a confounding factor and leading to inconsistency in the network. For example, in an NMA of sciatica treatments, non-invasive treatments were more likely to be used as initial treatments and invasive treatments were used after the failure of other treatments in patients with a more long-standing and less responsive condition [66].

3.2.4 Network Meta-Analysis Incorporating Multiple Treatment Lines, for Example, First- and Second-Line Treatments, as Separate Treatment Nodes

This approach was not developed for evaluating treatment sequences as such, but rather to evaluate methods for incorporating real-world data in evidence synthesis of second-line treatment. In particular, the approach sought to optimise an evidence base using first-line evidence to inform second-line effectiveness estimates. The methods were applied as part of the GetReal project case study of biologics in rheumatoid arthritis [47]. The authors had access to IPD from two national registries and five RCTs (two investigated second-line treatment). A series of Bayesian univariate and bivariate NMAs was conducted that incorporated both treatment lines. The data from RCTs provided separate networks of evidence for first- and second-line biologics. No RCT reported on both treatment lines; thus the exchangeability assumption was needed to connect the two networks by assuming all treatment effects have a common distribution. The univariate analysis utilised the registry data as data, whereas the bivariate analyses used the registry data to inform the prior distribution for the correlation parameter between first- and second-line treatments. In the univariate analysis, relative effect estimates for first- versus second-line treatment were obtained from the registry, allowing the two networks to be connected and for treatment comparisons (e.g. drug A in first line vs drug A in second line) to be obtained. The use of multivariate analysis allows separate outcomes to be modelled simultaneously, using the correlation to borrow information across multiple outcomes or time points. Here, the treatment effect for first-line treatment was modelled as outcome 1 and second-line treatment as outcome 2, and the correlation was that between treatment lines. The initial bivariate NMA was conducted using RCTs of first- and second-line treatments.
The correlation estimate was obtained by conducting standard pairwise meta-analysis, based on registry data split into first- and second-line response, and monitoring the correlation. In a second bivariate analysis, the registry data were used as part of the NMA by being split into multiple pairwise studies. This allowed for modelling between-study correlation between the lines of therapy. A third analysis used data from the registries, reporting treatment effect estimates on both lines, which allowed for relaxing the exchangeability assumption on the average level. The biggest challenge here was developing an estimate of correlation between first- and second-line treatments to conduct the analysis. The assumptions of consistency and similarity, across the pairwise contrasts, within the NMA may also be difficult to justify, as discussed above in the NMA of sciatica treatments example. The limitations of relying on observational studies comparing first- and second-line treatment are discussed in Sect. 3.2.1 and Box 1.

3.2.5 Developing a Specific Multiplication Factor that Can be Applied to the Summary Effect of a Treatment Used as First Line in Order to Represent Its Use at a Later Point in the Pathway

This approach is not a meta-analytic method as such, but was used to adapt the findings of meta-analysis of discrete (first-line) treatments to represent sequencing effects. The optimal approach for developing a multiplication factor is yet to be established. Current methods incorporate two approaches [16, 48]. One study, investigating TNF inhibitors for psoriatic arthritis, obtained modifying factors from an observational study comparing the class of drugs used as first-line and subsequent treatment for rheumatoid arthritis. A different multiplication factor was developed, depending on whether the initial TNF inhibitor was discontinued due to inefficacy or adverse effects [16]. A second study developed a reduction factor based on the data available for one antiepileptic drug for which there was an RCT of its use at two different time points, first-line monotherapy and later as an add-on therapy [48]. Modification factors were primarily used by modelling studies, with most not reporting the methods used for developing them [18, 67,68,69,70,71,72]. Most used estimates based on available evidence, mainly an observational or previous modelling study, the choice of which was frequently not justified. The reduction factor used in the most recent (2020) economic evaluation [45] was obtained from a pragmatic RCT of non-TNF-targeted biologic versus a second TNF inhibitor to treat rheumatoid arthritis in patients with insufficient response to their first anti-TNF-inhibitor (Gottenberg et al. [73]).
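The way a multiplication factor is applied can be sketched as below. This is a minimal illustration under stated assumptions: the factor is applied on the probability scale to a first-line response probability, and a separate factor is used depending on the reason the previous treatment was discontinued. The numerical values, the dictionary `FACTORS`, and the function `adjust_for_later_line` are hypothetical and are not estimates from any of the reviewed studies, which did not always report the scale or method used.

```python
# Hypothetical multiplication factors by reason for stopping the prior drug.
FACTORS = {
    "inefficacy": 0.7,
    "adverse_effects": 0.9,
}


def adjust_for_later_line(first_line_response: float, reason: str) -> float:
    """Scale a first-line response probability to represent later-line use.

    Assumes the factor acts multiplicatively on the probability scale
    and caps the result at 1.0.
    """
    if not 0.0 <= first_line_response <= 1.0:
        raise ValueError("response probability must lie in [0, 1]")
    return min(1.0, first_line_response * FACTORS[reason])


# Illustrative first-line response probability from a pooled analysis.
first_line = 0.60
for reason in FACTORS:
    adjusted = adjust_for_later_line(first_line, reason)
    print(f"{reason}: later-line response = {adjusted:.2f}")
```

The choice of scale matters: applying the same factor to an odds ratio or to a log odds ratio would give different later-line estimates, so the scale should be stated and justified alongside the source of the factor.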

3.3 Decision-Analytic Modelling

3.3.1 Decision Modelling Methods

Seventy-two modelling studies were reviewed and fifty-two distinct models identified [14,15,16,17,18, 45, 46, 48, 53, 56, 67,68,69,70,71,72, 74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]. An overview of the included modelling studies is provided in Online Resource 3 (see the electronic supplementary material). Most modelling studies were conducted as part of an economic evaluation. A wide range of modelling techniques were used to address a broad spectrum of treatment-sequencing decision problems (Box 2), which included identifying the optimum sequence; adding a new drug to an established sequence; comparing ‘step-up’ or ‘step-down’ treatment approaches; comparing different treatments used at the same point within a sequence; evaluating a drug used at different points within a sequence; and comparing predefined sequences. The sequence of treatments being modelled ranged from a fixed sequence of a limited number of treatment lines to variable treatment algorithms where patient history and characteristics dictate the choice of subsequent treatments.

Box 2 Illustration of the different types of treatment-sequencing decision problems

Two published taxonomies developed for categorising different modelling techniques according to their key features [128, 129], along with other guides and algorithms that have been developed to aid the selection of an appropriate modelling technique (or structures) for economic evaluation in general [97, 128,129,130,131,132,133,134,135,136,137,138,139,140], were used to categorise the included studies and inform the data extraction. The advantages and disadvantages of each modelling approach were assessed as part of the review. The choice of an appropriate modelling approach depends on the complexity of the underlying decision problem, the extent of the treatment sequences being investigated, and the disease condition. Table 2 provides an abbreviated summary of the overall findings of the review of modelling studies, including how treatment sequences were conceptualised within different modelling approaches (column 2); and the type of the additional attributes in the decision problem (beyond the sequencing of individual treatments) and disease condition that were captured by the included models (column 3). A more detailed summary of the methods and findings of the review of modelling studies is provided in Online Resource 4 (see the electronic supplementary material). The modelling techniques used included deterministic decision tree, stochastic decision tree, Markov cohort model, partitioned survival cohort model, semi-Markov cohort model, individual-patient simulation state transition models, discrete event simulation, discretely integrated condition event (DICE) simulation, non-terminating population-based simulation, terminating population-based simulation, and dynamic Markov cohort model. No study compared any of these alternative approaches for evaluating treatment sequences to assess, for example, how sensitive results were to the type of model used. 
A number of studies did report choosing a discrete event simulation over a state transition model due to the improved computational efficiency [48, 68, 104, 122]. The level of complexity in the decision problem accounted for in the models varied quite considerably, even when evaluating similar treatment sequences within the same disease condition. The decision problem was also simplified by modelling a limited number of treatment lines, streamlining the disease process, and using a short time horizon. For example, some studies used a 2–5 year time frame, rather than a lifetime horizon, for modelling treatment sequences for rheumatoid arthritis, because a longer time horizon implied too many assumptions [71, 78, 79, 84, 112, 113, 124].

Table 2 Summary of the different modelling approaches used and their advantages and disadvantages for evaluating treatment sequences

3.3.2 Simplifying Assumptions Regarding Sequences of Treatment

Treatment sequences were often modelled as a series of discrete treatments, each requiring a summary effect estimate conditional on positioning in the treatment pathway. The scarcity of data to inform such estimates meant that simplifying assumptions were often applied to the available data on discrete treatments used at a single point in the pathway. A range of simplifying assumptions used to represent treatment-sequencing effect estimates was identified, which were used to develop a novel taxonomy of all possible assumptions (Table 3). The most common assumptions were that treatment effect is independent of positioning in the sequence, or that treatment effect is dependent on the number of previous treatments (treatment line), but independent of the type of treatments used (Table 4). These assumptions were frequently not validated; nor was their impact on the overall results assessed. Forty-nine studies (72%) assumed that the effect of either all or some of the treatments used after the first treatment modelled (or decision point) were independent of treatment sequence. Only six studies (9%) evaluated the impact of applying this assumption in sensitivity analyses, by reducing the effect of treatments used later in the sequence using a factor based on evidence [67, 69], an arbitrary amount [15, 93, 110], or expert consensus [14]. The assumption that treatment effect is dependent on line of therapy was often used in conjunction with the assumption of treatment independence, applied to treatments adopted later in the sequence.

Table 3 Taxonomy of simplifying assumptions relating to treatment-sequencing effects used by studies included in the review
Table 4 Summary of the frequency of use of the simplifying assumptions
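The practical difference between the two most common assumptions described above can be illustrated with a small sensitivity analysis. The sketch below is an assumption-laden toy calculation, not a reconstruction of any reviewed model: patients move to the next treatment only after non-response, each treatment has a hypothetical first-line response probability, and a single hypothetical `line_factor` reduces the response probability for every additional line of therapy. Setting `line_factor` to 1.0 corresponds to assuming that treatment effect is independent of position in the sequence.

```python
def sequence_response(first_line_probs, line_factor=1.0):
    """Probability of responding to at least one treatment in a sequence.

    `first_line_probs` holds each treatment's response probability when
    used first line; the probability at line k (0-based) is scaled by
    `line_factor ** k`.
    """
    p_no_response = 1.0
    for line, prob in enumerate(first_line_probs):
        p_line = prob * line_factor ** line
        p_no_response *= 1.0 - p_line
    return 1.0 - p_no_response


# Hypothetical three-treatment sequence.
probs = [0.5, 0.5, 0.4]
for factor in (1.0, 0.8, 0.6):
    overall = sequence_response(probs, line_factor=factor)
    print(f"line_factor={factor}: overall response = {overall:.3f}")
```

Varying the factor in this way mirrors the sensitivity analyses reported by the few studies that tested the independence assumption, and shows how estimates for longer sequences become increasingly sensitive to the assumed reduction.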

The available evidence to inform treatment-sequencing effects impacts the type of assumptions required. The review focused on modelling studies that evaluated treatment sequences, but economic evaluations often focus on the comparison of discrete treatments and model downstream costs of subsequent treatments. The findings demonstrated that priority was often given to matching the evidence for the decision point, for example, comparing first-line biologics, rather than considering treatment sequences as a whole. Economic evaluations undertaken by, or on behalf of, manufacturers of health technology tended to focus on a specific decision point reflecting treatments used in pivotal RCTs matching the licence indication, for example, comparing a TNF inhibitor to a conventional DMARD [74, 80, 101], or a non-TNF-inhibitor biologic to a TNF inhibitor [101]. The data sources used alongside the simplifying assumptions for treatments used beyond the decision point varied, even when considering the same decision problem and addressing the same evidence gap. For example, data sources used to inform sequential TNF inhibitors included the following: RCTs of TNF inhibitors used as first-line biologics [45, 67, 72, 83, 87, 89, 92, 96, 101, 109], a national patient registry [81, 101, 104, 115, 122], a large, uncontrolled, open-label study of a specific TNF inhibitor in patients who had previously discontinued TNF inhibitors [78, 79, 84, 107, 112, 113], and an RCT of a non-TNF-inhibitor biologic in participants with an inadequate response to TNF inhibitors. The effects of treatments administered later in the treatment pathway were also handled in different ways. For example, in a technology appraisal of TNF inhibitors for rheumatoid arthritis [83], the initial treatment response for each subsequent conventional DMARD was explicitly modelled, whilst in another technology appraisal of TNF inhibitors for psoriatic arthritis [16], the economic model assumed that patients experienced a steady long-term deterioration after the failure of the TNF inhibitor. Therefore, fluctuations caused by response to subsequent conventional DMARDs, which were considered to be administered as part of palliative care, were ignored. The uncertainty in the quality of the alternative evidence to inform sequencing effects was not investigated in depth.

Decision models that start at the point of diagnosis are more likely to reflect the complete sequence of treatments used in chronic conditions. For example, some studies of biologics in rheumatoid arthritis developed the decision population within the actual model, with patients entering the model being newly diagnosed with early disease [67, 75, 83, 85, 99, 122]. However, the likelihood that there is no matching evidence is increased, and more assumptions are required. Another approach is to model the initial treatment used prior to the decision point (e.g. when comparing second-line biologics), and apply the assumption that the entire patient population entering the model has an inadequate response to the first modelled treatment (e.g. first-line biologic). This approach was used in the Advanced Simulation Model, to allow the initial treatments to be costed appropriately, reflecting treatment sequences used in practice [71, 78, 79, 84, 112, 113]. However, the evidence used to inform the treatment effects of the second TNF inhibitor did not match the prior TNF inhibitor that had failed (first treatment modelled). The third and most common approach was to include a patient population entering the model that reflected the decision problem in terms of the number of previous treatments used, for example, patients receiving their first biologic therapy. Modelling studies that only consider the impact of subsequent treatments when, for example, comparing first-line biologics [71, 72, 74, 75, 83, 89, 101, 120, 123, 127] are generally based on the assumption that the sequences being compared are starting from a level playing field. The potential impact of this is not generally considered within the sensitivity analysis, as it is not part of the cost-effectiveness estimates.

A frequent problem when evaluating the introduction of a new treatment to an established sequence is the lack of data to inform the ‘displaced effect’. For example, when adding a new drug (e.g. non-TNF-biologic agent) to an established sequence (e.g. starting with a TNF inhibitor), the existing drug is displaced lower down the sequence (Box 2), and is generally modelled as both the comparator (e.g. first-line) treatment in the baseline sequence and the subsequent (second-line) treatment, after the new drug, in the intervention sequence. The same treatment effect is generally applied to the existing drug, irrespective of whether it is used early or later in the sequence (and disease trajectory), with no RCT data available on its effect in patients with an inadequate response to the new drug.

4 Discussion

4.1 Summary of the Findings

The review identified a range of quantitative evidence synthesis methods used for evaluating the effectiveness of alternative treatment sequences. The findings demonstrated the following:

  i. Reviewing the evidence on treatment sequencing is neither trivial nor straightforward.

  ii. In most cases, treatment sequences represent complex, multifaceted, dynamic intervention pathways, which will require advanced methods of quantitative evidence synthesis, especially if evaluated using a ‘one-step-at-a-time’ approach.

  iii. Prospective sequencing trials are few in number and do not cover the breadth of decision making needed. The evidence synthesis would likely need to consider the inclusion of diverse study designs, including non-randomised studies.

  iv. The problem has largely not been addressed using evidence synthesis methodology for clinical effectiveness, but is usually dealt with at the decision modelling stage.

  v. There is no single best way to evaluate treatment sequences; rather there is a range of approaches and, as yet, no generalised methodology that encompasses the different assumptions used.

  vi. Each approach has advantages and disadvantages and is influenced by the evidence available and decision problem.

  vii. When using a one-step-at-a-time approach, previous treatment is an important effect modifier, and subsequent treatments can confound long-term outcomes, such as survival.

  viii. The reason for discontinuing treatment (lack of effect, loss of effect, or intolerance) has a differential effect on the effectiveness (and choice) of subsequent treatment, and is poorly reported in primary studies.

  ix. The extent and type of sequences being evaluated tended to reflect the available research evidence, rather than clinical practice.

4.2 Comparison with Existing Reviews

We identified three existing reviews of methods for evaluating treatment sequences. This included two systematic reviews of economic evaluations [4, 141] and one review of published UK NICE technology appraisals [3]. Mauskopf et al. analysed treatment-sequencing assumptions after failure of the first biologic in cost-effectiveness models of psoriasis, and compared the modelled sequences with the most recent treatment guidelines [141]. They concluded that models of first-line biologics either do not include subsequent treatments or include only some of the regimes recommended in current guidelines, and that cost-effectiveness results may be sensitive to the assumptions about treatment sequencing, and choice and efficacy of subsequent treatment sequencing regimens. Tosh et al. assessed and critiqued how sequential DMARDs for rheumatoid arthritis have been modelled in economic evaluations [4]. They found that reporting of the methods and evidence used to assess the effect of downstream treatments was generally poor; when lifelong models and treatment sequences were considered, evidence gaps were identified. They concluded that methods were not applied consistently, leading to varied estimates of cost-effectiveness, and that treatment sequences were not fully considered and modelled, potentially resulting in inaccurate estimates of cost-effectiveness. Zheng et al. investigated approaches used to model treatment sequences in NICE appraisals to provide practical guidance on conceptualising whether and how to model sequences in health economic models [3]. They concluded that the biggest challenge is the scarcity of clinical data that capture the long-term impacts of sequences on efficacy and safety. Three commonly used assumptions to bridge the evidence gap were identified, but each had its own limitations. These included the assumption that the efficacy of a treatment stayed unchanged regardless of line of therapy, the use of data from trials in different lines of therapy to directly model a treatment sequence, and the use of retrospective studies of clinical registries or databases. The findings of these reviews were consistent with ours, though their scope was more limited in that they focused on either a single condition or UK NICE appraisals.

4.3 Strengths and Limitations of the Review

This is the first review of methods to investigate the evaluation of treatment sequencing across all clinical scenarios, and to include both meta-analytic techniques and decision-analytic modelling. It represents an extensive in-depth review of current methods used to evaluate the clinical effectiveness of treatment sequences, representing a broad and disparate area of research.

A potential limitation of our review is that the reference database searches were not updated. However, targeted hand searches were continued during the review process and studies published beyond 2013 have been included. Nevertheless, more recent studies were only included if they contributed new findings, and the searches stopped when no new information was being found. This means that the review could potentially have missed new methods developed in the last few years. Updated targeted hand searches identified a new modelling technique (DICE) that was not previously included in our review. This has since been included. However, the methods used to conceptualise treatment sequences and the level of reality captured in the DICE model did not change the findings and recommendations of our review. The methods used to develop treatment-sequencing effect estimates and the accompanying simplifying assumptions made within the new studies [45, 46] were also the same as those included in our review. The assessment of recent studies included in a new systematic review of economic evaluations of sequences of biological treatments for patients with rheumatoid arthritis, published in 2020, did not identify any studies reporting methods or simplifying assumptions not already incorporated in our review [44].

4.4 Recommendations for Practice

4.4.1 Primary Study Design

The available evidence base for evaluating new treatments is often driven by the requirements for regulatory approval, and thus focuses on discrete treatments used at a defined point in the pathway [142, 143]. The lack of data on the effectiveness of these treatments when used at another point in the pathway is a barrier to making policy decisions about the optimal positioning of new treatments or treatment sequences. The GetReal project (Sect. 3.2.4) included a stakeholder engagement workshop to solicit views on the usefulness and acceptability of their analytic approach [144]. Interestingly, the regulators considered it to have limited usefulness because the evidence requirements for marketing authorisation in rheumatoid arthritis are line specific, whilst the pharma research and development representatives considered it useful in principle, to better understand the gaps in the evidence across lines of therapy and aid the design of future clinical trials [144]. The focus of primary research on discrete treatments is unlikely to change unless the regulatory authorities specify the importance of treatment sequencing or optimal positioning of new treatments. The reimbursement agencies and HTA bodies should also make recommendations on the nature of the clinical evidence required to inform treatment sequences [145, 146].

4.4.2 Health Technology Assessment

It is important to identify the relevance of, or the need to consider, treatment sequencing early on in the technology assessment process, and incorporate both the clinical and economic evaluation. Treatment sequencing was often considered as part of the economic evaluation only, and not considered in the clinical evaluation [17, 67, 83, 85, 95, 99, 106, 116, 126, 147]. A previous review of NICE technology appraisals also identified a lack of integration or direct use of the systematic review to inform the economic evaluation, and the need to consider the data requirement of the economic model at an early stage [148].

The development of an initial analytic or conceptual framework [40, 149] provides an essential tool for the planning and evaluation of treatment sequences. It can be used to consolidate the requirements of the clinical and economic evaluation; assist in communication within the research team and with a range of stakeholders; ‘think through’ the multiple components of the treatment pathways and disease-specific events in context; enhance the transparency of underlying assumptions; and inform choices about the level of structural complexity required by the model [40, 139, 150,151,152,153]. For some chronic diseases, it may be useful to create a disease-specific conceptual framework that can serve as a foundation for developing future HTAs and economic models of current and novel treatments [154], thus potentially allowing for greater stakeholder feedback and future improvement. There is also a need to depict treatment sequences as a tree, rather than a linear sequence of treatments, thus accounting for the complex and dynamic intervention pathways that they represent. Although methods were developed that accounted for the fact that the reason for treatment discontinuation (e.g. loss of effectiveness, adverse events, non-adherence) might determine the average effectiveness for the next line of therapy, the reality is that this may also affect the choice of therapy for the next line. A tree structure is adopted in the SMART design, which is a multistage trial designed to develop and compare treatment pathways that are adapted over time based on an individual’s response and/or adverse effects [28].

The time and resource constraints of HTA, accompanied by limited evidence, may render an extensive model unrealistic. It may therefore be tempting to simplify the treatment-sequencing decision problem. However, a model based on an oversimplification of the decision problem and clinical practice is also unlikely to be useful for decision makers. An alternative approach would be to develop a model that is designed to address any/multiple decision problems, rather than a single use model. This may be relevant, not only for chronic disease, but also in the introduction of new treatments in a rapidly changing clinical field, such as oncology [5]. The likelihood that the available data to inform sequencing effects may improve over time also supports developing a model that is easily updated. This is consistent with recent calls for the use of disease-specific reference models [155], pre-verified modules [156], and open-source models [157] to improve the accuracy of economic evaluations. Our review identified some good examples where a model was further developed over time to address multiple reimbursement decisions (e.g. Birmingham Rheumatoid Arthritis Model [BRAM] [75, 158], Tran-Duy model [68, 122], Sheffield rheumatoid arthritis models [159], and the Advanced Simulation Model [78]). However, each was developed by the same research group. An important challenge here is the need to make sufficient detail on the original model openly available.

A mathematical challenge for comparing multiple permutations of sequences is to determine the proper starting point of the model. This is also relevant when using a model designed for multiple uses, which may start at the point of diagnosis [75], a key point in the treatment pathway (e.g. initiating DMARD therapy [122] or biologic therapy [78]), or the point at which the decision is made (Sect. 3.3.2). All evaluations should start at the point of divergence (i.e. the point at which a decision might be made) [75]. Models used for comparing multiple permutations of sequences often include the same first one, two, or three lines of treatment. This will essentially 'dilute' the true incremental effects (and costs) of treatment since some patients will have died (and left the model) before the point of divergence. Thus, when calculating the incremental outcomes per patient, the denominator will be greater than should have been used, meaning that the incremental results will underestimate the true effects.
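The dilution effect can be illustrated with a simple worked calculation. The sketch below is hypothetical: the function name and all input values are assumptions chosen for illustration, and discounting and costs are ignored. It divides the same total incremental gain either by the number of patients entering the model before the shared lines of treatment, or by the number of patients still alive at the point of divergence.

```python
def incremental_gain_per_patient(cohort_size, survival_to_divergence,
                                 gain_after_divergence, start_at_divergence):
    """Incremental outcome per patient for two sequences that share early lines.

    cohort_size: patients entering the model before the shared lines.
    survival_to_divergence: proportion still alive when the sequences diverge.
    gain_after_divergence: true incremental gain (e.g. QALYs) per patient
        who reaches the point of divergence.
    start_at_divergence: if True, use only patients reaching the divergence
        point as the denominator.
    """
    reaching_divergence = cohort_size * survival_to_divergence
    # Only patients who reach the divergence point can accrue any increment.
    total_gain = reaching_divergence * gain_after_divergence
    denominator = reaching_divergence if start_at_divergence else cohort_size
    return total_gain / denominator


# Hypothetical inputs: 1000 patients, 80% reach the divergence point,
# true incremental gain of 0.5 QALYs per patient reaching it.
diluted = incremental_gain_per_patient(1000, 0.8, 0.5, start_at_divergence=False)
undiluted = incremental_gain_per_patient(1000, 0.8, 0.5, start_at_divergence=True)
print(round(diluted, 3), round(undiluted, 3))
```

With these assumed inputs, the per-patient increment falls from 0.5 to 0.4 when the model starts before the point of divergence, because the denominator includes the 200 patients who died during the shared lines and could never benefit from the difference between the sequences.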

A number of studies developed a model based on an existing approach. Existing modelling approaches could also, potentially, be adapted for use in a different disease condition. However, when using an existing model, it is important to consider what underlying assumptions regarding treatment sequences were applied. For example, the York psoriasis model [126], which has subsequently been used by multiple studies evaluating treatment sequences in psoriasis [141, 160, 161], is based on the underlying assumption of treatment independence. The underlying assumptions of some existing modelling approaches mean that they will not be suitable for assessing the treatment sequences for some chronic conditions.

4.5 Taxonomy of Simplifying Assumptions Relating to Treatment-Sequencing Effects

The taxonomy of simplifying assumptions (Table 3) provides a unique and important resource to inform future practice and has the potential to be an important tool for clarifying the extent to which treatment-sequencing effects have been accounted for within a decision model. It can be used as a checklist by modellers to help them consider whether treatment sequencing should be modelled, and what implicit assumptions they may be making. It can also be used by reviewers or policy decision makers to appraise or better understand an existing model. However, to apply the taxonomy, better reporting of the simplifying assumptions made is required.

Our taxonomy focused on the simplifying assumptions made regarding the initial treatment effect (of discrete treatments conditional on their position in the treatment pathway). This incorporates the impact of previous treatment, differential reason for discontinuing previous treatment, and increasing disease duration. However, the taxonomy did not consider the assumptions made about the long-term effect of treatment. Many treatments of chronic conditions, such as rheumatoid arthritis, result in an initial, short-term improvement, followed by a period of waning effect. In some models, when patients move quickly through the sequence of treatments (for example, early discontinuation due to adverse effects), simulated patients can actually benefit from having multiple 'short-term' benefits from different treatments, thus gaining an additive effect. Some included models of inflammatory arthritis attempted to overcome this problem by introducing a 'rebound' effect, which automatically returns the patient to their starting severity (used in, for example, the Diamantopoulos model [89]), or following some natural, background increase (as used in the BRAM [158]). Although the evidence to support this type of assumption is weak, it is arguably better than the false benefits generated by models otherwise. Similarly, the issue of accumulating short-term benefit can also be problematic where there is an asymmetry in the sequences being compared, for example, the ‘adding’ decision problem illustrated in Box 2. A false benefit can be introduced when modelling a sequence plus new treatment, in comparison to the model without the added treatment, simply by allowing more 'short-term' effects of treatments.
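The accumulation of short-term benefits, and the way a rebound assumption removes it, can be shown with a minimal sketch. This is not the implementation used in the Diamantopoulos model or the BRAM: the function name, the severity scale (lower is better), and the improvement values are assumptions introduced here, and waning of effect and natural disease progression are deliberately left out.

```python
def severity_after_sequence(start_severity, initial_improvements, rebound):
    """Severity score after the initial effect of the last treatment in a sequence.

    start_severity: severity on entering the model (lower is better).
    initial_improvements: assumed short-term improvement for each treatment,
        in the order given; each treatment is discontinued early.
    rebound: if True, the patient returns to the starting severity on
        discontinuation, before the next treatment begins.
    """
    severity = start_severity
    for improvement in initial_improvements:
        if rebound:
            # Benefit of the previous treatment is lost when it is stopped.
            severity = start_severity
        # Apply the initial benefit of the current treatment (floor at zero).
        severity = max(0.0, severity - improvement)
    return severity


# Hypothetical 'adding' problem: the intervention sequence has one extra treatment.
baseline = [0.5, 0.5]
intervention = [0.5, 0.5, 0.5]
for rebound in (False, True):
    print(rebound,
          severity_after_sequence(2.0, baseline, rebound),
          severity_after_sequence(2.0, intervention, rebound))
```

Without a rebound, the longer sequence ends at a lower severity (0.5 versus 1.0) purely because it contains one more short-term effect. With the rebound, both sequences end at 1.5, so the false benefit from the added treatment disappears in this simplified setting.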

4.6 Recommendations for Research

An important outcome of the review is the gaps identified in the research evidence. More research is needed to establish when it is necessary to evaluate treatment sequences, and how best to make this decision. This is likely to be a condition-specific endeavour, but the methods will be relevant across different clinical scenarios.

Further research is needed to identify how best to develop a summary treatment effect of whole sequences or discrete interventions conditional on positioning in the sequence. This requires improved reporting on previous and subsequent treatment within primary studies, including better data on reasons for discontinuing or switching treatment. Access to individual patient-level data is also key here [35, 162].

Real-world disease-specific data sources can provide essential follow-up data on entire treatment sequences, and potentially be used to emulate a pragmatic randomised trial of dynamic treatment sequences [27, 163,164,165]. If these data sources are going to be useful, treatment sequences need to be considered during the planning and development stages. They will also need to go through many high-quality validation studies [164]. The evaluation of whole treatment sequences using real-world data also needs to take into account the potential biases listed in Box 1.

Finally, little reference was made within existing research to the potential or actual role of incorporating patient perspectives into the evaluation of treatment sequences. Further work is needed to develop the optimal approach for involving members of the public in HTA of treatment sequences, which should be informed by existing guidance and recent research on patient and public involvement in systematic reviews and economic evaluations [166,167,168,169,170,171]. As experience-based experts, patients can contribute essential knowledge that is complementary to that of other key stakeholders, such as clinicians and policy makers. Their involvement, on an equal basis to other stakeholders, is likely to be relevant to all stages of the HTA, including refining the scope and decision problem, the evidence synthesis, evidence interpretation and integration, and dissemination and application [172].

5 Conclusions

The review illuminates a significant gap in methods development. It also demonstrates important limitations in the primary studies, which tended to focus on the evaluation of discrete treatments, with poor reporting of any previous or subsequent treatments. The increasing use of NMA in HTA demonstrates an acknowledgment that clinical and policy decision making should account for the multiple treatments available for many chronic conditions. However, the sequential use of these treatments has yet to be accounted for within clinical evaluations, with most meta-analyses being conducted of discrete treatments that may or may not be stratified by line of therapy. The economic modelling exposes the need to consider treatment sequences, but this is often based on the simplifying assumption of treatment independence. This can lead to misrepresentation of the true level of uncertainty, potential bias in estimating the effectiveness and cost-effectiveness of treatments, and eventually the wrong decision.

In summary, there has been no co-ordinated approach to the important issue of evaluating the effectiveness and cost-effectiveness of treatment sequencing. This is a major shortfall at a time when the cohort of people with complex chronic conditions, requiring sequential treatments, is increasing. The findings of the review will help policy makers and researchers gain traction in answering questions about the effectiveness of different treatment sequences.