Key Points for Decision Makers

The new oral anticoagulants, dabigatran, rivaroxaban, and apixaban, are likely to yield additional health benefits in terms of quality-adjusted life-years as compared with warfarin.

Differences in health gains are, however, relatively small and prices are high.

Sequential dabigatran (150 mg up to age 80 years, thereafter 110 mg as recommended by the European Society of Cardiology) is the strategy most likely to be considered cost effective, regardless of risk group.

When reducing the dabigatran dosage at age 75 years (instead of at age 80 years), apixaban becomes the most effective and cost-effective alternative.

Conclusions are highly dependent upon assumptions made in the analysis.

1 Introduction

Atrial fibrillation (AF) is a common heart condition characterized by an irregular heartbeat and is a major risk factor for stroke [1]. To reduce the risk of stroke, AF patients have for decades been treated with the oral anticoagulant warfarin. However, warfarin increases the risk of bleeding and requires close monitoring and follow-up by physicians. Balancing the decreased risk of stroke against the increased risk of bleeding may be difficult for both doctors and patients, because the appropriate dosage of warfarin is highly individual.

As the average age in most countries increases, so does the prevalence of AF, now reported to be between 1.5 and 2 % in developed countries [2]. The most recent US stroke statistics report an AF prevalence of between 0.5 and 8.8 % across age groups [3]. More than 70 % of AF patients in the USA were reported to use anticoagulants [3], with somewhat lower proportions in Europe [4].

Until recently, guidelines recommended the use of warfarin for AF patients with two additional risk factors for stroke [5]. With the introduction of new oral anticoagulants (NOACs), guidelines have become less strict; oral anticoagulants are now recommended for AF patients with only one additional risk factor [6, 7].

The increasing prevalence of AF, combined with more patients being eligible for treatment, makes it increasingly important to explore whether the health benefits of NOACs outweigh their downsides. If the net health effects are positive, is the gain in effectiveness large enough to justify the added cost? Secondly, if NOACs can be considered cost-effective alternatives to warfarin, it is of interest which of the three yields the most value for money. In the first 4 years after the introduction of dabigatran (the first NOAC), several economic evaluations compared NOACs with warfarin [8]. However, most of these compared only one NOAC with warfarin. This is not in concordance with recently published guidelines for health economic evaluations, which recommend that analysts include all feasible interventions in the analyses [9]. To date, only two published analyses have compared all oral anticoagulants in one analysis [10, 11]. Both were performed before the price of apixaban was available in the relevant countries and are therefore based on an assumed price. This is an important limitation, as conclusions on cost effectiveness are highly price dependent.

Our objective in this article was to first develop a model that simulates what happens to AF patients taking warfarin or the NOACs, and second to estimate the cost effectiveness of the NOACs compared with warfarin and each other. The perspective of the analysis is the Norwegian publicly financed healthcare system.

2 Methods

In order to evaluate the relative cost effectiveness of dabigatran, rivaroxaban, apixaban, and warfarin, we created a probabilistic decision-analytic model, the NOrwegian STRoke and Atrial fibrillation model (NOSTRA). NOSTRA was created as a Markov model, built around clinical events and health states. A Markov structure was chosen in order to capture the long-term health and cost consequences of the different treatment options. A simplified version of the model that illustrates all events in the first cycles is depicted in Fig. 1, while the full structure, as created in TreeAge Pro®, is available in Appendix 1 of the Electronic Supplementary Material.

Fig. 1

Model structure (arrows represent health events, while boxes are health states). AF atrial fibrillation, AMI acute myocardial infarction, GI gastrointestinal, intra. intracranial, isch. ischemic, mod. moderate, prev. previous, seq. sequelae, sev. severe

In addition to the health state “AF without further complications,” five health outcomes with considerable potential impact on patients were included in the NOSTRA model: ischemic stroke (IS), intracranial bleeding (IB), major gastrointestinal bleeding (MGIB), acute myocardial infarction (AMI), and death (Fig. 1). Risks of events are incorporated into the model as 1-year probabilities adjustable for different risk factors according to CHA2DS2-VASc and HAS-BLED [12, 13]. CHA2DS2-VASc and HAS-BLED are scoring algorithms that divide patients into risk groups according to how many of a predefined set of clinical risk factors they have. For CHA2DS2-VASc, these are Congestive heart failure, Hypertension, Age ≥ 75 years, Diabetes mellitus, Prior stroke (or transient ischemic attack, or thromboembolism), Vascular disease, Age 65–74 years, and Sex (female). For HAS-BLED, the risk factors are Hypertension, Abnormal renal or liver function, Stroke, Bleeding, Labile international normalized ratio (INR), Elderly (age > 65 years), and Drugs or alcohol.
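The two scores can be illustrated with a short Python sketch. The point weights used here (2 points for age ≥ 75 years and for prior stroke/TIA/thromboembolism in CHA2DS2-VASc and 1 point for every other item; up to 1 point each for renal and liver dysfunction and for drugs and alcohol in HAS-BLED) come from the original scoring systems [12, 13] rather than from this article, and the `Patient` record and its field names are hypothetical. The sketch also simplifies some item definitions (for example, HAS-BLED strictly counts only uncontrolled hypertension).

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    age: int
    female: bool = False
    heart_failure: bool = False
    hypertension: bool = False      # simplification: same flag used in both scores
    diabetes: bool = False
    prior_stroke: bool = False      # stroke, TIA, or thromboembolism
    vascular_disease: bool = False
    abnormal_renal: bool = False
    abnormal_liver: bool = False
    prior_bleeding: bool = False
    labile_inr: bool = False
    drugs: bool = False             # e.g., antiplatelet agents or NSAIDs
    alcohol: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """Stroke-risk score (0-9)."""
    age_points = 2 if p.age >= 75 else (1 if p.age >= 65 else 0)
    return (int(p.heart_failure) + int(p.hypertension) + age_points
            + int(p.diabetes) + 2 * int(p.prior_stroke)
            + int(p.vascular_disease) + int(p.female))


def has_bled(p: Patient) -> int:
    """Bleeding-risk score (0-9)."""
    return (int(p.hypertension)
            + int(p.abnormal_renal) + int(p.abnormal_liver)
            + int(p.prior_stroke) + int(p.prior_bleeding)
            + int(p.labile_inr) + int(p.age > 65)
            + int(p.drugs) + int(p.alcohol))


# The two risk profiles used in the primary analyses
medium = Patient(age=75)                                   # age is the only risk factor
high = Patient(age=75, hypertension=True, diabetes=True)   # one possible 4/2 profile
print(cha2ds2_vasc(medium), has_bled(medium))   # 2 1
print(cha2ds2_vasc(high), has_bled(high))       # 4 2
```

As a check, the sketch reproduces the score combinations used for the medium-risk (2 and 1) and high-risk (4 and 2) groups; the article does not state which risk factors make up the high-risk profile, so the one above is only one possibility.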

After each event, patients are either successfully treated for their condition and assumed to be more or less asymptomatic, or assumed to have developed heart failure or stroke sequelae. The dark arrows in Fig. 1 illustrate the possible events in the first cycle of the model, while the white arrows illustrate the possible events in later cycles. The final outcome of the model is death, and simulations were stopped when all patients entering the model would have reached 105 years of age; hence, a lifetime horizon was applied.
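To show how a Markov cohort with annual cycles and a lifetime horizon accumulates discounted QALYs, here is a deliberately reduced three-state sketch (AF, post-stroke, dead). It is not the NOSTRA structure: the states, transition probabilities, and utilities are hypothetical placeholders, probabilities are held constant with age for brevity, and QALYs are discounted from the start of each cycle, which is only one of several possible conventions. The stop at age 105 follows the text, and the 4 % discount rate follows the Norwegian guidelines cited in the Methods.

```python
def run_cohort(start_age, p_stroke, p_die_af, p_die_post, u_af, u_post,
               max_age=105, discount_rate=0.04):
    """Expected discounted QALYs per patient for a hypothetical 3-state cohort.

    States: AF without complications, post-stroke, dead. One cycle = 1 year.
    """
    if p_stroke + p_die_af > 1.0:
        raise ValueError("outgoing probabilities from the AF state exceed 1")
    af, post, dead = 1.0, 0.0, 0.0          # whole cohort starts in the AF state
    total_qaly = 0.0
    for t in range(max_age - start_age):
        # QALYs accrued by the living cohort in cycle t, discounted to year 0
        total_qaly += (af * u_af + post * u_post) / (1.0 + discount_rate) ** t
        # Transitions at the end of the cycle, computed from current occupancy
        to_post = af * p_stroke
        to_dead = af * p_die_af + post * p_die_post
        af = af * (1.0 - p_stroke - p_die_af)
        post = post * (1.0 - p_die_post) + to_post
        dead += to_dead
    return total_qaly, (af, post, dead)


# Hypothetical inputs for a 75-year-old cohort
qalys, trace = run_cohort(75, p_stroke=0.03, p_die_af=0.06, p_die_post=0.15,
                          u_af=0.78, u_post=0.55)
print(round(qalys, 3), round(sum(trace), 6))   # second value: cohort mass, stays 1.0
```

In the full model each living state would also accrue costs in the same way, and the transition probabilities would depend on age, risk scores, and treatment.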

Baseline risks of clinical events in the model were mainly based on Scandinavian registry data [14–24], and to some extent on international data [25–29], as reported in Table 1. Risks of most primary events (IB, MGIB, AMI, and death) were based on a population of AF patients all using warfarin. The risk of IS was based on a registry of AF patients who did not use warfarin; this risk was adjusted downwards based on other registry data to better represent the risk for warfarin patients [30]. The proportions of stroke patients with severe (20 %) and moderate (48 %) sequelae after 1 year were based on Swedish registry data. Age-specific mortality rates from the general Norwegian population were multiplied by hazard ratios estimated in a Swedish population [31] to estimate mortality among AF patients in Norway.

Table 1 Key parameters

To model what happens to patients if they are started on NOACs instead of warfarin, rates of events were multiplied by hazard ratios of these events based on data from randomized clinical trials (Table 2). Efficacy data were based on intention-to-treat analyses from the three major randomized clinical trials comparing each of the NOACs with warfarin [32–34]. We used overall efficacy data for all different subgroups of CHA2DS2-VASc and HAS-BLED, since the overall estimate of efficacy can also be considered a reasonable estimate for subgroups [35].
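Multiplying rates (rather than probabilities) by hazard ratios requires converting 1-year probabilities to rates and back. A minimal sketch follows, assuming a constant hazard within each 1-year cycle; the article does not spell out its conversion, and the example numbers are hypothetical. The same conversion would apply to the adjustment of general-population mortality by AF-specific hazard ratios described above.

```python
import math


def prob_to_rate(p, years=1.0):
    """Convert a probability over `years` to a constant annual rate."""
    return -math.log(1.0 - p) / years


def rate_to_prob(rate, years=1.0):
    """Convert a constant annual rate to a probability over `years`."""
    return 1.0 - math.exp(-rate * years)


def apply_hazard_ratio(p_baseline, hazard_ratio, years=1.0):
    """Scale the underlying rate by a hazard ratio and return the new probability."""
    return rate_to_prob(prob_to_rate(p_baseline, years) * hazard_ratio, years)


# Hypothetical: 2 % annual ischemic-stroke risk on warfarin, HR 0.80 for a NOAC
p_noac = apply_hazard_ratio(0.02, 0.80)
print(round(p_noac, 5))
```

For small probabilities the result is close to simply multiplying the probability by the hazard ratio (about 0.016 here), but the conversion keeps probabilities below 1 when rates or hazard ratios are large.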

Table 2 Efficacy from included trials as reported in the Health Technology Assessment report [58] (incorporated as log-normal distributions)

Due to an increased risk of bleeding with dabigatran 150 mg, guidelines from the European Society of Cardiology (ESC) recommend taking dabigatran 2 × 110 mg rather than 2 × 150 mg above the age of 80 years. In the European public assessment report (EPAR) from the European Medicines Agency (EMA), the age of 75 years is suggested for some groups [36]. In our analyses, we assumed that patients who started on dabigatran (2 × 150 mg) before age 80 years switched to 2 × 110 mg on their 80th birthday (hereafter called sequential dabigatran). In a separate scenario analysis, we explored to what extent this assumption affected the results by moving the switch to age 75 years.
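The sequential dabigatran strategy amounts to an age-dependent choice of regimen within the cohort simulation. The hypothetical helper below (not taken from the NOSTRA model) returns the twice-daily dose for each annual cycle, with the switch age as a parameter so that the 80-year base case and the 75-year scenario can be compared.

```python
def dabigatran_schedule(start_age, switch_age=80, max_age=105):
    """Twice-daily dabigatran dose (mg) for each 1-year cycle under sequential dabigatran."""
    return [150 if age < switch_age else 110 for age in range(start_age, max_age)]


base_case = dabigatran_schedule(75)                 # switch on the 80th birthday
scenario = dabigatran_schedule(75, switch_age=75)   # switch at 75 years

print(base_case[:7])   # [150, 150, 150, 150, 150, 110, 110]
print(set(scenario))   # {110}
```

Because the cohort starts at 75 years, the 75-year scenario gives 110 mg in every cycle, so the strategy becomes identical to dabigatran 110 mg.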

All costs and prices used in the model were from 2012, except for those of the evaluated drugs, because apixaban did not have a price available until 2013. Prices of warfarin, dabigatran, rivaroxaban, and apixaban were obtained from the Norwegian Medicines Agency on 19 February 2013 (Table 3). Costs of monitoring related to warfarin treatment were based on cost estimates used in a Norwegian Single Technology Appraisal (STA) of dabigatran [37], with mean annual warfarin monitoring costs assumed to be €640 [€1 = 7.47 Norwegian kroner (NOK)]. Costs of events and health states were, to a large extent, based on data gathered for the Norwegian Cardiovascular Disease (NorCaD) model [38] and updated based on Norwegian fees and schedules from 2012 (see Appendix 2 in the Electronic Supplementary Material). Costs and effects were both discounted at 4 % according to Norwegian guidelines for economic evaluations of healthcare interventions [39]. Incremental effects in terms of quality-adjusted life-years (QALYs) were valued at a cost-effectiveness threshold of €79,000 per QALY, corresponding to a suggested threshold for Norway of NOK588,000 per QALY [39].
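The currency conversion behind the threshold is simple arithmetic. The sketch below uses only the exchange rate and amounts stated in the text and shows that NOK588,000 corresponds to roughly €78,700, which the article rounds to €79,000.

```python
NOK_PER_EUR = 7.47  # exchange rate used in the article


def nok_to_eur(nok):
    return nok / NOK_PER_EUR


def eur_to_nok(eur):
    return eur * NOK_PER_EUR


threshold_eur = nok_to_eur(588_000)
print(round(threshold_eur))      # 78715
print(round(eur_to_nok(640)))    # annual warfarin monitoring cost in NOK: 4781
```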

Table 3 Prices (€) of investigated drugs

QALY weights were incorporated into the model for each event and health state [40–44]. Weights for health states apply for whole years, while weights for events apply only for shorter periods (Table 1). All QALY weights in the model are based on the EQ-5D-3L (EuroQol 5 Dimensions 3 Levels), which is the most commonly used instrument for eliciting QALY values [45, 46].
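One common way to combine short-lived event weights with whole-year health-state weights is to apply the event weight for its duration and the state weight for the remainder of the cycle. The sketch below assumes exactly that; the article does not state its formula, and the utility values and duration in the example are hypothetical.

```python
def cycle_qaly(u_state, u_event=None, event_years=0.0):
    """QALYs for one 1-year cycle, optionally containing a short acute event."""
    if u_event is None:
        return u_state
    if not 0.0 <= event_years <= 1.0:
        raise ValueError("event duration must lie within the 1-year cycle")
    # Event weight for its duration, health-state weight for the rest of the year
    return event_years * u_event + (1.0 - event_years) * u_state


# Hypothetical: 3 months at 0.40 after an acute event, then 0.70 for the rest of the year
print(cycle_qaly(0.70, u_event=0.40, event_years=0.25))
```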

The NOSTRA model is a so-called probabilistic model: all parameters considered to be uncertain are included in the model as probability distributions rather than fixed values. Choices of probability distributions were based on common conventions, as described by, for instance, Briggs et al. [47]. The log-normal distribution was chosen for efficacy inputs because the standard error of relative risks is defined on the log scale, as also pointed out by, for instance, Altman [48]. Cost data are usually right skewed and have been shown to fit well with gamma distributions [49]; hence, we used gamma distributions for cost inputs. The relationship between binomial and beta distributions, and the fact that beta distributions are bounded by 0 and 1, make these the obvious choice for probability data [50]. For QALY inputs, we also chose beta distributions, which have been reported to be a reasonable choice when data are not close to zero [50]. In total, 204 different probability distributions were included; more detail can be found in Tables 1 and 2, or in Appendix 3 in the Electronic Supplementary Material. The model was evaluated by simulation: in each of 10,000 iterations, we drew a random value from each of the 204 probability distributions and recalculated the expected costs and health outcomes, in terms of QALYs, of the four drugs. Results are presented both as classical incremental analyses with incremental cost-effectiveness ratios (ICERs) and as net health benefit (NHB) analyses, and additionally as cost-effectiveness acceptability curves (CEACs) and frontiers (CEAFs).
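The distribution choices above can be sketched with Python's standard `random` module, which provides log-normal, gamma, and beta samplers. The method-of-moments parameterizations (gamma and beta from a mean and standard error, log-normal from a point estimate and 95 % confidence interval) are conventional ones of the kind described by Briggs et al. [47], but the parameter values below are hypothetical, and the sketch draws only three parameters per iteration rather than 204.

```python
import math
import random


def lognormal_from_ci(point, lower, upper, rng):
    """Draw a ratio (e.g., a hazard ratio) whose log is normal; SE taken from the 95 % CI."""
    sigma = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    return rng.lognormvariate(math.log(point), sigma)


def gamma_from_mean_se(mean, se, rng):
    """Right-skewed, non-negative draw for costs (method of moments)."""
    shape = (mean / se) ** 2
    scale = se ** 2 / mean
    return rng.gammavariate(shape, scale)


def beta_from_mean_se(mean, se, rng):
    """Draw bounded by 0 and 1 for probabilities and QALY weights (method of moments)."""
    k = mean * (1.0 - mean) / se ** 2 - 1.0
    if k <= 0:
        raise ValueError("standard error too large for a beta distribution")
    return rng.betavariate(mean * k, (1.0 - mean) * k)


def simulate(n_iter=10_000, seed=1):
    """Return one (hazard ratio, event cost, utility) tuple per iteration."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        hr = lognormal_from_ci(0.80, 0.70, 0.92, rng)      # hypothetical efficacy
        cost = gamma_from_mean_se(10_000.0, 2_000.0, rng)   # hypothetical cost (EUR)
        utility = beta_from_mean_se(0.70, 0.05, rng)        # hypothetical QALY weight
        draws.append((hr, cost, utility))
    return draws


draws = simulate()
mean_cost = sum(d[1] for d in draws) / len(draws)
print(len(draws), round(mean_cost))
```

In a full probabilistic analysis, each iteration's draws would be fed through the cohort model to recompute expected costs and QALYs for every strategy, and those per-iteration results would feed the ICER, NHB, CEAC, and CEAF summaries.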

Primary analyses were performed for a hypothetical group of 75-year-old AF patients with either medium or high risk of stroke and bleeding. The medium-risk group was defined as having no clinical risk factors apart from their age, i.e., a CHA2DS2-VASc score of 2 and a HAS-BLED score of 1. The high-risk group was defined as CHA2DS2-VASc = 4 and HAS-BLED = 2, which is approximately the risk profile of an average AF patient in Scandinavia [15].

We also performed some additional scenario analyses to test certain assumptions in the model. Firstly, we tested whether conclusions would change if we assumed only 4 years of treatment instead of life-long treatment, which was our base-case assumption.

Secondly, it has been hypothesized that follow-up of AF patients on NOACs will be less comprehensive in practice than in the clinical trials, so that the real-life effectiveness of the NOACs will be somewhat smaller than reported in the trials. To explore this, we repeated our analyses with a declining effectiveness of NOACs relative to warfarin. Effectiveness was decreased by 6 % per quarter, based on analyses from the EuroHeart survey [51], which indicate such a decline for warfarin. All three NOAC trials report similar discontinuation rates for warfarin and the NOACs; hence, the reported adherence to warfarin may be the best available data until more experience with the NOACs has accumulated.
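The article does not specify the functional form of the declining effectiveness. One plausible reading, sketched below under that assumption, shrinks the relative effect (1 − HR) multiplicatively by 6 % per quarter, so that hazard ratios drift toward 1 over time; the function name and the example hazard ratio are hypothetical.

```python
def waning_hazard_ratio(hr_trial, quarters, decline_per_quarter=0.06):
    """Hazard ratio after `quarters` of treatment, with the relative effect shrinking geometrically.

    Assumption: effectiveness is measured as 1 - HR, and the same decline is applied
    to HRs above 1 (harms), which therefore also move toward 1.
    """
    relative_effect = (1.0 - hr_trial) * (1.0 - decline_per_quarter) ** quarters
    return 1.0 - relative_effect


# Hypothetical trial HR of 0.80, evaluated at the end of years 1 to 4
for year in range(1, 5):
    print(year, round(waning_hazard_ratio(0.80, quarters=4 * year), 3))
```

The resulting hazard ratio for each cycle could then be passed to the same rate-based adjustment used for the trial hazard ratios.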

Because dabigatran 150 mg is not recommended for the elderly owing to a high risk of bleeding, our base-case analyses assumed that all patients on dabigatran 150 mg switched to 110 mg at the age of 80 years, as recommended by the ESC. However, the EMA suggests in the EPAR switching from 150 to 110 mg at age 75 years for some groups. Because our analyses start with 75-year-old patients, this scenario simply amounts to removing the strategy “sequential dabigatran” from the analyses.

3 Results

We estimated that 75-year-old AF patients with a medium risk of stroke according to CHA2DS2-VASc have on average 7.15 remaining QALYs if treated with warfarin (discounted 5.706) (Table 4). The expected increased effectiveness of using NOACs instead of warfarin was 0.100 QALYs for dabigatran (2 × 110 mg), 0.104 QALYs for rivaroxaban (1 × 20 mg), 0.146 QALYs for sequential dabigatran, and 0.153 QALYs for apixaban (2 × 5 mg). All NOACs increased lifetime costs compared with warfarin, despite the fact that a considerable reduction in doctor visits and INR testing was included.

Table 4 Expected lifetime results for medium-risk patients

The ICER compared with warfarin was €66,000 per QALY gained for dabigatran 110 mg and €30,000 for rivaroxaban, indicating that both are cost effective at a cost-effectiveness threshold of €79,000. However, sequential dabigatran had both a higher expected QALY gain and lower expected costs than both dabigatran 110 mg and rivaroxaban; hence, sequential dabigatran dominates the other two. When comparing apixaban with sequential dabigatran, the QALY gain of 0.007 gives an ICER of €80,000; hence, apixaban is marginally not cost effective compared with sequential dabigatran. Because conclusions on cost effectiveness depend heavily on the assumed cost-effectiveness threshold, we performed analyses with varying thresholds (Fig. 2). Based on our analyses, the optimal oral anticoagulant in terms of cost effectiveness is warfarin for thresholds below €16,000, sequential dabigatran for thresholds between €16,000 and €80,000, and apixaban for thresholds above €80,000.
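The incremental logic in this paragraph (ranking strategies by cost, removing dominated strategies, computing ICERs between adjacent non-dominated strategies, and picking the optimum at a given threshold via net monetary benefit) can be sketched as follows. The strategy names and cost/QALY values are hypothetical and are not the results reported in Table 4.

```python
def efficient_frontier(strategies):
    """Return [(name, ICER vs previous frontier strategy)] for non-dominated strategies.

    `strategies` is a list of (name, expected cost, expected QALYs).
    """
    ranked = sorted(strategies, key=lambda s: (s[1], -s[2]))  # cheapest first
    # Strong dominance: keep a strategy only if it adds QALYs over every cheaper one,
    # so QALYs strictly increase along the frontier (no zero denominators below)
    frontier = []
    for s in ranked:
        if not frontier or s[2] > frontier[-1][2]:
            frontier.append(s)

    def icer(a, b):
        return (b[1] - a[1]) / (b[2] - a[2])

    # Extended dominance: ICERs along the frontier must be increasing
    i = 1
    while i < len(frontier) - 1:
        if icer(frontier[i - 1], frontier[i]) > icer(frontier[i], frontier[i + 1]):
            del frontier[i]
            i = max(i - 1, 1)
        else:
            i += 1
    return [(frontier[0][0], None)] + [
        (frontier[k][0], icer(frontier[k - 1], frontier[k])) for k in range(1, len(frontier))
    ]


def best_at_threshold(strategies, threshold):
    """Strategy with the highest net monetary benefit (threshold * QALYs - cost)."""
    return max(strategies, key=lambda s: threshold * s[2] - s[1])[0]


hypothetical = [("A", 10_000, 5.00), ("B", 13_500, 5.10),
                ("C", 12_400, 5.15), ("D", 13_000, 5.16)]
for name, ratio in efficient_frontier(hypothetical):
    print(name, None if ratio is None else round(ratio))   # A None / C 16000 / D 60000
print(best_at_threshold(hypothetical, 79_000))              # D
```

Here B is dominated by C (higher cost, fewer QALYs), mirroring how sequential dabigatran dominates dabigatran 110 mg and rivaroxaban in the medium-risk results; the net-benefit rule gives the same optimum as comparing each ICER on the frontier with the threshold.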

Fig. 2

Cost-effectiveness acceptability curve and frontier of medium-risk atrial fibrillation patients. cost eff. cost effective, QALY quality-adjusted life-year

We estimated that AF patients with a high risk of stroke according to CHA2DS2-VASc have on average 5.79 remaining QALYs if treated with warfarin (discounted 4.771) (Table 5). The expected increased effectiveness of using NOACs instead was 0.119 QALYs for dabigatran (2 × 110 mg), 0.117 QALYs for rivaroxaban (1 × 20 mg), 0.175 QALYs for apixaban (2 × 5 mg), and 0.183 QALYs for sequential dabigatran. As for medium-risk patients, dabigatran (2 × 110 mg) and rivaroxaban (1 × 20 mg) increased lifetime costs compared with warfarin; however, apixaban (2 × 5 mg) and sequential dabigatran decreased lifetime costs compared with warfarin (by about €800 and €3,000, respectively). Because sequential dabigatran both increased effectiveness the most and decreased costs the most, it was clearly the most cost-effective alternative, a so-called “dominant strategy.”

Table 5 Expected lifetime results for high-risk patients

We plotted the CEAC for high-risk AF patients (Fig. 3). Although the decision seems clear based on the expected values, once parameter uncertainty is taken into account, sequential dabigatran has only a 58 % probability of being the cost-effective anticoagulant at the given threshold (€79,000 per QALY). Nevertheless, sequential dabigatran is the strategy most likely to be cost effective at all cost-effectiveness thresholds.

Fig. 3

Cost-effectiveness acceptability curve and frontier of high-risk atrial fibrillation patients. QALY quality-adjusted life-year

In scenario analyses we tested whether conclusions would change if we assumed only 4 years of treatment. These analyses indicated that sequential dabigatran was both the most effective and the most cost-effective alternative for both medium-risk and high-risk patients.

In scenario analyses testing a 6 % decrease in NOAC effectiveness per year, the probability of warfarin being cost effective increased from 0.02 to 0.2 % in medium-risk patients, but was unchanged at 0 % in high-risk patients.

In a separate scenario analysis, assuming that patients on dabigatran had to reduce their dose at age 75 years, apixaban was the most cost-effective strategy for both risk groups. Accordingly, the probability that apixaban was the cost-effective strategy among all NOACs increased from 43 to 63 % for medium-risk patients and from 35 to 70 % for high-risk patients.

In one-way sensitivity analyses, we varied the prices of the drugs that were not cost effective in the base case. In medium-risk patients, apixaban became cost effective at a price reduction of 0.7 % (€1,242 per annum instead of €1,250 per annum), while rivaroxaban became cost effective at a reduction of 79 % (€220 instead of €1,066). For high-risk patients, reducing the annual cost of apixaban to €646 (a 48 % reduction) made it cost effective compared with sequential dabigatran. For high-risk patients, no price reduction could make rivaroxaban cost effective, while dabigatran 110 mg and warfarin would never be cost effective in either risk group.
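Finding the price at which a strategy just becomes cost effective is a one-dimensional root-finding problem. A bisection sketch is shown below, assuming the ICER increases with the drug's annual price; the linear cost model in the example (5 discounted treatment-years, €2,000 of cost offsets, and a 0.05 QALY gain) is hypothetical and not taken from the NOSTRA model.

```python
def break_even_price(icer_at_price, threshold, low, high, tol=0.01):
    """Highest annual price in [low, high] whose ICER does not exceed `threshold`.

    Returns None if even `low` is not cost effective (no feasible price reduction).
    Assumes icer_at_price is increasing in price.
    """
    if icer_at_price(low) > threshold:
        return None
    if icer_at_price(high) <= threshold:
        return high
    while high - low > tol:
        mid = (low + high) / 2.0
        if icer_at_price(mid) <= threshold:
            low = mid
        else:
            high = mid
    return low


def hypothetical_icer(price):
    # Incremental cost = 5 treatment-years * price - EUR 2,000 of offsets; gain = 0.05 QALY
    return (5.0 * price - 2_000.0) / 0.05


print(round(break_even_price(hypothetical_icer, 79_000, 0.0, 2_000.0), 1))
```

The `None` return corresponds to the situation reported for rivaroxaban in high-risk patients, where no price reduction makes the strategy cost effective. In a probabilistic model, `icer_at_price` would re-run the model (or use expected values) at each candidate price.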

4 Discussion

All three NOACs are effective and can be considered cost effective compared with warfarin from a Norwegian perspective. Apixaban is the only NOAC whose point estimates favor it over warfarin for all five outcomes included in the NOSTRA model, as can be seen in Table 2 (although only the effects on IB and mortality are statistically significant at the 5 % level). For medium-risk patients, however, the added effectiveness of apixaban is marginally not cost effective compared with sequential dabigatran, since the ICER was €80,000 and the proposed cost-effectiveness threshold in Norway is €79,000.

Note also from Fig. 2 that apixaban has the highest probability of being cost effective at willingness-to-pay (WTP) thresholds above €60,000, even though its ICER versus sequential dabigatran is €80,000. The discrepancy between the probability of being cost effective and the expected cost effectiveness has previously been discussed in detail by, for example, Fenwick et al. [52], Claxton et al. [53], and Al [54]. The point made here is that if one or more input distributions are not symmetrical, then the intervention with the highest expected net benefit is not necessarily the intervention that is most likely to be cost effective.

For AF patients with a high risk of stroke, sequential dabigatran is both the most effective and the most cost-effective strategy, despite the fact that dabigatran 150 mg is associated with an increased risk of gastrointestinal bleeding and myocardial infarction compared with warfarin (and indirectly apixaban, as can be inferred from Table 2). This result arises because dabigatran showed somewhat better efficacy than apixaban and warfarin on both IS and IB in the clinical trials.

There is considerable uncertainty as to which of the oral anticoagulants is the most effective and cost effective for patients with AF. As shown both in the probabilistic analyses and in the scenario analyses, small changes in assumptions could easily change conclusions, not only regarding which NOAC to prefer, but also regarding warfarin. In both risk groups, the two strategies with the highest probabilities of being cost effective are clearly sequential dabigatran and apixaban. For medium-risk patients, the choice between the two seems undecided, since the ICER is so close to the suggested Norwegian WTP threshold, which is only a reference value. For patients with a higher risk of stroke, the negative effects of dabigatran 150 mg are outweighed by its positive effects, and sequential dabigatran has the highest probability of being cost effective.

Needless to say, the prices of the NOACs affect the results. This is especially so for medium-risk patients, where a price reduction of less than 1 % for apixaban makes it cost effective compared with sequential dabigatran. Note also that for medium-risk patients, warfarin is the most cost-effective alternative when the cost-effectiveness threshold is low. Hence, if the Norwegian threshold is lowered, warfarin may be a cost-effective alternative.

As mentioned in the Introduction, we have found only two economic evaluations comparing all NOACs in one analysis, as we do here. Their results were close to ours, in that apixaban and dabigatran (150 mg) were both cost effective compared with warfarin in the base case, with conclusions favoring apixaban in one study and dabigatran in the other [10, 11]. The models used in these two analyses appear relatively similar to ours; the number of health states, age groups analyzed, follow-up time, discounting, and several other factors were similar. However, there are a few differences. Both of the other models include more health events than ours, such as minor bleeding. This may be a limitation of our model, but because these bleeds are minor, both their QALY impact and their cost impact are presumably small. One advantage of our model is that it facilitates analyses with the newer risk-scoring system CHA2DS2-VASc, which is recommended by, for instance, the ESC [2]. Note also that QALY gains from our model are very close to those reported by Coyle et al. [10], while Harrington et al. [11] report much higher incremental gains.

In the present study, we have presented the CEAC and CEAF in the same figure (Figs. 2 and 3), in an attempt to combine the advantages of the two. The CEAC gives information on the probability of each intervention being the most cost effective, while the CEAF gives this information only for the strategy with the highest expected net benefit.
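The difference between the two curves, and the point about asymmetric distributions made earlier in this Discussion, can be illustrated with a small sketch. Given per-iteration costs and QALYs from a probabilistic analysis, the CEAC records how often each strategy has the highest net monetary benefit, while the CEAF records which strategy has the highest mean net monetary benefit. The function name and the toy draws below are hypothetical, chosen so that the two disagree.

```python
def ceac_and_ceaf(costs, qalys, thresholds):
    """costs[s] and qalys[s] hold one value per PSA iteration for strategy s.

    Returns (ceac, ceaf): ceac[t][s] is the share of iterations in which s has the
    highest net monetary benefit at threshold t; ceaf[t] is the strategy with the
    highest mean net monetary benefit at t.
    """
    names = list(costs)
    n_iter = len(costs[names[0]])
    ceac, ceaf = {}, {}
    for t in thresholds:
        wins = {s: 0 for s in names}
        total_nmb = {s: 0.0 for s in names}
        for i in range(n_iter):
            nmb = {s: t * qalys[s][i] - costs[s][i] for s in names}
            wins[max(names, key=nmb.get)] += 1
            for s in names:
                total_nmb[s] += nmb[s]
        ceac[t] = {s: wins[s] / n_iter for s in names}
        ceaf[t] = max(names, key=total_nmb.get)
    return ceac, ceaf


# Hypothetical, skewed draws: Y is usually slightly worse than X but occasionally much better
costs = {"X": [0.0] * 10, "Y": [1.0] * 7 + [-5.0] * 3}
qalys = {"X": [0.0] * 10, "Y": [0.0] * 10}
ceac, ceaf = ceac_and_ceaf(costs, qalys, [79_000])
print(ceac[79_000], ceaf[79_000])   # {'X': 0.7, 'Y': 0.3} Y
```

X is most likely to be cost effective (70 % of iterations), yet Y has the higher expected net benefit and therefore lies on the frontier; this is the same kind of discrepancy seen for apixaban between €60,000 and €80,000 in Fig. 2.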

There are several limitations in this economic evaluation. The most obvious may be that in modeling some simplifications have to be made [55]. In the NOSTRA model, we have only included some of the events and health states that could potentially be included. We have, however, included the most severe and prevalent, so the model should be thorough enough to have captured most of the relevant differences between the NOACs and warfarin. Additional health outcomes, which potentially could have been included in NOSTRA, are major bleedings (not gastrointestinal) and hemorrhagic stroke. The first of these was not included because it would require splitting the outcome “major bleedings” from the trials into subgroups of bleedings in order to find proper cost estimates. The second outcome (hemorrhagic stroke) was excluded because of the major overlap with IB, which would imply double counting if it was included. With a total of eight different health states and 204 different probability distributions included, the NOSTRA model is clearly among the more comprehensive models.

A model is, however, never better than its input parameters. Efficacy data for each comparison with warfarin are based on a single RCT, and direct comparisons between the NOACs have not yet been studied. To prioritize between the different NOACs, comparative effectiveness data are required, preferably from head-to-head RCTs. The relatively short follow-up in the included trials is also a limitation: these drugs are intended for use for the remainder of people’s lives, yet they have been studied for only approximately 2–3 years. Even though this may be sufficient with regard to efficacy, several safety issues demand longer follow-up. Another issue is that all three trials were designed as non-inferiority trials. When non-inferiority trials indicate superiority over warfarin for some outcomes, this is usually taken to be hypothesis generating. Hence, based on those trials, we have generated a hypothesis that NOACs are better than warfarin. However, this has not yet been tested in a superiority trial, which could be seen as a limitation of our modeling results.

A re-analysis of one of the RCTs on which we have based our efficacy data [56] indicated that the effect of dabigatran versus warfarin is smaller (if any) for the subgroup of patients in whom INR levels are well-controlled. The Nordic countries are among the countries with the best INR control, also exemplified in the re-analysis. However, the re-analysis did indicate that Norway might have the worst INR control within this subgroup. Our cost–utility analysis is based on efficacy data from each study as a whole, and no subgroup analyses of efficacy have been used. If similar subgroup differences are shown for the other NOACs, both efficacy and cost effectiveness would be affected. Hence, it would be relevant to re-do both meta-analyses and cost–utility analyses to see whether the NOACs are still effective and cost effective compared with warfarin in settings with different levels of INR control.

Another possible limitation of these analyses is that we had no data on patient compliance apart from what is reported in the trials. In general practice, patients on warfarin are likely to be followed up more closely than patients on NOACs, and the level of follow-up from doctors is likely to be linked to patients’ compliance. In two of the three trials on which we based our efficacy data, both patients and doctors were blinded with regard to treatment, so patients on NOACs were followed up as closely as patients on warfarin. Hence, we cannot be sure of the effectiveness of these drugs in real practice.

Valuing health-related quality of life is not straightforward, and the use of QALYs requires caution. In the present article, we decided to base all QALY values on EQ-5D data. EQ-5D is the most widely used instrument and the only one for which we could find published values for all events and health states in our model. The QALY values for the different health events and states are based on four different studies (see Table 1). It would, of course, be better if EQ-5D had been reported for all these outcomes in one single study. Because the values come from different sources, their internal ranking may have been shifted, which in turn may have influenced the cost-effectiveness results.

Approximately 65,000 patients in Norway and at least 120,000 in Denmark have AF [15, 57]. Of these, about 10 % have a low risk of stroke and are not eligible for anticoagulant treatment. Another subpopulation is not eligible for NOACs due to co-morbidity or possible drug interactions; we have no data on this percentage, but as a rough estimate we assume it to be 10–20 % of AF patients. Based on this, around 70–80 % of AF patients would be eligible for anticoagulant treatment. With a European population of 750 million and a prevalence of 1.5–2 %, between 11 and 15 million people could be assumed to have AF and, of these, perhaps 10 million are eligible for NOACs. A price increase of close to €1,000 per year per patient would mean around €10 billion in extra expenditures each year if NOACs were to be used in all these patients instead of warfarin.
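The back-of-the-envelope budget estimate in this paragraph can be reproduced as follows; all inputs are the rough figures given in the text (750 million people, 1.5–2 % AF prevalence, 70–80 % eligible, about €1,000 extra per patient-year).

```python
population = 750e6
prevalence_range = (0.015, 0.02)   # AF prevalence in developed countries
eligible_range = (0.70, 0.80)      # share eligible for NOACs, as assumed in the text
extra_cost_per_year = 1_000.0      # EUR per patient-year, approximate

af_low, af_high = (population * p for p in prevalence_range)
eligible_low = af_low * eligible_range[0]
eligible_high = af_high * eligible_range[1]
budget_impact = 10e6 * extra_cost_per_year   # "perhaps 10 million" eligible patients

print(f"AF patients: {af_low / 1e6:.2f}-{af_high / 1e6:.1f} million")
print(f"Eligible: {eligible_low / 1e6:.1f}-{eligible_high / 1e6:.1f} million")
print(f"Extra expenditure: EUR {budget_impact / 1e9:.0f} billion per year")
```

The "perhaps 10 million" figure falls within the resulting range of roughly 8 to 12 million eligible patients.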

5 Conclusion

There is considerable uncertainty regarding which of the oral anticoagulants is the most cost-effective alternative. However, apixaban and dabigatran (150 mg up to age 80 years, 110 mg after age 80 years) seem to be the most effective and cost-effective alternatives. Warfarin can only be a cost-effective alternative in Norway if the threshold for cost effectiveness is much lower than assumed.