Key Points for Decision Makers

Statistical analysis found no significant differences between ablation types for any of the key clinical outcomes used in the economic model.

The streamlined day-case atrial fibrillation ablation protocol was estimated to be cheaper than the conventional cryoballoon ablation protocol.

Compared with an optimised AAD protocol, ablation significantly reduced the weekly rate of follow-up ablations.

The streamlined day-case atrial fibrillation protocol was cost effective versus AADs in a UK National Health Service (NHS) setting at the £30,000 per quality-adjusted life-year threshold.

The economic model also estimated that, over a lifetime horizon, the streamlined ablation protocol reduced the stroke rate by 4% per person compared with optimised AADs.

1 Introduction

Ablation is a recognised treatment option for individuals with atrial fibrillation (AF) who remain symptomatic despite optimised antiarrhythmic drug (AAD) therapy. Although ablation can be performed as a day case, it usually involves an overnight admission [1].

The AVATAR-AF trial (NCT02459574) [2, 3] was a randomised, multicentre, open-label trial whose primary hypothesis was the superiority of a ‘streamlined’ AVATAR ablation protocol over optimisation of AAD therapy. It enrolled 321 participants with uncontrolled paroxysmal AF across 13 centres in the UK, who were randomised to one of three initial treatment arms (AVATAR protocol, conventional ablation protocol or optimised AAD protocol).

The AVATAR protocol arm comprised patients who received a cryoballoon ablation (Arctic Front Advance™, Medtronic) performed as a day-case procedure without assessment of acute pulmonary vein (PV) isolation. The conventional ablation protocol arm comprised patients who received a standard cryoballoon ablation (Arctic Front Advance™, Medtronic) with formal assessment of acute PV isolation and overnight hospitalisation [2, 3]. Assessing PV isolation requires operators trained in this specialist technique; being able to perform ablation without it therefore represents a potential cost saving to the healthcare system. Further details of the trial can be found in Kanagaratnam et al. [3]. Because of the real-world nature of the study, 54% of patients initially randomised to the optimised AAD protocol elected to receive conventional ablation during the study. The optimised AAD protocol arm therefore represents a mix of patients treated with AADs only and with AADs plus ablation.

The objective of our study was to assess whether the streamlined day-case AVATAR protocol is a cost-effective alternative to either a conventional cryoablation protocol or an optimised AAD protocol. As part of this objective, we also compared the clinical efficacy of interventional treatment with that of optimised AAD therapy by pooling data from both ablation protocols (AVATAR and conventional cryoablation).

2 Methods

This study used a two-stage approach to develop the economic evaluations of the protocols. In the first stage, individual patient data (IPD) from the AVATAR-AF study were used to derive prognostic equations predicting the long-term rates of follow-up ablation, symptom development, symptom recovery and AF-related hospital attendance, as well as health-related quality of life (HRQoL) utilities. In the second stage, these equations were used, where possible, to parameterise a cost-effectiveness model for either a three-way comparison (AVATAR protocol vs. conventional ablation protocol vs. optimised AAD protocol) or a two-way comparison (pooled ablation protocol data vs. optimised AAD protocol).

2.1 Stage One: Statistical Analyses of the AVATAR-AF Trial Data

All outcomes listed above were modelled as functions of treatment arm, with selected additional covariates of potential clinical relevance included to produce adjusted mean estimates. Generalised linear models (GLMs) with either a Poisson distribution (log link) or a Gaussian distribution (identity link) were used for all outcomes. The distribution for each model was chosen based on the type of dependent variable (e.g., count or continuous) and diagnostic criteria (e.g., the Akaike information criterion [AIC]).

An offset variable was included in the long-term follow-up models to account for each patient's exposure time, so that these outcomes were modelled as a rate per week rather than an absolute count. To meet the requirements of the National Institute for Health and Care Excellence (NICE) reference case [4], EQ-5D-5L responses were mapped onto the EQ-5D-3L using the van Hout algorithm [5] to generate utilities before any statistical analysis.
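To illustrate the rate-per-week modelling described above, the following is a minimal pure-Python sketch (not the authors' R code) of a Poisson GLM with a log link and a log-exposure offset, fitted by Newton-Raphson. It assumes the offset is log(weeks of follow-up), which is the usual way to turn a count model into a rate model. The function names `fit_poisson_offset` and `solve` and the example data are hypothetical.

```python
import math


def solve(a, b):
    """Solve the small dense linear system a @ x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_poisson_offset(X, y, weeks, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = X @ beta + log(weeks) by Newton-Raphson.

    X: covariate rows, first column assumed to be the intercept (1.0).
    y: event counts (total must be > 0); weeks: exposure per patient (> 0).
    exp(beta[j]) is the weekly rate ratio for covariate j.
    """
    p = len(X[0])
    offset = [math.log(w) for w in weeks]
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / sum(weeks))  # start at the crude overall weekly rate
    for _ in range(max_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row)) + o)
              for row, o in zip(X, offset)]
        # Score U = X'(y - mu); Fisher information I = X' diag(mu) X (canonical log link)
        score = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X))
                 for j in range(p)]
        info = [[sum(mi * row[j] * row[k] for mi, row in zip(mu, X)) for k in range(p)]
                for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

With treatment arm as the only covariate, exp(β₁) reproduces the crude ratio of weekly event rates between arms. Adding sex and baseline age columns to each row of `X` gives adjusted estimates. In R, a comparable model could be fitted with `glm()` using `family = poisson` and `offset = log(weeks)` (variable names hypothetical).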

Patients who were randomised but withdrew before beginning treatment were removed from the analysis dataset. Where values of specific variables were missing for a small number of patients (< 5%) who did not withdraw, the data were assumed to be missing completely at random and were imputed using either multivariate imputation by chained equations or last observation carried forward. Finally, before all statistical analyses, 16 patients were removed from the analysis dataset because they had completed the wrong version of the EQ-5D questionnaire, which was treated as a protocol deviation. This ensured that all HRQoL analyses used data collected with the same instrument and was intended to avoid introducing bias into the analyses.
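Here is a minimal sketch of last observation carried forward, applied to one patient's ordered sequence of visit values with `None` marking a missing value. The function name is hypothetical, and multivariate imputation by chained equations is not shown.

```python
def locf(values):
    """Last observation carried forward for one patient's ordered visit values.

    None marks a missing value. Values missing before the first observation
    remain None, because there is nothing to carry forward.
    """
    filled, last = [], None
    for v in values:
        if v is not None:  # 'is not None' so a genuine 0 is treated as observed
            last = v
        filled.append(last)
    return filled
```

For example, `locf([0.8, None, 0.7, None])` carries 0.8 into the second visit and 0.7 into the fourth.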

All statistical analyses used a 12-week ‘blanking period’ after the initial procedure. This follows the Expert Consensus Statement on Catheter and Surgical Ablation of Atrial Fibrillation, which recommends that AF recurrences within the first 3 months should not be counted [6]. Although a subset of patients required early re-intervention, events that occurred within the first 3 months were excluded from the reablation rate calculations to align with clinical guidance and the primary manuscript [2]. Nevertheless, the costs of these early additional ablation procedures were captured in the decision tree portion of the economic model.
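The following sketch shows one way a 12-week blanking period could be applied before computing each patient's event count and exposure for the rate models. It is illustrative only and rests on three assumptions: event times are recorded as weeks since the index procedure, an event at exactly week 12 falls inside the blanking window, and exposure starts at the end of blanking. The function name is hypothetical.

```python
BLANKING_WEEKS = 12  # 3-month blanking period after the index procedure


def post_blanking(event_weeks, followup_weeks, blanking=BLANKING_WEEKS):
    """Return (event count, exposure in weeks) after removing the blanking period.

    event_weeks: weeks since the index procedure at which each event occurred.
    followup_weeks: total weeks of follow-up for the patient.
    Events at or before `blanking` weeks are excluded (assumed convention).
    """
    count = sum(1 for w in event_weeks if w > blanking)
    exposure = max(followup_weeks - blanking, 0)
    return count, exposure
```

Patients with zero post-blanking exposure would need to be dropped before log(exposure) is used as an offset.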

Two distinct sets of statistical analysis were undertaken. The first considered a three-way comparison (AVATAR protocol vs. conventional ablation protocol vs. optimised AAD protocol). If the three-way comparison showed no significant differences between the AVATAR and conventional ablation protocols, their data would be pooled to increase statistical power for comparing ablation with optimised AADs in a two-way analysis.

The final selection of variables for each statistical model was outcome-dependent and was made by stepwise deletion. All GLMs included sex and baseline age as covariates of interest. The HRQoL GLM also included symptom status as a covariate and adjusted for baseline EQ-5D. No second-order interactions were included in any model. All statistical tests were two-tailed, and significance was defined as p < 0.05.

2.2 Stage Two: Description of the Economic Model

2.2.1 Model Design

The cost-effectiveness model was a hybrid of a decision tree with a 1-year time horizon and a Markov model with a lifetime time horizon (Fig. 1). Costs and benefits were captured in both parts of the model, and the endpoint allocation from the decision tree formed the initial state allocation in the Markov model. In line with NICE methodological guidance, a UK National Health Service (NHS) perspective was adopted and all benefits were expressed as quality-adjusted life-years (QALYs). The Markov model used a 3-month cycle length, chosen to align with the trial's follow-up appointments, which occurred every 3 months. All costs and benefits, in both parts of the model, were discounted at 3.5% per annum.
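With 3-month cycles, one common way to apply a 3.5% annual discount rate is a per-cycle discount factor based on elapsed time in years. The text above does not state the exact convention used, so the following is an assumption. For cycle t = 0, 1, 2, …, elapsed time is t/4 years, and X_t is the cost or QALYs accrued in cycle t:

```latex
D_t = \frac{1}{(1 + 0.035)^{t/4}}, \qquad
\text{discounted total} = \sum_{t} D_t \, X_t
```

For example, D_4 = 1/1.035 ≈ 0.9662 (after 1 year) and D_40 = 1/1.035^10 ≈ 0.7089 (after 10 years).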

Fig. 1

Schematic of the economic model. (a) Decision tree covering the first 12 months of the economic model. The AAD decision nodes are identical to those presented for AVATAR-AF. (b) Markov model where decision tree endpoints constitute initial allocation. The Markov model covers the remaining lifetime of the economic model. Death is an absorbing state and movement into it is permissible from all other health states. The numbers at the end of each health state indicate the total number of ablations (post initial treatment) in each health state. Patients could have up to a maximum of three total ablations (including the initial procedure). Stroke, heart failure and other adverse events are not modelled as health states but rather as events that can occur to individuals within the health states, and therefore have not been included in the model schematic. AF atrial fibrillation, AAD antiarrhythmic drug, NSR normal sinus rhythm, ST short-term, LT long-term

In the decision tree, the patient pathway was captured using three health states. The first was NSR (‘Normal Sinus Rhythm’, defined as no AF episodes recorded within a 3-month period). The second was Short-Term Episodic AF (‘ST-Episodic’, defined as at least one AF episode [either paroxysmal or persistent] documented within a 3-month period). The third was death. The Markov model included two additional health states. Long-term persistent AF (‘LT-Persistent’) was defined as the same symptoms as in the ST-Episodic health state, persisting for at least 12 months and not resolving spontaneously. Permanent AF was defined as AF from which NSR cannot be restored either spontaneously or through treatment.

In addition to tracking AF symptoms, the model allowed individuals a maximum of three ablation procedures (including the original one). For individuals in the optimised AAD arm, the first ablation procedure was classified as the first follow-up ablation to allow parity with the ablation arms; however, patients in this arm were still restricted to a maximum of three ablations.

During any cycle of the long-term model, an individual who underwent an ablation had their ablation count increased by one. They could then either move back to the NSR health state or remain in their current health state, in both cases with the higher ablation count. Those who did not receive an ablation in a given cycle could move between health states with the same ablation count. As such, the Markov model has 14 distinct health states.
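The cycle logic above, where ablation increments a capped ablation count and other movements keep the count, can be sketched as a cohort-level Markov step. This is a deliberately simplified, hypothetical illustration rather than the model itself. It tracks only NSR, ST-Episodic and death (omitting LT-Persistent and permanent AF), lets only ST-Episodic patients be ablated, uses made-up transition probabilities, and follows ablation-arm counting (at most two follow-up ablations after the index procedure). The names `markov_step` and `K_MAX` are hypothetical.

```python
from collections import defaultdict

K_MAX = 2  # follow-up ablations allowed after the index procedure (3 in total)


def markov_step(cohort, p_death, p_relapse, p_recover, p_ablate, p_success):
    """Advance a cohort one 3-month cycle.

    cohort maps (rhythm, follow-up ablation count) or "Death" to the cohort share.
    All probabilities are hypothetical per-cycle values.
    """
    nxt = defaultdict(float)
    nxt["Death"] += cohort.get("Death", 0.0)  # death is absorbing
    for state, mass in cohort.items():
        if state == "Death":
            continue
        rhythm, k = state
        dead = mass * p_death
        nxt["Death"] += dead
        alive = mass - dead
        if rhythm == "ST-Episodic" and k < K_MAX:
            # Ablated patients move up one ablation count: back to NSR or stay symptomatic
            ablated = alive * p_ablate
            nxt[("NSR", k + 1)] += ablated * p_success
            nxt[("ST-Episodic", k + 1)] += ablated * (1 - p_success)
            alive -= ablated
        # Non-ablated patients move between rhythm states with the same ablation count
        if rhythm == "NSR":
            nxt[("ST-Episodic", k)] += alive * p_relapse
            nxt[("NSR", k)] += alive * (1 - p_relapse)
        else:
            nxt[("NSR", k)] += alive * p_recover
            nxt[("ST-Episodic", k)] += alive * (1 - p_recover)
    return dict(nxt)
```

Starting from `{("ST-Episodic", 0): 1.0}` and applying the step repeatedly yields a cohort trace. The total share is conserved each cycle, and no state ever exceeds the ablation cap.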

Stroke and heart failure (HF) were included in the model as key AF-related adverse events. However, they were modelled as the proportion of individuals in each health state experiencing the event rather than as separate health states. These events were associated with short-term and lifetime costs and HRQoL decrements, applied in the first year and in all subsequent years, respectively (see Table 1). Strokes were stratified by severity (non-disabling, moderately disabling and severely disabling) [7], and HF was captured via the New York Heart Association (NYHA) classification system [8]. Procedure-related adverse events (cardiac tamponade, PV stenosis, vascular complications and persistent phrenic nerve injury) were also included using the same method as for stroke and HF. However, as these events are typically short-lived, they were assumed to incur additional treatment costs only, with no impact on a patient's HRQoL.

Table 1 Key model parameters

2.2.2 Model Parameterisation

Where possible, parameters relating to the clinical efficacy of the AVATAR protocol arm were informed by statistical outputs for either the AVATAR protocol arm alone or the pooled ablation protocol data, depending on the chosen comparator. Where parameters could not be sourced from the trial data, inputs were taken from the published literature or elicited in a group meeting with the clinical co-authors of the study and external clinicians. The group comprised two clinical co-authors who were cardiologists in England and five external experts in cardiology and electrophysiology from Canada, Germany, England and the United States. All experts reached consensus on every input, based on what they considered reasonable and conservative given their clinical experience. The same panel also validated the structure of the economic model and the health state definitions to ensure they reflected the typical patient pathway.

Table 1 provides information regarding the sources of key non-trial parameters used. Procedure-related equipment costs were supplied by Medtronic or based on publicly available national datasets [9, 10] (see electronic supplementary material [ESM] Table 1).

To account for symptom severity and adverse events, disutilities were subtracted from baseline utilities taken from published age- and sex-adjusted population norms [11]. The disutility for ST-Episodic AF was derived from the AVATAR-AF trial data analysis; all other values were taken from the literature [12,13,14,15].
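The following is a minimal sketch of the additive disutility approach described above, including an optional expected decrement for events modelled as proportions within a health state. It is hedged as an illustration only. The population norm of 0.80 and the event proportions are hypothetical. The 0.07 decrement borrows the magnitude of the ST-Episodic estimate reported in the Results purely for illustration. The function names are hypothetical.

```python
def state_utility(population_norm, state_disutility, event_disutilities=()):
    """Health-state utility as an age/sex population norm minus additive disutilities.

    event_disutilities: iterable of (proportion with event, disutility) pairs,
    e.g. for stroke or HF modelled as a share of the state's occupants.
    """
    u = population_norm - state_disutility
    u -= sum(p * d for p, d in event_disutilities)  # expected event decrement
    return u


def cycle_qaly(utility, cycle_years=0.25):
    """QALYs accrued by one person in one 3-month cycle (before discounting)."""
    return utility * cycle_years
```

For example, `state_utility(0.80, 0.07)` gives 0.73, and adding a hypothetical 10% event share with a 0.2 decrement lowers it to 0.71.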

Mortality was captured within the model via a combination of UK general population life tables (excluding stroke- and HF-related deaths) [16] and derived stroke- and HF-related mortality rates. Stroke mortality was estimated by applying published age-specific stroke mortality rates [17] to stroke incidence rates. These incidence rates were derived from trial-specific baseline CHA2DS2-VASc scores and published CHA2DS2-VASc score-specific incidence [18]. For HF, NYHA class-specific mortality rates were combined with HF incidence rates [19] and HF-related mortality rates [20].
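The mortality inputs above come from several cause-specific sources. A common way to combine such sources in a 3-month-cycle Markov model is to convert each annual probability to a constant rate, sum the competing rates, and convert back to a per-cycle probability. This is an assumption here, as the text does not describe the exact calculation. The function names and example rates are hypothetical.

```python
import math

CYCLE_YEARS = 0.25  # 3-month cycle


def prob_to_rate(prob, t=1.0):
    """Convert a probability over t years (e.g. an annual life-table q_x) to a constant rate."""
    return -math.log(1.0 - prob) / t


def rate_to_prob(annual_rate, t=CYCLE_YEARS):
    """Probability of the event within t years under a constant annual rate."""
    return 1.0 - math.exp(-annual_rate * t)


def cycle_death_prob(background_rate, stroke_rate, hf_rate, t=CYCLE_YEARS):
    """Per-cycle death probability from competing cause-specific annual rates.

    background_rate is assumed to exclude stroke- and HF-related deaths,
    so the three rates can be summed without double counting.
    """
    return rate_to_prob(background_rate + stroke_rate + hf_rate, t)
```

Summing rates rather than probabilities avoids overstating mortality when several causes apply in the same cycle.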

Where possible, first-year costs were derived from a protocol-driven within-trial cost analysis (ESM Table 1). Resource use data for ablation procedures, hospital care and pharmaceuticals were extrapolated over the entire time horizon. Costs that could not be sourced from the within-trial analysis (such as those for adverse events) were taken from the published literature and open databases [9, 10, 14, 21, 22]. All costs were expressed in 2019/2020 prices.

2.2.3 Analysis of Uncertainty

Uncertainty was explored through a probabilistic sensitivity analysis (PSA) and a series of scenario analyses (e.g., changing the magnitude of the relative risks of state transitions and the incidence rates of adverse events). The PSA (2500 iterations) estimated the probability of the AVATAR protocol being cost effective compared with the chosen comparator at thresholds of £20,000 and £30,000 per QALY gained, as per NICE recommendations [4]. The scenario analyses replaced base-case inputs with values from alternative sources and varied inputs derived from clinical expert opinion to smaller or larger magnitudes. Scenarios included varying the relative risks of symptom development and recovery (relative to the number of previous ablations or treatment arm), the ablation success rate and the stroke incidence rate. The impact of alternative utility estimation methods was also explored (i.e., replacing the EQ-5D with the AF Quality of Life Survey, with an additional utility decrement for higher European Heart Rhythm Association class).
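Given the incremental costs and QALYs from each PSA iteration, the probability of cost effectiveness at a threshold λ is commonly computed as the share of iterations with positive incremental net monetary benefit (λ·ΔQALY − ΔCost). When ΔQALY > 0, this is equivalent to the ICER lying below λ. The sketch below assumes the PSA draws are already available as lists. Sampling from the parameter distributions is not shown, and the function names are hypothetical.

```python
def prob_cost_effective(d_costs, d_qalys, threshold):
    """Share of PSA iterations with positive incremental net monetary benefit."""
    wins = sum(1 for dc, dq in zip(d_costs, d_qalys) if threshold * dq - dc > 0)
    return wins / len(d_costs)


def ceac(d_costs, d_qalys, thresholds):
    """Cost-effectiveness acceptability curve: probability at each threshold."""
    return {lam: prob_cost_effective(d_costs, d_qalys, lam) for lam in thresholds}
```

Evaluating `ceac` at 20,000 and 30,000 produces the kind of probabilities reported in the Results, given the model's own 2500 iterations.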

2.2.4 Software

RStudio with R 3.6.3 [23] was used for all statistical analyses. The cost-effectiveness model was developed in Microsoft Excel (Microsoft Corporation, Redmond, WA, USA).

2.2.5 Consolidated Health Economic Evaluation Reporting Standards Checklist

A completed Consolidated Health Economic Evaluation Reporting Standards (CHEERS) checklist is provided in the supplementary materials.

3 Results

3.1 Statistical Analyses

The three-way comparison found no significant differences between the two ablation protocols for any of the key clinical outcomes used in the model (see Table 2). The clinical data for both forms of ablation were therefore pooled to generate inputs for the economic model.

Table 2 Summary of the key statistical analyses (AVATAR protocol vs. conventional ablation protocol vs. optimised AAD protocol)

The results derived from the pooled ablation protocol data are presented in Table 3. Compared with the optimised AAD protocol, ablation was associated with a rate ratio of 0.33 for the weekly rate of follow-up ablation, i.e., a reduction of approximately two-thirds (95% confidence interval [CI] 0.22–0.49; p < 0.001). Baseline age had no statistically significant relationship with the follow-up ablation rate. After controlling for baseline HRQoL, short-term episodic AF was associated with a statistically significant reduction in HRQoL compared with NSR (mean difference −0.07, 95% CI −0.11 to −0.03; p = 0.004). No statistically significant effects of ablation were identified on the rates of symptom development or recovery, or on the rate of AF-related hospital attendance.
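As a worked check on interpreting 0.33 as a rate ratio, assume a Wald interval computed on the log scale, as is standard for exponentiated Poisson GLM coefficients. The reported interval is then symmetric around the log of the point estimate:

```latex
\ln(0.22) \approx -1.514, \qquad \ln(0.49) \approx -0.713
\hat{\beta} = \tfrac{1}{2}\,(-1.514 - 0.713) \approx -1.114, \qquad e^{\hat{\beta}} \approx 0.33
\widehat{\mathrm{SE}} = \frac{-0.713 - (-1.514)}{2 \times 1.96} \approx 0.204
```

A rate ratio of 0.33 corresponds to a follow-up ablation rate roughly 67% lower in the pooled ablation arm.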

Table 3 Summary of the key statistical analyses (pooled ablation protocol vs. optimised AAD protocol)

3.2 Economic Modelling

3.2.1 Comparison of AVATAR-AF and Conventional Ablation

As the statistical analysis showed no significant clinical differences between the AVATAR protocol and conventional ablation protocol arms, a cost minimisation approach was used to evaluate the cost effectiveness of the two ablation protocols. The AVATAR streamlined protocol saved £1279 per patient compared with the conventional ablation protocol (ESM Table 2).

3.2.2 Comparison of Ablation with Optimised Antiarrhythmic Drugs

The pooled ablation efficacy results from the IPD were combined with the cost and resource use data for the AVATAR protocol to generate a non-trial, real-world AVATAR ablation protocol treatment option. The cost-minimisation analysis showed that the streamlined AVATAR procedure was cost saving relative to conventional ablation. Only AVATAR ablation procedure costs were therefore included in this analysis, as AVATAR would be favoured over conventional ablation from a cost-effectiveness viewpoint.

Compared with optimised AADs, this AVATAR ablation protocol was more costly but offered greater clinical benefit. Over a lifetime horizon, the AVATAR protocol was associated with an estimated incremental cost of £1737 (95% credible interval [CrI] £576–£2757) per person and an additional 0.08 QALYs (95% CrI 0.02–0.16). This gives an incremental cost-effectiveness ratio (ICER) of £21,046 per QALY gained (95% CrI £7086–£71,718). The probability that the AVATAR protocol was cost effective (i.e., that the ICER was below the willingness-to-pay threshold) at thresholds of £20,000 and £30,000 per QALY was 43.2% and 67.6%, respectively (see Table 4 and Fig. 2).
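As a worked arithmetic check, the ICER is the ratio of the mean incremental cost to the mean incremental QALYs. Dividing by the rounded QALY gain of 0.08 gives a slightly higher figure than reported. The difference presumably reflects rounding of the QALY gain to two decimal places:

```latex
\mathrm{ICER} = \frac{\Delta C}{\Delta \mathrm{QALY}}, \qquad
\frac{\text{\pounds}1737}{0.08} \approx \text{\pounds}21{,}713, \qquad
\frac{\text{\pounds}1737}{\text{\pounds}21{,}046} \approx 0.0825 \ \text{QALYs}
```

An unrounded gain of about 0.0825 QALYs is consistent with the reported values of 0.08 QALYs and £21,046 per QALY gained.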

Table 4 Per-patient lifetime probabilistic results [mean (95% credible intervals)]
Fig. 2

Cost-effectiveness plane. Most PSA iterations lie below the £30,000 per QALY gained threshold. PSA probabilistic sensitivity analysis, QALY quality-adjusted life-year

In addition, the AVATAR protocol was estimated to reduce the lifetime stroke rate by 4% per person compared with those in the optimised AAD protocol.

Across all deterministic scenarios assessed, the ICER remained below £30,000 per QALY gained (Table 5), and changing individual inputs or groups of inputs did not substantially affect the results. One of the 11 scenarios produced an ICER below £20,000 per QALY gained. This scenario combined, for the ablation arm, a reduced probability of symptoms developing after follow-up ablation with an increased probability of symptom recovery after follow-up ablation. In the remaining 10 scenarios, ICERs ranged between £20,000 and £30,000 per QALY gained, similar to the base-case results.

Table 5 Deterministic parameter uncertainty scenario analysis outputs

4 Discussion

The AVATAR-AF study explored the clinical implications of performing cryoballoon ablation without assessment of acute PV isolation or overnight hospitalisation. It found no significant difference in the rate of AF symptom development between the AVATAR protocol arm and the conventional ablation protocol arm [3].

The statistical analyses undertaken as part of this modelling project expanded on these findings and showed that there was also no significant difference in the rate of symptom recovery, the rate of hospital attendance or mean EQ-5D values. These findings are consistent with previous studies in which HRQoL results suggest similar efficacy for different types of ablation [24,25,26].

A primary conclusion of this work is therefore that equivalent clinical outcomes can be achieved whether or not electrical assessment of PV isolation is performed at the time of cryoballoon ablation, and without the need for overnight hospitalisation. The cost-minimisation analysis showed that these equivalent benefits could also be delivered at a cost saving to the UK NHS. The implications of these findings for how ablation is delivered internationally could be substantial.

We also believe that this study is one of the first ‘real-world’ cost-effectiveness analyses comparing ablation with AADs for the treatment of AF. Because patients in the optimised AAD arm were allowed to cross over to ablation if needed, the analysis better reflects the true patient pathway. This real-world design is likely to bias the results against frontline ablation, particularly because over half of the patients in the optimised AAD arm (54%) received an ablation. However, it is more reflective of the true clinical pathway.

In addition, in the statistical analysis of pooled ablation versus optimised AADs, ablation therapy had a statistically significant effect only on the rate of follow-up ablation and not on any other clinical outcome. This result is consistent with the current literature and the AVATAR-AF study findings, which show that ablation therapy reduces the rate of AF recurrence compared with AADs [3, 24,25,26]. Our study also aligns with the European Society of Cardiology 2020 guidelines, which state that the primary benefit of AF catheter ablation is a reduction in AF-related symptoms [27].

It is important to note that allowing patients in the optimised AAD arm to cross over to ablation may have diluted any clinical benefits associated with ablation. A larger sample size would therefore be required to detect additional benefits such as changes in HRQoL [28]. Furthermore, HRQoL data were collected only at baseline and at the 12-month follow-up visit, so no intermediate data were available for patients who crossed over to ablation between these visits.

When compared with optimised AADs, the ICER for the AVATAR protocol (using the pooled ablation efficacy data) was below the maximum acceptable threshold used in UK cost-effectiveness decision making (£30,000 per QALY gained) [4]. This is consistent with previously published economic evaluations comparing ablation technologies with optimised AAD therapy, which found ablation to be cost effective [29,30,31,32,33]. Furthermore, a recent NICE economic evaluation comparing cryoablation with AADs concluded that AF ablation was cost effective, with a reported ICER of £14,022 per QALY gained [34]. Care should be taken when comparing these results with the NICE evaluation because of the high level of crossover in the optimised AAD arm of this study. For the reasons already highlighted, this real-world approach is likely to bias the findings against frontline ablation. A higher ICER would therefore be expected than in the NICE evaluation, where crossover in the optimised AAD arm was lower.

Overall, we believe that our study findings are aligned with those from other research groups and have shown that the AVATAR protocol is cost effective when compared with optimised AAD therapy.

It should be acknowledged that the data used to parameterise the model are subject to limitations. The AF symptom data used in the model were self-reported by patients, owing to a lack of electrocardiogram (ECG) monitoring in the trial. Self-reported symptoms are typically less reliable than ECG monitoring because a small proportion of patients with AF can be asymptomatic. ECG monitoring may therefore produce more accurate symptom progression or recurrence rates for use in the economic evaluation. However, previous studies have demonstrated no differences in major clinical outcomes between patients who present as symptomatic and those who present as asymptomatic [35,36,37]. Moreover, from the perspective of an economic evaluation, an asymptomatic patient would not receive treatment for AF until they presented with the condition. Such patients would therefore not incur additional short-term treatment costs compared with symptomatic patients, and their inclusion would be unlikely to significantly alter the conclusions of this study. We also note that the costs of early re-interventions during the blanking period were captured, but these re-interventions were excluded from the reablation rate statistical analyses to align with the symptom data, which were only collected every 3 months. However, an analysis of cryoablation as a first-line treatment in a US healthcare setting showed that, when data are available, including or excluding a blanking period did not affect the cost-effectiveness conclusions for cryoablation [38].

All economic evaluations are subject to uncertainty with regard to the inputs used, which needs to be considered when interpreting the results. Due to the absence of data for a selection of long-term clinical outcomes, assumptions were used and these were validated by the clinical authors of the paper and external clinicians. In addition, we sought extensive input from clinicians and other international experts on the design of the economic model and the definitions used in the health states to ensure a representative reflection of the patient pathway.

We also explored the implications of these assumptions for the model outputs through extensive parameter uncertainty analyses. Both the deterministic and probabilistic analyses found the model findings to be robust to the plausible changes tested. In all 11 scenario analyses, the ICER remained below £30,000 per QALY gained.

5 Conclusion

This is the first economic evaluation of a streamlined cryoablation-led protocol versus an optimised AAD-led protocol that reflects the true clinical pathway. The cost effectiveness of the streamlined AVATAR protocol versus optimised AADs was similar to that reported in previously published evaluations of the conventional ablation protocol versus AADs. The streamlined AVATAR protocol is therefore a cost-effective option compared with both the conventional ablation protocol and optimised AAD treatment in the UK NHS healthcare setting.