Key Points for Decision Makers

Although total ankle replacement costs more than ankle arthrodesis at 52 weeks, the benefits of surgery last for several years and longer-term modelling suggests that total ankle replacement can be cost effective over the course of the patients’ lifetime.

There is a need for longer-term prospectively collected data comparing total ankle replacement and ankle arthrodesis.

Our findings can facilitate joint patient/clinician decision making regarding the choice of surgical approach alongside other considerations, including functional expectations and risk of complications.

1 Introduction

Over 20 million people in the United Kingdom (UK) have a musculoskeletal condition such as arthritis or back pain. Osteoarthritis is the most common form of arthritis and affects the hands, shoulders, hips, knees and ankles of 8.5 million people in the UK [1]. Ankle osteoarthritis is less prevalent than hip or knee osteoarthritis but has become more common in recent years [2]. Unlike osteoarthritis of other joints, ankle osteoarthritis most commonly develops after trauma, such as an ankle fracture or recurrent severe sprains [3]. Symptoms vary, but because ankle osteoarthritis is a long-term condition that cannot be cured, it may lead to reduced mobility and require treatment. Osteoarthritis is the eighth most common cause of disability in the UK [1]. The direct costs of osteoarthritis and rheumatoid arthritis to the health care system were £10.2 billion in 2017 [1].

More than 29,000 patients each year in the UK are referred to specialists due to symptoms of ankle osteoarthritis [3]. Most are treated non-operatively with analgesia, activity modification, weight loss, braces, or physiotherapy. When these measures have failed and patients are diagnosed with end-stage ankle osteoarthritis, they typically undergo surgical treatment in the National Health Service (NHS). The main options are ankle arthrodesis (i.e. ankle fusion [AF]) and total ankle replacement (TAR). AF was introduced in 1879 and is still the most common treatment for end-stage ankle osteoarthritis; however, TAR has been performed increasingly often over the last decade [4].

TAR is an operation to replace an ankle joint with a prosthesis. The talar and tibial surfaces are prepared and resurfaced with metal components, with a plastic insert between them [5]. The prosthesis allows a gliding motion. The metal components have a coating that encourages the bones to grow into them, and the plastic component is either mobile, moving forwards and backwards as the ankle moves (mobile-bearing), or fixed to the tibial component (fixed-bearing). There is no conclusive evidence on which design is better. The major complication of TAR is the need for revision, most often due to loosening of the metal components, either with evidence of infection or without it (aseptic loosening) [6]. Revision is defined as the removal or exchange of any components of the device implanted during ankle replacement surgery. Revision is generally either another TAR or conversion to AF [7]. In a recent study, revision TAR had 100% survival at 2 years and 87% survival at 4 years [8], while conversion to AF had a survivorship of 96% at 1 year and 72.4% at 5 years [9]; a systematic review that included studies with at least 1 year of follow-up reported survivorship of 84% [10]. However, patient satisfaction with conversion to AF is low [11]. There is currently limited evidence on the long-term outcomes of revisions.

AF is an operation to convert a stiff, painful joint into an even stiffer but painless joint. The remaining cartilage is removed, and the two bone ends are apposed in the most appropriate position and held together with screws or plates until they join to become one (bone fusion) [12]. It can take up to 16 weeks for the bones to fuse. AF can be performed as an open procedure or arthroscopically, depending on the surgeon’s preference. Major complications include problems with bone fusion, adjacent joint arthritis, and the need for revision [6]. Revision is defined as taking down the fusion mass and performing a single- or two-stage revision fusion for non-union, conversion to TAR, or conversion to amputation. The success rate is usually high and has been reported as 77–85% in empirical studies [13].

Both AF and TAR reduce pain and improve outcomes [4, 14,15,16] but are very different treatments. In AF, stresses shift to the adjacent joints, which have been shown to wear over time and may require further surgery [4]. AF also alters gait kinematics [4]. TAR offers a greater range of motion; however, as a mechanical joint, it can lead to bone cysts and loosening of the implants [6]. Despite this, TAR is expected to produce better outcomes, reduce pain, and increase mobility compared with AF. This would be expected to reduce the need for home adaptations and for help from family and friends and, for those who are employed, to reduce time lost from work. After both TAR and AF, immobilisation is required using a plaster cast or walking boot.

All studies comparing TAR with AF have been observational, and no robust randomised trials have been reported. While some studies have evaluated the cost effectiveness of TAR compared with AF [6, 17, 18], these are all decision models extrapolating data from published sources. They have suggested that TAR has the potential to be cost effective. Both TAR and AF are provided in the NHS, but there is no reliable evidence on which is the better treatment for end-stage ankle osteoarthritis. The National Institute for Health and Care Excellence (NICE) has no guidance on the treatment of end-stage ankle osteoarthritis, and patients must often rely on surgeons to advise on the best option for them [19]. Both clinicians and decision makers require a strong evidence base to provide appropriate treatment to patients and to ensure the best health outcomes and the best use of limited healthcare resources.

This paper reports the results of a cost-utility analysis evaluating TAR compared with AF over 52 weeks using patient-level data from a randomised controlled trial (RCT). The primary within-trial analysis was conducted in accordance with the intention-to-treat (ITT) principle and from the NHS and Personal Social Services (PSS) perspective. Secondary within-trial analysis adopted a wider perspective and included out-of-pocket costs incurred by the participants as well as productivity losses due to absenteeism. Sensitivity analysis included only patients who received the surgery to which they were randomised (per-protocol analysis). We also conducted a model-based analysis that extrapolated the within-trial results over the expected lifetime of the patients. The main clinical results are reported separately [20], with a primary outcome of self-reported pain-free function 52 weeks post-surgery.

2 Methods

2.1 Trial Design and Population

TARVA is a pragmatic, prospective, multicentre, parallel-group, non-blinded RCT aiming to compare the clinical and cost effectiveness of TAR versus AF in patients with end-stage ankle osteoarthritis aged 50–85 years.

Participants were recruited in 17 UK hospitals, ranging from district general hospitals to specialist orthopaedic hospitals. Participants were identified during routine clinic appointments or through screening of referral letters or clinic lists. Participants aged 50–85 years were recruited, as this is the typical age range of patients with end-stage ankle osteoarthritis eligible for surgical treatment. Eligible patients had end-stage ankle osteoarthritis, were considered by the surgeon to be suitable for both TAR and AF (considering deformity, stability, bone quality, soft tissue envelope and neurovascular status), were able to read and understand the Patient Information Sheet (PIS) and trial procedures, and were willing and able to provide informed consent.

If eligible, the patient watched a bespoke trial video and read the PIS and a generic factsheet about ankle osteoarthritis and its treatment options. After informed consent was obtained, magnetic resonance imaging (MRI) and a preoperative assessment were conducted 14–30 days before surgery. MRI is performed as part of standard care, and no extra tests or hospital visits were required. Patients were then randomised to one of the surgical approaches. Randomisation was performed using the Sealed Envelope™ randomisation service, based on a minimisation algorithm. Three minimisation variables were used: surgeon, presence of osteoarthritis in the subtalar joint, and presence of osteoarthritis in the talonavicular joint (as determined by preoperative MRI). Patients were allocated 1:1 to the TAR and AF groups. Baseline measures were recorded at the preoperative assessment. All participating surgeons had performed at least 10 TARs and at least 10 AFs. In both groups, the surgeon’s usual postoperative procedure was followed, including immobilisation type and weight-bearing status [21]. Follow-up visits were conducted at 12, 26 and 52 weeks. Further details on the trial are available in the trial protocol [21], statistical analysis plan [22] and results [20] papers.
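Allocation by minimisation can be illustrated with a short sketch. The Python below is a generic Pocock–Simon-style minimiser for two arms allocated 1:1 over the three minimisation variables named above; it is not the Sealed Envelope™ implementation, whose internal rules are not described here. The factor keys (`surgeon`, `subtalar_oa`, `talonavicular_oa`), the class name `Minimiser` and the `p_best` probability of choosing the imbalance-minimising arm are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

ARMS = ("TAR", "AF")


class Minimiser:
    """Hypothetical Pocock-Simon style minimiser for a 1:1, two-arm trial.

    Not the Sealed Envelope algorithm; factor names and p_best are assumptions.
    """

    def __init__(self, factors, p_best=0.8, seed=None):
        self.factors = list(factors)
        self.p_best = p_best  # chance of taking the arm that minimises imbalance
        self.rng = random.Random(seed)
        # counts[factor][level] -> {"TAR": n, "AF": n} for patients already allocated
        self.counts = {f: defaultdict(lambda: dict.fromkeys(ARMS, 0)) for f in self.factors}

    def _imbalance_if(self, patient, arm):
        """Sum over factors of |n_TAR - n_AF| at this patient's levels if assigned to `arm`."""
        total = 0
        for f in self.factors:
            level = dict(self.counts[f][patient[f]])  # copy so the real counts are untouched
            level[arm] += 1
            total += abs(level["TAR"] - level["AF"])
        return total

    def allocate(self, patient):
        scores = {arm: self._imbalance_if(patient, arm) for arm in ARMS}
        if scores["TAR"] == scores["AF"]:
            arm = self.rng.choice(ARMS)  # tie: plain 1:1 randomisation
        else:
            best = min(ARMS, key=scores.get)
            other = ARMS[1] if best == ARMS[0] else ARMS[0]
            arm = best if self.rng.random() < self.p_best else other
        for f in self.factors:
            self.counts[f][patient[f]][arm] += 1
        return arm


minimiser = Minimiser(["surgeon", "subtalar_oa", "talonavicular_oa"], seed=2014)
first = minimiser.allocate({"surgeon": "S01", "subtalar_oa": True, "talonavicular_oa": False})
```

With `p_best` below 1 the allocation keeps a random element, so the next assignment cannot be predicted with certainty; setting it to 1 gives fully deterministic minimisation apart from ties.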

Patients with osteoarthritis were involved in the design of the TARVA study and development of the participant information resources, and a patient representative attended Trial Steering Committee meetings. The public, patient organisations, and charities (Versus Arthritis) were actively involved throughout the trial via newsletters, annual meetings and their websites. Ethical approval was granted by the London Bloomsbury Research Ethics Committee (14/LO/0807), and clinical trial authorisation was given by the Medicines and Healthcare products Regulatory Agency. The trial sponsor was University College London, and the trial was registered with the Controlled Trials ISRCTN Registry (ISRCTN60672307).

2.2 Cost of Surgery and Health Care Resource Use

A micro-costing approach, which is often used in surgical RCTs [23], was used to calculate the cost of surgery. Detailed information on surgery was collected in a purpose-designed Case Report Form (CRF), including surgery duration, the operating surgeon and their grade, type of devices, length of hospital stay and duration of immobilisation. The average surgery duration was 121 min (range 60–244 min) in the TAR group versus 103 min (range 45–240 min) in the AF group. The average length of hospital stay was 2.5 days (range 0–12) in the TAR group and 2.1 days (range 0–17) in the AF group. Immobilisation was by walking boot and/or plaster cast. Nineteen percent of patients in the TAR arm were allowed to bear weight within 2 weeks, compared with only 5% of patients in the AF arm. Use of a walking boot was assumed for the entire duration of immobilisation. Participants wore a boot for an average of 8.9 weeks (range 0–46) in the TAR group and 13.8 weeks (range 0–52) in the AF group. Plaster casts were assumed to be changed every 6 weeks. Participants wore plaster for an average of 3 weeks (range 0–52) in the TAR group and 4.6 weeks (range 0–26) in the AF group. The cost of each component was calculated using the unit costs in Online Resource Table S1. Over 50% of all TAR implants used were fixed-bearing.
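The micro-costing of a single surgical episode amounts to multiplying each recorded quantity by its unit cost and summing. The sketch below assumes a per-minute theatre cost, a per-day ward cost, one walking boot per immobilised patient, and one plaster cast per started 6-week period; these simplifications, the function name `surgery_cost` and all unit-cost figures are hypothetical placeholders (the actual unit costs are in Online Resource Table S1, and surgeon grade is not modelled here).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCosts:
    """Hypothetical unit costs in GBP; the study's values are in Table S1."""
    theatre_per_min: float
    bed_day: float
    walking_boot: float
    plaster_cast: float


def surgery_cost(minutes, device_cost, los_days, boot_weeks, plaster_weeks, uc):
    """Micro-costed surgery episode for one patient (illustrative only)."""
    theatre = minutes * uc.theatre_per_min
    stay = los_days * uc.bed_day
    # Assumption: a single boot covers the whole immobilisation period.
    boot = uc.walking_boot if boot_weeks > 0 else 0.0
    # Plaster changed every 6 weeks, so each started 6-week block needs a new cast.
    casts = math.ceil(plaster_weeks / 6)
    return theatre + device_cost + stay + boot + casts * uc.plaster_cast


placeholder = UnitCosts(theatre_per_min=10.0, bed_day=300.0, walking_boot=50.0, plaster_cast=20.0)
example = surgery_cost(minutes=100, device_cost=1000.0, los_days=2,
                       boot_weeks=8, plaster_weeks=7, uc=placeholder)
```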

To collect data on healthcare resource use, we adapted a Client Service Receipt Inventory (CSRI) questionnaire, which participants completed at baseline and at 12, 26 and 52 weeks post-operation. The questions covered the use of General Practice (GP) and community NHS services, inpatient stays, outpatient visits, accident and emergency (A&E) visits and social services. The questionnaire was developed in collaboration with patient representatives. The unit costs were obtained from NHS Reference Costs [24] and the Unit Costs of Health and Social Care (see Online Resource Table S2) [25,26,27]. During the trial, a concomitant medications log recorded the dosage, frequency and duration of each prescription. The medications were costed using the British National Formulary [28].
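Costing the CSRI responses follows the same pattern: each patient's reported contacts at each follow-up are multiplied by the corresponding unit cost and summed. In this minimal sketch the service labels, the unit-cost values and the function names are hypothetical; the study's unit costs come from NHS Reference Costs and the Unit Costs of Health and Social Care (Online Resource Table S2).

```python
def contacts_cost(counts, unit_costs):
    """Cost of one CSRI response: sum of (number of contacts x unit cost)."""
    missing = set(counts) - set(unit_costs)
    if missing:
        raise KeyError(f"no unit cost for: {sorted(missing)}")
    return sum(n * unit_costs[service] for service, n in counts.items())


def follow_up_cost(responses, unit_costs):
    """Total over post-operative follow-ups, e.g. the 12-, 26- and 52-week CSRIs."""
    return sum(contacts_cost(r, unit_costs) for r in responses)


# Hypothetical unit costs (GBP per contact or per inpatient day).
UNIT_COSTS = {"gp_visit": 30.0, "outpatient": 120.0, "ae_visit": 160.0, "inpatient_day": 350.0}
# One hypothetical patient: responses at 12, 26 and 52 weeks.
patient = [{"gp_visit": 2, "outpatient": 1}, {"gp_visit": 1}, {}]
```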

Participants also answered questions on out-of-pocket payments, including transport costs incurred in receiving care, equipment, mobility aids and home adaptations. We collected information on payments for parking, taxis, and bus and train journeys. For private car journeys, we collected the miles per journey and applied fuel costs to obtain the cost of each journey. Equipment, mobility aids and home adaptations were paid for fully or partially by PSS, or fully out of pocket.
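For a single care-related journey, the costing described above reduces to a small formula: private car mileage valued at a fuel cost per mile, plus any parking charge, taxi fare or public transport fare. A sketch, with hypothetical parameter names and an illustrative fuel cost (the study's values are in Online Resource Table S3):

```python
def journey_cost(miles=0.0, fuel_cost_per_mile=0.0, parking=0.0, fares=0.0):
    """Out-of-pocket cost of one journey to receive care (GBP).

    miles * fuel_cost_per_mile covers private car use; `fares` covers taxi,
    bus or train tickets. All inputs are hypothetical illustrations.
    """
    if min(miles, fuel_cost_per_mile, parking, fares) < 0:
        raise ValueError("costs and distances must be non-negative")
    return miles * fuel_cost_per_mile + parking + fares


# A hypothetical 20-mile round trip by car at 15p per mile, with £3 parking.
car_trip = journey_cost(miles=20, fuel_cost_per_mile=0.15, parking=3.0)
```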

Participants were also asked about their employment status, the time they had to take off work due to problems with their ankle, and the time family and friends took off work or usual activities to provide care. We used the human capital approach to estimate the cost of lost productivity in the TAR group versus the AF group. Unit costs were the average gross hourly earnings by sex (men/women) and working pattern (full-time/part-time). To value family and friends’ time off work, we used the same approach; however, as no information on the carers themselves was collected, we could not distinguish by sex or working pattern and therefore used the gross average earnings of all employees in the UK. We included the cost of replacing an employee, but these results should be interpreted with caution, as this cost comes from a private study and has not been confirmed by any peer-reviewed publication. All unit costs related to out-of-pocket payments and other wider costs are reported in Online Resource Table S3.
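Under the human capital approach, each lost hour is valued at gross hourly earnings. The sketch below looks up a rate by sex and working pattern for participants and falls back to the all-employee average for carers, whose characteristics were not recorded. The earnings figures, the 37.5-hour week in the usage line and the function name are hypothetical placeholders, not the values in Online Resource Table S3.

```python
# Hypothetical gross hourly earnings (GBP) by sex and working pattern.
HOURLY_EARNINGS = {
    ("male", "full-time"): 17.0,
    ("female", "full-time"): 15.0,
    ("male", "part-time"): 11.0,
    ("female", "part-time"): 10.5,
}
ALL_EMPLOYEES_HOURLY = 14.0  # hypothetical UK all-employee average, used for carers


def productivity_loss(hours_lost, sex=None, pattern=None,
                      earnings=HOURLY_EARNINGS, fallback=ALL_EMPLOYEES_HOURLY):
    """Value lost working time at gross hourly earnings (human capital approach).

    If sex or working pattern is unknown (as for family and friends),
    the all-employee average is used instead.
    """
    if hours_lost < 0:
        raise ValueError("hours_lost must be non-negative")
    rate = earnings.get((sex, pattern), fallback)
    return hours_lost * rate


# 6 weeks off at a hypothetical 37.5 h/week for a full-time female participant.
participant_loss = productivity_loss(37.5 * 6, sex="female", pattern="full-time")
carer_loss = productivity_loss(10)  # carer: characteristics unknown
```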

All costs are reported in 2018/2019 Great British Pounds (GBP), and, where needed, the costs were adjusted for inflation using the NHS cost inflation index (NHSCII) [27].
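Adjusting a cost for inflation rescales it by the ratio of the price index in the target year to that in the year the cost was incurred. A minimal sketch; the function name and the index values below are hypothetical placeholders, not published NHSCII figures.

```python
def inflate(cost, index_from, index_to):
    """Express `cost` in target-year prices using a price-index ratio."""
    if index_from <= 0 or index_to <= 0:
        raise ValueError("index values must be positive")
    return cost * index_to / index_from


# Hypothetical index values (base year = 1.00), not the published NHSCII.
HYPOTHETICAL_INDEX = {"2016/17": 1.000, "2017/18": 1.021, "2018/19": 1.043}

cost_2019 = inflate(250.0, HYPOTHETICAL_INDEX["2016/17"], HYPOTHETICAL_INDEX["2018/19"])
```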

2.3 Outcomes

The outcome of the economic evaluation was quality-adjusted life-years (QALYs), which were calculated using the EQ-5D-5L index values. The EQ-5D-5L questionnaire was completed by patients at baseline and 12, 26 and 52 weeks post-operation. The EQ-5D-5L Crosswalk Index Value Calculator, which maps EQ-5D-5L to the EQ-5D-3L value set and is recommended by NICE [29], was used to estimate the index values. QALYs were calculated as the area under the curve using the standard methodology [30].
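The area-under-the-curve calculation can be written as a trapezoidal sum: utilities are assumed to change linearly between assessments, and each interval's average utility is weighted by its length in years. This sketch assumes 52 weeks per year and uses the trial's assessment schedule (baseline, 12, 26 and 52 weeks); the function name and the utility profile in the usage line are hypothetical.

```python
def qaly_auc(weeks, utilities, weeks_per_year=52.0):
    """QALYs as the area under a piecewise-linear utility curve (trapezoidal rule)."""
    if len(weeks) != len(utilities) or len(weeks) < 2:
        raise ValueError("need matching time points and utilities (at least two)")
    if any(t1 <= t0 for t0, t1 in zip(weeks, weeks[1:])):
        raise ValueError("time points must be strictly increasing")
    total = 0.0
    for t0, t1, u0, u1 in zip(weeks, weeks[1:], utilities, utilities[1:]):
        # interval length in years x mean utility over the interval
        total += (t1 - t0) / weeks_per_year * (u0 + u1) / 2.0
    return total


SCHEDULE = [0, 12, 26, 52]  # baseline, 12, 26 and 52 weeks post-operation
example_qalys = qaly_auc(SCHEDULE, [0.50, 0.62, 0.71, 0.74])  # hypothetical patient profile
```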

The protocol [21] specified a secondary within-trial cost-effectiveness analysis using a patient-reported outcome measure, the Manchester-Oxford Foot Questionnaire (MOXFQ) [31] walking/standing domain score at 52 weeks post-surgery. However, as the main results paper did not report statistically significant differences in MOXFQ walking/standing domain scores, we deviated from the protocol and did not conduct this analysis.

2.4 Statistical Analysis

Statistical analysis was prespecified in a health economics analysis plan [32]. All analyses were based on ITT and corresponded to the analyses in the clinical-effectiveness paper [20]. We reported the mean cost and standard deviation (SD) of each component comprising the cost of surgery, and estimated the difference between the two groups using a linear regression model controlling for the presence of osteoarthritis in each of the two adjacent joints (subtalar and talonavicular) and adjusting for clustering at surgeon level. We then calculated the total cost of surgery and estimated the difference in total cost between the two groups using the same regression model as described above. We used bias-corrected bootstrapping (1000 iterations) to calculate 95% confidence intervals (CIs).
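The bias-corrected percentile bootstrap used for the confidence intervals can be sketched as follows. For brevity this version bootstraps an unadjusted difference in means, resampling patients within each arm; it omits the covariates and surgeon-level clustering of the regression described above, so it is a simplification rather than the study's procedure. The function name and the simulated data in the usage lines are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def bc_bootstrap_diff(x_treat, x_ctrl, n_boot=1000, alpha=0.05, seed=0):
    """Bias-corrected percentile bootstrap CI for a difference in means.

    Simplification: unadjusted difference, patients resampled within each arm,
    no covariates and no surgeon-level clustering.
    """
    rng = np.random.default_rng(seed)
    x_treat = np.asarray(x_treat, dtype=float)
    x_ctrl = np.asarray(x_ctrl, dtype=float)
    observed = x_treat.mean() - x_ctrl.mean()

    boot = np.empty(n_boot)
    for b in range(n_boot):
        t = rng.choice(x_treat, size=x_treat.size, replace=True)
        c = rng.choice(x_ctrl, size=x_ctrl.size, replace=True)
        boot[b] = t.mean() - c.mean()

    nd = NormalDist()
    # Bias-correction constant z0 from the share of replicates below the estimate,
    # clipped away from 0 and 1 so that inv_cdf stays finite.
    prop_below = float(np.mean(boot < observed))
    prop_below = min(max(prop_below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = nd.inv_cdf(prop_below)
    lo_q = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    hi_q = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    lower, upper = np.quantile(boot, [lo_q, hi_q])
    return observed, lower, upper


rng = np.random.default_rng(7)
tar_costs = rng.gamma(25.0, 400.0, size=130)  # simulated, hypothetical costs
af_costs = rng.gamma(25.0, 320.0, size=135)
diff, lower, upper = bc_bootstrap_diff(tar_costs, af_costs)
```

When the bootstrap distribution is centred on the observed estimate, z0 is close to 0 and the interval reduces to the ordinary percentile interval.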

We report the mean and SD of the number of visits based on the type of healthcare resource use. We calculated the mean cost per patient for the TAR group versus the AF group by type of service at 12, 26 and 52 weeks post-operation. The difference in the number of visits and the difference in costs were estimated using the same regression model as described above for surgery cost components. One exception was the proportion of participants who reported inpatient admission, where we used a probit model.

We estimated mean index values at each time point for TAR versus AF, and the mean unadjusted QALYs from baseline to 52 weeks post-operation.

Total costs included costs of surgery, health care resource use, concomitant medications, and equipment, mobility aids and home adaptations paid for by the PSS. The difference in each cost component was analysed using a linear regression model controlling for the presence of osteoarthritis in each of the two adjacent joints and adjusting for clustering at the surgeon level. The difference in total costs was analysed using a similar model but, in addition, controlling for baseline healthcare resource use costs. We used bias-corrected bootstrapping (1000 iterations) to calculate 95% CIs.
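The covariate-adjusted difference in total costs is the coefficient on the treatment indicator in an ordinary least squares regression. The sketch below estimates that coefficient with NumPy's least-squares solver for the covariates named above (baseline resource-use cost and the two adjacent-joint osteoarthritis indicators); it returns a point estimate only, without surgeon-level clustering or bootstrapped intervals, and the argument names are hypothetical.

```python
import numpy as np


def adjusted_cost_difference(total_cost, is_tar, baseline_cost, subtalar_oa, talonavicular_oa):
    """OLS coefficient on the TAR indicator (1 = TAR, 0 = AF).

    Point estimate only: no clustering by surgeon and no bootstrap.
    """
    y = np.asarray(total_cost, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                            # intercept
        np.asarray(is_tar, dtype=float),            # treatment indicator
        np.asarray(baseline_cost, dtype=float),     # baseline resource-use cost
        np.asarray(subtalar_oa, dtype=float),       # minimisation variable
        np.asarray(talonavicular_oa, dtype=float),  # minimisation variable
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])
```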

The mean per-participant differences in costs and QALYs were jointly estimated via bootstrapped seemingly unrelated regression with 1000 iterations to account for the correlation between costs and QALYs. We controlled for baseline healthcare resource use costs, baseline index values and minimisation variables, and adjusted for clustering at surgeon level. The incremental cost-effectiveness ratio (ICER) was calculated as the mean estimated difference in costs divided by the mean estimated difference in QALYs. We plotted the mean ICER and bootstrapping results on a cost-effectiveness plane (CEP). The bootstrapping results were used to calculate the cost-effectiveness acceptability curve (CEAC) [30], which shows the probability that TAR is cost effective compared with AF at 52 weeks across a range of cost-effectiveness thresholds (the value placed on an additional QALY). As the time horizon was 52 weeks, costs and QALYs were not discounted. We conducted a complete case analysis, as fewer than 15% of participants had missing cost or QALY data.
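Given bootstrap replicates of incremental costs and QALYs, the ICER and the CEAC are short calculations: TAR counts as cost effective in a replicate when its incremental net monetary benefit, λ·ΔQALY − ΔCost, is positive at threshold λ, and the CEAC is the share of such replicates. The sketch assumes the paired replicates have already been produced (for example by the seemingly unrelated regression bootstrap described above; when both equations share the same regressors, SUR point estimates coincide with equation-by-equation OLS). The function names and test values are hypothetical.

```python
import numpy as np


def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if delta_qaly == 0:
        raise ZeroDivisionError("ICER undefined when the QALY difference is zero")
    return delta_cost / delta_qaly


def ceac(boot_delta_cost, boot_delta_qaly, thresholds):
    """Probability that TAR is cost effective at each threshold (GBP per QALY).

    For each threshold, the share of bootstrap replicates whose incremental
    net monetary benefit (threshold * dQALY - dCost) is strictly positive.
    """
    dc = np.asarray(boot_delta_cost, dtype=float)
    dq = np.asarray(boot_delta_qaly, dtype=float)
    return np.array([np.mean(lam * dq - dc > 0) for lam in thresholds])
```

Working with net monetary benefit avoids the ambiguity of ratios in the south-west and north-west quadrants, where ICERs of the same sign can carry opposite meanings.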

2.5 Secondary and Sensitivity Analyses

We conducted a post hoc subgroup analysis of total costs by type of TAR implant (fixed-bearing versus mobile-bearing), as differences were noted in the clinical-effectiveness analysis [20]. Secondary within-trial analysis was conducted from the wider perspective, including the out-of-pocket costs and productivity losses described in Sect. 2.2. An additional analysis also included the cost of replacing an employee, but, as noted above, these results should be interpreted with caution. Most studies comparing TAR and AF have not adopted the wider perspective, but out-of-pocket costs are relevant to both the condition and the intervention. As end-stage osteoarthritis can lead to disability, patients may need home adaptations and equipment, and the condition may lead to reduced working hours or retirement.

Some patients received a different procedure from the one to which they were randomised. Therefore, we conducted a sensitivity analysis in the form of a per-protocol analysis, including only participants who received the surgery to which they were randomised.

2.6 Model-Based Analysis

We built a decision model to extrapolate the trial results over the patients’ lifetime, from the NHS and PSS perspective. Specifically, we constructed a simple Markov model in Microsoft Excel™ (Microsoft Corporation, Redmond, WA, USA), which simulates patients’ pathways after TAR or AF. The structure of the model is shown in Fig. 1.

Fig. 1

Model structure

The model used 1-year cycles, with 17 cycles in total, as the average life expectancy for a cohort aged 50–85 years was 17 years [33]. After surgery, patients can stay in good health, move to the revision state, or die. Patients remain in the revision state for 1 year only, after which they move to ‘good health after revision’ or to the death state. Transition probabilities are reported in Online Resource Tables S4 and S5.

Revision rates are based on clinical expert opinion. The revision rate for TAR was assumed to be 1.2%, while the revision rate for AF was assumed to be 5% in the first 3 years (see Online Resource Table S5) and 0% thereafter. Revision is assumed to be AF in both the TAR and AF groups, as this is the more common revision procedure in the UK. The death rate is based on the Public Health England Life Expectancy Calculator [34]. Each health state in the model was assigned a cost and a QALY outcome (as reported in Online Resource Table S6).
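The cohort structure described above can be sketched as a four-state Markov trace with annual cycles. The sketch reads the 1.2% TAR revision rate as a constant annual probability and the AF rate as 5% per year in years 1–3 and 0% afterwards; it applies death before revision within each cycle, allows no re-revision, and uses a single constant mortality probability in place of the age-specific Public Health England rates. These readings, the names and the mortality placeholder are our assumptions, not the study's Excel implementation (Online Resource Tables S4 and S5).

```python
import numpy as np

STATES = ("good health", "revision", "good health after revision", "dead")
HYPOTHETICAL_P_DEATH = 0.03  # placeholder annual mortality, not the PHE life tables


def transition_matrix(p_revision, p_death):
    """Annual transition probabilities; rows = current state, columns = next state.

    Assumption: death is applied first, then revision among survivors.
    """
    s = 1.0 - p_death
    return np.array([
        [s * (1.0 - p_revision), s * p_revision, 0.0, p_death],  # good health
        [0.0, 0.0, s, p_death],                                  # revision lasts one cycle
        [0.0, 0.0, s, p_death],                                  # no re-revision modelled
        [0.0, 0.0, 0.0, 1.0],                                    # death is absorbing
    ])


def tar_revision(cycle):
    return 0.012  # 1.2%, read as a constant annual probability


def af_revision(cycle):
    return 0.05 if cycle < 3 else 0.0  # read as 5% per year in years 1-3, then 0%


def cohort_trace(revision_rate, p_death=HYPOTHETICAL_P_DEATH, n_cycles=17):
    """State occupancy at cycle boundaries (row 0 = just after surgery, row t = after t cycles)."""
    trace = np.zeros((n_cycles + 1, len(STATES)))
    trace[0, 0] = 1.0  # everyone starts in good health after surgery
    for t in range(n_cycles):
        trace[t + 1] = trace[t] @ transition_matrix(revision_rate(t), p_death)
    return trace


tar_trace = cohort_trace(tar_revision)
af_trace = cohort_trace(af_revision)
```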

The costs assigned to the ‘good health’ and ‘good health after revision’ states are based on baseline resource use costs from the trial, whereas index values are based on the 52-week post-operation index values from the trial. We applied revision-related decrements to the index values based on those reported by SooHoo and Kominski [17], as this was the only available source for these data. We assume that revisions are successful and that patients then keep the same quality of life for the rest of their lives. Costs and QALYs are discounted at 3.5% per year, as recommended by NICE [35].
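Once each cycle's expected cost and QALYs have been obtained (state occupancy multiplied by the per-state values), lifetime totals are discounted at 3.5% per year. This sketch assumes that the first cycle is undiscounted, that cycle t (counting from 0) is divided by 1.035^t, and that no half-cycle correction is applied; the text does not state these conventions, and the occupancy and per-state values in the usage lines are hypothetical.

```python
import numpy as np


def expected_per_cycle(occupancy, state_values):
    """Expected value per cycle: state occupancy (cycles x states) times per-state values."""
    return np.asarray(occupancy, dtype=float) @ np.asarray(state_values, dtype=float)


def discounted_total(per_cycle_values, rate=0.035):
    """Sum of per-cycle values, cycle t (0-based) discounted by (1 + rate) ** t."""
    values = np.asarray(per_cycle_values, dtype=float)
    factors = (1.0 + rate) ** -np.arange(values.size)
    return float(values @ factors)


# Hypothetical occupancy for two cycles over (good, revision, good after revision, dead).
occupancy = [[1.00, 0.00, 0.00, 0.00],
             [0.95, 0.02, 0.00, 0.03]]
utilities = [0.74, 0.60, 0.74, 0.0]  # hypothetical per-state index values
qalys = discounted_total(expected_per_cycle(occupancy, utilities))
```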

We calculated an ICER over the lifetime horizon based on the model. Monte Carlo simulation was performed to address parameter uncertainty. We used gamma distributions for costs, beta distributions for index values, and Dirichlet distributions for transition probabilities. The distribution parameters are specified in Online Resource Table S7. Using the results from the simulation, we plotted the CEP and the CEAC.
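Parameter draws for the Monte Carlo simulation are commonly obtained by method of moments: a gamma distribution matched to a cost's mean and standard deviation, a beta distribution matched to an index value's mean and standard deviation, and a Dirichlet distribution whose parameters are counts of observed transitions. The sketch below follows that common approach using NumPy's random `Generator`; the function names and example inputs are hypothetical, and the study's actual distribution parameters are those in Online Resource Table S7.

```python
import numpy as np


def gamma_params(mean, sd):
    """Method-of-moments gamma (shape, scale) for a cost with given mean and SD."""
    if mean <= 0 or sd <= 0:
        raise ValueError("gamma needs positive mean and SD")
    return (mean / sd) ** 2, sd ** 2 / mean


def beta_params(mean, sd):
    """Method-of-moments beta (alpha, beta) for a value bounded in (0, 1)."""
    var = sd ** 2
    if not 0 < mean < 1 or var >= mean * (1 - mean):
        raise ValueError("beta needs 0 < mean < 1 and var < mean * (1 - mean)")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def draw_parameters(rng, n, cost_mean, cost_sd, util_mean, util_sd, transition_counts):
    """n joint parameter draws for a probabilistic sensitivity analysis."""
    shape, scale = gamma_params(cost_mean, cost_sd)
    a, b = beta_params(util_mean, util_sd)
    return {
        "cost": rng.gamma(shape, scale, size=n),
        "utility": rng.beta(a, b, size=n),
        # Dirichlet over the outgoing transitions of one state; each row sums to 1.
        "transitions": rng.dirichlet(transition_counts, size=n),
    }


# Hypothetical inputs: state cost 1000 (SD 200), index value 0.74 (SD 0.05),
# and 90/5/5 observed moves to stay/revise/die from one state.
psa = draw_parameters(np.random.default_rng(1), 5000, 1000.0, 200.0, 0.74, 0.05, [90, 5, 5])
```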

3 Results

In total, 152 patients were randomised to TAR and 151 patients were randomised to AF. There were four crossovers from AF to TAR, which were analysed according to the ITT principle. As this was a complete case analysis, we only included patients with complete cost and QALY data. Twenty-one patients withdrew from the trial before surgery, one patient withdrew before 26 weeks, and five patients withdrew at 52 weeks. Eighteen (6%) of 303 randomised participants did not complete a CSRI on one of the visits, and one of these patients also had missing EQ-5D questionnaire data at 26 weeks and 52 weeks. After excluding participants who had withdrawn and who had missing data, we obtained the analytic sample that included 129 participants in the TAR group and 135 in the AF group.

The post hoc analysis used a per-protocol approach, as the groups were defined by the surgery patients received rather than the one to which they were randomised. Seventy-two patients had a fixed-bearing implant, 61 had a mobile-bearing implant, and 131 had AF.

The analysis was not powered for the cost-effectiveness outcomes, and therefore the statistical significance tests need to be interpreted with caution. Participants were balanced between the two groups regarding their baseline characteristics. The mean age was 68.0 years (SD 8.1) in the TAR group and 67.7 years (SD 8.0) in the AF group. There were 30 (23%) female participants in the TAR group and 44 (32%) in the AF group. The AF group had a higher proportion of patients with diabetes (11% vs. 7%) and obesity (10% vs. 6%), whereas the TAR group had more deformity. More details on the baseline characteristics are reported in the clinical-effectiveness paper [20]. We believe that the sample is representative of the general population of patients with end-stage ankle osteoarthritis, as patient characteristics are in line with those in other similar studies: patients are typically over 60 years of age, there are usually more men than women, and patients tend to have a relatively high body mass index of 30 kg/m² [14, 16].

3.1 Costs

The cost of surgery was £2175 higher for TAR than for AF (95% CI £806, £2932), a statistically significant difference. The cost of each component is reported in Table 1. TAR was more expensive because of more expensive devices and longer surgery duration. The mean total cost of TAR was £9048 (SD £2253) with a fixed-bearing implant and £9817 (SD £3822) with a mobile-bearing implant.

Table 1 Cost of surgery

There were some differences in baseline resource use between the TAR and AF groups. Participants in the TAR group used less inpatient, outpatient and social care but more community care services (Table 2). These differences are most likely due to chance, given randomisation, and are not statistically significant; however, it is important to account for baseline differences in the analysis. Post-operative resource use was higher in the TAR group than in the AF group, except for community care use at 52 weeks, which was higher in the AF group. None of the differences were statistically significant.

Table 2 Cost of health care resource use

We assumed that all patients who were employed part-time or full-time at baseline had to take 6 weeks of leave for surgery. Assuming this leave was unpaid, lost earnings were £2807 in the TAR group and £2947 in the AF group (Table 3). Eighty-five participants were employed (40 in the TAR group and 45 in the AF group). Some participants reported reducing their working hours because of problems with their ankle, resulting in an average loss of earnings over 52 weeks of £683 in the TAR group and £1034 in the AF group. This difference was not statistically significant (p = 0.27). Two patients in the TAR group and three patients in the AF group had to retire because of their ankle problems.

Table 3 Out-of-pocket costs

Out-of-pocket spending on equipment, aids and adaptations, as well as transport costs, was similar in both groups (see Table 3). At baseline, 57% of participants in the AF group were receiving help from family or friends, compared with 43% in the TAR group; at 52 weeks post-operation, this fell to 15% in the AF group and 14% in the TAR group. The average hours of informal care per month were 24 h (SD 29) in the AF group and 16 h (SD 21) in the TAR group at baseline, falling to 13 h (SD 14) in the AF group and 10 h (SD 9) in the TAR group at 52 weeks. On average, the cost of informal care over the entire 52-week period was estimated at £1716 (SD £3340) in the TAR group compared with £3708 (SD £6403) in the AF group. The difference was −£2102 (95% CI −£3245, −£695) and was statistically significant (p < 0.01).

The total cost per patient in the TAR group, from the NHS and PSS perspective, was £2544 higher than in the AF group (Table 4), a statistically significant result (p < 0.01). The main driver for the cost difference was the cost of surgery. Other differences in cost components were not statistically significant.

Table 4 Total cost per patient (within-trial analysis)

3.2 Outcomes

Average baseline EQ-5D-5L index values were virtually the same in both groups, at 0.5 (SD 0.2), which is lower than expected; however, a closer look at the EQ-5D profiles showed that the vast majority of participants reported moderate to severe problems with mobility, slight to severe problems with usual activities, and moderate to severe pain. Given their age, participants are also likely to have comorbidities that affect their health-related quality of life. Four participants in the TAR group and four in the AF group had negative index values (i.e. health states valued as worse than death), which could be due to limitations of the Crosswalk Index Value Calculator. However, the participants’ descriptive profiles make these low scores plausible, as all of them reported severe problems with mobility, extreme pain, and being unable to perform usual activities. Three of these participants reported extreme anxiety/depression.

At 26 weeks, the average EQ-5D-5L index values were 0.71 (SD 0.20) in the TAR group and 0.67 (SD 0.22) in the AF group, and at 52 weeks, the average EQ-5D-5L index values were 0.74 (SD 0.21) in the TAR group and 0.71 (SD 0.22) in the AF group.

On average, patients in the TAR and AF groups accrued 0.68 (SD 0.15) and 0.65 (SD 0.17) QALYs, respectively, over 52 weeks. The difference between the groups was estimated at 0.019 (95% CI −0.015, 0.05) and was not statistically significant (Table 5).

Table 5 Results of within-trial analysis

3.3 Within-Trial Analysis

Over the 52-week period, the mean incremental cost per QALY gained was £131,999 from the NHS and PSS perspective. Using bootstrapping, we generated an empirical distribution of ICERs and plotted it on the CEP (Fig. 2).

Fig. 2

Cost-effectiveness plane (within-trial analysis). QALYs quality-adjusted life-years

The CEP shows that TAR is more expensive than AF; however, there is a relatively high degree of uncertainty, as incremental QALYs range from −0.04 to 0.08. Most bootstrap replicates lie in the north-east (top-right) quadrant of Fig. 2: TAR is almost certainly more expensive than AF but may generate more QALYs.

We also used the bootstrapping results to estimate the probability of TAR being cost effective compared with AF at various cost-effectiveness thresholds. The probability was low: 0.5% at a threshold of £20,000 per QALY gained and 1.7% at a threshold of £30,000 per QALY gained. It increased to 37.6% at a threshold of £100,000 per QALY gained (Fig. 3).

Fig. 3

Cost-effectiveness acceptability curve (within-trial analysis). QALY quality-adjusted life-year

3.4 Secondary and Sensitivity Analyses

Over 52 weeks, the mean incremental cost per QALY gained from the wider perspective was £7533. This ICER is considerably lower than that of the primary analysis; however, costs from the wider perspective introduce considerable uncertainty, as they rely on many assumptions. The probability of TAR being cost effective was 59% at a threshold of £20,000 per QALY gained and 64% at a threshold of £30,000 per QALY gained.

The post hoc analysis suggests that, from the NHS and PSS perspective, the total cost of fixed-bearing TAR is £1537 higher than that of AF (Table 5). However, there is some evidence that fixed-bearing TAR generates more QALYs than AF (incremental difference 0.031), although this difference was not statistically significant (p = 0.222). From the wider perspective, fixed-bearing TAR has the potential to be cost effective, as the total cost is £1289 lower in the fixed-bearing TAR group than in the AF group (Table 5); however, this difference is not statistically significant. We did not report an ICER for this comparison because, on these results, fixed-bearing TAR dominates AF (lower cost and more QALYs), so no decision rule was required.

The per-protocol analysis resulted in very minor differences in total costs and QALYs; therefore, the results are not reported here but are available upon request.

3.5 Model-Based Analysis

Model-based analysis suggested that TAR is more expensive but generates more QALYs, compared with AF, when extrapolated to a lifetime horizon. The ICER was estimated to be £4200 per QALY gained. Cost and QALY differences are presented in Table 6.

Table 6 Model-based costs and QALYs per group

The results of the Monte Carlo simulation (5000 iterations) are shown on the cost-effectiveness plane in Fig. 4.

Fig. 4

Cost-effectiveness plane (model-based analysis). QALYs quality-adjusted life-years

The mean ICER lies in the north-east quadrant of Fig. 4, meaning that TAR is more expensive and generates more QALYs. When we varied the cost and quality-of-life parameters in the model, most points still lay in the northern quadrants; therefore, TAR is almost certainly more expensive than AF over a lifetime horizon. However, the QALY gain is uncertain, as some simulated ICERs lie in the north-west quadrant, implying that TAR may be more expensive and generate fewer QALYs. Hence, there is considerable uncertainty around the lifetime ICER, and more reliable, longer-term data are required to obtain a more robust result.

Even though there is uncertainty over the lifetime horizon, there was a 69% probability that TAR is cost effective under a cost-effectiveness threshold of £20,000 per QALY gained (Fig. 5).

Fig. 5

Cost-effectiveness acceptability curve (model-based analysis). QALY quality-adjusted life-year

We found some evidence that fixed-bearing TAR performed better than both mobile-bearing TAR and AF. If we assume that all TAR patients receive a fixed-bearing implant, and assign the quality of life observed in the fixed-bearing subgroup to the whole TAR group, the difference in QALYs between TAR and AF increases (Table 6).

In this scenario, TAR remains more expensive than AF and generates more QALYs, with an ICER of £2535 per QALY gained. Monte Carlo simulation of this scenario gave a 72% probability that TAR is cost effective under the cost-effectiveness threshold of £20,000 per QALY gained.

4 Discussion

This is the first completed multicentre RCT with an economic evaluation comparing TAR with AF in patients with end-stage ankle osteoarthritis. TAR generated more QALYs than AF at 52 weeks (adjusted difference 0.02, 95% CI −0.015, 0.05). The total cost of TAR from the NHS and PSS perspective, adjusted for baseline values and minimisation factors, was £2544 higher than that of AF (95% CI £1074, £3879). The ICER was £131,999 per QALY gained at 52 weeks. The results suggest that it is important to account for wider costs when comparing TAR and AF, as these have a large impact on the ICER; however, out-of-pocket and lost productivity costs also introduce a high degree of uncertainty. For implants and definitive surgery, 52-week data need to be interpreted with caution, since much of the benefit is expected to accrue after the 52-week window; the longer-term modelling is therefore the more important analysis.

Model-based analysis suggested that TAR is more expensive than AF but generates more QALYs when extrapolated to a lifetime horizon. Over a lifetime horizon, the ICER was £4200 per QALY gained and there was a 69% probability that TAR was cost effective under the cost-effectiveness threshold of £20,000 per QALY gained. When comparing fixed-bearing TAR with AF over a lifetime horizon, the ICER was £2535 per QALY gained and there was a 72% probability of fixed-bearing TAR being cost effective under the same threshold.

There are sparse data on the health-related quality of life of patients after ankle surgery. Index values for TAR and AF patients have previously been estimated in a prospective, non-randomised cohort using the Short Form-6 Dimension (SF-6D) instrument [36]. Baseline index values in that cohort were higher than in TARVA for both TAR (0.67, 95% CI 0.64, 0.69) and AF (0.66, 95% CI 0.63, 0.68). At 52 weeks, their index values (TAR 0.73, 95% CI 0.71, 0.76; AF 0.73, 95% CI 0.70, 0.76) were comparable with those in TARVA (TAR 0.74, 95% CI 0.70, 0.77; AF 0.71, 95% CI 0.67, 0.74). Among other orthopaedic procedures, hip replacement has been shown at 5-year follow-up to have an 85% probability of being cost effective under the cost-effectiveness threshold of £20,000 per QALY gained, although the comparator was no surgery [37].

The benefits of orthopaedic surgery are expected to last much longer than 1 year, and hence extrapolating results beyond 1 year is common in the orthopaedic literature [37, 38, 39]. Three previous models have explored the cost effectiveness of TAR compared with AF [6, 17, 18]. SooHoo and Kominski implemented a simple decision tree model using data from published studies, assuming a patient population aged 55 years with end-stage osteoarthritis and a lifetime horizon [17]. The analysis suggested that TAR had the potential to be cost effective if the implant survived more than 7 years, but these data were obtained when TAR surgery was in its infancy. Courville et al. showed TAR to be cost effective compared with AF over a lifetime horizon in a similar hypothetical cohort of 60-year-old patients with end-stage ankle osteoarthritis [6]. The authors built a Markov model similar to ours that also accounted for the possibility of below-knee amputation, allowed for either revision TAR or conversion to AF, and included the possibility of developing arthritis in additional areas. Most data were obtained from published sources, and some parameters, such as the success of revision TAR and of conversion to AF, were estimated. Nwachukwu et al. built a Markov model over the patients' lifetime horizon, using data from a US administrative database for costs and published literature for other model parameters, including QALYs [18]. Unlike the other two studies, the authors considered three treatment options: TAR, AF, and non-operative treatment. The hypothetical cohort was aged 55 years but, unlike our study, did not focus on end-stage ankle osteoarthritis, as non-operative treatment was among the options considered. The authors also accounted for indirect costs, such as productivity loss due to the index procedure, the revision procedure, and ankle osteoarthritis overall. TAR had the potential to be cost saving compared with AF when both direct and indirect costs were included.

The most recent UK study, by Place et al., assessed the cost effectiveness of primary TAR compared with doing nothing in a 51-patient cohort [40]. Data, including EQ-5D-3L and MOXFQ, were collected prospectively. The average preoperative index value was 0.41, comparable with 0.50 in the TAR group in our study. The average MOXFQ walking/standing score (0–100; lower scores are better) was 75.76 preoperatively and 53.27 at 6 months postoperatively; in our sample, these scores were 81.6 and 35.8. Patients were of similar age to TARVA participants (mean 67.7 years). Follow-up was 6 months post-operation, and cost effectiveness was assessed over the patients' lifetime, assuming that health-related quality of life remained constant until death and discounting at the 3.5% annual rate recommended by NICE [35]. The cost of surgery was based on the national tariff for a very major non-trauma foot procedure with a complication and comorbidity score of 0–3. The study concluded that TAR is cost effective compared with doing nothing, with an ICER of £4466 per QALY gained [40]. Researchers consistently highlight the lack of high-quality data to provide robust estimates of the cost effectiveness of TAR.
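Holding quality of life constant until death and discounting at 3.5% per year, as Place et al. did, reduces each later year's QALYs by a factor of 1/(1.035)^t. A minimal sketch of that calculation follows, assuming annual cycles with the first year undiscounted; the function name `discounted_qalys`, the utility gain and the time horizon are hypothetical.

```python
def discounted_qalys(annual_utility, years, rate=0.035):
    """Discounted QALYs for a constant annual utility held for a whole
    number of years. Assumes annual cycles with year 0 undiscounted,
    i.e. the QALYs accrued in year t are divided by (1 + rate) ** t."""
    return sum(annual_utility / (1 + rate) ** t for t in range(years))


# Hypothetical example: a utility gain of 0.2 held for 15 years.
undiscounted = 0.2 * 15
discounted = discounted_qalys(0.2, 15)
print(f"undiscounted {undiscounted:.2f} QALYs, discounted {discounted:.2f} QALYs")
```

Other timing conventions (e.g. discounting from the end of year 1, or half-cycle corrections) give slightly different totals.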

Using the EQ-5D-5L at baseline, 26 weeks and 52 weeks, our study provided the index values needed to estimate QALYs. Using a microcosting approach and a purpose-designed CSRI questionnaire, we obtained detailed data on the cost of surgery and of healthcare resource use from both the NHS and PSS perspective and a wider perspective.

The main limitations of this study relate to the short-term follow-up and the simple decision model used to extrapolate the results to the patients' lifetime horizon. The model did not include health states such as adjacent joint arthritis and below-knee amputation, which is a rare but costly complication, and there is uncertainty in key parameters such as the revision rate in both the TAR and AF groups. More than one revision is possible, but only one was assumed in this model. However, as TARVA participants are being followed up over the long term, we plan to build a more sophisticated model when these data are available.
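To show what a single-revision assumption means in practice, the sketch below runs a hypothetical three-state annual cohort model (primary surgery, revised, dead) in which patients who have been revised cannot be revised again, so any repeat-revision costs are simply not captured. The states, transition probabilities, order of events and the name `markov_trace` are illustrative assumptions, not the structure or parameters of the TARVA model.

```python
def markov_trace(p_revision, p_death, cycles):
    """Cohort trace for a hypothetical 3-state annual model.
    States: primary (after index surgery), revised (after ONE revision,
    no further revisions modelled), dead (absorbing).
    Within a cycle, death is applied first, then revision among survivors."""
    primary, revised, dead = 1.0, 0.0, 0.0
    trace = [(primary, revised, dead)]
    for _ in range(cycles):
        die_primary = primary * p_death
        revise = primary * (1 - p_death) * p_revision
        die_revised = revised * p_death
        primary, revised, dead = (
            primary - die_primary - revise,
            revised + revise - die_revised,  # revised patients stay revised
            dead + die_primary + die_revised,
        )
        trace.append((primary, revised, dead))
    return trace


# Hypothetical annual probabilities over a 30-year horizon.
final = markov_trace(p_revision=0.02, p_death=0.03, cycles=30)[-1]
print("primary %.3f, revised %.3f, dead %.3f" % final)
```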

The results from the wider perspective are approximate, as they rely on a number of assumptions. Furthermore, the method we used to estimate productivity loss, the human capital approach, has limitations. This approach estimates potential lost production rather than actual real-world loss [41], and it assumes 'full employment', which implies that there is no involuntary unemployment and that the replacement worker would be someone who would be employed anyway. In practice, a replacement worker would be hired from a pool of unemployed workers, and it has been argued that production is not affected beyond the friction period needed to replace the absent worker.
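The difference between the two views can be illustrated with a simplified calculation: the human capital approach values every day of absence, whereas the friction cost approach counts only the days up to the end of the friction period. The function names, wage, absence and friction-period values below are hypothetical, and refinements used in practice (such as elasticity adjustments) are omitted.

```python
def human_capital_loss(days_absent, daily_wage):
    """Human capital approach: every day of absence is valued as lost output."""
    return days_absent * daily_wage


def friction_cost_loss(days_absent, daily_wage, friction_days):
    """Simplified friction cost approach: losses stop once a replacement
    worker has been found, i.e. after `friction_days` days."""
    return min(days_absent, friction_days) * daily_wage


# Hypothetical example: 200 days off work at GBP 120 per day,
# with an assumed 85-day friction period.
print("human capital:", human_capital_loss(200, 120.0))
print("friction cost:", friction_cost_loss(200, 120.0, 85))
```

For absences shorter than the friction period the two approaches agree; they diverge only for long absences.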

5 Conclusion

This study reports important new evidence on the surgical approaches used to treat end-stage ankle osteoarthritis. Based on the within-trial analysis, TAR is not cost effective compared with AF at 52 weeks; however, 52 weeks is not long enough to capture the important costs and clinical benefits of surgery. Long-term economic modelling suggests that TAR has the potential to be cost effective over the patients' lifetime. A decision model based on longer-term TARVA data and other published sources, capturing a more detailed patient pathway, is required to provide a more robust estimate of the cost effectiveness of TAR.