Introduction

Thyroid malignancy is detected in approximately one in four cytologically indeterminate thyroid nodules, including cytology with atypia of undetermined significance or follicular lesions of undetermined significance (Bethesda III, AUS/FLUS) and cytology (suspicious for a) follicular neoplasm (Bethesda IV, FN/SFN) or (suspicious for a) Hürthle cell neoplasm (Bethesda IV, HCN/SHCN) [1]. Current guidelines recommend repeat fine needle aspiration cytology (FNAC) in Bethesda III nodules, and consideration of clinical features, ultrasound characteristics and patient preference in both Bethesda III and IV nodules, before deciding to proceed with either active surveillance or diagnostic surgery [1,2,3,4]. In the Netherlands, from 2017 to 2019, approximately 1300 Bethesda III and 650 Bethesda IV cytology results were reported per year. Many of these patients underwent diagnostic surgery [5]. Better preoperative differentiation could avoid futile diagnostic surgeries for benign nodules of indeterminate cytology, including the associated costs, risks of surgical complications, lifelong thyroid hormone substitution in patients with subsequent hypothyroidism, and possible negative influence on the patients’ health-related quality of life (HRQoL) [6,7,8]. However, none of the plethora of available additional diagnostics are currently part of the standard diagnostic workup following national or international guidelines [4, 9,10,11,12].

Our recent randomised controlled multicentre trial confirmed the results of our previous meta-analysis and demonstrated that implementation of 2-[18F]fluoro-2-deoxy-D-glucose positron emission tomography/computed tomography ([18F]FDG-PET/CT) in the preoperative workup accurately ruled out malignancy and prevented 40% of the futile diagnostic surgeries for benign nodules [7]. If the application of [18F]FDG-PET/CT is limited to nodules with non-Hürthle cell cytology (AUS/FLUS and FN/SFN), a 48% reduction can be established, optimizing therapeutic yield and limiting the unbeneficial use of valuable resources [13].

Prior to the implementation of any new test or procedure, it is crucial to evaluate cost-utility. We previously reported a model-based cost-utility analysis of [18F]FDG-PET/CT in a European setting, which demonstrated that [18F]FDG-PET/CT could be cost-effective as compared to management with diagnostic surgery or molecular testing over a 5-year period [7, 14]. To the best of our knowledge, no cost-utility analysis has been performed alongside a clinical trial to date, even though such a design would offer a high level of evidence and the most accurate reflection of real-world clinical practice. Here, we present the results of the cost-utility analysis derived from our randomised controlled multicentre trial. In this analysis, we compared the lifelong societal costs and quality-adjusted life years (QALYs) of an [18F]FDG-PET/CT-driven workup to the costs and QALYs of diagnostic surgery in patients with indeterminate thyroid nodules. The observed and prospectively collected 1-year trial outcomes were extrapolated using a Markov model.

Material and methods

Trial design, patients and treatment

The Efficacy of [18F]FDG PET in Evaluation of Cytological indeterminate Thyroid nodules prior to Surgery (EfFECTS) trial was a prospective, triple-blinded, randomised controlled multicentre trial performed in 15 hospitals in the Netherlands (ClinicalTrials.gov: NCT02208544). The trial, including the current study, was approved by the Medical Research Ethics Committee on Research Involving Human Subjects region Arnhem-Nijmegen, Nijmegen, the Netherlands. Written informed consent was obtained from each of the participants prior to any study activity. Comprehensive descriptions regarding patient eligibility, selection, randomisation, blinding, [18F]FDG-PET/CT procedures, and sample size calculation are reported in our previous work [13]. In summary, patients with a Bethesda III or IV thyroid nodule (confirmed on central review; Bethesda III on repeat FNAC) and scheduled diagnostic surgery were eligible for inclusion (Table 1). There was one index nodule per patient. Patients were randomly assigned to an [18F]FDG-PET/CT-driven group or diagnostic surgery group in a 2:1 ratio (Fig. 1). Randomisation was stratified for patient sex, age, thyroid nodule size, Bethesda classification (III or IV), and inclusion site. A partial-body [18F]FDG-PET/CT of the neck was acquired in all patients, and centrally assessed by two experienced nuclear medicine physicians for any focal [18F]FDG-uptake in the thyroid that was visually higher than the background uptake of the surrounding thyroid tissue and that corresponded to the index nodule in size and location. Patient allocation and the result of the [18F]FDG-PET/CT scan were not disclosed to the patient nor his/her local physician. Subsequently, the recommended patient management in the [18F]FDG-PET/CT-driven group was based on the result of the scan. When the index nodule was [18F]FDG-positive, patients were advised to proceed to the scheduled diagnostic surgery. 
When the index nodule was [18F]FDG-negative, active surveillance was recommended, with at least a follow-up ultrasound after one year. Any additional follow-up visits during the trial were permitted at the discretion of the local physician. In the diagnostic surgery group, all patients were advised to proceed to the scheduled surgery, in accordance with current (inter)national guidelines [4, 12]. In all patients in both groups, postoperative management was based on the local histopathological diagnosis and adhered to the Dutch national guidelines [12]. The current study adhered to this local histopathological diagnosis as a reference standard, as this diagnosis likely best reflects the patient’s illness perception and estimated costs. Consequently, minor differences exist between the current study and the trial’s main report, for which all histopathology was centrally reviewed [13]. Index nodules diagnosed as non-invasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) or follicular tumour of uncertain malignant potential (FT-UMP) are considered borderline tumours: they were postoperatively treated as benign nodules, but diagnostic surgery for these potentially premalignant nodules is considered justified [15, 16]. The study-related follow-up for all patients was 1 year.

Table 1 Baseline characteristics of the patients enrolled in the trial
Fig. 1 Flowchart of the first year, visualizing the study procedures, observed treatment and treatment outcomes, and health state at the end of the first year of all patients who participated in the EfFECTS trial. cTT, completing total thyroidectomy. Fu, follow-up. HT, hemithyroidectomy (including isthmus resection (n = 3) and hemithyroidectomy plus nodulectomy (n = 2)). POHT, postoperative levothyroxine-dependent hypothyroidism after partial thyroidectomy procedure. PSC, permanent surgical complication, including recurrent nerve paralysis and permanent hypoparathyroidism. RAI, radioiodine ablative therapy. TSC, transient surgical complication, including haematoma with re-exploration surgery, wound infection, seroma, and transient hypoparathyroidism. TT, total thyroidectomy. a: One patient underwent RAI after initial, uncomplicated TT for malignancy; two patients underwent cTT for malignancy but no RAI

First year costs and utilities

Real-world volumes of thyroid nodule-related health care consumption for 1 year, counted from the date of the [18F]FDG-PET/CT scan (defined as baseline), were extracted from individual medical records for each patient. The extracted data included all thyroid surgery and associated days of hospitalization, additional procedures and days of hospitalization following surgical complications, outpatient clinic visits and diagnostics that were related to the diagnosis and treatment of the indeterminate thyroid nodule, additional diagnostic procedures and consultations with other physicians related to [18F]FDG-PET/CT incidental findings, and use of thyroid-related medication. Volumes concerning non-thyroid-related health care consumption, productivity losses and HRQoL during the first year were patient-reported at baseline, 3, 6 and 12 months, using the iMTA Medical Consumption Questionnaire (iMCQ), the iMTA Productivity Costs Questionnaire (iPCQ) and the EuroQol 5-dimension 5-level (EQ-5D-5L) questionnaire, respectively (Fig. 1) [17,18,19]. Questions on health care and productivity covered a fixed recall period by design of each questionnaire, varying from 1 to 3 months; intermediate periods were individually interpolated from the closest available questionnaire. Utilities were calculated from the EQ-5D-5L domain scores using the Dutch tariff [20]. These utilities represent the valuation of quality of life on a scale from 0 (worst possible health, similar to death) to 1 (perfect health). Quality-adjusted life years (QALYs) for the first year were estimated as the area under the utility curve [20, 21].
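The area-under-the-curve estimate of first-year QALYs can be illustrated with a minimal sketch. It assumes linear interpolation between measurements (the trapezoidal rule), which the text does not specify, and the utility values are hypothetical rather than trial data. The function name `qaly_area_under_curve` is ours.

```python
def qaly_area_under_curve(months, utilities):
    """Trapezoidal area under the utility curve, expressed in QALYs.

    Assumes linear interpolation between measurements (an assumption of
    this sketch, not a detail stated in the paper).
    """
    if len(months) != len(utilities) or len(months) < 2:
        raise ValueError("need at least two matching time points and utilities")
    total = 0.0
    for i in range(len(months) - 1):
        interval_years = (months[i + 1] - months[i]) / 12.0
        mean_utility = (utilities[i] + utilities[i + 1]) / 2.0
        total += interval_years * mean_utility
    return total


# Measurement schedule from the trial: baseline, 3, 6 and 12 months.
example_months = [0, 3, 6, 12]
# Hypothetical utilities for one patient (not trial data).
example_utilities = [0.85, 0.80, 0.88, 0.90]
example_qalys = qaly_area_under_curve(example_months, example_utilities)
```

A patient in perfect health at every measurement (utility 1) accrues exactly 1 QALY over the year under this rule.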

The estimated cost of one partial-body [18F]FDG-PET/CT scan was €754 [22, 23]. Other health care costs were valued using reference prices or the 2019 reimbursement rates of the Dutch System of Diagnosis-Treatment Combinations, where appropriate and available [23]. Costs for complications of thyroid surgery (i.e., prolonged hospitalization, re-admission, and/or additional surgical procedures) were estimated using complication rates reported in literature and procedural Dutch reimbursement rates [22]. Costs of productivity losses were valued using the friction cost method and reference prices for productivity [23]. Travel expenses were included at €0.19 per kilometre [23]. We estimated all costs from a Dutch societal perspective in Euro. All prices were indexed to 1 December 2019 using the Dutch consumer price index [24].

The total societal costs per patient were estimated as the sum of medical costs for all thyroid nodule-related and other health care consumption, patient costs (i.e., travel expenses and informal care), and costs from productivity losses. All costs related to the [18F]FDG-PET/CT, including procedure costs, costs for additional healthcare consumption for incidental [18F]FDG-PET/CT findings, pertinent travel expenses, and other reported patient costs were only taken into account for the patients in the [18F]FDG-PET/CT-driven group.

Multiple imputation was applied to account for possibly selectively missing questionnaire data, using age, sex, allocation, EQ-5D-5L utility scores and time-dependent variables for thyroid surgery and benign or malignant histopathological diagnosis as predictor variables. One hundred imputed datasets were created for the 1-year data.

Modelled lifelong costs and utilities

To estimate lifelong costs and utilities, a Markov model with 12 health states and a 1-year cycle length was constructed using Stata (version 14.2; StataCorp, College Station, TX, USA).

Model structure

The model represented health states that may occur from the second year onwards for either an [18F]FDG-PET/CT-driven workup or diagnostic surgery (Fig. 2). These health states included active surveillance (i.e., follow-up of the thyroid nodule with yearly ultrasound), end of follow-up (i.e., patients discharged from active surveillance without thyroid surgery), observation after thyroid surgery (i.e., hemithyroidectomy [HT], total thyroidectomy [TT], completion TT [cTT], and/or radioactive iodine [RAI] ablation), medication-dependent hypothyroidism following HT, permanent complications due to HT or (c)TT, recurrent (including persistent) malignant disease after HT or (c)TT and/or RAI, or death. Health states following HT or TT may apply to patients with either benign or malignant disease. The “cTT + RAI” procedure and recurrent disease states (grey-shaded shapes in Fig. 2) only apply to patients with malignant disease.

Fig. 2 Markov tree visualizing the health states (ovals), possible transitions between health states after each 1-year cycle length (arrows), treatments (boxes), and decisions (diamonds) that patients may encounter in the Markov model. Patients enter the model in their actual health state at the end of the first year. Grey-shaded fields and corresponding transitions only apply to patients with malignancy; all white fields and corresponding transitions apply to patients with either benign or malignant lesions, although different (transition) probabilities, costs, and utilities may apply as presented in Tables 2 and 3. cTT, completing total thyroidectomy. HT, hemithyroidectomy. RAI, radioiodine ablative therapy. TT, total thyroidectomy

Model parameters

Values for the (time-dependent) probabilities in the Markov model were collected from a comprehensive Medline literature search, from Statistics Netherlands, and/or from the EfFECTS trial and adhered to the Dutch national guidelines (Table 2) [4, 12, 13, 25]. Parameters for which no information was found or that varied highly among literature were estimated by a local expert panel, including an endocrinologist, a nuclear medicine physician, and a health economist. For patients undergoing active surveillance for an [18F]FDG-negative nodule, a mean follow-up of 3 years was assumed.

Table 2 Transition probabilities for the Markov Model, including uncertainty

The negative predictive value (NPV) of [18F]FDG-PET/CT was 95.1% in the EfFECTS trial [13]. To prevent overestimation of the accuracy of an [18F]FDG-PET/CT-driven workup, we used this NPV to assume a 0.049 (= 1 − 0.951) probability of missed malignancies in unoperated patients (i.e., a false-negative [18F]FDG-PET/CT), even though none were reported in the EfFECTS trial and its extended follow-up [13]. We assumed that any missed malignancies would be detected within the first 5 years of follow-up, and could occur among patients residing in the “active surveillance” or “end of follow-up” state (Table 2).

Cost parameters

Costs for thyroid-related procedures and costs for each cycle in a particular health state were derived from reference prices, 2019 reimbursement rates, and previous cost-utility studies, where appropriate and available, and adhered to the national guidelines (Table 3) [14, 22, 23, 26,27,28,29,30]. For the “active surveillance” state, we assumed one yearly visit to the endocrinologist and an ultrasound of the neck every 12–24 months.

Table 3 Costs and utilities for the Markov Model, including uncertainty

Productivity losses for thyroid-related procedures were inferred from the reported iPCQ data over the first year of the EfFECTS trial or from literature, where appropriate. Yearly costs for other non-thyroid-related health care consumption, informal care, and other productivity losses were estimated from the reported first-year cost-questionnaire data in our study, using restricted linear regression analysis with age, sex, and QALYs as predictors (restricting coefficients to predict non-negative costs) (Supplementary Table 1). Travel expenses were estimated from the number of hospital visits for each procedure or health state, and the patient-reported travel distance.

Utility parameters

Utilities were calculated, starting from age- and sex-dependent general utilities [20], by subtracting disutilities for specific health states (Table 3). These disutilities were derived from literature or elicited from the previously mentioned expert panel based on a time-trade-off weighting. QALYs were calculated as the discounted sum of utilities over the lifelong evaluation period.

Other parameters

Discount rates of 4% and 1.5% were applied to all future costs and utilities, respectively [23]. In addition to the base-case values, distributions were specified to account for the uncertainty in the parameters. These were either triangular distributions (on a specified range, with mode equal to the base-case value) or normal distributions (with specified SD and mean equal to the base-case value).
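The differential discounting can be sketched as follows. The two rates are those stated above; everything else is an assumption for illustration: the convention that the first modelled cycle is discounted by one year, the general utility of 0.85, and the yearly cost of €250. The 0.02 disutility for active surveillance is the value reported in the Discussion.

```python
def discount_factor(rate, year):
    """Present-value weight for an amount accruing `year` years from now."""
    return 1.0 / (1.0 + rate) ** year


def discounted_sum(values, rate, first_year=1):
    """Sum yearly values after discounting; values[0] accrues in `first_year`.

    Starting at year 1 is an assumed convention; the paper does not state
    from which point discounting was applied.
    """
    return sum(
        value * discount_factor(rate, first_year + offset)
        for offset, value in enumerate(values)
    )


COST_RATE = 0.04      # discount rate for costs, as stated in the text
UTILITY_RATE = 0.015  # discount rate for utilities, as stated in the text

# Hypothetical inputs for three model cycles (not trial data).
general_utility = 0.85          # assumed age- and sex-dependent general utility
surveillance_disutility = 0.02  # disutility of active surveillance (see Discussion)
yearly_utilities = [general_utility - surveillance_disutility] * 3
yearly_costs = [250.0, 250.0, 250.0]  # assumed yearly surveillance costs in euro

discounted_qalys = discounted_sum(yearly_utilities, UTILITY_RATE)
discounted_costs = discounted_sum(yearly_costs, COST_RATE)
```

Because the rate for costs is higher than the rate for utilities, costs far in the future weigh less in the lifelong totals than health effects accruing at the same time.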

Lifelong extrapolation

With each of the 100 imputed 1-year datasets, 10 sets of model parameter values were drawn at random from the specified parameter distributions. Then, for each of the 1000 parameter sets and starting from each patient’s health state at the end of the first year, the Markov model was used to simulate 1000 extrapolated patient histories. For each parameter set, the average over the extrapolated costs and QALYs was added to the 1-year costs and QALYs, as an estimate of the patients’ expected lifelong outcomes.
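The nested structure of this extrapolation (parameter sets drawn at random, then many simulated histories per patient and parameter set) can be sketched with a deliberately reduced model. The four states, the transition rules, and all probability ranges below are hypothetical simplifications and do not reproduce the 12-state model or the values in Table 2. The sketch also counts simulated life years rather than costs and QALYs.

```python
import random

STATES = ("active_surveillance", "end_of_follow_up", "post_surgery", "dead")


def draw_parameters(rng):
    """Draw one parameter set from triangular distributions (hypothetical ranges)."""
    # random.triangular takes (low, high, mode); the mode plays the base-case role.
    return {
        "p_death": rng.triangular(0.005, 0.02, 0.01),
        "p_surgery": rng.triangular(0.01, 0.10, 0.03),
        "p_end_follow_up": rng.triangular(0.20, 0.50, 0.33),
    }


def simulate_history(start_state, params, rng, max_cycles=60):
    """Simulate one patient history with 1-year cycles; 'dead' is absorbing."""
    state = start_state
    history = [state]
    for _ in range(max_cycles):
        if state == "dead":
            break
        u = rng.random()
        if u < params["p_death"]:
            state = "dead"
        elif state == "active_surveillance":
            if u < params["p_death"] + params["p_surgery"]:
                state = "post_surgery"
            elif u < (params["p_death"] + params["p_surgery"]
                      + params["p_end_follow_up"]):
                state = "end_of_follow_up"
        history.append(state)
    return history


def extrapolate(start_states, n_parameter_sets, n_histories, seed=0):
    """Return, per parameter set, the mean number of simulated life years."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_parameter_sets):
        params = draw_parameters(rng)
        years = [
            sum(state != "dead" for state in simulate_history(start, params, rng)[1:])
            for start in start_states
            for _ in range(n_histories)
        ]
        means.append(sum(years) / len(years))
    return means


# Two hypothetical patients, 10 parameter sets, 100 histories each.
mean_years = extrapolate(
    ["active_surveillance", "post_surgery"], n_parameter_sets=10, n_histories=100
)
```

In the analysis described above, the per-parameter-set average would be taken over costs and QALYs and added to each patient's observed 1-year outcomes.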

Statistical analysis

Baseline characteristics were compared between the allocated groups using Pearson’s chi-squared or Fisher’s exact tests for categorical data, and independent samples t-tests or Mann–Whitney U tests for continuous data, where appropriate. Univariate comparisons of the 1-year costs and QALYs were performed using independent unequal-variances t-tests, aggregating the 100 multiple imputation sets using Rubin’s rules (accounting for sampling and imputation uncertainty). Similarly, lifelong costs and QALYs were compared by aggregating the 1000 parameter sets using Rubin’s rules (accounting for sampling, imputation and parameter uncertainty) [31]. Unadjusted (univariate) results are presented in the Supplementary data.
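For readers unfamiliar with Rubin's rules, the pooling step can be sketched as below. This is the standard textbook form (pooled estimate, within-set and between-set variance); the function name `pool_rubin` and the example numbers are ours, and the sketch omits the degrees-of-freedom correction and any adjustments specific to the trial analysis.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-dataset estimates and their variances using Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance across datasets (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, sqrt(total)


# Hypothetical mean cost differences (euro) and squared standard errors
# from three imputed datasets (not trial data).
example_estimate, example_se = pool_rubin(
    [-1200.0, -1000.0, -1100.0], [250000.0, 260000.0, 240000.0]
)
```

The between-set term is what lets the pooled standard error reflect imputation uncertainty (and, in the lifelong analysis, parameter uncertainty) on top of sampling uncertainty.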

In the analyses presented here, we adjusted for the trial’s stratifying variables using a generalized linear model with robust estimator for observed heteroscedastic data [13, 32, 33].

Minor imbalances in baseline characteristics and malignancy rates were observed across the allocated groups despite stratified randomisation (Table 1). To avoid an impact of these imbalances on costs and utilities over the lifelong period, we also adjusted for these covariates: the local benign/borderline or malignant histopathological diagnosis, EQ-5D-5L utility score at baseline, medical history (binary, represented by the periodic use of non-thyroid medication), and productivity at baseline (represented by the patient-reported contractual work hours per week). Unadjusted results are presented in Supplementary Tables 3, 4, 5 and Supplementary Fig. 1. Results are presented as means and their 95% confidence intervals (CI), mean difference and 95% CI, and p values, where appropriate. All analyses adhered to the intention-to-treat principle. A p value ≤ 0.05 was considered statistically significant. Data analysis was performed using SPSS Statistics version 26 (IBM Corp., Armonk, NY, USA).

Cost-utility analysis

Cost-effectiveness acceptability curves (CEACs) were used to graph the probability that an [18F]FDG-PET/CT-driven workup is cost effective compared to diagnostic surgery, as a function of willingness to pay (WTP) for a QALY. In the Netherlands, a willingness-to-pay threshold of €50,000 per QALY is recommended by the Dutch Council for Public Health and Health Care for conditions with an intermediate disease burden [34]. The probability of cost-effectiveness was calculated as the one-sided p value for the difference in net benefit (net benefit = WTP × QALYs − costs). The statistical analysis of the net benefit was identical to the analysis for costs and QALYs separately.
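The net-benefit calculation behind the CEAC can be sketched as follows. The point estimates are the lifelong means reported in the Results; the standard errors, the zero correlation, and the normal approximation are assumptions of this sketch, so the resulting probabilities are illustrative and are not expected to reproduce Fig. 3.

```python
from math import sqrt
from statistics import NormalDist


def incremental_net_benefit(wtp, delta_qalys, delta_costs):
    """Incremental net benefit = WTP x difference in QALYs - difference in costs."""
    return wtp * delta_qalys - delta_costs


def inb_standard_error(wtp, se_qalys, se_costs, correlation=0.0):
    """Standard error of the incremental net benefit at a given WTP."""
    variance = (
        (wtp * se_qalys) ** 2
        + se_costs ** 2
        - 2.0 * wtp * correlation * se_qalys * se_costs
    )
    return sqrt(variance)


def probability_cost_effective(inb, standard_error):
    """One-sided probability that the incremental net benefit exceeds zero."""
    return NormalDist().cdf(inb / standard_error)


# Reported lifelong means: [18F]FDG-PET/CT-driven minus diagnostic surgery.
delta_qalys = 19.273 - 18.871
delta_costs = -9900.0

# Hypothetical standard errors (assumed for illustration, not reported values).
assumed_se_qalys = 0.5
assumed_se_costs = 6700.0

inb_50k = incremental_net_benefit(50_000, delta_qalys, delta_costs)
ceac = {
    wtp: probability_cost_effective(
        incremental_net_benefit(wtp, delta_qalys, delta_costs),
        inb_standard_error(wtp, assumed_se_qalys, assumed_se_costs),
    )
    for wtp in (20_000, 50_000, 80_000)
}
```

With the reported means, the incremental net benefit at €50,000 per QALY comes to approximately €30,000, which agrees with the base-case line in the tornado plot (Fig. 4).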

To explore the impact of individual parameters in the Markov model, univariate sensitivity analyses were performed and presented in a tornado diagram. Individual parameters were set at extreme values (Table 4), while keeping the other parameters at their base-case value and for each trial patient simulating 10,000 extrapolated patient histories beyond 1 year.
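The mechanics of a univariate (one-way) sensitivity analysis can be sketched as below. The `toy_model` function, its three parameters, and their ranges are hypothetical stand-ins for the Markov model and Table 4; only the €754 scan price and the €50,000 threshold come from the text.

```python
def one_way_sensitivity(model, base_case, ranges):
    """Vary one parameter at a time between its extremes, holding the rest fixed.

    Returns (name, outcome_at_low, outcome_at_high) tuples, sorted by the
    width of the outcome range, as in a tornado diagram.
    """
    results = []
    for name, (low, high) in ranges.items():
        outcomes = []
        for value in (low, high):
            params = dict(base_case)  # copy so the base case stays untouched
            params[name] = value
            outcomes.append(model(params))
        results.append((name, outcomes[0], outcomes[1]))
    results.sort(key=lambda row: abs(row[2] - row[1]), reverse=True)
    return results


def toy_model(params):
    """Hypothetical incremental net benefit at a WTP of 50,000 euro per QALY."""
    return (
        50_000 * params["qaly_gain"]
        + params["surgery_savings"]
        - params["pet_ct_price"]
    )


toy_base_case = {"qaly_gain": 0.4, "surgery_savings": 10_000.0, "pet_ct_price": 754.0}
toy_ranges = {
    "pet_ct_price": (500.0, 1500.0),  # assumed range
    "qaly_gain": (0.2, 0.6),          # assumed range
}
tornado_rows = one_way_sensitivity(toy_model, toy_base_case, toy_ranges)
```

A strategy remains cost-effective across a parameter's range when both outcomes in its row stay above zero.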

Table 4 Probabilities, costs, and utilities for univariate sensitivity analyses

Results

Between July 2015 and October 2018, 132 adult patients with a Bethesda III or IV thyroid nodule were enrolled in the EfFECTS trial (Table 1). All patients completed all study-related procedures and 1-year follow-up. Diagnostic surgery was avoided for 25 of 91 (27%) patients in the [18F]FDG-PET/CT-driven group, as compared to 1 of 41 (2%) in the diagnostic surgery group (p = 0.001) [13]. The unoperated index nodules remained unchanged in size and unsuspicious on ultrasound surveillance and were considered benign after 1 year. During study follow-up, 106 (80%) patients underwent diagnostic surgery: 29 (22%) nodules were malignant, 8 (6%) were borderline tumours, and 69 (52%) were benign. For the [18F]FDG-PET/CT-driven group, this resulted in avoided futile diagnostic surgery for 25 of 62 (40%) benign nodules.

First year utilities and costs

EQ-5D-5L, iMCQ and iPCQ questionnaires were fully completed at baseline, 3, 6, and 12 months by 121 (91.7%), 114 (86.4%), 107 (81.1%) and 106 (80.3%) of 132 patients, respectively, which were equally distributed across both randomisation groups. According to the EQ-5D-5L, the valuation of quality of life was similar in the [18F]FDG-PET/CT-driven and diagnostic surgery groups at all four measurements (Table 5). QALYs estimated from the EQ-5D-5L for the first year were similar in both groups (p = 0.57).

Table 5 Estimated utilities and quality adjusted life years (QALYs) per patient

The medical costs related to the index thyroid nodule were primarily determined by all regular healthcare consumption: a diagnostic workup, outpatient clinic visits, surgeries, medication, and RAI in case of malignancy (Table 6, Supplementary Table 4). In the [18F]FDG-PET/CT-driven group, additional costs were incurred for the [18F]FDG-PET/CT procedure (€754 per patient), but fewer diagnostic surgeries were performed, resulting in lower surgical costs per patient. Based on observed healthcare consumption, the mean costs for regular thyroid nodule-related healthcare were €6100 in the [18F]FDG-PET/CT-driven group as compared to €7400 in the diagnostic surgery group, with a mean difference of −€1300 (p = 0.01). Additional healthcare consumption due to incidental findings on the [18F]FDG-PET/CT (e.g., costs for additional ultrasound and/or FNAC procedures for an [18F]FDG-positive thyroid incidentaloma) increased the medical costs in the [18F]FDG-PET/CT-driven group. This reduced the cost difference between both strategies to a mean difference of −€1000 (p = 0.06) in thyroid nodule-related medical costs. Costs for surgical complications and other healthcare consumption (i.e., care unrelated to the thyroid nodule), patient costs, and productivity losses were similar across both groups. The total first-year societal costs were €15,500 in the [18F]FDG-PET/CT-driven group as compared to €20,100 in the diagnostic surgery group, with a mean difference of −€4500 (p = 0.06).

Table 6 Estimated 1-year and lifelong societal costs per patient

Lifelong utilities and costs

Estimated using our Markov model, the lifelong utilities were similar for both strategies, with 19.273 mean QALYs for the [18F]FDG-PET/CT-driven group and 18.871 for the diagnostic surgery group (p = 0.42).

None of the lifelong societal costs were statistically significantly different between the two groups (Table 6). The mean discounted lifelong societal costs were €103,500 per patient in the [18F]FDG-PET/CT-driven group as compared to €113,400 in the diagnostic surgery group, with a mean difference of −€9900 (p = 0.14). Lifelong extrapolation thus increased the size of the difference in QALYs and costs without reaching statistical significance.

Cost-effectiveness analysis

From a societal perspective, lifelong costs appeared in favour of [18F]FDG-PET/CT-driven management while HRQoL was sustained. Consequently, according to our analysis, [18F]FDG-PET/CT-driven management is very likely cost-effective as compared to diagnostic surgery for Bethesda III/IV thyroid nodules, regardless of the willingness to pay per QALY. The probability of cost-effectiveness is > 80% for any willingness to pay and minimally varies over the range of willingness to pay. The probability is 87% at €20,000 per QALY, 84% at €50,000, and 82% at €80,000 per QALY (Fig. 3).

Fig. 3 Cost-effectiveness acceptability curve (CEAC). For increasing willingness-to-pay thresholds, this figure shows the probability that [18F]FDG-PET/CT-driven management is cost-effective as compared to diagnostic surgery. Analysis was performed for the first-year (dashed line) and lifelong (continuous line) cost-effectiveness analysis

Univariate sensitivity analysis

Results of the univariate sensitivity analysis are shown in Fig. 4. At a willingness-to-pay of €50,000 per QALY, [18F]FDG-PET/CT-driven management remained cost-effective as compared to diagnostic surgery for the predetermined ranges of all of the parameters tested. Of the parameters selected for univariate sensitivity analysis, the disutility after HT for a benign nodule, the probability of a missed malignancy after initial surveillance for an [18F]FDG-negative nodule (representing the false-negative rate or NPV of [18F]FDG-PET/CT), the disutility of active surveillance of an [18F]FDG-negative nodule, and the price of the [18F]FDG-PET/CT had the largest influence on cost-effectiveness to the detriment of [18F]FDG-PET/CT-driven management.

Fig. 4 Tornado plot showing the results of the univariate sensitivity analysis on the lifelong incremental net monetary benefit per patient (x-axis) of [18F]FDG-PET/CT-driven management as compared to diagnostic surgery, for a willingness to pay of €50,000 per QALY. Dark grey bars represent lower parameter values and light grey bars represent higher parameter values. The vertical line at €0 represents the break-even situation, i.e., when both strategies have equal net benefit. The vertical line at €30,000 represents the incremental net benefit of the base case analysis for a willingness to pay of €50,000 per QALY. HT, hemithyroidectomy. QALY, quality-adjusted life year

Discussion

The EfFECTS trial compared an [18F]FDG-PET/CT-driven diagnostic workup to diagnostic surgery in Bethesda III and IV thyroid nodules and previously demonstrated that [18F]FDG-PET/CT ensured an oncologically safe 40% reduction in diagnostic surgery for benign nodules, accurately ruling out malignancy with a sensitivity of 94.1% [13]. The current cost-utility study demonstrated that an [18F]FDG-PET/CT-driven workup reduced the 1-year thyroid nodule-related and societal costs. The clear 1-year cost differences persisted over the lifelong period, albeit with a larger 95% CI due to additional modelling uncertainties. Sustained HRQoL was observed over the first year as well as the lifelong period. Consequently, an [18F]FDG-PET/CT-driven workup is very likely cost-effective as compared to diagnostic surgery for Bethesda III/IV nodules.

The current study is in line with the results of the previous cost-effectiveness study from our group, which reported modelled cost-effectiveness of [18F]FDG-PET/CT-driven management in a Dutch setting over a 5-year horizon and provided the rationale for the EfFECTS trial [14]. According to that study, [18F]FDG-PET/CT was dominant over three reported alternative strategies, reducing costs while preserving HRQoL with an incremental net benefit of €3700, €1000, and €3900 as compared to diagnostic surgery or management driven by one of two commercial molecular marker panels, respectively. These two specific molecular marker panels have greatly evolved over the recent years and improved their diagnostic accuracy. It is likely that the cost-utility balance has changed in their favour. However, at $3600 per test (i.e., €3109; €1 = $1.13 on 10-01-2022, Medicare reimbursement rate [29]), nearing the costs of a hemithyroidectomy procedure, cost-effectiveness of these molecular marker panels likely remains challenging in a European setting.

Approximately a dozen cost-effectiveness studies are currently available on the use of commercially available molecular marker panels in indeterminate thyroid nodules. These mostly American studies generally focussed on the direct medical costs for thyroid nodule-related care only, including the costs and utilities for molecular testing, surgery, potential surgical complications, and (postoperative) observation. From different types of cost-effectiveness models, mixed conclusions regarding cost-effectiveness were reached [26, 29, 30, 35]. Nicholson et al. reported that both the Afirma® GSC and ThyroSeq® v3 were superior to diagnostic surgery [29]. In contrast, Balentine et al. demonstrated that diagnostic surgery was less costly and more effective than Afirma® GEC testing and ultrasound surveillance over a 5-year period. Similar to the results of the current study, their results proved sensitive to the estimated postoperative utilities and those of surveillance after a negative test [26]. Hu et al. recently demonstrated that selective molecular testing (i.e., molecular testing following a repeat Bethesda III or single Bethesda IV result) admittedly prevented 9.5% fewer diagnostic surgeries for benign nodules than reflexive molecular testing (i.e., molecular testing following any first Bethesda III or IV result), but was likely the cost-effective strategy due to the high costs of molecular testing. Their results were most sensitive to the costs of molecular testing [35].

The number of cost-effectiveness studies from a European perspective is limited. A recent study from a Dutch perspective estimated that molecular testing may save a considerable number of repeat FNAC procedures and diagnostic surgeries in Bethesda III and V nodules, resulting in a net saving of €100 and €4100 for these cytological categories, respectively. Unfortunately, the study excluded Bethesda IV nodules from their analysis [5].

To the best of our knowledge, the current study is the first cost-utility analysis on additional diagnostics in indeterminate thyroid nodules to be performed alongside a randomised controlled clinical trial. This contrasts our study with previous cost-utility analyses and provides a unique perspective. Our observed first-year healthcare consumption data and quality of life assessments are unparalleled, especially in patients with indeterminate thyroid nodules. By incorporating these data into a comprehensive lifelong cost-utility model, we presented a scenario that most accurately reflects real-world clinical practice. In contrast, most previous cost-utility studies used a theoretical base case, a more simplified model, somewhat idealized parameters, and/or a limited time horizon. Any lifelong HRQoL effects and (lifelong) costs other than the direct medical costs (i.e., costs for other health care consumption, patient costs, and productivity losses) were often disregarded in these studies [14, 26,27,28, 35].

In previous studies, the possibility of patient crossover between management strategies over time was also seldom taken into account [14, 26,27,28, 35]. We previously recognized that the therapeutic yield of [18F]FDG-PET/CT is influenced by patient preference and treatment compliance. This directly reflects on health-care consumption volumes and costs. Shared decision-making is crucial to carefully determine the most suitable management strategy for individual patients and prevent noncompliance, as well as to optimize the use of valuable diagnostic resources [13]. This is a dynamic process, in which preferences and interests may change as time passes. In studies on the natural course of cytologically benign nodules, up to 24% of nodules were surgically resected as time passed, primarily due to compressive symptoms [36,37,38]. It is important to acknowledge the dynamics of clinical practice in a cost-effectiveness model, too, as this may prevent overestimation of an effect of any given strategy. To account for this, our model included a yearly probability of surgery despite a negative [18F]FDG-PET/CT, a probability that surveillance of an [18F]FDG-negative nodule would end, and a probability to re-enter active surveillance after it had previously ended (Table 2, Fig. 2).

For the Markov model, we used triangular distributions for probabilities (Table 2) and utilities (Table 3) when uncertainty about these parameters was asymmetric. The base-case parameter value was the mode of the triangular distribution. Due to the asymmetry, the mean parameter value in the analysis was typically higher than the base-case value (by on average 18%, at most 67%). For the utilities, the higher mean of some parameters could be in favour of the [18F]FDG-PET/CT-driven group (e.g., utilities concerning surgery for benign disease and concurrent complications); others could be disadvantageous to the [18F]FDG-PET/CT-driven group (e.g., utilities concerning active surveillance of [18F]FDG-negative nodules). For the probabilities, the higher means were typically disadvantageous to the [18F]FDG-PET/CT-driven group (e.g., the probability of surgery for benign lesion after continued surveillance for [18F]FDG-negative nodules or the probabilities of complications due to thyroid surgery beyond the first year). Altogether, we believe the use of asymmetric triangular distributions was likely disadvantageous to the [18F]FDG-PET/CT-driven group and may have underestimated its cost-effectiveness, which was nevertheless more favourable than in the diagnostic surgery group.
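The gap between the base-case value (the mode) and the mean of an asymmetric triangular distribution follows from the closed-form mean, (low + mode + high) / 3. The sketch below illustrates this with a hypothetical probability range; the numbers are not taken from Table 2.

```python
def triangular_mean(low, mode, high):
    """Mean of a triangular distribution on [low, high] with the given mode."""
    if not low <= mode <= high:
        raise ValueError("expected low <= mode <= high")
    return (low + mode + high) / 3.0


# Hypothetical right-skewed range around a base-case probability of 0.03.
example_low, example_mode, example_high = 0.01, 0.03, 0.10
example_mean = triangular_mean(example_low, example_mode, example_high)
# Relative amount by which the mean exceeds the base-case (mode) value.
relative_excess = (example_mean - example_mode) / example_mode
```

Whenever the upper part of the range is wider than the lower part, the mean lies above the mode, so parameter sets drawn at random are on average less favourable or more favourable than the base case, depending on the direction of the parameter's effect.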

As a Markov model remains a simplified reflection of the real situation, this is a limitation of any model-based cost-utility analysis and thus also applicable to the current study. The accuracy of the estimated probabilities, costs, and utilities is dependent on the availability and quality of representative source data. Although we performed a comprehensive literature search to ensure a careful, evidence-based determination of all model parameters, the best fitting literature for some variables was only moderately related. In these cases, an expert panel was additionally consulted. This included all parameters concerning the active surveillance of [18F]FDG-negative indeterminate thyroid nodules, for which we had to rely on literature about benign nodules and expert opinion. For example, a disutility of 0.02 was assigned to the active surveillance health state. We chose a limited but conservative disutility as compared to the disutility of observation after an uncomplicated HT for a benign nodule (0.01) to prevent overestimation of HRQoL in favour of an [18F]FDG-PET/CT-driven workup and to account for any suspense of not knowing a definite histopathological diagnosis. Patients under surveillance may experience some degree of cyclic psychological distress centering around their yearly follow-up visits, although evidence supporting that assumption is currently lacking and we have not observed it in the EfFECTS trial [26]. A recent study with a limited median 15-month follow-up found no evidence of such effects and showed sustained HRQoL in patients under surveillance following a negative molecular test [39]. We included the disutilities of both observation after HT and observation after a negative [18F]FDG-PET/CT scan in our univariate sensitivity analysis. Although these disutilities did affect the incremental net benefit, [18F]FDG-PET/CT remained the cost-effective strategy across the tested ranges.

In conclusion, the current cost-utility study showed that an [18F]FDG-PET/CT-driven diagnostic workup reduced the 1-year thyroid nodule-related and societal costs while sustaining quality of life. Following the observed reduction in diagnostic surgery for benign nodules, an [18F]FDG-PET/CT-driven workup is very likely cost-effective from a Dutch societal perspective as compared to diagnostic surgery for Bethesda III/IV nodules.