Key Points for Decision Makers

This study confirms that the structure of the model significantly affects cost-utility analysis outcomes.

Pharmacometric-based pharmacoeconomic models, which embody more biologically sound connections, more adequately predict real-life outcomes, such as toxicity variation with dose adjustment and drug exposure during therapeutic monitoring.

The usefulness of these results varies with the cost-utility analysis goal, but using biologically plausible models could improve the extrapolation of clinical trial data to long-term and real-world scenarios.

1 Introduction

Regulatory acceptance of new medications offers cancer patients the prospect of therapies with enhanced effectiveness and improved tolerability, although often associated with elevated costs [1]. Entities with vested interests—including governmental bodies, healthcare providers, and insurance agencies—are persistently confronted with challenging choices concerning reimbursements, treatment coverage and clinical guidelines. Disparities can be seen between different nations owing to variations in healthcare infrastructure, pricing regulations, patient demographics, and regional economic circumstances. The balancing act between therapeutic benefits and financial costs has gained increasing importance and, consequently, so have the methods to determine the costs and make informed judgements about value.

In health economics, the main aim is to efficiently use the available resources to improve the health of the population [2]. Pharmacoeconomic analyses are used to inform decision-making related to reimbursement, pricing, and resource allocation. The International Society for Pharmacoeconomics and Outcomes Research (ISPOR) recognizes that well-conducted randomized controlled trials (RCTs) provide strong evidence for estimating treatment effects, but might have limitations regarding long-term outcomes, safety data, and real-world treatment patterns [3, 4]. Mathematical models are therefore increasingly used as they allow for (I) forecasting long-term outcomes and (II) bridging the gap between the trial populations and the more diverse patient populations encountered in real-world healthcare settings [3,4,5]. A cost-utility analysis (CUA) is a type of pharmacoeconomic analysis that focuses not only on clinical outcomes but also incorporates health-related quality of life into the assessment [1]. This is often done using a measure called quality-adjusted life years (QALYs), which combines both the length and quality of life gained from a treatment. The main outcome in CUAs is the incremental cost–utility ratio (ICUR), which expresses the incremental cost of one additional unit of effect, e.g., one QALY.
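As a minimal illustration of the ICUR described above, the following Python sketch divides the difference in mean cost between two arms by the difference in mean QALYs. The function name and all numbers are hypothetical and serve only to show the arithmetic; they are not study results.

```python
def icur(cost_new, cost_ref, qaly_new, qaly_ref):
    """Incremental cost-utility ratio: extra cost per extra QALY gained."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("ICUR is undefined when the QALY difference is zero")
    return (cost_new - cost_ref) / delta_qaly


# Hypothetical per-patient means (illustrative only, not study results)
example_icur = icur(cost_new=60_000.0, cost_ref=10_000.0,
                    qaly_new=0.9, qaly_ref=0.5)
print(f"ICUR: {example_icur:,.0f} per QALY")
```

A ratio like this is typically compared against a willingness-to-pay threshold to judge whether the additional cost is acceptable.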

Diverse pharmacoeconomic model structures are being applied to conduct CUA. Illustratively, following the regulatory approval of sunitinib in 2006 for the management of gastrointestinal stromal tumors (GIST), four distinct mathematical models were utilized to perform a CUA (Fig. 1). These encompassed time-to-event models, namely the exponential and Weibull models, and Markov models, both discrete and continuous in nature [6,7,8,9]. The occurrence of adverse events was either not evaluated or added through logistic regression models, as also commonly done in exposure–response analyses [10]. Since the model structure significantly impacts the ICUR, any structural uncertainty can lead to substantial variations in the results of CUAs, thereby affecting the decisions derived from these outcomes [11,12,13]. In response to this issue, the Center for Health Technology Evaluation (CHTE) within the National Institute for Health and Care Excellence (NICE) has reviewed the potential impact of structural uncertainty and proposed cases for change [14]. Likewise, the ISPOR has initiated a special task force to improve the consistency of health care decision-making processes and to introduce a good practice guidance, including modeling approaches [15].

Fig. 1

Schematic representation of (a) the pharmacometric-based pharmacoeconomic framework based on Hansson et al. [15, 16], (b) time-to-event models, (c) Markov models, and (d) logistic regression models. The time-to-event models (b) describe the time distribution for the relevant events through a hazard function: exponential (λ), or Weibull (κ × λ × t(λ1)). In the Markov and logistic regression models (c and d), the probabilities for a specific state B were given per 6-week time interval. This state probability was either dependent on the current state (c), as in the discrete-time Markov model (Pyx: transition probability from state x to state y) and the continuous-time Markov model (λxy: transfer rate constant from state x to state y), or independent of the current state x (d), as in logistic regression (Py: probability of state y). AUC area under the curve, HFS hand-foot syndrome, SLD sum of longest diameter, sVEGFR-3 soluble vascular endothelial growth factor receptor 3

Pharmacometric models are quantitative mathematical descriptions developed to characterize and predict drug behavior in the body (pharmacokinetics, PK) and the body’s response (pharmacodynamics, PD) by integrating data from clinical trials, real-world studies and mechanistic insights. PK models describe how drugs are absorbed, distributed, metabolized, and excreted, helping to derive optimal dosing regimens for different patient populations. PD models capture the relationship between drug concentration and its effects, predicting efficacy and safety. For a more elaborate introduction to pharmacometrics, the reader is referred to earlier publications [16, 17].

In pharmacometrics, models are used to describe the PK and PD of a drug on a population level, in which data from all individuals are simultaneously assessed through nonlinear mixed-effects models. The structural features of the PKPD models enable the integration of mechanistic aspects [18]. Taking these properties into account, pharmacometric models could be useful tools to describe the incremental effect of drug treatment in a pharmacoeconomic setting, including long-term outcomes, safety data, and real-world treatment patterns [19, 20]. For example, previous pharmacometric models predicted that therapeutic drug monitoring (TDM) of sunitinib would provide no clinical benefit, a prediction later confirmed by clinical data [21, 22]. Previous work by Pink et al. has shown that mechanism-based pharmacometric simulations reproduced the cost-effectiveness observed in an actual RCT and could be used for analyses of subgroups, dosing regimens and protocol deviations to inform pricing decisions [23].

The aim of this study was to evaluate the outcome differences between a pharmacometric-based pharmacoeconomic modeling framework approach [18, 24, 25] and the existing traditional pharmacoeconomic model frameworks for sunitinib in GIST, as previously published [6,7,8,9] (Fig. 1). We utilized an established pharmacometric model framework [22] that includes adverse events, tumor progression, and overall survival (OS) to simulate clinical outcomes for patients with GIST undergoing sunitinib treatment or no treatment. These simulations were then used to re-estimate the traditional pharmacoeconomic model parameters, enabling a comparative CUA of sunitinib therapy versus no treatment. Additionally, the study explored the predicted impact of sunitinib TDM on treatment outcomes within these different model frameworks. Following the aims, data-generating mechanisms, estimands, methods, and performance measures (ADEMP) framework [26], our methodology included simulating the “truth” using the pharmacometric-based pharmacoeconomic model framework and comparing the performance of each traditional pharmacoeconomic model against this truth.

2 Methods

2.1 General Workflow and Baseline Dataset

A visual representation of the workflow is displayed in Fig. 2. An overview of the ADEMP structure used in this work can be found in Table S1 in Online Resource 1. The methods were designed to align with the CHEERS (Consolidated Health Economic Evaluation Reporting Standards) guidelines to ensure transparency and comprehensiveness. However, the primary aim of this work was to compare pharmacoeconomic modeling methods, not to determine the cost–utility of sunitinib treatment for a specific country. This focus required certain methodological choices to facilitate a robust comparison between different modeling approaches, rather than strict adherence to the guidelines.

Fig. 2

Overview of the workflow. ICUR incremental cost-utility ratio, PE traditional pharmacoeconomic, PMX pharmacometric-based pharmacoeconomic, SLD sum longest diameter, TDM therapeutic drug monitoring, QALY quality-adjusted life years

As a first step, “true” data from a clinical trial was simulated using an earlier developed pharmacometric model framework for sunitinib in patients with GIST [22]. The model framework consisted of four models describing the time course of adverse events [hypertension, neutropenia, hand-foot syndrome (HFS), and fatigue], soluble Vascular Endothelial Growth Factor Receptor-3 (sVEGFR-3) concentration, tumor growth, and OS (time-to-event Weibull model) (Fig. 1). Additionally, a separate model describing the time course of the adverse event thrombocytopenia, similar to the neutropenia model, was developed and added to the framework. A more detailed description of each model is provided in Online Resource 1. Patients were censored only at death or when study end was reached.

A target population (Npatient = 1000) representing patients with metastatic and/or unresectable GIST was generated using the distribution of original patient demographics [weight (normal distribution): mean = 73.5, standard deviation (sd) = 18.7, interval 36–185 kg; baseline tumor size (log-normal distribution): mean = 182.7, sd = 134.2, interval 29–822 mm] [22, 24, 25] (Fig. 2: “Dataset generation”). The selection of 1000 patients for the initial simulation was made to represent data from a large clinical trial. Using this population in combination with the pharmacometric framework, two study arms were simulated to represent (1) one arm with a continuous sunitinib dosing regimen of 37.5 mg daily and (2) one arm without sunitinib. Both study arms ran over a period of 104 weeks. The simulated clinical outcomes of the model included tumor progression (more than 20% growth from nadir), death, and the incidence of adverse events over time.
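The population generation step can be sketched as follows. This is a minimal Python illustration, not the dmutate code used in the study. It assumes that the reported intervals are enforced by rejection sampling and that the mean and sd of baseline tumor size are given on the arithmetic scale; the function names and the seed are hypothetical.

```python
import math
import random


def lognormal_params(mean, sd):
    """Convert an arithmetic-scale mean and sd to log-scale mu and sigma."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)


def draw_truncated(draw, low, high):
    """Rejection sampling: redraw until the value falls inside [low, high]."""
    while True:
        value = draw()
        if low <= value <= high:
            return value


def generate_population(n_patients, seed=1):
    """Return one dict per virtual patient with weight and baseline tumor size."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    mu, sigma = lognormal_params(182.7, 134.2)
    return [
        {
            "id": i + 1,
            "weight_kg": draw_truncated(
                lambda: rng.gauss(73.5, 18.7), 36.0, 185.0),
            "baseline_sld_mm": draw_truncated(
                lambda: rng.lognormvariate(mu, sigma), 29.0, 822.0),
        }
        for i in range(n_patients)
    ]


population = generate_population(1000)
```

Truncation shifts the realized mean and sd slightly away from the nominal values, which is one reason the truncation method is flagged here as an assumption.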

Following accepted clinical practices and the prescribing information [27], dose reductions (possible doses: 0 mg, 12.5 mg, 25 mg, and 37.5 mg, based on available tablet sizes) were implemented in the case of unacceptable adverse events, with the included adverse event variables being closely monitored. In the simulations, fatigue and HFS were monitored daily, while neutrophil count (ANC), diastolic blood pressure (dBP), and platelet count were evaluated every 6 weeks in accordance with established clinical protocols. For the first occurrence of grade 3 adverse events (or grade 2 for HFS and fatigue), the patient dose was reduced to 0 mg/day until the toxicity improved to ≤ grade 1, at which point the original dose was reinstated. In instances of grade 4 adverse events (or grade 3 for HFS and fatigue), as well as for recurring adverse events [i.e., more than one instance of grade 3 adverse events (or grade 2 for HFS and fatigue)], subsequent dosing was resumed at a reduced level (by 12.5 mg) following resolution of the toxicity under 0 mg/day [27].
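The dose-modification rules above can be sketched as a small state machine. The Python function below is a simplified illustration for a single toxicity variable, not the study's mrgsolve implementation. The function name, its arguments, and the choice to ignore grades recorded while treatment is interrupted are assumptions made for this sketch.

```python
def simulate_dosing(grades, start_dose=37.5, threshold=3, step=12.5):
    """Return the dose in effect after each toxicity assessment.

    grades    : toxicity grades (0-4) at consecutive assessments
    threshold : grade that triggers an interruption
                (3 for laboratory toxicities, 2 for HFS and fatigue)
    """
    resume_dose = start_dose  # dose to restart at once toxicity resolves
    current = start_dose
    n_events = 0              # number of triggering episodes so far
    doses = []
    for grade in grades:
        if current > 0 and grade >= threshold:
            n_events += 1
            # A grade above the threshold, or a recurrence, lowers the
            # restart dose by one step (never below 0 mg)
            if grade > threshold or n_events > 1:
                resume_dose = max(resume_dose - step, 0.0)
            current = 0.0     # interrupt treatment
        elif current == 0 and grade <= 1:
            current = resume_dose  # toxicity resolved: restart
        doses.append(current)
    return doses


# Hypothetical grade sequence for one patient
example = simulate_dosing([0, 3, 1, 0, 3, 2, 1, 4, 0])
print(example)
```

In this example, the first grade 3 event leads to an interruption followed by a restart at 37.5 mg, the recurrence leads to a restart at 25 mg, and the later grade 4 event leads to a restart at 12.5 mg.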

The simulated toxicity data was transformed into a binary format denoting the presence (yes) or absence (no) of unacceptable toxicity, aggregated for every 6-week cycle. The criteria for defining unacceptable adverse events were as follows: neutropenia: ANC < 1 × 10⁹/L; thrombocytopenia: platelet count < 50 × 10⁹/L; diastolic hypertension: dBP ≥ 110 mmHg; HFS ≥ grade 3; and fatigue ≥ grade 3. If a patient experienced unacceptable adverse events, they were classified as experiencing an adverse event related to that specific toxicity variable for the duration of the treatment cycle in question. Patients continued therapy and were monitored for survival throughout the study period, regardless of whether they experienced disease progression.
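The aggregation into per-cycle binary flags can be sketched as below. This is a hedged Python illustration in which the function name, the day-0 indexing of cycle 1, and the example neutrophil counts are all hypothetical.

```python
def unacceptable_by_cycle(days, values, is_unacceptable, cycle_days=42):
    """Map each cycle number to True if any observation in it is unacceptable."""
    flags = {}
    for day, value in zip(days, values):
        cycle = day // cycle_days + 1  # assumed: cycle 1 covers days 0-41
        flags[cycle] = flags.get(cycle, False) or is_unacceptable(value)
    return flags


# Hypothetical neutrophil counts (10^9/L) on selected study days
anc_days = [0, 14, 28, 42, 56, 84, 98]
anc_values = [3.2, 1.4, 0.8, 1.1, 1.6, 2.0, 0.9]
neutropenia = unacceptable_by_cycle(anc_days, anc_values,
                                    lambda anc: anc < 1.0)
print(neutropenia)
```

A single value below the threshold anywhere in a cycle flags the whole cycle, which mirrors the classification "for the duration of the treatment cycle in question".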

All data visualization and data management were conducted in R (version 4.2.0) using RStudio. Model simulations were conducted with the package mrgsolve (version 1.0.3), and datasets were generated using dmutate (version 0.1.3) and dplyr (version 1.0.9). Figures were generated using the package ggplot2 (version 3.4.3).

2.2 Traditional Pharmacoeconomic Model Estimations

All traditional pharmacoeconomic model parameters were estimated in NONMEM version 7.5.0 on the basis of the simulated data (Fig. 2: “Estimation of PE models”). The pharmacometric-based models were not re-estimated, except for the thrombocytopenia model, which was newly developed (see Online Resource 1 for further details). The traditional pharmacoeconomic model structures were selected on the basis of the four distinct mathematical models that were previously utilized to perform a CUA of sunitinib [6,7,8,9]. In the recreation of the four traditional pharmacoeconomic models describing progression-free survival (PFS) and OS, the indicator function f(θ,x) was employed to distinguish the comparators of interest, i.e., whether the patient received sunitinib treatment, by multiplying an estimated model parameter (θDRUG) by the binary indicator variable Xdrug [Xdrug = 0: absence of sunitinib treatment (study arm without sunitinib), Xdrug = 1: presence of sunitinib treatment (study arm with continuous sunitinib dosing regimen at starting dose 37.5 mg daily and potential dose adjustments)].

$$f\left(\theta ,x\right)={\theta }_{\text{DRUG}}\cdot {X}_{\text{drug}}$$
(1)

A more detailed discussion of the model structures and the equations underlying each separate model can be found in Online Resource 1.
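To illustrate how the indicator function of Eq. 1 can switch a treatment effect on and off, the Python sketch below applies it to an exponential time-to-event model. The log-linear placement of f(θ,x) on the hazard is an assumption made for this example only; the actual model equations are given in Online Resource 1, and the parameter values are hypothetical.

```python
import math


def drug_effect(theta_drug, x_drug):
    """Eq. 1: the effect equals theta_drug when x_drug = 1 and 0 otherwise."""
    return theta_drug * x_drug


def exponential_survival(t_weeks, base_hazard, theta_drug, x_drug):
    """S(t) for a constant hazard with an ASSUMED log-linear drug effect."""
    hazard = base_hazard * math.exp(drug_effect(theta_drug, x_drug))
    return math.exp(-hazard * t_weeks)


# Hypothetical parameters: baseline hazard per week and drug effect
BASE_HAZARD = 0.02
THETA_DRUG = -0.5

s_untreated = exponential_survival(52, BASE_HAZARD, THETA_DRUG, x_drug=0)
s_treated = exponential_survival(52, BASE_HAZARD, THETA_DRUG, x_drug=1)
print(f"1-year survival: untreated {s_untreated:.3f}, treated {s_treated:.3f}")
```

Because the whole treatment effect is carried by a single on/off parameter, such a structure reflects an average effect across dose levels rather than a dose-exposure-response relationship.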

2.3 Generation of the Model Frameworks

The final model structures and corresponding parameter estimates were translated into mrgsolve for simulations. In addition to the existing pharmacometric-based pharmacoeconomic model framework, the four traditional pharmacoeconomic model frameworks were generated by merging the corresponding PFS, OS (separated for the TTE models, combined for the Markov Models), and toxicity models (five logistic regression models) (Fig. 2: ‘Creation of simulation frameworks’).

Individual utility values (UVi) were determined at each occasion (UVij) ranging from 0 (lowest quality of life) to 1 (perfect quality of life) following the EuroQol-5 Dimension (EQ-5D) questionnaires [28], using a multiplicative function (Online Resource 1) [29]. The decision to confine utility values between 0 and 1 was motivated by the relatively high quality of life observed in patients with GIST, even at advanced stages of the disease [30]. The specific UVs for each condition were based on literature (UVbaseline: 0.712, UVneutropenia: 0.9777 (≥ grade 2), UVthrombocytopenia: 0.9895 (≥ grade 3) or 0.892 (≥ grade 4), UVHFS: 0.8813 (≥ grade 3), UVfatigue: 0.999 (≥ grade 1), or 0.9107 (≥ grade 3), UVprogression: 0.577) [18, 31,32,33], with the baseline UV accounting for the reduced quality of life owing to the presence of GIST in all patients.
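A multiplicative utility function of the kind referred to above can be sketched in Python as follows. The exact functional form is given in Online Resource 1, so this sketch makes two assumptions: that the health-state utility (baseline or progression) is multiplied by one factor per active toxicity, and that the progression value replaces the baseline value after progression. Only a subset of the reported utility values is included, and the names are hypothetical.

```python
UV_BASELINE = 0.712
UV_PROGRESSION = 0.577
TOXICITY_MULTIPLIERS = {
    "neutropenia": 0.9777,     # >= grade 2
    "hfs": 0.8813,             # >= grade 3
    "fatigue_grade3": 0.9107,  # >= grade 3
}


def utility(progressed, toxicities):
    """Utility at one occasion under an assumed multiplicative form."""
    value = UV_PROGRESSION if progressed else UV_BASELINE
    for tox in toxicities:
        value *= TOXICITY_MULTIPLIERS[tox]
    return value


uv_no_tox = utility(progressed=False, toxicities=[])
uv_hfs = utility(progressed=False, toxicities=["hfs"])
uv_prog_two = utility(progressed=True, toxicities=["hfs", "neutropenia"])
```

With a multiplicative form, each additional toxicity lowers utility proportionally, so the result stays between 0 and 1 regardless of how many toxicities coincide.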

The economic evaluation was conducted from the Dutch healthcare system perspective, with costs reported in 2023 (Euros). A time horizon of 2 years (104 weeks) was selected as most adverse events occur within this period, and the median survival is approximately 2 years (around 100 weeks). This time span allowed for a more focused analysis of model performance by concentrating on the critical period during which the most significant events and costs occur. The total costs were calculated per patient over the span of 104 weeks or until the time of death. Information on the drug costs of sunitinib was gathered from the Dutch National Health Care Institute [34]. The costs of regular follow-up consisted of imaging, laboratory costs and medical visits declared via the Dutch Healthcare Authority [35] where visits were considered to occur every 16 weeks (as reported in 2023 (Euros)) [18]. Costs for adverse events were based on those estimated in a previous study in a German population (as reported in 2019 (Euros)) [36, 37]. The costs for adverse events were based on data from Germany owing to the unavailability of equivalent Dutch data. Given the similarities between the Dutch and German healthcare systems [36], German data was deemed to serve as a suitable proxy. A summary of all costs is provided in Table S3 (Online Resource 1). Both costs and life-year outcomes were discounted by 4% per year, following Dutch guidelines [38]. No adjustments for inflation were made as the cost data were directly taken from recent studies.
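The 4% annual discounting can be illustrated with the short Python sketch below. It assumes a 52-week year and discounting in continuous time from the start of treatment; the timing convention actually used in the study is not stated here, and the function names and example amounts are hypothetical.

```python
def discount_factor(week, annual_rate=0.04, weeks_per_year=52):
    """Present-value factor for a cost or outcome occurring at a given week."""
    return 1.0 / (1.0 + annual_rate) ** (week / weeks_per_year)


def discounted_total(values_by_week, annual_rate=0.04):
    """Sum of values (costs or QALYs) after discounting each to week 0."""
    return sum(value * discount_factor(week, annual_rate)
               for week, value in values_by_week.items())


# Hypothetical costs: 100 at week 0 and 104 at week 52
total = discounted_total({0: 100.0, 52: 104.0})
print(f"Discounted total: {total:.2f}")
```

Over a 2-year horizon at 4%, an amount incurred at week 104 is weighted by roughly 0.92, so discounting has a modest effect in this setting.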

2.4 Simulations of Treatment Outcomes and Costs

The final models were utilized to conduct a CUA comparing sunitinib treatment with no treatment in patients with GIST, assessing treatment outcomes and the ICUR. For each of the five model frameworks, a single virtual patient cohort (Npatient = 10,000) was created over a 2-year period (Fig. 2: ‘CUA’). The selection of 10,000 patients for the virtual cohort was made to ensure a robust analysis, as a larger sample size reduces noise owing to random variability and helps to ensure that observed differences between models are owing to structural factors rather than random chance.

Two simulations were performed for each model framework: one with sunitinib treatment (Xdrug = 1) and one without (Xdrug = 0), both using the same underlying patient population. The ICUR was calculated by comparing sunitinib treatment with no sunitinib treatment.

Additionally, the precision of the outcomes for each model (OS, QALYs, and ICUR) was evaluated by repeating the simulations (Nsimulation = 100). In each iteration, a new virtual population was generated. The outcomes of the pharmacoeconomic model frameworks were compared to the outcomes of the pharmacometric model framework. For each traditional pharmacoeconomic model framework (‘PE’), the mean of the 100 simulations was quantified as the relative deviation from the base (RDB), i.e., the relative deviation from the pharmacometric-based pharmacoeconomic model framework (‘PMX’) (Eq. 2).

$$\text{RDB}= \frac{{\text{Outcome}}_{\text{PE}}-{\text{Outcome}}_{\text{PMX}}}{{\text{Outcome}}_{\text{PMX}}} \times 100\%$$
(2)
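Eq. 2 applied to the means of repeated simulations can be sketched as follows. This is a minimal Python illustration in which the function name and the example values are hypothetical, not study results.

```python
from statistics import mean


def relative_deviation_from_base(outcomes_pe, outcomes_pmx):
    """Eq. 2 applied to the means of repeated simulations, in percent."""
    base = mean(outcomes_pmx)
    return (mean(outcomes_pe) - base) / base * 100.0


# Hypothetical outcomes from repeated simulations (e.g., ICUR per run)
rdb_example = relative_deviation_from_base(
    outcomes_pe=[110.0, 130.0],
    outcomes_pmx=[100.0, 100.0],
)
print(f"RDB: {rdb_example:+.1f}%")
```

A positive RDB indicates that the traditional framework predicts a higher value than the pharmacometric-based framework, and a negative RDB indicates a lower value.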

The impact of reducing drug prices following the expiration of patents was additionally explored. As the median reduction in the Netherlands was 41% [39], a 40% reduction scenario was considered alongside two more extreme scenarios of 60% and 80% reductions in the drug price. For each discount scenario, the total costs and the ICUR were based on the mean of 100 simulations for each model framework.
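The arithmetic behind the price-reduction scenarios can be sketched as below. When only the drug price changes and all other costs and outcomes stay fixed, the fractional drop in total cost equals the drug-cost share multiplied by the price reduction. The 90% drug-cost share used here is an illustrative assumption, and the function name is hypothetical.

```python
def total_cost_reduction(drug_share, price_reduction):
    """Fractional drop in total cost when only the drug price is reduced."""
    return drug_share * price_reduction


# Illustrative drug-cost share of 90% across the three discount scenarios
ASSUMED_DRUG_SHARE = 0.90
scenarios = {r: total_cost_reduction(ASSUMED_DRUG_SHARE, r)
             for r in (0.40, 0.60, 0.80)}
for reduction, drop in scenarios.items():
    print(f"{reduction:.0%} price cut -> {drop:.0%} lower total cost")
```

The closer the drug-cost share is to 100%, the closer the total cost reduction tracks the price reduction itself.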

2.5 Therapeutic Drug Monitoring

A scenario with Ctrough-based dose adjustments was evaluated where the developed frameworks were used for CUA of sunitinib in TDM (Fig. 2: TDM workflow). Firstly, it was assessed whether each of 1000 simulated patients from the sunitinib treatment arm (Sect. 2.1 Baseline dataset) had a simulated Ctrough value below or above the target exposure at day 57, for both efficacy (> 37.5 ng/ml) and safety (< 75 ng/ml) targets separately [40] (Fig. 2: ‘Dataset generation’ TDM workflow). This procedure was similar to a previous pharmacoeconomic study of TDM for sunitinib [36]. Differences in PFS and OS between patients above and below the efficacy threshold of 37.5 ng/ml were estimated using Eq. 1. Similarly, differences in toxicity between patients above and below the safety threshold of 75 ng/ml were estimated, using Eq. 1 to estimate the difference between patients. In addition, four pharmacoeconomic model frameworks were generated, where the influence of sunitinib exposure (Ctrough above or below safety and efficacy thresholds) on clinical outcomes was assessed (Fig. 2: ‘Estimation PE models and Creation of simulation frameworks’ TDM workflow).

Thereafter, using the model frameworks, the impact of TDM, with accompanying dose adjustments, was explored for a virtual patient population (Npatient = 10,000) over a 2-year time horizon (Fig. 2: ‘CUA’). A Ctrough above 37.5 ng/ml was associated with efficacy and a Ctrough above 75 ng/ml with toxicity. Two variables were created per patient (X37.5 and X75) indicating whether the patient was below, within or above these limits. Patients were randomly assigned to three groups according to the estimated underlying probability distribution based on the 1000 simulated patients from the sunitinib treatment arm: sub-therapeutic (< 37.5 ng/ml, p = 0.557: X37.5 = 0, X75 = 0), therapeutic (37.5–75 ng/ml, p = 0.414: X37.5 = 1, X75 = 0) and supra-therapeutic (> 75 ng/ml, p = 0.029: X37.5 = 1, X75 = 1) [41].
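The random assignment to exposure groups can be sketched as follows, using the reported group probabilities. This is a minimal Python illustration; the dictionary layout, the function name, and the seed are hypothetical, and the study's own implementation may differ.

```python
import random

# Group probabilities and indicator values as reported in the text
EXPOSURE_GROUPS = {
    "sub-therapeutic": {"p": 0.557, "x37_5": 0, "x75": 0},
    "therapeutic": {"p": 0.414, "x37_5": 1, "x75": 0},
    "supra-therapeutic": {"p": 0.029, "x37_5": 1, "x75": 1},
}


def assign_groups(n_patients, seed=1):
    """Draw one exposure group per patient from the reported probabilities."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    names = list(EXPOSURE_GROUPS)
    weights = [EXPOSURE_GROUPS[name]["p"] for name in names]
    return rng.choices(names, weights=weights, k=n_patients)


groups = assign_groups(10_000)
share_sub = groups.count("sub-therapeutic") / len(groups)
print(f"Sub-therapeutic share: {share_sub:.3f}")
```

The two indicator variables for each patient then follow directly from the assigned group.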

In the traditional pharmacoeconomic model frameworks, TDM intervention was simulated following a previous publication [36] to occur after cycle 1 of therapy (day 42). Here, all patients were assigned to the therapeutic group unless toxicity had occurred during cycle 1, thereby mimicking immediate correct dose adjustment following TDM. In the following cycles (every 42 days), patients remained in the assigned group or, in case of toxicity, transitioned to the sub-therapeutic group (i.e., through dose reduction).

Within the pharmacometric-based pharmacoeconomic model framework, simulations were not treatment cycle-dependent and sunitinib monitoring was conducted on days 15, 29, and 57, adhering to a recommended TDM schedule [42]. Dose adjustments were individualized based on a predetermined range of available dosages (0, 12.5, 25, 37.5, 50, 62.5, and 75 mg). Doses were increased by increments of 12.5 mg for exposures below the target, while exposures above the target led to a decrease by the same amount. Not all patients achieved the therapeutic window following a single dose adjustment; occasionally, multiple dose alterations were required. The protocol for dose modifications in response to toxicities was consistent with the procedures outlined in Sect. 2.1.
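The exposure-guided dose adjustment can be sketched as a single-step rule over the available dose levels. The Python function below is an illustration only. It assumes that the target window is 37.5–75 ng/ml, as in the thresholds described earlier in this section, and that the dose stays at the nearest limit once the lowest or highest level is reached; the names are hypothetical.

```python
DOSE_LEVELS_MG = [0.0, 12.5, 25.0, 37.5, 50.0, 62.5, 75.0]


def tdm_adjust(dose_mg, ctrough, low=37.5, high=75.0):
    """Move one dose level up or down depending on the trough concentration."""
    idx = DOSE_LEVELS_MG.index(dose_mg)
    if ctrough < low:
        idx = min(idx + 1, len(DOSE_LEVELS_MG) - 1)  # below target: step up
    elif ctrough > high:
        idx = max(idx - 1, 0)                        # above target: step down
    return DOSE_LEVELS_MG[idx]


# Hypothetical trough concentrations (ng/ml) at the three monitoring days
dose = 37.5
for ctrough in (25.0, 33.0, 48.0):
    dose = tdm_adjust(dose, ctrough)
print(f"Dose after three TDM visits: {dose} mg")
```

In this example, two consecutive sub-target measurements raise the dose twice before the third measurement falls inside the window, which shows why several adjustments can be needed.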

3 Results

3.1 Clinical Outcomes

The toxicity outcomes simulated from the pharmacometric-based and the traditional pharmacoeconomic frameworks (TTE Weibull framework for illustration) are summarized in Fig. 3. The toxicities HFS, neutropenia, and thrombocytopenia that were simulated by the pharmacometric-based pharmacoeconomic model framework reduced over time, whereas in the traditional pharmacoeconomic model framework, time had no influence on the toxicities. Consequently, the highest incidence of individual toxicities simulated for a treatment cycle was higher in the pharmacometric-based pharmacoeconomic framework than in the traditional pharmacoeconomic frameworks for HFS (11.1% and 8.1%, respectively), neutropenia (6.5% and 4.1%, respectively), and thrombocytopenia (3.0% and 1.2%, respectively).

Fig. 3

Simulated toxicity development in patients. This figure compares the pharmacometric-based pharmacoeconomic model framework (PMX framework), which integrates dose-exposure-response relationships allowing for dose adjustments over time, with the traditional pharmacoeconomic model framework (PE framework, specifically the TTE Weibull framework for illustration). Blue bars represent patients treated with sunitinib starting dose of 37.5 mg, and red bars represent untreated patients (both Npatient = 10,000). Toxicity occurrence is expressed as a percentage of individuals experiencing toxicity among those alive at each treatment cycle. Notably, the PMX framework accommodates dose reductions, while the PE models reflect the average drug effect, inclusive of dose reductions. PE traditional pharmacoeconomic, PMX pharmacometric-based pharmacoeconomic

The average presence of toxicities over the treatment period was similar for the pharmacometric-based pharmacoeconomic framework (HFS = 7.5%, fatigue = 4.5%, neutropenia = 5.1%, thrombocytopenia = 0.75%, and diastolic hypertension = 0.88%) and the traditional pharmacoeconomic framework (HFS = 7.3%, fatigue = 5.4%, neutropenia = 3.8%, thrombocytopenia = 0.93%, and diastolic hypertension = 0.73%). However, the total number of unique toxicities per patient over the treatment period was lower in the pharmacometric-based pharmacoeconomic framework (count 0 = 27.8% of patients, 1 = 39.3%, 2 = 26.2%, 3 = 6.3%, 4 = 0.4%, and 5 < 0.01%) compared with the traditional pharmacoeconomic framework (count 0 = 15.7% of patients, 1 = 28.5%, 2 = 32.7%, 3 = 19.3%, 4 = 3.5%, and 5 = 0.24%).

An overview of the simulated PFS and OS for each model framework is provided in Table 1. The Kaplan–Meier curves for OS (Fig. S1) and PFS (Fig. S2) can be found in Online Resource 1. The DTMM and TTE exponential models underestimated the OS at 104 weeks compared with the pharmacometric model framework.

Table 1 OS and PFS outcomes simulated from the five pharmacoeconomic model frameworks

3.2 Cost-Utility Outcomes

The predicted ICUR varied across models: the highest was found for the exponential TTE model at 199,232 euro per QALY [standard deviation (sd) 8655] and the lowest for the discrete Markov model at 112,484 (sd 1855). The Weibull TTE model showed an ICUR of 153,050 (sd 4037), and the continuous Markov model predicted an ICUR of 121,197 (sd 2981). The largest deviations of the simulated ICUR compared with the pharmacometric-based framework were for the TTE exponential model (+ 39.6%) and DTMM (− 21.2%), whereas there were smaller differences for the TTE Weibull (+ 7.2%) and CTMM (− 15.1%). Similar trends were seen when the precision was evaluated through re-simulation (Fig. 4), as the ICURs for the pharmacometric-based framework and the TTE Weibull model overlapped. The variability in ICURs between the re-simulations was similar across all models.

Fig. 4

The incremental costs versus QALY per model. Comparing sunitinib treatment versus no treatment. Each model was simulated 100 times (Nsimulation = 100), and each point represents the average of the patient population (Npatient = 10,000) in one simulation. The red dashed line represents the willingness to pay threshold, which was set at €80,000 per QALY [36]. CTMM continuous time Markov model, DTMM discrete time Markov model, PMX pharmacometric-based pharmacoeconomic, TTE time to event, QALY quality-adjusted life years

In the treatment groups, drug costs accounted for the largest portion of expenses, ranging from 90.4 to 91.4%. Check-up costs constituted 2.0–2.3% of expenses, while toxicity-related costs were 6.5–7.4% of the total. Conversely, in the no-treatment group, check-up costs represented the largest expense percentage, ranging from 59.6 to 70.5%. Across the pharmacoeconomic model framework predictions, the reduction in total costs was almost proportional to the percentage discount on the drug cost (Fig. S3, Online Resource 1). Additionally, an overview of the CHEERS checklist can be found in Table S4 (Online Resource 1).

3.3 Therapeutic Drug Monitoring

Simulation results for OS indicated that most traditional pharmacoeconomic models correctly predicted the absence of a survival difference between TDM and no TDM intervention. Similarly, simulations related to drug toxicity correctly predicted that there is only a slight advantage of using TDM (Fig. S4, Online Resource 1). The DTMM was an exception, where an initial survival benefit for the no TDM group diminished over time (Fig. S5, Online Resource 1).

There was, however, a noticeable difference in the proportion of patients within the therapeutic window predicted by the pharmacometric-based pharmacoeconomic framework compared with the pharmacoeconomic model frameworks (Fig. 5). Initially, both types of simulations demonstrated an increase in the proportion of patients within the therapeutic window after implementing TDM. Nonetheless, in the traditional pharmacoeconomic frameworks, the proportion of patients below the therapeutic window gradually increased over time owing to the occurrence of toxicity, while the time-dependency was less pronounced in the pharmacometric-based pharmacoeconomic framework.

Fig. 5

Distribution of patient drug exposure in relation to the therapeutic window over time. PE traditional pharmacoeconomic, PMX pharmacometric-based pharmacoeconomic.

4 Discussion

A previously developed pharmacometric model framework was extended to simulate clinical outcomes for patients with GIST, both with and without sunitinib treatment. Four traditional pharmacoeconomic model frameworks were successfully replicated from published literature to describe these efficacy and toxicity outcomes, including the effect of sunitinib treatment. Using both the pharmacometric-based and traditional pharmacoeconomic model frameworks, simulations of clinical outcomes and cost-utility were performed and differences in outcomes were quantified. The results demonstrate that the model structure influences the results of a CUA. Compared with the traditional pharmacoeconomic frameworks, the pharmacometric-based pharmacoeconomic framework can capture real-life patterns, such as the time course of toxicity and frequency of occurrence of adverse events, as well as the expected result from TDM intervention.

In line with literature [43,44,45,46,47], the impact of different model structures on outcomes became evident. While all traditional pharmacoeconomic model structures were estimated on the same dataset, discernible differences in simulated outcomes for sunitinib efficacy and costs were observed. The analysis revealed that, given the current high drug prices of both the brand-name and generic sunitinib [48], the drug-related expenditures constitute the predominant portion of the overall incremental costs and decreases in drug price correspond to a nearly linear reduction in the total costs. Considering the daily administration of sunitinib, it follows that the drug costs are directly proportional to the estimated OS. This notion, in turn, helps explain the simulated outcomes of the traditional pharmacoeconomic frameworks: both the ICURs and the OS estimates (expressed as percentage deviations from the pharmacometric-based predictions) deviated less for the TTE Weibull (ICUR: + 7.2%, OS gain: − 3.2%) and CTMM (ICUR: − 15.1%, OS gain: + 1.1%) frameworks. Larger disparities were seen for the TTE exponential (ICUR: + 39.6%, OS gain: − 43.0%) and the DTMM (ICUR: − 21.2%, OS gain: + 13.7%) frameworks, likely because these survival model structures used only one parameter to describe the shape of the hazard or probability function, in line with the previously applied models for GIST [6,7,8,9]. Underestimation of the ICUR in the DTMM framework could be explained by the large simulated difference in OS between the treatment and the no treatment group (i.e., large gain in OS), while overestimation of the ICUR in the TTE exponential framework is likely owing to the small simulated difference in survival (i.e., small gain in OS) (Fig. 4, Table 1, Fig. S1 Online Resource 1).

It should be noted that it is unsurprising that the Weibull performed best as it is the underlying model of the “true” simulated data. However, the focus of this study was not to define the best model but to exemplify that the choice of model structure should be made on the basis of its capacity to describe the observed data and the potential biases that can arise if this is not done. Careful consideration should be given to exploring time-varying hazards when selecting a model. This holds particular significance considering that commonly used pharmacoeconomic models are often applied in published works without explanation for their selection or demonstration of their ability to describe the trial data [49].

In reviewing data from phase I–III clinical trials where sunitinib was assessed for efficacy in patients with GIST displaying resistance and/or intolerance to imatinib, the observed median OS ranged between 31 and 61 weeks, varying by trial [50,51,52,53]. This range aligns with our simulated predictions, which estimated a median OS of 49 weeks on the basis of the pharmacometric-based pharmacoeconomic framework. Additionally, traditional pharmacoeconomic models resulted in estimates of 49.5 weeks (TTE Weibull model) and 44.7 weeks (continuous-time Markov model). Conversely, the estimates of 25.2 weeks (TTE exponential model) and 21.5 weeks (discrete-time Markov model) from traditional pharmacoeconomic frameworks appear underestimated when compared with the observed trial data.

The logistic regression models, consistent with their typical characteristics [10], did not account for (1) time-varying effects, (2) toxicity-driven dose adjustments, or (3) individual predispositions to toxicity. Although the traditional pharmacoeconomic frameworks showed average toxicity frequencies similar to those of the pharmacometric-based pharmacoeconomic framework, the pharmacometric framework displayed higher peak toxicities. This is expected, as the pharmacometric-based pharmacoeconomic framework can dynamically simulate the gradual evolution of toxicity over time and can factor in possible dose adjustments once a severe adverse event has occurred. Additionally, although traditional pharmacoeconomic models capture average toxicity rates, they tend to overestimate the number of different toxicities per patient. This discrepancy likely arises from the random effect component (ETA) within the pharmacometric adverse event models, which determines the likelihood of each toxicity within an individual. As a result, some patients are more prone to re-experience the same toxicity, whereas in the traditional pharmacoeconomic logistic regression models toxicity occurs entirely by chance. The relevance for CUAs of the pharmacometric framework’s capacity to account for both time-varying changes and an individual’s likelihood of re-developing a toxicity depends on the purpose and type of analysis performed. For instance, these features could be relevant for extrapolating CUAs beyond the observed time horizon.
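
The effect of an individual-level random effect on the number of different toxicities per patient can be illustrated with a small simulation. This is a simplified sketch, not the adverse event models used in this study: it assumes a fixed number of treatment cycles, independent toxicity types, a normally distributed ETA on the logit scale, and no dose adjustments or time-varying effects. All parameter values are hypothetical. The comparison model uses a single probability for everyone, matched to the average probability of the ETA model so that both have the same expected toxicity frequency.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n_patients=2000, n_types=4, n_cycles=10,
             base_logit=-2.0, eta_sd=1.5, seed=1):
    """Return ((events, distinct) with ETA, (events, distinct) without ETA).

    'events' is the mean number of toxicity events per patient and 'distinct'
    the mean number of different toxicity types a patient experiences.
    """
    rng = random.Random(seed)
    # Per-patient, per-toxicity probability: individual predisposition via ETA.
    probs = [[expit(base_logit + rng.gauss(0.0, eta_sd)) for _ in range(n_types)]
             for _ in range(n_patients)]
    # Single population probability, matched to the average of the ETA model.
    mean_p = sum(sum(row) for row in probs) / (n_patients * n_types)

    def run(prob_of):
        events = 0
        distinct = 0
        for i in range(n_patients):
            for j in range(n_types):
                p = prob_of(i, j)
                n = sum(rng.random() < p for _ in range(n_cycles))
                events += n
                distinct += 1 if n > 0 else 0
        return events / n_patients, distinct / n_patients

    with_eta = run(lambda i, j: probs[i][j])
    without_eta = run(lambda i, j: mean_p)
    return with_eta, without_eta


(events_eta, distinct_eta), (events_fixed, distinct_fixed) = simulate()
```

Under these assumptions both models produce a similar mean number of events per patient, but without the random effect the events are spread over more toxicity types per patient, because no patient is predisposed to repeat the same toxicity.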

The TDM evaluation showed that, for both safety and efficacy outcomes, simulations from the traditional pharmacoeconomic model frameworks captured the difference in toxicity and OS between the TDM and no-TDM scenarios, despite the more empirical relationship between dose interventions and outcomes within these model structures. However, this agreement is likely owing to the limited effect of TDM on sunitinib efficacy and toxicity. In other words, the traditional pharmacoeconomic model frameworks are capable of predicting the lack of impact [22], but it remains unclear whether their simulations would have remained predictive had TDM had a positive effect. The traditional pharmacoeconomic model frameworks largely overestimate the proportion of patients with treatment overexposure, which, in a scenario where TDM has an effect, could have resulted in reduced simulated toxicities and OS. Because they lack a mechanistic connection between drug dosage, exposure, and response, traditional pharmacoeconomic models depend on the effect size observed at specific exposure levels, which complicates the extrapolation of clinical trial findings to broader, real-world patient populations not typically represented in clinical trials.

The toxicity and TDM examples illustrate that the pharmacometric-based pharmacoeconomic approach offers significant benefits in handling structural uncertainty, both by selecting models on the basis of their capacity to describe the available data and by mechanistically incorporating drug action and patient variability, features often missed by traditional pharmacoeconomic models. Additionally, although not commonly employed in pharmacometrics, model averaging is an alternative tool that can further account for uncertainty by combining predictions from multiple models, in contrast to the single-model reliance typical of traditional methods [54].
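
As a minimal sketch of model averaging, the example below combines predictions from several candidate models using Akaike weights. The weighting scheme is an assumption made for illustration (other schemes exist, and no specific scheme is prescribed here), and the AIC values and predictions are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights: w_i proportional to exp(-0.5 * (AIC_i - min AIC))."""
    best = min(aic_values)
    raw = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(predictions, weights):
    """Weighted average of per-model predictions (e.g., median OS in weeks)."""
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length.")
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical AIC values and median OS predictions for three candidate models.
aics = [1002.0, 1000.0, 1010.0]
median_os_weeks = [45.0, 50.0, 25.0]
weights = akaike_weights(aics)
averaged_os = model_average(median_os_weeks, weights)
```

Models with poorer fit receive little weight, so the averaged prediction stays close to the better-supported models while still reflecting uncertainty about the structure.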

This study’s main limitation is its single-drug focus, which makes it challenging to generalize the findings to other CUA scenarios. For example, toxicity costs, although minor in this case, could be substantial for therapies with costly toxicity management. Nevertheless, a similar trend may be expected for other TKIs owing to their shared cost profiles, clinical presentations, and mechanism of action. Second, assumptions about parameter distributions and uncertainties are commonly made in traditional pharmacoeconomic modeling [55], and the data-driven selection of between-patient variability could introduce differences between the modeling methods. This study did not investigate how these assumptions potentially bias the results, nor how this is influenced by varying the number of patients in the baseline dataset (i.e., decreased precision), as the focus lay on structural uncertainty. Moreover, we did not express precision in terms of the likelihood of being cost-effective at the selected Dutch willingness-to-pay threshold because all simulated predictions were above this threshold; this value was therefore not expected to provide additional information about differences in model structure. Given the large number of simulated patients and the relative comparison of each traditional pharmacoeconomic model framework with the pharmacometric-based pharmacoeconomic model framework, parameter uncertainty is not expected to influence the results. Lastly, to simplify the analysis, informative dropout [i.e., missing not at random (MNAR) due to progression of tumor size] was not simulated in the pharmacometric framework. However, failing to incorporate informative dropout in a pharmacoeconomic analysis is likely to result in selection bias and is therefore a potential model limitation [56, 57].

5 Conclusions

This study underscores the impact of model structure on the outcomes of a CUA and, by extension, its potential influence on regulatory and reimbursement decisions. Especially for drugs like sunitinib, where the cost of daily administration constitutes a significant portion of the overall cost, deviations from the predicted survival time can greatly affect the CUA under both initial and post-patent-expiration pricing. Therefore, the selection of the model structure is a pivotal aspect of CUA and should ideally be based on its fit to the available data. Models that incorporate certain mechanistic elements, such as accounting for time-varying changes in toxicity incidence, are preferable [20]. These models are more likely to provide simulations that closely resemble real-world clinical scenarios and can extend the analysis beyond the trial’s follow-up period, as demonstrated in previous research [18]. Additionally, pharmacometric-based pharmacoeconomic models that consider correlations within individuals are likely to give more accurate predictions, particularly for evaluations of dose individualization scenarios. The integration of clinical trial data and economic analysis, as recommended by the ISPOR task force, has the potential to enhance and expedite the reimbursement decision-making process. Although pharmacometric-based modeling is time-consuming, which may be a limiting factor, its increasing importance in drug development and regulatory approval makes its extension into pharmacoeconomic analyses a relatively straightforward next step. Thus, incorporating these detailed models into pharmacoeconomic evaluations could streamline and improve both CUA and the reimbursement decision-making process.