Key Points for Decision-Makers

In addition to Parkinson’s disease symptoms and motor complications, future models should include non-motor complications and treatment changes.

Methodological quality could be improved by presenting parameter estimates transparently and by critically assessing their influence on the results.

Deterministic, scenario-based, and probabilistic sensitivity analyses and validations of parameters and model designs should be considered in the interpretation of results.

1 Introduction

The treatment of Parkinson’s disease (PD) and its comorbidities is economically important, as therapeutic options are often highly specialised and therefore expensive [1,2,3,4,5,6,7,8,9,10]. Primary therapy consists of levodopa [11]. If PD progresses, other treatment options are available, for example dopamine agonists to stimulate dopamine receptors, or enzyme inhibitors to reduce dopamine degradation (e.g. decarboxylase/monoamine oxidase B/catechol-O-methyl transferase inhibitors). Combinations of several agents are also available, for example carbidopa (decarboxylase inhibitor) with levodopa (levodopa-carbidopa gastrointestinal gel [LCIG]/Duodopa®). To further improve motor complications (motor fluctuations and dyskinesias, also known as ‘off times’ during the day), absorption can be improved by infusion/injection (apomorphine) or more continuous administration by a pump (LCIG). In addition, multiple medications can be combined as best medical treatment (BMT). Moreover, surgical treatment options such as the invasive deep brain stimulation (DBS) or the non-invasive magnetic resonance tomography-guided focused ultrasound therapy (MRgFUS) are used when medication fails. In addition to the treatment of PD symptoms and motor complications, as well as consequences of the disease (e.g. falls due to limited mobility), treatment of non-motor complications (e.g. sleep disorders, depression, dementia) is often needed, which places an additional financial burden on the healthcare system [12]. As there is currently no disease-modifying treatment for PD, PD remains a chronic illness and therefore generates high healthcare costs over the long term [13]. Furthermore, the economic burden of PD is expected to increase owing to the older age of patients and an ageing population worldwide [14].

Economic evaluations are used to determine the efficiency of (new) interventions. In a cost-effectiveness analysis, the healthcare costs and effects associated with a new intervention are compared and related to costs and effects of an established treatment by calculating the incremental cost-effectiveness ratio. Effectiveness thereby can be measured based on clinical scales [e.g. Hoehn & Yahr (HY) or the Unified Parkinson’s Disease Rating Scale (UPDRS)], as well as quality-adjusted life-years (QALYs), a standard measure for utilities in health economic evaluations, with one QALY representing 1 year of best health-related quality of life, and zero QALYs representing death. The HY scale measures the symptoms of PD, considering unilateral symptoms (HY I), bilateral symptoms (HY II), mild-to-moderate disability with impaired postural reflexes (HY III), severely disabling disease but still able to walk or stand unassisted (HY IV), and confinement to a bed or wheelchair unless aided (HY V). Compared to the HY scale, the UPDRS assesses symptoms through 42 questions in four domains (1) mentation, behaviour, mood, (2) activities of daily living, (3) motor examination and (4) complications of therapy.
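
As a minimal illustration of the incremental cost-effectiveness ratio described above, the following Python sketch relates the difference in costs to the difference in effects. The function name `icer` and all numbers are assumptions chosen for illustration; they are not taken from any of the reviewed studies.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_cost = cost_new - cost_old
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when both strategies have equal effects")
    return delta_cost / delta_effect


# Hypothetical lifetime costs and QALYs, for illustration only
example_icer = icer(cost_new=60_000, effect_new=5.0, cost_old=40_000, effect_old=4.5)
print(example_icer)  # 40000.0 (cost per QALY gained)
```

A negative ratio needs separate interpretation (the new intervention may be either dominant or dominated), which this sketch deliberately does not attempt.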

Economic evaluations are often conducted based on clinical trials investigating the efficacy of a new intervention [15]. These “piggyback” studies benefit from the existing structure of the main trial such that health economic data are available at the same time as efficacy data with low additional costs for the health economic evaluation and data highly adapted to the research objective. However, clinical trials are generally limited to shorter follow-up periods, making it difficult to obtain long-term results. Furthermore, scenarios beyond the study protocol, such as younger/older patients with more/less symptoms or treatment changes, cannot be evaluated. To overcome these limitations, mathematical models can be used to determine the course of the disease for different time horizons by combining data from multiple sources [15].

Because economic evaluations based on modelling approaches cannot draw conclusions based on empirical data, a critical assessment of methodological quality is needed to determine the influence of model design, model assumptions, and parameter estimates, and to interpret results reasonably [16]. Systematic reviews of the literature on modelling the cost effectiveness of treatments for PD before 2010 showed major weaknesses in adherence to good modelling practice recommendations [17,18,19]. The authors criticised the predominant use of a healthcare payer’s perspective and insufficient consideration of adverse events (other than motor complications) and comorbidities. Furthermore, sensitivity analyses to deal with statistical uncertainty, and validations of model design and parameter estimates were often inadequate. Therefore, the authors concluded that interpretation of the results may be limited. The current systematic review therefore updated previous reviews [17,18,19] by assessing the methodological quality of modelling approaches for economic evaluations of the treatment of motor symptoms in PD published after 2010.

2 Methods

2.1 Literature Review and Assessment

2.1.1 Literature Search

A literature search was conducted in PubMed, EconLit, the Cochrane Database of Systematic Reviews, National Health Service Economic Evaluation Database and Health Technology Assessment databases of the UK National Health Service Centre for Review and Dissemination (March 2010 to July 2022). The start date of March 2010 corresponded to the end date of the literature searches of published reviews on the same subject [17, 18]. The search was based on the following keywords: ‘Parkinson’s disease’, ‘economic evaluation’, ‘decision analysis’, ‘health care model’, ‘Markov model’, ‘discrete event simulation’ and ‘QALY’. This search was supplemented by a hand search of the reference lists of the included literature. A detailed description of search terms can be found in the Electronic Supplementary Material (ESM).

Included studies had to (1) address patients with PD, (2) present an economic evaluation (i.e. costs and effects of at least two interventions had to be compared and related), (3) compare treatment of motor symptoms and (4) be based on a mathematical modelling approach. Studies were excluded if they (1) were not original research, (2) were not published in English or (3) were published before 2010.

The literature was reviewed, the data extracted and the quality assessed by two independent authors (JD, JZ). Discrepancies were resolved through discussion between the two authors.

2.1.2 Data Extraction and Assessment

Data extraction of the included evidence covered the following domains:

  • Table 1. ‘Summary of analytic framework and model features’: reference, country, funding, study type, comparators, target population, perspective, time horizon, discount rate, outcomes, analytic approach, sensitivity analyses and validations, conclusions (by the authors).

  • Table 2. ‘Summary of input parameters of published cost-effectiveness models’: reference, transition probabilities, efficacy, utilities, costs.

Table 1 Summary of analytic framework and model features
Table 2 Summary of input parameters of published cost-effectiveness models

As international quality checklists such as the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) [16] or the checklist by Drummond et al. [20] do not focus on modelling aspects of economic evaluations (such as the choice of model design, parameter estimation, dealing with uncertainties especially in the context of modelling) and thus do not cover all relevant aspects of the current study objective, a checklist of the German Scientific Working Group was used to assess the study quality [21]. This comprises 56 questions on context (two questions), evaluation framework (eight questions), analytic approach (four questions), health outcomes (seven questions), costs (nine questions), discounting (four questions), presentation of results (seven questions), uncertainty (six questions), discussion (seven questions) and conclusions (two questions). Each of the questions could be rated as 1 = ‘criterion fulfilled’, ½ = ‘criterion partially fulfilled’ and 0 = ‘criterion not fulfilled’, so that a maximum of 56 points could be achieved.
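
The scoring rule described above can be sketched in a few lines of Python. The domain sizes are those listed in the text; the dictionary layout, the function name `total_score` and the example ratings are assumptions made for illustration only.

```python
# Number of questions per checklist domain, as listed in the text above
DOMAIN_QUESTIONS = {
    "context": 2, "evaluation framework": 8, "analytic approach": 4,
    "health outcomes": 7, "costs": 9, "discounting": 4,
    "presentation of results": 7, "uncertainty": 6, "discussion": 7,
    "conclusions": 2,
}
ALLOWED_RATINGS = {0.0, 0.5, 1.0}


def total_score(ratings):
    """Sum per-question ratings (0, 0.5 or 1); return (points, share of maximum)."""
    max_points = sum(DOMAIN_QUESTIONS.values())
    if len(ratings) != max_points:
        raise ValueError(f"expected {max_points} ratings, got {len(ratings)}")
    if any(r not in ALLOWED_RATINGS for r in ratings):
        raise ValueError("ratings must be 0, 0.5 or 1")
    points = sum(ratings)
    return points, points / max_points


# Hypothetical study: 40 criteria fulfilled, 4 partially fulfilled, 12 not fulfilled
points, share = total_score([1.0] * 40 + [0.5] * 4 + [0.0] * 12)
print(points, round(share, 2))  # 42.0 0.75
```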

3 Results

The literature search resulted in 430, 23, 22 and 51 hits in PubMed, the National Health Service database, the Cochrane Library and EconLit, respectively (Fig. 1). In addition, eight studies were identified via screening of the references of included economic evaluations and systematic reviews. A total of 534 hits were found, of which 34 were duplicates. After screening the title and abstract, 51 full-text articles were reviewed. Of these, two were excluded that did not relate to PD, 13 were not health economic evaluations, two did not compare treatment of motor symptoms and six did not use a modelling approach. In addition, eight systematic reviews whose literature was screened were excluded. In sum, 20 original papers were included.

Fig. 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram. NHS National Health Service, PD Parkinson’s disease

3.1 Treatment Options and Target Population

All studies evaluated the cost effectiveness of symptomatic treatment options after the onset of motor complications (Table 1) [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41]. These included the dopamine agonist ropinirole [40] and the monoamine oxidase B inhibitor rasagiline [29, 31], as well as entacapone [31] and opicapone [32] as catechol-O-methyl transferase inhibitors. Furthermore, treatment with LCIG [24, 33, 35, 36] and extended-release carbidopa/levodopa [22] as a decarboxylase inhibitor as well as apomorphine [23, 39, 41] were evaluated. In addition to medical treatment, surgical interventions such as DBS [25,26,27,28, 30, 34, 38] and MRgFUS [37] were investigated.

The initial age of the patients varied between 52 [25, 30] and 65 [28, 40] years, depending on the treatment and its intended administration during the course of PD (Table 1). Accordingly, the model design was adjusted to the severity of the disease of the target population depending on the treatment under study. For the treatment of early motor complications, patients were assumed to be aged 52–63 years and frequently in HY I–III [25, 29, 30, 39]. For the treatment of motor complications in advanced stages of PD, patients were assumed to be aged 59–65 years with the majority being in HY II–IV and with higher ‘off times’ per day compared with patients with early motor complications [22,23,24, 26,27,28, 31,32,33,34,35,36,37,38, 40, 41].

The majority of studies were set in the UK [24, 27, 30, 33, 35, 36, 41] and US [22, 29, 31, 32, 38, 39] healthcare systems. Three studies pertained to Germany [25, 26, 41], and two to Canada [23, 37]. The remaining studies pertained to the Netherlands [40], Taiwan [28] and Japan [34] (Table 1).

3.2 Modelling Approach and Transition Probabilities

In all but three publications [29, 30, 37], disease progression was modelled by the disease-specific symptom scales HY [23,24,25,26,27,28, 33,34,35,36, 38, 40, 41] and ‘off times’ because of motor complications caused by treatment measured using the UPDRS IV question no. 32 [22,23,24,25,26,27, 31,32,33,34,35,36, 38,39,40,41] (Table 1). Thereby, the majority of models chose a combination of HY levels and ‘off times’ per day [23, 24, 27, 33,34,35,36, 38, 40, 41]. In addition to HY levels and ‘off times’ per day, two studies integrated the natural progression that occurs because of the lack of disease-modifying treatments in PD, despite adequate symptom reduction with medication [25, 26]. Other models were based on surgical complications [37]. In addition, two publications modelled changes in treatment caused by discontinuation of treatment during the course of PD [29, 30]. Furthermore, one microsimulation modelled ‘off times’ per day by estimating parameter distributions [39].
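
To make the basic mechanics of such state-transition models concrete, the following Python sketch traces a cohort through HY stages and death. It is a simplified illustration, not the structure of any reviewed model: the transition probabilities, the starting distribution and the function names are hypothetical, and ‘off time’ categories are omitted to keep the example small.

```python
# States: HY I-V plus death; all probabilities are hypothetical.
STATES = ["HY I", "HY II", "HY III", "HY IV", "HY V", "Dead"]

# Annual transition matrix (each row sums to 1); illustrative values only.
P = [
    [0.80, 0.17, 0.00, 0.00, 0.00, 0.03],
    [0.00, 0.80, 0.16, 0.00, 0.00, 0.04],
    [0.00, 0.00, 0.80, 0.14, 0.00, 0.06],
    [0.00, 0.00, 0.00, 0.80, 0.11, 0.09],
    [0.00, 0.00, 0.00, 0.00, 0.85, 0.15],
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00],
]


def step(dist, matrix):
    """Advance the cohort distribution by one cycle (row vector times matrix)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def cohort_trace(start, matrix, cycles):
    """Return the state distribution at cycle 0, 1, ..., cycles."""
    trace = [start]
    for _ in range(cycles):
        trace.append(step(trace[-1], matrix))
    return trace


start = [0.0, 0.5, 0.5, 0.0, 0.0, 0.0]  # cohort split between HY II and HY III
trace = cohort_trace(start, P, cycles=10)
for state, share in zip(STATES, trace[-1]):
    print(f"{state}: {share:.3f}")
```

Combining HY stages with ‘off time’ categories, as most of the reviewed models did, would enlarge the state space to one state per combination; costs and utilities would then be attached to each state and summed over the trace.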

The transition probabilities for disease progression and treatment efficacy were taken from clinical trials (Table 2). Progression reported in the trials differed by target population. Treatment efficacy was either realised through improvements in ‘off times’ per day [22, 23, 25, 26, 31, 32, 39, 40], through a combination of improvements in HY stages and ‘off times’ per day [24, 27, 33,34,35,36, 38, 41], or through improvements in UPDRS scores [28, 30]. In addition to disease progression, effects on treatment adherence [40], treatment discontinuation [32, 36, 39] and the incidence of surgical complications [37] were also considered. Efficacy (if stated) was assumed to last between 3 months and 5 years [22,23,24,25,26, 29,30,31,32,33,34,35,36, 38, 39, 41].

To reflect increased mortality due to PD, either PD-specific data from clinical trials [28, 29, 34, 35, 41] or country-, age- and/or sex-specific data from mortality tables were weighted with appropriate risk rates [23,24,25,26,27, 30, 32, 33, 36, 38,39,40]. The probability of dying within 1 year based on PD-specific data ranged from 0.004 to 0.225 [42,43,44], with severity-specific values reported once [43]. The relative risk of death compared to the general population ranged from 1.18 to 4.99 [45,46,47,48,49,50,51,52], again considering disease severity [47, 48]. One study did not include a PD-specific mortality risk [31]. Two other studies did not report transition probabilities for mortality [22, 37].
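
One common way to weight life-table mortality with a relative risk is sketched below in Python. The reviewed studies did not necessarily proceed this way: applying the relative risk on the rate scale is an assumption of this sketch, and the function name `pd_mortality` and the life-table value of 0.02 are hypothetical.

```python
import math


def pd_mortality(p_general, relative_risk):
    """Scale a general-population annual death probability by a relative risk.

    Assumption: the relative risk is applied on the rate scale (treated like a
    hazard ratio), which keeps the result between 0 and 1.
    """
    if not 0.0 <= p_general < 1.0:
        raise ValueError("p_general must be in [0, 1)")
    rate = -math.log(1.0 - p_general)             # annual probability -> rate
    return 1.0 - math.exp(-relative_risk * rate)  # scaled rate -> probability


# Hypothetical life-table probability of 0.02, combined with the range of
# relative risks reported above (1.18 to 4.99)
low = pd_mortality(0.02, 1.18)
high = pd_mortality(0.02, 4.99)
print(f"{low:.4f} to {high:.4f}")
```

Simply multiplying the probability by the relative risk gives similar values when probabilities are small, but can exceed 1 at older ages.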

3.3 Utilities

All studies were cost-utility studies and thus included QALYs as the outcome (Table 2). Three US studies [53,54,55] were used to determine utilities (standard gamble) for different ‘off times’ per day [22, 27,28,29, 31, 32, 34, 38, 39]. In addition, Swedish data (EQ-5D index) [35] were integrated into models based on HY stages and ‘off times’ per day [23, 27, 28, 34, 35, 38, 39, 41]. The data quality was significantly reduced by the older publication date of the studies [53,54,55] and small sample sizes [35, 54, 55], with data for later stages of the disease not reported on in any study. One study failed to elaborate on the methodology used to determine QALYs [53]. Several authors calculated the EQ-5D index based on HY stages and ‘off-times’ per day using UK tariffs [24, 33, 36]. In four studies, PD questionnaire (PDQ-39) data were used to predict the EQ-5D index because data on country-specific utilities were missing [25, 26, 30, 37]. Treatment-related improvements in QALYs [25, 26, 30, 39], reductions in QALYs because of adverse events [23, 25, 26, 32, 37, 39, 41] as well as caregiver disutilities [24, 33] were also considered.

A total of eight studies reported QALYs without referring to the country of the health system under study [23, 27, 28, 34, 35, 38, 39, 41]. In two studies, the country for which QALYs were calculated could not be determined [37, 40]. Clinical effectiveness was integrated as life-years gained [24, 28, 29, 31, 32, 35, 36, 39], UPDRS [26] and PDQ-39 [25] points gained, as well as avoided ‘off time’ [39] (Table 1).

3.4 Costs

The majority of studies were based on average health states costs to which intervention costs were added [22,23,24,25,26, 29, 31,32,33, 35,36,37, 39, 40] (Table 2). Only six studies did not rely on health states but included cost categories expected to be influenced by treatment [27, 28, 30, 34, 38, 41].

Intervention costs included drug costs [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41] and costs related to adverse events [23,24,25,26,27, 33,34,35,36,37,38,39, 41]. In the case of LCIG and apomorphine, costs of equipment and supplies were added [24, 33, 35, 36, 41], while in the case of surgical interventions, costs of surgery and, if appropriate, battery replacement were considered [25,26,27,28, 30, 34, 37, 38, 41]. If not included in the average health state costs, costs of follow-up visits [24, 27, 28, 30, 33, 35, 36, 38], nursing/care [27, 28, 30, 38] and falls [27, 30, 38] were integrated separately. Indirect costs due to productivity losses were considered by two studies [28, 31].

Reimbursement data were used in all studies to determine costs, especially intervention costs [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41]. In addition, data from randomised controlled trials for drug costs [22, 25,26,27, 30] and adverse event costs [23, 25,26,27, 34, 38, 39] were used. Occasionally, medical costs were based on hospital reimbursement data [28], survey data [22] and results of meta-analyses [38]. Average costs of health states were taken from cost-of-illness studies [25, 26, 29, 31], reimbursement data [23, 24, 33, 35, 36] and a Dutch guideline for PD [40]. Three authors stated that they relied partially on expert opinion to calculate costs [22, 24, 33].

Most studies measured costs from a payer’s [22,23,24,25,26,27, 30,31,32,33,34,35,36, 38, 39] or healthcare provider’s perspective [29, 40, 41] (Table 1). Three studies indicated that they were conducted from a societal perspective [28, 31, 37].

3.5 Assumptions

Assumptions about model structure were reported in five studies [22, 23, 27, 30, 40], with two studies providing only very limited information [22, 40]. In more than half of the studies, assumptions about parameter estimates were reported [23, 24, 27, 30, 33,34,35,36, 38,39,40,41], and again with some studies reporting them only briefly [22, 34, 40, 41]. Overall, justifications for assumptions were often missing [22, 27, 33, 35, 41] or only partially reported [23, 24, 30, 38]. In addition, only two studies presented assumptions made in an aggregated form [23, 39]. Seven studies did not elaborate on assumptions [25, 26, 28, 29, 31, 32, 37].

3.6 Uncertainty and Validation

Uncertainties of parameter estimates were assessed using one-way sensitivity analyses [22,23,24,25,26,27, 29, 30, 32,33,34,35,36,37,38, 40, 41], scenario analyses [26, 27, 32, 33, 36, 39] and probabilistic sensitivity analyses [23, 24, 28,29,30, 32,33,34, 40, 41] (Table 1). For this purpose, parameter ranges reported in the literature were primarily used [25,26,27, 29, 30, 36] or variations between 10% and 50% were assumed. These were supplemented by scenarios that considered the research question and the intervention under study [26, 27, 32, 33, 36, 39]. In probabilistic sensitivity analyses, the underlying distributions were often not explicitly named [23, 32, 33, 41]. When specified, costs were assumed to be gamma distributed [24, 29, 30, 40], transition probabilities were assumed to be beta distributed [24, 29, 30, 34, 40] and utilities were assumed to be normally [30] or beta distributed [24, 29, 40]. The distribution parameters were not specified.
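
Because the distribution parameters were not reported, the following Python sketch illustrates what such a specification could look like. It uses the method of moments to derive gamma and beta parameters from a mean and a standard error, which is one common approach and not necessarily the one used in the reviewed studies. All means, standard errors and function names are hypothetical.

```python
import random


def gamma_params(mean, se):
    """Method-of-moments shape and scale for a gamma distribution (costs)."""
    shape = (mean / se) ** 2
    scale = se ** 2 / mean
    return shape, scale


def beta_params(mean, se):
    """Method-of-moments alpha and beta for a beta distribution
    (transition probabilities, utilities)."""
    common = mean * (1.0 - mean) / se ** 2 - 1.0
    if common <= 0:
        raise ValueError("standard error too large for a beta distribution")
    return mean * common, (1.0 - mean) * common


rng = random.Random(2022)  # fixed seed so the draws are reproducible

# Hypothetical means and standard errors, not taken from any reviewed study
cost_shape, cost_scale = gamma_params(mean=10_000, se=2_000)
util_a, util_b = beta_params(mean=0.60, se=0.05)

cost_draws = [rng.gammavariate(cost_shape, cost_scale) for _ in range(1000)]
utility_draws = [rng.betavariate(util_a, util_b) for _ in range(1000)]
```

Reporting the resulting parameters (here a gamma shape of 25 with scale 400, and a beta distribution with parameters of about 57 and 38) alongside the point estimates would allow readers to reproduce a probabilistic sensitivity analysis.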

The majority of the results depended on the parameter estimates based on treatment efficacy and effects [22,23,24,25,26, 30, 32, 33, 35,36,37, 40, 41], intervention cost [22, 25, 27, 30, 33, 34, 37, 38], healthcare utilisation costs [24, 33, 36] and utility values used to assess health status [27, 29, 34, 38, 40] (Table 1). Furthermore, model assumptions on the discount rate [24, 26, 33, 41] and the time horizon [30, 37, 38] led to major variations in the results. For validation, parameter estimates [24,25,26,27,28,29,30, 34, 38, 40] and results [22, 24, 26, 32, 39, 41] were discussed in relation to the published literature (Table 1).

3.7 Quality Assessment

Overall, a mean score of 42.1 points (out of 56 points; 75%) on the checklist of the German Scientific Working Group (range between 32 and 52 points; Table 3) was achieved in the quality assessment of the included studies. The introduction was often described in detail. However, some studies only briefly presented the comparative interventions and target population [23, 25,26,27,28, 34, 37, 38, 41]. In addition, some studies failed to consider a sufficiently long time horizon (>3 years) [31, 37]. The methodology used was mainly adequately selected and applied. Some studies showed deficiencies in the transparent presentation of parameter estimates [23, 25, 26, 28, 29, 34, 36, 37] and model assumptions [22, 24,25,26, 28, 29, 31,32,33,34,35,36,37, 40]. Furthermore, parameter estimates for utilities were often not country specific [23, 27, 28, 34, 35, 39, 41] and cost categories were not always presented [23, 25, 26, 28]. In addition, costs were not justified by the perspective used [23, 31, 37]. In some cases, currency adjustments [23, 28, 32, 35, 36] were not reported. All included studies presented results as (incremental) costs, effects and incremental cost-effectiveness ratios. However, information on the effectiveness was missing for some studies [37] or results were presented superficially [28, 29, 31,32,33, 36, 37, 39,40,41]. Uncertainty was addressed by deterministic, scenario-based and probabilistic sensitivity analyses, although results were not always used for interpretation [23, 24, 29, 32, 35, 36, 39]. Likewise, the majority of studies failed to interpret results practically and give implications for the respective health system [22, 23, 25,26,27,28,29,30,31,32,33,34, 36, 40] as well as to validate the model structure [22, 23, 27,28,29,30,31,32,33,34,35,36,37,38,39,40,41] and assumptions [22, 23, 28, 31,32,33, 35, 37, 39,40,41].

Table 3 Quality assessment using the checklist of the German Scientific Working Group

3.8 Funding

Fifteen studies were funded by manufacturing companies providing the treatment assessed [22, 24, 27,28,29,30,31,32,33, 35,36,37, 39,40,41]. Two studies received public funding [23, 26] and three studies were not funded [25, 34, 38].

4 Discussion

The literature on model-based economic evaluations of interventions for PD since 2010 [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41] comprised 20 articles. Most target populations comprised patients with advanced PD, with a mean age of ≥60 years, and mainly intermediate disease stages (HY II–IV) [22,23,24, 26,27,28, 32,33,34,35,36,37,38,39,40]. All but three studies [23, 31, 33] considered the evaluated interventions to be (potentially) cost effective. None of the studies clearly rejected the intervention under evaluation. Sensitivity analyses showed that in particular the estimates of the treatment efficacy [22,23,24,25,26, 30, 32, 33, 35,36,37, 40, 41], cost [22, 24, 25, 27, 30, 33, 34, 36,37,38] and effectiveness/utility parameters [27, 29, 34, 38, 40], as well as model assumptions on the time horizon [30, 37, 38] and discount rate [24, 26, 33, 41] had a major impact on the cost-effectiveness results.

Numerous reviews have addressed the evidence on economic evaluations of PD treatment to inform decision makers about the economic efficiency of PD interventions [1,2,3,4,5,6,7,8,9,10]. However, the methodological aspects of modelling approaches were only addressed by three systematic reviews, including n = 19 studies published before 2010 [17,18,19]. Compared with the earlier literature, the quality of the model design had improved so that models were capable of comparing more than two treatment options [29, 31, 39, 41] or were able to simulate treatment changes [29, 30]. Furthermore, improvements had been made in modelling PD symptoms and motor complications by adequately combining these two factors to model treatment efficacy [23,24,25,26,27, 29, 33,34,35,36,37,38,39,40,41], as well as in using a PD-specific mortality [23,24,25,26,27,28,29,30, 32,33,34,35,36, 38,39,40,41]. In addition, the natural progression of the disease had been considered in two studies [25, 26]. Thus, the efficacy of interventions had been assessed by improvements or delayed worsening of PD severity [23, 24, 27, 28, 30, 33,34,35,36, 38, 41] and motor complications [22,23,24,25,26,27, 30,31,32,33,34,35,36, 38,39,40,41]. Furthermore, some models included adverse events [23,24,25,26,27, 30, 32,33,34,35, 37,38,39, 41] or treatment discontinuation/adherence [32, 36, 38,39,40]. However, the consequences of motor symptoms (such as falls) [27, 30, 38] or caregivers’ and patients’ disutilities due to PD/adverse events [23,24,25,26, 33, 39, 41] were only occasionally considered. As most of the modelling mapped the course of PD on the HY scale, courses based on more differentiated scales, such as the UPDRS, remain unconsidered. However, the integration of more differentiated symptom scales (compared to the HY scale) into Markov models is often challenging because the health states must be independent of each other, which is more difficult to achieve with more complex scales. 
Nevertheless, a more differentiated view of the course of disease would be desirable. Limitations also remain in the estimation of parameter values and in the lack of sensitivity analyses and validations conducted, which will be discussed in the following sections. Recommendations for future modelling can be found in Table 4.

Table 4 Recommendations for future models

4.1 Parameter Estimates

Sensitivity analyses revealed a large impact of efficacy, cost and effectiveness parameters on the results, making it necessary to account for uncertainties and assumptions around these parameter estimates. Efficacies of interventions were estimated using empirical data with follow-up periods that ranged from 3 months to 5 years, so that influences on PD symptoms and motor complications could be adequately addressed. However, effects were often assumed to be maintained over the entire time horizon (10–25 years), although the literature has reported reduced efficacy of treatments over the course of PD, which in turn leads to treatment changes [11]. Thus, it may be unrealistic to assume efficacy far beyond the time horizon of empirical data, and at least a decline in efficacy should be modelled when considering long time horizons.
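
A decline in efficacy can be implemented in several ways; the Python sketch below shows one simple option, a linear waning of the treatment effect after the observed follow-up. The linear form, the function name `effect_at_year` and all numbers are assumptions for illustration and are not derived from the reviewed studies.

```python
def effect_at_year(year, initial_effect, observed_years, waning_years):
    """Treatment effect in a given model year under a linear waning assumption.

    The full effect holds for the observed follow-up, then declines linearly
    to zero over `waning_years` (a hypothetical assumption).
    """
    if year <= observed_years:
        return initial_effect
    elapsed = year - observed_years
    remaining = max(0.0, 1.0 - elapsed / waning_years)
    return initial_effect * remaining


# Hypothetical: 2.0 fewer 'off' hours per day, 5 years of trial data,
# effect assumed to fade out over the following 10 years
years = range(1, 21)
constant = [2.0 for _ in years]
waning = [effect_at_year(y, 2.0, observed_years=5, waning_years=10) for y in years]
print(sum(constant), round(sum(waning), 6))  # 40.0 19.0
```

In this example, the cumulative effect over 20 years is less than half of what a constant-effect assumption would imply, which shows how strongly such an assumption can drive long-horizon results.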

Effectiveness measures included utilities (QALYs) [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41] and life-years [24, 28, 29, 31, 32, 35, 36, 39], complemented by clinical effect measures (UPDRS [26] and PDQ-39 [25] points gained, as well as avoided ‘off hours’ [39]). As a QALY is a generic measure and results can be compared across different disease areas, all authors drew conclusions based on QALYs as an effect measure. Unfortunately, only limited data exist on health-related quality of life of patients with PD to calculate QALYs. Only five studies were used as the basis for utility values in all included studies [35, 53, 54, 56, 57], four of which were published before 2005 [53, 54, 56, 57]. Data on severe PD were completely lacking. Thus, country-specific utilities could only be used in nine studies [22, 24,25,26, 29,30,31,32,33] and results are likely to be influenced by the lack of data on country-specific utility values. Country-specific utility data are therefore highly desirable for future studies. However, until data on health-related quality of life are available, authors should describe in detail the methods used to calculate effectiveness parameters, and uncertainties in parameter estimates should be adequately addressed in sensitivity analyses and validations. Moreover, results should be critically discussed.

In contrast, cost parameters were based on country-specific data. However, not all of the studies described the methods for calculating costs in sufficient detail (e.g. the cost categories included and their monetary valuation) [23, 25, 26, 28, 32, 35, 36], and some failed to justify the cost categories based on the cost perspective chosen [23, 31, 37]. In particular, only adverse events related to the administration of LCIG, apomorphine, or DBS [23,24,25,26,27, 33,34,35,36,37,38,39, 41] and falls [27, 30, 38] were considered as cost categories, and the models did not consider other consequences of motor symptoms or non-motor complications (e.g. depression or sleep disorders). Moreover, indirect costs from a societal perspective were only considered by two studies [28, 31]. A detailed methodological description of parameter estimation and a more frequently applied societal perspective for cost parameters are therefore strongly recommended for the future.

4.2 Sensitivity Analyses and Validations

Compared with previous systematic reviews [17,18,19], the current evidence included more comprehensive sensitivity analyses. The previous literature included deterministic sensitivity analyses, but only a few studies carried out scenario-based or probabilistic sensitivity analyses. In contrast, more recent studies supplemented deterministic sensitivity analyses with scenario analyses [26, 27, 32, 33, 36, 39] and probabilistic sensitivity analyses [23, 24, 28,29,30, 32,33,34, 40, 41] to account for uncertainty around point parameter estimates. The parameter ranges used were based on the literature [25,26,27, 29, 30, 36] or on percentage variations around the estimated values [22, 24,25,26, 32,33,34,35,36,37,38, 40, 41]. In particular, the values varied widely, between 10 and 50%, without justification, for example by considering the distribution of parameter estimates or external validated values from the literature [22, 24,25,26, 32,33,34,35,36,37,38, 40, 41]. It would therefore be useful to describe the methods of sensitivity analyses transparently and justify chosen parameter ranges, as well as specify distributions and distribution parameters determined for probabilistic sensitivity analyses.

Unfortunately, the results of sensitivity analyses were rarely used to interpret results; thus, for eight studies (all manufacturer funded), conclusions on cost effectiveness should be considered with caution [23, 29, 32, 33, 35,36,37, 41]. In general, the results of probabilistic sensitivity analyses are less frequently incorporated into conclusions and are often misinterpreted. For example, a probability of cost effectiveness of 70% is often misinterpreted as showing that the intervention under study is cost effective. Thereby, the authors neglect that the complementary probability indicates that the comparator is expected to be cost effective in, on average, 30% of the cases. In such a case, the results can only be interpreted as a tendency towards the cost effectiveness of the studied intervention, and this therefore requires precise description within conclusions. The cost effectiveness of the evaluated intervention is therefore only (mathematically) certain if the probability of cost effectiveness exceeds a high threshold (e.g. 95% assuming a commonly used α-level of 0.05).
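
The probability of cost effectiveness discussed above is usually derived as the share of probabilistic iterations in which the intervention yields a positive incremental net monetary benefit at a given willingness-to-pay threshold. The Python sketch below illustrates this calculation. The ten iterations, the threshold of 30,000 per QALY and the function name are hypothetical.

```python
def probability_cost_effective(delta_costs, delta_qalys, threshold):
    """Share of PSA iterations with positive incremental net monetary benefit."""
    if len(delta_costs) != len(delta_qalys) or not delta_costs:
        raise ValueError("need equally long, non-empty samples")
    favourable = sum(
        1 for dc, dq in zip(delta_costs, delta_qalys) if threshold * dq - dc > 0
    )
    return favourable / len(delta_costs)


# Ten hypothetical PSA iterations (incremental cost, incremental QALYs)
delta_costs = [12_000, 11_500, 9_000, 20_000, 11_000,
               14_000, 25_000, 10_000, 13_000, 30_000]
delta_qalys = [0.50, 0.40, 0.45, 0.30, 0.60, 0.55, 0.35, 0.40, 0.50, 0.20]

p_new = probability_cost_effective(delta_costs, delta_qalys, threshold=30_000)
print(p_new, round(1 - p_new, 2))  # 0.7 0.3
```

Here the intervention is favoured in seven of ten iterations and the comparator in the remaining three, so the result indicates a tendency rather than certainty.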

In addition to internal validation of parameter estimates through sensitivity analyses, parameter estimates [24,25,26,27,28,29,30, 34, 38, 40] and results [22, 24, 26, 32, 39, 41] were also externally validated against values reported in the literature, although not all parameter estimates were always explicitly reported [23, 25, 26, 28, 29, 34, 36, 37]. However, internal and external validations of model structures [24,25,26] and assumptions [23, 27, 30, 38, 39, 41] were rarely reported. It would therefore be desirable to validate the model structure and assumptions like parameter estimates in future research.

4.3 Conflicts of Interest

Because of the methodological weaknesses described, compliance with the recommendations on good modelling practice is extremely important [16, 20]. Only if the development of the model is independent of conflicts of interest, and methods and results are presented transparently and discussed critically, can correct and meaningful interpretation of results and conclusions be achieved. Most of the included studies were funded by the manufacturing companies, so that a financial conflict of interest and possible influence on results cannot be excluded [22, 24, 27,28,29,30,31,32,33, 35,36,37, 39,40,41]. A Canadian study was part of a health technology assessment and independently reviewed by the Canadian Agency for Drugs and Technologies in Health Common Drug Reviews, where the results of the review did not confirm those of the authors [23]. Only three studies were conducted financially independently [25, 34, 38]. We therefore recommend that authors and reviewers check the (methodological) quality of the economic evaluations using published checklists [16, 20, 21] prior to publication.

4.4 Strengths and Limitations

This study examines the methodological quality of the modelling approaches used in economic evaluations of interventions for PD published between 2010 and 2022. Unlike previous reviews that assessed economic evaluations based on empirical data and focused on cost-effectiveness results and their socioeconomic consequences [1,2,3,4,5,6,7,8,9,10], the current study provided methodological recommendations to improve future modelling by systematically assessing the quality of the published literature. For this purpose, the checklist of the German Scientific Working Group from 1999 was used [21]. Unfortunately, international checklists to assess the quality of economic evaluations, such as CHEERS [16] or the checklist by Drummond et al. [20], do not explicitly focus on aspects of modelling quality and therefore could not be used in the current study. Accordingly, an international checklist focusing on modelling for health economic evaluations is needed. Furthermore, the current review mainly focused on Markov models, which were used by most of the included studies. Thus, other modelling approaches, such as microsimulation or discrete event simulation, were not addressed in this study. However, innovative new methodological approaches might overcome some limitations of Markov models (e.g. the consideration of discrete time intervals, the Markov assumption of independence in health states over time or the integration of treatment changes) and should therefore be used more frequently to model cost effectiveness in the future. Finally, it should be noted that two of the included studies were published by the first author of this review [25, 26]. Although the evaluation was independently cross-checked by the other co-authors, this may have led to unintended bias.

5 Conclusions

Overall, compared with the previous literature up to 2010 [17,18,19], the quality of models for economic evaluations of interventions in PD and the transparency of their presentation have improved over the last decade. Care was taken in cost and transition parameter estimation, as well as in consideration of uncertainty via probabilistic sensitivity analyses. However, most of the limitations mentioned by previous systematic reviews are still not adequately addressed. Future research should therefore strive to integrate the consequences of motor symptoms (e.g. comorbidities/falls, discontinuation of or changes in treatment) and non-motor complications (e.g. depression, sleep disorders, dementia). Furthermore, parameter estimates should be presented transparently and, if possible, adapted to the societal perspective. The uncertainty of parameter estimates should be assessed by sensitivity analyses and be considered in the interpretation of results. Moreover, validations of the model design and assumptions are recommended.