Key Points

Two pharmacoeconomic Markov models are developed to test the cost–utility of genomic screening of CYP2C19 and CYP2D6 for patients suffering from major depressive disorder (MDD).

The results of pharmacoeconomic models are used in a post-hoc analysis to understand the main drivers of three alternative cost-effectiveness levels (“poor,” “standard,” and “high”).

Relevant suggestions are put forward concerning the most important factors influencing the effectiveness and/or cost-effectiveness of genetic testing for patients diagnosed with MDD.

1 Introduction

Major depressive disorder (MDD) is a common and severe psychiatric disorder that has enormous economic and societal costs [1]. These are mainly due to its relatively high prevalence in the general population (point prevalence of about 4.7%) [2], the high disability associated with the disorder [3], and the elevated risk of recurrence after a first depressive episode [4]. It is also plausible that these epidemiological figures are underestimated given that theoretical models of the natural history of depression suggest much higher lifetime prevalence (33.2% for females and 20.5% for males) [5]. Importantly, the costs of MDD vary substantially depending on the type of clinical course [6, 7] and response to treatment [8, 9]. For instance, Ivanova et al. [8] showed that direct and indirect healthcare costs were statistically significantly higher in employees affected by MDD with treatment-resistant depression compared with those responsive to therapies. Furthermore, the data synthesis by Mauskopf et al. [9] showed that annual healthcare costs and productivity losses were significantly lower for patients with MDD who were remitters or partial responders compared with nonresponders to antidepressants. In this context, there has been much interest in developing strategies to implement precision psychiatry and to decrease the burden of MDD using, for instance, methods to predict treatment response and the risk of manifesting adverse reactions or developing treatment resistance.

Pharmacogenetics is one of the key tools of precision psychiatry, and consists in the study of how genetic differences among individuals affect drug response or drug-induced adverse effects. Indeed, genetic variants have been suggested to explain part of the large interindividual variability observed in response to antidepressants and in the onset of adverse effects, with most robust evidence available for genes involved in the metabolism of these drugs such as genes encoding cytochrome P450 (CYP) enzymes [10]. Most antidepressants are metabolized by the enzymes CYP2D6 and CYP2C19, which are highly polymorphic, with 148 and 36 alleles or variants described, respectively [11]. CYP2D6 and CYP2C19 genetic variants define phenotypic groups characterized by different levels of activity of the encoded enzyme: poor metabolizers (PM), intermediate metabolizers (IM), normal or extensive metabolizers (EM), and ultra-metabolizers (UM) [12]. The PM and the IM phenotypes are associated with absent or reduced ability to produce a functional enzyme, leading to reduced drug clearance or reduced ability to convert a prodrug into active metabolites. Conversely, the UM phenotype is associated with accelerated drug metabolism, which can lead to faster drug clearance or to faster formation of drug metabolites [13].

CYP2D6 and CYP2C19 PM, IM, or UM may show antidepressant plasma concentrations outside the recommended therapeutic range, thus increasing the risk of nonresponse or reduced tolerability [14]. The Clinical Pharmacogenetics Implementation Consortium (CPIC) has provided dosage recommendations for tricyclic antidepressants (TCA) and selective serotonin reuptake inhibitor (SSRI) antidepressants on the basis of CYP2D6 and CYP2C19 phenotypes [12, 14]. However, pharmacogenetic testing is not systematically adopted in the clinical setting. Barriers to the clinical implementation of pharmacogenetics in psychiatry include, among others, the need to demonstrate its long-term sustainability through cost–benefit and cost-effectiveness analyses. To date, several studies have explored the cost-effectiveness and clinical utility of pharmacogenetic testing in psychiatry. In most cases, these studies reported findings from clinical trials implementing commercially available pharmacogenetic decision-support tools [15]. These tools often include recommendations based on variants located in several genes, some of which lack clear indications of their clinical utility in the international guidelines. Moreover, the sensitivity and specificity of these tests are not clear, and therefore their clinical utility has been even more difficult to evaluate. Thus, more independent studies are needed to define how a pharmacogenetic test (PGx) in psychiatry should be designed to be cost-effective and clinically useful.

The aim of this paper is to define two pharmacoeconomic Markov models to test the cost–utility of genetic screening of CYP2C19 and CYP2D6 for patients with MDD. Moreover, we implement a post-hoc analysis to understand the main drivers of three cost-effectiveness groups, termed “poor cost-effectiveness,” “standard cost-effectiveness,” and “high cost-effectiveness,” and to predict the cost-effectiveness group of a patient on the basis of a restricted set of the input parameters used in the original pharmacoeconomic model.

2 Methods

2.1 Model Structure

We specify two 18-week nonhomogeneous Markov models with a 1-week cycle length, one for the genetic testing of CYP2C19 and another for the genetic testing of CYP2D6. The heemod package for the R statistical computing environment [16, 17] is used for this purpose. For each CYP, the reference population consists of two hypothetical cohorts totaling 2000 adult patients diagnosed with MDD. In each of the two models, 1000 patients undergo a genetic test (PGx) to determine their metabolizer phenotype and thereby identify an appropriate treatment. The other 1000 patients form the control group: they are not tested and are treated under the standard guidelines (NoPGx). The analysis is performed from the Italian societal perspective, meaning that we consider direct costs, such as treatment cost, and indirect costs, such as productivity loss due to illness.

The model (Fig. 1), informed by the clinical expertise of psychiatrists, works as follows. A patient diagnosed with MDD is either tested or not for their metabolizer phenotype. The patient then enters one of the four arms of the model corresponding to the four types of metabolizer: PM, IM, EM, and UM. The patient is then treated according to their phenotype, if it is known from the genetic test, or under the standard guidelines if the patient is included in the control group. If the phenotype of a patient is known, the treatment is expected to be more beneficial: patients suffer from fewer side effects and have greater chances of a positive response.

Fig. 1

The proposed nonhomogeneous Markov model. PM poor metabolizer, IM intermediate metabolizer, EM extensive metabolizer, UM ultra-metabolizer

After getting tested (or not, depending on the cohort), patients immediately start treatment. After 2 weeks, they are examined by a clinician to check whether any side effect has emerged. If there are no side effects, patients continue the same treatment and move into the state “wait” until the sixth week of the trial, when they are observed again to assess their response. Patients experiencing side effects have two possibilities: (a) a change in the treatment (change in dose or different drug), in which case they move into the state “switch or titrations” and are evaluated again for side effects after 2 weeks. They thereby reenter the first loop, mimicking a trial-and-error system: this can be referred to as the “side effect/no side effect loop,” in which every change in therapy is followed by a new evaluation for side effects after 2 weeks; (b) a “wait and see” strategy, meaning there are no changes in the treatment course and patients move into the state “wait” until the sixth week, when they are evaluated again. In the following cycles, patients move between states, as described in Fig. 1. To clarify, after the first evaluation, patients without side effects end up in the “wait” state, whereas patients with side effects move into either the “wait” state or the “switch or titrations” state. Patients in the “switch or titrations” state are again evaluated for side effects after 2 weeks. Finally, after the sixth week, patients in the “wait” state are observed by a clinician to check for a positive or negative response to the treatment. Patients with a positive response continue the treatment until the 18th and final cycle (this is defined as an “absorbing state”).
On the other hand, patients who have not experienced a positive response to the treatment either attempt suicide and are later evaluated for side effects, entering the “side effect/no side effect loop,” or have a change in treatment, moving into the “switch or titrations” state and then again among the Markov states in the “side effect/no side effect loop.” We also consider another group of patients: those who never leave the “side effect/no side effect loop” because they always experience side effects. These patients have a risk of hospitalization, after which they move again into the “side effect/no side effect loop.” This scheme spans 6 weeks and is repeated two more times, until the 18-week treatment course is over. Thus, every 6 weeks, patients in the trial are evaluated for a positive effect. The 18-week period was chosen after consultation with clinical psychiatrists.
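The cohort mechanics described above can be illustrated with a deliberately simplified Python sketch. It is not the heemod model used in this study: it collapses the structure to four hypothetical states, uses the same transition probabilities in every cycle (whereas the model in Fig. 1 is nonhomogeneous), and every probability below is a placeholder rather than a value from the parameter tables.

```python
# Simplified cohort trace for a Markov model with 1-week cycles.
# State names and transition probabilities are illustrative placeholders only.
STATES = ["wait", "switch_or_titration", "hospitalized", "response"]

# TRANSITIONS[origin][target] = probability of moving origin -> target in one cycle.
TRANSITIONS = {
    "wait": {"wait": 0.70, "switch_or_titration": 0.15,
             "hospitalized": 0.00, "response": 0.15},
    "switch_or_titration": {"wait": 0.60, "switch_or_titration": 0.35,
                            "hospitalized": 0.05, "response": 0.00},
    "hospitalized": {"wait": 0.00, "switch_or_titration": 1.00,
                     "hospitalized": 0.00, "response": 0.00},
    # Absorbing state: responders stay on treatment until the last cycle.
    "response": {"wait": 0.00, "switch_or_titration": 0.00,
                 "hospitalized": 0.00, "response": 1.00},
}


def run_cohort(start, n_cycles):
    """Return the cohort distribution over states at cycle 0..n_cycles."""
    trace = [dict(start)]
    for _ in range(n_cycles):
        current = trace[-1]
        following = {state: 0.0 for state in STATES}
        for origin, count in current.items():
            for target, prob in TRANSITIONS[origin].items():
                following[target] += count * prob
        trace.append(following)
    return trace


# 1000 patients start in "wait"; 18 weekly cycles as in the text.
start = {"wait": 1000.0, "switch_or_titration": 0.0,
         "hospitalized": 0.0, "response": 0.0}
trace = run_cohort(start, n_cycles=18)
```

Attaching a cost and a utility to each state and summing them over `trace` would give the per-strategy totals that a cost–utility comparison needs.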

2.2 Model Input

Model parameters are taken from the literature and, where data are missing, are derived from assumptions about their distribution. All the parameters used in the model, their confidence intervals, their distribution, and the references are reported in Table 1 for the model specified for the CYP2D6 cytochrome, hereafter CYP2D6 model, and in Table 2 for the model specified for the CYP2C19 cytochrome, hereafter CYP2C19 model.

2.2.1 Probabilities

For the CYP2D6 model, patients are divided into four metabolizer classes, PM, EM, IM, or UM, on the basis of a previous prevalence study [18]. For the NoPGx strategy, probabilities to show any side effect for PM, EM, or UM are retrieved from Shams et al. [19], while the probability to show any side effects for the IM class is assumed to be the mean point estimate between the EM and PM probabilities. Probabilities to show a positive response given the treatment are instead retrieved from Rush et al. [20] for the EM and PM groups. For the UM class, these probabilities are assumed from Sluiter et al. [21], and similarly to the side-effect probabilities, the IM probabilities to show a response are estimated as the mean between the EM and PM probabilities. In the PGx strategy, PM and IM classes are assumed to be treated with a pharmacogenetic-guided treatment strategy and therefore to show side effects equal to the EM class. In contrast, no changes are assumed for the UM class compared with the NoPGx strategy. However, the probabilities to have a positive effect due to the treatment are assumed to be equal to the NoPGx strategy.

Table 1 CYP2D6 model parameters

For the CYP2C19 model, whose parameters are reported in Table 2, patients are divided into four metabolizer classes, PM, EM, IM, or UM, on the basis of the meta-analysis of Fabbri et al. [22]. For the NoPGx strategy, probabilities to show any side effect for PM, IM, and UM are computed using as reference the probability of a side effect for EM retrieved from Shams et al. [19] and the resulting odds ratios found in [22]. Similarly, probabilities to show a positive response for the PM, IM, and UM classes, given the treatment, are computed from the odds ratio found in Fabbri et al. [22] using as reference the remission rate for EM available in Rush et al. [20].
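As an illustration of how a class-specific probability can be obtained from a reference probability and an odds ratio, the following Python sketch applies the standard odds conversion. Whether this exact conversion was used here is an assumption, the function name is hypothetical, and the inputs are made-up values rather than figures from Table 2.

```python
def prob_from_odds_ratio(p_ref, odds_ratio):
    """Apply an odds ratio to a reference probability and return the new probability."""
    if not 0.0 < p_ref < 1.0:
        raise ValueError("p_ref must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    odds_ref = p_ref / (1.0 - p_ref)   # probability -> odds for the reference class (EM)
    odds_new = odds_ratio * odds_ref   # scale the odds by the odds ratio
    return odds_new / (1.0 + odds_new) # odds -> probability


# Hypothetical inputs: reference probability 0.30 and odds ratio 1.5.
p_example = prob_from_odds_ratio(0.30, 1.5)
```

An odds ratio of 1 returns the reference probability unchanged, which is a quick sanity check on the conversion.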

Table 2 CYP2C19 model parameters

As in the case of the CYP2D6 model, for the PGx strategy concerning CYP2C19, the PM and IM classes are assumed to show side effects equal to the EM class. In contrast, there are no changes for the UM class. However, probabilities to have a positive effect due to the treatments for EM and UM are assumed to be equal to those considered in the NoPGx strategy, while we assume there is an increase in the rate of remission for the PM and IM classes. Moreover, probabilities of switching, titration, waiting, and suicide are equal for all the metabolizer classes, all the strategies, and both cytochrome models. These probabilities are derived from Annemans et al. [23], while the probability to be hospitalized is retrieved from clinical practice.

2.2.2 Costs and Utilities

Costs and utilities (Table 3) do not differ between the two models. They are attached to each health state, and as this analysis is taken from a societal perspective, we consider direct and indirect costs. Hence, we have the cost of genetic testing, medication costs, costs for psychiatrist visits, hospitalization costs, costs due to suicide attempts, and, finally, productivity losses. All these costs are equal for both strategies, with the exception of the genetic testing that is conducted only in the PGx strategy. Some of the costs (medication, psychiatric visit) are specific to the geographical area (Sardinia, Italy) where the developed models could be implemented in future evaluations of the cost-effectiveness of PGx testing in psychiatry using data from a real-world setting. Costs are expressed in euros at 2021 prices.

Table 3 Parameters used in the cytochrome P450 (CYP) 2D6 and CYP2C19 models

Psychiatric visit costs are attached whenever patients are clinically evaluated and, if patients suffer from side effects, costs due to productivity losses are also added to the health state. These costs also occur when no improvement is observed in the patients’ health. As stated before, patients who keep showing side effects can be hospitalized, so costs of hospitalization are attached in this case. Likewise, when a suicide attempt takes place, the resulting costs are assigned. Medication costs, however, are equal and constant for all the health states of the model.

As with costs, utilities are the same in both models. In our case, quality-adjusted life years (QALY) are used to evaluate the effectiveness of the two strategies. QALY is a utility-based measure of disease burden. It is measured on a scale from 0 (representing death) to 1 (representing a year of perfect health). We assign to each Markov state in the model a utility that reflects the quality of life of living in that health state. These values are retrieved from previous literature [24, 25].

2.3 Research Method

We consider the analysis of a base-case scenario for both CYP2D6 and CYP2C19 models with the values of the point estimates reported in Tables 1 and 2. The main output of the analysis is the incremental cost-effectiveness ratio (ICER) between the two strategies.
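For readers less familiar with the ICER, a minimal Python sketch of the calculation follows. The function name and the input values are illustrative and are not results of this study.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost per QALY gained (PGx minus NoPGx)."""
    if delta_qaly == 0:
        # With no difference in effectiveness the ratio is undefined.
        raise ZeroDivisionError("incremental QALY is zero, ICER is undefined")
    return delta_cost / delta_qaly


# Illustrative inputs only: 100 EUR extra cost for 0.002 extra QALY per patient.
example_icer = icer(100.0, 0.002)
```

Because the incremental QALY values are small, the ratio is sensitive to rounding: recomputing an ICER from rounded increments will generally not reproduce a reported value to the cent.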

We also perform, for each model, a one-way deterministic sensitivity analysis (DSA) to assess the impact of the uncertainty in each parameter. The extreme values used in the DSA are the bounds of the 95% confidence interval (CI) of each parameter. If the CI is not available, values positioned at \(\pm \,25\%\) with respect to the arithmetic mean are used. Results are reported in a tornado diagram (see Fig. 2 for an example), which shows the ICER’s range due to the parameter uncertainty. Moreover, to assess the uncertainty surrounding the point estimates, we perform for each model a probabilistic sensitivity analysis (PSA) with 1000 Monte Carlo replications. As suggested by Briggs et al. [26], we use a gamma distribution for costs and a beta distribution for utilities and transition probabilities. To determine the parameters of these distributions, we use the R package “sjstats” [27], which allows us to find the distribution parameters used in the PSA starting from the lower and upper bounds of the CI. We show the results of the PSA in a cost-effectiveness plane (CE plane), and on the basis of these results, we derive the cost-effectiveness acceptability (CEA) curves.
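The sampling step can be sketched in Python as follows. This is not the “sjstats” procedure used in the analysis: the sketch swaps in a method-of-moments fit, approximates the standard error from the CI width under a normal approximation, and uses made-up means and bounds. All function names are hypothetical.

```python
import random


def dsa_bounds(mean, ci=None, spread=0.25):
    """Extreme values for a one-way DSA: the CI bounds if given, else mean +/- 25%."""
    if ci is not None:
        return ci
    return (mean * (1.0 - spread), mean * (1.0 + spread))


def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error from a 95% CI (normal approximation, an assumption)."""
    return (upper - lower) / (2.0 * z)


def beta_params(mean, se):
    """Method-of-moments shape parameters of a beta distribution."""
    common = mean * (1.0 - mean) / se ** 2 - 1.0
    if common <= 0:
        raise ValueError("standard error too large for a beta distribution")
    return mean * common, (1.0 - mean) * common


def gamma_params(mean, se):
    """Method-of-moments shape and scale of a gamma distribution."""
    return (mean / se) ** 2, se ** 2 / mean


rng = random.Random(42)  # fixed seed so the draws are reproducible

# Made-up probability (mean 0.30, CI 0.20-0.40) and cost (mean 100, CI 75-125).
a_shape, b_shape = beta_params(0.30, se_from_ci(0.20, 0.40))
g_shape, g_scale = gamma_params(100.0, se_from_ci(75.0, 125.0))

# 1000 Monte Carlo draws per parameter, matching the number of replications.
prob_draws = [rng.betavariate(a_shape, b_shape) for _ in range(1000)]
cost_draws = [rng.gammavariate(g_shape, g_scale) for _ in range(1000)]
```

The beta distribution keeps sampled probabilities and utilities inside (0, 1), and the gamma distribution keeps sampled costs positive, which is the usual motivation for these choices.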

Fig. 2

Deterministic sensitivity analysis results. DSA deterministic sensitivity analysis, Prob. probability, PM poor metabolizer, IM intermediate metabolizer, EM extensive metabolizer, UM ultra-metabolizer, PGx pharmacogenetically screened cohort, NoPgx standard guidelines cohort, QALY quality-adjusted life year, ICER incremental cost-effectiveness ratio, WTP willingness to pay, CYP cytochrome P450

The Markov models specified for both CYP2D6 and CYP2C19 cytochromes and the consequent sensitivity analyses are not able, per se, to explain the different drivers that could characterize different levels of cost-effectiveness obtained from different combinations of the input parameters of the Markov model.

To this purpose, we consider a post-hoc analysis aimed at finding the main drivers that allow us to distinguish among a “poor,” a “standard,” or a “high” cost-effectiveness level. The three levels are determined by observing the relationships between the incremental effect (QALY) and the incremental cost of each possible scenario obtained from the sensitivity analysis. The main drivers of cost-effectiveness are determined through canonical correlation analysis (CCA), as implemented in the R package “candisc” [28]. The main feature of CCA is that it allows the user to reduce the number of observed variables by summarizing their information into latent factors accounting for cost-effectiveness [29]. The output of CCA is represented in a coordinate system that describes the maximum cross-covariance between three subsets of data, because forcing latent factors associated with three groups of cost-effectiveness levels to be correlated with each other should result in latent factors globally correlated with the overall cost-effectiveness, from three different perspectives.

At the same time, to enforce the intuition that the three cost-effectiveness levels represent scenarios that vary consistently from one to another, we assess whether the three cost-effectiveness levels are correctly classified by the whole set of Markov model input parameters using linear discriminant analysis (LDA) as implemented in the R package “MASS” [30]. LDA is aimed at finding a linear combination of features that characterizes or separates two or more classes of objects. The resulting combination, or discriminant function, may be used as a linear classifier or for dimensionality reduction before later classification [31].

Finally, we consider the possibility to predict the cost-effectiveness level on the basis of a restricted set of available information, i.e., a subset of the input parameters of the Markov model, through a machine learning approach. This task is accomplished considering the support vector machine (SVM) classifier implemented in the R package “e1071” [32]. SVM projects nonlinear separable groups of observations onto a higher-dimensional space by using different types of kernel function. The latter are next used to predict the most probable group for a new case [33].

3 Results

3.1 Base Case Scenario

Results of the base-case scenarios are presented in Table 4.

For the CYP2C19 model, PGx is shown to have a stronger effect in terms of utility gained with a difference of 0.0020 QALY, and a higher cost per patient of 120.22€ with respect to the NoPGx strategy. These results lead to an ICER of 60,094.17€ per QALY gained, which is below the willingness to pay (WTP) threshold of 75,000€. This WTP is taken from the World Health Organization (WHO) cost-effectiveness guidelines [34, 35], which suggest that the WTP threshold should be up to three times the gross domestic product (GDP) per capita of a developed country such as in the case of our study (Italy). Moreover, this is in line with the most recent research on the topic [21].

For the CYP2D6 model, PGx has a stronger effect in terms of utility gained with a difference of 0.0022 QALY, and a higher cost per patient of 104€ with respect to the NoPGx strategy. These results lead to an ICER of 46,908.24€ per QALY.

Table 4 Base-case scenario results

3.2 Sensitivity Analysis

3.2.1 Deterministic Sensitivity Analysis

The one-way DSA results are shown in the tornado diagram (Fig. 2). As the range of the parameter moves from the lower to the higher bound, the ICER changes. The dashed line refers to the WTP threshold of 75,000€. Several considerations can be derived from the DSA. For both models, we find that, for the higher bound of productivity loss cost, the ICER becomes negative. Thus, the PGx strategy becomes cost-saving. We also find that, if the screening does not improve the probability of side effects or of a positive response for IM patients, the ICER becomes much higher than any reasonable WTP threshold. The same happens if the probability to show side effects for IM in the NoPGx cohort is low and if the probability to have an improvement is high, but we have an extremely high ICER if the probability of side effects for IM in the NoPGx cohort is in the lower bound, and if the cost of the test is too high, for the CYP2C19 model only. All other values of the parameters, however, confirm the robustness of the models. Moreover, for both models, these findings indicate that the IM class is the one that drives this cost-effectiveness analysis, so these values require the most attention in future clinical research.

3.2.2 Probabilistic Sensitivity Analysis

The PSA results are presented in Table 5. For the CYP2C19 model, the ICER computed with the 1000 Monte Carlo replications is remarkably similar to that characterizing the base-case scenario, as the PSA provides an ICER of 64,042.73€ per QALY. The same result holds for the CYP2D6 model, which provides an ICER of 44,377.39€ per QALY.

Table 5 Probabilistic sensitivity analysis results

Some uncertainty is observed from the results shown in Fig. 3. Each point in the cost-effectiveness plane represents a simulated scenario; the red line is the 75,000€ WTP threshold per QALY gained, and it divides the plane between the replications that are cost-effective and those that are not. We see that there is considerable variability with respect to both incremental costs and incremental utility for both models. Moreover, Fig. 4 shows that, in the Monte Carlo simulation for the CYP2C19 model, the ICER is below the WTP threshold of 75,000€ per QALY in almost 58% of replications. This means that, under our assumptions, screening for the CYP2C19 enzyme is a favorable intervention in around 58% of cases. For the CYP2D6 model, the replications below the WTP threshold are slightly more than 63% of the total number of Monte Carlo replications.
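The share of favorable replications at a given threshold, and hence an acceptability curve, can be sketched in Python as below. The text counts replications whose ICER falls below the WTP threshold; the sketch instead uses the incremental net monetary benefit, a common formulation that gives the same answer when incremental QALYs are positive and avoids sign ambiguity otherwise. This substitution, the function names, and the four replications are assumptions for illustration.

```python
def prob_cost_effective(delta_costs, delta_qalys, wtp):
    """Share of replications with positive incremental net monetary benefit."""
    pairs = list(zip(delta_costs, delta_qalys))
    # Net monetary benefit of PGx versus NoPGx: WTP * delta QALY - delta cost.
    favorable = sum(1 for dc, dq in pairs if wtp * dq - dc > 0)
    return favorable / len(pairs)


def acceptability_curve(delta_costs, delta_qalys, thresholds):
    """Probability of cost-effectiveness over a grid of WTP thresholds."""
    return [(wtp, prob_cost_effective(delta_costs, delta_qalys, wtp))
            for wtp in thresholds]


# Four made-up replications (incremental cost in EUR, incremental QALY).
costs = [100.0, 150.0, 90.0, -20.0]
qalys = [0.002, 0.001, -0.001, 0.001]
curve = acceptability_curve(costs, qalys, [0, 75_000, 200_000])
```

With real PSA output, `costs` and `qalys` would each hold 1000 values, one per Monte Carlo replication.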

Fig. 3

Probabilistic sensitivity analysis: cost-effectiveness plane. PGx pharmacogenetically screened cohort, NoPgx standard guidelines cohort, QALY quality-adjusted life year, WTP willingness to pay, CYP cytochrome P450

Fig. 4

Probabilistic sensitivity analysis: acceptability curve. PGx pharmacogenetically screened cohort, NoPgx standard guidelines cohort, QALY quality-adjusted life year

3.3 Post-hoc Analysis

The results of the PSA shown in Fig. 3 suggest that for both models we can distinguish among three different cost-effectiveness levels. These are identified considering the following threshold values of incremental QALY and the ratio between incremental cost and incremental QALY that lead us to define the variable “type of cost-effectiveness” (TCE):

  • a \(\rightarrow\) “Poor” cost-effectiveness: \(\frac{\text{Incremental Cost}}{\text{Incremental QALY}} \le 0\) with incremental \(\text{QALY} < 0\)

  • b \(\rightarrow\) “Standard” cost-effectiveness: \(\frac{\text{Incremental Cost}}{\text{Incremental QALY}} > 75{,}000\,\)€

  • c \(\rightarrow\) “High” cost-effectiveness: \(0 < \frac{\text{Incremental Cost}}{\text{Incremental QALY}} \le 75{,}000\,\)€ with incremental \(\text{QALY} > 0\)
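The three rules can be expressed as a small Python function. The sketch assumes that the “high” group is meant to capture scenarios with a positive QALY gain at an acceptable cost, and it returns `None` for scenarios that fall outside the three listed groups (for example, a QALY gain at lower cost); both choices, and the function name, are assumptions made for illustration.

```python
WTP = 75_000.0  # EUR per QALY, the threshold used in the text


def classify_tce(delta_cost, delta_qaly, wtp=WTP):
    """Return 'a', 'b', 'c', or None for one simulated scenario."""
    if delta_qaly == 0:
        return None  # ratio undefined
    ratio = delta_cost / delta_qaly
    if ratio <= 0 and delta_qaly < 0:
        return "a"  # poor: QALY loss at equal or higher cost
    if ratio > wtp:
        return "b"  # standard: ratio above the WTP threshold
    if 0 < ratio <= wtp and delta_qaly > 0:
        return "c"  # high: QALY gain at an acceptable cost (assumed sign)
    return None  # outside the three listed groups


# Made-up scenarios: (incremental cost in EUR, incremental QALY).
scenarios = [(100.0, -0.001), (200.0, 0.001), (100.0, 0.002), (-50.0, 0.001)]
labels = [classify_tce(dc, dq) for dc, dq in scenarios]
```

Applied to every PSA replication, a function of this kind would produce the TCE label that the discriminant and classification analyses take as the response.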

The first goal of the post-hoc analysis is to understand the main drivers that allow us to discriminate among the TCE groups “a,” “b,” and “c.” To this purpose, we use CCA to project the data, including all the variables used in the PSA, but not the incremental cost and the incremental QALY, into a subspace of linear combinations of the response classes “a,” “b,” and “c” of TCE that accounts for the greatest proportion of the between-group variance relative to within-group variance. In this case, with three groups, this subspace is two-dimensional and is represented in Fig. 5 for the CYP2D6 model. Figure 5 shows that 97.5% of the between-group mean differences are accounted for by the first canonical dimension (Can1) that clearly discriminates poor cost-effectiveness from high cost-effectiveness, with the standard cost-effectiveness group in a barycentric position. Both canonical dimensions Can1 and Can2 are significant, by a likelihood ratio step-down test (\(p < 0.0001\) for Can1 and \(p < 0.001\) for Can2). Can1 and Can2 are represented in Fig. 5 in terms of the scores obtained by the observations for each dimension, with ellipses overlaying 60% of the observations for each group. The positions of the group means, indicated by a “+” sign, show the mean of each group on the canonical dimensions. The relationship between the observed variables and the canonical dimensions is shown by vectors, each one measuring the correlation \(\rho\) that a variable has with Can1 and Can2.
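In generic notation (the symbols below are introduced here for illustration and are not taken from the analysis), each canonical dimension is a direction \(w\) in the space of the input variables that maximizes the ratio of between-group to within-group variation:

\[ \lambda = \max_{w} \frac{w^{\top} B\, w}{w^{\top} W\, w} \]

where \(B\) and \(W\) denote the between-group and within-group sums of squares and cross-products matrices. With \(g\) groups and at least \(g-1\) input variables, at most \(g-1\) directions have \(\lambda > 0\), which is why three TCE groups yield a two-dimensional subspace.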

Fig. 5

Canonical discriminant factors for cytochrome P450 (CYP) 2D6: the “poor” cost-effectiveness group is labeled “a,” the “standard” group “b,” and the “high” group “c”. PGx pharmacogenetically screened cohort, NoPgx standard guidelines cohort, Pr_SE probabilities of side effects, Pr_Impr probability of having an improvement (effect), Pr_SWT probability of switch/titration, PM poor metabolizer, IM intermediate metabolizer, EM extensive metabolizer, NoIm Ut utility for no effect

The variables that are negatively correlated with Can1 are: cost of test \(({\text {CostTest}}: \rho =-0.21, p=0.00)\); probabilities of side effects for IM patients who had a genetic test \(({\text {Pr}}\_{\text {SE}}\_{\text {IMGx}}: \rho =-0.51, p=0.00)\) and probability of having effects if IM \(({\text {Pr}}\_{\text {Impr}}\_{\text {IM}}\_{\text {NoPGx}}: \rho =-0.54, p=0.00)\). Those that are positively correlated with Can1 are: probability of having an improvement if IM \(({\text {Pr}}\_{\text {Impr}}\_{\text {IM}}: \rho =0.44, p=0.00)\) and probability of side effects if PM \(({\text {Pr}}\_{\text {SE}}\_{\text {PM}}\_{\text {NoPGx}}\_{\text {val}}: \rho =0.39, p=0.00)\). Thus, Can1 can be seen as a latent measure of cost-effectiveness of the proposed therapeutic processes represented in Fig. 1. High costs of genetic tests and the presence of (side) effects for the IM patients are the main drivers of “poor” cost-effectiveness (TCE, group “a”), whereas improvements associated with PM status or even the presence of side effects for EM patients are reasonably associated with “high” cost-effectiveness (TCE, group “c”).

The variables that are negatively correlated with Can2 are: hospitalization utility \(({\text {Hosp}}\_{\text {Ut}}: \rho =-0.23, p=0.00)\); probability of being IM or EM \({\text {(Pr}}\_{\text {IM}}: \rho =-0.26, p=0.00, {\text {Pr}}\_{\text {EM}}: \rho =-0.27, p=0.00)\); probability of switch/titration \((Pr\_SWT: \rho =-0.32, p=0.00)\); probabilities of side effects for IM patients who had a genetic test \(({\text {Pr}}\_{\text {SE}}\_{\text {PMPGx}}: \rho =-0.29, p=0.00)\); probability of having an improvement if EM \(({\text {Pr}}\_{\text {Impr}}\_{\text {EM}}: \rho =-0.36, p=0.00)\); and probability of side effects if IM \(({\text {Pr}}\_{\text {SE}}\_{\text {IM}}\_{\text {NoPGx}}\_{\text {val}}: \rho =-0.20, p=0.00)\). All these variables mainly characterize intermediate states (switches, side effects, improvement, etc.) common to the three cost-effectiveness groups that cause patients to enter a new therapeutic path, whereas variables that are positively correlated with Can2 are: cost of test \(({\text {CostTest}}: \rho =0.56, p=0.00)\); hospitalization percentage \((\% {\text {Hosp:}} \rho =0.25, p=0.00)\); utility for no effect \(({\text {NoIm Ut:}} \rho =0.18, p=0.00)\); probability of side effects for PM patients who had a genetic test \(({\text {Pr}}\_{\text {SE}}\_{\text {PMPGx}}\_{\text {val:}} \rho =0.15, p=0.00)\); and probability of being EM \(({\text {Pr}}\_{\text {EM:}} \rho =0.29, p=0.00)\). These variables are associated with cases that mainly concern standard or high cost-effectiveness (groups b or c of TCE). Thus, Can2 can be seen as a latent measure of cost-effectiveness of the more outlying cases that can lead to effective (positive values of Can2) or ineffective (negative values of Can2) therapeutic processes.

The subspace obtained for the CYP2C19 model is very similar to that represented in Fig. 5; thus, to save space, it is not shown and the related results are not reported.

Next, we consider whether the results obtained with CCA can be used to predict the cost-effectiveness group. More precisely, we consider LDA performed on the scores obtained for the variables used in CCA to determine Can1 and Can2. In this case, the emphasis is on classification rather than on finding latent discriminant variables. Moreover, LDA allows specification of the prior probabilities of group membership to make classification error rates comparable to what would be obtained in a population of interest. LDA with the two canonical variables Can1 and Can2 is first applied on the whole dataset of 1000 instances to evaluate how accurate this model is. For the CYP2D6 model, LDA is able to correctly classify 93.3% of cases with an average precision of 85.0% and an average recall of 94.0%. Thus, as expected, the canonical variables are able to correctly classify almost all cases. This means that, for almost every scenario considered in the PSA, it is possible to predict the most appropriate cost-effectiveness level. To illustrate the discrimination ability of LDA, the decision boundaries obtained from the two discriminant functions LD1 and LD2 concerning the CYP2D6 model are represented in Fig. 6.

Fig. 6

Decision boundary obtained by linear discriminant factors for CYP2D6: the “poor” cost-effectiveness group is in the red area (circular points), the “standard” group in the green area (triangular points), and the “high” group in the blue area (squared points). LD1 and LD2 are the two linear discriminant functions. CYP cytochrome P450

Finally, although LDA has very good predictive capabilities, its practical use to classify a possible cost-effectiveness scenario is unrealistic, as the method shown above (LDA based on the canonical variables obtained from CCA) requires the knowledge, and thus the specification, of all the parameters used in the PSA. Taking this important limitation into account, we look for a predictive model that can work well with a reduced set of input variables. From this perspective, considering that the required model would be used only for prediction and not in an exploratory/descriptive manner, we consider support vector machines (SVM) as a possible classifier, as it is well known as one of the most robust prediction methods. A nonlinear SVM with a radial kernel function and tuning parameters (budget and gamma parameters) estimated with cross-validation is used on all the cases derived from PSA and for both the CYP2D6 and CYP2C19 models. However, the input variables are only some of the probabilities associated with the PM, IM, and EM states. More precisely, we consider as input variables the following: the probability of being PM, IM, or EM; the probabilities of side effects for the three above-mentioned states for patients who underwent the genetic test as well as for those who were not exposed to the test; and the probability of improvement for the same groups of patients. The probabilities associated with the UM state were intentionally left outside the set of input variables as they are hard to quantify in real practice. In this case, prediction capabilities of the SVM classifier are evaluated in terms of precision and recall for each group, as well as in terms of average precision and average recall of the classifier, as computed in Beleites et al. [36]. These metrics are estimated for both models using independent test set observations identified through a 5 × 5-fold cross-validation setting.
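The per-group and average metrics can be illustrated with a short Python sketch. It assumes that the averages are unweighted (macro) means over the three groups, which may differ from the exact computation in [36], and the confusion matrix is made up for illustration.

```python
def per_class_metrics(confusion, labels):
    """Precision and recall per class.

    confusion[true][pred] is the number of cases with that true/predicted pair.
    """
    metrics = {}
    for k in labels:
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[t][k] for t in labels)  # column total
        actual_k = sum(confusion[k][p] for p in labels)     # row total
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        metrics[k] = (precision, recall)
    return metrics


def macro_average(metrics):
    """Unweighted mean of the per-class precision and recall values."""
    n = len(metrics)
    mean_precision = sum(p for p, _ in metrics.values()) / n
    mean_recall = sum(r for _, r in metrics.values()) / n
    return mean_precision, mean_recall


# Made-up confusion matrix for the three TCE groups (rows: true, columns: predicted).
labels = ["a", "b", "c"]
confusion = {
    "a": {"a": 8, "b": 2, "c": 0},
    "b": {"a": 1, "b": 40, "c": 9},
    "c": {"a": 1, "b": 8, "c": 31},
}
metrics = per_class_metrics(confusion, labels)
avg_precision, avg_recall = macro_average(metrics)
```

In a repeated cross-validation setting, one such confusion matrix would be built from the held-out predictions, so that the metrics reflect observations not used for fitting.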

Results for both models are summarized in Table 6. With the exception of the prediction of class “a” of TCE (poor cost-effectiveness), the reported findings should be considered satisfactory as all the metrics are higher than 0.80. Thus, we can conclude that, given a set of hypothetical values for the probabilities considered as inputs of the SVM classifier, it is possible to predict with reasonable accuracy the cost-effectiveness of the genetic test and the associated therapeutic pattern.

Table 6 Performance metrics for the support vector machine classifiers

4 Discussion

The aim of this study was to present a cost–utility analysis for pharmacogenetic testing based on CYP2C19 and CYP2D6 enzyme genetics, to develop personalized treatment for patients with MDD. Furthermore, we aimed to understand, via post-hoc analysis, the driving factors leading to three different cost-effectiveness levels and the prediction of those levels based on a restricted set of input variables.

The two cytochromes were tested in two separate models. Indeed, while most antidepressants are metabolized by both cytochromes, usually one of the two has a major role, thus having a greater impact on the kinetics and, presumably, the efficacy and safety of the drug. This is also reflected in the recommendations for the implementation of genetic testing to inform the use of a specific antidepressant (as reported in drug labels and in the international guidelines), which in most cases refer to either CYP2D6 or CYP2C19. As such, genotyping both CYP genes might not be necessary in most cases, and could unjustifiably inflate the cost and reduce the cost-effectiveness of its implementation.

There are several limitations characterizing the modeling approach used in this cost–utility analysis. Essentially, all the limitations of Markov models in pharmacoeconomics recently underlined by Carta and Conversano [37] affect our study as well. We had to assume values for several parameters that turned out to be among the most influential, such as the probability of side effects and the probability of having an effect for the IM class of the PGx strategy. Moreover, the structure of the Markov model is built on the basis of the opinion and expertise of clinicians, not clinical trials. In addition, this study is not based on a real-world population but on a theoretical cohort of adults; thus, further studies based on patient-level data from a real-world setting are needed to confirm our findings as well as to evaluate potential differences based on different treatment regimens with antidepressants metabolized by CYP2D6, CYP2C19, or both enzymes. Furthermore, while we used the frequency of the metabolizing phenotypes estimated by CPIC for the European population, differences in the frequency of genetic variants among different populations might modify our results. Even considering these limitations, our study suggests that the genetically informed characterization of CYP metabolizing profiles could be cost-effective. Should this be confirmed by the application of the developed model to real-world data, it would support the implementation of such a test in the routine clinical setting. Of course, our study was not aimed at exploring the clinical utility of pharmacogenetic testing, which would require detailed, real-world data on the use, effectiveness, and safety of antidepressants in order to evaluate key features contributing to the clinical utility, such as the evidence of clinical validity, sensitivity, specificity, and reliability, among others [15].

Notably, a recent systematic review assessing the cost-effectiveness of PGx testing for the prescription of antidepressants and antipsychotics showed that, of the 18 studies evaluated, 16 supported the utility and cost-effectiveness of PGx testing, especially for tests based on CYP2D6 or CYP2C19 [38]. However, the number of studies using real-world data remains limited, and this is particularly relevant considering the large differences in the frequency of the metabolizing phenotypes among populations or subgroups, and the heterogeneity of health systems across countries, which significantly impact the cost–benefit ratio.

4.1 Conclusion

Findings from our study suggest that the screening of either CYP2C19 or CYP2D6 genes to guide antidepressant treatment is cost-effective for a WTP threshold of 75,000€ per QALY. In the CYP2C19 model, results of the base-case scenario showed that the PGx strategy might be considered cost-effective with an ICER of 60,000€ per QALY with respect to the standard strategy of no genetic testing. In the CYP2D6 model, results of the base-case scenario showed that genetic screening for the CYP2D6 enzyme is cost-effective with an ICER of around 47,000€ per QALY. The one-way DSA showed that, in both models, for extreme values of the parameters of the IM class and for the different costs of the test, the ICER could shift from cost-effective to unacceptable for any reasonable WTP threshold. These results show that the IM class is the major driver in this economic evaluation. Although the EM group is larger than the IM group, the EM class is not affected by genetic testing since EM patients have the same probability of showing any side effect in both strategies. In contrast, IM patients in the PGx strategy have a large drop in side-effect occurrence compared with untested IM patients. The same holds for the PM class, but as its frequency is low compared with that of IM, the effect is negligible in this evaluation. The results of the post-hoc analysis clearly show the driving factors that allow us to discriminate among “poor,” “standard,” or “high” cost-effectiveness scenarios and demonstrate the usefulness of the SVM classifier in predicting these scenarios starting from a reduced set of input probabilities.

While this result needs to be replicated in a real-world setting, our study provides evidence that might help overcome one of the main barriers to the implementation of pharmacogenetic testing in the clinic: the cost of the approach. Routine use of pharmacogenetics also implies that more strategies will need to be identified and put into place to render the cost-effectiveness ratio even more efficient, such as testing only those individuals failing the first antidepressant treatment in order to guide the medication of second choice on the basis of the genetic profile of the patient.
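
The decision rule underlying the conclusions above, namely comparing an ICER against a WTP threshold, can be sketched in a few lines of Python. The ICER values (60,000€ and around 47,000€ per QALY) and the WTP threshold (75,000€ per QALY) are those reported in the text; the incremental cost and QALY figures passed to `icer` are hypothetical and were chosen only so that the arithmetic is easy to follow. The function names are illustrative and not part of the original analysis.

```python
def icer(cost_new, cost_ref, qaly_new, qaly_ref):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    delta_cost = cost_new - cost_ref
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return delta_cost / delta_qaly

def is_cost_effective(icer_value, wtp_threshold):
    """Simple rule, valid when the new strategy is both costlier and more effective."""
    return icer_value <= wtp_threshold

WTP = 75_000  # euros per QALY, threshold used in the text

# ICERs reported in the text for the two base-case scenarios (euros per QALY).
reported = {"CYP2C19": 60_000, "CYP2D6": 47_000}
decisions = {gene: is_cost_effective(value, WTP) for gene, value in reported.items()}

# Hypothetical increments, for illustration only: 30,000 euros more for 0.5 extra QALYs.
example_icer = icer(cost_new=40_000, cost_ref=10_000, qaly_new=2.5, qaly_ref=2.0)

print(decisions)
print(example_icer)
```

The simple threshold comparison does not cover dominated or dominant strategies (negative increments), which need to be handled separately in a full analysis.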