Erratum to: Using Parameter Constraints to Choose State Structures in Cost-Effectiveness Modelling

The article Using Parameter Constraints to Choose State Structures in Cost-Effectiveness Modelling, written by Howard Thom, Chris Jackson, Nicky Welton, Linda Sharples, was originally published electronically on the publisher's internet portal (currently SpringerLink) on 24th March, 2017 with open access under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/). However, the license should have been the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The original article has been updated.

Merging two states in a transition model, such as similar types of event, is practically equivalent to constraining the outward transition probabilities, costs and utilities to be equal for the two states. Thus, the state structures can be compared by assessing whether these constraints are reasonable. This can be done using standard methods for comparing statistical models, and suitable data.
For example, comparing transition probabilities requires data consisting of the numbers of patients observed to transition out of the states of interest to each potential destination. To compare costs and utilities between states, individual-level samples are required. Maximum likelihood and Akaike's information criterion can then be used to assess the constraints. If such data are not available, they might be derived from published summaries, or the comparison can be made informally.

Introduction
Health economic evaluations rely on cost-effectiveness models, such as Markov multi-state models [1], to produce accurate comparative assessments of the costs and health effects of different interventions for the management of disease. Given a cost-effectiveness model, there may be uncertainty about the correct transition probabilities, costs or utilities. This is commonly termed parameter uncertainty and managed by probabilistic sensitivity analysis [2,3]. Research recommendations can also be guided by the expected value of perfect information (EVPI) and expected value of partial perfect information, comparing the benefits in terms of costs and monetised health effects gained from a decision based on evidence, where parameter uncertainty is removed or reduced, with that based on current evidence [4].
However, all models are idealised representations and the choice of structure for the model may be uncertain. Moreover, different choices can change decision recommendations, as found in models for breast cancer and in varicella vaccination [5,6]. In this article, we consider uncertainty about the choice of states in a state-transition health economic model, a subject which has, to our knowledge, not yet been formally addressed. An example in coronary artery disease (CAD) is the choice between a model with split and merged severities of CAD, illustrated in Fig. 1. The split-state model divides the 'CAD' state into 'high-risk CAD' and 'low-risk CAD', as severity may have an effect on costs, health utilities and the probability of death. These structural choices are currently made informally, on the basis of clinical opinion and the availability of data [7]. Guidelines recommend scenario analyses or parameterising structural uncertainties [2,8,9], but it is often unclear how they can be parameterised.
Formal statistical approaches for comparing model structures against the data used to build them include the Akaike information criterion (AIC) [10,11] and, for Bayesian models, the deviance information criterion [12,13]. These trade off the fit to the data, represented by the likelihood, against the complexity of the models, which is related to the effective number of parameters. Thus, they can find the optimal balance between the risk of bias (from excluding important events or predictors) and the reduced uncertainty in the estimates from a smaller model. These criteria can also be used to construct weighted averages over the possible structures [11,14]. However, these methods are only valid for models fitted to the same datasets, and it has been shown that multi-state models with different state structures use different datasets [15]. Another approach is to split the model into a series of sub-functions and add discrepancy parameters to the outputs of these functions to represent state structure uncertainty [16]. However, the discrepancies do not indicate which assumptions are more plausible and can be difficult to interpret for complex models. A further approach is to compare the ability of the models to predict the events represented by both models [15,17]; in the above CAD example, these would be CAD and death. However, calculating the appropriate measure of fit for the restricted information criteria described in this article is technically demanding. State structures might also be compared by informal validation against external data if available.
We propose a method that allows the choice between state structures to be parameterised, and for which standard likelihood-based model selection criteria are valid. This enables us to compare structures under the principle that similar states may be merged if the consequences of occupying them are the same. Here, the 'consequences' for a patient consist of the potential exit states, the probabilities of transition to these exit states, the costs and the utilities. We show that smaller models can be reformulated into practically equivalent models on the larger state space by constraining the outward transition probabilities, costs and utilities to be the same for the 'merged' states. The model choice is then a matter of assessing whether each of these constraints is reasonable, based on the fit to data. We also consider using 'partially merged' models with different state structures for transition probabilities, costs and utilities, depending on the most appropriate choice for each consequence; for example, we may assume the costs in high-risk CAD ($c_H$) to be the same as those in low-risk CAD ($c_L$) but that the transition probabilities to the dead state ($P_{HD}$ and $P_{LD}$) are different. We illustrate our approach in models comparing treatment strategies for the management of depression, and diagnostic tests for CAD.
Fig. 1 Coronary artery disease (CAD) models with split and merged CAD severity. $P_{XY}$ is the probability of making a transition from state X to state Y in a cycle

Methods for Comparing State Structures by Assessing Parameter Constraints
Suppose we have data consisting of the numbers of individuals who are observed to make a transition between each pair of states over a particular time interval, and corresponding denominators of the total number of patients at risk. The models are fitted to these data by maximum likelihood or Bayesian estimation, giving estimates of transition probabilities between states over one cycle of a discrete-time model [18]. If such data are not explicitly available, they might be derived from related data (such as published relative risks of death) under weak assumptions, and we discuss an example in Sect. 4. Costs and utilities for states are estimated from samples of individual-level costs and utilities, or from published unit costs combined with assumptions, expert beliefs or data on individual resource use.

Merging Two States with One Common Exit State
Consider again the split- and merged-state models for CAD presented in Fig. 1. It is intuitive that if we impose the constraint
$$P_{HD} = P_{LD} = P_D, \tag{1}$$
the fitted models should give the same predictions of expected survival. We prove this formally in Appendix 1 in ESM by showing that the likelihood of the split-state model, subject to the above constraint, is proportional to that of the merged-state model, with a proportionality factor that is independent of $P_D$. Hence the estimate of $P_D$, and therefore the expected survival over any time horizon, will be identical under both the constrained split-state and the merged-state models. This also applies to Bayesian estimation if the prior on $P_D$ is the same in the merged and constrained models, and to 'reversible' models where the transition back from high to low is permitted, because the probability of death $P_D$ would remain independent of the disease state. Furthermore, if we also constrain the costs and utilities of the states to be equal, as $c_H = c_L = c$ and $u_H = u_L = u$, the models will give the same predictions of lifetime costs and quality-adjusted survival, and hence the same decision recommendations. Thus, the uncertainty regarding state structure has been parameterised as a choice of whether these three constraints are reasonable. If all three are supported by the data, the merged model can be used because it is equivalent to the constrained split model. If all constraints are invalid, then the fully split model is most appropriate. A 'partially merged' model can also be recommended, for example, if the transition probabilities but not the costs are found to be equivalent.
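The proportionality argument can be checked numerically. The following Python sketch is an illustration only: the counts are hypothetical (not taken from the paper), and it considers only the transitions to death from each CAD state. It evaluates the log-likelihood of the split model with a common death probability and that of the merged model over a grid of values, and checks that they differ by a constant, so that both are maximised at the same estimate.

```python
import math

def binom_logpmf(k, n, p):
    """Full binomial log-probability, including the binomial coefficient."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log(1.0 - p)

# Hypothetical one-cycle data (NOT from the paper): deaths / patients at risk.
deaths_high, n_high = 30, 200   # high-risk CAD
deaths_low, n_low = 20, 300     # low-risk CAD

def loglik_split_constrained(p_d):
    """Split-state model with the same death probability in both CAD states."""
    return (binom_logpmf(deaths_high, n_high, p_d)
            + binom_logpmf(deaths_low, n_low, p_d))

def loglik_merged(p_d):
    """Merged-state model: one CAD state observed with the pooled counts."""
    return binom_logpmf(deaths_high + deaths_low, n_high + n_low, p_d)

# Evaluate both log-likelihoods on a grid of candidate death probabilities.
grid = [i / 1000 for i in range(1, 1000)]
offsets = [loglik_split_constrained(p) - loglik_merged(p) for p in grid]

# A constant offset on the log scale is a proportionality factor on the
# likelihood scale that does not depend on the death probability.
offset_range = max(offsets) - min(offsets)

# Both likelihoods are therefore maximised at the same value.
p_hat_split = max(grid, key=loglik_split_constrained)
p_hat_merged = max(grid, key=loglik_merged)
print(offset_range, p_hat_split, p_hat_merged)
```

In this simplified setting the common maximiser is the pooled proportion of deaths, which is why expected survival agrees between the two formulations.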

Merging Any Number of States with Any Number of Exit States
The principle and procedure outlined above apply to models in which the states to be merged have any number of 'exit states', for example, different causes of death, provided the exit states are common to the states to be merged. Figure 2 illustrates two models; one splits states A and B while the second merges these states. The exit states, $E_1, \ldots, E_m$, are the same for states A and B. To make the split-state model equivalent to the merged-state model, we use the constraints $P_{AE_i} = P_{BE_i}$ for $i = 1, \ldots, m$, where $P_{AE_i}$ and $P_{BE_i}$ are the probabilities of making a transition to state $E_i$ from A and B, respectively, and constrain the costs and utilities as before, $c_A = c_B = c$ and $u_A = u_B = u$. Thus, the model choice involves determining, for each $i$, whether the probability of death from cause $i$, the cost and the utility depend on the disease status being A or B. A further generalisation is illustrated in Fig. 3. In this case, we consider merging $n$ states $A_1, \ldots, A_n$ with transitions to $m$ states $E_1, \ldots, E_m$. The necessary constraints are
$$P_{A_1 E_i} = \cdots = P_{A_n E_i} \quad (i = 1, \ldots, m), \qquad c_{A_1} = \cdots = c_{A_n}, \qquad u_{A_1} = \cdots = u_{A_n}. \tag{2}$$
The model can be fully reversible and any transitions can be allowed between the merging states $A_j$. The states $A_j$ may represent severities of disease, and the $E_i$ different causes of death, but this result is entirely general to problems of whether to split or combine a set of states $A_1, \ldots, A_n$ for which the potential destination states after leaving the set are the same for each $A_j$, $j = 1, \ldots, n$. The 'split' and 'merged' models shown in Fig. 3 may both be part of a common larger state structure; for example, there may also be transitions into the $A_j$, or into and out of the $E_i$. However, only the constraints (2) on the outward transition probabilities, costs and utilities are required to effectively 'merge' the states. Thus, the choice of structures is parameterised as a choice of whether the outward transition probabilities, costs and utilities are common between $A_1, \ldots, A_n$.
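One way the constraint on outward transition probabilities might be assessed with AIC is sketched below in Python. It is an assumption-laden illustration rather than the authors' code: the function name `aic_exit_constraint` and all counts are hypothetical, and each state's one-cycle outcomes are collapsed to the exit states plus a single 'did not exit' category, treated as multinomial. The multinomial coefficients are omitted because they are identical under both models and cancel in the comparison.

```python
import math

def multinomial_loglik(counts, probs):
    """Multinomial log-likelihood without the constant coefficient (0*log 0 = 0)."""
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)

def aic_exit_constraint(rows):
    """Compare free versus common outward transition probabilities.

    rows[j] = [count to E_1, ..., count to E_m, count not exiting] for state A_j.
    Returns (aic_split, aic_common).
    """
    n_free = len(rows[0]) - 1  # free probabilities per multinomial

    # Unconstrained: each A_j has its own exit probabilities.
    ll_split = 0.0
    for row in rows:
        total = sum(row)
        ll_split += multinomial_loglik(row, [c / total for c in row])
    aic_split = -2.0 * ll_split + 2.0 * n_free * len(rows)

    # Constrained: common exit probabilities, estimated from pooled counts.
    pooled = [sum(col) for col in zip(*rows)]
    grand_total = sum(pooled)
    pooled_probs = [c / grand_total for c in pooled]
    ll_common = sum(multinomial_loglik(row, pooled_probs) for row in rows)
    aic_common = -2.0 * ll_common + 2.0 * n_free

    return aic_split, aic_common

# Hypothetical counts for two states with two exit states (NOT from the paper).
similar = [[10, 5, 185], [16, 7, 277]]     # near-identical exit proportions
different = [[10, 5, 185], [60, 30, 210]]  # clearly different exit proportions

aic_split_sim, aic_common_sim = aic_exit_constraint(similar)
aic_split_diff, aic_common_diff = aic_exit_constraint(different)
print(aic_split_sim, aic_common_sim)    # common probabilities have the lower AIC
print(aic_split_diff, aic_common_diff)  # separate probabilities have the lower AIC
```

The same function accepts any number of rows and exit columns, which mirrors the generalisation to $n$ states and $m$ exit states; constraints on costs and utilities would need to be assessed separately.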

Merging States with Different Exit Transitions
An adaptation is required when the states being merged have different exit states, as illustrated by models (a) and (b) in Fig. 4. This is a special case of the structure in Fig. 1, where we know that the probabilities of death are different ($P_{13} = 0$ and $P_{23} \neq 0$) between the states being considered for merging. In discrete time, there is no choice of parameters for which model (a) is equivalent to (b), as a patient in state 2 may exit directly to state 3, but even with $P_{12} = 1$, a patient starting in state 1 would take at least two cycles to reach state 3.
However, we can extend model (a) by including a nonzero transition between states 1 and 3 ($P_{13} \neq 0$), giving model (c) in Fig. 4. This model can be constrained to model (a) by setting $P_{13} = 0$ or to model (b) by setting $P_{13} = P_{23}$. A comparison between models (a) and (b) is then possible by assessing these constraints on model (c).
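A minimal Python sketch of this comparison follows, under stated assumptions: the counts are hypothetical, the helper `compare_exit_models` is invented for illustration, and only the transitions into state 3 from states 1 and 2 are modelled (as binomial outcomes per cycle). It fits model (c), which allows a direct transition from state 1 to state 3, and its two constrained versions, and returns an AIC for each.

```python
import math

def binom_loglik(k, n, p):
    """Binomial log-likelihood without the coefficient; -inf if data are impossible."""
    if (k > 0 and p <= 0.0) or (n - k > 0 and p >= 1.0):
        return float("-inf")
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def compare_exit_models(k13, n1, k23, n2):
    """AIC for model (c) and its constrained versions (a) and (b).

    k13 of n1 patients in state 1, and k23 of n2 patients in state 2,
    are observed to move to state 3 in one cycle.
    """
    p13, p23 = k13 / n1, k23 / n2
    pooled = (k13 + k23) / (n1 + n2)
    return {
        # (c): P13 and P23 both free (two parameters)
        "c": -2.0 * (binom_loglik(k13, n1, p13) + binom_loglik(k23, n2, p23)) + 4.0,
        # (a): P13 fixed at zero, P23 free (one parameter)
        "a": -2.0 * (binom_loglik(k13, n1, 0.0) + binom_loglik(k23, n2, p23)) + 2.0,
        # (b): P13 = P23, one common probability (one parameter)
        "b": -2.0 * (binom_loglik(k13, n1, pooled) + binom_loglik(k23, n2, pooled)) + 2.0,
    }

# Hypothetical counts (NOT from the paper).
no_direct = compare_exit_models(0, 100, 15, 100)    # no direct 1 -> 3 moves seen
some_direct = compare_exit_models(5, 100, 15, 100)  # a few direct 1 -> 3 moves seen
print(no_direct)
print(some_direct)
```

Under this likelihood, any observed direct transition from state 1 to state 3 gives model (a) zero likelihood and hence an infinite AIC, so in that case the comparison reduces to model (b) against model (c).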

Application to a Markov Model with Individual Patient Data: PANDA
In this section, we present an application to a health economic model for patients with symptoms of depression for whom their general practitioner is considering prescribing anti-depressant medication. The model was used to compare the cost effectiveness of severity thresholds above which to treat patients with depression with anti-depressant medication, and to estimate the value of a proposed randomised controlled trial to compare severity thresholds. The severity of symptoms was measured on the Hamilton Depression Rating (HAMD) scale, and three alternative treatment thresholds (HAMD > 2, HAMD > 15 or HAMD > 25) are compared with a policy of no treatment.

Model for Cost Effectiveness of Anti-Depressant Treatment by Depression Severity
The model consists of a short- and a long-term component.
The short-term model uses linear regression based on published studies [19–21] to predict a patient's HAMD score over the first 12 weeks after treatment initiation. Psychiatric Association [22]. This four-state model is illustrated in Fig. 5. The joint likelihood of the observed data is the product of the probabilities of making the transitions we observed, along with terms for the likelihoods of observed costs and health valuations of observed state occupancies. The transition probabilities are estimated by maximum likelihood from the numbers of individuals observed to move between each pair of states in merged data from the control arms of the IPCRESS, THREshold for AntiDepressant response (THREAD) and TREAD studies [23–25]. Log-normal distributions were used for state costs. These depended on dosing and monitoring regimes inferred from expert clinical opinion and publicly available drug and services costs [26,27]. As clinical evidence and opinion indicated that anti-depressant medications have no effect on transition probabilities beyond the initial 12-week period [28], we used the same probabilities between the categories of depression severity in the treated and untreated components. However, the distributions of HAMD at 12 weeks will differ between treated and untreated patients, as will their costs. Owing to a lack of reliable evidence, state utilities were not modelled directly. We instead mapped incremental gains in HAMD, defined as the difference between the mid-points of the category range, to incremental health utilities using published evidence [29–31].

Alternative Model Structures and Results
The transition probabilities between the four states are informed only by the individual transition history data, and there is no prior clinical belief regarding, for example, how the transition probability to well differs between mild, moderate and severe. Therefore, it is possible that, for these data, a more parsimonious structure that merges two or more of these states could give more precise estimates of cost effectiveness. Thus, we consider a 'two-state' model merging all depression states, a 'Mod-Severe' model merging the moderate and severe states, and a 'Mild-Mod' model merging the mild and moderate states. Merged IPCRESS, THREAD and TREAD data were re-analysed to estimate these transition probabilities for each structure. Costs for merged health states are estimated as weighted averages of their constituent costs, with weights defined by the baseline prevalence of the four depression states. The same prevalence was assumed for each cycle as the available prevalence estimate was representative of an average distribution over time. Utilities were mapped from incremental gains in HAMD. These models with 'fully merged' states ignore any prior clinical belief that costs or utilities are different between the states (Table 3). Finally, we consider 'partially merged' models, where outward transition probabilities across states are assumed to be equal but costs and HAMD, and therefore utilities, associated with the states are assumed to be different.
[Fig. 5: state structures compared in the depression model. Four-state: Well (1), Mild (2), Moderate (3), Severe (4). 'Mod-Severe': Well (1), Mild (2), Moderate or Severe (3|4). Two-state: Well (1), Mild, Moderate or Severe (2|3|4). 'Mild-Mod': Well (1), Mild or Moderate (2|3), Severe (4).]
The HAMD > 2 threshold was the most cost effective at a willingness-to-pay threshold of £20,000 for all but the two-state model, where 'no treatment at any HAMD threshold' was most cost effective (Table 1). The lower cycle costs for mild depression (£110 treated, £49 untreated) than for depression of any severity in the two-state model (£186 treated, £149 untreated) explain the substantial difference (Table 2). The EVPI results indicate a short-term trial with a 12-week follow-up is cost effective under all models, though the absolute EVPI estimates vary from approximately £70 to £95 million between the models. A long-term trial, with 2 years of follow-up to better inform the Markov model components, is not likely to be cost effective under any of the models except the two-state model. However, when costs and utilities differ but the outward transition probabilities are merged, the decision and research recommendations are the same across all models (Table 1).

Comparison of State Structures Using Constraints
We compare models by constraining parameters in the full (four-state) model to produce models that are equivalent to those with two or three states. We label the four health states as 1 (well), 2 (mild), 3 (moderate) and 4 (severe). The multi-state models being compared are illustrated in Fig. 5.
The four-state model is equivalent to the two-state model if the 'recovery rates' are constrained to be independent of depression severity, thus $P_{21} = P_{31} = P_{41}$, and if the costs and HAMD/utilities of the mild, moderate and severe states are assumed to be equal to those of the single depressed state in the two-state model. To constrain the four-state model to be equivalent to the three-state 'Mod-Severe' model, we constrain the recovery rates to 'well' and the rates to 'mild' to be the same, $P_{31} = P_{41}$ and $P_{32} = P_{42}$, respectively, and constrain the costs and HAMD/utilities of the states to be equal. Likewise, the four-state model is constrained to the three-state 'Mild-Mod' model by constraining the recovery rates to 'well' and the progression rates to 'severe', $P_{21} = P_{31}$ and $P_{24} = P_{34}$, along with the costs and HAMD/utilities for the mild and moderate states. Other transition probabilities, such as the probabilities of relapse ($P_{12}$, $P_{13}$, $P_{14}$), are unaffected by the constraints.
Each constraint is assessed by comparing the likelihood and AIC contributions, which describe how well the resulting model fits when estimated using the corresponding observed transitions between states. Full details of this method are given in Appendix 3 in ESM. The log-likelihood and AIC for each potential constraint are given in Table 2. Example code to conduct the comparisons in the R statistical software [32] is presented in Appendix 6 in ESM. Under the unconstrained four-state model, the estimated recovery rates to well are substantially different for a patient with severe depression, thus $P_{41} \neq P_{31}$ and $P_{41} \neq P_{21}$. This is shown formally by the lower AIC for $P_{21} \neq P_{31} \neq P_{41}$ compared with the constraints where $P_{31} = P_{41}$ or $P_{21} = P_{41}$. However, the recovery rates are similar between mild and moderate, thus the AIC is not changed substantially when moving between $P_{21} = P_{31}$ and $P_{21} \neq P_{31}$. The differences between $P_{32}$ and $P_{42}$ and between $P_{24}$ and $P_{34}$ under the four-state model are less striking. This is confirmed by the small difference in AIC between $P_{32} = P_{42}$ and $P_{32} \neq P_{42}$, and between $P_{24} = P_{34}$ and $P_{24} \neq P_{34}$. Thus, on the basis of transition probabilities, there is a negligible difference between the three-state Mild-Mod and four-state models, and these are both preferred over the two-state and three-state Mod-Severe models, as expected.
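The comparison of constraints on the recovery probabilities can be sketched as follows. This Python example is illustrative only and is not the R code of Appendix 6: the recovery counts are hypothetical, chosen to mimic the qualitative pattern described above (similar recovery from mild and moderate, lower recovery from severe), and the names `recoveries`, `patterns` and `aic_for` are invented here. Each constraint pattern groups the depressed states that share a recovery probability, and the AIC contribution is computed from binomial likelihoods with the constant coefficients omitted.

```python
import math

def binom_loglik(k, n, p):
    """Binomial log-likelihood without the constant coefficient (0*log 0 = 0)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

# Hypothetical (recovered to well, at risk) per cycle; NOT the trial data.
recoveries = {2: (60, 150), 3: (48, 125), 4: (12, 80)}  # mild, moderate, severe

# Each pattern lists the groups of states sharing one recovery probability.
patterns = {
    "P21 = P31 = P41": [(2, 3, 4)],
    "P21 = P31": [(2, 3), (4,)],
    "P31 = P41": [(2,), (3, 4)],
    "P21 = P41": [(2, 4), (3,)],
    "all different": [(2,), (3,), (4,)],
}

def aic_for(groups):
    """AIC contribution of the recovery transitions under one constraint pattern."""
    loglik = 0.0
    for group in groups:
        events = sum(recoveries[s][0] for s in group)
        at_risk = sum(recoveries[s][1] for s in group)
        p_hat = events / at_risk  # pooled ML estimate within the group
        loglik += sum(binom_loglik(*recoveries[s], p_hat) for s in group)
    return -2.0 * loglik + 2.0 * len(groups)  # one parameter per group

aic_table = {name: aic_for(groups) for name, groups in patterns.items()}
best = min(aic_table, key=aic_table.get)
for name, value in sorted(aic_table.items(), key=lambda item: item[1]):
    print(f"{name:18s} AIC contribution = {value:.1f}")
```

With these hypothetical counts, the pattern that equates mild and moderate but keeps severe separate has the lowest AIC contribution. The same loop could be repeated for the transitions to 'mild' and to 'severe' to assess the remaining constraints.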
Second, we compare the costs informally because these were based on expert belief. The treated costs are the same, though the untreated costs are slightly different, between mild and moderate. Thus, there is some evidence that a model with unconstrained costs is more appropriate. The costs for severe depression are substantially different from those for mild and moderate depression, arguing against the two-state and 'Mod-Severe' models. Prior judgement deemed that utilities are primarily determined by severity of depression, which broadly favours models that have finer classifications of HAMD.
Based on the chosen model, 'treat if HAMD > 2' is the optimal strategy. Because the model extrapolates beyond HAMD scores included in trials, we conclude that antidepressant medications are cost effective over the range of HAMD scores included in the trials. We also conclude that there is likely to be value in a short-term trial that recruits patients with milder disease (lower HAMD scores); however, a long-term follow-up is not likely to be cost effective.

Application to a Model Informed by Published Parameters: CECaT
In this application, there are no individual-level data. Instead, the transition probabilities out of the states being considered for merging are obtained from published estimates. To formally compare the state structures, we have to derive the implicit transition counts underlying the published data. The Cost-Effectiveness of non-invasive Cardiac Testing (CECaT) study [33] was a randomised trial of diagnostic strategies for CAD, comparing angiography alone with three non-invasive functional tests (followed by confirmatory angiography if positive). Following the trial, a Markov multi-state health economic model was developed, based on previous models by Mowatt et al. [34,35] and Kuntz et al. [36]. The full structure and assumptions are detailed by Thom [17]. Briefly, a patient with suspected CAD receives one of five alternative diagnostic test strategies and is assigned a diagnosed severity, as a result of which they may receive either medical management or revascularisation. The diagnosed severity may be incorrect because the tests are not perfect and vary in their sensitivity and specificity. The model then proceeds with an annual cycle for 30 years, and at each cycle, a patient may have a myocardial infarction and/or die from any cause.
In these models, CAD severity is categorised into discrete states, representing the increasing risk of myocardial infarction and death, and the increasing need for revascularisation. Mowatt et al. [34,35] used three risk states: low (no CAD), medium (CAD in one or two vessels excluding the left main stem) or high (CAD in three or more vessels and poor left-ventricular function, or disease in the left main stem). We compare the three-state categorisation with a model where medium- and high-risk states are merged, giving two states representing no CAD or CAD. While the three-state representation is typically used in the literature, it relies on having sufficient information about the differences between medium and high risk to justify separating them. Under the two models, the optimal diagnostic strategy at conventional cost-effectiveness thresholds, and extent of decision uncertainty, are different [17]. Table 3 shows published data used in the full model. The risk of death relative to no CAD differs significantly between the two risk groups, but the probability of non-fatal myocardial infarction, the costs and the utilities are similar between the medium- and high-risk groups. The 95% confidence intervals for the state-specific relative risks of death do not overlap, suggesting that they are different enough to merit separation in the model. For a more formal comparison, we derive the implicit data from which these relative risks were obtained: the numbers of people dying in 1 year, and associated denominators, for medium and high risk. Appendix 4 in ESM details how this is done. The problem can then be framed as a comparison of two statistical models for a pair of binomially distributed observations (126 deaths out of 571 in medium risk, and 259 out of 754 in high risk): one model with different probabilities of death, and one where the death probability is constrained to be the same, between medium and high risk. These models have AICs of 17.4 and 39.6, respectively, strongly favouring separate risk states (Table 3).
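The binomial comparison described above can be reproduced directly from the counts quoted in the text. The following Python sketch is not the R code of Appendix 7; it assumes the AICs are computed from the full binomial likelihood (including the binomial coefficients) with maximum likelihood estimates of the death probabilities, which is an assumption about the authors' convention. Under that assumption it should give values close to the 17.4 and 39.6 quoted.

```python
import math

def binom_logpmf(k, n, p):
    """Full binomial log-probability, including the binomial coefficient."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log(1.0 - p)

# Derived one-year mortality data quoted in the text: (deaths, at risk).
deaths = {"medium": (126, 571), "high": (259, 754)}

# Model 1: separate death probabilities for medium and high risk (2 parameters).
loglik_separate = sum(binom_logpmf(k, n, k / n) for k, n in deaths.values())
aic_separate = -2.0 * loglik_separate + 2.0 * 2

# Model 2: a common death probability, estimated from pooled counts (1 parameter).
total_deaths = sum(k for k, _ in deaths.values())
total_at_risk = sum(n for _, n in deaths.values())
p_common = total_deaths / total_at_risk
loglik_common = sum(binom_logpmf(k, n, p_common) for k, n in deaths.values())
aic_common = -2.0 * loglik_common + 2.0 * 1

# Positive difference favours different parameters (the Table 3 convention).
aic_difference = aic_common - aic_separate
print(f"AIC separate = {aic_separate:.1f}, AIC common = {aic_common:.1f}")
print(f"AIC difference = {aic_difference:.1f}")
```

The binomial coefficients do not affect the AIC difference, only the absolute AIC values, so the conclusion is the same whichever convention is used.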
A similar analysis is performed for the risk of non-fatal myocardial infarction, which has overlapping confidence intervals between medium and high risk, though this does not necessarily imply a non-significant difference. An AIC difference of -0.6, however, mildly favours a common risk between the 'medium' and 'high' states.
The costs and utilities used for the medium- and high-risk states in the economic model (excluding the costs of revascularisation) were estimated from the subset of patients in the CECaT trial whose CAD severity was known. With only 19 of these patients in high risk and 59 in medium risk, it is not clear from the data in Table 3 whether we can assume that expected cost and utility are different between medium and high risk. To assess this formally, generalised linear regression models were fitted to the individual-level cost and utility outcomes by maximum likelihood in R [32], using a gamma distribution for the costs, and a truncated normal distribution for the utilities. The AIC marginally favours a model with different mean costs (AIC difference 1.4) and a model with common mean utility (AIC difference -1.4) between medium and high risk.
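The structure of such a cost comparison can be sketched in Python. Note the substitution: the analysis described above used a gamma generalised linear model in R, whereas this sketch uses a log-normal model with a shared log-scale standard deviation, chosen only because its maximum likelihood estimates have a closed form. The cost values and the function `fit_aic` are hypothetical and serve only to show how the AIC difference is formed.

```python
import math

def lognormal_loglik(costs, mu, sigma):
    """Log-likelihood of costs under a log-normal(mu, sigma) distribution."""
    return sum(
        -math.log(x)
        - 0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in costs
    )

def fit_aic(groups, common_mean):
    """AIC of a log-normal model with a shared log-scale sd across groups.

    common_mean=True constrains all groups to one log-scale mean, which,
    given the shared sd, is the same as constraining the mean costs to be equal.
    """
    logs = [[math.log(x) for x in g] for g in groups]
    n_total = sum(len(g) for g in logs)
    if common_mean:
        grand_mean = sum(sum(g) for g in logs) / n_total
        mus = [grand_mean] * len(logs)
    else:
        mus = [sum(g) / len(g) for g in logs]
    # ML estimate of the shared sd (divides by n, not n - 1).
    sse = sum((v - mu) ** 2 for g, mu in zip(logs, mus) for v in g)
    sigma = math.sqrt(sse / n_total)
    loglik = sum(lognormal_loglik(g, mu, sigma) for g, mu in zip(groups, mus))
    n_params = (1 if common_mean else len(groups)) + 1  # means plus shared sd
    return -2.0 * loglik + 2.0 * n_params

# Hypothetical individual-level annual costs (NOT the CECaT data).
costs_medium = [900.0, 1100.0, 1000.0, 1200.0, 950.0, 1050.0]
costs_high = [2000.0, 2400.0, 2200.0, 2600.0, 2100.0, 2300.0]

aic_different = fit_aic([costs_medium, costs_high], common_mean=False)
aic_common = fit_aic([costs_medium, costs_high], common_mean=True)

# Positive difference favours different mean costs (the Table 3 convention).
aic_difference = aic_common - aic_different
print(f"AIC difference = {aic_difference:.1f}")
```

When the two groups have identical data, the difference is exactly -2, the penalty saved by dropping one mean parameter; values near zero, such as the 1.4 and -1.4 reported above, indicate that the data only weakly discriminate between the structures.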
Thus, in this model, separating medium- and high-risk states is strongly justified based on their different mortality rates. Within this structure, however, there is some evidence that constraining the myocardial infarction rates and utilities to be common between the states will lead to a better trade-off between model fit and model complexity, or bias and precision. Appendix 7 in ESM provides example R code for all the likelihood and AIC calculations in this example.

Discussion
Currently, state structure choices are made informally, based upon clinical opinion or availability of data, or compared through simple scenario analyses [2,7–9]. In this article, we have developed a formal statistical basis to compare state structures in cost-effectiveness models. Specifically, two or more similar states in a transition model can be merged if they have the same consequences for a patient who enters them. The models are then compared by assessing a constraint on these consequences using standard statistical methods, if the parameters are estimated from data. We also showed this method to be valuable even if state structure uncertainties do not affect the current treatment decision, as the value of further research, quantified by the EVPI, expected value of partial perfect information or the expected value of sample information, may be sensitive to structural choices [14]. Statistical methods to assess the equality of model parameters require that the data used to estimate those parameters are available, to form the likelihood. For transition probabilities, the numbers of individuals who are observed to move between each pair of states in a time period, and the corresponding denominators, are required. The Prescribing ANtiDepressants that will leAd to a clinical benefit (PANDA) study used randomised controlled trials but our methods apply to any source, including registries or cohort studies, which provide the data necessary to estimate transition probabilities. We recommend using the data to choose the appropriate state structure before building the full model. Individual patient data were not available for the CECaT model. We recreated the numerators and denominators by assuming that the split between risk groups was the same across randomised arms of the trial, which should approximately hold if randomisation was adequate. To aid such calculations, we recommend that data of this form are published routinely.
Table 3 Published and derived data on parameters of the CECaT model of coronary artery disease, and AIC difference assessing the constraint that the corresponding parameters are equal between medium- and high-risk states (positive AIC difference favours different parameters)
Constraints for state selection can also be applied to continuous-time multi-state models, which have been advocated for use in health economic modelling [37,38], as we show in Appendix 5 in ESM. They can also be applied to the selection of structures for patient-level simulation and heterogeneity models, by including covariates on the transition probabilities and comparing their effects between states. The principle should also extend to non-Markov multi-state models but this needs to be investigated. In a Bayesian model comparison, expert belief can be used by placing prior probabilities on parameters or model structures and combining them with data via Bayes' theorem.
However, our method deals only with comparing multi-state structures. Further research into formal statistical methods for other forms of structural uncertainty is required. The choice between continuous and discrete outcome models is difficult. A multi-state model for changes in disease severity is essentially a continuous outcome model, where ranges of the outcome are constrained to have equivalent costs, utilities and future disease progression. However, there is no routinely applicable method to constrain a multi-level regression model, for example, to be equivalent to a multi-state model. Consideration is also required for more complex models, such as dynamic transmission models in infectious diseases [6,39].

Conclusion
We have developed a formal method to parameterise state structure uncertainty using constraints on the parameters of the most complex model and have illustrated its wide applicability through examples in depression and CAD. Further research is required for structural uncertainty in non-multi-state cost-effectiveness models.
Conflict of interest Howard Thom, Chris Jackson, Nicky Welton and Linda Sharples declare they have no financial or non-financial conflicts of interest that are directly relevant to the content of this article.
Data availability statement The datasets used for model comparison during the current study are included in this published article (within Table 3 for the heart disease example and in the code provided in Appendix 6 in ESM for the depression example). Further code and data, to generate the cost-effectiveness results, are available from the corresponding author on reasonable request.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.