1 Introduction

There is currently no guidance for determining the optimal schedule for collecting additional information regarding a decision to invest in a health program or technology [1, 2]. Current practice in the health decision science literature assumes that model parameters are fixed across cohorts and the value of additional information is calculated assuming the information-collection effort is initiated immediately [3,4,5]. However, in many cases the cost-effectiveness of a health program or technology – and, therefore, the value of additional information about one or more model parameters – may be changing over time because of trends affecting the cohort or the intervention [6]. In these cases, collecting additional information immediately may not be optimal and value-of-information calculations based on static parameter assumptions are likely to be biased. Planning over longer horizons is particularly important in health policy because, once established, clinical practice is difficult to change due to high switching costs (re-training and potentially new capital equipment expenditures), particularly if it appears that the level of service is being reduced [7].

In this paper we apply a stochastic dynamic programming approach to identify both the optimal time to change the current health intervention policy and the optimal time to collect decision-relevant information. We consider a hypothetical medical intervention for a cohort of patients. At each time, a new cohort of patients becomes eligible for the intervention and one parameter varies across the cohorts with imperfectly observable linear dynamics. We assume that the value of the intervention is linear in the dynamic parameter. In general, the (incremental) net monetary benefit of an intervention is linear in parameters with a one-time effect (e.g., the prevalence of a disease at one point in time or the outcome of a one-time screening test). When an effect accrues over time, such as for a reduction in the annual transition rate of a disease complication or death, linearity is often used as an approximation (see, e.g., [8]). At each time, the policy-maker can choose to invest in the medical intervention and/or to purchase sample information about the uncertain dynamic parameter. We demonstrate that information acquisition is best delayed until the signal is sufficiently likely to affect the optimal policy decision.

We apply this framework to the evaluation of hepatitis C virus (HCV) screening. Prior to the development of highly-effective treatments, HCV screening in the general population was not considered cost-effective [9] and universal screening was not recommended [10]. The advent of more effective therapy has changed the value of identifying infected individuals early to initiate treatment [11,12,13,14,15]. Recently released guidance by the Centers for Disease Control and Prevention (CDC) and the US Preventive Services Task Force (USPSTF) recommends one-time HCV screening for all individuals born between 1945 and 1965 [16, 17] although screening individuals born after 1965 may also be cost effective [13,14,15]. Based on our primary analysis of the National Health and Nutrition Examination Survey (NHANES), in the US general population, HCV prevalence is highest in people born around 1956 and declines thereafter at a rate of approximately 11% per birth year. Since HCV prevalence is decreasing across birth cohorts, HCV screening will only be cost-effective for a limited time or for a limited set of birth cohorts. We apply our model to simultaneously evaluate the optimal HCV-screening and information-acquisition policy.

Specifically, we apply our model to the policy decision of whether or not to perform one-time HCV screening in successive cohorts of healthy 50-year olds, who have not previously been tested for HCV, at a routine preventive health visit. Applying a traditional health economics framework, the policy-maker could decide today how many cohorts will be screened (e.g., each cohort of 50-year olds until those born in 1965 turn 50) or, to inform this decision, the policy-maker may seek additional information to be collected immediately. Our framework differs from the traditional paradigm in that each year the policy-maker makes a decision about whether to continue the one-time HCV screening program (whether or not to screen the new cohort of healthy 50-year olds) and whether to collect information about disease prevalence in this current cohort. If information is never collected, the optimal policy does not differ across frameworks. However, in our framework, the immediate decision is not limited to the decision of when to change policies, but it also includes when to collect information to inform a future change of policy. For example, the (immediately) optimal policy might be to screen each cohort of 50-year olds for the next 6 years and then collect information about HCV prevalence to inform future decision making. Delaying information acquisition until a time that the information is sufficiently likely to affect the decision increases the value of the information. In addition, from a practical perspective, collecting information years before it is likely to influence a policy change wastes immediate resources and, should something occur in the lag-time between the information-acquisition effort and the policy change, implementing the pre-determined policy change may not be optimal.

1.1 Related literature and contribution

The relevant literature spans technology adoption, dynamic decisions in healthcare, and the value of information in healthcare.

Technology adoption

In technology-adoption models, a decision-maker considers the adoption of a technology of unknown profitability. Jensen [18] introduced a model in which information about a new technology is costlessly observed and the decision-maker can decide to adopt the new technology at any point in time. McCardle [19] presented a model in which collecting information is associated with a fixed cost; in each period the decision-maker can defer and collect information, or make a final decision to accept or reject the new technology. The optimal policy in each period is characterized by two thresholds: if the expected benefit is above the upper threshold, it is optimal to adopt the technology; if the expected benefit is below the lower threshold, it is optimal to reject the technology; and, if the expected benefit is between these two thresholds the optimal strategy is to gather information. Uncertainty about the technology’s value decreases over time and the two thresholds converge to the cost of adoption. Smith and McCardle [20] provided several meta-results, some of which we use, describing how properties of the value function of a stochastic dynamic program are preserved and propagated through finite-horizon Markov-reward and decision processes. Ulu and Smith [21] extended this work by relaxing the assumption that the decision-maker’s value of the technology can be summarized by the expected benefit, and they use more general monotone-comparative-statics techniques in terms of likelihood orders to generalize the class of signals that are observed prior to making an adoption decision.

Another line of research considered technologies, like ours, with uncertain and changing value. Rosenberg [22] found that expectation of technological improvement may delay a firm’s irreversible technology investments. Bessen [23] calculated the option value of delay for such a problem. Kornish [24] considered the choice between two uncertain technologies where each is subject to a positive network effect and explored the impact of the network effect on the optimal adoption policy. Chambers and Kouvelis [25] formulated a technology-adoption problem incorporating expected learning-curve effects.

Stochastic dynamic programs in healthcare

Sequential decisions under uncertainty are common in healthcare [26, 27]. Most healthcare applications of stochastic dynamic programs have focused on optimizing the timing of interventions for an individual patient: the decision to accept or reject an offered kidney for transplantation [28]; the optimal treatment plan for mild spherocytosis [29]; the optimal surveillance and management of ischemic heart disease [30]; the optimal time to perform a living-donor liver transplant [31, 32]; the optimal time to initiate treatment for HIV [33, 34]; the optimal timing and frequency of HCV testing from the patient perspective [35]; the optimal use of statins in patients with type 2 diabetes [36, 37]; the optimal prostate biopsy referral [38]; and optimal cancer screening programs [39, 40]. Dynamic programming has also been applied to complex appointment scheduling problems in healthcare, including problems with patients of different clinical types/priority [41, 42]; incorporating patient no-shows [43]; and problems of sequential appointment scheduling with the objective of closely adhering to a prescribed schedule (e.g., sequential chemotherapy appointments [44]) or with the objective of satisfying patient preferences [45, 46]. Fewer examples of application to population-level policy exist. Kornish and Keeney [47] and Özaltın et al. [48] formulated the influenza-strain selection problem in a finite-horizon optimal-stopping framework. Similar to our problem, the influenza-vaccine composition decision is also an optimal-stopping problem with information acquisition; however, it has many unique characteristics that distinguish it from the problem discussed here, such as an inventory deadline (finite horizon), a product useful for one season only, and a time-consuming production process. Similar to many of the technology-adoption models discussed above but unlike our framework, in the influenza-vaccine composition models, information is collected in every period in which a final decision has not yet been made.

Health economics and value of information in healthcare

Cost-effectiveness analysis is an economic method for comparing the lifetime discounted costs and health benefits associated with two or more medical interventions or health programs [1, 2]. In theory, the optimal allocation of resources across a portfolio of health interventions is determined by solving a constrained optimization problem with the objective of maximizing health benefits subject to a budget constraint [49,50,51]. In reality, regional and national health policy bodies routinely compare the incremental cost effectiveness ratio of candidate interventions to a pre-determined threshold intended to approximate the shadow price of the budget to determine if the intervention is ‘cost-effective’ as one component of their policy-making process [52]. Cost-effectiveness analysis is widely used to evaluate general population screening for relatively rare conditions because these programs impose a small cost on everyone who is screened and provide substantive healthcare gains for only a small number of individuals who are identified (or identified earlier than they would be otherwise); calculating the population-level costs and benefits can require detailed natural history models, extensive model calibration and validation, and thorough analysis.

Bayesian decision theory approaches to value-of-information assessment were first introduced by Raiffa and Schlaifer [53]. Weinstein [54] proposed the widespread adoption of value-of-information analysis for research priority setting in health policy and medicine. Hornberger et al. [55], Claxton and Posnett [56], and Claxton [57] introduced a Bayesian approach to identifying the optimal trial sample size and to assessing the value of additional information for technology-adoption assessments. Several approaches to increasing the accuracy of value-of-information calculations continued to relax assumptions implicit in the original formulation (see examples in [58,59,60,61,62]). One common assumption in these studies is that the currently estimated per-person value of information can be applied to individuals in all future cohorts. Recognizing some of the implications of this assumption, Philips et al. [6] discussed the impact that intervention-horizon uncertainty, price changes, and technological development can have on the per-person value of information for future cohorts. They found that delaying information collection may be desirable but did not provide a framework for determining the optimal time to collect information.

Contribution

In this paper, we extend the technology-adoption literature by allowing for a technology that is changing in value over time, for the opportunity to ‘wait’ without collecting information, and for the possibility of optimally determining the collected amount of information in each period. We also incorporate the possibility of an imperfect information-collection technology. We broaden the scope of applications of stochastic dynamic programs in the area of healthcare in an important way – focusing on population policy rather than patient-level decisions. We extend the health decision science literature on value-of-information assessment by developing an approach to identify the optimal information-acquisition policy when model parameters are varying across cohorts. Finally, as an example, we apply our framework to the timely public policy problem of developing a population screening program for HCV. We find that considering the opportunity to collect information in the future leads to a substantially different policy recommendation than current guidelines because it explicitly considers and addresses the parameter uncertainty which is changing over time.

2 The model

A policy-maker faces recurring decisions for cohorts arriving at times t ∈{0,1,2,…} about whether to invest in a health intervention delivered once per cohort (of size N). By cohort we mean a group of individuals with a certain medical presentation (i.e., individuals with a new diagnosis of cancer) or of a certain status (i.e., individuals who turned 50 this year). The policy-maker’s objective is to maximize net monetary benefit from a societal perspective. The per-person incremental net monetary benefit (INMB) of performing the intervention compared to the status quo is assumed to be affine in an uncertain parameter \(\tilde {p}_{t}\) that varies across the cohorts, with realizations in [0,1] and known dynamics. So \(\text {INMB}_{t} = \theta \tilde {p}_{t} - \gamma \), for all t ≥ 0, where 𝜃 is the marginal INMB (with respect to the parameter \(\tilde {p}_{t}\)) and − γ is the fixed INMB, both measured on a per-person basis.

At the beginning of period t, the policy-maker simultaneously decides whether to invest in a medical intervention for the individuals in cohort t and whether to conduct a study of sample size n t over the period to obtain a better estimate of the uncertain parameter \(\tilde {p}_{t}\). Information, if sought, arrives at the end of the current period and is used, together with the known dynamics of \(\tilde {p}_{t}\), to inform the intervention decision for future cohorts. Let \(d_{t}\in \mathcal {D} = \{0, 1\}\) denote the intervention decision at time t, where d t = 0 indicates ‘No intervention’ and d t = 1 indicates ‘Intervention.’ The amount of information collected is measured in terms of the sample size \(n_{t} \in {\mathcal N}=\{0,\ldots ,N\}\); it is obtained at the cost K(n t ), where K(⋅) is an increasing function including a fixed and a variable cost when n t > 0 and K(0) = 0. Thus, at each time t the policy-maker implements the control \(u_{t}=(d_{t}, n_{t})\in \mathcal {D} \times \mathcal {N}\). The per-person current reward for the cohort in period t is

$$ g(\tilde{p}_{t}, u_{t}) = d_{t} (\theta \tilde{p}_{t} - \gamma) - \frac{K(n_{t})}{N}. $$
(1)
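To make the reward structure concrete, the following sketch transcribes Eq. 1 directly in R (the language used for the numerical analysis in Section 4). The function name and all numerical values are illustrative placeholders, not the parameter estimates of the application.

```r
## Per-person current-period reward of Eq. 1; theta, gamma, N, and the cost
## function K are illustrative placeholders, not the estimates used in Section 4.
g <- function(p, d, n, theta = 50000, gamma = 250, N = 2e6,
              K = function(n) if (n > 0) 5e5 + 100 * n else 0) {
  d * (theta * p - gamma) - K(n) / N
}

g(p = 0.03, d = 1, n = 0)     # intervene, no sampling
g(p = 0.03, d = 1, n = 4000)  # intervene and sample 4,000 individuals
```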

The application in Section 4 features the decision problem of when to stop a once-in-a-lifetime disease-screening program where \(\tilde {p}_{t}\) is the uncertain disease prevalence in the t-th cohort which, in expectation, is geometrically decreasing over time; 𝜃 > 0 denotes the marginal benefit of early diagnosis and treatment for an affected individual, γ > 0 is the per-person cost of the program, and the current-period INMB g is increasing in \(\tilde {p}_{t}\). Beyond our leading example, the framework can accommodate a wide variety of problems. As formulated, the uncertain parameter needs to lie in a compact interval (which can be mapped via bijection to [0,1]). Thus, the parameter can represent not only a probability but also other model parameters, such as a quality-of-life weight or cost. Additionally, our analysis assumes that the parameter value is decreasing over time. To model a situation where the expectation of the uncertain parameter is increasing (e.g., obesity prevalence), the problem can be formulated as one in which a parameter of opposite definition is decreasing (e.g., prevalence of individuals who are not obese). Our exposition involves an example of when to stop a health intervention. However, the framework can also be used in situations in which the decision-maker wishes to identify the optimal time to initiate a new intervention (e.g., when to adopt a new surgical technique). More broadly, our framework can be applied in settings in which the decision-maker wishes to identify the optimal time to stop the current intervention or initiate a new intervention; the uncertain parameter is geometrically increasing or decreasing across intervention cohorts; and the current-period reward function is linearly increasing or decreasing in the uncertain parameter. Examples are shown in Table 1.

Table 1 Examples of alternative cases in which our framework applies

2.1 The information-acquisition problem

The policy-maker’s prior belief about \(\tilde {p}_{t}\) at t = 0 is beta-distributed with distribution parameters x 0 = (a 0,b 0). The posterior distribution when a beta-density is updated in a Bayesian manner with information collected using an imperfect information-collection technology is a mixture of beta-densities. Thus, in general, the policy-maker’s prior beliefs about \(\tilde {p}_{t}\) at time t are in \(\mathcal {P}\), where \(\mathcal {P}\) denotes the set of measures that are mixtures of beta-densities. Specifically, if \(\tilde {p}_{t} \in \mathcal {P}\), then there exist parameters \(x_{t,i} = (a_{t,i}, b_{t,i}) \in \mathbb {R}_{++}^{2}\) for \(1 \leqslant i \leqslant m\), where m is a positive integer, and non-negative weights ω i with \({\sum }_{i=1}^{m} \omega _{i} = 1\), such that the distribution of \(\tilde {p}_{t}\) is the mixture of beta-densities \({\sum }_{i=1}^{m} \omega _{i} \text { beta}(a_{t,i}, b_{t,i})\).

The policy-maker has the option to update his beliefs about the parameter \(\tilde {p}_{t}\) by testing n t individuals at cost K(n t ). The information-collection technology has binary test characteristics q = (q 1,q 2), where q 1 is the sensitivity, q 2 is the specificity, and q 1 + q 2 > 1 (indicating the test is properly labeled). The terms ‘sensitivity’ and ‘specificity’ are often used to describe test accuracy in the medical literature. For clarity, we state their relationship to Type I and Type II error: ‘Specificity’ = 1 −‘Type I error’ = 1 −‘False positive rate’ and ‘Sensitivity’ = 1 −‘Type II error’ = 1 −‘False negative rate’. The number of positive samples is an uncertainty \(\tilde {v_{t}}\) with realization v t ∈{0,…,n t }. Based on the collected information the policy-maker updates his beliefs about \(\tilde {p}_{t}\) in a Bayesian manner.

Proposition 1

If the policy-maker’s prior belief f p (⋅)is a mixture of beta-densities, i.e., \(f_{p} \in \mathcal {P}\) , then for any number of positive observations \(\tilde {v}_{t} = v_{t}\) from n t samples, the Bayesian posterior belief f p|v (⋅|v t ) is also a mixture of beta-densities, i.e., f p|v is in \(\mathcal {P}\) .

Proof

See Appendix A.1. □

If \(\tilde {p}_{t}\) is a mixture of m ≥ 1 beta-densities and if the information-collection technology is imperfect (i.e., \(\min \{q_{1},q_{2}\}<1\)), then the true posterior distribution is also a mixture of beta-densities, containing between m + n t and m × (n t + 1) unique beta-distributions (see Appendix A.2.1). The resulting probability density function (pdf) is

$$ f_{p|v}(p \mid x_{t}, n_{t}, v_{t}, q) = \sum\limits_{j=0}^{v_{t}} \sum\limits_{k=0}^{n_{t}-v_{t}} \sum\limits_{i=1}^{m} \omega^{\prime}_{j,k,i}\, \text{beta}\!\left(a_{t,i}+j+k,\; b_{t,i}+n_{t}-j-k\right), $$
(2)

with updated weights

$$ \omega^{\prime}_{j,k,i} = \frac{ \omega_{i}\, q_{1}^{j} (1-q_{2})^{v_{t}-j} (1-q_{1})^{k} q_{2}^{n_{t}-v_{t}-k}\, \dfrac{\Gamma(a_{t,i}+b_{t,i})\,\Gamma(a_{t,i}+j+k)\,\Gamma(b_{t,i}+n_{t}-j-k)}{\Gamma(j+1)\,\Gamma(v_{t}-j+1)\,\Gamma(k+1)\,\Gamma(n_{t}-v_{t}-k+1)\,\Gamma(a_{t,i})\,\Gamma(b_{t,i})\,\Gamma(a_{t,i}+b_{t,i}+n_{t})} }{ \sum\limits_{r=0}^{v_{t}} \sum\limits_{s=0}^{n_{t}-v_{t}} \sum\limits_{i=1}^{m} \omega_{i}\, q_{1}^{r} (1-q_{2})^{v_{t}-r} (1-q_{1})^{s} q_{2}^{n_{t}-v_{t}-s}\, \dfrac{\Gamma(a_{t,i}+b_{t,i})\,\Gamma(a_{t,i}+r+s)\,\Gamma(b_{t,i}+n_{t}-r-s)}{\Gamma(r+1)\,\Gamma(v_{t}-r+1)\,\Gamma(s+1)\,\Gamma(n_{t}-v_{t}-s+1)\,\Gamma(a_{t,i})\,\Gamma(b_{t,i})\,\Gamma(a_{t,i}+b_{t,i}+n_{t})} }. $$

Explicit expressions for the conditional mean and variance, \(\mu_{p|v}\) and \(\sigma^{2}_{p|v}\), are provided in Appendix A.2.2.
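As a computational illustration, the sketch below evaluates the posterior of Eq. 2 for the simplest case of a single beta prior (m = 1); the weights are computed on the log scale to avoid overflow in the gamma functions. The function name posterior_mixture and the numerical inputs are hypothetical; extending the sketch to a mixture prior adds one more loop over the components i.

```r
## Exact posterior of Eq. 2 for a beta(a, b) prior (m = 1) and an imperfect test
## with sensitivity q1 and specificity q2. Each (j, k) pair corresponds to the
## latent numbers of true positives among the v positive results and of false
## negatives among the n - v negative results.
posterior_mixture <- function(a, b, n, v, q1, q2) {
  grid <- expand.grid(j = 0:v, k = 0:(n - v))
  logw <- with(grid,
    j * log(q1) + (v - j) * log(1 - q2) +
    k * log(1 - q1) + (n - v - k) * log(q2) +
    lgamma(a + b) + lgamma(a + j + k) + lgamma(b + n - j - k) -
    lgamma(j + 1) - lgamma(v - j + 1) - lgamma(k + 1) - lgamma(n - v - k + 1) -
    lgamma(a) - lgamma(b) - lgamma(a + b + n))
  w <- exp(logw - max(logw)); w <- w / sum(w)            # normalized weights omega'
  ca <- a + grid$j + grid$k                              # component parameters
  cb <- b + n - grid$j - grid$k
  mu <- sum(w * ca / (ca + cb))                          # posterior mean mu_{p|v}
  m2 <- sum(w * ca * (ca + 1) / ((ca + cb) * (ca + cb + 1)))
  list(weights = w, a = ca, b = cb, mean = mu, var = m2 - mu^2)
}

## e.g., prior beta(10, 300), 100 tests with 6 positives, q = (0.98, 0.99)
str(posterior_mixture(a = 10, b = 300, n = 100, v = 6, q1 = 0.98, q2 = 0.99))
```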

Remark 1

If \(\tilde {p}_{t}\) follows a mixture of m ≥ 1 beta-densities and the information-collection technology is perfect (i.e., q 1 = q 2 = 1), then the distribution of sample information, \(\tilde {v}_{t}\), is a mixture of m beta-binomial distributions with the same weights ω i . Updating results in a posterior distribution that is a mixture of m beta-densities with pdf

$$ f_{p|v}(p \mid x_{t}, n_{t}, v_{t}, q=(1,1)) = \sum\limits_{i=1}^{m} \omega^{\prime}_{i}\, \text{beta}\!\left(a_{t,i}+v_{t},\; b_{t,i}+n_{t}-v_{t}\right), $$
(3)

with updated weights

$$\omega^{\prime}_{i} = \frac{\omega_{i} \frac{\Gamma(a_{t,i}+b_{t,i}){\Gamma}(a_{t,i} +v_{t}){\Gamma}(b_{t,i}+n_{t}-v_{t})}{\Gamma(a_{t,i}){\Gamma}(b_{t,i}){\Gamma}(a_{t,i}+b_{t,i}+n_{t})}}{\sum\limits_{j=1}^{m} \omega_{j} \frac{\Gamma(a_{t,j}+b_{t,j}){\Gamma}(a_{t,j} +v_{t}){\Gamma}(b_{t,j}+n_{t}-v_{t})}{\Gamma(a_{t,j}){\Gamma}(b_{t,j}){\Gamma}(a_{t,j}+b_{t,j}+n_{t})}}, $$

for all i ∈{1,…,m}.
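For the perfect-test case the update is computationally much lighter. A minimal sketch (hypothetical helper name, illustrative inputs):

```r
## Perfect-test update (Eq. 3): every component becomes beta(a_i + v, b_i + n - v)
## and the weights are rescaled by each component's beta-binomial marginal
## likelihood (the binomial coefficient is common to all components and cancels).
update_mixture_perfect <- function(a, b, w, n, v) {
  logw <- log(w) + lbeta(a + v, b + n - v) - lbeta(a, b)
  w <- exp(logw - max(logw))
  list(a = a + v, b = b + n - v, w = w / sum(w))
}

## two-component prior, 200 tests, 5 positives
update_mixture_perfect(a = c(2, 10), b = c(60, 300), w = c(0.4, 0.6), n = 200, v = 5)
```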

2.2 Approximate Bayesian inference

For practically relevant sample sizes n t and an imperfect information-collection technology, the number of beta-densities in the posterior distribution can become very large, thus requiring approximation. The need for distributional approximations in decision models has been recognized by Smith, who proposed moment matching to replace continuous distributions by appropriate discrete ones [63]. More recently, moment-matching methods have also been used in a Markovian setting to approximate vector-autoregressions [64]. In our Markov dynamic programming setting, we apply moment matching to approximate the exact posterior distribution, a mixture of beta-densities, with a single beta-distribution. This greatly simplifies the belief propagation compared to working with mixtures of beta-densities, which would require an increasingly large number of coefficients with each information-collection effort and, ultimately, an infinite-dimensional state space.

Thus, instead of carrying forward full distribution information about the posterior mixture of beta-densities caused by an imperfect information-collection technology, the policy-maker’s posterior belief about \(\tilde {p}_{t}\) is approximated by a single beta-distribution with the same mean and variance as the exact posterior distribution. The policy-maker’s prior belief is represented by the distribution parameters x t = (a t ,b t ) and the posterior belief incorporating any information collected at time t is represented by the updated parameters \(\hat {x}_{t} = (\hat {a}_{t}, \hat {b}_{t})\). Using the mean and variance of the exact posterior distribution, \(\mu_{p|v}\) and \(\sigma^{2}_{p|v}\), the approximate posterior belief parameters are determined using the one-to-one relationship between the standard parameters of the beta-distribution and its mean and variance. We let ψ(x t ,n t ,v t ,q) denote the function that generates the approximating parameters, with

$$ \hat{x}_{t} = \left[ \begin{array}{c} \hat{a}_{t} \\ \hat{b}_{t} \end{array} \right] = \psi(x_{t}, n_{t}, v_{t}, q) = \left. \left( \frac{\mu_{p|v}\,(1-\mu_{p|v})}{\sigma^{2}_{p|v}} - 1 \right) \left[ \begin{array}{c} \mu_{p|v} \\ 1-\mu_{p|v} \end{array} \right] \right|_{(x_{t}, n_{t}, v_{t}, q)}. $$

In the case of a perfect information-collection technology, the preceding relations describe the policy-maker’s posterior beliefs exactly.
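The moment-matching step itself is a two-line computation. The sketch below (hypothetical helper name) maps the exact posterior mean and variance, for example as returned by the posterior computation sketched above, to the parameters \((\hat{a}_{t}, \hat{b}_{t})\) of the approximating beta-distribution.

```r
## Moment matching psi: the single beta(a_hat, b_hat) with the same mean mu and
## variance s2 as the exact posterior; requires 0 < s2 < mu * (1 - mu).
psi_from_moments <- function(mu, s2) {
  stopifnot(s2 > 0, s2 < mu * (1 - mu))
  k <- mu * (1 - mu) / s2 - 1
  c(a_hat = k * mu, b_hat = k * (1 - mu))
}

psi_from_moments(mu = 0.031, s2 = 0.0035^2)
```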

Mixtures of beta-distributions can be fitted to any continuous distribution on [0,1]. Thus, a single beta-distribution with the same mean and variance as a distribution formed by a mixture of beta-densities will not always provide a satisfactory approximation. However, we focus on the special case where the time-t belief \(\tilde {p}_{t}\) has been obtained via Bayesian updating from a single beta-prior. In this special case, approximating the mixture of beta-densities with a single beta-distribution with the same mean and variance maintains unimodality and stationarity of the state space over time.

We assessed the approximation quality using simulation in the policy-relevant region for our application (Appendix A.3). We found that the maximum distance between the cumulative distribution function of the exact posterior distribution and that of the approximation with matching mean and variance was generally small (< 2%), but became large when the mean approached zero and the standard deviation was relatively large. The quality of the approximation was very good (< 0.5%) when the mean was greater than 2%. We deemed the approximation to be of sufficiently high quality for our numerical analysis because our initial conditions and predicted trajectory without information acquisition lie in the regions where the approximation is good. Also, because of the relatively high fixed costs associated with information acquisition, optimal sample sizes in our numerical analysis tended to be sufficiently large that information would likely be collected only once, which reduces concern about compounding the approximation error over successive information-collection efforts.

2.3 System dynamics

The belief state x t , containing the parameters of the distribution of \(\tilde {p}_{t}\), represents the policy-maker’s current beliefs about the uncertain parameter and follows a law of motion of the form

$$ x_{t+1} = \phi(\hat{x}_{t}) = \left[ \begin{array}{ll} z & 0\\ 1-z & 1 \end{array} \right] \hat{x}_{t} , $$
(4)

where z ∈ (0,1) is the decay rate. These dynamics imply a geometrically decreasing expected value, increasing coefficient of variation, and decreasing variance for \(\mu (x_{0}) \leqslant \frac {1}{1+z}\) (Fig. 1). In the mean-variance space, the equivalent state dynamics become

$$ \mu(x_{t+1}) = z\, \mu(\hat{x}_{t}) \quad \text{and} \quad \sigma^{2}(x_{t+1}) = \sigma^{2}(\hat{x}_{t}) \left( z + z(1-z) \left( \frac{\mu(\hat{x}_{t})}{1-\mu(\hat{x}_{t})} \right) \right). $$
Fig. 1 Sample state trajectory with decay z = 0.8, with (dashed) and without (solid) information acquisition at time t (for n_t = 50), respectively

Derivations of these equations are presented in Appendix A.4. The features of these dynamics can represent a wide variety of settings in which the expectation of a parameter is geometrically decreasing over time (e.g., a health condition that is decreasing in prevalence over time; see Section 4). To model a situation where the expectation (and variance) of the uncertain parameter is increasing (e.g., obesity prevalence), the problem can be re-formulated as one in which a parameter of opposite definition is decreasing (e.g., prevalence of individuals who are not obese).
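A small sketch of the law of motion, under the same conventions (hypothetical helper names, illustrative values), checks that Eq. 4 preserves \(a_{t} + b_{t}\) and therefore multiplies the mean by z each period while shrinking the variance as stated above.

```r
## Law of motion (Eq. 4) on the belief parameters and the induced moments.
phi <- function(ab, z = 0.89) c(a = z * ab[[1]], b = (1 - z) * ab[[1]] + ab[[2]])

moments <- function(ab) {
  a <- ab[[1]]; b <- ab[[2]]
  c(mean = a / (a + b), var = a * b / ((a + b)^2 * (a + b + 1)))
}

x <- c(a = 10, b = 300)
rbind(now = moments(x), next_period = moments(phi(x)))
## next-period mean = z * current mean; a + b is unchanged, so uncertainty
## (relative to the mean) decays more slowly than the mean itself.
```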

2.4 The policy-maker’s problem

Given a social discount factor δ ∈ (0,1), the policy-maker’s objective is to maximize the net present value of the stream of expected INMBs, given the initial belief x 0 = (a 0,b 0) and admissible policy decisions \(U\in {\mathcal {U}} = \left \{(u_{t})_{t\in {\mathbb N}} : u_{t}=(d_{t},n_{t})\in {\mathcal D}\times {\mathcal N} \right \}\). To achieve the objective, the policy-maker seeks to find the best of all possible policies π t (⋅), t ≥ 0, with u t = π t (x t ) for all \(x_{t}\in \mathbb {R}_{++}^{2}\), which at each time t maps the state space to admissible current-period actions u t , so that the implemented path of actions U = (u 0,u 1,…) lies in the control-constraint set \(\mathcal {U}\). The number of positive observations in the testing sample of n t is a random variable \(\tilde {v}_{t}(n_{t})\) with realization v t ∈{0,…,n t }. Based on the collected information the policy-maker updates his beliefs about \(\tilde {p}_{t}\) in an (approximate) Bayesian manner using the function ψ(x t ,n t ,v t ,q). Because of the decreasing trend of the uncertain parameter (z < 1), it is never optimal to restart an optimally stopped program. We consider stationary policies \(\pi :\mathbb {R}_{++}^{2}\to {\mathcal D}\times {\mathcal N}\) to solve the optimal control problem

$$\begin{array}{@{}rcl@{}} && \max_{\pi(\cdot)} \quad \mathbb{E}\left[ {\sum}_{t=0}^{\infty} \delta^{t} \left.g(\tilde{p}_{t}, \pi(x_{t}))\right|x_{0}\right],\\ &&\text{subject to} \quad x_{t+1} = \phi(\psi(x_{t}, n_{t}, \tilde{v}_{t}(n_{t}), q)), \ \ \ x_{0} \ \ \ \text{given}, \\ && u_{t} = \pi(x_{t})\in{\mathcal D}\times{\mathcal N}. \end{array} $$
(5)

Provided the value function V (x) satisfies the Bellman equation,

$$ V(x) = \max\limits_{(d,n) \in \mathcal{D}\times{\mathcal N}} \left\{ d\, (\theta \mu(x) - \gamma) - \frac{K(n)}{N} + \delta\, \mathbb{E}\!\left[ V(\phi(\psi(x, n, \tilde{v}(n), q))) \,\middle|\, x, (d, n) \right] \right\}, $$
(6)

for all admissible states \(x\in \mathbb {R}_{++}^{2}\), the corresponding maximizer π (x) on the right-hand side defines an optimal policy.
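For intuition about how Eq. 6 can be solved numerically, the sketch below implements plain value iteration on a coarse (mean, standard deviation) grid. It makes several simplifying assumptions beyond the model as stated: a perfect test (q = (1,1)), so beliefs remain single beta-distributions and the predictive distribution of \(\tilde{v}\) is beta-binomial; only two sample sizes {0, η}; bilinear interpolation of V between grid points; and illustrative parameter values rather than those estimated in Section 4. All function names are hypothetical; a production implementation, such as the one used in Section 4, would vectorize the backup and use a finer grid.

```r
theta <- 50000; gamma <- 250; delta <- 0.97; z <- 0.89   # illustrative values
N <- 2e6; eta <- 200                                      # small eta keeps the demo fast
K <- function(n) if (n > 0) 5e5 + 100 * n else 0          # study cost (placeholder)

mu_grid <- seq(0.001, 0.06, length.out = 50)
sd_grid <- seq(0.0005, 0.007, length.out = 20)

to_beta  <- function(mu, s2) { k <- mu * (1 - mu) / s2 - 1; c(k * mu, k * (1 - mu)) }
to_moms  <- function(ab) c(ab[1] / sum(ab), sqrt(prod(ab) / (sum(ab)^2 * (sum(ab) + 1))))
phi      <- function(ab) c(z * ab[1], (1 - z) * ab[1] + ab[2])                  # Eq. 4
dbetabin <- function(v, n, a, b) exp(lchoose(n, v) + lbeta(a + v, b + n - v) - lbeta(a, b))

interpV <- function(V, mu, sd) {                          # clamped bilinear lookup
  mu <- min(max(mu, mu_grid[1]), max(mu_grid)); sd <- min(max(sd, sd_grid[1]), max(sd_grid))
  i <- findInterval(mu, mu_grid, all.inside = TRUE); j <- findInterval(sd, sd_grid, all.inside = TRUE)
  wi <- (mu - mu_grid[i]) / (mu_grid[i + 1] - mu_grid[i])
  wj <- (sd - sd_grid[j]) / (sd_grid[j + 1] - sd_grid[j])
  (1 - wi) * (1 - wj) * V[i, j] + wi * (1 - wj) * V[i + 1, j] +
    (1 - wi) * wj * V[i, j + 1] + wi * wj * V[i + 1, j + 1]
}

bellman <- function(mu, sd, V) {                          # one backup of Eq. 6 at (mu, sd)
  ab <- to_beta(mu, sd^2)
  m0 <- to_moms(phi(ab))                                  # next state without sampling
  cont0 <- interpV(V, m0[1], m0[2])
  v <- 0:eta                                              # expectation over study outcomes
  cont1 <- sum(dbetabin(v, eta, ab[1], ab[2]) *
               vapply(v, function(vv) { m <- to_moms(phi(c(ab[1] + vv, ab[2] + eta - vv)))
                                        interpV(V, m[1], m[2]) }, numeric(1)))
  inmb <- theta * mu - gamma
  max(delta * cont0,                                      # d = 0, n = 0
      inmb + delta * cont0,                               # d = 1, n = 0
      -K(eta) / N + delta * cont1,                        # d = 0, n = eta
      inmb - K(eta) / N + delta * cont1)                  # d = 1, n = eta
}

V <- matrix(0, length(mu_grid), length(sd_grid))
for (it in 1:50) {                                        # value iteration
  Vnew <- outer(seq_along(mu_grid), seq_along(sd_grid),
                Vectorize(function(i, j) bellman(mu_grid[i], sd_grid[j], V)))
  if (max(abs(Vnew - V)) < 1e-4) break
  V <- Vnew
}
```

The induced policy at each grid point is the argmax over the four candidate action values; plotting it over the grid should reproduce, qualitatively, the three policy regions discussed in Section 3.2.2.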

Remark 2

To reflect the policy-maker’s ongoing concern for the health-intervention decision, the problem is formulated in an infinite-horizon setting. Given a time-invariant system, this implies that the optimal policy can be described as a mapping from states to actions, without explicit consideration of time. If more information about the system becomes available over time, for example, relating to the decay rate in the system dynamics (see Eq. 4), then it is possible for the policy-maker to re-solve the problem and update the policy accordingly.

3 Dynamic healthcare decisions

3.1 Policies without information acquisition

If information is prohibitively costly or practically infeasible to collect, Eq. 6 simplifies to

$$V_{\text{NoInfo}}(x) = \max_{d \in \{0,1\}} \{ d (\theta \mu(x) - \gamma ) + \delta V_{\text{NoInfo}}(\phi(x))\}, $$

for all \(x\in \mathbb {R}_{++}^{2}\), as there is no Bayesian updating and therefore ψ reduces to an identity map. For all states x for which the optimal strategy is not to perform the intervention, this action remains optimal in the future because of the decreasing trend of \(\tilde {p}_{t}\). Indeed, since for z ∈ (0,1), μ(ϕ(x)) = z μ(x) < μ(x), we have that for all states where V NoInfo(x) = 0, it is also the case that V NoInfo(ϕ(x)) = 0. Hence, for \(\mu (x) \leq \frac {\gamma }{\theta }\) it is optimal to stop the intervention. This defines a threshold policy of the form

$$ d^{*} = \left\{ \begin{array}{ll} 0, & \text{if } \mu(x_{t}) \leq \frac{\gamma}{\theta},\\ 1, & \text{otherwise,} \end{array} \right. $$
(7)

for all t ≥ 0. Restricting attention to the interesting case where \(\mu (x_{0})\geq \frac {\gamma }{\theta }\) and using the fact that μ(x t ) = z t μ(x 0), we can identify the optimal time T(x 0) to stop the intervention, which is the first period in which the intervention has a nonpositive expected INMB (see Appendix A.5):

$$ T(x_{0}) = \left\lceil \frac{1}{\ln(z)} \ln \left(\frac{\gamma}{\theta \mu(x_{0})} \right) \right\rceil . $$
(8)

Hence, given any initial state x, the value of implementing the optimal stopping policy for t ∈{0,...,T(x) − 1} is

$$\begin{array}{@{}rcl@{}} V_{\text{NoInfo}}(x) &=& \sum\limits_{t=0}^{T(x)-1} \delta^{t}\left(\theta z^{t} \mu(x) - \gamma \right)\\ &=& \theta\mu(x) \left(\frac{1-(\delta z)^{T(x)}}{1-\delta z} \right) - \gamma \left(\frac{1-\delta^{T(x)}}{1-\delta} \right).\\ \end{array} $$
(9)
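Equations 7-9 are directly computable. A small sketch (hypothetical helper names) follows; the final line checks, using the roughly 11% per-birth-year prevalence decline reported in Section 1 (z ≈ 0.89), an initial mean of 3.1%, and an assumed stopping threshold γ/θ of 0.4%, that the stopping time comes out at 18 periods, consistent with the result reported for men in Section 4.2.1.

```r
## Optimal stopping time (Eq. 8) and no-information value (Eq. 9).
stop_time <- function(mu0, theta, gamma, z) ceiling(log(gamma / (theta * mu0)) / log(z))

value_no_info <- function(mu0, theta, gamma, z, delta) {
  Tt <- stop_time(mu0, theta, gamma, z)
  if (Tt <= 0) return(0)                                    # already at or below the threshold
  theta * mu0 * (1 - (delta * z)^Tt) / (1 - delta * z) -
    gamma * (1 - delta^Tt) / (1 - delta)
}

## threshold gamma / theta = 0.4%, initial mean 3.1%, 11% annual decline
stop_time(mu0 = 0.031, theta = 1, gamma = 0.004, z = 0.89)   # returns 18
```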

Proposition 2

When information is prohibitively costly or practically infeasible to collect, the optimal value function V NoInfo(x t ) is non-decreasing and convex in μ(x t ) .

Proof

See Appendix A.6. □

Remark 3

The above result depends only on the decay in the mean of the uncertain parameter distribution and is otherwise distribution-free. In other words, it does not depend on the policy-maker’s beliefs other than that \(\tilde {p}_{t}\) is expected to decrease over time.

3.2 Policies with information acquisition

When the policy-maker has the option to acquire information, the value function is determined by the Bellman equation (Eq. 6). Its properties in the no-information case (Proposition 2) carry over to the more general situation.

Proposition 3

The optimal value function V (x t ) is nondecreasing and convex in μ(x t ) , and nondecreasing in σ 2(x t ) .

Proof

See Appendix A.7. □

3.2.1 Special case: one-time information collection

Assume for now that information can be collected at most once. Given a one-time size-η experiment (with η ≥ 1) and, for the moment, ignoring the cost of information collection K(η), the value with information exceeds the no-information value,

$$ \theta \mu(x) - \gamma + \delta\, \mathbb{E}\!\left[ V_{\text{NoInfo}}(\phi(\psi(x, n_{0}=\eta, \tilde{v}(\eta), q))) \right] - V_{\text{NoInfo}}(x) > 0, $$

as a consequence of Jensen’s inequality. This insight is also useful for the comparison of experiments. Higher confidence in the information, i.e., a larger sample size and/or better test characteristics, produces a mean-preserving spread of the random variable \(\mu (\phi (\psi (x_{t}, n_{0}=\eta , \tilde {v}_{t}(\eta ), q)))\) relative to the original experiment and thus, by the convexity of the (monotone) value function and second-order stochastic dominance, a larger value with information.

Because of the monotone system dynamics, the optimal time to collect information of sample size η, at cost κ(η), is obtained by finding a period k where information acquisition is preferred to waiting until the next period, k + 1. In other words, find the smallest k for which

$$V(x_{k}|n_{k}=\eta) \geq V(x_{k+1} = \phi(x_{k})|n_{k+1}=\eta), $$

or equivalently

$$ \theta z^{k} \mu(x_{0}) + \frac{\delta}{1-\delta z} \left( \mathbb{E}\!\left[ V_{\text{NoInfo}}(\phi(\psi(x_{k}, n_{k}=\eta, \tilde{v}(\eta), q))) \right] - \delta\, \mathbb{E}\!\left[ V_{\text{NoInfo}}(\phi(\psi(\phi(x_{k}), n_{k+1}=\eta, \tilde{v}(\eta), q))) \right] \right) \geq \frac{1-\delta}{1-\delta z} \left( \gamma + \kappa(\eta) \right). $$

The positivity of the right-hand side of the last inequality indicates that information acquisition may, on certain trajectories, never be optimal. This is confirmed in our application in Section 4, where the stopping region and the region with information acquisition have a common boundary, transversal to expected state trajectories.
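The following self-contained sketch makes the timing calculation of this subsection concrete for the special case of a perfect test and a single beta prior: for each candidate study period k it values the policy "follow the threshold rule of Eq. 7 through period k, run one study of size η at k, then apply the no-information stopping rule to the updated belief from k + 1 onward," and reports the k with the highest value. All names and numbers are illustrative placeholders, not the paper's estimates.

```r
theta <- 50000; gamma <- 250; delta <- 0.97; z <- 0.89      # illustrative values
N <- 2e6; eta <- 4000; kappa <- (5e5 + 100 * eta) / N       # per-person study cost
a0 <- 10; b0 <- 312                                         # prior belief, mean ~ 3.1%

phi <- function(ab) c(z * ab[1], (1 - z) * ab[1] + ab[2])   # Eq. 4
v_no_info <- function(mu) {                                 # Eq. 9 as a function of the mean
  if (theta * mu <= gamma) return(0)
  Tt <- ceiling(log(gamma / (theta * mu)) / log(z))
  theta * mu * (1 - (delta * z)^Tt) / (1 - delta * z) - gamma * (1 - delta^Tt) / (1 - delta)
}
dbetabin <- function(v, n, a, b) exp(lchoose(n, v) + lbeta(a + v, b + n - v) - lbeta(a, b))

value_collect_at <- function(k) {
  ab <- c(a0, b0)
  for (t in seq_len(k)) ab <- phi(ab)                       # belief carried forward to period k
  mu_path <- z^(0:k) * a0 / (a0 + b0)
  pre <- sum(delta^(0:k) * pmax(0, theta * mu_path - gamma))# threshold-rule rewards in 0..k
  v <- 0:eta
  post_mu <- (ab[1] + v) / (ab[1] + ab[2] + eta)            # posterior means (perfect test)
  cont <- sum(dbetabin(v, eta, ab[1], ab[2]) *
              vapply(z * post_mu, v_no_info, numeric(1)))   # value from k + 1 onward
  pre - delta^k * kappa + delta^(k + 1) * cont
}

vals <- vapply(0:25, value_collect_at, numeric(1))
which.max(vals) - 1                                          # best period for the single study
```

Comparing max(vals) with the no-information value v_no_info(a0 / (a0 + b0)) indicates whether the one-time study is worth its cost at all on a given trajectory.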

3.2.2 General case: information collection in any period

Based on Proposition 3, the intervention is desirable for greater μ(x t ) and greater σ(x t ); the latter increases the upside of the policy-maker’s asymmetric (convex) payoffs, as if holding a call option. The dynamics presented in Eq. 4, with decreasing expectation and decreasing variance, imply monotonicity of the intervention decision, \(d_{t+1} \leqslant d_{t}\).

Corollary 1

Consider \(x_{t}^{(1)}\), \(x_{t}^{(2)}\) with \(\mu(x_{t}^{(1)}) < \mu(x_{t}^{(2)})\) and \(\sigma^{2}(x_{t}^{(1)}) = \sigma^{2}(x_{t}^{(2)})\); then, if it is optimal to do the intervention with \(\mu(x_{t}^{(1)})\), it is also optimal to do the intervention with \(\mu(x_{t}^{(2)})\).

Proof

See Appendix A.8. □

Corollary 2

Consider \(x_{t}^{(1)}\), \(x_{t}^{(2)}\) with \(\mu(x_{t}^{(1)}) = \mu(x_{t}^{(2)})\) and \(\sigma^{2}(x_{t}^{(1)}) < \sigma^{2}(x_{t}^{(2)})\); then, if it is optimal to do the intervention with \(\sigma(x_{t}^{(1)})\), it is also optimal to do the intervention with \(\sigma(x_{t}^{(2)})\).

Proof

See Appendix A.9. □

A direct consequence of Proposition 3 and Corollaries 1 and 2 is that an optimal policy, as a map from states to actions, features three regions (Fig. 2). We describe, in detail, the features of the optimal policy for the case of an optimal stopping problem (Fig. 2A). In region I, an optimal policy is ‘no intervention (and do not sample).’ In region II, an optimal policy is ‘do intervention and sample n t individuals.’ In region III, an optimal policy is to ‘do intervention and do not sample.’

Fig. 2 Policy regions for (a) an optimal-stopping problem and (b) an optimal-starting problem. In either case, the initial belief is in region II or III; over time, the belief moves towards region I

The boundary between regions I and III is \(\frac {\gamma }{\theta }\) (Section 3.1). For \(0 \leqslant \mu (x_{t}) \leqslant \frac {\gamma }{\theta }\), the policy-maker is indifferent between ‘no intervention (and do not sample)’ and ‘do intervention and sample n t individuals’ when the rewards of the two regions are equal:

$$ 0 = \theta \mu(x_{t}) - \gamma -\kappa(n_{t}) + \delta \mathbb{E}[V(\phi(\psi(x_{t},n_{t}, \tilde{v}_{t},q)))]. $$
(10)

Focusing on the region \(\frac {\gamma }{\theta } \leqslant \mu (x_{t}) \leqslant 1\), the policy-maker is indifferent between ‘do intervention and sample n t individuals’ and ‘do intervention and do not sample’ when the rewards of the two regions are equal. Removing common terms from each side, this occurs when

$$ - \kappa(n_{t}) + \delta \mathbb{E}[V(\phi(\psi(x_{t},n_{t}, \tilde{v}_{t}, q)))] = \delta V(\phi(x_{t})). $$
(11)

For each σ 2(x t ), there can exist more than one μ(x t ) where \(\frac {\gamma }{\theta } < \mu (x_{t}) \leqslant 1\) satisfying Eq. 11 because \(V(\phi (\psi (x_{t},n_{t},\tilde {v}_{t},q)))\) is increasing, but neither concave nor convex, in v t . The existence of the section of region III between regions I and II (the location of point A) can be understood intuitively. Consider two points, A and B, with the same standard deviation (Fig. 2). Compared to point B, if information were to be gathered at point A, the distribution of possible posterior states includes a higher proportion of states in region I (with a reward of 0) and a lower proportion of high-reward states (those with high mean and high standard deviation) and, therefore, information acquisition is less likely to yield a value exceeding its cost. Now consider two points, A and C, with the same mean. Compared to point C, if information were to be gathered at point A, the distribution of possible posterior states is narrower. In both of these cases, increased spread on the side of low mean has no impact on the expectation and increased spread into the high-reward states substantially increases expectation. Therefore, information acquisition is more likely to yield a value exceeding its cost for the state with higher standard deviation.

Proposition 4

For a fixed sample size η (so n t ∈{0,η}for all t), misclassification in the information-collection technology decreases the value function and reduces the number of states for which information acquisition is optimal.

Proof

See Appendix A.10. □

This result is consistent with Blackwell’s result that a less informative signal cannot increase the value of a single-person decision problem [66].

4 Application

4.1 Background and motivation

Chronic HCV infection is a slowly progressing blood-borne disease that causes liver fibrosis, cirrhosis, and liver cancer. It is the principal cause of death from liver disease and the leading indication for liver transplantation in the United States (US) [67, 68]. Between 2.7 and 5.2 million Americans (1.1% to 2.1% of the adult population) are chronically infected with HCV [69, 70]. In the non-injection drug using US population, prevalence peaks in the 1945 to 1965 birth cohorts and decreases thereafter (Fig. 3). Approximately half of all chronically infected individuals are unaware of their disease status [71].

Fig. 3 Prevalence of HCV by birth year in (a) men and (b) women. Estimated using the National Health and Nutrition Examination Survey (NHANES), 1999-2010. For details, see Appendix A.11.3

Recent model-based analyses concluded that one-time screening of individuals born between 1945 and 1965 is cost-effective [11,12,13,14,15] and the CDC and USPSTF recently released new guidance in support of one-time screening of these birth cohorts [16, 17]. Several studies indicate that screening individuals born later than 1965 is also likely to be cost-effective [13,14,15]. Since HCV prevalence is decreasing in birth year after the 1956 birth cohort (Fig. 3), there may be a time at which screening is no longer cost-effective. To improve the decision about the best time to stop screening, additional information about prevalence of the current and future cohorts may be desirable. However, standard approaches to finding the value of information do not usually include the option to delay the information acquisition.

Note that the population we model was predominantly infected decades ago [72, 73] and does not have ongoing risk factors for HCV re-infection. Many historically significant modes of transmission have been virtually eliminated, including transmission by blood transfusion and by surgical or other hospital equipment used before modern sterilization procedures [73, 74]. Injection equipment sharing among people who actively use injection drugs (PWID) is currently the principal cause of HCV transmission [76]. Although a history of injection drug use is relatively common among individuals with chronic HCV infection (approximately 40% [71]), re-infection and disease transmission to others via injection drug use are not an ongoing risk for a large proportion of these individuals: three-quarters of HCV-infected individuals with a self-reported history of injection drug use report last injecting more than 5 years ago (median time since last injection = 20 years) [75]. Our model does not include PWID, and so we do not consider the possibility of re-infection. PWID are a high-risk population and guidelines, separate from those otherwise discussed here, recommend routine annual HCV screening in this population [77].

We now apply the stochastic dynamic programming framework developed in Section 2 to the case of one-time HCV screening at a routine medical appointment at age 50 for successive birth cohorts. We consider screening at age 50 because one-time screening at this age had the lowest incremental cost-effectiveness ratio in an analysis of single birth cohort screening [14]. Waiting to perform a one-time screening in older individuals is less cost-effective because their disease may have progressed further and treatment is less effective in more severe disease states. One-time screening of younger individuals is less cost-effective because younger individuals are further away from the long-term consequences of HCV which screening and treatment hope to avoid. We transform the unbounded state space in terms of x t = (a t ,b t ) to the compact policy-relevant space μ(x t ) and σ(x t ). Using value iteration implemented in R version 2.15.0 [78], we numerically determine an optimal HCV-screening and information-collection policy for US adults.

At each time, we consider the actions of ‘do not screen for HCV and do not collect information about HCV prevalence in the current cohort;’ ‘screen for HCV and collect sample information about HCV prevalence in the current cohort;’ ‘screen for HCV and do not collect information about HCV prevalence in the current cohort.’ We compare this optimal strategy to the policies identified by various alternative approaches: a slightly modified version of the new CDC and USPSTF recommendation; an optimal policy without information acquisition; and an optimal policy with (possibly immediate) information acquisition. A policy of HCV screening does not inherently provide additional information about HCV prevalence to policy-makers, because only positive test outcomes are reported to the CDC and the reason for the medical test is private health information (the test may have been performed for a reason other than routine screening at age 50). Estimating prevalence among asymptomatic individuals seeking routine preventive medical care therefore requires a study with random sampling of those individuals. The (quasi-)linearity of INMB t for this example is established in Appendix A.11.1. Parameter values and ranges used in sensitivity analysis are presented in Table 2. Details of parameter estimation are presented in Appendix A.11.2 and A.11.3.

Table 2 Model parameter values and range used in sensitivity analysis

4.2 Results

For the purposes of our analysis, we assume the current time to be the year 2010 and the initial cohort to be born in 1960.

4.2.1 Policies identified by alternative approaches

The expected value of the CDC and USPSTF recommendation was obtained by substituting T = 6 into Eq. 9. The sum of the discounted expected INMBs for screening 6 cohorts at age 50, until the 1965 birth cohort turns 50 years of age, is $399.1 million for men and $15.4 million for women (Table 3). The large difference between men and women is attributable to higher HCV prevalence and higher marginal INMB of early diagnosis and treatment in men.

Table 3 Comparison of optimal policies indicated by various analytic approaches for men with initial belief μ(x 0) = 0.0310 and σ(x 0) = 0.0035 and women with initial belief μ(x 0) = 0.0135 and σ(x 0) = 0.0019

We identify the threshold prevalence value below which the HCV-screening program should be terminated and the best time to terminate the screening program, assuming no opportunity to collect information, using Eqs. 7 and 8. In men, the program should be terminated when prevalence falls below 0.4%, which will occur in 18 years (95% CI: 16-19 years). In women, the program should be terminated when prevalence falls below 0.1%, which will occur in 3 years (95% CI: 0-5 years). The expected INMB of these policies is $566.5 million for men and $21.7 million for women (Table 3).

The traditional approach to value-of-information assessment in the health policy literature assumes immediate information collection [3, 4]. For men and women, we find the optimal sample sizes to be 910 and 4,930 individuals from the current cohort, respectively (Fig. 4). The expected INMB of immediate information followed by the optimal policy based on the information collected increases by $20,000 for men and $600,000 for women. Women have a greater value of immediate information because they are closer to the intervention stopping region threshold and, therefore, immediate information is more likely to result in a policy change.

Fig. 4 The value of collecting sample information immediately for various sample sizes: the gain in the expected INMB of the policy (\(\theta \mu(x_{0}) - \gamma + \delta \mathbb{E}[V_{\text{NoInfo}}(\phi(\psi(x_{0}, n_{0}=\eta, \tilde{v}(\eta), q)))]\)), the cost of information (κ(η)), and the net gain in the expected INMB of an HCV-screening policy if sample information is collected in the first period only, as a function of sample size, for (a) men with initial belief μ(x 0) = 0.031 and σ(x 0) = 0.0035 and (b) women with initial belief μ(x 0) = 0.0135 and σ(x 0) = 0.0019

4.2.2 Model results

Implementing the full model, we considered the possibility of collecting sample information at each decision period. For computational and illustrative reasons, we restricted the policy-maker’s choice to two sample sizes, \(n_{t} \in \{0, \eta \}\). We considered several possible values for η (2000, 2500, 3000, ..., 8000) and we present the results for the sample size that maximized the value at the initial condition for each gender. We also performed analyses with multiple study sample sizes available at each period. We do not present these analyses, as they led to the same optimal policies, indicating that our restriction to two sample sizes was not material for this application.

The optimal policy is characterized by the three main regions described in Section 3.2.2 (Fig. 5a). At low prevalence and relatively low uncertainty, it is optimal to not screen and not collect information. At high prevalence, it is optimal to screen and not collect information. At prevalence close to the \(\frac {\gamma }{\theta }\) threshold and relatively high uncertainty, it is optimal to both screen and collect information.

Fig. 5 (a) Optimal policy given any current belief about HCV prevalence and the opportunity to sample 4,000 men (left) and 4,500 women (right) at any time. (b) Time to the next policy action for men (left) and women (right); for states below the solid line, the next action is to stop screening, and for states above the solid line, the next action is to collect information. (c) The marginal value of collecting information, 4,000 samples for men (left) and 4,500 samples for women (right), in the current period

For each state in the region where it is optimal to screen without information acquisition, we can identify the optimal next action and the time when it should occur (Fig. 5b). We subdivide this region by a solid line. Above the solid line, which is the region with higher uncertainty, it is optimal to screen without information acquisition for a specified number of periods and then to collect information. In the region with lower uncertainty, it is optimal to screen without information acquisition for a specified number of periods and then to stop screening without ever collecting information. The current prevalence estimates for men and women indicate that it is optimal to screen without information collection for 16 years and 1 year, respectively, and then to collect sample information to inform the next action. The expected INMBs of these policies are $567.9 million and $22.5 million for men and women, respectively (Table 3).

For each state, we also computed the marginal value of collecting a specific amount of information (Fig. 5c). The marginal value of information in the current period is near-zero for states in which collecting information in the future is optimal. Consistent with our expectations, in the ‘Screen and Collect Information’ region, the marginal value of information is greatest close to the \(\frac {\gamma }{\theta }\)-threshold and increases with uncertainty. In the ‘Screen and Do Not Collect Information’ region, the value of information is highest along the boundary that divides the region into points with trajectories leading to information collection and points with trajectories leading to ‘No Screening’ without information collection.

Sensitivity analysis identified that the general conclusions of our numerical analysis are robust to uncertainty in the inputs (details in Appendix A.11.4).

4.3 Discussion of application

Evaluating an HCV-screening policy over its entire lifecycle using a stochastic dynamic programming approach has led to several important policy-relevant insights. Our analysis indicates that recommendations by the CDC and USPSTF to screen individuals born between 1945 and 1965 at their next routine medical visit are conservative for men. Specifically, our analysis shows that, for men, screening should continue until at least the 1976 birth cohort turns 50 (in 2026), at which point 4,000 individuals should be sampled to inform about the continuation of the program. Screening men at least 10 years longer will enable early diagnosis in an estimated 50,500 additional individuals, thus preventing an expected 767 additional liver cancers and about 212 additional liver transplants. For women, we find that a large information-acquisition effort should take place when the 1961 birth cohort turns 50 (in 2011), as it is likely not cost-effective to screen women, per guidance, to the 1965 cohort because of relatively low prevalence (Fig. 3) and slower disease progression in women [85]. Compared to the CDC and USPSTF recommendation, our model increases the expected INMB by $168.8 million in men and $7.1 million in women.

Our analysis has several limitations. First, we assume only the current cohort can be sampled to learn about subsequent cohorts, relying on the correlation between cohorts (as implied by the system dynamics). In practice, for our example, it is possible to sample the next cohort (49-year olds) directly. We chose this assumption because the individuals who make up the ‘next cohort’ are typically unknown (e.g., the next cohort of patients with a heart attack, the next cohort of pregnant women, or the next cohort of cancer patients). Second, we consider one-time screening at age 50 based on a cost-effectiveness analysis of once-in-a-lifetime HCV screening [14]. However, this analysis (and, consequently, ours) assumed that the cohort being screened has not been previously screened. Our model does not identify the optimal age at which to perform one-time screening. Third, we assumed that the individuals who attend a preventive health exam and participate in recommended HCV screening are an unbiased sample from the cohort; that is, individuals are not more or less likely to attend their preventive health exam if they are HCV-positive. However, if individuals at higher risk of HCV disproportionately self-select for general population screening, then we have underestimated the duration for which screening will be cost-effective. If individuals at lower risk disproportionately self-select for screening (often called the “worried well”), then we have overestimated the duration for which screening will be cost-effective. Fourth, we focus on HCV screening policy in the non-injection drug using population only because they were the focus of the recent change in HCV screening policy. Finally, while uncertainty (and related information acquisition) with respect to model parameters other than prevalence can be treated in an analogous manner, the details are left for future work.

5 Conclusion

Our analysis shows that when parameters vary across intervention cohorts, it may be optimal to delay information acquisition. This is a significant improvement over the current paradigm, which considers only one-time, immediate information collection. More specifically, we provide a framework for optimal information acquisition, in terms of both the timing and the precision of the acquired signal (sample size). Further, we incorporate misclassification from an imperfect information-collection technology into our framework, an important real-life complexity of information gathering that adds substantial analytical difficulty.

The common assumption that the per-person value of information remains constant for future cohorts may result in a significant error when estimating the population value of additional information. It may indicate immediate, expensive information collection when, once the system dynamics are incorporated, the optimal action is to collect information in the future or never. When a parameter is evolving across intervention cohorts, ignoring the opportunity to wait and collect information at a future time, when the information collected is more likely to result in action, is a missed opportunity for increased efficiency. As seen in our example, adding the option of delaying information acquisition until a time when the signal is more likely to justify a policy shift can increase the expected value compared to a policy of immediate information collection. The dynamic programming framework developed in this paper enables an accurate assessment of the marginal value of additional information and identifies an optimal information-acquisition policy.

In this work, we assumed that the dynamics are monotonically increasing or decreasing and that they are deterministic. In future work, we plan to consider the more realistic assumption of uncertainty in the dynamics. This would then enable learning about the evolution of the parameters, rather than just their current state. Furthermore, our model does not consider the possibility of intervening on a cohort at a different time in the course of their disease or lives (i.e., at an earlier or later age) or the possibility of the intervention modifying the population-level dynamics. Although true for our application, this latter assumption does not hold in general for an infectious disease. Including the additional benefits of reduced disease transmission from prevention and treatment interventions may generate more near-term benefits and may dramatically alter the value of the intervention over time.

With strained resources for health programs and population-health monitoring, this type of analysis may ensure an optimal implementation horizon for health programs together with guidance on when and how much information should be collected to inform health-program adjustments. Beyond health, many application areas face limited resources for investment and information acquisition; high-quality, decision-relevant information is often difficult or expensive to collect; and population or environmental trends influence the preferences and behavior of customers across industries. Facing a dynamic consumer, competitive, or physical environment, optimally timing the acquisition of high-quality information may provide a competitive advantage.