1 Introduction

Value of information analysis (VoI) is a means of valuing the expected gain from reducing uncertainty through some form of data collection exercise (e.g., a trial or epidemiological study). As such, it is a tool which can be used to assess the cost effectiveness of alternative research projects.

The expected value of a research project is the expected reduction in the probability of making the ‘wrong’ decision multiplied by the average consequence of being ‘wrong’ (the ‘opportunity loss’ of the decision, defined in Sect. 2.1 below). This is compared with the expected cost of the research project. If the expected value exceeds the (expected) cost then the project should be undertaken. If not, then the project should not be undertaken: the (expected) value of the resources consumed by the project exceeds the (expected) value of the information yielded.

VoI is based firmly within a Bayesian statistical framework where probability represents degrees of belief about plausible values for a parameter rather than the long-run relative frequency with which an event occurs (as is the case in the frequentist approach). The key concept in Bayesian analysis is the updating of a prior belief about plausible values for a parameter with the support for likely values of that parameter drawn from sampled data (the distribution of which is known as the likelihood function) to form a posterior belief using Bayes' theorem [1]. For this reason, Bayesian analysis is sometimes referred to as posterior analysis [2]. VoI requires prediction of the likelihood function conditional on the prior to generate an expected posterior distribution. In lay terms, the results of a data collection exercise (e.g., clinical trial) are predicted based on current knowledge. These are combined with the current knowledge to predict the state of knowledge after the data are collected. VoI is thus sometimes referred to as preposterior analysis.

The inclusion of VoI as a part of health economic evaluations is increasing [3–12]. This is useful to direct future research effort to where it can achieve the greatest expected return for finite funding. Its primary use is to determine the optimal sample size for a study based on the marginal gain from an additional trial enrolee compared with the marginal cost. The optimal point is where the marginal cost is equal to the (value of the) marginal gain, a concept directly analogous to the profit maximising condition in the theory of the firm.

The purpose of this paper is to describe briefly the origins of VoI methods and to provide a step-by-step guide to calculation. This manuscript focuses on an analytic approach. However, a numeric (simulation) approach is described in Appendix 1. Spreadsheets with worked examples are also provided as online appendices (see Electronic Supplementary Material [ESM]).

2 Concepts/Descriptive Approach

2.1 The Core Theory

The origins of VoI lie in the work of Raiffa and Schlaifer on statistical decision theory at Harvard [2, 13, 14]. The starting point is that there is some objective function to be maximised, and a choice between courses of action leading to uncertain payoffs with respect to the objective function. It is possible to invest in research to reduce uncertainty in the payoffs, but such information is costly and will thus have a negative impact on the payoff. The question then is whether the decision should be made on current information or whether it is worth investing in additional information to reduce uncertainty before then revisiting the decision.

The payoff can be any outcome such as profit, output or revenue, or broader, less tangible concepts such as happiness, welfare or utility. Likewise, the research can be anything that reduces uncertainty in the payoffs. For example, suppose a medical supplies firm wishes to maximise its profits. It wishes to invest in new manufacturing facilities leading to a much higher level of output allowing it to expand into new markets. However, this will only be profitable if demand is sufficiently high for its product. If demand is lower than expected, sales will be insufficient to make the investment profitable. In this case the objective function is profit, which is uncertain due to uncertainty in demand. The firm can make its decision to invest or not in the new facility now, or it can delay the decision (i.e., maintain the current level of output) and conduct market research to reduce uncertainty in demand and then make its investment decision. The expected cost of the ‘delay and research’ strategy is the cost of the research itself plus any expected foregone increase in profits had the investment decision been made immediately. The expected value of the strategy is the reduction in expected loss through a reduced probability of making the ‘wrong’ investment decision.

The same logic also applies to individual decision making. Suppose a utility-maximising consumer is faced with a choice of beers at a bar. The consumer could make the decision as to which to purchase at random. Alternatively he or she could invest in research (request a sample of each) and make the decision based on that new information. The cost of such research is the delayed enjoyment of a beer (assuming zero cost and utility from the sampling process itself), but the benefit is reduced uncertainty as to which is preferred, and hence a higher probability of identifying a preferred beer and thus gaining the most benefit (maximising utility).

In both examples, the principles and questions are the same: does the value of the additional information outweigh its cost? In the former, does the expected profit from a strategy of research followed by investment decision exceed the expected profit from the investment decision now? In the latter, does the expected utility from sampling the range followed by making a decision exceed the expected utility from choosing one at random without sampling?

The key measurements in VoI are the expected value of perfect information (EVPI), expected value of sample information (EVSI) and the expected net gain of sampling (ENGS, sometimes termed the expected net benefit of sampling, ENBS). The expected value of perfect parameter information (EVPPI) is also sometimes defined. This is the value of eliminating uncertainty in one or more input parameter(s) of the objective function. (Note the EVPPI is also sometimes termed the expected value of partial perfect information).

Where there are only two courses of action, A or B, the decision is most easily represented by calculating the incremental expected payoff of one option compared with the other; that is, the expected payoff with option B less the expected payoff with A. The expected incremental net payoff (or incremental net benefit) and its associated uncertainty can be plotted as per Fig. 1. A cash value (e.g., profit) is used for the payoff in this example, but the principles are the same whether the payoff is cash, utility or some other metric. The incremental payoff is referred to from hereon as the incremental net benefit (INB), and denoted ‘∆B’ in subsequent equations.

Fig. 1

Distribution of incremental net benefit (primary vertical axis) and loss function (secondary vertical axis). The distribution of incremental net benefit (ΔB) is indicated by the blue curve. The loss function is indicated by the black line. As mean incremental net benefit is positive the decision should be to adopt. The loss is zero to the right of the origin, and equal to −ΔB to the left. The proportion of the area under the curve shaded is the probability of a loss given a decision to adopt

Based on current information, the expected INB is positive (+£300 in Fig. 1). The decision should therefore be in favour of option B. However, due to uncertainty there is a probability that the decision is wrong, represented by the shaded area in Fig. 1. If it turns out that the INB is actually, say, −£250, the wrong decision will have been made: the payoff would have been £250 higher had the decision been to go with option A; the loss (termed the opportunity loss) is therefore £250. Likewise, if the INB was actually −£500, the opportunity loss is £500.

The opportunity loss can therefore be plotted in relation to a secondary y-axis as a −45° line from −∞ to zero (Fig. 1). If it turns out that INB is, say, +£100, or indeed any positive value, there is no opportunity loss as the decision to go with option B was the correct decision. The loss function therefore kinks at the origin and coincides with the x-axis at values greater than zero.

In simple terms, the probability of being ‘wrong’ multiplied by the average consequence of being wrong (the opportunity loss) is the expected loss associated with uncertainty, or equivalently the expected gain from eliminating uncertainty, which is the EVPI.

This logic can be demonstrated most clearly with a discrete approximation. In Fig. 2a, the continuous distribution shown in Fig. 1 is approximated by two possible discrete payoffs: a 23 % probability of incurring a loss of (approximately) £500, and a 77 % probability of a gain of (approximately) £500. The expected payoff (i.e. INB) is therefore 0.23 × −500 + 0.77 × 500 = £270, and the expected loss 0.23 × 500 = £115.

Fig. 2

a, b and c Discrete approximation for calculating the expected loss associated with a decision with uncertain payoffs

In Fig. 2b, the same decision problem is divided into four discrete payoffs of (approximately) −£750, −£250, +£250 and +£750, with associated probabilities of 2.3, 20.4, 46.5 and 30.8 %, respectively. The expected INB is therefore 0.023 × −750 + 0.204 × −250 + 0.465 × 250 + 0.308 × 750 = £279, with an expected loss of 0.023 × 750 + 0.204 × 250 = £68. In Fig. 2c, the problem is further subdivided, yielding an expected INB of £298 and expected loss of £52. Continual subdivision of the problem until each discrete column is an ‘infinitesimal strip’ equates to the continuous case as illustrated in Fig. 1 (an expected value of £300 and expected loss of £52).
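The discrete approximation in Fig. 2a and b can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `expected_inb_and_loss` is an assumption introduced here, and the payoffs and probabilities are those quoted in the text.

```python
def expected_inb_and_loss(payoffs, probs):
    """Return (expected INB, expected opportunity loss) for a discrete
    approximation, assuming the decision is to adopt (expected INB > 0)."""
    expected_inb = sum(p * x for x, p in zip(payoffs, probs))
    # Opportunity loss is -x when the realised INB is negative, zero otherwise.
    expected_loss = sum(p * max(-x, 0.0) for x, p in zip(payoffs, probs))
    return expected_inb, expected_loss


# Fig. 2a: two discrete payoffs
inb_a, loss_a = expected_inb_and_loss([-500, 500], [0.23, 0.77])

# Fig. 2b: four discrete payoffs
inb_b, loss_b = expected_inb_and_loss(
    [-750, -250, 250, 750], [0.023, 0.204, 0.465, 0.308]
)

print(round(inb_a), round(loss_a))  # 270 115
print(round(inb_b), round(loss_b))  # 279 68
```

Adding further, narrower columns to the two lists moves both figures towards the continuous-case values.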

Suppose some research activity can be undertaken which will reduce uncertainty in the INB (i.e., reduce decision uncertainty). The results of this research can be predicted with the likelihood function: the most likely value of the sampled INB is the prior INB. Given knowledge of the standard deviation of INB, the expected reduction in standard error from a study of a given size can be calculated when the prior is combined with the predicted sample results. This will ‘tighten’ the distribution and thus reduce the probability of making the wrong decision (proportion of the probability mass represented by the shaded area in Fig. 3), hence reducing the expected loss associated with uncertainty. (Note the pre-posterior mean will always equal the prior mean as the most likely value for the sample mean is the prior mean).

Fig. 3

Prior and predicted posterior distribution of incremental net benefit. The blue line indicates the prior distribution of incremental net benefit (ΔB), with the red line indicating the predicted posterior. The expected reduction in probability of a loss is equal to the shaded proportion of the area under the prior distribution function

The expected reduction in expected loss is the expected gain from that sample information, or the EVSI.

A small research study will yield a small EVSI, whilst a larger study will yield a bigger EVSI. But a larger study will also cost more than a smaller one. The difference between the EVSI and the cost of the study is the ENGS. The sample size that maximises the ENGS by definition maximises the expected return on investment and is the optimal size for a research study.

2.2 Application to Decision Making in the Healthcare Field

The principles were first adapted to the healthcare field by Thompson [15], with substantial development undertaken by, among others, Claxton, Briggs, Willan and Eckermann [16–18]. VoI is probably most usefully considered as a step in the iterative approach to decision making and research [19–23]. This comprises firstly defining the decision problem followed by systematic review of all relevant evidence, which is then combined together in a decision model. Point estimate results of the decision model are used to inform the adoption decision whilst decision uncertainty is used to inform the research decision. If new research is deemed worthwhile, it should be undertaken and the results fed back into the systematic review, at which point the cycle is repeated. Of importance in this approach is the existence of two distinct decisions: the adoption decision and the research decision. As stated above, the adoption decision should be made on expected values alone, whilst uncertainty is used to inform whether it is worth obtaining additional information to reduce that uncertainty.

For example, suppose a new treatment were proposed for a disease to replace existing therapy. The decision problem is whether to adopt the new treatment in place of old. Economic theory would suggest this should be made on the basis of whether it represents a net gain to society, taking into account the opportunity cost of the new treatment (that is, the value of health foregone elsewhere in the system to make way for the new treatment). This is measured by the incremental net monetary benefit of the new treatment, and is simply a rearrangement of the incremental cost-effectiveness ratio decision rule (Eq. 1) [24]. This becomes the objective function to be maximised (Eq. 2). Note that the equation can also be expressed in terms of the incremental net health benefit by dividing both sides of the equation by λ (the value placed on a unit of health gain), but net monetary benefit is more practical to work with (the former leads to divide by zero errors when λ = 0).

At this point, it is not specified whether the estimate of INB is derived from a single trial or from a decision model based on a synthesis of all relevant evidence. In order to fully reflect current decision uncertainty, the latter is preferable. However, depending on the decision question and state of current knowledge, a single clinical trial with piggybacked economic evaluation may be an appropriate source of data: Eq. 2 shows INB as a function of incremental cost and outcome alone (as well as the value of a unit of outcome, λ) without specifying how those two parameters are generated.

3 Step-by-Step Calculation

There are two methods by which the VoI statistics can be calculated: analytically, requiring assumptions of normality amongst parameters, and numerically (via simulation), which, whilst relaxing the normality assumptions (allowing alternative parametric forms), can be very burdensome requiring many hours of computer processing time to calculate. The analytic method is most frequently performed on economic evaluations conducted alongside clinical trials, whilst the numeric approach is more often associated with decision models, although in principle either can be applied to either situation. A step-by-step approach to the analytic approach follows, with a description of the simulation approach in Appendix 1. Spreadsheets with the calculations are provided in the ESM, Appendices 2 and 3.

The analytic solution illustrated here assumes mean INB is a simple linear combination of incremental mean cost and outcomes as per Eq. 2. Outcomes are assumed to be measured in quality-adjusted life-years (QALYs) throughout and a threshold of £20,000 per QALY gained is assumed unless otherwise stated. Where sample data provide the source of the priors, calculation of mean and variance of mean INB and its components are as follows:

Individual observations on cost and QALYs are denoted with lower-case letters, and means with upper-case (Eqs. 3, 4), with sample variances and covariance (denoted with lower-case letters) in Eqs. 5–7. The net benefit of patient i in arm j is defined as the value of the QALYs gained by that patient less the cost (Eq. 8). Mean net benefit in arm j can be defined either as the sum of per patient net benefit divided by the number of observations or as the difference between the value of mean QALYs and cost (Eq. 9). Likewise, the sample variance of net benefit in arm j can be defined either from the individual observations on b, or as the sum of the sample variances of QALYs and cost less twice the covariance (Eq. 10).

Variances of means (denoted with capital letters) are equal to the sample variances divided by the sample size (Eqs. 11–14). Note the square root of the sample variance is the standard deviation (a measure of the dispersion of individual observations around the mean) and the square root of the variance of the mean is the standard error (a measure of uncertainty in the estimate of the mean). As per Eq. 10, the variance of mean net benefit can be expressed either as the sample variance of net benefit divided by the sample size, or the sum of the variances less twice the covariance of mean QALYs and cost (Eq. 14).

Mean incremental cost and QALYs are simply the difference between the cost and QALYs in each arm, respectively (Eqs. 15, 16). INB can be expressed likewise (Eq. 17), or as previously defined in Eq. 2. The variances of mean incremental cost and QALYs and the covariance between the mean increments are simply the sum of the respective (co)variances in each arm (Eqs. 18–20). The variance of mean INB can be expressed either as the sum of the variances of mean net benefit, or as the sum of each component (QALYs and cost) less twice the covariance (Eq. 21). Noting that the correlation coefficient between mean incremental cost and QALYs is defined as the covariance of the means divided by the product of the standard errors (Eq. 22), Eq. 21 can be re-written as per Eq. 23. (This is a more useful expression for calculating the EVPPI, see below). The parameters defined in Eqs. 15–23 form the respective priors, denoted with the subscript ‘0’ (Eq. 24).

3.1 Equation Set 1a: Mean Incremental Net Benefit

$$\frac{{C_{T} - C_{C} }}{{E_{T} - E_{C} }} = \frac{\Delta C}{\Delta E} \le \lambda$$
(1)

where: \(C_{j}\) = mean cost per patient of intervention j (j = T, treatment or C, control), \(E_{j}\) = mean outcome (e.g., QALYs gained) per patient from intervention j, \(\Delta X = X_{T} - X_{C}\), λ = value placed on/maximum willingness to pay for a unit of outcome.

$$\lambda \Delta E - \Delta C = \Delta B$$
(2)
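As a minimal sketch of how Eqs. 1 and 2 relate, the following Python snippet evaluates both decision rules. The function names and the input values are hypothetical, chosen only for illustration, and the equivalence of the two rules shown here assumes ΔE > 0.

```python
def icer(delta_c, delta_e):
    """Incremental cost-effectiveness ratio (left-hand side of Eq. 1)."""
    return delta_c / delta_e


def inb(delta_c, delta_e, lam):
    """Incremental net monetary benefit (Eq. 2)."""
    return lam * delta_e - delta_c


# Hypothetical inputs: incremental cost (GBP), incremental QALYs, threshold
delta_c, delta_e, lam = 2000.0, 0.15, 20000.0

adopt_by_icer = icer(delta_c, delta_e) <= lam      # Eq. 1
adopt_by_inb = inb(delta_c, delta_e, lam) >= 0.0   # Eq. 2 rearranged

print(adopt_by_icer, adopt_by_inb)

# Net monetary benefit remains defined when lam = 0; net health benefit
# (INB divided by lam) would not be.
inb_at_zero_threshold = inb(delta_c, delta_e, 0.0)
```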

3.2 Equation Set 1b: Derivation of Prior Estimates of Means and Variances of Means from Sample Data

3.2.1 Sample Means and Sample Variances/Covariance by Treatment Arm

$$C_{j} = \frac{{\mathop \sum \nolimits_{i = 1}^{{n_{j} }} c_{i,j} }}{{n_{j} }}$$
(3)
$$E_{j} = \frac{{\mathop \sum \nolimits_{i = 1}^{{n_{j} }} e_{i,j} }}{{n_{j} }}$$
(4)
$$v\left( {c_{j} } \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{{n_{j} }} \left( {c_{i,j} - C_{j} } \right)^{2} }}{{\left( {n_{j} - 1} \right)}}$$
(5)
$$v\left( {e_{j} } \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{{n_{j} }} \left( {e_{i,j} - E_{j} } \right)^{2} }}{{\left( {n_{j} - 1} \right)}}$$
(6)
$${\text{Cov}}\left( {e_{j} ,c_{j} } \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{{n_{j} }} \left( {e_{i,j} - E_{j} } \right)\left( {c_{i,j} - C_{j} } \right)}}{{\left( {n_{j} - 1} \right)}}$$
(7)

where: \(c_{i,j}\) = cost of patient i in arm j (j = T, treatment or C, control), \(e_{i,j}\) = QALYs gained by patient i in arm j, \(C_{j}\) = mean cost per patient in arm j, \(E_{j}\) = mean QALYs per patient in arm j, \(n_{j}\) = sample size in arm j

$$b_{i,j} = \lambda e_{i,j} - c_{i,j}$$
(8)
$$B_{j} = \frac{{\mathop \sum \nolimits_{i = 1}^{{n_{j} }} b_{i,j} }}{{n_{j} }} = \lambda E_{j} - C_{j}$$
(9)
$$v\left( {b_{j} } \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{{n_{j} }} \left( {b_{i,j} - B_{j} } \right)^{2} }}{{\left( {n_{j} - 1} \right)}} = \lambda^{2} v\left( {e_{j} } \right) + v\left( {c_{j} } \right) - 2\lambda {\text{Cov}}\left( {e_{j} ,c_{j} } \right)$$
(10)

3.2.2 Variance of Means by Treatment Arm

$$v\left( {C_{j} } \right) = \frac{{v\left( {c_{j} } \right)}}{{n_{j} }}$$
(11)
$$v\left( {E_{j} } \right) = \frac{{v\left( {e_{j} } \right)}}{{n_{j} }}$$
(12)
$${\text{Cov}}\left( {E_{j} ,C_{j} } \right) = \frac{{{\text{Cov}}\left( {e_{j} ,c_{j} } \right)}}{{n_{j} }}$$
(13)
$$v\left( {B_{j} } \right) = \frac{{v\left( {b_{j} } \right)}}{{n_{j} }} = \lambda^{2} v\left( {E_{j} } \right) + v\left( {C_{j} } \right) - 2\lambda {\text{Cov}}\left( {E_{j} ,C_{j} } \right)$$
(14)
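The per-arm calculations in Eqs. 3–14 can be sketched as follows. The patient-level data are hypothetical and the helper names (`mean`, `sample_cov`) are assumptions introduced for this illustration; the snippet simply checks that both sides of Eqs. 9, 10 and 14 agree for a single arm.

```python
def mean(xs):
    """Sample mean (Eqs. 3, 4, 9)."""
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    """Sample covariance with an (n - 1) denominator (Eq. 7).
    sample_cov(xs, xs) gives the sample variance (Eqs. 5, 6, 10)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


lam = 20000.0  # threshold, GBP per QALY

# Hypothetical patient-level data for one arm j
costs = [1200.0, 950.0, 1800.0, 1400.0, 1100.0]
qalys = [0.70, 0.65, 0.80, 0.72, 0.68]
n = len(costs)

C, E = mean(costs), mean(qalys)                 # Eqs. 3, 4
v_c = sample_cov(costs, costs)                  # Eq. 5
v_e = sample_cov(qalys, qalys)                  # Eq. 6
cov_ec = sample_cov(qalys, costs)               # Eq. 7

b = [lam * e - c for e, c in zip(qalys, costs)]  # Eq. 8
B = mean(b)                                      # Eq. 9, first form
v_b = sample_cov(b, b)                           # Eq. 10, first form
v_b_alt = lam**2 * v_e + v_c - 2 * lam * cov_ec  # Eq. 10, second form

# Variances of means (Eqs. 11-14)
v_C, v_E, cov_EC = v_c / n, v_e / n, cov_ec / n
v_B = v_b / n                                    # Eq. 14, first form
v_B_alt = lam**2 * v_E + v_C - 2 * lam * cov_EC  # Eq. 14, second form

print(B, v_b, v_B)
```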

3.2.3 Increments: Means and Variance of Mean

$$\Delta C = C_{T} - C_{C}$$
(15)
$$\Delta E = E_{T} - E_{C}$$
(16)
$$\Delta B = B_{T} - B_{C}$$
(17)
$$v\left( {\Delta C} \right) = v\left( {C_{T} } \right) + v\left( {C_{C} } \right)$$
(18)
$$v\left( {\Delta E} \right) = v\left( {E_{T} } \right) + v\left( {E_{C} } \right)$$
(19)
$${\text{Cov}}\left( {\Delta E,\Delta C} \right) = {\text{Cov}}\left( {E_{T} ,C_{T} } \right) + {\text{Cov}}\left( {E_{C} ,C_{C} } \right)$$
(20)
$$v\left( {\Delta {\text{B}}} \right) = v\left( {B_{T} } \right) + v\left( {B_{C} } \right) = \lambda^{2} v\left( {\Delta E} \right) + v\left( {\Delta C} \right) - 2\lambda {\text{Cov}}\left( {\Delta E,\Delta C} \right)$$
(21)
$$\rho \left( {\Delta E,\Delta C} \right) = \frac{{{\text{Cov}}\left( {\Delta E,\Delta C} \right)}}{{\sqrt {v\left( {\Delta {\text{E}}} \right)} \sqrt {v\left( {\Delta C} \right)} }}$$
(22)
$$v\left( {\Delta {\text{B}}} \right) = \lambda^{2} v\left( {\Delta E} \right) + v\left( {\Delta C} \right) - 2\lambda \rho \left( {\Delta E,\Delta C} \right)\sqrt {v\left( {\Delta {\text{E}}} \right)} \sqrt {v\left( {\Delta C} \right)}$$
(23)
$$X_{0} = X$$
(24)

where X = ΔC, ΔE, ΔB, v(ΔC), v(ΔE), Cov(ΔE, ΔC), v(ΔB), ρ(ΔE, ΔC)
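The combination of arm-level results into increments (Eqs. 15–23) might be sketched as below. The arm-level summaries are hypothetical values invented for illustration, and the dictionary layout and function name are assumptions; the point is only that the three expressions for the variance of mean INB coincide.

```python
import math

lam = 20000.0  # threshold, GBP per QALY

# Hypothetical arm-level summaries: mean cost, mean QALYs, variance of mean
# cost, variance of mean QALYs, covariance of the means (Eqs. 3, 4, 11-13).
arms = {
    "T": {"C": 5000.0, "E": 1.20, "vC": 400000.0, "vE": 0.0006, "cov": -6.0},
    "C": {"C": 3500.0, "E": 1.10, "vC": 250000.0, "vE": 0.0005, "cov": -4.0},
}


def var_mean_nb(arm):
    """Variance of mean net benefit in one arm (Eq. 14)."""
    return lam**2 * arm["vE"] + arm["vC"] - 2 * lam * arm["cov"]


t, c = arms["T"], arms["C"]

d_c = t["C"] - c["C"]                  # Eq. 15
d_e = t["E"] - c["E"]                  # Eq. 16
d_b = lam * d_e - d_c                  # Eq. 2 (equivalently Eq. 17)

v_dc = t["vC"] + c["vC"]               # Eq. 18
v_de = t["vE"] + c["vE"]               # Eq. 19
cov_d = t["cov"] + c["cov"]            # Eq. 20

v_db_arms = var_mean_nb(t) + var_mean_nb(c)            # Eq. 21, first form
v_db_comp = lam**2 * v_de + v_dc - 2 * lam * cov_d     # Eq. 21, second form

rho = cov_d / (math.sqrt(v_de) * math.sqrt(v_dc))      # Eq. 22
v_db_rho = (lam**2 * v_de + v_dc
            - 2 * lam * rho * math.sqrt(v_de) * math.sqrt(v_dc))  # Eq. 23

print(d_b, math.sqrt(v_db_comp), rho)
```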

3.3 Expected Value of Perfect Information

The EVPI is calculated as per Eq. 25. Note, if mean INB (∆B) is positive then the indicator function in Eq. 25 reduces the second term in the equation to zero, and the EVPI is \(\mathop \smallint \nolimits_{ - \infty }^{0} - \Delta Bf_{0} \left( {\Delta B} \right)d_{\Delta B}\): the integral is from −∞ to zero because if the ‘true’ value of ∆B is greater than zero, then the correct decision has been made and there is thus no opportunity loss. However, if the ‘true’ value of ∆B is actually negative, then the wrong decision has been made, and the loss is −∆B.

The per-patient EVPI is multiplied by N, the total present and (discounted) future population who could benefit from the information. Depending on the disease, this may comprise the current prevalence, plus the incidence over an ‘appropriate’ time horizon, discounted at an ‘appropriate’ rate (Eq. 26). If INB is assumed to be normally distributed, the EVPI can be estimated via the unit normal linear loss integral (UNLLI, or standardised loss, denoted \(L_{{N^{*} }}\); Eq. 27) [2, 18]. Briefly, the standardised loss evaluated at z is the difference between y and z (where y > z) multiplied by the probability of observing that difference in a standard normal variable, summed over all possible values of y from z to ∞ (this is the process illustrated in Fig. 2 but for a standard normal variable). Equation 28 rearranges this into a more readily computable form, where z is the absolute normalised mean INB, \(\frac{{\left| {\Delta B_{0} } \right|}}{{\sqrt {v\left( {\Delta B} \right)_{0} } }}\). The standardised loss is a function of this, the standard normal probability density function, \(\phi \left( z \right)\) and cumulative distribution function, \(\varPhi \left( z \right)\) (Eqs. 29, 30). A good non-technical explanation of loss functions is provided in the Appendix to Cachon and Terwiesch [25].

3.3.1 Equation Set 2: Expected Value of Perfect Information

$${\text{EVPI}}_{0} = N\left[ {I\left\{ {\Delta B_{0} \ge 0} \right\}\mathop \smallint \limits_{ - \infty }^{0} - \Delta Bf_{0} \left( {\Delta B} \right)d_{\Delta B} + I\left\{ {\Delta B_{0} < 0} \right\}\mathop \smallint \limits_{0}^{\infty } \Delta Bf_{0} \left( {\Delta B} \right)d_{\Delta B} } \right]$$
(25)

where N = beneficial population:

$$N = P_{0} + \mathop \sum \limits_{t = 0}^{T} \frac{{I_{t} }}{{\left( {1 + r} \right)^{t} }}$$
(26)

\(P_{0}\) = prevalent population at time t = 0,

\(I_{t}\) = incident population at time t,

r = discount rate,

I{.} is the indicator function which returns 1 if the condition {} is satisfied, otherwise 0,

\(f_{0} \left( {\Delta B} \right)\) = prior density function of ∆B.
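A minimal sketch of the beneficial population calculation in Eq. 26 follows. The prevalence, annual incidence, time horizon and discount rate are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def beneficial_population(prevalent, incident_by_year, rate):
    """Eq. 26: prevalent population plus discounted incident population.
    incident_by_year[t] is the incidence at time t, for t = 0, ..., T."""
    discounted_incidence = sum(
        i_t / (1.0 + rate) ** t for t, i_t in enumerate(incident_by_year)
    )
    return prevalent + discounted_incidence


# Hypothetical inputs: 4,000 prevalent cases, 800 incident cases per year
# for t = 0, ..., 9, discounted at 3.5 % per year.
incidence = [800.0] * 10
n_discounted = beneficial_population(4000.0, incidence, 0.035)
n_undiscounted = beneficial_population(4000.0, incidence, 0.0)

print(round(n_discounted), round(n_undiscounted))
```

With a zero discount rate the result is simply the prevalence plus total incidence; any positive rate gives a smaller figure.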

$${\text{EVPI}}_{0} = N\sqrt {v\left( {\Delta B} \right)_{0} } L_{{N^{*} }} \left( {\Delta B_{0} ,\sqrt {v\left( {\Delta B} \right)_{0} } } \right)$$
(27)

where:

$$\begin{gathered} L_{{N^{*} }} \left( {\Delta B_{0} ,\sqrt {v\left( {\Delta B} \right)_{0} } } \right) = \mathop \smallint \limits_{{\frac{{\left| {\Delta B_{0} } \right|}}{{\sqrt {v\left( {\Delta B} \right)_{0} } }}}}^{\infty } \left( {y - \frac{{\left| {\Delta B_{0 } } \right|}}{{\sqrt {v\left( {\Delta B} \right)_{0} } }}} \right)\phi \left( y \right)dy \hfill \\ \quad \quad \quad \quad \quad \quad \quad \quad = \phi \left( {\frac{{\left| {\Delta B_{0} } \right|}}{{\sqrt {v\left( {\Delta B} \right)_{0} } }}} \right) - \frac{{\left| {\Delta {\text{B}}_{0} } \right|}}{{\sqrt {v\left( {\Delta B} \right)_{0} } }}\left[ {\varPhi \left( { - \frac{{\left| {\Delta B_{0} } \right|}}{{\sqrt {v\left( {\Delta B} \right)_{0} } }}} \right) - I\left\{ {\Delta B_{0} < 0} \right\}} \right] \hfill \\ \end{gathered}$$
(28)

\(\phi (z)\) = standard normal pdf evaluated at z (Eq. 29)

\(\Phi (z)\) = standard normal cdf evaluated from −∞ to z (Eq. 30)

$$\phi \left( z \right) = \frac{1}{{\sqrt {2\pi } }}e^{{ - \left( {\frac{{z^{2} }}{2}} \right)}}$$
(29)
$$\Phi \left( z \right) = \frac{1}{{\sqrt {2\pi } }} \int_{ - \infty }^{z} e^{{ - \left( {\frac{{y^{2} }}{2}} \right)}} dy$$
(30)
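The equivalence of the integral and closed-form expressions for the standardised loss can be checked numerically. The sketch below is illustrative only: the function names are assumptions, the closed form is written in its standard version φ(z) − zΦ(−z) for z ≥ 0 (which is what Eq. 28 gives when the indicator term is zero), and the integral is approximated with a simple trapezoidal rule truncated at an arbitrary upper limit.

```python
import math


def phi(z):
    """Standard normal pdf (Eq. 29)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cdf (Eq. 30), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def unlli_closed(z):
    """Unit normal linear loss integral, closed form, for z >= 0."""
    return phi(z) - z * Phi(-z)


def unlli_numeric(z, upper=12.0, steps=20000):
    """Trapezoidal approximation to the integral of (y - z) * phi(y)
    over y from z to `upper` (the tail beyond `upper` is negligible)."""
    h = (upper - z) / steps
    total = 0.0
    for k in range(steps + 1):
        y = z + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (y - z) * phi(y)
    return total * h


for z in (0.0, 2.0 / 3.0, 1.5):
    print(z, unlli_closed(z), unlli_numeric(z))
```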

3.3.2 Example

Suppose a trial-based economic evaluation comparing Control with Treatment yielded the following:

Mean INB \(\Delta B_{0}\) = £1,000.

Standard error of mean INB \(\sqrt {v(\Delta B)_{0} }\) = £1,500.

Further suppose the present and future beneficial population totals 10,000 patients. As \(\Delta B_{0}\) is greater than zero, the decision would be to adopt Treatment in place of Control. The EVPI would establish whether there could be a case for repeating the trial to reduce decision uncertainty, \(v(\Delta B)_{0}\).

Therefore the EVPI (Eq. 27) is:

$$\begin{gathered} {\text{EVPI}}_{0} = 10,000 \times 1,500\left( {\phi \left( {\frac{{\left| {1,000} \right|}}{1,500}} \right) - \frac{{\left| {1,000} \right|}}{1,500}\left[ {\varPhi \left( { - \frac{{\left| {1,000} \right|}}{1,500}} \right) - I\left\{ {1,000 < 0} \right\}} \right]} \right) \hfill \\ = 10,000 \times 1,500 \times \left( {0.3194 - 0.6667 \times \left[ {0.2525 - 0} \right]} \right) \hfill \\ = \pounds2.267\,{\text{m}} \hfill \\ \end{gathered}$$

The code to implement this in Microsoft Excel is provided in the ESM, Appendix 2, Sheet 1, Cells B2:D9.
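For readers working outside Excel, the same calculation might be sketched in Python as follows. This is not the ESM code: the function names are assumptions, and the standardised loss is computed with the standard closed form φ(z) − zΦ(−z) at z = |ΔB0|/√v(ΔB)0, which coincides with Eq. 28 in this example because the indicator term is zero.

```python
import math


def phi(z):
    """Standard normal pdf (Eq. 29)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cdf (Eq. 30), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def evpi_normal(mean_inb, se_inb, population):
    """Population EVPI under a normally distributed INB (Eq. 27)."""
    z = abs(mean_inb) / se_inb
    standardised_loss = phi(z) - z * Phi(-z)
    return population * se_inb * standardised_loss


# Inputs from the worked example in the text
evpi = evpi_normal(mean_inb=1000.0, se_inb=1500.0, population=10000)
print(f"EVPI = £{evpi / 1e6:.3f}m")
```

Up to rounding, this should reproduce the £2.267 m figure derived above.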

3.4 Expected Value of Perfect Parameter Information

The EVPPI can be estimated by assessing the impact of reducing the standard error of a particular parameter to zero on the reduction in standard error of overall INB. In other words, the EVPPI is the (expected) reduction in expected loss from the reduction in overall decision uncertainty attributable to eliminating uncertainty in a particular parameter.

For example, if ΔC were to be known with certainty, then the posterior variance of ΔC, \(v\left( {\Delta C} \right)_{1}\), would equal 0. Noting that \(v\left( {\Delta E} \right)_{1} = v\left( {\Delta E} \right)_{0}\) and \(\rho \left( {\Delta E,\Delta C} \right)_{1} = \rho \left( {\Delta E,\Delta C} \right)_{0}\), the posterior variance of ΔB, denoted \(v\left( {\Delta B} \right)_{1}\), is simply the prior estimate of the variance of ΔE (denoted \(v\left( {\Delta E} \right)_{0}\)) converted into monetary units with \(\lambda^{2}\) (Eqs. 31, 32). The (expected) reduction in variance of ΔB conditional on \(v\left( {\Delta C} \right)_{1} = 0\), denoted \(v\left( {\Delta B} \right)_{{s|v\left( {\Delta C} \right)_{1} = 0}}\), is therefore the difference between prior and (expected) posterior variance of ΔB (Eq. 33) and the EVPPI is calculated as per Eq. 34 (compare this with Eq. 27, where \(v\left( {\Delta B} \right)_{{s|v\left( {\Delta C} \right)_{1} = 0}}\) is substituted in place of \(v\left( {\Delta B} \right)_{0}\)). The equivalent is true for the value of eliminating uncertainty in ΔE, where the reduction in uncertainty is as per Eq. 35.

3.4.1 Equation Set 3: Expected Value of Perfect Parameter Information

$$v\left( {\Delta B} \right)_{1} = \lambda^{2} v\left( {\Delta E} \right)_{1} + v\left( {\Delta C} \right)_{1} - \lambda 2\sqrt {v\left( {\Delta E} \right)_{1} } \sqrt {v\left( {\Delta C} \right)_{1} } \rho \left( {\Delta E,\Delta C} \right)_{1}$$
(31)
$$\therefore\; v\left( {\Delta B} \right)_{{1|v\left( {\Delta C} \right)_{1} = 0}} = \lambda^{2} v\left( {\Delta E} \right)_{1} + 0 - 0$$
(32)

where: \(v\left( X \right)_{1}\) = predicted posterior (i.e. preposterior) variance of mean of X

$$\begin{gathered} v\left( {\Delta B} \right)_{{s|v\left( {\Delta C} \right)_{1} = 0}} = v\left( {\Delta B} \right)_{0} - v\left( {\Delta B} \right)_{{1|v\left( {\Delta C} \right)_{1} = 0}} \hfill \\ \quad \quad \quad \quad \quad \quad = \lambda^{2} v\left( {\Delta E} \right)_{0} + v\left( {\Delta C} \right)_{0} - \lambda 2\sqrt {v\left( {\Delta E} \right)_{0} } \sqrt {v\left( {\Delta C} \right)_{0} } \rho \left( {\Delta E,\Delta C} \right)_{0} - \lambda^{2} v\left( {\Delta E} \right)_{0} \hfill \\ \quad \quad \quad \quad \quad \quad = v\left( {\Delta C} \right)_{0} - \lambda 2\sqrt {v\left( {\Delta E} \right)_{0} } \sqrt {v\left( {\Delta C} \right)_{0} } \rho \left( {\Delta E,\Delta C} \right)_{0} \hfill \\ \end{gathered}$$
(33)
$${\text{EVPPI}}_{\Delta C} = N\sqrt {v\left( {\Delta B} \right)_{{s|v\left( {\Delta C} \right)_{1} = 0}} } L_{{N^{*} }} \left( {\Delta B_{0} ,\sqrt {v\left( {\Delta B} \right)_{{s|v\left( {\Delta C} \right)_{1} = 0}} } } \right)$$
(34)

where \(L_{{N^{*} }}\) is calculated as per Eq. 28

$$\begin{gathered} v\left( {\Delta B} \right)_{{s|v\left( {\Delta E} \right)_{1} = 0}} = v\left( {\Delta B} \right)_{0} - v\left( {\Delta B} \right)_{{1|v\left( {\Delta E} \right)_{1} = 0}} \hfill \\ \quad \quad \quad \quad \quad \quad = \lambda^{2} v\left( {\Delta E} \right)_{0} + v\left( {\Delta C} \right)_{0} - \lambda 2\sqrt {v\left( {\Delta E} \right)_{0} } \sqrt {v\left( {\Delta C} \right)_{0} } \rho \left( {\Delta E,\Delta C} \right)_{0} - v\left( {\Delta C} \right)_{0} \hfill \\ \quad \quad \quad \quad \quad \quad = \lambda^{2} v\left( {\Delta E} \right)_{0} - \lambda 2\sqrt {v\left( {\Delta E} \right)_{0} } \sqrt {v\left( {\Delta C} \right)_{0} } \rho \left( {\Delta E,\Delta C} \right)_{0} \hfill \\ \end{gathered}$$
(35)

3.4.2 Example

Continuing the previous example, suppose the standard error of INB is a function of the standard errors of ΔE and ΔC as per Eq. 23, with a threshold of λ = £20,000:

Mean INB \(\Delta B_{0}\) = £1,000.

Standard error of mean incremental QALYs \(\sqrt {v(\Delta E)_{0} }\) = 0.036.

Standard error of mean incremental cost \(\sqrt {v(\Delta C)_{0} }\) = £1,000.

Correlation coefficient between mean incremental QALYs and cost \(\rho (\Delta E,\Delta C)_{0}\) = −0.5.

The standard error of mean INB is now calculated as (Eq. 23):

$$\sqrt {{\text{v}}\left( {\Delta B} \right)_{0} } = \sqrt {20,000^{2} \times 0.036^{2} + 1,000^{2} - 20,000 \times 2 \times 0.036 \times 1,000 \times - 0.5} = {\text{\pounds}}1,500$$

If uncertainty in ΔC were eliminated, then \(v(\Delta C)_{1}\) = 0 by definition. Therefore as per Eq. 32, \(\sqrt {{\text{v}}\left( {\Delta B} \right)_{1} } = \sqrt {\lambda^{2} v(\Delta E)_{0} } = \sqrt {20,000^{2} \times 0.036^{2} } = {\text{\pounds}}724.75\).

The overall reduction in the standard error of INB from elimination of uncertainty in ΔC is thus (Eq. 33):

$$\sqrt {v\left( {\Delta B} \right)_{{s|v\left( {\Delta C} \right)_{1} = 0}} } = \sqrt {v\left( {\Delta {\text{B}}} \right)_{0} - v\left( {\Delta {\text{B}}} \right)_{{1|v\left( {\Delta C} \right)_{1} = 0}} } = \sqrt {1,500^{2} - 724.75^{2} } = \pounds1,313$$

The EVPPI is then (Eq. 34):

$$\begin{gathered} {\text{EVPPI}}_{\Delta C} = 10,000 \times 1313\left( {\phi \left( {\frac{{\left| {1,000} \right|}}{1,313}} \right) - \frac{{\left| {1,000} \right|}}{1,313}\left[ {\varPhi \left( { - \frac{{\left| {1,000} \right|}}{1,313}} \right) - I\left\{ {1,000 < 0} \right\}} \right]} \right) \hfill \\ \quad \quad \quad \quad = 10,000 \times 1,313 \times \left( {0.2985 - 0.761 \times \left[ {0.2232 - 0} \right]} \right) = \pounds1.689\,{\text{m}} \hfill \\ \end{gathered}$$

Note the calculations presented here are subject to rounding errors: ESM Appendix 2, Sheet 1, Cells G2:I21 provides relevant Excel code and precise figures.
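The EVPPI calculation above can also be sketched in a few lines of Python. This is a minimal illustration rather than the code in the ESM: the variable names are assumptions introduced here, and the unit normal loss is written without the indicator term because that term is zero when mean INB is positive, as in this example. Because it uses the rounded inputs quoted in the text, the result is close to, but not identical to, the £1.689 m reported above.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def unit_normal_loss(mean_inb, sd):
    """phi(D) - D * Phi(-D) with D = |mean_inb| / sd.

    Assumes mean_inb > 0, so the indicator term in the text is zero.
    """
    d = abs(mean_inb) / sd
    return STD_NORMAL.pdf(d) - d * STD_NORMAL.cdf(-d)


# Inputs as quoted (rounded) in the example; names are illustrative
wtp = 20_000         # threshold lambda, GBP per QALY
mean_inb = 1_000     # prior mean INB
se_qaly = 0.036      # standard error of mean incremental QALYs
se_cost = 1_000      # standard error of mean incremental cost
rho = -0.5           # correlation between incremental QALYs and cost
population = 10_000  # population multiplier used in the EVPPI equation

# Prior variance of mean INB
var_prior = wtp**2 * se_qaly**2 + se_cost**2 - 2 * wtp * se_qaly * se_cost * rho
# Posterior variance once uncertainty in incremental cost is eliminated
var_post = wtp**2 * se_qaly**2
# Standard error reduction attributable to incremental cost
sd_reduction = sqrt(var_prior - var_post)

evppi_cost = population * sd_reduction * unit_normal_loss(mean_inb, sd_reduction)
print(f"Prior SE of INB: {sqrt(var_prior):,.0f}")
print(f"Reduction in SE: {sd_reduction:,.0f}")
print(f"EVPPI for incremental cost: {evppi_cost:,.0f}")
```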

3.5 Expected Value of Sample Information

The predicted posterior EVPI, \({\text{EVPI}}_{1}\), is uncertain as it is conditional on the trial data, which are unknown. Therefore the expected \({\text{EVPI}}_{1}\) is the \({\text{EVPI}}_{1}\) associated with a particular sample result (denoted \(\Delta B_{s}\)), multiplied by the probability of observing that result, summed over all possible values of \(\Delta B_{s}\) (Eq. 36). The predicted distribution of \(\Delta B_{s}\), denoted \(\hat{f}\left( {\Delta B_{s} } \right)\), is the likelihood function for different values of \(\Delta B_{s}\). The EVSI is thus the difference between prior EVPI and expected posterior EVPI, multiplied by the patient population, N, less those enrolled in the study, \(2n_{s}\), as (depending on the nature of the disease) they cannot benefit from the information (Eq. 37).

Willan and Pinto [26] provide a comprehensive approach to calculating the EVSI. A simpler notation can be derived from Eq. 27 by replacing \(\sqrt {v(\Delta B)_{0} }\) with the reduction in standard error of INB from a trial of sample size \(n_{s}\) per arm, \(\sqrt {v(\Delta B)_{s,n} }\), and taking the potentially beneficial population as the total population less those enrolled in the study (Eq. 38) [2]. Thus, \(v(\Delta B)_{s,n}\) is the difference between prior and (expected) posterior variance of mean INB and is calculated as per Eq. 39. \(n_{0}\) is the prior sample size, which may be known where there are actual prior data or inferred by rearranging Eq. 14 (i.e., the ratio of the sample variance and variance of the mean).

Where \(v(b_{T})\) and \(v(b_{C})\), and hence \(v(\Delta b)\) (Eq. 40), are unknown, appropriate estimates may be obtained from the literature in related disease areas or from expert opinion, as is common practice when undertaking conventional power calculations.

3.5.1 Equation Set 4: Expected Value of Sample Information

$$E\left( {{\text{EVPI}}_{1} } \right) = \int_{ - \infty }^{\infty } {{\text{EVPI}}_{1} \hat{f}\left( {\Delta B_{s} } \right)d\Delta B_{s} }$$
(36)
$$E_{{\Delta B_{s} }} \left( {{\text{EVSI}}\left( {n_{s} ,\Delta B_{s} } \right)} \right) = \left( {N - 2n_{s} } \right)\left[ {{\text{EVPI}}_{0} - E\left( {{\text{EVPI}}_{1} } \right)} \right]$$
(37)

where: \(n_{s}\) = number of observations per arm.

$${\text{EVSI}}_{n} = \left( {N - 2n_{s} } \right)\sqrt {v\left( {\Delta B} \right)_{s,n} } L_{N*} \left( {\Delta B_{0} ,\sqrt {v\left( {\Delta B} \right)_{s,n} } } \right)$$
(38)

where \(L_{N*}\) is calculated as per Eq. 28, substituting \(v\left( {\Delta B} \right)_{s,n}\) in place of \(v\left( {\Delta B} \right)_{0}\).

$$\begin{aligned} v\left( {\Delta B} \right)_{s,n} &= v\left( {\Delta B} \right)_{0} - v\left( {\Delta B} \right)_{1} \hfill \\ &= v\left( {\Delta B} \right)_{0} - \frac{{v\left( {\Delta b} \right)}}{{n_{0} + n_{s} }} \hfill \\ &= v\left( {\Delta B} \right)_{0} - \left( {\frac{{v\left( {\Delta b} \right)}}{{\frac{{v\left( {\Delta b} \right)}}{{v\left( {\Delta B} \right)_{0} }} + n_{s} }}} \right) \hfill \\ &= v\left( {\Delta B} \right)_{0} - \left( {\frac{1}{{v\left( {\Delta B} \right)_{0} }} + \frac{{n_{s} }}{{v\left( {\Delta b} \right)}}} \right)^{ - 1} \hfill \\ \end{aligned}$$
(39)

where \(n_{0}\) is the sample size associated with the prior and \(v\left( {\Delta b} \right)\) is the sum of the sample variances of b in each arm:

$$v\left( {\Delta b} \right) = v\left( {b_{T} } \right) + v\left( {b_{C} } \right)$$
(40)

where \(v\left( {b_{T} } \right)\) and \(v\left( {b_{C} } \right)\) are calculated as per Eq. 10.

3.5.2 Example

Continuing the example above, suppose \(v(b_{T}) = v(b_{C})\) = £50,000,000, thus \(v(\Delta b)\) = £100,000,000 (obtained either from previous studies as per Eq. 40 or via elicitation as described above). Let λ = £20,000 and suppose a study of sample size n = 100 per arm is proposed. First calculate the square root of the (expected) reduction in variance of mean INB (Eq. 39):

$$\sqrt {v\left( {\Delta B} \right)_{s,100} } = \sqrt {1,500^{2} - \left( {\frac{1}{{1,500^{2} }} + \frac{100}{100,000,000}} \right)^{ - 1} } = {\text{\pounds}}1,248$$

The EVSI is then the unit normal loss multiplied by the reduction in standard error and by the beneficial population as shown previously (Eq. 38):

$$\begin{gathered} {\text{EVSI}} = \left( {10,000 - 2 \times 100} \right) \times 1,248 \times L_{{N^{ *} }} \left( {1,000, 1,248} \right) \hfill \\ = \left( {10,000 - 2 \times 100} \right) \times 1,248\left( {\phi \left( {\frac{{\left| {1,000} \right|}}{1,248}} \right) - \frac{{\left| {1,000} \right|}}{1,248}\left[ {\varPhi \left( { - \frac{{\left| {1,000} \right|}}{1,248}} \right) - 0} \right]} \right) \hfill \\ = 9,800 \times 1,248 \times 0.1202 \hfill \\ = {\text{\pounds}}1.467\,{\text{m}} \hfill \\ \end{gathered} .$$

As with the previous examples, the numbers presented here are subject to rounding errors. Full working and Excel code are in ESM, Appendix 2, Sheet 1, Cells B12:D20.
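The same EVSI calculation can be sketched in Python as a cross-check on the arithmetic. This is an illustrative sketch, not the ESM spreadsheet: the function and variable names are assumptions introduced here, and the unit normal loss again omits the indicator term, which is zero for a positive mean INB.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def unit_normal_loss(mean_inb, sd):
    """phi(D) - D * Phi(-D) with D = |mean_inb| / sd (mean_inb > 0 assumed)."""
    d = abs(mean_inb) / sd
    return STD_NORMAL.pdf(d) - d * STD_NORMAL.cdf(-d)


def variance_reduction(var_prior, var_sample, n_per_arm):
    """Prior minus expected posterior variance of mean INB:
    v0 - (1/v0 + n_s / v(delta b))^-1."""
    return var_prior - 1.0 / (1.0 / var_prior + n_per_arm / var_sample)


def evsi(mean_inb, var_prior, var_sample, n_per_arm, population):
    """Population EVSI for a trial with n_per_arm patients in each of two arms."""
    sd_red = sqrt(variance_reduction(var_prior, var_sample, n_per_arm))
    if sd_red == 0:
        return 0.0  # no sample, no information
    beneficial = population - 2 * n_per_arm  # enrolees assumed not to benefit
    return beneficial * sd_red * unit_normal_loss(mean_inb, sd_red)


# Inputs from the worked example
mean_inb = 1_000           # prior mean INB
var_prior = 1_500**2       # prior variance of mean INB
var_sample = 100_000_000   # v(delta b) = v(b_T) + v(b_C)
population = 10_000        # patient population N

sd_red_100 = sqrt(variance_reduction(var_prior, var_sample, 100))
evsi_100 = evsi(mean_inb, var_prior, var_sample, 100, population)
print(f"Reduction in SE for n = 100 per arm: {sd_red_100:,.0f}")
print(f"EVSI for n = 100 per arm: {evsi_100:,.0f}")
```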

3.6 Expected Net Gain of Sampling

The expected net gain of sampling (ENGS) is the expected gain from the trial (i.e., EVSI) less the cost of sampling (total cost [TC], Eqs. 41, 42). Note that both the EVSI and TC (and thus ENGS) are functions of \(n_{s}\). The calculations should be repeated for a wide range of values of \(n_{s}\), and the optimal \(n_{s}\) (denoted n*) is that which maximises the ENGS.

3.6.1 Equation Set 5: Expected Net Gain of Sampling

$${\text{TC}}_{n} = C_{f} + 2n_{s} C_{v} + n_{s} \left| {\Delta B_{0} } \right|$$
(41)

where \(C_{f}\) is the fixed cost of sampling and \(C_{v}\) is the variable (per patient) cost of sampling.

$${\text{ENGS}}_{n} = {\text{EVSI}}_{n} - {\text{TC}}_{n}$$
(42)

3.6.2 Example

Suppose the fixed costs of a trial total £50,000 and the variable cost is £250 per patient enrolled. A trial of size n = 100 per arm would therefore cost (Eq. 41):

$${\text{TC}}_{n = 100} = 50,000 + 2 \times 100 \times 250 + 100 \times 1,000 = \pounds200,000$$

The ENGS of a trial of 100 patients in each arm is thus £1.467 m − £0.2 m = £1.267 m. As this is greater than zero, this trial would be worthwhile; however, the calculations should be repeated for a range of values of \(n_{s}\) to identify the ENGS-maximising \(n_{s}\) (denoted n*). Figure 4 shows the ENGS for a range of sample sizes, identifying the optimum at approximately 200 patients per arm (see ESM, Appendix 2 for calculations).

Fig. 4 Expected value of sample information, total cost and expected net gain of sampling by sample size. \({\text{EVSI}}_{n}\) = expected value of sample information of a study of sample size n per arm, \({\text{TC}}_{n}\) = total cost of a study of sample size n per arm, \({\text{ENGS}}_{n}\) = expected net gain of sampling from a study of sample size n per arm
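The search for the ENGS-maximising sample size can be sketched as a simple grid search in Python. This is a minimal sketch under the example's inputs, not the ESM calculation: the function names and the search range of 1 to 1,000 patients per arm are assumptions made here for illustration, and the unit normal loss omits the indicator term because mean INB is positive.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

# Inputs from the worked examples
MEAN_INB = 1_000           # prior mean INB
VAR_PRIOR = 1_500**2       # prior variance of mean INB
VAR_SAMPLE = 100_000_000   # v(delta b)
POPULATION = 10_000        # patient population N
FIXED_COST = 50_000        # C_f
VARIABLE_COST = 250        # C_v, per patient enrolled


def unit_normal_loss(mean_inb, sd):
    """phi(D) - D * Phi(-D) with D = |mean_inb| / sd (mean_inb > 0 assumed)."""
    d = abs(mean_inb) / sd
    return STD_NORMAL.pdf(d) - d * STD_NORMAL.cdf(-d)


def evsi(n_per_arm):
    """Population EVSI for a two-arm trial with n_per_arm patients per arm."""
    var_red = VAR_PRIOR - 1.0 / (1.0 / VAR_PRIOR + n_per_arm / VAR_SAMPLE)
    sd_red = sqrt(var_red)
    return (POPULATION - 2 * n_per_arm) * sd_red * unit_normal_loss(MEAN_INB, sd_red)


def total_cost(n_per_arm):
    """Fixed cost, variable cost for both arms, and the opportunity cost of
    the n_per_arm patients allocated to the arm with lower expected INB."""
    return FIXED_COST + 2 * n_per_arm * VARIABLE_COST + n_per_arm * abs(MEAN_INB)


def engs(n_per_arm):
    """Expected net gain of sampling: EVSI less total cost."""
    return evsi(n_per_arm) - total_cost(n_per_arm)


# Grid search over candidate sample sizes (range is an illustrative choice)
candidates = range(1, 1001)
n_star = max(candidates, key=engs)

print(f"TC at n = 100 per arm: {total_cost(100):,.0f}")
print(f"ENGS at n = 100 per arm: {engs(100):,.0f}")
print(f"ENGS-maximising n per arm: {n_star}")
```

A grid search is used here because the ENGS is cheap to evaluate analytically; a numerical optimiser would be an alternative design choice where each evaluation requires simulation.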

4 Discussion and Conclusion

This paper aims to provide a ‘hands on’ guide to using VoI, providing a working template to assist readers in conducting their own analyses. The worked examples show the analytic approach, whilst the numeric approach is detailed in Appendix 1. Both have their respective advantages and disadvantages. The major advantage of the analytic approach is that it is fast to calculate, and is not subject to the random ‘noise’ (Monte Carlo error) intrinsic to simulation methods. The major disadvantage is the assumption of normally distributed parameters. Conversely, the advantage of the numeric approach is its flexibility with regard to the distributional form of both input and output parameters; however, it can be time consuming to run sufficient simulations to minimise Monte Carlo errors. Comparisons of the results of the analytic and numeric approaches to the same decision problem would be a useful addition to the literature.

Steuten et al. [27] recently conducted a systematic review of the literature covering both development of methods and application of VoI. The review identified a roughly 50/50 split between methodological and applied examples. Amongst the applications, most succeeded in calculating the EVPI and/or EVPPI, but very few went on to calculate the EVSI. A possible reason for this could be the computational burden, with some analyses requiring weeks of computer processing time. Steuten and colleagues [27] acknowledge a number of studies concerned with efficient computation of EVSI, and conclude with a recommendation that future research should focus on making VoI applicable to the needs of decision makers.

There are a number of methodological challenges that have arisen in adapting VoI to the healthcare sector, the most important of which is defining the scope of the benefits from the proposed trial. In the case of a firm conducting market research, the expected net benefit of the research is simply the net impact on expected profit. However, healthcare applications usually seek to inform policy decisions for the benefit of a population. Most economic evaluations express the INB on a per-patient scale. Thus, the EVPI and EVSI are also expressed per patient. To estimate the gain to the health economy, the EVPI and EVSI must be multiplied by the patient population. However, defining this population is far from straightforward. Those who could potentially benefit from the information include the prevalent cohort with the disease in question and/or the future incident population. Whilst it may be possible to estimate the future incidence and prevalence of the disease with a reasonable degree of accuracy, the time horizon over which the incidence should be calculated is unclear. Most studies use 10–20 years as a de facto standard (and discount the benefit to future populations at the prevailing rate), but without any clear justification [28]. This is of concern as the VoI statistics can be highly sensitive to the time horizon.

After determining the relevant prevalence and incidence, it is argued that patients who participate in the study will not benefit from the information yielded (although this depends on whether the condition is acute or chronic [29]). Therefore, the beneficial population is usually reduced by the numbers of patients enrolled in a study [18, 26]. Likewise, patients enrolled in the ‘inferior’ arm of a study incur an opportunity cost equal to the foregone INB per patient (which is usually added to the total cost of conducting the study). The impact of these issues on the overall value of information depends on the size of the patient population relative to those enrolled in the trial. For a common disease such as asthma or diabetes, trial enrolees will comprise a very small proportion of the total population. However, for rarer diseases, accounting for the opportunity cost of trial enrolees may affect the optimal sample size calculations substantially.

A number of other issues in adapting VoI to the healthcare setting relate to the independence (or lack thereof) of the adoption and research decisions. Whilst conceptually separate, they are not independent of one another as (i) if the adoption decision is delayed whilst new research is underway, there will be an opportunity cost to those who could have benefited if the technology does indeed have a positive INB [30] (and vice versa: if the technology actually has a negative INB and it is adopted with a review of the decision following further research then patients would have been better off with the old treatment); and (ii) if there are considerable costs associated with reversing a decision [17]; for example, retraining of staff or costly conversion of facilities to other uses ([31] cited in [17]).

The former issue has the potential to dramatically reduce the expected value of information: if the time horizon for the analysis is 10 years, but it takes 5 years for a proposed study to be conducted and disseminated, the value of sample information could be (more than) halved. The latter issue can be addressed by adopting an option pricing approach borrowed from financial economics, where the expected value of a strategy to reject with the option to accept pending further evidence compared with a strategy of immediate adoption or rejection is calculated [30]. This requires adding in the (expected, present value) cost of future reversal to the cost of a strategy of immediate adoption [17], and comparing the net benefit of this with one of delay followed by investment.
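The ‘(more than) halved’ point can be illustrated with a short Python sketch. It assumes that the value of information scales with the discounted number of patients who can still benefit; the annual incidence of 1,000 and the 3.5 % discount rate are hypothetical values chosen purely for illustration and do not come from the worked examples.

```python
def discounted_population(annual_incidence, rate, start_year, end_year):
    """Discounted number of incident patients from start_year (inclusive)
    to end_year (exclusive), with year 0 undiscounted."""
    return sum(annual_incidence / (1 + rate) ** t
               for t in range(start_year, end_year))


annual_incidence = 1_000  # hypothetical patients per year
discount_rate = 0.035     # hypothetical annual discount rate
horizon_years = 10        # time horizon for the analysis
delay_years = 5           # time to conduct and disseminate the study

full = discounted_population(annual_incidence, discount_rate, 0, horizon_years)
after_delay = discounted_population(annual_incidence, discount_rate,
                                    delay_years, horizon_years)
share_remaining = after_delay / full

print(f"Share of discounted population still able to benefit: {share_remaining:.1%}")
```

Because the patients lost to the delay are those in the earliest, least heavily discounted years, the remaining share falls below one half whenever the discount rate is positive.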

The final issue relates to the nature of information as a public good: once in the public domain it is non-rival and non-excludable, meaning that consumption by one individual or group neither diminishes consumption by another, nor can that individual or group prevent another from consuming it. Ignoring other potential benefits to an economy from research (e.g., employment maintenance and prestige), this would lead to free riding as there is no reason for one jurisdiction (e.g., a state research funder) to pay for research when another can do so. Therefore, whilst the EVSI may suggest a particular study should be carried out, it may be strategically optimal to wait for another jurisdiction to undertake the research instead, depending on the transferability/generalisability of the results to the local jurisdiction. This could lead to a sub-optimal (Nash) equilibrium with a failure to carry out research that would be beneficial to both jurisdictions. Alternatively, there may be a global optimal allocation of patients across jurisdictions in a particular trial, dependent on the relative costs and benefits in each location [32].

In conclusion, VoI is a technique for quantifying the expected return on investment in research. This paper, along with the accompanying Excel files, is intended to provide a useful template that can be readily adapted to other situations.