The Benefit-Cost Ratio as a Decision Criteria When Managing Catastrophes

Previous work has shown that non-marginal projects are interdependent. This implies that policies to manage catastrophes should not be evaluated in isolation but in conjunction with each other. As long as relative risk aversion is sufficiently high, the benefits of averting one catastrophe depend positively on the background risk created by other catastrophes. Because this bias has a known direction, it is possible to derive upper and lower boundaries on the willingness to pay to manage catastrophes and on the optimal policy. These boundaries can be used to infer which catastrophes should and should not be averted, and in which order. The upper and lower boundaries depend only on the individual catastrophe's benefit-cost ratio and the coefficient of risk aversion, both of which are easy to identify using standard economic frameworks.


Introduction
Society is facing multiple catastrophic threats, but resources to manage them are limited. How do we decide which catastrophes to manage, and in which order? The standard economic tool for evaluating projects is cost-benefit analysis (CBA), but if the net benefits of a project are large compared to aggregate consumption, then CBA can give biased results (Dasgupta et al. 1972). Martin and Pindyck (2015) argue that because catastrophes are non-marginal events, policies to avert them should not be evaluated in isolation. They find that the problem can only be approximated by standard CBA if the total benefits and individual costs are sufficiently small. In Martin and Pindyck (2015), the interdependence between projects occurs because the benefit of averting one catastrophe depends positively on the background risk created by the existence of other catastrophes. Background risk decreases future consumption, which in turn increases expected future marginal utility, and therefore also the benefits of managing or even averting catastrophes. The benefits of a project to manage one catastrophe therefore depend on whether or not projects to manage other catastrophes are carried out. Martin and Pindyck (2015) is not the only study that notes the potential bias of CBA. Hoehn and Randall (1989) show how standard CBA is systematically biased when the number of projects is large. The economy's capacity provides an upper bound on net benefits that is only evident when projects are evaluated together or sequentially, whereas the standard measure of net benefits is unbounded. Thus, when the number of projects is large, net benefits are overestimated. Dietz and Hepburn (2013) examine the conditions under which CBA can lead to biased results when evaluating large projects. They show that using CBA to evaluate non-marginal climate and energy projects can result in sub-optimal solutions, and find that the source of the error is the elasticity of marginal utility.
Furthermore, Tsur and Zemel (2017) study intertemporal policies for managing multiple catastrophes where efforts to alleviate a catastrophe can be smoothed out over time. In their study they find that background risk can both increase and decrease the benefits of averting a catastrophe.
The direction of the bias caused by background risk in the Martin and Pindyck (2015) framework is specific, making it possible to create upper and lower boundaries on willingness to pay and the optimal scaling of all policy measures. The optimal scaling is increasing in the level of background risk. The lower boundary is defined as the optimal policy in the absence of background risk and requires no information about the other potential catastrophes. The upper boundary is equivalent to the optimal policy in the presence of background risk when the policy measure is evaluated in isolation.
This paper aims to illustrate how these boundaries can be used to make inferences on which catastrophes should be averted, and in which order. The goal of the new decision criterion is to provide correct qualitative guidance for decision-makers. The benefit-cost ratio of the individual catastrophe is the key determinant in the decision criterion. Thus, evaluating policy measures in isolation can provide the social planner with enough information to decide which catastrophes should be averted. The decision criterion proposed in this paper can replicate the numerical examples in Martin and Pindyck (2015) using less information and a method that requires less computational skill.

Consumption, Welfare and Willingness to Pay
The framework builds on Martin (2008, 2013), which extends the lognormal consumption-based asset-pricing model to allow for the combination of general independent and identically distributed (i.i.d.) consumption growth and power utility. In continuous time this allows log consumption to follow a Lévy process. Martin and Pindyck (2015) use this result to explore policy interdependencies caused by background risk arising from catastrophes, and to derive decision rules for which catastrophes to avert. Assume society is facing $N > 1$ catastrophes. If catastrophe $i$ occurs, it causes a permanent drop in log consumption ($c_t$) equal to the random amount $\phi_i$. I normalize consumption such that at $t = 0$, $C_0 = 1$. Log consumption follows a Poisson process,

$$c_t = \log C_t = g t - \sum_{i=1}^{N} \sum_{n=1}^{Q_i(t)} \phi_{i,n},$$

where $t$ is the time subscript and $g$ is the growth rate of the economy. $Q_i(t)$ is the Poisson counting process for catastrophe $i$ with known mean arrival rate $\lambda_i$. Preferences are represented using a constant relative risk aversion (CRRA) utility function, $U(C) = \frac{C^{1-\eta}}{1-\eta}$, where $\eta$ is the rate of relative risk aversion. Unless noted otherwise, in the rest of the paper I assume that $\eta > 1$. The choice of utility function and its implications for the paper's primary results are discussed in section five.
The expected present value of welfare is given by the expression

$$E \int_0^\infty e^{-\delta t}\, U(C_t)\, dt = \frac{1}{1-\eta} \int_0^\infty e^{-\delta t}\, E\left[C_t^{1-\eta}\right] dt,$$

where $\delta$ is the rate of time preference. I follow Martin (2008, 2013) and Martin and Pindyck (2015), and introduce a cumulant generating function (CGF). The CGF describes the probability distribution in a useful and compact way, and helps find $E[C_t^{1-\eta}]$. The CGF at time $t$ is defined as

$$\kappa_t(\theta) \equiv \log E\left[e^{\theta c_t}\right] = \log E\left[C_t^{\theta}\right].$$

Because log consumption is a Poisson process, the one-period CGF is linearly scalable in $t$, such that $\kappa_t(1-\eta) = \kappa(1-\eta)\, t$. Using the law of iterated expectations, the CGF for time $t$ is

$$\kappa_t(\theta) = \left[ g\theta + \sum_{i=1}^{N} \lambda_i \left( E\left[e^{-\theta \phi_i}\right] - 1 \right) \right] t.$$

The expected present value of welfare is then

$$\frac{1}{1-\eta} \int_0^\infty e^{-\delta t}\, e^{\kappa(1-\eta)\, t}\, dt = \frac{1}{1-\eta} \cdot \frac{1}{\delta - \kappa(1-\eta)}.$$

For each catastrophe $i$ there exists a policy measure that can reduce the likelihood of the catastrophe occurring. To make a clear distinction between introducing multiple policy measures in isolation and introducing a set of policy measures simultaneously, I introduce two different policy impact vectors. The single impact policy vector contains only the scaling of policy measure $i$, $p_i$, and the multiple impact policy vector, $\bar{\mathbf{p}}$, represents the scaling of all policy measures. The mean arrival rate of catastrophe $i$, $\Lambda_i(p_i)$, is a non-increasing and convex function of $p_i$. The willingness to pay for the isolated effect of policy $p_i$ in the presence of $N - 1$ other catastrophes, $w_i^N(p_i)$, is the permanent share of consumption that society would give up to implement the policy. It is non-decreasing and concave in $p_i$. If all catastrophes are identical, the willingness to pay is increasing in the number of other catastrophes: the more catastrophes lurking in the background, the higher is the willingness to pay for policy measure $i$. Since the willingness to pay for policy measure $i$ in isolation depends positively on the existence of all other catastrophes, it is bounded below by the willingness to pay for policy measure $i$ when no other catastrophes exist, $w_i^1(p_i)$.
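As a rough numerical sketch of the CGF and welfare expressions, the snippet below assumes the power-distributed shocks used later in the Example section, for which $E[e^{-\theta \phi_i}] = \beta_i / (\beta_i + \theta)$. The function names (`kappa`, `welfare`, `wtp_avert`) and all parameter values are illustrative assumptions rather than part of the original analysis, and `wtp_avert` covers only the special case of fully eliminating one catastrophe.

```python
def kappa(theta, g, catastrophes):
    """One-period CGF of log consumption growth.

    catastrophes: list of (lam, beta) pairs. z = exp(-phi) is assumed to be
    power-distributed with density beta * z**(beta - 1) on [0, 1], so that
    E[exp(-theta * phi)] = E[z**theta] = beta / (beta + theta).
    """
    total = g * theta
    for lam, beta in catastrophes:
        if beta + theta <= 0:
            raise ValueError("need beta + theta > 0 for a finite expectation")
        total += lam * (beta / (beta + theta) - 1.0)
    return total


def welfare(eta, delta, g, catastrophes):
    """Expected present value of CRRA welfare: 1 / ((1 - eta)(delta - kappa(1 - eta)))."""
    denom = delta - kappa(1.0 - eta, g, catastrophes)
    if denom <= 0:
        raise ValueError("the welfare integral diverges")
    return 1.0 / ((1.0 - eta) * denom)


def wtp_avert(i, eta, delta, g, catastrophes):
    """Permanent share of consumption worth giving up to eliminate catastrophe i."""
    without_i = catastrophes[:i] + catastrophes[i + 1:]
    ratio = (delta - kappa(1.0 - eta, g, without_i)) / (
        delta - kappa(1.0 - eta, g, catastrophes)
    )
    return 1.0 - ratio ** (1.0 / (1.0 - eta))


# Hypothetical example: the same catastrophe, alone and with one other present.
cats = [(0.02, 5.0), (0.04, 17.0)]
w_alone = wtp_avert(0, 2.0, 0.02, 0.02, cats[:1])
w_with_background = wtp_avert(0, 2.0, 0.02, 0.02, cats)
```

With these hypothetical inputs the willingness to pay is higher when the second catastrophe is present, which is the ordering the lower bound above relies on.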
Thus, $w_i^1(p_i) \le w_i^N(p_i)$. The fewer catastrophes that lurk in the background, or the less severe they are, the lower is background risk and the willingness to pay to manage catastrophe $i$. If the social planner introduces a policy to reduce the likelihood of catastrophe $j$, this will decrease the willingness to pay to manage catastrophe $i$. Martin and Pindyck (2015) argue that because of this interdependence, policies to avert catastrophes should not be evaluated in isolation unless the total benefits of averting the catastrophes are sufficiently small. The relationship between $w_i^N(p_i)$ and $w_i^1(p_i)$ is

$$\left(1 - w_i^N(p_i)\right)^{1-\eta} - 1 = \frac{\delta - g(1-\eta) - \Phi_i}{\delta - g(1-\eta) - \sum_{j=1}^{N} \Phi_j} \left[ \left(1 - w_i^1(p_i)\right)^{1-\eta} - 1 \right],$$

where $\Phi_i$ is used to denote the level of background risk arising from catastrophe $i$, such that $\Phi_i = \lambda_i \left( E\left[e^{(\eta - 1)\phi_i}\right] - 1 \right)$. Let $\bar{\mathbf{p}}$ be a $1 \times N$ multiple impact policy vector, which contains the scaling of all $N$ policy measures. The willingness to pay for policy $p_i$, $w_i^N(\bar{\mathbf{p}})$, now depends on the whole policy vector $\bar{\mathbf{p}}$. Since $\Lambda_i(p_i)$ is non-increasing and $\Lambda_i(p_i) \in [0, \lambda_i]$, the willingness to pay for policy measure $i$ is bounded both below and above,

$$w_i^1(p_i) \le w_i^N(\bar{\mathbf{p}}) \le w_i^N(p_i).$$

Given that $\eta > 1$, the optimal policy $p_i^*$ will similarly be bounded above and below. The next section shows how these boundaries can be used to decide sequentially which catastrophes should be averted.

Optimal Policy
The willingness to pay determines the benefits of managing the catastrophe. The benefit of averting catastrophe $i$ is $B_i^N = (1 - w_i^N)^{1-\eta} - 1$, the percentage loss of utility when consumption is reduced by $w_i^N$ percent. The cost of introducing policy measure $i$ is a permanent tax on consumption, $T_i(p_i)$. For simplicity, assume that both the tax and the mean arrival rate are linear in $p_i$, so that $T_i(p_i) = \tau_i p_i$ and $\Lambda_i(p_i) = \lambda_i (1 - p_i)$. The parameter $\tau_i$ can be interpreted as the cost effectiveness of the policy. If the social planner focuses on catastrophe $i$ only, she chooses a policy scaling $p_i$ that solves

$$\max_{p_i} \; \frac{(1 - \tau_i p_i)^{1-\eta}}{1-\eta} \cdot \frac{1}{\delta - \kappa^N(1-\eta;\, p_i)}, \qquad (7)$$

where $\kappa^N(1-\eta;\, p_i)$ is the CGF evaluated with $\Lambda_i(p_i)$ in place of $\lambda_i$. Let $p_i^N$ be the optimal scaling of policy measure $i$ in isolation when $N - 1$ other potential catastrophes are lurking in the background. Solving (7) and using the definition of $B_i^N$ from above, the optimal scaling of policy measure $i$ in isolation is

$$p_i^N = \frac{1}{\eta} \left( \frac{1}{\tau_i} - \frac{\eta - 1}{B_i^N} \right).$$

The lower boundary $p_i^1$ has the same form with $B_i^1$ in place of $B_i^N$, where $B_i^1$ is the benefit of averting catastrophe $i$ when there are no other catastrophes present. Given the boundaries on willingness to pay derived earlier, we know that $B_i^1 < B_i^N$, and therefore $p_i^1 < p_i^N$. Two factors determine the size of $p_i^N$ and $p_i^1$. The first is the rate of relative risk aversion $\eta$. The second, and most interesting, is the benefit-cost ratio, $B_i^N / \tau_i$. The larger the benefit-cost ratio is, the larger is $p_i^N$. Note that the lower boundary, $p_i^1$, depends only on the characteristics of catastrophe $i$ and requires no information about other potential catastrophes.
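The closed form for the isolated optimum can be cross-checked numerically. The sketch below assumes the linear tax and arrival rate stated above, writes $a$ for the risk term $\lambda_i (E[e^{(\eta-1)\phi_i}] - 1)$ and $D$ for $\delta - \kappa^N(1-\eta)$ (both shorthand introduced only for this illustration), and uses hypothetical parameter values chosen to give an interior solution.

```python
def p_isolated(eta, tau, B):
    """Unconstrained optimal scaling (1/eta) * (1/tau - (eta - 1)/B)."""
    return (1.0 / eta) * (1.0 / tau - (eta - 1.0) / B)


def welfare_given_p(p, eta, tau, a, D):
    """Welfare when policy i is scaled to p.

    a : risk term lam_i * (E[exp((eta - 1) * phi_i)] - 1) of catastrophe i
    D : delta - kappa(1 - eta) with all catastrophes present and no policy
    Scaling the policy to p raises the denominator from D to D + a * p and
    multiplies consumption by (1 - tau * p).
    """
    return (1.0 - tau * p) ** (1.0 - eta) / ((1.0 - eta) * (D + a * p))


def p_grid_search(eta, tau, a, D, steps=10000):
    """Brute-force maximiser on [0, 1], used only as a cross-check."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda p: welfare_given_p(p, eta, tau, a, D))


eta, tau, a, D = 2.0, 0.12, 0.005, 0.035  # hypothetical values
B = a / D                                  # benefit of fully averting
p_closed = p_isolated(eta, tau, B)
p_numeric = p_grid_search(eta, tau, a, D)
```

Under these assumptions the grid search and the closed form agree to within the grid spacing, which suggests the formula is the interior maximiser of (7) for this parameterisation.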
Because of the interdependence caused by background risk, the social planner should not solve (7) repeatedly for all $i = 1, \ldots, N$. Instead, she should choose a policy vector $\bar{\mathbf{p}}$ that solves the problem

$$\max_{\bar{\mathbf{p}}} \; \frac{\prod_{i=1}^{N} (1 - \tau_i p_i)^{1-\eta}}{1-\eta} \cdot \frac{1}{\delta - \kappa^N(1-\eta;\, \bar{\mathbf{p}})}. \qquad (9)$$

When evaluating all policy measures in conjunction with each other, solving for $p_i$ as a function of all other policies, $\bar{\mathbf{p}}$, gives the following optimal policy response function

$$p_i^*(\bar{\mathbf{p}}) = p_i^N - \frac{\eta - 1}{\eta} \sum_{j \neq i} \frac{B_j^N}{B_i^N}\, p_j, \qquad (10)$$

where $B_i^N$ is the benefit of averting catastrophe $i$ in isolation (as before). For proof see "Appendix D". The optimal scaling of policy measure $i$ in isolation, $p_i^N$, is the main determinant in the policy response function. The presence of other policies decreases the optimal scaling of policy $i$. The higher the benefit-cost ratio of $i$ is, the larger is $p_i^N$ and the more robust is the optimal scaling of policy $i$ to the presence of other projects. The last term also shows that the larger the benefits of policy $i$ are, the less sensitive $p_i^*$ is to the presence of other policies. However, this also implies that $p_j^*$ would be relatively more sensitive to $p_i$.
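A sketch of how a response function of this kind can be obtained from the first-order condition, assuming that taxes enter multiplicatively and that the tax and arrival rates are linear in $p_j$. The shorthand $a_j \equiv \lambda_j (E[e^{(\eta-1)\phi_j}] - 1)$ and $D \equiv \delta - \kappa^N(1-\eta)$ is introduced only for this illustration, so that $B_j^N = a_j / D$.

```latex
% With eta > 1, maximising welfare is equivalent to minimising
% f = \prod_j (1 - \tau_j p_j)^{1-\eta} / (D + \sum_j a_j p_j).
\log f(\bar{\mathbf{p}}) = (1-\eta) \sum_{j} \log(1 - \tau_j p_j)
    - \log\Big( D + \sum_{j} a_j p_j \Big)

% First-order condition with respect to p_i:
\frac{(\eta - 1)\,\tau_i}{1 - \tau_i p_i} = \frac{a_i}{D + \sum_{j} a_j p_j}

% Solving for p_i and substituting B_j^N = a_j / D:
p_i = \frac{1}{\eta} \Big( \frac{1}{\tau_i} - \frac{\eta - 1}{B_i^N} \Big)
    - \frac{\eta - 1}{\eta} \sum_{j \neq i} \frac{B_j^N}{B_i^N}\, p_j
```

Setting $p_j = 1$ for all $j \neq i$ in the last line gives the same value as the isolated optimum computed with no other catastrophes present, which is consistent with the lower boundary discussed in the text.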

The last term in (10) is negative for all combinations of $i$ and $j$. Therefore,

$$p_i^1 \le p_i^* \le p_i^N.$$

The optimal policy is bounded below by the optimal policy in isolation when there is no background risk and bounded above by the optimal policy in isolation. Note that $p_i^N$ is a linear and increasing function of the level of background risk, and that when background risk is zero $p_i^N = p_i^1$. Increasing background risk increases the distance between $p_i^N$ and $p_i^1$.

From these results, I make three propositions. Propositions 1 and 2 formulate two general rules for choosing which projects to undertake and which not to undertake, and how to scale them. Proposition 1 makes it possible for the social planner to edit the optimal set without any knowledge of the characteristics of the other catastrophes. Proposition 2 requires knowledge of the background risk, but allows the social planner to edit the optimal set further using a straightforward decision rule. Proposition 3 provides guidance on which catastrophes should be averted first if the social planner chooses sequentially. Propositions 1 and 2 follow from the results in Martin and Pindyck (2015), but Proposition 3 does not.
Proposition 1 If it is optimal to avert ($p_i^1 \ge 1$) or manage ($p_i^1 > 0$) catastrophe $i$ in the absence of all other catastrophes, then it is optimal to avert ($p_i^* \ge 1$) or manage ($p_i^* > 0$) the catastrophe in the presence of any subset of catastrophes.
Proposition 2 If it is optimal to do nothing to manage catastrophe $i$ in the presence of all catastrophes ($p_i^N \le 0$), then it is optimal to do nothing to manage the catastrophe in the presence of any subset of catastrophes ($p_i^* \le 0$).
Solving (9) requires us to solve a set of $N$ equations, and requires knowledge of the benefits and costs of all catastrophes. The main goal of Propositions 1 and 2 is to show how we can make inferences on the optimal policy without evaluating the full set of $N$ equations. Proposition 3 provides the social planner with guidance on sequential choice.
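Propositions 1 and 2 can be read as a simple classification rule on the two policy boundaries. The sketch below is one possible encoding; the function name `classify`, the labels, and the numbers in the usage lines are hypothetical.

```python
def classify(p_lower, p_upper):
    """Apply Propositions 1 and 2 to the policy boundaries of one catastrophe.

    p_lower : optimal scaling with no background risk (p_i^1)
    p_upper : optimal scaling in isolation with full background risk (p_i^N)
    """
    if p_lower >= 1.0:
        return "avert"        # Proposition 1: p* >= 1
    if p_lower > 0.0:
        return "manage"       # Proposition 1: do something, p* >= p_lower
    if p_upper <= 0.0:
        return "do nothing"   # Proposition 2: p* <= 0
    return "undecided"        # falls between the two propositions


# Hypothetical boundaries for four catastrophes.
labels = [classify(1.7, 2.4), classify(0.5, 1.3),
          classify(-2.0, -0.1), classify(-2.0, 2.6)]
```

The last case, where the lower boundary is non-positive and the upper boundary is positive, is the one the two propositions cannot settle on their own.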

Proposition 3 Catastrophe $i$ dominates catastrophe $j$ if both the upper and lower policy boundaries of catastrophe $i$ are larger than the upper and lower policy boundaries of catastrophe $j$. If it is optimal to avert catastrophe $i$, and catastrophe $i$ dominates all other catastrophes, it is optimal to avert catastrophe $i$ first.

Thus, if catastrophe $i$ dominates $j$ in the upper and lower boundary, it always dominates $j$. ◻

Proposition 3 states that in some cases, it is possible to decide which catastrophe is the most serious, avert that, and then decide whether to avert the other catastrophes. The three propositions simplify policy decision-making by allowing the decision maker to make choices concerning the optimal policy without solving a complex optimization problem that requires a lot of information. The propositions are intuitively easy to understand and rely only on the benefit-cost ratio and knowledge of the curvature of the utility function.

The major shortcoming of the proposed method is that there may exist catastrophes that fall in between Propositions 1 and 2. These catastrophes are characterized by the following: when evaluated in isolation, it is optimal to avert catastrophe $i$ ($p_i^N \ge 1$), while it is optimal to do nothing ($p_i^1 \le 0$) when there is no background risk. The problem can be solved by using the information available to update the boundaries. As long as the optimal policy for one or more catastrophes can be derived using the upper or lower boundary, that information can be used to create new upper boundaries for the other catastrophes by updating the level of background risk.
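The dominance relation in Proposition 3 amounts to a pairwise comparison of policy intervals. A minimal sketch, with hypothetical names and interval values, could look like this:

```python
def dominates(bounds_i, bounds_j):
    """True if both the lower and upper boundary of i exceed those of j."""
    return bounds_i[0] > bounds_j[0] and bounds_i[1] > bounds_j[1]


def dominant_catastrophe(bounds):
    """Return the name whose interval dominates every other one, or None."""
    for name, b in bounds.items():
        if all(dominates(b, other)
               for key, other in bounds.items() if key != name):
            return name
    return None


# Hypothetical (lower, upper) policy boundaries.
clear_case = {"A": (2.0, 3.0), "B": (0.5, 2.5), "C": (-1.0, 0.4)}
mixed_case = {"A": (2.0, 3.0), "B": (0.5, 3.5)}
first = dominant_catastrophe(clear_case)
none_first = dominant_catastrophe(mixed_case)
```

In the second case the intervals overlap in a way that leaves no dominant catastrophe, which is when the boundaries would need to be updated before an ordering can be inferred.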

Example
In order to make the example similar to the work of Martin and Pindyck (2015), I use the same parameter values and functional forms. $z_i = e^{-\phi_i}$ is distributed according to the power distribution with parameter $\beta_i > 0$, such that $b(z_i) = \beta_i z_i^{\beta_i - 1}$ with $0 \le z_i \le 1$. Note that, given the functional form of the utility function, it is necessary that $\beta_i > \eta - 1$ for all $i$ to ensure that $\delta - \kappa^N(1 - \eta) > 0$. The growth rate is $g = 0.02$ and the rate of time preference is $\delta = 0.02$. The individual parameter values for the catastrophes are given in Table 1. The intuition behind each of the parameter values differs, and a full discussion of each one can be found in Martin and Pindyck (2015). The intention behind the numerical example is not to provide guidance on which of these catastrophes society should avert, but to illustrate how the criterion can be applied in practice. Because of large uncertainties surrounding the parameter values, the estimates and results in the example should be viewed as illustrative. It is also important to note that this framework is appropriate when analyzing consumption disasters, and not deadly catastrophes. The damage and the likelihood of a mega-virus, floods, storms, and earthquakes are roughly based on historical occurrences. The $\beta$ parameter for each of the catastrophes is calculated using the average drop in gross domestic product (GDP) the catastrophe has caused. For floods, storms, and earthquakes, the average drop is calculated to be approximately 1%. For climate change, the authors focus on catastrophic scenarios and assume that catastrophic climate change will cause a 20% drop in GDP. They further assume that the likelihood of a catastrophic climate event occurring in the next 50-60 years is 29%, which implies $\lambda = 0.004$. There are especially large uncertainties surrounding the value of $\tau$.
For example, the cost of preventing floods and storms is high because fully avoiding all impacts of such events involves costly measures such as the relocation of homes and other infrastructure. The authors assume that the cost of preventing floods and storms is 2%, while for earthquakes it is only 1% because many buildings in vulnerable areas are already earthquake-proof.
The numerical example in Martin and Pindyck (2015) builds on a discrete model where $p_i$ is either zero or one, while the extension introduced in this paper allows for $p_i \in [0, 1]$. Still, the continuous model in this paper and the discrete model in their paper provide the same results. Figures 1 and 2 show the upper and lower policy boundaries before editing (upper figure) and after the policy boundaries have been updated once (lower figure) for two different levels of risk aversion. In the initial stage, when risk aversion is low ($\eta = 2$), the social planner should avert the following catastrophes: mega-virus, nuclear terrorism, floods, bioterrorism, and storms. This result is derived using only Proposition 1, which requires no knowledge about the background risk level. Looking at the policy intervals, we see that the mega-virus catastrophe dominates all other catastrophes. If society can only avert one catastrophe, the social planner should focus on averting the mega-virus catastrophe. The social planner should next avert floods, then storms, and finally nuclear terrorism and bioterrorism. How good an approximation evaluation in isolation provides depends on the absolute distance between the boundaries. Evaluating the policy measure to manage the mega-virus catastrophe in isolation is a good approximation; for earthquakes, it is not. For earthquakes, the distance between the upper and lower boundary is large, and it is not possible to infer in the initial stage whether or not the catastrophe should be averted. Using the knowledge that these five catastrophes should be averted, I update the upper boundaries once by removing the background risk initially caused by the five catastrophes. The new boundaries make it easier to see that society should not avert the climate catastrophe or earthquakes.
When risk aversion is high ($\eta = 4$), the initial stage provides the social planner with information that it is optimal to avert the mega-virus catastrophe, the climate catastrophe, nuclear terrorism and floods. Initially no catastrophe dominates. Using this information, the boundaries are updated once. In the second stage, we see that bioterrorism, earthquakes and storms should not be averted. The second editing stage shows that the mega-virus catastrophe again dominates all the other catastrophes. For both levels of risk aversion, the proposed criterion replicates the results in Martin and Pindyck (2015) and the results from solving (9).
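The two-stage procedure described above can be sketched in a few lines. The `(lam, beta, tau)` values below are assumptions: they are not stated in this text, are chosen to be consistent with the figures quoted in it (for example $\lambda = 0.004$ for climate and taxes of 2% and 1%), and should be checked against Table 1 of Martin and Pindyck (2015). The closed form for the isolated optimum and the helper names are likewise part of this illustration only.

```python
# Assumed (lam, beta, tau) triples; NOT given in this text. Verify against
# Table 1 of Martin and Pindyck (2015) before relying on them.
PARAMS = {
    "mega-virus":        (0.02,  5.0,   0.02),
    "climate":           (0.004, 4.0,   0.04),
    "nuclear terrorism": (0.04,  17.0,  0.03),
    "bioterrorism":      (0.04,  32.0,  0.03),
    "floods":            (0.17,  100.0, 0.02),
    "storms":            (0.14,  100.0, 0.02),
    "earthquakes":       (0.03,  100.0, 0.01),
}
G, DELTA = 0.02, 0.02  # growth rate and rate of time preference from the text


def risk_term(lam, beta, eta):
    """lam * (E[z**(1 - eta)] - 1) for power-distributed z; needs beta > eta - 1."""
    return lam * (eta - 1.0) / (beta - (eta - 1.0))


def scaling(eta, tau, a, background):
    """Isolated optimum when the total risk still present equals `background`."""
    rho = DELTA - G * (1.0 - eta)
    B = a / (rho - background)
    return (1.0 / eta) * (1.0 / tau - (eta - 1.0) / B)


def two_stage(eta):
    """Stage 1: Proposition 1 on lower boundaries. Stage 2: updated upper boundaries."""
    a = {k: risk_term(lam, beta, eta) for k, (lam, beta, _) in PARAMS.items()}
    tau = {k: v[2] for k, v in PARAMS.items()}
    lower = {k: scaling(eta, tau[k], a[k], a[k]) for k in a}
    averted = {k for k in a if lower[k] >= 1.0}
    # Update: drop the risk created by catastrophes already known to be averted.
    remaining = sum(a.values()) - sum(a[k] for k in averted)
    updated_upper = {k: scaling(eta, tau[k], a[k], remaining)
                     for k in a if k not in averted}
    return averted, updated_upper


averted_low, upper_low = two_stage(2.0)
averted_high, upper_high = two_stage(4.0)
```

Under these assumed inputs the first stage returns the same sets of catastrophes to avert as reported in the text for both levels of risk aversion, and every updated upper boundary for the remaining catastrophes lies below one.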
The strategy of updating boundaries using the information that is available in the initial stage depends on there being any information available in the initial stage. There has to be one or more catastrophes that can be edited in or out of the optimal set using Propositions 1 or 2. Also, the strategy of updating boundaries, compared to solving a set of simultaneous equations, only makes sense if it is less complex and time-consuming. Figure 3 illustrates for which combinations of willingness to pay in isolation and tax Proposition 1 suggests that the social planner should avert the catastrophe or do something to manage the catastrophe, and for which combinations Proposition 2 suggests that the social planner should do nothing, given two levels of risk aversion and background risk.
Background risk is given as the sum suggested by the seven catastrophes in Martin and Pindyck (2015). The sum is used to find the upper boundary $p_i^N$ using (6). When risk aversion is low, background risk is 0.0136, and when risk aversion is high, background risk is 0.0652.
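The two background-risk totals can be reproduced with a short calculation, assuming power-distributed shocks so that each catastrophe contributes $\lambda_i (\eta - 1) / (\beta_i - \eta + 1)$. The `(lam, beta)` pairs below are assumptions that are not stated in this text; they should be verified against Table 1 of Martin and Pindyck (2015).

```python
# Assumed (lam, beta) pairs; not given in this text.
CATASTROPHES = {
    "mega-virus":        (0.02,  5.0),
    "climate":           (0.004, 4.0),
    "nuclear terrorism": (0.04,  17.0),
    "bioterrorism":      (0.04,  32.0),
    "floods":            (0.17,  100.0),
    "storms":            (0.14,  100.0),
    "earthquakes":       (0.03,  100.0),
}


def risk_term(lam, beta, eta):
    """lam * (E[exp((eta - 1) * phi)] - 1) when z = exp(-phi) is power-distributed."""
    if beta <= eta - 1.0:
        raise ValueError("need beta > eta - 1 for a finite expectation")
    return lam * (eta - 1.0) / (beta - (eta - 1.0))


def background_risk(eta):
    """Sum of the risk terms of all catastrophes."""
    return sum(risk_term(lam, beta, eta) for lam, beta in CATASTROPHES.values())


low = round(background_risk(2.0), 4)
high = round(background_risk(4.0), 4)
```

With these assumed inputs the rounded sums match the values 0.0136 and 0.0652 quoted above.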
There is a large set of combinations of willingness to pay in isolation and tax that allow the social planner to use Propositions 1 and 2 to make inferences on what the optimal policy may be. When risk aversion is low, 90% of the combinations allow us to make inferences on the optimal policy scaling using Propositions 1 and 2, while when risk aversion is high this number drops to 76%. Note that even in the area "do something" it is possible to make inferences about the size of $p_i^*$ since it is bounded below by $p_i^1$. If $p_i^1 = 0.5$, then the social planner knows that $0.5 \le p_i^* \le 1$. From Fig. 3 we also see that when willingness to pay and tax are low it is more likely that the conclusion will be avert or do nothing, but as the willingness to pay and the tax increase (even if the ratio remains the same), it is more likely that we will get an interior solution (do something). If $p_i^1 \le 0$ and $p_i^N \ge 0$, it is not possible to use Propositions 1 and 2 to make inferences on what the optimal policy is. Thus, whenever $p_i^1 \le 0 \le p_i^N$, the criterion proposed in this paper will yield inconclusive results. The larger the cost and the higher the background risk, the less likely it is that the criterion will provide any qualitative guidance on the optimal policy. Still, as seen in Fig. 3, for most combinations of

Welfare Framework and Effect of Background Risk
The Martin and Pindyck (2015) paper builds on Martin (2008, 2013), which extends the lognormal consumption-based asset-pricing model to allow for the combination of general i.i.d. consumption growth and power utility. Thus, the power utility assumption carries over both into Martin and Pindyck (2015) and into the follow-up paper by Tsur and Zemel (2017), which studies intertemporal policies for managing multiple catastrophes. The use of constant relative risk aversion preferences is common in the economic literature on catastrophic risk. For example, Barro (2009) uses power utility in his work on asset pricing puzzles and the welfare cost of disasters. Similarly, Bretschger and Vinogradova (2017) use constant relative risk aversion utility in their analyses of optimal policy response to environmental disasters. Moreover, Dietz and Hepburn (2013) also assume power utility when they investigate the error of using conventional CBA to evaluate non-marginal climate and energy projects. However, the application of expected utility is sensitive to assumptions about the shape of the probability distribution and the utility function. The combination of a probability distribution with heavy tails and power utility implies infinite expected utility and infinite expected marginal utility (Geweke 2001; Weitzman 2009). In the presence of heavy-tailed risk, the constant relative risk aversion utility function may not be the most appropriate choice (Millner 2013). As an alternative, Ikefuji et al. (2013) suggest Pareto utility as a more appropriate choice of utility function in the face of catastrophic risk. However, no analytical solution exists for the Martin and Pindyck (2015) model with Pareto utility.
The background risk in this paper is mean changing. Increases in background risk reduce expected future consumption because a catastrophic event reduces the growth rate. How the benefits of averting one catastrophe, and thus also the optimal policy scaling, are affected by an increase in background risk depends on two conflicting mechanisms. The presence of other catastrophes reduces expected future consumption. Thus, if catastrophe $i$ occurs, it would cause a smaller absolute drop in consumption than if there were no other catastrophic threats. This reduces the benefit of policy $i$. However, the reduction in expected future consumption raises future expected marginal utility. This effect raises the benefit of policy $i$, because the loss in welfare caused by catastrophe $i$ is greater when total consumption is low (Martin and Pindyck 2015). Which of the two effects dominates depends on the curvature of the utility function. The latter dominates if the coefficient of relative risk aversion is sufficiently large. Thus, the direction of the bias caused by background risk in the Martin and Pindyck (2015) framework is specific, and is tied to the choice of utility function and the coefficient of constant relative risk aversion.
To illustrate this, assume that the net welfare of policy $p_i$ in the presence of background risk can be approximated using a first-order Taylor expansion of $W(\cdot)$. The effect of increases in background risk on the welfare change of policy $p_i$ then depends on $\bar{C}$, the expected level of consumption society can enjoy under policy $p_i$. $\bar{C}$ is given by (3) and the cost of $p_i$, so $\bar{C} = \exp\left( g + \lambda_i (1 - p_i)(E e^{-\phi_i} - 1) - \Phi \right)(1 - \tau_i p_i)$. Note that $\partial \bar{C} / \partial \Phi < 0$. Following the common assumption that welfare is an increasing and concave function of consumption, and assuming that it is optimal to do something ($p_i > 0$), increases in background risk increase the net welfare of policy $p_i$ whenever

$$R(\bar{C}) \equiv -\frac{\bar{C}\, W''(\bar{C})}{W'(\bar{C})} > 1, \qquad (12)$$

where $R(\bar{C})$ is the Arrow-Pratt measure of relative risk aversion, also referred to as the coefficient of relative risk aversion.
For any functional form of $W(\cdot)$ that satisfies (12), the optimal scaling of policy $i$ is increasing in the level of background risk. For constant relative risk aversion utility $R(C) = \eta$, so as pointed out in Martin and Pindyck (2015), (12) holds whenever $\eta > 1$. Exponential utility has the functional form $U(C) = (1 - \exp(-aC))/a$ whenever $a \neq 0$, where $a$ is a constant that represents the degree of risk preference. The coefficient of relative risk aversion for the exponential utility function is $R(C) = aC$, so (12) holds whenever preferences are sufficiently risk averse. Pareto utility has the functional form $U(C) = 1 - 1/(1 + C/\lambda)^k$, where $\lambda > 0$ and $k > 0$. The parameters $\lambda$ and $k$ jointly characterize the shape of the utility function. The two-parameter form provides the flexibility to calibrate $k$ so that the Pareto utility function matches power-like behavior for values of $C$ that are far from zero, while at the same time adjusting $\lambda$ such that the degree of absolute risk aversion does not increase too rapidly for inputs close to zero (Ikefuji et al. 2013). For Pareto utility $R(C) = C(k + 1)/(C + \lambda)$, so (12) holds whenever $\lambda / C < k$. Thus, it holds as long as $k$ is sufficiently large.
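The coefficient of relative risk aversion for each of the three utility functions can be checked numerically. The sketch below approximates $R(C) = -C\, U''(C) / U'(C)$ with central finite differences; the helper names, the step size, and the parameter values are assumptions chosen for illustration.

```python
import math


def relative_risk_aversion(U, C, h=1e-4):
    """Arrow-Pratt R(C) = -C * U''(C) / U'(C) via central finite differences."""
    d1 = (U(C + h) - U(C - h)) / (2.0 * h)
    d2 = (U(C + h) - 2.0 * U(C) + U(C - h)) / (h * h)
    return -C * d2 / d1


def crra(eta):
    """Power utility C**(1 - eta) / (1 - eta), for eta != 1."""
    return lambda C: C ** (1.0 - eta) / (1.0 - eta)


def exponential(a):
    """Exponential utility (1 - exp(-a * C)) / a, for a != 0."""
    return lambda C: (1.0 - math.exp(-a * C)) / a


def pareto(k, lam):
    """Pareto utility 1 - (1 + C / lam)**(-k), for k > 0 and lam > 0."""
    return lambda C: 1.0 - (1.0 + C / lam) ** (-k)


C = 3.0  # hypothetical consumption level
r_crra = relative_risk_aversion(crra(2.0), C)
r_exp = relative_risk_aversion(exponential(0.5), C)
r_pareto = relative_risk_aversion(pareto(2.0, 1.0), C)
```

For these hypothetical parameters all three values exceed one, so condition (12) would hold at this consumption level; for the exponential and Pareto forms the result depends on $C$, unlike the CRRA case.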
Since we do not know when, or if, a catastrophe will occur, nor how large the damage will be, both risk and uncertainty are central in the modeling of catastrophes. Ambiguity aversion is a preference for known risk over unknown risk, and there exist multiple normative models of decision-making under uncertainty that allow for ambiguity aversion. In expected utility, the decreasing marginal value of consumption and attitudes towards risk are not separable. Since the bias exists because the presence of catastrophes decreases expected consumption, it is the former and not the latter that determines the direction of the bias. Thus, in a non-expected utility model where attitudes towards outcomes are represented using a utility function, the direction of the bias will still depend on the curvature of the utility function. However, other preferences such as ambiguity aversion may affect the size of the bias. Two common models of decision-making under uncertainty that allow for ambiguity aversion are maximin expected utility (Gilboa and Schmeidler 1989) and the smooth ambiguity model (Klibanoff et al. 2005). The maximin expected utility approach first identifies the worst expected outcome and then evaluates policies based on these worst-case scenarios. This focus on bad distributions is essentially an increase in background risk. For the smooth ambiguity model, where the decision-maker assigns a subjective weight to each distribution and then combines these evaluations into a single value, it is the subjective weights that will determine the size of the bias.
The criterion holds for consumption disasters; however, some of the catastrophes society is facing may be of a different nature. If one or more of the catastrophes pose an existential threat to humanity, the conclusions in this paper will change. If one or more of the catastrophes pose a threat of extinction, then the presence of such catastrophes will reduce expected future marginal utility, even when (12) holds.

Conclusion and Policy Advice
Previous research shows that evaluating policy measures to manage catastrophes in isolation from each other can cause biased results, which makes conventional CBA unsuitable. This paper aims to explore further if and when standard CBA can provide correct qualitative guidance. Because of the specific nature of the dependency between policies that arises in the presence of background risk, it is still possible to use the benefit-cost ratio to make inferences about the optimal policy. This is done by approximating the optimal policy through upper and lower policy boundaries. As long as the rate of relative risk aversion is above one, the lower boundary is the optimal policy in isolation when no other catastrophes are present. The upper boundary is the optimal policy in the presence of background risk when the policy measure is evaluated in isolation. The magnitudes of the upper and lower boundaries depend only on the individual catastrophe's benefit-cost ratio and the level of risk aversion. To find the lower policy boundary the social planner does not need any information about other potential catastrophes. The upper policy boundary requires information about background risk, but neither boundary requires information on the other catastrophes' benefit-cost ratios. This makes the policy boundaries easy to find using a standard economic framework. The policy decision maker can use these boundaries to make inferences on which catastrophes should and should not be included in the optimal policy set: avert or partially alleviate a catastrophe if the lower boundary is positive, and do not avert the catastrophe if the upper boundary is negative or zero. Moreover, the boundaries can help with sequential choice and reveal both which catastrophes should be managed, and in what order.
Based on the results in this paper, I formulate three points of advice for policy decision makers.
1. Any project that passes the cost-benefit test is welfare enhancing.
2. The less background risk there is, the better is the approximation using standard cost-benefit analysis.
3. A project to manage catastrophes that has large benefits and a large benefit-cost ratio is less sensitive to other policies than a project with small benefits and a small benefit-cost ratio.

Since all denominators and numerators are positive, this is the same as which can be simplified to which is always true since $\lambda_j \ge \Lambda_j(p_j)$ for all $j$.

Proof that w
can be simplified to such that $\left(1 - w_i^1(p_i)\right)^{1-\eta} \le \left(1 - w_i^N(\bar{\mathbf{p}})\right)^{1-\eta}$ is the same as Since all denominators and numerators are positive, this is the same as The expression can further be simplified to

Appendix E: Proof of equation (12)
Note that the chain rule states that W which is the same as
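A sketch of the chain-rule argument, assuming expected consumption takes the form $\bar{C} = \exp\left( g + \lambda_i (1 - p_i)(E e^{-\phi_i} - 1) - \Phi \right)(1 - \tau_i p_i)$ and that welfare is a twice-differentiable function $W(\bar{C})$. The steps below are a reconstruction under these assumptions and are not taken from the original appendix.

```latex
% Background risk enters only through the factor e^{-\Phi}:
\frac{\partial \bar{C}}{\partial \Phi} = -\bar{C}, \qquad
\frac{\partial^2 \bar{C}}{\partial p_i\, \partial \Phi}
    = -\frac{\partial \bar{C}}{\partial p_i}

% Chain rule for the marginal welfare effect of the policy:
\frac{\partial W}{\partial p_i} = W'(\bar{C})\, \frac{\partial \bar{C}}{\partial p_i}

% Differentiating once more with respect to background risk:
\frac{\partial^2 W}{\partial p_i\, \partial \Phi}
    = W''(\bar{C})\, \frac{\partial \bar{C}}{\partial \Phi}\,
      \frac{\partial \bar{C}}{\partial p_i}
    + W'(\bar{C})\, \frac{\partial^2 \bar{C}}{\partial p_i\, \partial \Phi}
    = W'(\bar{C})\, \frac{\partial \bar{C}}{\partial p_i}
      \left( -\frac{\bar{C}\, W''(\bar{C})}{W'(\bar{C})} - 1 \right)
```

When the policy raises expected consumption and $W' > 0$, the cross-derivative is positive exactly when the relative risk aversion term in parentheses exceeds zero, that is, when $R(\bar{C}) > 1$.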