Uncertainty, Learning and International Environmental Agreements: The Role of Risk Aversion

This paper analyses the formation of international environmental agreements (IEAs) under uncertainty, learning and risk aversion. It bridges two strands of the IEA literature: (i) the role of learning when countries are risk neutral; (ii) the role of risk aversion under no learning. Combining learning and risk aversion seems appropriate as the uncertainties surrounding many international environmental problems are large, often highly correlated (e.g. climate change), but are gradually reduced over time through learning. The paper analyses three scenarios of learning. A key finding is that risk aversion can change the ranking of these three scenarios of learning in terms of welfare and membership. In particular, the negative conclusion about the role of learning in a strategic context under risk neutrality is qualified. When countries are significantly risk averse, then it pays them to wait until uncertainties have been largely resolved before joining an IEA. This may suggest why it has been so difficult to reach an effective climate change agreement.


Introduction
Environmental issues such as climate change pose three key challenges for economic analysis: (i) there are considerable uncertainties about the likely future costs of environmental damages and abatement; (ii) our understanding of these uncertainties changes over time as we learn more about climate science, possible technological responses and behavioral responses by households, firms and governments; (iii) the problem is global, but since there is no single global agency to tackle climate change, policies need to be negotiated through international environmental agreements (IEAs). Recently, these three issues have begun to be integrated in one framework. Two strands of literature can be distinguished.
The first strand of literature studies uncertainty and IEA formation with the focus on the role of learning, but under the assumption of risk neutrality. Ulph and Ulph (1996) and Ulph and Maddison (1997) compare the fully cooperative and the non-cooperative scenarios when countries face uncertainty about damage costs. They show that the value of learning about damage costs may be negative when countries act non-cooperatively and damage costs are correlated across countries. Na and Shin (1998), Ulph (2004), Kolstad (2007), Ulph (2008, 2011) have considered how the prospect of future resolution of uncertainty affects the incentives for countries to join an IEA. Kolstad and Ulph consider a model where countries face common uncertainty about the level of environmental damage costs. Three scenarios of learning are considered: with Full Learning, uncertainty about damage costs is resolved before countries decide whether to join an IEA; with Partial Learning, uncertainty is resolved after countries decide whether to join an IEA, but before they choose their emission levels; with No Learning, uncertainty is resolved neither before stage 1 nor before stage 2. They showed that the prospect of learning, either full or partial, generally reduces the expected aggregate payoff in stable IEAs. In particular, Ulph (2008, 2011) showed that Partial Learning would yield the highest aggregate payoff for only a small proportion of parameter values. For a significant majority of parameter values, the highest expected aggregate payoff arose under No Learning. Hence, it is better to form an IEA without waiting for better information: removing the "veil of uncertainty" seems to be detrimental to the success of international environmental cooperation.
All these models have assumed that countries are risk neutral. However, in the climate context, risks are highly correlated and hence possibilities for risk sharing are limited, so the assumption of risk aversion may be quite relevant. Therefore, we extend the two-stage coalition formation setting of Kolstad and Ulph (2008) by departing from the assumption of risk neutrality. We allow countries to be risk averse, and show that if countries have a relatively high degree of risk aversion, then for a majority of parameter values Full Learning yields higher expected aggregate utility than No Learning. This may help to explain why it took such a long time from the start of the process of tackling climate change (the Kyoto Protocol) to reach a more substantial agreement in Paris: countries are risk averse and so needed to wait until they had more information about the risk of climate change before committing to significant action.
The second strand of literature studies uncertainty and IEA formation with the focus on the role of risk aversion, though under the assumption of No Learning. Endres and Ohl (2003) show in a simple two-player prisoners' dilemma, using the mean-standard deviation approach to capture risk aversion, that risk aversion can increase the prospects of cooperation once it reaches a certain threshold. The reason is that the benefits of mutual cooperation increase relative to the payoffs of unilateral cooperation and no cooperation because cooperation reduces the variance of payoffs. The more risk averse players are, the more attractive cooperation becomes compared to free-riding. In their model, there is a first threshold above which the prisoners' dilemma turns into a chicken game and a second threshold above which the game turns into an assurance game. Compared to their paper, we allow for an arbitrary number of players, model cooperation as a two-stage coalition formation game and consider explicitly the role of learning. Bramoullé and Treich (2009) consider risk-averse players in a global emission model, in which all players behave non-cooperatively as singletons. They show that equilibrium emissions are lower under uncertainty than under certainty, as part of a hedging strategy, but the effect on global welfare is ambiguous. The authors also find that emissions decrease with the level of risk aversion. Unlike our paper, Bramoullé and Treich are not concerned with learning and coalition formation. Boucher and Bramoullé (2010) consider the effects of risk aversion on coalition formation, but only with No Learning. They analyze the formation of an international environmental treaty using a similar coalition game and payoff function as adopted in this paper. Using an expected utility approach, their analysis focuses on the effect of uncertainty and risk aversion on signatories' efforts, the participation level in an agreement and expected aggregate utility. 
They show that if additional abatement reduces the variance of countries' payoffs, then, under risk aversion, an increase in uncertainty tends to increase abatement levels and may decrease equilibrium IEA membership, while the reverse is true if additional abatement increases the variance of countries' payoffs. In this paper, our model of No Learning satisfies the first condition, but we extend the analysis of Boucher and Bramoullé (2010) by considering also Partial Learning and Full Learning.
Thus, taken together, in our paper, we generalize the analysis of Kolstad and Ulph (2008) by allowing for risk aversion, and the analysis of Boucher and Bramoullé (2010) and Endres and Ohl (2003) by considering the role of learning. The key findings are that as countries become more risk averse it is no longer the case that for most parameter values the scenario of No Learning yields the highest expected aggregate utility, but increasingly it is the scenario Full Learning. Moreover, the set of parameter values for which the scenario Partial Learning yields the highest expected aggregate utility, which is a small subset of such values when countries are risk neutral, becomes slightly larger as countries become more risk averse. Thus, we qualify the negative conclusion about the role of learning in a strategic context if players are sufficiently risk averse.
As in Kolstad and Ulph (2008), our model consists of a two-stage game of IEA formation. The damage cost from global emissions is uncertain and three learning scenarios are considered. Risk aversion is incorporated into the model through risk preferences and the expected utility framework. In our model, emissions last for just one period, which may seem restrictive in the context of dynamic environmental problems such as climate change. However, it has been shown in the literature on IEA formation under uncertainty that one-period models produce similar results to multi-period models. For instance, Kolstad and Ulph (2008), using a one-period model, and Ulph (2004), using a two-period model, found similar results regarding the implications of different scenarios of learning for the size of an IEA and expected welfare. Taken together, the tension we seek to capture in our modeling is between No Learning, where countries decide to join an IEA and base their decisions forever on expected damage costs, ignoring any later information; Full Learning, where countries delay making any decision to join an agreement until (almost) all uncertainty about damage costs has been resolved; and Partial Learning, where countries start the process of joining an IEA knowing that as they get better information they will be able to use it to adjust their emission policies. This would seem to be particularly relevant to the kind of situation to which Weitzman (2009) has drawn attention (a small probability of catastrophic climate change), but also generally, given the continuous updates in recent years about expected damages from climate change.
The paper proceeds as follows. In Sect. 2, we set out the theoretical model and present our general results in Sect. 3. Section 4 presents some additional results based on simulations, while Sect. 5 summarizes our main conclusions and implications for future research.

No Uncertainty
To establish the basic framework, we set out the model with no uncertainty. There are $N$ identical countries, indexed $i = 1, \ldots, N$. Country $i$ produces emissions $x_i$, with aggregate emissions denoted by $X = \sum_{i=1}^{N} x_i$. Aggregate emissions cause global environmental damages. The cost of environmental damages per unit of global emissions is $\gamma$ and the benefit per unit of individual emissions is normalized to 1. (Thus, $\gamma$ essentially measures the cost-benefit ratio.) The payoff to country $i$, as a function of own and aggregate emissions, is given by

$$\pi_i(x_i, X) = \pi_0 + x_i - \gamma X \qquad (1)$$

with $\pi_0$ a positive constant. In this simple model with a linear payoff function, following the literature, the (continuous) strategy space can be normalized to $x_i \in [0, 1]$. If benefits from emissions are lower than damage costs, then the equilibrium emissions are zero. If benefits exceed costs, equilibrium emissions are set at their upper bound, which is normalized to 1. To make this model interesting, we make the following assumption (Assumption 1): (i) the individual benefit exceeds the individual unit damage cost from pollution, i.e. $1 > \gamma$ (so countries pollute in the Nash equilibrium), but (ii) it does not exceed the global unit damage cost, i.e. $1 < \gamma N$ (so countries abate in the social optimum); the second condition is a sufficient condition to ensure that $\pi_i(\cdot) > 0$, which we will need when we consider expected utility. In order to study coalition formation, we employ the widely used two-stage model of IEA formation (Carraro and Siniscalco 1993; Barrett 1994), which is solved backwards. In stage 2, the emission game, for any arbitrary number of IEA members $n$, $1 \le n \le N$, the members of the IEA and the remaining countries set their emission levels as the outcome of a Nash game between the coalition and the fringe countries. That is, the coalition members together maximize the aggregate payoff to their coalition, whereas fringe countries maximize their own individual payoff.
Hereafter, symbols $c$ and $f$ will be used to denote coalition countries and fringe countries, respectively. Given $1 > \gamma$, $x_f = 1$ follows; coalition members choose $x_c = 0$ if $1 \le \gamma n$, and so $\pi_f(n) = \pi_0 + 1 - \gamma(N - n)$ and $\pi_c(n) = \pi_0 - \gamma(N - n)$; however, if $1 > \gamma n$, then coalition members will also pollute, $x_c = 1$, and so $\pi_c(n) = \pi_f(n) = \pi_0 + 1 - \gamma N$. Knowing the payoffs to coalition and fringe countries for any arbitrary number of IEA members, we then determine the stable (Nash) equilibrium in stage 1, the membership game. No member should have an incentive to leave the coalition, that is, the coalition is internally stable, $\pi_c(n) \ge \pi_f(n-1)$, and no fringe country should have an incentive to join the coalition, that is, the coalition is externally stable, $\pi_f(n) > \pi_c(n+1)$. It is now easy to show that the stable IEA is of size $n^*(\gamma) = I(1/\gamma)$, which is the smallest integer no less than $1/\gamma$. Consider internal stability and consider the non-trivial situation where $n$ members do not pollute because $1 \le \gamma n$. If, after one member left the coalition, the remaining coalition members continued not to pollute, that is, if $1 \le \gamma(n-1)$ were satisfied, then the gain from leaving would be positive: the additional benefit is 1 and the additional damage is $\gamma$, with $1 > \gamma$ by Assumption 1. Thus, a coalition of $n$ members is stable if and only if $1 > \gamma(n-1)$, so that after one member leaves the remaining coalition members switch from $x_c(n) = 0$ to $x_c(n-1) = 1$. Then the additional benefit from pollution of 1 does not exceed the additional damage $\gamma n$, as by assumption $1 \le \gamma n$ in the initial situation with $n$ members. It is easily checked that such an equilibrium is also externally stable.
The aggregate payoff, that is, the sum of the payoffs over all countries, when a stable coalition of size $n^*(\gamma)$ forms is given by:

$$\Pi(\gamma) \equiv n^*(\gamma)\,\pi_c(n^*(\gamma)) + (N - n^*(\gamma))\,\pi_f(n^*(\gamma))$$

Thus, this simple model provides a relationship between the unit damage cost $\gamma$ and the equilibrium number of coalition members. The equilibrium is a knife-edge equilibrium with $n^*(\gamma)$ countries forming the coalition, which de facto dissolves once a member leaves the coalition, as no country would abate anymore. The equilibrium coalition size weakly decreases in the cost-benefit ratio from emissions $\gamma$: the larger is $\gamma$, the smaller is the number of countries in a stable IEA.
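The stability argument above can be checked by brute force. The following is a minimal sketch, assuming the linear payoff $\pi_0 + x_i - \gamma X$ implied by the coalition and fringe payoffs stated above; the function names and the default `pi0 = 0` are illustrative choices and not part of the original model description.

```python
import math


def payoffs(n, N, gamma, pi0=0.0):
    """Return (coalition payoff, fringe payoff) for a coalition of size n.

    Assumes the linear payoff pi_i = pi0 + x_i - gamma * X with 1/N < gamma < 1.
    pi0 = 0 is an illustrative default; it does not affect stability.
    """
    x_c = 0.0 if gamma * n >= 1 else 1.0   # members abate iff 1 <= gamma * n
    X = n * x_c + (N - n) * 1.0            # fringe countries always emit 1
    return pi0 + x_c - gamma * X, pi0 + 1.0 - gamma * X


def is_stable(n, N, gamma):
    """Internal stability (weak) and external stability (strict), as in the text."""
    internal = payoffs(n, N, gamma)[0] >= payoffs(n - 1, N, gamma)[1]
    external = n == N or payoffs(n, N, gamma)[1] > payoffs(n + 1, N, gamma)[0]
    return internal and external


def stable_sizes(N, gamma):
    """All stable coalition sizes n in 1..N, found by exhaustive search."""
    return [n for n in range(1, N + 1) if is_stable(n, N, gamma)]


def aggregate_payoff(N, gamma, pi0=0.0):
    """Sum of payoffs over all countries at the stable size ceil(1/gamma)."""
    n_star = math.ceil(1 / gamma)
    pc, pf = payoffs(n_star, N, gamma, pi0)
    return n_star * pc + (N - n_star) * pf
```

For instance, `stable_sizes(10, 0.3)` searches all coalition sizes for ten countries and can be compared with `math.ceil(1 / 0.3)`, the closed-form size given in the text.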

Uncertainty, Risk Aversion and Learning
Now assume that the unit damage cost of global emissions is uncertain and equal for all countries, both ex-ante and ex-post. We denote the value by $\gamma_s$ in the state of the world $s$ and hence (1) becomes:

$$\pi_i(x_i, X) = \pi_0 + x_i - \gamma_s X$$

Following Kolstad and Ulph (2008) and Boucher and Bramoullé (2010), we assume for simplicity that $\gamma_s$ can take one of two values: low damage costs, $\gamma_l$, with probability $p$, and high damage costs, $\gamma_h$, with probability $1 - p$, where $\gamma_l < \gamma_h$ and $0 < p < 1$. We denote by $\bar{\gamma} \equiv p\gamma_l + (1 - p)\gamma_h$ the expected value of unit damage costs and by $\hat{\gamma} = 0.5(\gamma_l + \gamma_h)$ the median value of damage costs. We define $n_l \equiv I(1/\gamma_l)$, $n_h \equiv I(1/\gamma_h)$, $\bar{n} \equiv I(1/\bar{\gamma})$, $\hat{n} \equiv I(1/\hat{\gamma})$ and make the following assumptions: Assumptions 2(i) and 2(ii) are just the analogues to Assumption 1(i) and 1(ii) when there is uncertainty. Assumption 2(iii) means that uncertainty matters, in the sense that it implies significant differences in the size of the stable IEAs that would arise if we knew for certain which state of the world prevailed.
To allow for risk aversion, we assume that each country has an identical utility function over payoffs: $u(\pi_i)$, $u'(\pi_i) > 0$, $u''(\pi_i) < 0$. While ex-ante countries face uncertainty about the true value of unit damage costs, we want to allow for the possibility that countries may learn information during the course of the game which changes the risk they face. We shall follow Kolstad and Ulph (2008) in considering three very simple scenarios of learning, denoted by $m$, $m \in \{NL, PL, FL\}$. With No Learning ($m = NL$) countries make their decisions about membership and emissions with uncertainty about the true value of unit damage costs. With Full Learning ($m = FL$) countries learn the true value of unit damage costs before they have to take their decisions on membership (stage 1) and emissions (stage 2). With Partial Learning ($m = PL$) countries learn the true value of damage costs at the end of stage 1, that is, after they have made their membership decisions but before they make their emission decisions (stage 2). Thus, in this simple analysis, learning takes the form of revealing perfect information. We will compare the outcomes of the three scenarios of learning in terms of the expected size of IEAs and expected aggregate utility from an ex-ante perspective, i.e. before stage 1.

Analytical Results
In this section, we set out the equilibrium of the IEA model for each of the three models of learning with risk aversion, generalizing the results of Kolstad and Ulph (2008) who assumed risk neutrality. The proofs are provided in "Appendix A1".

Full Learning
We start with the benchmark scenario of Full Learning (FL). Players know the realization of the damage parameter γ at the outset of the coalition formation game, i.e. before stage 1. Thus, the results follow directly from what we know from the game with certainty in Sect. 2.1 above.
Proposition 1: Full Learning. If state $s = l, h$ has been revealed before stage 1, then the subsequent membership and emission decisions are as in the model with certainty with damage cost parameter $\gamma_s$. In each state $s = l, h$ the size of the stable IEA is $n_s = I(1/\gamma_s)$ and the expected membership is $E(n^{FL}) = p\,n_l + (1-p)\,n_h$. The expected aggregate utility is given by:

Note that with Full Learning, while the degree of risk aversion does not affect the expected size of the IEA, it will affect expected utility. Importantly, the expected size and expected aggregate utility are computed from an ex-ante perspective to make a comparison with the other models of learning meaningful.
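The Full Learning outcome in Proposition 1 can be sketched numerically. The code below assumes that aggregate utility is the unweighted sum of individual utilities, with members abating and fringe countries emitting 1 in each state; the function name, the generic utility argument `u` and the value of `pi0` are illustrative assumptions rather than the calibration used in the simulations.

```python
import math


def full_learning_outcome(N, p, g_l, g_h, u, pi0):
    """Expected IEA size and expected aggregate utility under Full Learning.

    u is any increasing utility function over payoffs; pi0 is a hypothetical
    constant chosen large enough to keep all payoffs positive.
    """
    expected_n = 0.0
    expected_u = 0.0
    for prob, g in ((p, g_l), (1 - p, g_h)):
        n = math.ceil(1 / g)        # stable IEA size n_s = I(1/gamma_s)
        X = N - n                   # members abate, fringe countries emit 1
        aggregate = n * u(pi0 - g * X) + (N - n) * u(pi0 + 1 - g * X)
        expected_n += prob * n
        expected_u += prob * aggregate
    return expected_n, expected_u
```

Calling it once with `u=lambda x: x` (risk neutrality) and once with `u=math.log` (a concave utility) returns the same expected membership but different expected aggregate utility, which is the point made in the note to Proposition 1.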

No Learning
In this section, we address the scenario of No Learning, in which players take their membership (stage 1) and emission (stage 2) decisions under uncertainty. We begin by solving for optimal emissions of countries for any number of IEA members $n$. Since the benefit of one unit of emissions exceeds the damage cost in both states of the world, it is straightforward to see that fringe countries will always pollute. To solve for the optimal emissions for a coalition member for any $n$, which we denote by $x_c(n)$, we need to introduce some notation. We define $E(u_c)(x_c, n)$ as the expected utility to an IEA country when there are $n$ IEA members who set emissions $x_c$ and all fringe countries set emissions equal to 1. Then: if $\gamma_l n \ge 1$, then $\partial E(u_c)/\partial x_c < 0$ and hence it is optimal for IEA countries to completely abate, while if $\gamma_h n \le 1$, then $\partial E(u_c)/\partial x_c > 0$ and hence it is optimal for IEA countries to completely pollute. To get tighter bounds on when IEA countries completely pollute or abate, we define $\underline{\tilde{n}}$ as the largest value of $n$ such that: and $\overline{\tilde{n}}$ as the smallest value of $n$ such that: We summarise the results on emissions in the following Lemma:
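The two sign conditions can be traced to a single derivative. The following is a sketch, assuming the state-contingent payoff $\pi_0 + x_i - \gamma_s X$, that all $n$ coalition members choose the same $x_c$, and that fringe countries emit 1; the shorthand $\pi_{c,s}$ is introduced here for illustration only.

```latex
\begin{align*}
\pi_{c,s}(x_c,n) &= \pi_0 + x_c - \gamma_s\,[(N-n) + n x_c], \qquad s = l,h,\\
E(u_c)(x_c,n) &= p\,u(\pi_{c,l}) + (1-p)\,u(\pi_{c,h}),\\
\frac{\partial E(u_c)}{\partial x_c}
  &= p\,u'(\pi_{c,l})\,(1-\gamma_l n) + (1-p)\,u'(\pi_{c,h})\,(1-\gamma_h n).
\end{align*}
```

Since $u' > 0$ and $\gamma_l < \gamma_h$, the condition $\gamma_l n \ge 1$ makes the first term non-positive and the second strictly negative, while $\gamma_h n \le 1$ makes the second term non-negative and the first strictly positive. With risk neutrality $u'$ is constant, so the sign reduces to that of $1 - [p\gamma_l + (1-p)\gamma_h]\,n$; with concave utility the high damage state, where payoffs are lower, receives a larger marginal-utility weight.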

Lemma 1: Emission Decisions with No Learning
As already noted, for any size of an IEA, $n$, fringe countries always choose the upper limit of emissions. For IEA members, there is a critical range of values for $n$, $[\underline{\tilde{n}}, \overline{\tilde{n}}]$, which lies between $n_h$ and $\bar{n}$, such that IEA members will completely abate if $n \ge \overline{\tilde{n}}$, and choose the upper limit of emissions for $n < \underline{\tilde{n}}$; but if there are values of $n$ which lie within the range $[\underline{\tilde{n}}, \overline{\tilde{n}}]$, then coalition members choose a level of emissions below the upper limit. Note that, as in Boucher and Bramoullé (2010), with both risk neutrality and risk aversion $x_c(n) = 0$ for $n \ge \bar{n}$ and $x_c(n) = 1$ for $n < \underline{\tilde{n}} \le \bar{n}$. With risk neutrality, for both $\underline{\tilde{n}} \le n < \overline{\tilde{n}}$ and $\overline{\tilde{n}} \le n < \bar{n}$ the emissions are $x_c(n) = 1$; while with risk aversion $0 \le x_c(n) \le 1$ for $\underline{\tilde{n}} \le n < \overline{\tilde{n}}$ and $x_c(n) = 0$ for $\overline{\tilde{n}} \le n < \bar{n}$. So for $\underline{\tilde{n}} \le n < \bar{n}$ aggregate emissions are lower with risk aversion than with risk neutrality.
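Lemma 1 can be illustrated with a small numerical experiment. The sketch below assumes a CRRA utility function and finds the coalition's common emission level by grid search over $[0, 1]$; the function names, the grid resolution and the constant `pi0` are illustrative assumptions, not the procedure used for the reported simulations.

```python
import math


def crra(pi, rho):
    """CRRA utility; rho = 0 is risk neutral, rho = 1 is treated as log utility."""
    return math.log(pi) if rho == 1 else pi ** (1 - rho) / (1 - rho)


def coalition_emissions(n, N, p, g_l, g_h, rho, pi0, steps=100):
    """Grid-search the common emission level x_c in [0, 1] chosen by n members.

    Fringe countries emit 1. pi0 is a hypothetical constant chosen large
    enough to keep payoffs positive; it is not the paper's calibration.
    """
    def expected_utility(x_c):
        X = (N - n) + n * x_c   # aggregate emissions
        return (p * crra(pi0 + x_c - g_l * X, rho)
                + (1 - p) * crra(pi0 + x_c - g_h * X, rho))

    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=expected_utility)
```

Comparing `coalition_emissions(n, ...)` for `rho = 0` and a large `rho` across coalition sizes shows the pattern described in the lemma: the risk-averse coalition never emits more than the risk-neutral one, and it starts abating at smaller coalition sizes.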

Proposition 2: No Learning
With No Learning, for all parameter values, there exists a stable IEA with membership $n^{NL}$, which is the same in both states of the world, with $n^{NL} \in [\underline{\tilde{n}}, \overline{\tilde{n}}]$. This is also expected membership, i.e. $E(n^{NL}) = n^{NL} \in [\underline{\tilde{n}}, \overline{\tilde{n}}]$, which is (weakly) lower than under risk neutrality, where $E(n^{NL}) = \bar{n}$, as $\overline{\tilde{n}} \le \bar{n}$. Emissions of fringe and IEA members are given in Lemma 1. Expected aggregate utility is given by:

So the expected equilibrium coalition size is (weakly) smaller under risk aversion than under risk neutrality. With uncertainty, countries are unsure about the state of the world. With risk aversion, and hence concave utility, countries shy away from the commitment to be a member of an IEA. This is in line with the findings in Boucher and Bramoullé (2010).

Partial Learning
In the scenario of Partial Learning, countries have to make their decision on whether to join an IEA without knowing the true damage cost of emissions, but can make their subsequent emission decisions based on that knowledge. It follows that the emission decisions of countries do not depend on risk aversion and so are the same as in Kolstad and Ulph (2008). Since for one unit of emissions the benefit exceeds damage costs in each state of the world, a fringe country will optimally set $x_{f,s} = 1$, $s = l, h$; for an IEA member optimal emissions depend on the size of the IEA $n$: $n \ge n_l \Rightarrow x_{c,s}(n) = 0$, $s = l, h$; $n_h \le n < n_l \Rightarrow x_{c,l}(n) = 1$ and $x_{c,h}(n) = 0$; $n < n_h \Rightarrow x_{c,s}(n) = 1$, $s = l, h$. That is, fringe countries always pollute; if there are at least $n_l$ IEA members, then IEA members always abate; if there are fewer than $n_h$ IEA members, then IEA members always pollute; otherwise IEA members pollute in the low damage cost state and abate in the high damage cost state.
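The emission rule for coalition members under Partial Learning reduces to comparing $n$ with the state-specific thresholds $n_l$ and $n_h$. A minimal sketch, with an illustrative function name and the thresholds computed as $I(1/\gamma_s)$:

```python
import math


def pl_coalition_emissions(n, g_l, g_h):
    """Return {state: x_c} for a coalition of size n once the state is known.

    Members abate (x_c = 0) in state s iff n >= n_s = ceil(1 / gamma_s);
    fringe countries emit 1 in both states and are not modelled here.
    """
    thresholds = {"l": math.ceil(1 / g_l), "h": math.ceil(1 / g_h)}  # n_l, n_h
    return {s: 0 if n >= n_s else 1 for s, n_s in thresholds.items()}
```

With `g_l = 0.25` and `g_h = 0.5` the thresholds are 4 and 2, so a coalition of three members pollutes in the low damage state and abates in the high damage state.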
As in Kolstad and Ulph (2008), for certain values of $p$ there may be more than one stable IEA with Partial Learning. In our model a second stable IEA exists iff $\tilde{p} \le p \le 1$, where $\tilde{p}$ is a critical value of the probability of the low damage state (cf. Kolstad and Ulph 2008). Irrespective of the degree of risk aversion, it is straightforward to see that $\gamma_h \to 1 \Rightarrow \tilde{p} \to 0$. However, we have not been able to determine analytically how $\tilde{p}$ varies with the degree of risk aversion, as this depends on the exact form of the utility function. In Sect. 4 we report our findings on this from our simulation results.
If $\tilde{p} \le p \le 1$, then there is a second stable IEA with $n^{PL_2} = n_l$ members. In both states of the world, coalition members abate and fringe countries pollute. Expected aggregate utility is given by:

Since the second equilibrium Pareto-dominates the first equilibrium if it exists, expected membership is either $E(n^{PL_1}) = n_h$ if $p < \tilde{p}$ or $E(n^{PL_2}) = n_l$ if $\tilde{p} \le p \le 1$.
As the degree of risk aversion affects $\tilde{p}$, it has an effect on the likelihood of a second coalition with higher membership $n_l$ being stable. This effect is further explored in Sect. 4.

Comparison Across the Three Scenarios of Learning
This section presents analytical results on the ranking of the learning scenarios in terms of expected membership and expected utility. We present general results as well as results for special cases with limit parameter values.

General Results
In this sub-section, we investigate what we can say generically about expected IEA membership, payoffs and expected utility across the four possible equilibria, FL, NL, PL$_1$ and PL$_2$.
In terms of expected membership of an IEA, it is clear that since $E(n^{PL_1}) = n_h$, this equilibrium has the lowest expected membership, while since $E(n^{PL_2}) = n_l$, this equilibrium has the highest expected membership. Note also that: Moreover it is straightforward to show that: Hence: From Proposition 2 and Lemma 1 we have that: (7a) and (7b) are not sufficient to ensure that $E(n^{NL}) \le E(n^{FL})$. Moreover, as $0 < p < 1$, then $E(n^{PL_1}) < E(n^{FL}) < E(n^{PL_2})$. Thus, as we shall see, the following two rankings can occur, according to the parameter values:

In terms of payoffs across the four equilibria, it is straightforward to see from Propositions 1, 2 and 3 that: For NL, in the low damage cost state of the world, the highest payoff to coalition members is when $x_c = 1$, which is less than or equal to the payoff to coalition members in PL$_2$ since $n_l \gamma_l \ge 1$; in the high damage cost state of the world, the highest payoff to coalition members is when $x_c = 0$, which is less than the payoff to members in PL$_2$ since $E(n^{NL})\gamma_h < n_l \gamma_h$. So it must be the case that: The results in (9a) and (9b) allow us to rank a number of the payoffs across the four possible equilibria for both members and fringe countries in the high and low damage cost states of the world. However, this is not sufficient to allow us to rank expected aggregate utility across different models of learning at an analytical and general level because (i) the weights of the different utilities in the aggregation depend on the equilibrium coalition size, which differs across the learning scenarios; and (ii) how differences in payoffs translate into differences in expected aggregate utility depends on the exact form of the utility function. The next section reports the simulations we have carried out to compare expected IEA membership and expected welfare across the different models of learning.
The analytic results obtained allow us, however, to get some insights about the role of risk aversion on expected IEA membership and welfare. Compared to the case of risk neutrality explored by Kolstad and Ulph (2008), we conclude that risk aversion does not affect the expected coalition size under FL but (weakly) decreases it under NL. Regarding PL, it does not affect the two possible coalition sizes, $n^{PL_1} = n_h$ and $n^{PL_2} = n_l$, although it affects the likelihood of the larger equilibrium. Hence, if we restrict our analysis to the extreme cases of FL and NL, we can conclude that the prospects of learning being conducive to larger IEAs are higher under risk aversion. Thus, the result found by Kolstad and Ulph (2008) that expected welfare is higher under NL compared to FL is less likely under risk aversion. Adding PL to this picture, the novelty is that the likelihood of the larger equilibrium depends on the level of risk aversion.

Special Cases
In this subsection, the comparison of learning scenarios in terms of expected IEA membership and expected utility is undertaken for special parameter values. Following Karp (2012), we use two limit cases regarding the probability of low damages: $p = \varepsilon \approx 0$ and $p = 1 - \varepsilon \approx 1$, where $\varepsilon$ denotes an infinitesimal. Limit values for the damage cost from pollution were also considered: $\gamma_l = 1/N + \varepsilon \approx 1/N$ and $\gamma_h = 1 - \varepsilon \approx 1$.
Lemma 2 shows the results that could be obtained analytically on the ranking of learning scenarios, in terms of expected membership.

Lemma 2: Expected Membership
When high damages are very likely, p ≈ 0, then FL leads to larger expected IEA membership than PL and NL, irrespective of the level of risk aversion. Hence, in this context, learning is conducive to the formation of larger agreements. The opposite holds when low damages are very likely, p ≈ 1, with PL and NL leading to larger expected IEA membership.
In the context of risk neutrality, Karp (2012) found that $p \approx 0$ implies $E(n^{FL}) > E(n^{NL})$ and $p \approx 1$ implies $E(n^{NL}) > E(n^{FL})$. Thus, we show that Karp's result also holds under risk aversion and we have extended it to the scenario of PL.
If the low damage cost from pollution is close to its lower bound, $\gamma_l \approx 1/N$, then under PL the smaller equilibrium coalition size forms, $n^{PL_1} = n_h$, and hence the expected size of an IEA under FL is higher than under PL. If the high damage cost is at its upper bound, $\gamma_h \approx 1$, then the larger IEA forms under PL, $n^{PL_2} = n_l$, which exceeds the expected coalition size under FL. For these two limit values of the damage cost, analytical results could not be found on the rankings between FL and NL, as well as between PL and NL. The results of the special cases in terms of expected utility are shown in Lemma 3.

Lemma 3: Expected Utility
Regarding expected utility, for each special case we could obtain only pairwise rankings of the three learning scenarios, and not a complete ranking. When high damages are very likely, $p \approx 0$, then FL yields higher expected utility than PL. When these damages have a low likelihood, $p \approx 1$, then PL and NL lead to the same expected utility. When the low damage cost from pollution is close to its lower bound, $\gamma_l \approx 1/N$, FL provides a larger expected utility than PL. When $\gamma_h \approx 1$, then as we showed in Sect. 3.3, $\tilde{p} \approx 0$, and PL$_2$ will be the selected equilibrium for Partial Learning, so expected utility will be higher than with either FL or NL.
Combining the results on expected membership and expected utility, a few conclusions can be drawn. First, if $p \approx 0$ then learning leads to a better outcome in terms of membership, $E(n^{FL}) > E(n^{PL}) = E(n^{NL})$. Regarding utility, the only analytical result obtained also points in that direction, $E(U^{FL}) > E(U^{PL})$. Second, for $p \approx 1$, learning leads to worse results in terms of membership, $E(n^{PL}) = E(n^{NL}) > E(n^{FL})$. The result obtained on utility, $E(U^{NL}) = E(U^{PL})$, points to a neutrality of learning. Third, for $\gamma_l \approx 1/N$ the results obtained indicate an advantage of learning in terms of membership, $E(n^{FL}) > n^{PL} = n_h$, and utility, $E(U^{FL}) > E(U^{PL})$. The opposite occurs for $\gamma_h \approx 1$. Using these four limit cases, we can conclude that under risk aversion the role of learning in terms of membership and utility depends on the distribution of the damage cost, namely the parameters $\gamma_l$, $\gamma_h$, $p$. This is in line with the results obtained by Karp (2012) under risk neutrality. In the next section we comment on the implications for these limiting results arising from our simulations.

Results from Simulations
There are three sets of issues we were unable to resolve analytically in Sect. 3 and which we explore using numerical simulations. (i) What is the expected size of the IEA in the case of No Learning, $E(n^{NL})$, in relation to the theoretical limits $\underline{\tilde{n}}$ and $\overline{\tilde{n}}$ and, more importantly, to the key parameters of our model, $n_h$ and $\bar{n}$, and how does this vary across different degrees of risk aversion? (ii) In the case of Partial Learning, what is the critical value $\tilde{p}$ of the likelihood of the low damage state of the world such that for $\tilde{p} \le p \le 1$ there is a second stable IEA ($n^{PL_2} = n_l$), and how does $\tilde{p}$ vary across different degrees of risk aversion? (iii) Most importantly, how do the expected size of the IEA and expected aggregate utility compare across the three different models of learning, Full Learning (FL), No Learning (NL), and Partial Learning (PL$_1$, PL$_2$), and how does this comparison depend on the degree of risk aversion and other parameters of the model?
We now discuss our choice of parameter values for the simulations. To guarantee nonnegative payoffs we set: (10) and to ensure that payoffs are sensitive to countries' abatement decisions we chose $B = 1.1$ (footnote 12). For the parameter of relative risk aversion, $\rho$, in the CRRA utility function $u(\pi_i) = [1/(1 - \rho)]\,\pi_i^{1-\rho}$ we use $\rho = 0$ (risk-neutral) as a benchmark case and then choose 7 values, $\rho$ = 0.05, 0.5, 0.99, 2.5, 5.0, 10.0, and 20.0, to capture what we believe to be a reasonable range of values for country-level risk aversion (footnote 13). For the remaining key parameters $(N, p, \gamma_l, \gamma_h)$, we report results of 2 sets of simulations. In Sect. 4.1 we present results from a set of simulations using a small number of specific values of the parameters $(N, p, \gamma_l, \gamma_h)$ to get some insights into what drives results. Then in Sect. 4.2 we present results of a more general set of 500,000 simulations where the parameters $(N, p, \gamma_l, \gamma_h)$ are chosen randomly within a specified range. The details of how these parameters are chosen are set out in "Appendix A2".
In addition to reporting the membership of the IEA and the expected utility for each of our three models of learning, $m$, we also report expected damage costs as a percentage of GDP (footnote 14), which we define as $D^m$. The reason for doing this is to indicate that the environmental issue has significant implications.
Finally, in Sect. 4.3, we will test the robustness of our simulation results in three respects (footnote 15). First, we check whether the results in Sects. 4.1 and 4.2, derived for $B = 1.1$, hold for higher values of $B$. Second, we will undertake simulations using the limiting conditions adopted in Lemmas 2 and 3, in Sect. 3.4, to check the ranking of the learning scenarios in terms of expected IEA membership and expected utility. Third, in Sects. 4.1 and 4.2 we assume that each country has a constant relative risk aversion (CRRA) utility function, and we will test whether our key numerical results also hold if we use a constant absolute risk aversion (CARA) utility function.

Results from Simple Simulations
In our simple simulations we use $B = 1.1$ and $N = 20$. The results 16 are shown in Table 1. A total of 13 simulations were used. The first part of the table shows the parameter values we used, while the next three sections show the outcomes for NL, FL and PL respectively. In the first 7 columns we use different values of risk aversion $\rho$, but keep constant the values for $p$, $\gamma_l$, $\gamma_h$. In the remaining 6 columns we set $\rho = 2.5$, the midpoint of the values we use, and vary in turn $p$ (columns 8 and 9), and then $\gamma_l$, $\gamma_h$ by choosing higher and lower values of $\bar{\gamma}$ (columns 10 and 11) and then tighter and wider spreads of $\gamma_l$, $\gamma_h$ around $\bar{\gamma}$ (columns 12 and 13). For these parameter values, expected damage costs as a percentage of GDP range between 0.34 and 3.40.
12 We are grateful to a referee for this suggestion.
13 Meyer and Meyer (2006) note that the CRRA utility function is widely used in empirical studies of risk aversion, and that empirical estimates of $\rho$ vary between 0 and 100. They note that such estimates depend on the variable that enters the utility function, and for the three most commonly used variables (wealth, income and profits) the appropriate empirical estimate increases as one moves from wealth to profits. In our one-period model the relevant variable is income, though there is no distinction between wealth and income. Hence, we have chosen a range of values for $\rho$ at the lower end of the range noted by Meyer and Meyer, namely between 0 and 20.
14 A referee suggested we calculate this measure to indicate that the environmental problem is a significant one. We define GDP as $\pi_0 N + (N - n) + n x_c$; each coalition country produces output $\pi_0 + x_c$; each fringe country produces output $\pi_0 + 1$.
15 We are grateful to two referees for these suggestions.
16 The key results are not sensitive to the choice of $N$.
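The GDP definition in footnote 14 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the value of $\pi_0$ is passed in as a parameter rather than taken from the paper, and aggregate expected damage costs are treated as a given input because their formula is not restated in this section.

```python
def gdp(pi0: float, n_countries: int, n_members: int, x_c: float) -> float:
    """GDP as in footnote 14: pi0 * N + (N - n) + n * x_c.

    Each of the n coalition members produces pi0 + x_c and each of the
    N - n fringe countries produces pi0 + 1.
    """
    n_fringe = n_countries - n_members
    return pi0 * n_countries + n_fringe * 1.0 + n_members * x_c


def damage_share_percent(expected_damage: float, gdp_value: float) -> float:
    """Expected damage costs as a percentage of GDP.

    Assumption: expected_damage is the aggregate expected damage cost,
    computed elsewhere; it is not derived in this sketch.
    """
    return 100.0 * expected_damage / gdp_value
```

With a hypothetical $\pi_0 = 1$, $N = 20$, $n = 8$ and $x_c = 0.5$, `gdp(1.0, 20, 8, 0.5)` equals the sum of the individual outputs, $8 \times 1.5 + 12 \times 2 = 36$.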
Table 1 Results of simple simulations: $B = 1.1$

We now turn to the implications for the three sets of issues we want to examine using our simulation results.

Implications for $x_c$ and $n^{NL}$
From the first two rows of the No Learning part (simulations 1-7), we see first that for both low and high values of $\rho$ the gap between the two $\tilde{n}$ thresholds tends to zero. The reason is that, for small values of $\rho$, both thresholds are bunched closely to the upper limit $\bar{n}$, while for large values of $\rho$ they are bunched closely to the lower limit $n_h$. Positive gaps between them only occur for intermediate values of $\rho$. So, as can be seen from rows 3 and 4, we are more likely to get interior solutions for $x_c$ for intermediate values of $\rho$. These results also show that as $\rho$ increases, $n^{NL}$ steadily decreases from $\bar{n}$ to $n_h$.
As we saw from Lemma 1, as $p$ tends to 0 the two $\tilde{n}$ thresholds and $\bar{n}$ all converge to $n_h$, and hence the gap between the two thresholds vanishes. This result was obtained in simulation 8 using $p = 0.05$. From Lemma 1 we also know that when $p$ tends to 1 the gap between the two thresholds also converges to zero, as both thresholds and $\bar{n}$ converge to $n_l$. In simulation 9 we set $p = 0.95$, which is not sufficiently high for the two thresholds to converge, and hence we get an interior solution. Similarly, from columns 10, 11, 12 and 13 we see that for lower values of $\bar{\gamma}$ and larger spreads of $\gamma_s$ around $\bar{\gamma}$ we are more likely to have significant gaps between the two thresholds, and hence more likely to get an interior solution for $x_c$.

Implications of Variations in $\rho$ for $\bar{p}$
The first row of the results for Partial Learning shows, for different values of risk aversion $\rho$, the critical value $\bar{p}$ such that for $p \ge \bar{p}$ there exists a second Partial Learning equilibrium. The results of simulations 1-7 show that as risk aversion increases $\bar{p}$ tends to fall, so there is a wider range of parameter values for which there exists a second stable PL equilibrium. Not surprisingly, columns 8 and 9 confirm that $\bar{p}$ does not depend on $p$.
However, simulations 10-13 show no clear pattern. In fact, a broader range of simulations we have carried out shows that $\bar{p}$ is quite sensitive to variations in these parameter values. While the decreasing relationship between $\bar{p}$ and $\rho$ shown in Table 1 holds fairly generally, it does not hold for all parameter values, and these broader simulations show that there is no systematic relationship between $\bar{p}$ and the parameters $\gamma_l$, $\gamma_h$. The simulations we report in Sect. 4.2 give a better perspective on how risk aversion affects $\bar{p}$ and hence the likelihood of there being a second Partial Learning equilibrium.

Implications for Ranking Expected Size of Stable IEA, Payoffs and Expected Welfare Across Different Models of Learning
It is straightforward to see from Table 1 that for all the parameter values we have used the ranking of the expected size of the stable IEA is always consistent with (8a). This reflects the limited range of parameter values we used in Table 1.
In terms of the rankings of the payoffs for both members and fringe countries in the high and low damage cost states of the world, it is readily checked that the payoffs in Table 1 are consistent with the rather limited set of comparisons we were able to make in (9a) and (9b). Importantly, the results also confirm that these are the only general results that hold. In particular, for the different parameter values used in Table 1, all rankings of the payoffs for NL relative to FL and PL1 occur. Finally, the last row of Table 1 shows the rankings of expected welfare across the different models of learning for the different parameter values, where for Partial Learning it is only for the parameters in column 9 that we are able to select PL2 as the appropriate equilibrium. Columns 1-7 confirm that as $\rho$ increases, welfare for NL moves from being the highest relative to FL and PL1 to the lowest. Columns 8-13 show that, keeping $\rho = 2.50$, the other variations in parameter values, other than tightening the spread of $\gamma_l$, $\gamma_h$ around the median (column 12), also move the welfare from NL from being the highest relative to FL and PL to the lowest.

Implications for $x_c$ and $n^{NL}$
Recall that from Kolstad and Ulph (2008) the expected size of the stable IEA with No Learning when countries are risk neutral is $n^{NL} = \bar{n}$. The first 3 rows of Table 2 show how increasing $\rho$ affects how $n^{NL}$ relates to the two $\tilde{n}$ thresholds and whether $0 < x_c < 1$ or $x_c = 0$, while rows 4-7 show how $n^{NL}$ relates to $n_h$ and $\bar{n}$.
Row 1 shows that as $\rho$ increases the proportion of cases for which $x_c$ is an interior solution first increases and then falls. The reason is the same as we argued in Sect. 4.1.1. As shown in rows 3, 5 and 7, for $\rho$ close to 0 the large majority of cases have $n^{NL} = \bar{n}$, with the two $\tilde{n}$ thresholds bunched close to $\bar{n}$ and both equal to $n^{NL}$. As $\rho$ increases the two thresholds steadily decline and the gap between them opens up, allowing more opportunity for $x_c$ to take an interior value, and the proportion of cases where $n^{NL} = \bar{n}$ declines. But as $\rho$ continues to increase above 1 the two thresholds tend towards $n_h$ and the gap between them narrows, allowing less opportunity for $x_c$ to take an interior value.

Implications of Variations in $\rho$ for $\bar{p}$
The first row of Table 3 below shows for each value of $\rho$ the average value of $\bar{p}$, while the second row shows the proportion of cases for which $p \ge \bar{p}$ and hence for which we can select the second equilibrium for Partial Learning, the one with the highest expected aggregate utility across the different models of learning. 17 Consistent with the results in Table 1, as $\rho$ increases the average value of $\bar{p}$ decreases and the proportion of cases for which $p \ge \bar{p}$ increases. However, for any given value of $\rho$ there is significant variation in the value of $\bar{p}$ caused by the variation of the other parameters in the simulations; we do not report the details here, but as an example, with $\rho = 0.99$, $\bar{p}$ varies between 0.7864 and 0.9999. So the general message is that as risk aversion increases the proportion of cases for which the second Partial Learning equilibrium exists also increases.

Implications for Ranking Expected Size of Stable IEA, Payoffs and Expected Welfare Across Different Models of Learning
Rows 3-5 of Table 3 show for each model of learning the maximum value (across all 500,000 simulations) of damage costs as a percentage of the value of output. 18 Not surprisingly this occurs in the high damage cost state of the world for which the equilibrium is the same for Full Learning and the first Partial Learning equilibrium, which is why this percentage is the same for Full Learning and Partial Learning. For No Learning the maximum value of this percentage will tend to occur when $n^{NL}$ is close to $n_h$, and hence the number of signatories is low, and so would be similar to Full Learning and Partial Learning. But there is an additional consideration. As we have seen from Table 2, there are parameter values for which signatory countries will also pollute, which will raise damage costs, and this effect is particularly pronounced for intermediate values of $\rho$.
Rows 6-8 show the percentage of cases for which the expected IEA membership size is highest in each of the three models of learning, while rows 9-11 show the corresponding figures for expected aggregate utility. Consistent with the theoretical results in Sect. 3.4.1, rows 6 and 9 show that on both measures the highest values occur for Partial Learning only when the second Partial Learning equilibrium is feasible, and so these percentages are exactly the same as the percentage of cases for which $p \ge \bar{p}$, as shown in row 2 of Table 3.
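The percentages of this kind can be computed with a simple tally over simulation draws, as in the following minimal Python sketch. The data structure, the function name and the tie-breaking rule (ties are credited to the model listed first, so Partial Learning is counted only when it is strictly highest) are assumptions of this sketch, and the numbers in the example are hypothetical rather than taken from the paper's simulations.

```python
from typing import Dict, List

MODELS = ("NL", "FL", "PL")  # tie-breaking order: an assumption of this sketch


def share_ranked_first(cases: List[Dict[str, float]]) -> Dict[str, float]:
    """Percentage of cases in which each learning model has the highest value.

    Each case maps a model label to a value such as expected aggregate
    utility or expected IEA membership.
    """
    counts = {m: 0 for m in MODELS}
    for case in cases:
        # max() returns the first maximal element, so ties follow MODELS order.
        best = max(MODELS, key=lambda m: case[m])
        counts[best] += 1
    total = len(cases)
    return {m: 100.0 * counts[m] / total for m in MODELS}


# Hypothetical expected utilities for four parameter draws.
example_cases = [
    {"NL": 3.0, "FL": 2.0, "PL": 1.0},
    {"NL": 1.0, "FL": 2.0, "PL": 1.5},
    {"NL": 1.0, "FL": 2.0, "PL": 2.5},
    {"NL": 0.5, "FL": 2.0, "PL": 2.0},  # tie between FL and PL goes to FL
]
example_shares = share_ranked_first(example_cases)
```

By construction the three percentages sum to 100 for each measure.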
Our key results are in rows 7-8 and 10-11, which show that as risk aversion increases from 0.0 to 20.0 the percentage of cases in which NL gives the highest number of expected signatories falls from just under 9.5% to close to zero, and the percentage of cases in which NL yields the highest level of expected utility falls from just under 87% to just over 2%. These trends reflect both the decline in the IEA membership from $\bar{n}$ to $n_h$ and the fact that, for intermediate values of $\rho$, IEA signatories produce positive emissions. The implication of the results for NL and PL is that as $\rho$ increases from 0.0 to 20.0 the percentage of cases where FL has the highest expected utility rises from just over 11% to just under 92%. These percentages increase monotonically between $\rho = 0.0$ and $\rho = 10.0$, but fall slightly at $\rho = 20.0$, due to the increase in the percentage of cases where PL has the highest expected signatories and expected utility from just under 4% to just under 6%. It is also important to note that the percentage of cases where FL ranks first in expected utility already exceeds that for NL once $\rho$ has increased from 0.0 to just 0.99.
So the general message is that quite modest increases in the level of risk aversion overturn the presumption from the literature using risk neutrality that No Learning yields the highest level of expected utility. This is also true for the expected utility of an individual country from an ex ante perspective, which, for symmetric countries, is simply aggregate expected utility divided by the number of countries. Thus, if countries were able to choose which model of learning they should adopt, then, for quite modest levels of risk aversion, they would favour leaving the decision to form an IEA and set their emissions until they had full information about the likely damage cost of climate change.

Robustness Checks
In this sub-section we briefly discuss the sensitivity of the results we derived in Sects. 4.1 and 4.2 to changes in underlying assumptions in three respects. First, in the simulations in Sects. 4.1 and 4.2 we set the value of the parameter $B$ in $\pi_0$ to $B = 1.1$ to ensure that payoffs were sensitive to countries' abatement decisions. We have also run the simulation results reported in Tables 1, 2 and 3 above with higher values of $B$, whilst also ensuring that damage costs from emissions remain significant. We present the results in "Appendix B". 19 These show that for higher values of $B$, the key findings in Tables 1, 2 and 3 are slightly less sensitive to changes in risk aversion, but the overall results are robust; in particular, in relation to the results in Table 3, the percentage of cases where the payoff from Full Learning is the highest across all models of learning now increases monotonically in $\rho$ (from 11% to 92%), and is greater than the percentage of cases where the payoff from No Learning is highest for values of $\rho \ge 0.99$.
Second, we checked the limiting cases we discussed in Lemmas 2 and 3 in Sect. 3.4 above through numerical simulations. To do this we used essentially the same underlying parameter values as in Table 1, $N = 20$, $B = 1.1$, $\gamma_l = 0.0615$, $\gamma_h = 0.1289$, and $p = 0.5$, but then successively took limiting values for individual parameters of $p \approx 0.0$, $p \approx 1.0$, $\gamma_l \approx 1/N$, $\gamma_h \approx 1.0$, holding all other parameters at their original levels. To save space, in Table 4 we report the results for 3 values of $\rho$ (0.05, 2.50 and 10.0), which illustrate the limiting results derived in Lemmas 2 and 3. Moreover, the simulation results also indicate that for $p \approx 0$ not only $E(U^{FL}) > E(U^{PL})$ but also $E(U^{FL}) > E(U^{NL})$, reinforcing the message that learning leads to a better outcome in terms of expected utility. For $p \approx 1$, besides $E(U^{PL}) = E(U^{NL})$ the simulation results reveal a full ranking of expected utility, $E(U^{FL}) \le E(U^{PL}) = E(U^{NL})$, which indicates a negative effect of learning. For the third limiting parameter value, $\gamma_l = 1/N$, the results indicate that not only $E(U^{FL}) > E(U^{PL})$ but also that the level of risk aversion affects the ranking of expected utility between FL and NL, with $E(U^{NL}) > E(U^{FL})$ for $\rho = 0.05$ and $E(U^{FL}) > E(U^{NL})$ for $\rho = 10$. Finally, for $\gamma_h = 1$ a full ranking of the learning scenarios was obtained.

Third, all the simulation results we have reported so far have been based on the constant relative risk aversion (CRRA) utility function:

$$u(\pi_i) = \frac{1}{1-\rho}\,\pi_i^{1-\rho},$$

where $\rho \ge 0$ measures the degree of relative risk aversion. We now test the robustness of our key results in Table 3 to the use of a constant absolute risk aversion (CARA) utility function, where $\lambda > 0$ measures the degree of absolute risk aversion. In checking how our results using CARA relate to those using CRRA we need to ensure that the values we choose for the parameter $\lambda$ are broadly consistent with those we chose for $\rho$.
For any level of $\pi_i$ the coefficient of absolute risk aversion is related to the coefficient of relative risk aversion by the formula $\lambda = \rho/\pi_i$, and we discuss in "Appendix A2" how we use this relationship to choose values of $\lambda$ which are consistent with those we have used for $\rho$.
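The relationship $\lambda = \rho/\pi_i$ can be checked numerically, as in the following Python sketch. It assumes the exponential form $u(\pi) = -e^{-\lambda\pi}$ for CARA, which is one standard specification and not necessarily the one used in the paper; the reference payoff of 2.0 is hypothetical (the paper's actual choice is described in its Appendix A2), and the function names are illustrative.

```python
import math
from typing import Callable


def crra_utility(payoff: float, rho: float) -> float:
    """CRRA utility pi**(1 - rho) / (1 - rho), for rho != 1."""
    return payoff ** (1.0 - rho) / (1.0 - rho)


def cara_utility(payoff: float, lam: float) -> float:
    """Assumed CARA form: u(pi) = -exp(-lam * pi)."""
    return -math.exp(-lam * payoff)


def lambda_from_rho(rho: float, payoff: float) -> float:
    """Absolute risk aversion implied by relative risk aversion at a payoff."""
    return rho / payoff


def absolute_risk_aversion(u: Callable[[float], float],
                           payoff: float, h: float = 1e-4) -> float:
    """Numerical -u''(pi) / u'(pi) using central differences."""
    first = (u(payoff + h) - u(payoff - h)) / (2.0 * h)
    second = (u(payoff + h) - 2.0 * u(payoff) + u(payoff - h)) / h ** 2
    return -second / first


rho, reference_payoff = 2.5, 2.0  # reference payoff is hypothetical
lam = lambda_from_rho(rho, reference_payoff)
ara_cara = absolute_risk_aversion(lambda x: cara_utility(x, lam), reference_payoff)
ara_crra = absolute_risk_aversion(lambda x: crra_utility(x, rho), reference_payoff)
```

At the reference payoff the two utility functions imply the same absolute risk aversion, which is the sense in which the chosen $\lambda$ is consistent with $\rho$; away from that payoff they differ, since CRRA absolute risk aversion falls as the payoff rises.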
We have re-run the simulations we presented in Table 3, which used the CRRA utility function, using the CARA utility function instead. The ranges for all parameters other than $\rho$ are exactly the same as in Table 3. The results are shown in Table 5, where, as discussed above, we have chosen the value of the parameter $\lambda$ in the CARA utility function to be consistent with the values of $\rho$ we used in Table 3.
It is clear that, while the individual numbers in Table 5 differ slightly from those in Table 3, all the key findings we derived from Table 3 relating to the percentages of cases for which the expected size of the IEA or expected utility is largest across the three models of learning carry over to Table 5.
So we conclude that the key findings from the simulation results reported in Sects. 4.1 and 4.2 are robust to a number of changes in the main aspects of the simulation modeling we have employed.

Summary and Conclusions
This paper bridges two strands of literature on the formation of IEAs under uncertainty by addressing the combined roles of learning and risk aversion. This approach allowed us to explore the impact of learning for any given level of risk aversion as well as the impact of changing risk aversion under various scenarios of learning.
We generalized the model of Kolstad and Ulph (2008) who showed that with risk neutrality the possibility of learning more information about environmental damage costs generally had rather pessimistic implications for the success of the formation of IEAs. The authors found that learning reduces expected aggregate and individual payoffs for a wide range of parameter values. This suggests that countries are better off forming an IEA rather than waiting for better information.
In this paper, we have allowed countries to be risk averse using an expected utility approach which maps payoffs into utility. We first derived the theoretical results for each of our three scenarios of learning with risk aversion, confirming the main findings of Boucher and Bramoullé (2010) for the No Learning case. For No Learning, risk leads to smaller stable coalitions and higher global emissions. In terms of equilibrium coalitions and global emissions, we showed that Full Learning remains unaffected by risk and changes for Partial Learning are small. However, even with special functional forms for the underlying utility functions there was limited scope for deriving analytical comparisons across our three scenarios of learning, primarily because welfare effects could differ for signatory and non-signatory countries. Our simulation results showed that, contrary to the finding with risk neutrality, when countries become significantly risk averse, the set of parameter values for which countries are better off with No Learning compared to Full Learning shrinks significantly and the set of cases for which this is reversed grows accordingly. This may explain why it has taken so long for a proper climate agreement to be reached: countries are risk averse and waited until they had much better information about the risks of climate change.
In terms of future research, it would be desirable to use a model with asymmetric countries, though it is unlikely to be possible to derive analytical results; so it may be more useful to introduce different models of learning into Integrated Assessment Models of climate change. It would also be interesting to endogenise the process of learning by allowing countries to invest in research in order to obtain better information.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
As stated in Sect. 3.2, for a given $n$, each coalition member chooses $x_c(n)$ to maximise $E(u_c(x_c(n), n)) \equiv p\,u(\pi_{c,l}(x_c(n), n)) + (1-p)\,u(\pi_{c,h}(x_c(n), n))$, which leads to the following first and second order conditions:
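As a hedged sketch of what these conditions look like, suppose the payoff of a coalition member in state $s \in \{l, h\}$ is linear in $x_c$ with slope $1 - \gamma_s n$; this is an assumption, chosen because it matches the terms $(1 - \gamma_l z)$ and $(1 - \gamma_h z)$ that appear in the boundary-value derivation in this appendix. The chain rule then gives the following forms, which are not necessarily identical to the paper's own conditions.

```latex
% Assumption: \partial \pi_{c,s} / \partial x_c = 1 - \gamma_s n for s \in \{l, h\}.
\frac{\partial E(u_c)}{\partial x_c}
  = p\,(1-\gamma_l n)\,u'(\pi_{c,l}) + (1-p)\,(1-\gamma_h n)\,u'(\pi_{c,h}),
\qquad
\frac{\partial^2 E(u_c)}{\partial x_c^2}
  = p\,(1-\gamma_l n)^2\,u''(\pi_{c,l}) + (1-p)\,(1-\gamma_h n)^2\,u''(\pi_{c,h}) \le 0 .
```

The second derivative is non-positive whenever $u'' \le 0$, so under this assumption an interior optimum sets the first derivative to zero, while a first derivative that is non-negative at $x_c = 1$ or non-positive at $x_c = 0$ gives the corresponding corner solution.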

Boundary Values for $x_c(n)$
From (A2a):

We now look for tighter bounds for $n$ that guarantee $x_c(n) = 0$ and $x_c(n) = 1$, respectively. So we now focus on the range $n_h - 1 < 1/\gamma_h < n < 1/\gamma_l \le n_l$. From (A1b), in this range

To make progress, we treat $n$ as if it were a real value, $z$. To save notation define:

We first define $\tilde{z}$ as the unique value of $z$ such that:

$$\frac{\partial E(u_c(1, \tilde{z}))}{\partial x_c} = p(1 - \gamma_l \tilde{z})\theta_l + (1-p)(1 - \gamma_h \tilde{z})\theta_h = 0 \;\Rightarrow\; \tilde{z} = \frac{p\theta_l + (1-p)\theta_h}{\gamma_l p\theta_l + \gamma_h (1-p)\theta_h}.$$
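To see how this expression behaves, the following Python sketch evaluates the closed form for the threshold, treating $\theta_l$ and $\theta_h$ as given positive weights. That treatment is an assumption of the sketch: in the paper the $\theta$ terms are defined separately and may themselves depend on $z$, in which case the formula defines the threshold only implicitly. The function name is hypothetical; the parameter values are those quoted for the simple simulations.

```python
def z_threshold(p: float, gamma_l: float, gamma_h: float,
                theta_l: float, theta_h: float) -> float:
    """Solve p(1 - gamma_l z) theta_l + (1 - p)(1 - gamma_h z) theta_h = 0 for z."""
    numerator = p * theta_l + (1.0 - p) * theta_h
    denominator = gamma_l * p * theta_l + gamma_h * (1.0 - p) * theta_h
    return numerator / denominator


def foc_at_one(z: float, p: float, gamma_l: float, gamma_h: float,
               theta_l: float, theta_h: float) -> float:
    """Left-hand side of the condition above, evaluated at a given z."""
    return (p * (1.0 - gamma_l * z) * theta_l
            + (1.0 - p) * (1.0 - gamma_h * z) * theta_h)


p, gamma_l, gamma_h = 0.5, 0.0615, 0.1289

# Equal weights: the threshold reduces to 1 / (p*gamma_l + (1 - p)*gamma_h).
z_equal = z_threshold(p, gamma_l, gamma_h, 1.0, 1.0)

# A larger weight on the high damage cost state (hypothetical value 5.0)
# pulls the threshold down towards 1 / gamma_h.
z_high_weight = z_threshold(p, gamma_l, gamma_h, 1.0, 5.0)
```

Because the threshold is a weighted harmonic mean of $1/\gamma_l$ and $1/\gamma_h$, it always lies inside the range $1/\gamma_h < z < 1/\gamma_l$ considered above.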