Belief updating: Does the 'good-news, bad-news' asymmetry extend to purely financial domains?

Bayes' statistical rule remains the status quo for modeling belief updating in both normative and descriptive models of behavior under uncertainty. Some recent research has questioned the use of Bayes' rule in descriptive models of behavior, presenting evidence that people overweight 'good news' relative to 'bad news' when updating ego-relevant beliefs. In this paper, we present experimental evidence testing whether this 'good-news, bad-news' effect is present in a financial decision-making context (i.e. a domain that is important for understanding much economic decision making). We find no evidence of asymmetric updating in this domain. In contrast, in our experiment, belief updating is close to the Bayesian benchmark on average. However, we show that this average behavior masks substantial heterogeneity in individual updating behavior. We find no evidence in support of a sizeable subgroup of asymmetric updaters.


Introduction
Throughout our lives, we are constantly receiving new information about ourselves and our environment. The way that we filter, summarize and store this new information is of critical importance for the quality of our decision making. Theories of human behavior under dynamic uncertainty are therefore enriched by an understanding of how individuals process new information. Economists typically write down models where information is summarized in the form of probabilistic 'beliefs' over states of the world, and updated upon receipt of new information according to Bayes' rule. However, the assumption that individuals process information in a statistically accurate way is receiving an increasing amount of attention, with many studies documenting systematic deviations. 1 One important strand of this literature examines whether belief formation and updating is influenced by the affective content of the new information 2 , i.e. whether individuals update their beliefs symmetrically in response to 'good-news' and 'bad-news' (see, for example, Eil and Rao (2011); Ertac (2011); Möbius et al. (2014); Coutts (2019)). 3 Essentially, this literature tests an implicit assumption of Bayesian updating, namely that the only object that is relevant for predicting an individual's belief is her information set, and therefore her beliefs are completely unaffected by the prizes and punishments she will receive in different states of the world. This fundamental assumption -that people update their beliefs symmetrically -is of paramount importance because it underpins the results of a wide range of theoretical studies within economics, including all research in which agents receive new information and form rational expectations. 
As noted by Brunnermeier and Parker (2005), since Muth (1960, 1961) and Lucas (1976), the vast majority of economic research involving uncertainty has built on the rational expectations assumption, with diverse applications including the vast literatures on social learning (see, for example, Chamley (2004) for a review), capital markets (e.g. Fama (1970, 1976)), intertemporal portfolio choice problems (e.g. Mossin (1968); Merton (1969); Samuelson (1969); Lewellen and Shanken (2002)) and consumption savings problems (e.g. Friedman (1957); Hall (1978)). Therefore, it is essential to empirically test whether the assumption that people form beliefs symmetrically in response to 'good-news' and 'bad-news', such that their posterior beliefs are completely unaffected by their hopes and wishes, is consistent with how belief updating actually works.
With this objective in mind, in this paper we employ a laboratory experiment to study how individuals update their beliefs from exogenously assigned prior beliefs in a two-state world when they receive a sequence of partially informative binary signals. In particular, we vary the financial rewards associated with one of the two states of the world in order to test whether these state-contingent financial rewards influence belief updating. A nice feature of this experimental design is that it allows us to compare posterior beliefs in situations where the entire information set is held constant, but the rewards associated with the states of the world are varied. For example, we can compare how two groups of individuals revise their beliefs when both groups share the same prior belief and receive an equally informative signal, but for one group of individuals the signal is 'good news' and for others the signal constitutes 'bad news'. Furthermore, we can conduct a similar exercise for a single individual, by comparing two situations in which she has the same information set, and receives the same signal, but the signal constitutes 'good news' in one instance, and 'bad news' in the other. Our experimental design therefore permits a clean test of the asymmetric updating hypothesis.
In our experiment we study belief updating in two contexts. In our Symmetric treatment, we examine how subjects update their beliefs when they have an equal stake in each of the underlying states and are therefore indifferent about which state is realized. We compare this with updating behavior in our two Asymmetric treatments, in which a larger bonus payment will be paid if one state of the world is realized. Here, one would expect that subjects prefer that the state with the bonus payment is realized. In all treatments, subjects are told the true prior probability of the two states, and then update their beliefs upon receipt of a sequence of informative but noisy signals. 4 We elicit this sequence of beliefs. In our experiment, we can therefore conduct two separate tests of the asymmetric updating hypothesis. Firstly, we can compare how the same individual responds to 'good-news' and 'bad-news' within the Asymmetric treatments. Secondly, we can conduct a between-subject comparison of belief updating in the Symmetric treatment and the Asymmetric treatments. Each individual in our experiment faces only one incentive environment. However, since we exogenously endow participants with a prior over the states of the world, we are able to repeat the exercise several times for each individual and study how they update from each of five different priors, p_0, chosen from the set {1/6, 2/6, 3/6, 4/6, 5/6}.

The experimental design and analysis aim to address several challenges that are present when studying belief updating in the presence of state-dependent stakes. Firstly, we use exogenous variation in the priors to ensure that the estimates are robust to the econometric issues that arise when a right-hand side variable (i.e. the prior) is a lagged version of the dependent variable (i.e. the posterior). Secondly, we avoid a second type of endogeneity issue, which arises when the underlying states are defined as a function of some personal characteristic of the individual (e.g. her relative IQ) that might also be related to how she updates (see Appendix C for further details). Thirdly, we measure the influence that hedging has on belief elicitation when there are state-dependent stakes. Furthermore, we conduct several exercises to correct our estimates for this hedging influence -both experimentally, and econometrically. Fourth, our experimental design allows us to study belief updating from priors spanning much of the unit interval. Importantly, averaging across all subjects, the design generates a balanced distribution of 'good' and 'bad' signals.
The empirical strategy employed permits testing for several commonly hypothesised deviations from Bayesian updating, including confirmatory bias and base rate neglect; however, the focus of both the experimental design and the analysis is on testing for the presence of an asymmetry in updating. Our results show no evidence in favour of asymmetric updating in response to 'good-news' in comparison to 'bad-news' in the domain of financial outcomes (when there is a minimal role for ego-utility). Several robustness exercises are carried out in support of this conclusion. Furthermore, we find that average updating behavior is well approximated by Bayes' rule. 5 This average behavior masks substantial heterogeneity in updating behavior at the individual level, but we find no evidence in support of a sizeable subgroup of asymmetric updaters.
The evidence reported here contributes to the recent literature studying the asymmetric updating hypothesis across different contexts (Eil and Rao (2011), Ertac (2011), Grossman and Owens (2012), Mayraz (2013), Möbius et al. (2014), Kuhnen (2015), Schwardmann and Van der Weele (2016), Gotthard-Real (2017), Charness and Dave (2017), Heger and Papageorge (2018), Coutts (2019) and Buser et al. (2018)). The results in this literature thus far are surprisingly heterogeneous, with some papers finding a greater responsiveness to good-news; some a greater responsiveness to bad-news; and some no evidence of an asymmetry. 6 There are several candidate contextual and experimental design factors that could potentially be driving these heterogeneous results. Section 7 offers a detailed discussion of these factors, and asks whether the existing body of evidence can help us to detect a systematic pattern that organises the results (for an alternative discussion of this literature, see Benjamin (2019), who taxonomizes the existing evidence on errors in probabilistic reasoning).
The remainder of the paper proceeds as follows. Section 2 outlines the theoretical framework, Section 3 details the experimental design, Section 4 provides some descriptive statistics, Section 5 presents the empirical specification, Section 6 discusses the related literature and Section 7 concludes.

Theoretical Framework and Hypotheses
In the following section, we discuss a simple framework for belief updating that augments the standard benchmark of Bayesian updating by allowing for several of the deviations from Bayes' rule commonly discussed in the psychology and related economics literature. This framework is borrowed from Möbius et al. (2014) and is currently a common approach used in this type of descriptive belief updating study. The basic idea is that, while Bayes' rule captures the normative benchmark for how we might think a rational agent should update her beliefs 7 , it has been argued that, descriptively, people may update their beliefs in ways that depart systematically from Bayes' rule. 8 The framework below serves to facilitate a discussion of these different potential deviations, and motivates the empirical approach that we will use to test whether they are observed in our data.
In short, below we describe a model that embeds the normative Bayesian benchmark, but also allows for commonly discussed belief updating distortions. The aim will be to then make use of this model to test whether subjects update their beliefs like a Bayesian automaton or display some systematic deviations from this Bayesian benchmark. Most importantly, the model permits a discussion of what observed updating behavior should look like if agents update their beliefs asymmetrically in response to 'good-news' and 'bad-news'.

A Simple Model of Belief Formation
We consider a single agent who forms a belief over two states of the world, ω ∈ {A, B}, at each point in time, t. One of these states of the world is selected by nature as the 'correct' (or 'realized') state, where state ω = A is chosen with prior probability p_0 (known to the agent). The agent's belief at time t is denoted by π_t ∈ [0, 1], where π_t is the agent's belief regarding the likelihood that the state is ω = A and 1 − π_t is the agent's belief that the state is ω = B. In each period, the agent receives a signal, s_t ∈ {a, b}, regarding the state of the world, which is correct with probability q ∈ (1/2, 1). In other words, p(a|A) = p(b|B) = q > 1/2. Furthermore, the history, H_t, is defined as the sequence of signals received by the agent in periods 1, . . . , t, with H_0 = ∅. Therefore, the history at time t is given by H_t = (s_1, ..., s_t).

7 This statement is not uncontentious. One argument in favor of using Bayes' rule as a normative benchmark for belief updating is that Bayes' rule captures the objective statistical relationship between a prior probability and a posterior probability, given new information. Therefore, if an individual updates her beliefs according to Bayes' rule, she will always hold beliefs that are as accurate as possible, given her information set. This will allow her to use these optimal beliefs to guide her decision making. However, this argument that Bayes' rule represents the normative optimum rests on the assumption that beliefs serve only an instrumental role in guiding decision making. If, for example, we relax this assumption and allow beliefs to yield intrinsic utility, then this argument no longer holds, as it may be optimal for the decision maker to hold beliefs that are distorted away from the Bayesian posterior.

8 One argument used to justify the use of Bayes' rule in descriptive models of human behavior is that evolution should have selected individuals who were able to process new information in a statistically efficient way over individuals who could not. Therefore, Bayesian individuals would have had more accurate beliefs guiding their decision making, and been more likely to survive.
In order to study how individuals update their beliefs, we follow Möbius et al. (2014) in considering the following model of augmented Bayesian updating: 9

logit(π_{t+1}) = δ · logit(π_t) + γ_a · 1{s_{t+1} = a} · log(q/(1 − q)) + γ_b · 1{s_{t+1} = b} · log((1 − q)/q)    (1)

The parameters δ, γ_a and γ_b serve, firstly, to provide structure for a discussion of different ways in which our agent may depart from Bayesian updating, and secondly, to provide a clear prescription for how to test for these departures in our empirical analysis below. If δ = γ_a = γ_b = 1 then the agent updates her beliefs according to Bayes' rule. However, if we consider deviations from this benchmark, we see that δ captures the degree to which the magnitude of the agent's prior affects her updating. For example, if δ > 1 then this suggests that the agent displays a confirmatory bias 10, whereby she is more responsive to information that supports her prior. In contrast, δ < 1 suggests she is more responsive to information that contradicts her prior (i.e. base rate neglect 11). The former would predict that beliefs will polarize over time, while the latter would predict that over time beliefs remain closer to 0.5 in a two-state world than Bayes' rule would predict.
The parameters γ_a and γ_b capture the agent's responsiveness to information. If γ_a = γ_b < 1 then the agent is less responsive to the information that she receives than a Bayesian updater would be. And if γ_a = γ_b > 1, then she is more responsive than a Bayesian. For example, if γ_a = 2, then whenever the agent receives a signal s_t = a, she updates her belief exactly as much as a Bayesian would if she received two consecutive a signals. The interpretation of the parameters is summarized in the first five rows of Table 1 below.
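The updating rule in equation 1 and the interpretations above can be sketched in a few lines of code. This is a minimal illustration, not the paper's estimation code; the signal precision q = 5/8 is taken from the urn composition used in the experiment.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def update(prior, signal, q=5/8, delta=1.0, gamma_a=1.0, gamma_b=1.0):
    """One step of the augmented updating rule in equation (1).

    signal is 'a' (supports state A) or 'b' (supports state B).
    """
    llr = math.log(q / (1 - q))  # log-likelihood ratio carried by one signal
    x = delta * logit(prior)
    if signal == 'a':
        x += gamma_a * llr       # gamma_a scales responsiveness to a-signals
    else:
        x -= gamma_b * llr       # log((1-q)/q) = -llr, scaled by gamma_b
    return inv_logit(x)

# With delta = gamma_a = gamma_b = 1 the rule reduces to Bayes' rule:
bayes = (5/8 * 0.5) / (5/8 * 0.5 + 3/8 * 0.5)  # posterior after one 'a' from prior 1/2
assert abs(update(0.5, 'a') - bayes) < 1e-12

# With gamma_a = 2, one 'a' signal moves beliefs as far as two Bayesian updates:
two_step = update(update(0.5, 'a'), 'a')
assert abs(update(0.5, 'a', gamma_a=2.0) - two_step) < 1e-12
```

The logit form makes the two benchmarks transparent: Bayesian updating is a unit-slope linear rule in log-odds space, and each deviation parameter rescales exactly one term.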
9 For a discussion of core properties underlying Bayes' rule, see Appendix B.1. This discussion serves as a motivation for the approach taken in augmenting Bayes' rule in equation 1. For further details regarding the derivation of equation 1, refer to an earlier working paper version of this paper (Barron, 2018), or refer to the comprehensive survey of this topic provided by Benjamin (2019).

10 For a detailed discussion of the confirmatory bias, see Rabin and Schrag (1999). Essentially, it is the tendency to weight information that supports one's priors more heavily than information that opposes one's priors. In this case, when one's prior regarding state ω = A is greater than 0.5, i.e. π_t > 0.5, a participant who is prone to the confirmatory bias weights signals that support state ω = A more heavily than signals that support state ω = B; and vice versa when her prior suggests state ω = B is more likely, i.e. π_t < 0.5.

11 One can think of base rate neglect in this context as the agent forming her beliefs as if she attenuates the influence of her prior belief when calculating her posterior -i.e. acting as if her prior was closer to 0.5 than it actually was.

Affective States
The discussion in the preceding section has so far focused on purely cognitive deviations from Bayes' rule. The affect or desirability of different states of the world has played no role. However, in most situations in which individuals form beliefs, there are some states that yield an outcome that is preferred to the outcomes associated with other states -i.e. there are good and bad states of the world. For example, an individual would generally prefer to be more intelligent rather than less intelligent, to win a lottery rather than lose, and for the price of assets in her possession to increase rather than decrease. This implies that new information about the state of the world is also often either good-news or bad-news.
In order to allow for the possibility that individuals update their beliefs differently in response to good-news in comparison to bad-news, we relax the assumption that belief updating is orthogonal to the affect of the information. 12 To do this, assume that each of the two states of the world is associated with a certain outcome -i.e. in state ω = A, the agent receives outcome x_A, and in state ω = B, she receives x_B. There are now two belief updating scenarios:

• Scenario 1 (Symmetric): the agent is indifferent between the two outcomes (i.e. x_A ∼ x_B); and

• Scenario 2 (Asymmetric): the agent strictly prefers one of the two outcomes (i.e. x_A ≻ x_B).
Now, the question of interest is whether the agent will update her beliefs differently in the Symmetric and Asymmetric scenarios. Under the assumption that the agent's behavior is consistent with the model described above in Equation 1, this involves asking whether the parameters δ, γ_a and γ_b differ between the two contexts.
To guide our discussion, we consider the following two benchmarks. The first natural benchmark is Bayes' rule, which prescribes that all three parameters equal 1 in both the Symmetric and Asymmetric contexts -statistically efficient updating of probabilities is unaffected by the state-dependent rewards and punishments. According to Bayes' rule, news is news, independent of its affective content.

12 To avoid ambiguity, in the discussion below, the term 'preference' is usually used to refer to preferences over sure outcomes -never to a preference ordering over lotteries. We will also sometimes refer to 'preferring' one state of the world to another. This simply captures the idea that an individual prefers the realisation of a state in which a good outcome is realised.
The second benchmark that we consider is the asymmetric updating hypothesis-that individuals respond more to 'good-news' than 'bad-news'. Here, in our simple framework there are two ways to identify asymmetric updating.
Firstly, if we only consider the behavior of individuals within the Asymmetric scenario, we can ask whether there is an asymmetry in updating after signals that favor the more desirable state ω = A ('good-news'), relative to signals that favor the less desirable state ω = B ('bad-news'). For example, if γ_a > γ_b, this would indicate that the agent updates more in response to 'good-news'. We refer to such an agent as an optimistic updater. Conversely, if γ_a < γ_b then the agent updates more in response to 'bad-news'. We refer to such an agent as a pessimistic updater. 13

Secondly, if we compare behavior between the Symmetric and Asymmetric scenarios, we can ask whether the parameters of equation 1 differ according to the scenario. We use the superscript c ∈ {A, S} to distinguish the parameters in the two scenarios -i.e. δ^S, γ^S_a and γ^S_b in the Symmetric scenario, and δ^A, γ^A_a and γ^A_b in the Asymmetric scenario. In the Symmetric scenario, where the agent is completely indifferent between the two states, there is no reason to expect her updating to be asymmetric. Therefore, we assume that γ^S_a = γ^S_b = γ^S. Thus, the difference γ^A_a − γ^S_a reflects a measure of the increase in the agent's responsiveness when information is desirable, relative to when information is neutral in terms of its affect. Similarly, γ^A_b − γ^S_b is a measure of the increase in the agent's responsiveness when information is undesirable, relative to the case in which information is neutral in affect.
Hypothesis 2 (Asymmetric Updating) Individuals update their beliefs asymmetrically, responding more to good than bad news. Therefore, within the Asymmetric scenario, we will observe γ^A_a > γ^A_b. And in a comparison between the Symmetric and Asymmetric scenarios, we will observe γ^A_a − γ^S_a > 0 and γ^A_b − γ^S_b < 0. Together, we can summarize the asymmetric updating hypothesis parameter predictions as follows: γ^A_a > γ^S > γ^A_b.

13 Notice that these definitions of optimistic and pessimistic updating relate only to the asymmetry in responsiveness to signals supporting different states, but do not depend on how responsive the agent is to the signals she receives relative to Bayes' rule. Therefore, under the definition given, we can have an optimistic updater who is less responsive to signals supporting the desirable state than a Bayesian (i.e. 1 > γ_a > γ_b). This agent is less responsive than a Bayesian to both desirable and undesirable information, but more responsive to desirable information relative to undesirable information.
In our experiment, we will provide evidence on both of these questions. Firstly, we will examine whether there is an asymmetry in updating behavior within the Asymmetric context; and secondly, we will examine whether the parameters of the updating process differ between the Symmetric and the Asymmetric contexts, in a between-subjects comparison. Furthermore, our experiment will allow us to test for other systematic deviations from Bayes' rule, such as those mentioned in the discussion above. Table 1 summarizes the interpretations of the different values that the belief updating parameters may take.
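Because equation 1 is linear in log-odds, the parameters δ, γ_a and γ_b can in principle be recovered by regressing logit posteriors on the logit prior and the two signal terms. The following stylized sketch illustrates this on noiseless simulated data; the priors and signal precision match the experiment, but the regression here is only a schematic of the identification logic, not the paper's actual empirical specification.

```python
import math
import random

import numpy as np

random.seed(1)
LLR = math.log((5/8) / (3/8))  # log-likelihood ratio of one signal, q = 5/8

def simulate(delta, gamma_a, gamma_b, n=4000):
    """Simulate one-step updates under equation (1) and build the design matrix."""
    rows, y = [], []
    for _ in range(n):
        prior = random.choice([1/6, 2/6, 3/6, 4/6, 5/6])
        sig_a = random.random() < 0.5
        lp = math.log(prior / (1 - prior))
        # dependent variable: logit of the posterior implied by equation (1)
        y.append(delta * lp + (gamma_a if sig_a else -gamma_b) * LLR)
        # regressors: logit prior, a-signal term, b-signal term
        rows.append([lp, LLR if sig_a else 0.0, 0.0 if sig_a else -LLR])
    return np.array(rows), np.array(y)

X, y = simulate(delta=1.0, gamma_a=1.4, gamma_b=0.8)
est, *_ = np.linalg.lstsq(X, y, rcond=None)
# est recovers (delta, gamma_a, gamma_b) up to numerical error
```

In real data the elicited beliefs contain noise, so the coefficient estimates carry standard errors and the hypotheses above become tests of δ = 1, γ_a = γ_b, and equality of parameters across scenarios.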

Belief Elicitation and Incentives
In order to empirically test the hypotheses above using an experiment, we would like to be able to elicit our participants' true beliefs. A common approach in studies using belief elicitation is to assume that participants evaluate prospects using a specific model, and then to show that, under the incentives of the elicitation technique, a participant who follows the assumed model should report her true belief (i.e. under the model, truthful revelation is incentive compatible).
However, in the context of studying the relationship between preferences and beliefs, two concerns may be raised regarding this approach: firstly, the assumption that participants follow expected utility involves a stronger assumption than we might wish to make; and secondly, the inherent hedging motive faced by participants who have a stake in one state of the world poses an additional challenge (see Karni and Safra (1995) for a discussion). Therefore we adopt the approach developed by Offerman et al. (2009), and extended to accommodate state-dependent stakes as in Kothiyal et al. (2011).
The central idea behind this approach is to acknowledge that the incentive environment within which we elicit beliefs in the laboratory may exert a distortionary influence on the beliefs which some participants report, relative to the beliefs they actually hold, and then measure this distortionary influence of the incentive environment in a separate part of the experiment. Once we have constructed a mapping from true beliefs to reported beliefs within the relevant incentive environment, we can use this function to recover the participant's true beliefs from her reported beliefs. In other words, our objective is to recover the function that each individual uses to map her true beliefs to the beliefs that she reports within the given incentive environment.
The incentive environment that we will use in our experiment to elicit beliefs is the quadratic scoring rule (QSR). 14 In Appendix B.2, we provide a detailed discussion of the way in which reported beliefs might be distorted under the quadratic scoring rule, depending on the assumed underlying model of preferences. A key example is that it is well documented that, under the QSR, a risk averse agent should distort her reported belief towards 50%. However, this is only true in the absence of state-dependent stakes. In our experiment, we will also be interested in eliciting beliefs when participants have an exogenous stake associated with one of the two states.

In the context of state-dependent stakes, a risk averse EU maximizer will face two distortionary motives in reporting her belief: (i) she will face the motive to distort her belief towards 50% as discussed above; and (ii) in addition, there is a hedging motive, which will compel a risk averse individual to lower her reported belief, r_t, towards 0% as the size of the exogenous stake increases. 15 If the participants in our experiment are risk neutral expected utility maximizers, the reported beliefs, r_t, that we elicit under the QSR will coincide with their true beliefs, π_t. However, in order to allow for choice behavior consistent with a wider range of decision models, we measure the size of the distortionary influence of the elicitation incentives at an individual level and correct the beliefs accordingly.


A Non-EU 'Truth Serum'

The Offerman et al. (2009) approach proposes correcting the reported beliefs for the risk aversion caused by the curvature of the utility function or by non-linear probability weighting. This approach involves eliciting participants' reported beliefs, r, for a set of risky events where they know the objective probability, p (known probability). This is done under precisely the same QSR incentive environment in which the participants' subjective beliefs, π, regarding the events of interest (where they don't know the objective probability: unknown probability) are elicited. If a subject's reported beliefs, r, differ from the known objective probabilities, p, this indicates that the subject is distorting her beliefs due to the incentive environment (e.g. due to risk aversion). The objective of the correction mechanism is therefore to construct a map, R, from the objective beliefs, p ∈ [0, 1], to the reported beliefs, r, for each individual under the relevant incentive environment.

In Appendix B.2, we offer a detailed discussion of how the Offerman et al. (2009) method operates, describe the underlying assumptions, and demonstrate how it can be augmented (as proposed in Kothiyal et al. (2011)) to allow for the scenario where there are state-contingent stakes (i.e. x ≠ 0).

14 There are several reasons for adopting this approach: firstly, the QSR has the advantage that it ensures that the decision environment is clear and simple for the participants -essentially they are making a single choice from a list of binary prospects; secondly, the quadratic scoring rule has been commonly used in the literature, with both the theoretical properties and empirical performance having been studied in detail (see, e.g., Armantier and Treich (2013)); thirdly, in a horse race between elicitation methods, Trautmann and van de Kuilen (2015) show that there is no improvement in the empirical performance of more complex elicitation methods over the Offerman et al. (2009) method, neither in terms of internal validity, nor in terms of behavior prediction. Out of the set of alternative elicitation techniques, the two that are most theoretically attractive are the binarized scoring rule, proposed by Hossain and Okui (2013), and the probability matching mechanism, described by Grether (1992) and Karni (2009). However, in the context of the current paper, we viewed neither of these approaches as preferable to the Offerman et al. (2009) technique, since both introduce an additional layer of probabilities and, in the study of probability bias, this is an undesirable attribute of the elicitation strategy.

15 This assumes that the state-dependent payment is associated with the 100% state, not the 0% state. This assumption is made throughout the paper and the experiment.
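The core of the correction can be sketched as follows. The report data below are purely hypothetical (a subject whose reports are compressed towards 50%, as expected under risk aversion), and simple linear interpolation stands in for the more careful construction of the map R described in Appendix B.2 and in Offerman et al. (2009).

```python
import numpy as np

# Known objective probabilities and one hypothetical subject's reports under the QSR.
# The reports are compressed towards 50%, as expected for a risk averse subject.
p_known = np.array([0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0])
r_reported = np.array([0.0, 0.20, 0.32, 0.42, 0.5, 0.58, 0.68, 0.80, 1.0])

def correct_belief(r, p_grid=p_known, r_grid=r_reported):
    """Recover the true belief pi from a reported belief r by inverting the
    subject's (monotone) report function R: p -> r, here via interpolation."""
    return float(np.interp(r, r_grid, p_grid))

# For this hypothetical subject, a report of 0.68 maps back to a true
# belief of approximately 0.75.
```

The key assumption making the inversion possible is that the report function R is monotone and stable across the known-probability and unknown-probability tasks, since both are run under the same incentive environment.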

Experimental Design
The experiment was designed to test the asymmetric updating hypothesis using both within-subject and between-subjects comparisons of updating behavior. The experiment consisted of three treatment groups. The treatment T1.S corresponds to Scenario 1 (Symmetric: no exogenous state-contingent stakes) and the other two treatment groups, T2.C and T3.S, correspond to Scenario 2 (Asymmetric: state-contingent stakes) discussed above.
In T1.S , one would expect the participants to be indifferent between each of the two states being realized, while in T2.C and T3.S , the larger payment associated with state ω = A should imply that the participants prefer that this state be realized. T2.C and T3.S are identical in terms of the financial incentives. The rationale for running two treatments with identical incentives was to conduct additional checks to ensure that our results were not driven by the influence of a hedging motive as discussed above. To do this, we varied only the way that the incentive environment was described to participants (i.e. we only varied the framing of the incentives). In T2.C , this information was summarized in a way that made it much easier for participants to notice the hedging opportunity in comparison to T3.S .
This exogenous variation in the salience of the hedging opportunity served two purposes. Firstly, it allowed us to assess whether a hedging motive influenced the beliefs elicited. Secondly, it provided an opportunity to assess the validity of the mechanism we use to correct reported beliefs. 16 The difference between these two treatments is discussed in more detail in the 'Incentives and Treatment Groups' section below.
The experiment proceeded in three stages. The first stage comprised the core belief updating task in which we elicited a sequence of reported beliefs from subjects as they received a sequence of noisy signals regarding the true state of the world, updating from an exogenously provided prior. In the second stage, we collected the reported probabilities associated with known objective probabilities on the interval [0, 1] required for the Offerman et al. (2009) correction approach, as well as data on risk preferences. In the third stage, we obtained data on several demographic characteristics as well as some further non-incentivized measures. In each of the first two stages, one of the subject's choices was chosen at random and paid out.

The Belief Updating Task (Stage 1)

The Belief Updating Task was the primary task of the experiment. We used this task to collect data about participants' belief updating behaviour. The experimental design for this task is summarized in Figure 1 and described in the following discussion.

The Belief Updating Task consisted of five rounds. In each round, participants were presented with a pair of computerized 'urns' containing blue and red colored balls, with each of these two urns representing one of the two states of the world. The composition of the two urns was always constant, with the state ω = A represented by the urn containing more blue balls (5 blue and 3 red), while the state ω = B was represented by the urn containing more red balls (5 red and 3 blue).

Priors
The five rounds differed from one another only in the exogenous prior probability that ω = A was the true state, with this prior, p_0, chosen from the set {1/6, 2/6, 3/6, 4/6, 5/6}. In each round, this prior was known to the participant. The order of these rounds was randomly chosen for each individual. Conditional on the prior, p_0, one of the two urns was then chosen through the throw of a virtual die, independently for each individual, in each round.

Belief Updating
In each round, after being informed of this prior probability, p_0, the participant received a sequence of five partially informative signals, s_t, for t = {1, 2, 3, 4, 5}. These signals consisted of draws, with replacement, from the urn chosen for that round. Therefore, if the state of the world in a specific round was ω = A then the chance of drawing a red ball was 3/8 and the chance of drawing a blue ball was 5/8 for each of the draws in that round (see Figure 1). In each round, we elicited the participant's reported belief, r_t, about the likelihood that state ω = A was the correct state of the world, six times (i.e. for t = {0, 1, 2, 3, 4, 5}). We first elicited her reported belief, r_0, directly after she was informed of the exogenous prior probability, p_0, and then after she received each of her five signals we elicited r_t for t = {1, 2, 3, 4, 5}. Overall, we therefore elicited 30 reported beliefs in Stage 1 from each individual (6 reported beliefs in each of 5 rounds). 18
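The data-generating process of one round, together with the Bayesian-benchmark posterior path against which elicited beliefs can be compared, can be sketched as follows. This is an illustrative simulation only, using the urn composition described above.

```python
import random

random.seed(0)
Q = 5/8  # probability that a draw shows the majority color of the true urn

def run_round(p0):
    """Simulate one round: nature draws the true state with prior p0, then
    five draws with replacement; track the Bayesian posterior that the
    state is A after each draw."""
    state_is_A = random.random() < p0
    p = p0
    posteriors = [p]
    for _ in range(5):
        # a blue draw supports state A; it occurs w.p. 5/8 if A, 3/8 if B
        blue = random.random() < (Q if state_is_A else 1 - Q)
        like_A = Q if blue else 1 - Q
        like_B = 1 - Q if blue else Q
        p = like_A * p / (like_A * p + like_B * (1 - p))  # Bayes' rule
        posteriors.append(p)
    return state_is_A, posteriors

state, path = run_round(p0=2/6)
# path contains the 6 benchmark beliefs corresponding to r_0, ..., r_5
```

Because draws are with replacement, each signal carries the same log-likelihood ratio, so in log-odds the benchmark path moves in equal-sized steps up or down from logit(p_0).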

Incentives and Treatment Groups
The Belief Updating Task was identical across treatment groups with the exception of the incentives faced by participants. In each treatment, participants' payment consisted of two components: (i) an exogenous state-contingent payment 19 , and (ii) an accuracy payment that depended on their stated belief 20 and the true state (i.e. the QSR payment described in equations 6 and 7 above). In treatments T2.C and T3.S, the state-contingent payment was substantially higher at £10 in state ω = A in comparison to £0.10 in state ω = B, making ω = A the more attractive state of the world. In T1.S, participants simply received an equal state-contingent payment of £0.10 in both states, ω = A and ω = B, implying neither was preferable.
In all three treatments, participants received nearly identical detailed written instructions describing the belief updating task as well as the two payment components. In order to further simplify the task faced by participants and to ensure that they understood the incentive environment they faced, we presented the QSR as a choice from a list of lotteries (this approach is also used, for example, by Armantier and Treich (2013) and Offerman et al. (2009)). To this effect, subjects were presented with payment tables, which informed them of the precise prospect they would face for each choice of r_t, in increments of 0.01. An abbreviated version of the three payment tables associated with each of the three treatment groups is presented in Table 2. 21 In order to represent all payments as integers in the instructions and payment tables, we adopted the approach of using experimental points. At the end of the experiment, these experimental points were converted to money using an experimental exchange rate of 6000 points = £1. Table 2 highlights the difference between treatments T2.C and T3.S. While participants in these two treatments faced precisely the same incentives, the treatments differed in the salience of the hedging motive. In particular, the only difference between the two treatments was the way in which the payment information was summarized in the payment table. As shown in Table 2, in T2.C, the payment table showed the combined payment from both (i) the exogenous state-contingent payment, and (ii) the accuracy payment. It therefore summarized for subjects the reduced-form prospect associated with each reported probability choice (r_t). In T3.S, by contrast, the two payment components were presented separately in the payment table.
The motivation behind having two treatments with identical incentives, but a different presentation of those incentives, was the following. The rationale for T2.C was that presenting the incentives in a combined form is the simplest and clearest way of relaying the true incentives faced to participants. The rationale for T3.S was that if participants "narrowly bracket", then the separate presentation of incentives could reduce the influence of the hedging motive, and therefore would induce more accurate belief reporting. By implementing both treatments, we were able to evaluate the influence of the presentation of the incentives on the reported beliefs. As we will see below, this influence was substantial and corresponds to the theoretical prediction that a risk averse individual would hedge more when the hedging opportunity was more salient. Furthermore, an additional benefit of running both treatments was that it provided us with a way to test the internal validity of the correction mechanism we use. We will see below that, while the uncorrected distributions of beliefs differ substantially between the two treatments, the distributions of the corrected beliefs are very similar to one another.
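To see why risk aversion generates a hedging motive in this environment, the following sketch contrasts a risk-neutral and a risk-averse subject facing a combined QSR-plus-state-payment prospect. The payoff scale, the belief of 0.6, and the square-root utility are illustrative assumptions, not the experiment's parameters:

```python
import numpy as np

# Illustration (not the paper's parameters) of the hedging motive created by
# combining the QSR accuracy payment with a state-contingent payment x paid
# only in state A. The belief of 0.6, x = 10, and sqrt utility are made up.
def expected_utility(r, belief_a, x, utility):
    pay_a = x + 1 - (1 - r) ** 2   # total payment if state A is true
    pay_b = 1 - r ** 2             # total payment if state B is true
    return belief_a * utility(pay_a) + (1 - belief_a) * utility(pay_b)

def optimal_report(belief_a, x, utility):
    grid = np.linspace(0.0, 1.0, 1001)
    values = [expected_utility(r, belief_a, x, utility) for r in grid]
    return float(grid[int(np.argmax(values))])

belief = 0.6
r_neutral = optimal_report(belief, x=10.0, utility=lambda w: w)  # risk neutral
r_averse = optimal_report(belief, x=10.0, utility=np.sqrt)       # risk averse

print(r_neutral)  # 0.6: the QSR is proper, so a risk-neutral subject reports truthfully
print(r_averse)   # well below 0.6: shading the report raises the payoff in the poor state B
```

The risk-averse subject shades her report downward because a low r transfers accuracy payment into state B, where the marginal utility of money is higher; this is exactly the distortion the correction procedure below is designed to undo.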

The Offerman et al. (2009) Correction Task (Stage 2)
In the second stage of the experiment we elicited the twenty reported beliefs, r, for events with known objective probabilities required to estimate the incentive distortion function, R, for each individual. In each of the three treatments, we estimate the function R using beliefs from Stage 2 elicited under the same incentive environment as in the Belief Updating Task in Stage 1.
In Stage 2, participants were asked to report their probability judgment regarding the likelihood that statements of the form "the number the computer chooses will be between 1 and 75" (i.e. p = 0.75) were true, after being told that the computer would randomly choose a number between 1 and 100, with each number equally probable. For T1.S, this specific example of the probability of the randomly chosen number being in the interval between 1 and 75 essentially involves choosing r from the list of prospects that pay 1 − (1 − r)^2 with probability 0.75 and 1 − r^2 with probability 0.25. For T2.C and T3.S, this example would involve choosing r from the list of prospects that pay x + 1 − (1 − r)^2 with probability 0.75 and 1 − r^2 with probability 0.25, where x denotes the state-contingent payment. As in Stage 1, in each of the treatments, the Stage 2 payment table summarized the relevant payment information. For each treatment, this payment table contained identical values in Stage 1 and Stage 2. Therefore, Table 2 above also provides a summary of the Stage 2 payment tables.
The twenty reported beliefs corresponded to the objective probabilities 0.05, 0.1, . . . , 0.95. 22 At the end of the experiment, one choice from Stage 2 was randomly chosen to contribute to each participant's final payment.

Data and Descriptive Evidence
The experiment was conducted at the UCL-ELSE experimental laboratory in London as well as at the WZB-TU laboratory in Berlin, with two sessions for each of the three treatment groups at each location, making twelve sessions and 222 participants in total. 23 At both locations, participants were solicited through an online database using ORSEE (Greiner, 2015) and the experiment was run using the experimental software z-Tree (Fischbacher, 2007). On average, sessions lasted approximately 1.5 hours and the average participant earned £19.7 in London and €20.3 in Berlin. Realized payments ranged between £11 [€11] and £34 [€34].
One challenge faced by belief updating studies is ensuring that subjects understand and engage with the task. In order to facilitate this, we were careful to ensure that the instructions received were as clear and simple as possible. Nonetheless, there remained a non-trivial fraction of participants who took decisions to update in the 'incorrect' direction 24 upon receiving new information. In order to ensure that the behavior we are studying is reflective of actual updating behavior of individuals who understood and engaged with the task, we restrict our sample for our main analysis by removing rounds where an individual updates in the incorrect direction. 25 However, we also estimate all the main results on the full sample, and the general patterns of behavior are similar.
While randomization to treatment group should ensure that the samples are balanced on observable and unobservable characteristics, Table 9 provides a check that the selection of our preferred sample has not substantially biased our treatments groups by reporting the sample means of a set of individual characteristics for each treatment group. Overall, the treatments appear to be balanced, with the exception that individuals in T3.S are more likely to speak English at home than individuals in T2.C .

Empirical Specification
In this section, we first discuss the calibration exercise used to correct the reported beliefs. Second, we describe the core estimation equations used in our analysis. These build on the work by Möbius et al. (2014) as their data has a similar structure to ours. One key difference in our data is the exogenous assignment of the participants' entire information set. We exploit this feature of our data to address endogeneity issues that can arise when studying belief updating.

The Belief Correction Procedure
The belief correction procedure that we adopt involves assuming a flexible parametric form for the participants' utility and probability weighting functions in order to estimate their personal belief-distortion function (this is the R function discussed in Equation 8 in Appendix B.2.1). We estimate this function for each individual separately in order to correct the reported beliefs at the individual level. Essentially, we are simply fitting a curve through each subject's belief elicitation distortion, for the relevant incentive environment. Appendix B.2.4 contains a more detailed discussion of the estimation of the belief correction function, and illustrates the main ideas through Figures 5 and 6. Figure 5 displays the average correction curves for each of the treatment groups across all individuals. However, at the individual level, there is a large degree of heterogeneity in the degree to which individuals distorted their reported belief away from their actual belief, given the incentive environment. Figure 6 displays the correction curves estimated for two individuals from each treatment group.

In Figure 2 we plot the CDFs of the uncorrected reported beliefs as well as the reported beliefs that have been corrected at the individual level. The left panel displays the distribution of reported beliefs, prior to correction, in each of the three treatments. One interesting feature of this figure is that, in spite of the fact that subjects in T2.C and T3.S are offered precisely the same incentives, the distributions of reported beliefs of these two groups are significantly different from one another (Mann-Whitney rank-sum test, p < 0.01). The larger mass of reported beliefs to the left of 50 in T2.C suggests that individuals are more likely to respond to the hedging opportunity when it is more salient.
The right panel shows the corrected beliefs. It suggests that the belief correction approach that we use was successful in removing the strong hedging influence in the T2.C treatment, since the correction procedure removes the majority of the difference between the distributions of beliefs in the two treatments with identical incentives, T2.C and T3.S. This evidence underscores the usefulness of the belief correction mechanism in reducing the bias in the reported beliefs when there is a salient hedging opportunity.
With the corrected beliefs in hand, we can proceed to the main analysis of belief updating in our sample. The analysis is done using both the corrected and uncorrected beliefs. 26
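The logic of the individual-level correction can be sketched as follows. The paper estimates a parametric R function via utility and probability-weighting functions (Appendix B.2.4); here we substitute simple monotone interpolation, and the hedging subject's reports are hypothetical:

```python
import numpy as np

# Sketch of the correction idea: Stage 2 yields, per individual, pairs (p, r)
# of objective probabilities and the reports made under Stage 1 incentives.
# Fit a monotone curve r = R(p) through these pairs and invert it to map each
# Stage 1 report back to a corrected belief. The paper estimates a parametric
# R via utility and probability-weighting functions; plain interpolation here
# is purely illustrative.
objective = np.round(np.arange(0.05, 1.0, 0.05), 2)   # the Stage 2 grid

# A hypothetical hedging subject whose reports are shaded towards 0 (state B):
reported = objective - 0.6 * objective * (1 - objective)

def correct_report(r, objective, reported):
    """Invert the fitted R: find the belief p whose typical report is r."""
    order = np.argsort(reported)
    return float(np.interp(r, reported[order], objective[order]))

# A report of 0.45 from this subject corresponds to a belief of roughly 0.59.
print(correct_report(0.45, objective, reported))
```

Fitting and inverting R separately for each individual is what allows the correction to absorb the large heterogeneity in distortions visible in Figure 6.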

Core Estimation Specifications
Our core estimation equations aim to test for systematic patterns in updating behavior, within the framework developed in Equation 1. Firstly, we examine whether there are systematic deviations from Bayes' rule in updating behavior, independent of having a stake in one of the two states of the world. Secondly, we assess the influence that having a stake in one of the two states of the world has by (i) testing for an asymmetry in updating within treatments where there is a state-contingent stake (i.e. T2.C and T3.S ); and (ii) testing whether there are differences in updating behavior between the treatments with and without a state-contingent stake.
The first estimation equation follows directly from Equation 1, allowing us to test the asymmetric updating hypothesis, and also to test for other common deviations from Bayes' rule, such as a confirmatory bias or base rate neglect 27 :

π̃_{i,j,t+1} = δ·π̃_{i,j,t} + γ_a·1(s_{i,j,t+1} = a)·q̃ − γ_b·1(s_{i,j,t+1} = b)·q̃ + ε_{i,j,t+1}     (2)

where π̃_{i,j,t} = logit(π_{i,j,t}) and q̃ = log(q/(1−q)); 1(·) denotes the indicator function; j refers to a round of decisions; and the errors, ε_{i,j,t}, are clustered at the individual (i) level. Bayesian updating corresponds to δ = γ_a = γ_b = 1.
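As a sanity check on this specification, the following sketch simulates exactly Bayesian subjects in the urn environment (priors from {1/6, ..., 5/6}, signal accuracy q = 5/8) and estimates the logit-linear equation by OLS; all three coefficients should equal 1. This is illustrative code, not the paper's estimation scripts:

```python
import numpy as np

# Simulate exactly Bayesian subjects and estimate the logit-linear updating
# equation by OLS; delta, gamma_a and gamma_b should all recover 1.
rng = np.random.default_rng(42)
Q = 5 / 8
LAM = np.log(Q / (1 - Q))                 # q_tilde: the log likelihood ratio

def logit(p):
    return np.log(p / (1 - p))

rows = []
for _ in range(500):                      # simulated rounds
    p0 = rng.choice([1/6, 2/6, 3/6, 4/6, 5/6])
    state_a = rng.random() < p0           # draw the true state from the prior
    p_blue = Q if state_a else 1 - Q
    belief = p0
    for _ in range(5):                    # five draws per round
        blue = rng.random() < p_blue
        prior_logit = logit(belief)
        post_logit = prior_logit + (LAM if blue else -LAM)  # Bayes in logit space
        # columns: outcome, then regressors for delta, gamma_a, gamma_b
        rows.append([post_logit, prior_logit, LAM * blue, -LAM * (not blue)])
        belief = 1.0 / (1.0 + np.exp(-post_logit))

data = np.array(rows, dtype=float)
y, X = data[:, 0], data[:, 1:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # [delta, gamma_a, gamma_b] -> all 1 for Bayesian data
```

Because the simulated data satisfy the Bayesian updating rule exactly, the regression fits perfectly; deviations of δ, γ_a or γ_b from 1 in real data therefore measure departures from Bayes' rule.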
In order to determine the belief updating pattern within each incentive environment, we estimate this equation separately for each treatment. Then, to test for significant differences between the coefficients in different incentive environments, we pool our sample and interact the treatment variable with all three of the coefficients in this equation (i.e. δ, γ_a, and γ_b). This provides us with a test of whether the parameters differ between either of the two treatments with a state-contingent stake and the treatment without one.

Endogeneity of the Lagged Belief
One potential concern with the identification of the parameters of Equation 2 is the common issue that arises when the right hand side contains lagged versions of the dependent variable. This implies a possible endogeneity of the lagged beliefs, π_{i,j,t}, if they are correlated with the error term, i.e. E{π̃_{i,j,t} ε_{i,j,t+1}} ≠ 0. 28 If this is the case, it can result in biased and inconsistent estimators for the parameters of the regression. Our experiment was designed to avoid this issue by virtue of exogenously assigning the subjects' entire information set. This allows us to use the exogenously assigned prior probability of state ω = A being the true state, p_{i,j,t=0}, as well as the sequence of signals observed, s_t, to construct an instrument for the lagged belief, π_{i,j,t}, in Specification 2. 29 We do this by calculating the objective Bayesian posterior, given the agent's information set at time t, and using this as an instrument for her belief, π_{i,j,t}.
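The instrument can be constructed directly from the exogenously assigned objects, as in this sketch (our notation):

```python
import numpy as np

# Sketch of the instrument: the logit of the objective Bayesian posterior
# implied by the exogenous prior p0 and the observed signal history, used in
# place of the subject's possibly endogenous lagged reported belief.
Q = 5 / 8                         # signal accuracy in the urn task
LAM = np.log(Q / (1 - Q))         # logit shift per signal

def logit(p):
    return np.log(p / (1 - p))

def bayes_logit(p0, signals):
    """logit of the Bayesian posterior P(A) after a list of 'blue'/'red' draws."""
    return logit(p0) + sum(LAM if s == "blue" else -LAM for s in signals)

# Instrument for the lagged belief in the period t+1 regression, built only
# from exogenously assigned objects (p0 and the draws s_1, ..., s_t):
print(bayes_logit(1/6, ["blue"]))   # equals logit(0.25)
```

Because p_0 and the draws are assigned by the experimenter, this constructed variable is correlated with the subject's lagged belief but uncorrelated with idiosyncratic reporting errors, which is what the IV strategy requires.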
The approach used here also avoids a second type of endogeneity issue that can arise when studying belief updating when the states of the world are functions of personal characteristics (e.g. when examining beliefs regarding individual attributes, such as one's own skills, IQ, or beauty) or personal choices. When this is the case, the conditional probability of observing a specific signal depends on the state of the world, and therefore can be correlated with personal characteristics. 30

28 For example, this would be the case if there is individual heterogeneity in the way individuals respond to information. We provide evidence below that this individual heterogeneity is present.

29 More precisely, we use the logit of the accurate Bayesian posterior, logit(p_{i,j,t}), as an instrument for the logit of the lagged belief, π̃_{i,j,t} = logit(π_{i,j,t}), in Specification 2.

30 Consider the following toy example to illustrate this. Assume there are two states of the world-being TALL and SHORT-and being tall is viewed as "good". Individuals are initially uncertain about their height group (because only relative height is important). Now, assume that TALL individuals are more responsive to new information than SHORT individuals. For example, assume TALL individuals update exactly according to Bayes' rule, but SHORT individuals don't update their prior beliefs at all in response to new information. Now, let subjects receive a sequence of noisy signals that are informative (in the sense that, for each type, the signal they receive is correct more than fifty percent of the time). In this toy example, both types respond symmetrically to 'good-news' and 'bad-news', but because TALL individuals are always more likely to receive 'good-news', and SHORT individuals are always more likely to receive 'bad-news', the resulting aggregate behavior will appear to display the 'good-news, bad-news' effect. This is due to the fact that, on average, a good signal is more likely to be received by a TALL individual, who updates like a Bayesian, and a bad signal is more likely to go to a SHORT individual, who doesn't update at all. Furthermore, updating would display conservatism on average. See Appendix C for further details.

Table 3 reports the results from estimating Equation 2 for each of the treatment groups separately. These estimates provide an overview of the updating behavior of the average individual within each of the three treatment groups. Within each treatment group, we report the results for both the OLS (top panel) and the IV (bottom panel) estimates discussed above. Columns (1a), (2a) and (3a) use the uncorrected reported beliefs, while columns (1b), (2b) and (3b) use the corrected beliefs. Every coefficient in the table is statistically different from 0 at the 1% level. Since our primary interest is in testing whether the coefficients are different from 1, in this table, we use asterisks to reflect the significance of a t-test of whether a coefficient is statistically different from 1.
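The TALL/SHORT toy example in footnote 30 can be computed directly; the numbers below (a 50-50 type split, signal accuracy 0.75, a common prior of 0.5) are illustrative choices:

```python
# The TALL/SHORT toy example, computed directly: symmetric individual updating
# can look asymmetric in the aggregate when signal frequencies are correlated
# with type. Parameter choices here are illustrative.
prior = 0.5   # everyone starts uncertain about their height group
acc = 0.75    # P(signal matches own type); half the population is TALL

# TALL subjects update by Bayes; SHORT subjects never update.
bayes_up = prior * acc / (prior * acc + (1 - prior) * (1 - acc)) - prior  # +0.25

# Good news is mostly received by TALL types (who respond), bad news by
# SHORT types (who do not):
p_tall_given_good = (0.5 * acc) / (0.5 * acc + 0.5 * (1 - acc))       # 0.75
p_tall_given_bad = (0.5 * (1 - acc)) / (0.5 * (1 - acc) + 0.5 * acc)  # 0.25

mean_up_after_good = p_tall_given_good * bayes_up   # 0.1875
mean_down_after_bad = p_tall_given_bad * bayes_up   # 0.0625

# Aggregate updating looks like a 'good-news, bad-news' effect:
print(mean_up_after_good, mean_down_after_bad)
```

Even though each type treats good and bad news symmetrically, the average upward move after good news is three times the average downward move after bad news, purely because of the signal-type correlation.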
Perhaps the most striking features of this table are: (i) the similarity in the updating patterns across the three treatment groups; and (ii) that for the average individual, the observed updating behavior is close to Bayesian in all three treatment groups. The p-values from the test of the null hypothesis, H 0 : γ a = γ b , show that in none of the three treatment groups do we observe a statistically significant difference (at the 5% level) between the responsiveness to the signals in favor of ω = A and ω = B (i.e. we don't observe asymmetric updating).
Both the OLS results in the top panel and the IV results in the bottom panel indicate that the responsiveness to new information was, on average, not statistically different from that of a Bayesian, since both γ_a and γ_b are not significantly different from 1 at the 5 percent level. The primary difference between the OLS results and the IV results is that, while the OLS estimates suggest a small degree of base rate neglect across all three treatments (δ < 1), once we control for the possible sources of endogeneity discussed above using our instrumental variable strategy, the estimates are no longer indicative of base rate neglect. Since the OLS estimates may be biased 31 , the IV estimates represent our preferred results. The first stage regression results for the IV estimation are reported in the Appendices in Table 8, indicating that we don't have a weak instrument issue. Importantly, however, the form of endogeneity addressed by the IV estimation only pertains to potential biases in the δ parameter, since it addresses endogeneity in the prior belief variable. It is therefore reassuring that for the estimates of our primary parameters of interest, namely γ_a and γ_b, the estimates are largely consistent across all the estimation specifications reported in Table 3. A reason for this is that the signals, s_{i,j,t+1}, that subjects receive are always completely exogenous, both across rounds and with respect to the subject's personal characteristics. Furthermore, the distribution of signals observed is also exogenous, and balanced in expectation. This helps to avoid other sources of endogeneity (see, e.g., Appendix C) and to alleviate the influence of other potential confounding belief updating biases (see the Discussion section below).

31 As mentioned above, one candidate source of bias is the fact that the posterior belief at time t is a left-hand-side variable at time t, but also serves as a right-hand-side variable through being the prior belief at time t + 1. This can lead to biased estimates if there is individual heterogeneity in updating. Our instrumental variable strategy uses the exogeneity of the priors and signals that subjects have been endowed with (summarized as the objective Bayesian probability at that point in time) to instrument for their prior beliefs, thereby removing this endogeneity of the prior beliefs in the regressions.

Notes to Table 3: (ii) All coefficients are significantly different from 0 at the 1% level. Therefore, t-tests of the null hypothesis (H_0: Coefficient = 1) are reported: * = 10%, ** = 5%, *** = 1%. (iii) The rows corresponding to p (H_0: γ_a = γ_b) report the p-value from a t-test of the equality of the coefficients γ_a and γ_b (i.e. a test of the asymmetric updating hypothesis).
It is worth noting that although we observe a substantial difference in the levels of the corrected and uncorrected beliefs in Figure 2 above, the estimates for updating in Table 3 are quite similar for the corrected and uncorrected beliefs. One explanation for this apparent inconsistency is the following. If the degree of hedging by an individual is similar for both the prior and posterior belief, then the correction could have a sizeable effect on the levels of both beliefs, but not result in a large difference in the estimated updating parameters, since updating pertains to the change in the belief rather than its level.

A Model Free Test of the Asymmetric Updating Hypothesis
In order to alleviate the potential concern that these results are dependent on the functional form of our empirical specification, we conduct a model-free test of the asymmetric updating hypothesis. Perhaps the simplest and most direct test of this hypothesis is obtained by directly comparing the posterior beliefs formed in two scenarios where the information set is identical, but the rewards associated with one of the states of the world are varied. Our data is well suited to conducting this exercise.
We do this by considering a comparison of information-set-equivalent posterior beliefs after individuals have received only (i) the exogenous prior and (ii) a single ball draw. 32 This allows us to test the asymmetric updating hypothesis while remaining agnostic regarding the process that guides belief updating, testing only whether it is symmetric. Our data allows us to conduct two comparisons of information-set-equivalent posterior beliefs-a within-subject and a between-subject comparison.
Firstly, we can compare posterior beliefs, π_1, formed with identical information sets {p_0, s_1} between treatment groups, where the payments associated with the states of the world differ. For example, we can compare the average posterior formed after an identical prior, e.g. p_0 = 1/6, and an identical signal, e.g. s_1 = a (i.e. a blue ball), across treatments.
Secondly, we can compare information-set-equivalent posterior beliefs within treatment groups. This comparison involves comparing π_1 after {p_0 = p, s_1 = s} with 1 − π_1 after {p_0 = 1 − p, s_1 = s^c}, where s^c is the complementary signal to s. 33 For example, we can compare the posterior, π_1, formed after a prior of p_0 = 1/6 and the signal s_1 = a (i.e. a blue ball), with 1 − π_1 after a prior of p_0 = 5/6 and the signal s_1 = b (i.e. a red ball). To see why this comparison involves information-set-equivalent posterior beliefs, recall that the experiment is designed to be completely symmetric in terms of information, with the information content of a red ball exactly the same as that of a blue ball, except in support of the other state of the world. Therefore, if an individual updates symmetrically, then π_1 | {p_0 = p, s_1 = s} = 1 − π_1 | {p_0 = 1 − p, s_1 = s^c}. This prediction does not rely on Bayes' rule (although it is an implication of Bayes' rule), but rather only requires symmetric updating, and therefore it provides us with a non-parametric test of the asymmetric updating hypothesis.

Figure 3 depicts both of these comparisons, with each group of six bars collecting together the relevant information-set-equivalent groups. Each bar presents the mean posterior belief for that group, as well as a 95% confidence interval around the mean. 34 Each group is labeled on the x-axis by the prior belief associated with the 'red' bars, which correspond to the information sets that include a red ball as a signal (i.e. s_1 = b). The 'blue' bars report the mean of 1 − π_1, for information sets containing a blue ball (i.e. s_1 = a), and for these bars the x-axis label corresponds to 1 − p_0. Within each group, the first two bars represent the average posterior beliefs in T1.S; the second pair of bars depict the same for T2.C; and the third pair of bars for T3.S.
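The symmetry restriction underlying the within-treatment comparison can be verified for the Bayesian benchmark, which satisfies it as a special case of symmetric updating:

```python
# The within-treatment symmetry restriction: pi_1 given {p0 = p, s1 = blue}
# should equal 1 - pi_1 given {p0 = 1 - p, s1 = red}. Here we verify it for
# the Bayesian benchmark, which updates symmetrically by construction.
Q = 5 / 8   # P(blue | A) = P(red | B)

def posterior(p0, signal):
    """Bayesian posterior P(A) after one draw."""
    la = Q if signal == "blue" else 1 - Q
    lb = (1 - Q) if signal == "blue" else Q
    return p0 * la / (p0 * la + (1 - p0) * lb)

for p0 in (1/6, 2/6, 3/6, 4/6, 5/6):
    lhs = posterior(p0, "blue")
    rhs = 1 - posterior(1 - p0, "red")
    assert abs(lhs - rhs) < 1e-12

print("information-set-equivalent posteriors coincide")
```

Any symmetric updating rule, Bayesian or not, satisfies the same identity, which is why the comparison in Figure 3 is model-free.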
The results displayed in Figure 3 show that there are no systematic differences between posterior beliefs within information-set-equivalent groups, neither within nor between treatment groups. Furthermore, when testing non-parametrically whether there are differences within or between treatment groups for information-set-equivalent groups, none of the 45 relevant binary comparisons 35 are significant at the 5 percent significance level under a Mann-Whitney test, suggesting that we cannot reject the hypothesis that the posterior beliefs within information-set-equivalent groups are drawn from the same distribution. This lends support to the results described above which indicate that we fail to find evidence in support of the asymmetric updating hypothesis.

Robustness Exercises
In addition to this model-free test, to check for the robustness of the belief updating results for the average individual presented in Table 3, we conducted several robustness exercises. These exercises, and their corresponding results, are discussed in detail in Appendix A.
The first exercise examines whether the results from the main specification described in Equation 2 are robust to first differencing the dependent variable (i.e. this considers how new information shifts the change in beliefs, imposing the assumption that δ = 1). The second subsection extends the main empirical specification to allow for individual-specific updating parameters. For both of these specifications, an ex post power analysis is conducted, reporting the MDE for a significance level of α = 0.05 and a power of κ = 0.8. The third subsection pools all the observations across the three treatments together, and then tests whether the average updating parameters differ across treatments by interacting treatment group dummies with the regressors of the main specification described in Equation 2. The results from all of these exercises are highly consistent with those in Table 3 and fail to provide any evidence in favor of an asymmetry in updating.

Heterogeneity in Updating Behavior
In order to investigate whether the aggregate results are masking heterogeneity in updating behavior, we estimate Specification 2 at the individual level and collect the parameters. The distributions of these individual level parameters are reported in Figure 4. Perhaps the most conspicuous feature of this figure is the fact that all three treatment groups display such similar parameter distributions in each of the panels -i.e. for each of the parameters. Testing for differences between the underlying distributions from which the parameters are drawn in the different treatment groups fails to detect any statistically significant differences in any of the four panels. 36 The upper-right panel shows that the majority of individuals have an estimated δ parameter in the interval [0.6, 1.1], with a large proportion of these concentrated around 1 in all three treatment groups. The two left-hand panels show that there is substantially more individual heterogeneity in the estimated γ a and γ b parameters, which are dispersed over the interval [0, 3.5] in all three treatments.
With such a large degree of variation in the individual level parameters, a natural conjecture to make is that, while we do not observe asymmetric updating at the aggregate level, it is entirely plausible that there may be a subsample of individuals who are optimistic updaters and another subsample of individuals who are pessimistic updaters. If these two subsamples are of a similar size and their biases are of a similar magnitude, we would observe no asymmetry at the aggregate level. The lower-right panel of Figure 4 suggests that this is not the case by plotting the distribution of the individual level difference between the γ_a and γ_b parameters. The majority of the distribution is concentrated in a narrow interval around 0 for all three treatment groups, suggesting that there is no asymmetry for any sizable subsample. Furthermore, this conclusion is supported by the fact that there are no significant differences between the distributions of updating parameters observed across the three treatments in any of the four panels, since the motive for a 'good-news, bad-news' effect is switched off in the T1.S treatment.
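The masking logic behind this conjecture is easy to make concrete: two equal-sized subgroups with offsetting (hypothetical) biases produce no asymmetry on average, even though every individual updates asymmetrically:

```python
import numpy as np

# Two equal-sized hypothetical subgroups: optimists (gamma_a > gamma_b) and
# pessimists (gamma_b > gamma_a). Parameter values are made up for illustration.
n = 1000
gamma_a = np.concatenate([np.full(n // 2, 1.4), np.full(n // 2, 0.6)])
gamma_b = np.concatenate([np.full(n // 2, 0.6), np.full(n // 2, 1.4)])

# No asymmetry on average ...
print(gamma_a.mean() - gamma_b.mean())   # approximately 0

# ... yet the distribution of individual differences is far from a point mass
# at zero, which is what the lower-right panel of Figure 4 would reveal:
diff = gamma_a - gamma_b
print(np.mean(np.abs(diff) > 0.5))       # every individual is biased
```

It is exactly this bimodality in γ_a − γ_b that the lower-right panel of Figure 4 would detect, and does not show in our data.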

Heterogeneity in results observed in the asymmetric updating literature
A central question that emerges from the discussion above is why we observe no evidence of a 'good-news, bad-news' effect here, while some other influential contributions to this literature have found evidence for such an effect. In his excellent chapter in the Handbook of Behavioral Economics, Benjamin (2019) points out that, more generally, the evidence in this nascent literature is so far very mixed. In the economics literature, three papers find evidence in favor of stronger inference from good news: Eil and Rao (2011), Möbius et al. (2014) and Charness and Dave (2017). 37 In contrast, three papers find evidence of stronger inference from bad news: Ertac (2011), Kuhnen (2015) and Coutts (2019). 38 Furthermore, in addition to the current paper, four other papers find no evidence in favor of a preference-biased asymmetry in belief updating: Grossman and Owens (2012), Schwardmann and Van der Weele (2016), Gotthard-Real (2017) and Buser et al. (2018). 39 In the psychology literature, however, there appears to be a near-consensus that there does exist an asymmetry in belief updating in favor of good news (see, e.g., Sharot et al. (2011), Sharot et al. (2012), Kuzmanovic et al. (2015), and Marks and Baines (2017), amongst others). A notable exception is provided by Shah et al. (2016), who argue that many of the contributions to this literature suffer from methodological concerns; Garrett and Sharot (2017) offer a rebuttal, claiming that optimistically biased updating is robust to these concerns.
So far, there seems to be no clear pattern organizing these heterogeneous results. However, below we offer a discussion of some of the candidate explanations for the heterogeneity in observed results. In general, these candidate explanations fall into two categories: (1) the hypothesis that contextual factors mediate asymmetric updating: belief updating is influenced by preferences, but this preference-biased updating is switched on or off by contextual factors; and (2) the hypothesis that asymmetric updating is sometimes misidentified: belief updating is not actually influenced by preferences, but rather, what appears to be asymmetric updating is driven by a different cognitive bias (e.g. prior-biased inference). The majority of the explanations discussed below fall into the first category.

Information structure
One avenue of enquiry for attempting to reconcile the results is to consider the differences in the information structures across experiments. For example, while several of the studies adopt a two-state bookbag-and-poker-chip experimental paradigm, Ertac (2011) uses a three-state structure with signals that are perfectly informative about one state, and Eil and Rao (2011) consider a ten-state updating task with binary signals. However, this does not seem to be driving the differences in results, since we observe heterogeneous results amongst papers with similar information structures-e.g. restricting attention only to the papers with two-state structures with binary signals (e.g. the current paper, Möbius et al. (2014), Gotthard-Real (2017), Coutts (2019)) yields mixed results.

37 Mayraz (2013) also presents evidence in favor of individuals forming motivated beliefs that are distorted towards more desirable states; however, in his experiment one cannot calculate the Bayesian posterior, so it is less comparable to the other studies in this literature.

38 Kuhnen (2015) differs slightly from the other papers in this literature by studying investor learning from financial information in the domain of gains versus losses (the experimental game is framed as a financial decision). She finds that investors are especially reactive to bad outcomes in the loss domain, implying that she observes negative asymmetric updating in the loss domain (and not the gain domain), and overall more pessimistic beliefs in the loss domain. The implication is that individuals learn differently in the domain of losses in comparison to the domain of gains.

39 Additionally, while Eil and Rao (2011) found evidence of an asymmetry in favor of good news in the domain of beliefs about one's own beauty, they did not find evidence of an asymmetry in the domain of beliefs about one's own IQ.

Priors
Focusing only on two-state experiments, there is substantial variation in average prior belief across experiments, with Coutts (2019) (by design) observing relatively low average priors in comparison to Möbius et al. (2014), for example. If belief updating is influenced by prior beliefs (e.g. a confirmatory bias), then what looks like preference-biased belief updating may be driven by a completely different cognitive deviation from Bayesian updating, namely prior-biased updating. However, if we look at the papers that find evidence for preference-biased updating, Charness and Dave (2017) do find evidence in favor of prior-biased updating, while Eil and Rao (2011) and Möbius et al. (2014) do not find evidence of prior-biased updating. 40 This speaks against the explanation that what appears to be asymmetric belief updating due to preferences is actually driven by a confirmatory bias. 41

Ambiguity
One important dimension that differentiates the current paper from much of the literature is that in our experiment subjects are exogenously endowed with accurate point estimate priors, as opposed to updating from subjectively formed prior beliefs. 42 There are advantages and disadvantages to this approach: it brings increased experimental control and improved causal identification, but this comes at the expense of reduced realism and perhaps a slightly less natural setting. 43 However, this discussion highlights a key assumption that is typically made in this literature, namely that subjects are probabilistically sophisticated and therefore have in mind a point estimate of this probability (or update as if they hold a point estimate prior). In cases where subjects must form their own subjective prior belief, this assumption is not innocuous. If, instead, the beliefs subjects hold are ambiguous 44, then simple Bayesian updating of a single point estimate may no longer be the most appropriate (normative or descriptive) benchmark. Firstly, there are several competing theoretical models of belief updating in the presence of ambiguity with differing predictions, e.g. full Bayesian updating (Jaffray, 1989; Pires, 2002) and maximum likelihood updating (Gilboa and Schmeidler, 1993), and some recent experimental evidence testing between them (Ngangoue, 2018; Liang, 2019). Secondly, one might postulate that there is greater scope for motivated reasoning when one is updating beliefs from ambiguous priors (or ambiguous signals) than when updating from exogenously endowed point estimate priors (and signals with clearly defined informativeness).
40 I refer the interested reader to Benjamin (2019), Section 8, for a more detailed discussion of prior-biased updating.
41 A second possible way in which priors can be related to belief updating is through the distribution of signals observed. For example, compare an individual with a low prior to an individual with a high prior in a typical two-state experimental framework. The individual with the low prior is also more likely to be in the low state (assuming beliefs are correlated with the true states). Therefore, in these two-state bookbag-and-poker-chip experiments, the low prior individual is more likely to receive more low signals. This implies that if there is any deviation from Bayesian updating that is related to the distribution of signals observed, then this can look like a preference-biased asymmetry in updating when the average prior in the experiment is different from 0.5. This point is discussed further in Coutts (2019). In the current paper, priors and signals are balanced for each individual, alleviating this concern.
42 An exception to this is Gotthard-Real (2017), who uses a similar experimental design to the current paper, also endowing participants with an exogenous prior point belief.
43 An interesting recent contribution to the belief updating literature by Le Yaouanq and Schwardmann (2019) proposes a methodology for studying belief updating in more natural settings, while still maintaining experimental control and permitting a comparison with Bayesian updating. Their approach is attractive, since it facilitates the study of belief updating of home-grown priors upon receipt of unstructured information that is unobserved by the experimenter.
However, the existing evidence suggests that the presence or absence of ambiguity is not the primary explanation for the differing results. Even within the set of papers with home-grown subjective priors (e.g. Eil and Rao (2011), Ertac (2011), Möbius et al. (2014), and Coutts (2019)), the results are very mixed.

Domain of belief updating
Typically, we treat belief formation as being domain-independent. However, it seems natural to consider the possibility that humans evolved to process information about their physical environment differently from information about their self and their social environment. For example, the mental processes involved in forming a belief about the likelihood of future rainfall may be fundamentally different from those involved in forming a strategic belief about the probability that another individual will be trustworthy in a specific scenario. The latter may involve forming a mental model of the other individual's incentives and personal characteristics, and mapping them into the specific scenario. Some papers in this literature have explored this question by asking whether we update differently about a given fundamental characteristic of one's own self in comparison to the same fundamental characteristic of another individual (e.g., Möbius et al. (2014) and Coutts (2019)). Furthermore, recent theoretical and experimental work has studied how individuals attribute outcomes to their self versus an external fundamental from their physical or social environment (see, e.g., Heidhues et al. (2018), Hestermann and Le Yaouanq (2018), and Coutts et al. (2019)).
44 For example, in a typical two-state experiment, I might believe that my chance of being in the top half of an ability distribution is somewhere between 50 and 70 percent. Even if I do not hold an ambiguous belief, but rather believe that my chance of being in the top half is uniformly distributed between 50 and 70 percent, there is evidence that attitudes towards compound lotteries are closely related to attitudes towards ambiguity (Halevy, 2007).
The influence of the domain on belief updating may matter for the asymmetric updating literature because this literature as a whole considers updating scenarios pertaining to both the environment (with 'good' states typically represented by high monetary payments) and to the self (where 'good' states pertain to a desirable individual characteristic). 45 One might posit that asymmetric updating only manifests in certain domains. 46 However, even within the group of studies considering beliefs about the self, the evidence is mixed (e.g. Eil and Rao (2011), Ertac (2011), Möbius et al. (2014), and Coutts (2019)).

Outcomes and stake size
One caveat to the results reported in the current paper is that the financial stakes in play are not extremely large, and therefore it is feasible that the results would change in the presence of larger stakes. While small stake sizes are a standard caveat to most laboratory experiments, it is perhaps a larger concern here, since the asymmetric updating hypothesis may rely on the degree to which individuals desire the occurrence of the good state of the world. However, Coutts (2019) investigates the role played by stakes, increasing the stake size up to $80, and finds no evidence that it plays a role. This suggests that stake size may not be a pivotal concern. 47 Nonetheless, it is worth keeping this caveat in mind when interpreting the results.
Overall, the results in this literature appear to be somewhat incongruous, and there does not yet seem to be a compelling explanation for the pattern of behavior observed across studies. As suggested by Benjamin (2019), resolving these seemingly contradictory results should be a priority. The current paper adds an incremental contribution towards this pursuit by demonstrating the absence of a good-news, bad-news effect in one cell of this contextual matrix: namely, a context with financial outcomes, minimal ambiguity, and a minimal role for the individual's ego.
45 A closely related, but slightly different, taxonomy of the domain space considered in this literature is obtained by distinguishing domains where an individual's ego is present from domains in which it is absent. However, for most of the discussion here, this overlaps closely with the self versus physical environment dichotomy.
46 One potential explanation for the difference in belief updating between the domains of self-image and financial decision making is the idea that ego maintenance could yield evolutionary benefits. In particular, a positive asymmetry in updating about one's self-image would lead to overconfident beliefs, and several authors have posited that maintaining high self-confidence may be associated with evolutionary advantages (see, e.g., Bernardo and Welch (2001); Heifetz et al. (2007); Johnson and Fowler (2011); Burks et al. (2013); Schwardmann and Van der Weele (2016)). In contrast, asymmetric updating about external states of the world would lead to overoptimism, which is likely to lead to costly mistakes.
47 One might argue that even this large monetary stake size is small in comparison to how much individuals care about their self. However, the heterogeneity within studies focused on beliefs about the self suggests that this is not a pivotal dimension for switching the good-news, bad-news effect on and off.

Addressing hedging
One inherent challenge in this literature studying belief updating in the presence of state-dependent stakes is the hedging motive, and various approaches have been adopted to deal with it. Typically, the papers considering belief updating about an ego-related characteristic (e.g. IQ) rely on the (very reasonable) implicit assumption that hedging across the ego utility and monetary utility domains will be minimal. 48 Amongst the studies considering monetary state-dependent stakes, two different approaches to dealing with this challenge have been adopted. Coutts (2019) follows the method suggested in Blanco et al. (2010), which involves partitioning the world such that the participant will either be paid according to their belief or according to the state-dependent prize, never both. As discussed by Blanco et al. (2010) and Appendix A in Coutts (2019), this approach has some attractive theoretical properties. In contrast, in this paper we follow the method developed by Offerman et al. (2009) and Kothiyal et al. (2011). Both approaches are theoretically valid for alleviating the influence of hedging under the assumptions they make, and both have advantages and drawbacks. However, neither Coutts (2019) nor the current paper finds support for a larger responsiveness to good-news, which suggests that it is not the case that one of the two methods is allowing hedging to suppress an over-responsiveness to good-news.
In this paper, we chose to adopt the Offerman et al. (2009)-Kothiyal et al. (2011) method for the following reasons. Firstly, using QSR incentives facilitates a simple presentation of the incentives to subjects, potentially enhancing their understanding of the actual incentives faced. Secondly, under the assumptions discussed in the main text and appendices, the Offerman et al. (2009)-Kothiyal et al. (2011) method allows one both to correct for any hedging motive present and to measure how large this hedging motive is in a particular incentive environment. Thirdly, as demonstrated by Blanco et al. (2010), hedging is generally not a major problem unless the hedging opportunity is transparent. To this end, we introduced the salience manipulation in the transparency of the hedging motive across our two A treatments. In line with Blanco et al. (2010), we see that the degree of hedging diminishes substantially in T3.S, where the hedging opportunity is not prominent. This suggests that even the uncorrected beliefs in T3.S are a good approximation of the beliefs subjects hold. However, we also conduct all of our analysis using the corrected beliefs from each of the T2.C and T3.S treatments. This provides two independent tests of the hypotheses we are testing, each of which is theoretically valid.
Even given these factors motivating our design choices, the possibility of hedging deserves attention when interpreting the results. However, several features of the full set of results suggest that hedging is not a driving factor behind the absence of a good-news, bad-news effect. The observed absence of any asymmetry in updating within either of the two A treatments, for either the uncorrected or corrected beliefs, together with the consistency in updating patterns observed across all three treatments, is not easily explained by a combination of a 'good-news, bad-news' effect and hedging.

Conclusion
The objective of this paper was to study belief updating when an individual prefers one state of the world to another. In particular, the experiment was designed to test the asymmetric updating hypothesis in the domain of financial decision making, and contribute to the body of work that is constructing a descriptive understanding of how individuals form beliefs.
The main finding of the current paper is that we find no evidence for asymmetric updating in our experimental context with financial state-dependent stakes. Instead, we find that the updating behavior of the average individual is approximately Bayesian, irrespective of the presence or absence of financial stakes.
The current paper complements the existing work on this topic in several ways. Firstly, we consider belief updating from a wider range of prior beliefs for each individual. Secondly, we demonstrate how the exogeneity of the priors can be used to conduct empirical exercises that alleviate endogeneity concerns. Thirdly, our experimental design ensures that the distribution of realized signals observed is balanced in terms of the frequency of 'good' and 'bad' signals. This is useful as it removes the potential influence of the signal distribution that is documented by Coutts (2019). Fourthly, our data allow us to conduct both within-subject and between-subject tests, as well as several robustness exercises in support of the main result.
The results described in this paper, and in this nascent literature as a whole, are instructive, as there is a large class of economically important situations in which individuals form beliefs while preferring one state to another, ranging from capital markets to intertemporal portfolio choice and consumption-savings problems. It is clearly important for economists to have an accurate understanding of how individuals form beliefs in these contexts. However, as described in the Discussion section above, the results observed in this literature are so far mixed. This paper contributes evidence towards an accurate descriptive model of belief updating, showing an absence of the 'good-news, bad-news' effect in a context with financial stakes, no ego, and a well-defined and unambiguous information structure. However, to construct a more complete model of how we update our beliefs, more evidence is clearly needed.

APPENDICES

Appendix A: Robustness Checks
The empirical specification used in the main text of this paper assumes that updating follows the flexible parametric process described in Equation 1. This specification allows for a wide range of deviations from Bayes' rule, as discussed in Section 2. In this section we conduct several exercises to test for the robustness of the main results.
The first subsection examines whether the results from the main specification described in Equation 2 are robust to first differencing the dependent variable (i.e. it considers how new information moves the change in beliefs, imposing the assumption that δ = 1). The second subsection extends the main empirical specification to allow for individual-specific updating parameters. The third subsection pools all the observations across the three treatments, and then tests whether the average updating parameters differ across treatments by interacting treatment group dummies with the regressors of the main specification described in Equation 2.
Table notes: (i) Standard errors in parentheses (clustered at the individual level). (ii) T-tests of H0: δ = 1; γ_a = 1; γ_b − γ_a = 0 indicated by * = 10%, ** = 5%, *** = 1%. (iii) MDE reports the minimum detectable effect size for a power of κ.
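Before turning to the robustness exercises, it may help to recall why the Bayesian benchmark corresponds to unit coefficients in these logit specifications: with two states and a binary signal of accuracy q, Bayes' rule is exactly linear in log-odds. The following minimal sketch illustrates this; the accuracy value q = 0.7 is illustrative, not taken from the experiment.

```python
import math

def bayes_posterior(prior, q, signal_is_a):
    """Bayesian posterior probability of state A after a binary signal of accuracy q."""
    like_a = q if signal_is_a else 1 - q          # P(signal | state A)
    like_b = (1 - q) if signal_is_a else q        # P(signal | state B)
    return prior * like_a / (prior * like_a + (1 - prior) * like_b)

def logit(p):
    return math.log(p / (1 - p))

prior, q = 0.3, 0.7
post = bayes_posterior(prior, q, signal_is_a=True)

# In log-odds, Bayes' rule simply ADDS log(q/(1-q)) for an 'a' signal
# (and subtracts it for a 'b' signal): delta = gamma_a = gamma_b = 1.
lhs = logit(post)
rhs = logit(prior) + math.log(q / (1 - q))
assert abs(lhs - rhs) < 1e-12
```

Any estimated deviation of δ or the γ parameters from one therefore measures a departure from this additive log-odds rule.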

Robustness Check 1: First-Differences Specification and Power Calculation
This section of the robustness checks serves two purposes. The first is to check the robustness of the results from the core empirical specification to the use of a first differences specification (DIFF), which essentially involves imposing the assumption that δ = 1. The second is to report the size of the minimum detectable effect (MDE) from power calculations for both our main OLS and IV empirical specifications and the DIFF specification.
One of the challenges in carrying out a statistical analysis of belief updating behavior is that an individual's current posterior belief necessarily depends upon her prior belief, which in turn is the result of updating in response to past information. Therefore, when estimating a parametric belief updating function, one concern is that the individual's prior belief is correlated with unobservables. In the main text, we devoted substantial space to discussing how the experiment was designed explicitly to address this concern by generating a completely exogenous information set, facilitating a natural instrumental variables (IV) approach to estimation. The first differences specification results presented here further complement the IV analysis, since the DIFF specification avoids the potential endogeneity issue by removing the lagged belief from the set of explanatory variables in the regression.
In columns (#a) and (#b), Table 4 repeats the OLS and IV results from Table 3 for the corrected beliefs, with one minor change to the core specification in Equation 2. Here we report, instead, the results for the equivalent specification:

logit(π_{i,j,t+1}) = δ · logit(π_{i,j,t}) + γ_a · q̃_{i,j,t} − (γ_b − γ_a) · q̄ · 1(s_{i,j,t} = b) + ε_{i,j,t+1}

where q̃_{i,j,t} = log(q/(1 − q)) · [1(s_{i,j,t} = a) − 1(s_{i,j,t} = b)]; while, as above, q̄ = log(q/(1 − q)); j refers to a round of decisions; t counts the decision numbers within a round; and the errors ε_{i,j,t+1} are clustered at the individual (i) level. The difference γ_b − γ_a denotes a single parameter estimated in the regression, but it is written as the difference between γ_b and γ_a because this is the natural way to think about this parameter in the context of the discussion above (i.e. the difference between how subjects update in response to 'bad news' and 'good news').
The reason for the rearrangement of the equation is that, while it is equivalent 49 to the specification in Equation 2, it displays the test of the difference between γ_a and γ_b (i.e. the test of the asymmetric updating hypothesis) more clearly, and thereby also facilitates calculating the MDE. In Table 3, we have presented the MDE for a power of κ = 0.8.
49 Notice that the regression coefficients and standard errors on δ and γ_a are the same in Tables 3 and 4 (where we are only considering the corrected beliefs). Furthermore, we can see the equivalence from the following simple rearrangement: γ_a · q̄ · 1(s_{i,j,t} = a) − γ_b · q̄ · 1(s_{i,j,t} = b) = γ_a · q̃_{i,j,t} − (γ_b − γ_a) · q̄ · 1(s_{i,j,t} = b).
Columns (#c) report the results for the first differences specification, which imposes the restriction that δ = 1:

Δlogit(π_{i,j,t+1}) = γ_a · q̃_{i,j,t} − (γ_b − γ_a) · q̄ · 1(s_{i,j,t} = b) + ε_{i,j,t+1}

where Δlogit(π_{i,j,t+1}) = logit(π_{i,j,t+1}) − logit(π_{i,j,t}) and q̃_{i,j,t} = log(q/(1 − q)) · [1(s_{i,j,t} = a) − 1(s_{i,j,t} = b)]; j refers to a round of decisions; t counts the decision numbers within a round; and the errors ε_{i,j,t+1} are clustered at the individual (i) level.
The results indicate that the γ_b − γ_a parameter is robust to the different empirical specifications adopted, nor does it vary substantially across treatment groups. In all treatment groups, and for each of the empirical specifications considered, we cannot reject the null hypothesis that this parameter is equal to zero, which implies that we do not find support for the asymmetric updating hypothesis. Furthermore, we calculate the MDE for each specification, considering a significance level of α = 0.05 and a power of κ = 0.8. Under these assumptions, the MDE for the difference between the γ_b and γ_a parameters in each of the regressions considered in isolation ranges between 0.22 and 0.27. As a result, we cannot conclusively rule out the possibility that there exists a small asymmetry in updating; however, none of our results provide any support for one.
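To make the mechanics of the rearranged and DIFF specifications concrete, the sketch below simulates exactly Bayesian updaters and recovers the parameters by least squares. The signal accuracy, sample size, and prior distribution are hypothetical, and no clustering or IV step is included; it is an illustration of the specifications, not a reproduction of the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 0.7                                   # illustrative signal accuracy
qbar = np.log(q / (1 - q))

n = 5000
prior = rng.uniform(0.1, 0.9, n)
state_a = rng.random(n) < prior           # true state drawn from the prior
sig_a = np.where(rng.random(n) < q, state_a, ~state_a)   # signal matches state w.p. q

logit_prior = np.log(prior / (1 - prior))
qtilde = qbar * np.where(sig_a, 1.0, -1.0)
logit_post = logit_prior + qtilde         # exact Bayesian updating in log-odds

# Rearranged specification:
#   logit_post = delta*logit_prior + gamma_a*qtilde - (gamma_b - gamma_a)*qbar*1(s=b)
X = np.column_stack([logit_prior, qtilde, -qbar * (~sig_a)])
delta, gamma_a, asym = np.linalg.lstsq(X, logit_post, rcond=None)[0]

# DIFF specification: impose delta = 1 and regress the change in log-odds
d = logit_post - logit_prior
gamma_a_diff, asym_diff = np.linalg.lstsq(X[:, 1:], d, rcond=None)[0]
```

For Bayesian data both specifications return δ ≈ 1, γ_a ≈ 1, and γ_b − γ_a ≈ 0; an asymmetric updater would instead load on the final regressor.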

Robustness Check 2: Allowing for Individual-Specific Updating Parameters
As discussed above, one reason we might think that endogeneity of the lagged belief could lead to biased estimates is if there is heterogeneity in individual updating behavior, and this leads to a correlation between the unobserved error term and the lagged belief variable amongst the regressors. We have tried to address this issue above using, firstly, an instrumental variables approach and, secondly, a first differences empirical specification. However, since the data were collected in the form of a panel of belief updates for each individual, the data lend themselves to controlling for individual-specific behavior by exploiting the panel. A typical fixed effects model is not appropriate here, as it is not the level of the regression that shifts from individual to individual. However, we can include individual-specific updating parameters, allowing the slopes to shift at the individual level. This allows us to extract the individual heterogeneity in how responsive individuals are to their prior belief and to new information in general, and to reduce the possible bias in the main parameter of interest, the average difference in responsiveness to 'bad news' and 'good news': γ_b − γ_a. With this in mind, our second robustness check involves estimating the following empirical specification:

logit(π_{i,j,t+1}) = δ_i · logit(π_{i,j,t}) + γ_i · q̃_{i,j,t} − (γ_b − γ_a) · q̄ · 1(s_{i,j,t} = b) + ε_{i,j,t+1}

where δ_i and γ_i are estimated at the individual level, and the remaining parameters and variables are defined as above. The results from this exercise using the corrected beliefs are reported in Table 5.
These results are very consistent with the estimates from the core specification, as well as with those from the DIFF specification in Robustness Check 1. In summary, all the empirical estimates support the same underlying story: the data collected in this experiment provide no support for the asymmetric updating hypothesis in this context.
Notes to Table 5: (i) Standard errors in parentheses. (ii) T-tests of H0: coefficient = 0 reported: * = 10%, ** = 5%, *** = 1%. (iii) MDE reports the minimum detectable effect size for a power of κ.
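The individual-specific-slopes design can be sketched as follows: each subject receives her own δ_i and γ_i columns, while the 'bad news' regressor remains common, so the asymmetry parameter is still estimated as a single average. The heterogeneous responsiveness values and sample dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 0.7                                   # illustrative signal accuracy
qbar = np.log(q / (1 - q))
n_subj, n_obs = 50, 40
N = n_subj * n_obs
subj = np.repeat(np.arange(n_subj), n_obs)

# Hypothetical heterogeneity: subjects over- or under-respond to news,
# but symmetrically, so the true gamma_b - gamma_a is zero.
gamma_i_true = rng.uniform(0.6, 1.4, n_subj)

prior = rng.uniform(0.1, 0.9, N)
state_a = rng.random(N) < prior
sig_a = np.where(rng.random(N) < q, state_a, ~state_a)
logit_prior = np.log(prior / (1 - prior))
qtilde = qbar * np.where(sig_a, 1.0, -1.0)
logit_post = logit_prior + gamma_i_true[subj] * qtilde

# Design matrix: individual-specific slopes on the prior and on news,
# plus one common 'bad news' regressor identifying gamma_b - gamma_a.
dummies = (subj[:, None] == np.arange(n_subj)).astype(float)
X = np.hstack([dummies * logit_prior[:, None],
               dummies * qtilde[:, None],
               (-qbar * (~sig_a))[:, None]])
coef = np.linalg.lstsq(X, logit_post, rcond=None)[0]
asym = coef[-1]                           # estimated gamma_b - gamma_a
```

Even with substantial individual heterogeneity in responsiveness, the common asymmetry parameter is recovered as approximately zero when updating is symmetric for every subject.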

Robustness Check 3: Between Treatments Comparison of Updating Parameters
This section tests whether the belief updating parameters in our core specification are significantly different between the three treatment groups. This is done by pooling together the three treatment groups and estimating Equation 2, but with the inclusion of treatment dummies interacted with the updating coefficients. This provides us with a test of whether the parameters differ between the baseline treatment and either of the two state-contingent stake treatments.
More specifically, this involves estimating the following equation:

logit(π_{i,j,t+1}) = δ · logit(π_{i,j,t}) + γ_a · q̄ · 1(s_{i,j,t} = a) − γ_b · q̄ · 1(s_{i,j,t} = b) + Σ_{k∈{2,3}} T^k_{i,j,t} · [δ^k · logit(π_{i,j,t}) + γ^k_a · q̄ · 1(s_{i,j,t} = a) − γ^k_b · q̄ · 1(s_{i,j,t} = b)] + ε_{i,j,t+1}

where T^k_{i,j,t} is an indicator variable for treatment k [i.e. T^k_{i,j,t} = 1(T_{i,j,t} = k)], with T_{i,j,t} a treatment variable taking the values {1, 2, 3} corresponding to the three treatment groups. The coefficients δ, γ_a, and γ_b reflect the baseline parameters without the influence of state-contingent stakes, and the parameters δ^k, γ^k_a, and γ^k_b estimate the movement from these parameters for each of the two state-contingent stake treatments, k ∈ {2, 3}.
The results from this exercise are presented in Table 6. The results show that, for the average individual, there are no systematic differences in the updating parameters across treatment groups. This implies that the differences in exogenous state-contingent incentives do not exert a strong influence on how individuals update their beliefs in the different treatments.
Notes to Table 6: (ii) Estimates use the corrected beliefs and are instrumented using the correct lagged Bayesian posterior. (iii) All of the non-interacted coefficients are significantly different from 0 at the 1% level. Only one of the forty-two interaction coefficients is significantly different from zero at the 10% level: the γ_b × T3 coefficient in the Belief 1 column, which is significant at the 10% but not the 5% level.
The QSR pays the participant according to her reported probability, r_t, and the realized state:

S_A(r_t) = 1 − (1 − r_t)²,  S_B(r_t) = 1 − r_t²

where r_t is the reported probability of event E_A occurring; S_A(r_t) is the payment if the state ω = A is realized; and S_B(r_t) is the payment if the state ω = B is realized. Therefore, the QSR essentially involves a single choice from a list of binary prospects of the form (1 − (1 − r_t)²)_{E_A}(1 − r_t²). The QSR is a 'proper' scoring rule since, if the agent is a risk neutral EU maximizer, she is incentivized to truthfully reveal her belief, π_t:

argmax_{r_t} [π_t · (1 − (1 − r_t)²) + (1 − π_t) · (1 − r_t²)] = π_t.

However, the QSR is no longer incentive compatible once we allow for (i) risk aversion or risk seeking and (ii) participants who have exogenous stakes in the state of the world, for the following reasons. Firstly, it has been well documented theoretically that, if the participant is risk averse, then the QSR leads to reported beliefs, r_t, that are distorted towards 0.5, away from her true belief, π_t, when the participant has no exogenous stakes in the realized state. 50 This distortion has been observed in experimental data (Offerman et al., 2009; Armantier and Treich, 2013). Secondly, in our experiment, we will also be interested in eliciting beliefs when participants have an exogenous stake associated with one of the two states. More precisely, we will be interested in recovering the participant's true belief when she receives an exogenous payment, x, if state ω = A is realized. This payment, x, is in addition to the payment she receives from the QSR. In other words, she will choose from a menu of binary prospects of the form:

(x + 1 − (1 − r_t)²)_{E_A} (1 − r_t²).

In the context of state-dependent stakes, a risk averse EU maximizer 51 faces two distortionary motives in reporting her belief: (i) the motive to distort her report towards 0.5 discussed above; and (ii) a hedging motive, which compels a risk averse individual to lower her reported belief, r_t, towards zero as x increases.
If the participants in our experiment are risk neutral expected utility maximizers, the reported beliefs, r_t, that we elicit under the QSR will coincide with their true beliefs, π_t. However, in order to allow for choice behaviors consistent with a wider range of decision models, we will measure the size of the distortionary influence of the elicitation incentives at an individual level and correct the beliefs accordingly. This approach is valid under the weak assumption that individuals evaluate binary prospects according to the biseparable preferences 52 model and are probabilistically sophisticated. 53 This restriction on behavior is very weak and includes individuals who behave according to EU with any risk preferences, as well as the majority of commonly used NEU models. 54
50 i.e. if π_t > 0.5 then π_t > r_t > 0.5, and if π_t < 0.5 then π_t < r_t < 0.5 for a risk averse individual reporting her beliefs under QSR incentives.
51 A participant who is a risk averse EU maximizer will choose her reported belief r_t by solving the following maximization problem: max_{r_t} π_t · U(x + 1 − (1 − r_t)²) + (1 − π_t) · U(1 − r_t²).
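The two distortions described above are easy to reproduce numerically. The sketch below uses a hypothetical square-root utility function (any concave U gives the same qualitative pattern) and a simple grid search over reports; it illustrates the incentive problem, not the paper's correction procedure.

```python
import numpy as np

def optimal_report(belief, utility, x=0.0):
    """Grid-search the report maximizing expected utility under the QSR,
    with an exogenous bonus x paid if state A is realized."""
    r = np.linspace(0.001, 0.999, 999)
    payoff_a = x + 1 - (1 - r) ** 2       # payment if state A occurs
    payoff_b = 1 - r ** 2                 # payment if state B occurs
    eu = belief * utility(payoff_a) + (1 - belief) * utility(payoff_b)
    return r[np.argmax(eu)]

def linear(m):                            # risk neutral utility
    return m

def sqrt_util(m):                         # hypothetical risk averse utility
    return np.sqrt(m)

pi = 0.7
r_neutral = optimal_report(pi, linear)             # truthful: r = pi
r_averse = optimal_report(pi, sqrt_util)           # pulled towards 0.5
r_hedged = optimal_report(pi, sqrt_util, x=2.0)    # hedging pushes r further down
```

Under risk neutrality the optimal report equals the belief; under concave utility it is compressed towards 0.5; and adding the stake x lowers it further, which is exactly the hedging motive the correction method is designed to undo.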

A Non-EU 'Truth Serum'
The discussion above has highlighted how beliefs might be distorted under QSR incentives. The Offerman et al. (2009) approach proposes correcting the reported beliefs for the distortions caused by the curvature of the utility function or by non-linear probability weighting. This approach involves eliciting participants' reported belief parameter, r, for a set of risky events where they know the objective probability, p (known probability). This is done under precisely the same QSR incentive environment in which we elicit the participants' subjective beliefs, π, regarding the events of interest (where they do not know the objective probability: unknown probability). If a subject's reported beliefs, r, differ from the known objective probabilities, p, this indicates that the subject is distorting her reports due to the incentive environment (e.g. due to risk aversion). The objective of the correction mechanism is therefore to construct a map, R, from the objective probabilities, p ∈ [0, 1], to the reported beliefs, r, for each individual under the relevant incentive environment. Offerman et al. (2009) show that, under the assumption that individuals evaluate prospects in a way that is consistent with the weak assumptions of the biseparable preferences model, in the scenario where there are no state-contingent stakes (i.e. x = 0), individuals evaluate the QSR menu of prospects for r_t ≥ 0.5 as

w(P(E_A)) · U(1 − (1 − r_t)²) + [1 − w(P(E_A))] · U(1 − r_t²)

so that the map from objective probabilities to reported probabilities, R, is pinned down by the associated first-order condition, and the subjective belief can be recovered from the report via its inverse:

π_t = R⁻¹(r_t). (8)

In Appendix B.2.2, we provide a derivation for this equation, as well as augmenting the Offerman et al. (2009) approach to allow for the scenario where there are state-contingent stakes (i.e. x ≠ 0). This extension to Offerman et al. (2009) represents a special case of the more general treatment of correction methods for binary proper scoring rules considered by Kothiyal et al. (2011). In our empirical analysis, we discuss how we use Equation 8 to recover the function, R, for each individual and thereby recover their beliefs, π_t, from their reported beliefs, r_t.
52 Under the biseparable preferences model, a binary prospect y_E z, paying y if event E obtains and z otherwise (with y preferred to z), is evaluated as y_E z → W(E) · U(y) + (1 − W(E)) · U(z), where U is a real-valued function unique up to level and unit; and W is a unique weighting function, satisfying W(∅) = 0, W(S) = 1 and W(E) ≤ W(F) if E ⊆ F. S is the set of all states, and events are subsets of the full set of states: i.e. E, F ⊆ S. In this paper, we only consider two-state prospects, where the state-space is partitioned into two parts by an event, E, and its complement, E^c. Making the further assumption that the decision maker is probabilistically sophisticated gives the following refinement: y_E z → w(P(E)) · U(y) + (1 − w(P(E))) · U(z).
53 Probabilistic sophistication is the assumption that we can model the individual's preferences over prospects as if the individual's beliefs over states can be summarized by a probability measure, P. In other words, probabilistic sophistication implies that we can model the individual's belief regarding the likelihood of an event E as being completely summarized by a single probability judgment, P(E_A).
54 Amongst the models subsumed within the biseparable preferences model are EU, Choquet expected utility (Schmeidler, 1989), maxmin expected utility (Gilboa and Schmeidler, 1989), prospect theory (Tversky and Kahneman, 1992), and α-maxmin expected utility (Ghirardato et al., 2004). See Offerman et al. (2009) for a discussion.
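The logic of the correction can be sketched numerically: simulate a distorted reporter, trace out R from her reports about events with known probabilities, and invert the estimated map by interpolation to de-bias a report about an unknown event. The square-root utility, calibration grid, and grid-search resolution below are hypothetical choices, not the paper's estimation procedure.

```python
import numpy as np

def report(belief, x=0.0):
    """Optimal QSR report of a hypothetical sqrt-utility (risk averse) agent."""
    r = np.linspace(0.001, 0.999, 999)
    eu = belief * np.sqrt(x + 1 - (1 - r) ** 2) + (1 - belief) * np.sqrt(1 - r ** 2)
    return r[np.argmax(eu)]

# Step 1: elicit reports for events with KNOWN probabilities p, tracing out R(p).
p_known = np.linspace(0.05, 0.95, 19)
r_known = np.array([report(p) for p in p_known])

# Step 2: invert the estimated map by interpolation: pi_hat = R^{-1}(r).
def corrected_belief(r):
    return np.interp(r, r_known, p_known)   # valid because r_known is increasing

# Step 3: de-bias a report about an event with UNKNOWN probability.
true_belief = 0.62
raw_report = report(true_belief)            # distorted towards 0.5
recovered = corrected_belief(raw_report)    # approximately recovers 0.62
```

Because the known-probability tasks are run under the same incentive environment, the same map R applies to reports about unknown events, which is what licenses the inversion in the final step.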

Extending the Non-EU 'Truth Serum' Approach to Include Stakes
The previous section discussed the central ideas motivating the Offerman et al. (2009) approach for correcting for hedging in cases where there are no state-dependent stakes (i.e. x = 0). In projects studying the asymmetric updating hypothesis, allowing for state-dependent stakes (i.e. x ≠ 0) is of fundamental importance. Therefore, in this section, we consider an extension of the Offerman et al. (2009) approach to correcting for hedging. The extension we consider is tailored specifically to our experimental setting; however, it is a special case of the more general set of correction techniques studied by Kothiyal et al. (2011).⁵⁵

In the case where x ≠ 0, the text above discussed how participants who face the quadratic scoring rule incentives, along with the non-zero state-contingent bonus, x, essentially face a choice from a menu of lotteries denoted by (x + 1 − (1 − r_t)^2) E_A (1 − r_t^2). An individual who satisfies the biseparable preferences model and is probabilistically sophisticated will evaluate this prospect using the following equations:⁵⁶

For x ≥ 1 or r_t ≥ 0.5:

w(P(E_A)) U(x + 1 − (1 − r_t)^2) + (1 − w(P(E_A))) U(1 − r_t^2)

and similarly, for x = 0 and r_t < 0.5:

w(P(E_A^c)) U(1 − r_t^2) + (1 − w(P(E_A^c))) U(1 − (1 − r_t)^2)

The reason for the two separate conditions is the way in which many NEU models subsumed within the biseparable preferences model allow the probability weighting function, w(.), over events to be influenced by the ordinal ranking of the associated outcomes, from best to worst.⁵⁷

55. Kothiyal et al. (2011) extend the basic idea used by techniques aiming to correct elicited beliefs for reporting bias (e.g. hedging) so that it applies to the set of all binary proper scoring rules and covers the full domain of beliefs. In conjunction with Offerman et al. (2009), this paper therefore offers a useful set of tools for accessing subjects' true beliefs in situations where they may have reason to distort their reports.
56. For expositional simplicity, we don't consider x ∈ (0, 1). The discussion below is easily extended to these cases, but they are irrelevant for the purposes of this paper.
57. This case is slightly different due to the fact that the probability weights on events or states may depend on their ordinal ranking according to preferences in this model.
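These two evaluation cases can be made concrete with a short numerical sketch. The CRRA utility and Prelec weighting function below use purely illustrative parameter values (ρ = 0.5, α = 0.65), not estimates from the paper:

```python
import math

def U(y, rho=0.5):
    """CRRA-style utility; rho = 0.5 is an illustrative assumption."""
    return y ** rho

def w(p, alpha=0.65):
    """Prelec one-parameter probability weighting; alpha is illustrative."""
    return 0.0 if p <= 0.0 else math.exp(-((-math.log(p)) ** alpha))

def neu_value(r, p_A, p_notA, x=1.0):
    """Biseparable (NEU) evaluation of reporting r under the quadratic
    scoring rule with state-contingent bonus x.

    Payoffs: x + 1 - (1 - r)^2 if E_A obtains, 1 - r^2 otherwise.
    Which event is weighted by w(.) depends on which payoff ranks best.
    """
    pay_A = x + 1.0 - (1.0 - r) ** 2   # payoff if E_A
    pay_not = 1.0 - r ** 2             # payoff otherwise
    if pay_A >= pay_not:               # holds whenever x >= 1 or r >= 0.5
        return w(p_A) * U(pay_A) + (1.0 - w(p_A)) * U(pay_not)
    # x = 0 and r < 0.5: the complement's payoff ranks best
    return w(p_notA) * U(pay_not) + (1.0 - w(p_notA)) * U(pay_A)
```

Note that p_A and p_notA enter separately because, under NEU, the weights w(P(E_A)) and w(P(E_A^c)) need not sum to one.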
Since the case where x = 0 is discussed extensively in Offerman et al. (2009), we will focus on the case where x ≥ 1 in the discussion that follows. This case requires only a very minor adjustment to their analysis. The key results are the following (adjusted to include the influence of x):

Result 1: Under NEU with known probabilities, p, the optimal reported probability, r = R_x(p), satisfies:

w(p) = r U'(1 − r^2) / [r U'(1 − r^2) + (1 − r) U'(x + 1 − (1 − r)^2)]    (11)

Result 2: Under NEU with unknown probabilities, the optimal reported probability, r, satisfies:

w(π) = r U'(1 − r^2) / [r U'(1 − r^2) + (1 − r) U'(x + 1 − (1 − r)^2)]    (12)

where π denotes the agent's subjective probability, P(E_A). This motivates a simple strategy for recovering the agent's subjective beliefs from her reported beliefs under the specific incentive environment that she faces. Since the RHS of (11) and (12) agree, we have:

π = R_x^{-1}(r)    (13)

which implies that if we can recover the function R_x^{-1} then we can map the reported beliefs to the participant's subjective beliefs. Equation 11 shows that we can recover this R_x^{-1} function in the same way here, with the bonus payment of x, as in the case where x = 0 considered in Offerman et al. (2009). Essentially, we provide the participant with prospects over known probabilities, p, and ask her for her belief regarding the likelihood that one state will be realized. In order to ensure that the incentives to distort one's reported beliefs are kept constant, we do this exercise under precisely the same incentive environment as in the main belief updating task. By eliciting these reported beliefs for known probabilities spanning the whole unit interval, we can use the resulting (p, r_t) pairs to estimate R_x(p) for each individual under the relevant incentive environment created by the belief elicitation. Having estimated R_x(p), we can calculate its inverse, R_x^{-1}(r), and apply it to any beliefs reported by the participant under the same belief elicitation incentives in order to recover her true beliefs.
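One way to implement this recovery numerically is to tabulate R_x(p) on a grid of known probabilities and then invert it by linear interpolation. The sketch below uses assumed illustrative preference parameters, and a grid search stands in for solving the first-order condition:

```python
import math

# Illustrative (assumed) primitives: CRRA utility and Prelec weighting.
def U(y, rho=0.5):
    return y ** rho

def w(p, alpha=0.65):
    return 0.0 if p <= 0.0 else math.exp(-((-math.log(p)) ** alpha))

def optimal_report(p, x=1.0, grid=2001):
    """R_x(p): the report maximizing the NEU value of the scoring-rule
    prospect. With x >= 1 the E_A payoff always ranks best."""
    wp = w(p)
    best_r, best_v = 0.0, -float("inf")
    for i in range(grid):
        r = i / (grid - 1)
        v = wp * U(x + 1 - (1 - r) ** 2) + (1 - wp) * U(1 - r ** 2)
        if v > best_v:
            best_r, best_v = r, v
    return best_r

def corrected_belief(r_reported, x=1.0):
    """R_x^{-1}(r): tabulate R_x on a grid of known probabilities
    (as in the calibration task) and invert by linear interpolation."""
    ps = [j / 100 for j in range(1, 100)]
    rs = [optimal_report(p, x) for p in ps]   # increasing in p
    for k in range(len(ps) - 1):
        if rs[k] <= r_reported <= rs[k + 1]:
            t = 0.0 if rs[k + 1] == rs[k] else (r_reported - rs[k]) / (rs[k + 1] - rs[k])
            return ps[k] + t * (ps[k + 1] - ps[k])
    return ps[0] if r_reported < rs[0] else ps[-1]   # clamp outside the grid
```

Round-tripping a known probability, e.g. corrected_belief(optimal_report(0.7)), returns approximately 0.7; applying the same inversion to reports from the updating task yields the corrected beliefs.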
In particular, we can use this estimated function to recover her true beliefs from her reported beliefs in the belief updating task that is the focus of this paper. Essentially, we are using this procedure to remove any misreporting effect that the belief elicitation incentive environment may have. It allows us to correct for the possibility that individuals may hold some belief, P(E) or π, but instead report a different belief, r.
If the incentive environment does not cause the participant to report a belief different from her true belief, then this procedure is unnecessary; however, applying it to her reported beliefs does no harm: the corrected beliefs will simply coincide with the reported beliefs.

Appendix B.2.3: Calibration of the Belief Correction Procedure: Theory
It is clear from the discussion in the main text and Equation 8 that we could recover R(.) nonparametrically for each individual if we were to collect a large number of (p, r) pairs from participants, such that the interval between the known probabilities, p, is sufficiently small. However, since it is not practical here to elicit such a large number of observations from each participant, we instead impose a parametric structure similar to the one used by Offerman et al. (2009).
For the utility function, U(.), we use the constant relative risk aversion (CRRA)⁵⁸ functional form:

U(y) = y^ρ    (14)

For the probability weighting function, w(.), we adopt Prelec's (1998) one-parameter family⁵⁹:

w(p) = exp(−(−ln p)^α)    (15)

Substituting these parametric functional form specifications into Equation 8 for the case where x = 0 gives:

exp(−(−ln p)^α) = r(1 − r^2)^(ρ−1) / [r(1 − r^2)^(ρ−1) + (1 − r)(1 − (1 − r)^2)^(ρ−1)]

For the case where x ≠ 0, substituting the parametric functional form specifications described in Equations 14 and 15 into Equation 11 gives:

exp(−(−ln p)^α) = r(1 − r^2)^(ρ−1) / [r(1 − r^2)^(ρ−1) + (1 − r)(x + 1 − (1 − r)^2)^(ρ−1)]

We therefore use this adapted specification in our correction mechanism for the T2.C and T3.S treatment groups.

In our core analysis, for our individual-level reported belief corrections, we make the simplifying assumption that α = 1, so that risk aversion is captured only through the curvature of the utility function and not through the probability weighting function. The results are similar when we use Prelec's one-parameter weighting function. Furthermore, it is substantially easier to interpret the risk aversion parameter estimates when we estimate ρ alone, due to the strong relationship between the ρ and α estimates.

We therefore estimate the following model for each participant in order to acquire a numerical estimate of the inverse of the function R(.):

R(j/20) = h(j/20; ρ, α) + u_j

where R(j/20) is the probability reported by the individual that corresponds to the true known probability, p = j/20, with 1 ≤ j ≤ 19.⁶⁰ As discussed above, α is the parameter of the probability weighting function and ρ gives the curvature of the utility function. The function h(.) is the inverse of R^{-1}. We evaluate this function, h(.), numerically at each step⁶¹ within the maximum likelihood estimation. The error terms, u_j, are independently and identically distributed across participants and choices, and are drawn from a normal distribution. Essentially, here we are using each participant's 20 (r, p) pairs in order to estimate an R function that reflects the distortion in her reported beliefs due to the particular quadratic scoring rule incentive structure that she is subject to. Notice that this structure varies across treatments as x varies, and therefore the same subject would require a different adjustment curve if she were reassigned to a different treatment.

58. It is worth noting that the name "constant relative risk aversion (CRRA)" is not strictly appropriate here, as risk attitudes are also captured in the probability weighting function.
59. This is a special case of Prelec's two-parameter family of weighting functions: w(p) = exp(−β(−ln p)^α). For the purposes of the current context, the two-parameter family is not practically suitable due to the limited data we use at the individual level. The one-parameter family permits the standard inverse-S shaped probability weighting function that has been found to be consistent with the majority of the existing empirical evidence. In the one-parameter family (β = 1), the α parameter captures the degree of curvature of the inverse-S shape, but the point at which w(p) intersects the 45 degree line is predetermined. Adding the second parameter, β, extends the one-parameter specification by allowing this fixed point to vary.
60. In other words, for known probabilities, p, between 0.05 and 0.95 at intervals of 0.05.
61. I.e., given the current parameter guesses.
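A minimal sketch of this individual-level estimation step, assuming α = 1 (so that w(p) = p), an assumed bonus of x = 1, and a coarse grid search over ρ in place of the maximum likelihood routine:

```python
def U(y, rho):
    """CRRA-style utility with curvature parameter rho."""
    return y ** rho

def h(p, rho, x=1.0, grid=1001):
    """Predicted report for known probability p under alpha = 1 (w(p) = p):
    the r maximizing p*U(x + 1 - (1-r)^2) + (1-p)*U(1 - r^2)."""
    best_r, best_v = 0.0, -float("inf")
    for i in range(grid):
        r = i / (grid - 1)
        v = p * U(x + 1 - (1 - r) ** 2, rho) + (1 - p) * U(1 - r ** 2, rho)
        if v > best_v:
            best_r, best_v = r, v
    return best_r

def fit_rho(pairs, x=1.0):
    """Least-squares fit of rho from a participant's (p, reported r) pairs;
    a grid search stands in for maximum likelihood with normal errors."""
    candidates = [k / 10 for k in range(1, 21)]   # rho in {0.1, ..., 2.0}
    def sse(rho):
        return sum((r_obs - h(p, rho, x)) ** 2 for p, r_obs in pairs)
    return min(candidates, key=sse)

# Synthetic check: data generated at rho = 0.5 should be fit by rho = 0.5.
pairs = [(j / 20, h(j / 20, 0.5)) for j in range(1, 20)]
```

Once ρ is estimated, the fitted h(.) plays the role of R, and corrected beliefs follow by inverting it as described above.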
Using these estimates at an individual level allows us to recover the participants' true subjective beliefs, π t , from the first stage of the experiment in which they report their beliefs, r t , regarding the likelihood of ω = A being the true state. In Figure 6 in the main text, we graph the individual level correction curve estimates for two individuals in each treatment group. It is clear from these examples, firstly, that individuals in the sample are distorting their reported beliefs substantially relative to the known probabilities, and secondly, that the estimated correction curves are sufficiently flexible to fit different types of belief distortion behavior reasonably well. Furthermore, importantly, the graph in the top-left panel of the figure shows that, when an individual accurately reports her beliefs, then the correction mechanism has no harmful effect.
At the aggregate level, for each treatment group, T ∈ {1, 2, 3}, we estimate:

R_i(j/20) = h(j/20; ρ_T, α_T) + u_ij

where j indexes the 20 reported probabilities of individual i. This specification allows us to examine the distortion caused by the incentive environment for the average individual in each of the three treatment groups.

Appendix B.2.4: Calibration of the Belief Correction Procedure: Estimation
The belief correction procedure that we adopt involves assuming a flexible parametric form for the participants' utility and probability weighting functions in order to estimate the R function discussed in Equation 8 above. We estimate this function for each individual separately in order to correct the reported beliefs at the individual level. In addition, we estimate this function at the aggregate level for each of the treatment groups in order to obtain a measure of the average distortion induced by the incentive environment in each of the treatment groups. A detailed discussion of the mechanics of the Belief Correction Procedure we use is provided in Appendix B.2.3 above. Essentially, we are fitting a curve through each subject's (p, r) pairs that captures how the belief elicitation incentives distort her reports. Figure 5 displays the average correction curves for each of the treatment groups, fitting a single curve to the reported belief data observed across all subjects in the relevant treatment group.
Comparing the three subgraphs, we see that the average individual distorts the beliefs she reports in a way that is consistent with what risk aversion under EU would predict, with the inverse-S shape distortion in the T1.S stakes treatment and the strong distortion downwards (away from the more desirable state) in both of the A stakes treatments. Furthermore, we see that the different ways of framing the same incentives in the two A treatments have a clear influence on behavior. In T2.C, the participants hedge far more when choosing their reported beliefs than those in the T3.S group, in spite of the fact that the incentives are identical in these two treatments. This indicates that the reported beliefs in the T3.S treatment are closer to the participants' true beliefs, and it motivates this presentation of incentives as preferable for future work that calls for the elicitation of beliefs when there are exogenous state-contingent payments.
At the individual level, there is substantial heterogeneity in the extent to which individuals distort their reported beliefs away from their actual beliefs, given the incentive environment. Figure 6 displays the correction curves estimated for two individuals from each treatment group. It is clear from this figure that some individuals responded very strongly to the incentive environment in which their beliefs were elicited, while others reported their beliefs more accurately. The belief correction procedure is therefore very helpful for recovering the true beliefs of participants who responded strongly to the incentive environment. In cases where an individual simply reported her beliefs accurately, the corrected beliefs and the reported beliefs are exactly the same.

[Figure 6 panel: Separate Treatment (T3), Individual 28; legend: Data, Estimated R, 45-degree line.]

… current paper. However, since it is important to also understand how we form beliefs about the self, in some studies the states are determined by personal characteristics of the individual (e.g. IQ). This means that states are essentially equivalent to personal types (i.e. states = types). The implication is that, if signals are informative about the state of the world, then High types are more likely than Low types to receive Up signals (and vice versa for Down signals). If High types update their beliefs differently from Low types, this can (in principle) lead to finding (what looks like) evidence that the average individual updates asymmetrically when no individual actually does.
In order to show this, I conduct a very simple simulation exercise. I construct a population of 10,000 individuals who are randomly assigned to one of two types, ω ∈ {High, Low}. Within each type, the agents' prior beliefs about the likelihood of being the High type are assigned randomly using a uniform distribution between zero and one.⁶⁴ High types receive an Up signal with probability q = 5/8, and Low types receive a Down signal with probability q = 5/8. Using a seed of 1000 in STATA, the observed empirical distribution of signals across types is shown in Figure 7.

Notes to the first table: (i) Standard deviations in parentheses. (ii) T-test for the difference being significant: * = 10%, ** = 5%, *** = 1%.

Notes to the second table: (i) Standard errors in parentheses (clustered at the individual level). (ii) All coefficients are significantly different from 0 at the 1% level; therefore, t-tests of the null hypothesis (H_0: Coefficient = 1) are reported: * = 10%, ** = 5%, *** = 1%. (iii) The rows corresponding to p (H_0: γ_a = γ_b) report the p-value from a t-test of the equality of the coefficients γ_a and γ_b (i.e. a test of the asymmetric updating hypothesis).
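The simulation exercise can be sketched as follows. This sketch uses Python's random module rather than STATA, so the seeded draws will not reproduce the paper's exact figures; the posterior formula is standard Bayes' rule for this two-type, binary-signal setup:

```python
import random

random.seed(1000)   # seeded for reproducibility (will not match STATA's seed 1000)

N, q = 10_000, 5 / 8   # population size and signal informativeness from the text

def posterior(prior, signal, q):
    """Bayes' rule for P(High | signal): every agent updates symmetrically."""
    like_high = q if signal == "Up" else 1 - q
    like_low = 1 - q if signal == "Up" else q
    return like_high * prior / (like_high * prior + like_low * (1 - prior))

counts = {("High", "Up"): 0, ("High", "Down"): 0,
          ("Low", "Up"): 0, ("Low", "Down"): 0}
for _ in range(N):
    omega = random.choice(["High", "Low"])       # random type assignment
    prior = random.random()                      # prior belief ~ U(0, 1)
    p_up = q if omega == "High" else 1 - q       # informative signal structure
    signal = "Up" if random.random() < p_up else "Down"
    counts[(omega, signal)] += 1
    belief = posterior(prior, signal, q)         # symmetric Bayesian update

# High types receive mostly Up ('good news') signals, Low types mostly Down,
# so pooling updates by signal mixes the two types unevenly.
high_total = counts[("High", "Up")] + counts[("High", "Down")]
share_up_given_high = counts[("High", "Up")] / high_total
```

Here share_up_given_high sits near q = 5/8, illustrating why type-dependent updating can masquerade as population-level asymmetry even though posterior(.) treats Up and Down news symmetrically.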