Correlation neglect and case-based decisions

In most theories of choice under uncertainty, decision-makers are assumed to evaluate acts in terms of subjective values attributed to consequences and probabilities assigned to events. Case-based decision theory (CBDT), proposed by Gilboa and Schmeidler, is fundamentally different, and in the tradition of reinforcement learning models. It has no state space and no concept of probability. An agent evaluates each available act in terms of the consequences he has experienced through choosing that act in previous decision problems that he perceives to be similar to his current problem. Gilboa and Schmeidler present CBDT as a complement to expected utility theory (EUT), applicable only when the state space is unknown. Accordingly, most experimental tests of CBDT have used problems for which EUT makes no predictions. In contrast, we test the conjecture that case-based reasoning may also be used when relevant probabilities can be derived by Bayesian inference from observations of random processes, and that such reasoning may induce violations of EUT. Our experiment elicits participants’ valuations of a lottery after observing realisations of the lottery being valued and realisations of another lottery. Depending on the treatment, participants know that the payoffs from the two lotteries are independent, positively correlated, or negatively correlated. We find no evidence of correlation neglect indicative of case-based reasoning. However, in the negative correlation treatment, valuations cannot be explained by Bayesian reasoning, while stated qualitative judgements about chances of winning can.


Introduction
Most theories of choice under uncertainty that have been proposed by economists or decision theorists are closely related to expected utility theory, and often are generalisations of that theory. (For one survey, see Machina and Viscusi 2014, chapters 12-14.) In these theories, uncertainty is represented by a set of states of the world, any one of which might obtain. Alternative acts available to an agent are represented as different assignments of consequences to states. Decision-making is conceptualised as a process of evaluating acts in terms of the subjective values that the agent attributes to their consequences and the probabilities or subjective weights that he or she assigns to the events in which those consequences occur.
However, case-based decision theory (CBDT), proposed by Gilboa and Schmeidler (1995, 2001), is based on a fundamentally different representation of decision problems. In broad terms, CBDT is in the tradition of psychological theories of reinforcement learning (e.g. Bush and Mosteller 1953). In CBDT, there is no state space and no concept of probability. The agent is not assumed to know anything about the outside world except what he has actually experienced as the results of previous decision-making. The agent uses neither forward-looking hypothetical reasoning ("What will happen if I choose X?") nor backward-looking counterfactual reasoning ("What would have happened if I had chosen X?"). He simply evaluates each currently available act in terms of the consequences he has in fact experienced as a result of choosing that act (or in some variants of the theory, choosing similar acts) in previous decision problems that he perceives to be similar to the problem at hand. Gilboa and Schmeidler (1995, pp. 606, 622) present CBDT and expected utility theory (EUT) as "complementary theories". They argue that CBDT is normatively most defensible and descriptively most plausible when "states of the world are neither naturally given, nor can they be simply formulated". Decision-making in such circumstances is decision under ignorance, as contrasted with decision under risk (i.e., with known probabilities) and decision under uncertainty (i.e., with known states of the world but unknown probabilities). For decision under ignorance, Gilboa and Schmeidler argue, "the very language of expected utility models is inappropriate". If this complementarity claim were taken at face value, any idea of testing CBDT and EUT against one another would be out of place.
Up to now, most experimental tests of CBDT have been designed on the premise that CBDT and EUT are complementary theories. For example, Ossadnik et al. (2013) set up an experimental environment of "structural ignorance" (p. 212), and compare the explanatory power of CBDT with that of three alternative criteria for decision under ignorance: maximin, maximax, and the pessimism-optimism criterion of Arrow and Hurwicz (1972). Similarly, Grosskopf et al. (2015) start from the explicit premise that CBDT "is not proposed as an alternative to or a generalization of [EUT]" (p. 640), and use an experimental environment in which "EUT is not a reasonable alternative decision-making procedure" (p. 652). They test CBDT against the null hypothesis of random choice and against a very simple heuristic, related to the "Take the Best" algorithm of Gigerenzer and Goldstein (1996). Unlike Ossadnik et al. and Grosskopf et al., who test specific parameterised forms of CBDT, Bleichrodt et al. (2017) report experimental tests of predictions derived from a very general, non-parameterised version of CBDT. But they too presuppose that CBDT is intended to be applied to situations in which states cannot be specified (p. 2), and design their experiment accordingly. 1
In contrast, the starting point for our experimental research was the conjecture that CBDT might have predictive power in situations in which states are well-defined and objective prior probabilities are known, but expected-utility decision-making requires the construction of posterior probabilities by Bayesian inference from observations of random processes. In terms of a distinction introduced by Hertwig et al. (2004), these are situations in which decisions are made from experience (i.e., the properties of alternative options must be inferred from previous experience), rather than from description (i.e., those properties are known a priori).
In such situations, the "correctness" of Bayesian reasoning about probabilities is uncontroversial. Nevertheless, such reasoning can be cognitively demanding and its implications can be counter-intuitive. It is well-known that human judgements about probability often contravene Bayesian principles in predictable ways, for example because of the use of availability and representativeness heuristics (Tversky and Kahneman 1973, 1983). Charness and Levin (2005) report evidence of deviations from Bayesian reasoning that are consistent with one of the simplest reinforcement learning rules, the "win-stay-lose-shift" heuristic. Given that case-based reasoning requires much less cognitive sophistication and is well-adapted to naturally-occurring problems of decision under ignorance, the hypothesis that human beings are predisposed to use it is psychologically plausible and worthy of investigation.
Since our conjecture has not been endorsed by the proposers of CBDT, we cannot structure our enquiry as a test of that theory. Our methodological strategy is to test predictions of EUT in situations in which there are intuitive reasons, derived from the underlying principles of CBDT, for expecting those predictions to fail in specific ways. This general strategy, used in combination with disparate intuitions, has led to many important developments in decision theory. The Allais paradox, common ratio effect, Ellsberg paradox, and preference reversal phenomenon were all first discovered by researchers who recognised potential limitations of EUT but who, at the time of discovery, were not in a position to propose a comprehensive alternative theory. These robust violations of EUT achieved the status of "exhibits" which informed the subsequent development of alternative decision theories. 2 These exhibits show nonrandom patterns in deviations of actual behaviour from the predictions of EUT. (For example, the Allais paradox involves comparisons between responses to two binary choice problems that EUT treats as equivalent. Individuals' choices are systematically more risk-averse in one problem than in the other.) Such an exhibit provides evidence that some non-random mechanism, not encompassed by EUT, is at work, but is not to be interpreted as confirming any fully-specified alternative theory. Our research was designed to have the potential to create exhibits of this kind which might inform the development and application of CBDT.
Our experiment tests two related intuitions about how case-based reasoning might lead to systematic deviations from the behaviour predicted by EUT. The first of these intuitions derives from a fundamental property of CBDT: act separability. In CBDT, experiences are encoded in memory as cases; each case consists of a problem, the act that was chosen in that problem, and the result of that choice (measured in utility units). Given a new problem, a decision-maker assesses each available act by recalling the previous cases in which that act was chosen, and weighting the result of each of those choices by a measure of the similarity between the problem in which that choice was made and the new problem. In this algorithm, when any given act is assessed, the only items of memory that are used are those that record results that have actually been experienced as a result of the choice of that act. If memory is used in this way, information that could show positive or negative correlation between the results of different acts is never retrieved. Thus, one might expect case-based reasoners to differ from Bayesian reasoners by neglecting information about correlation.
The second intuition derives from the fact that probability judgements have no role in CBDT. Case-based reasoning does not lead to the formation of probability judgements that can be classified as "correct" or "incorrect" according to Bayesian principles. Instead, by moving directly from memory (encoded without reference to states or probabilities) to decisions, it circumvents the whole process of forming probability judgements. When an agent's case-based reasoning leads to a violation of EUT, an outside observer may be able to conclude that the agent has behaved as if she were trying to maximise expected utility but had made erroneous probability judgements, but the agent herself may have no perception of making or endorsing the judgments that the observer attributes to her. This raises the possibility that agents might in fact endorse probability judgements that are systematically different from those that are revealed in their decisions when those decisions are analysed in the theoretical framework used by EUT. It is of course debatable whether such a difference can properly be called a violation of EUT. (Opinions differ about whether "probability" in EUT refers to an agent's actual beliefs, or is merely part of a formal representation of her decision-making behaviour.) But a pattern of predictable differences between stated and revealed probabilities would be a surprising phenomenon calling for explanation.
These two lines of investigation have the potential to be mutually corroborating. Suppose that, in some experiment, participants' decisions are found to be insensitive to variations in relevant information about correlation. CBDT would offer a possible explanation for that observation. However, another possibility might be that the participants were Bayesian reasoners who had misunderstood the information given to them, perhaps because of weaknesses in the experimental design. But suppose it were also found that participants' stated probabilities showed Bayesian sensitivity to information about correlation. That would suggest that participants understood that information, used it when forming probability judgements, but failed to use it when making decisions. That would give additional credence to CBDT's explanation of correlation neglect in decisions.
The remainder of the paper is organised as follows. Section 2 describes act separability under CBDT, and Section 3 discusses stated and revealed probabilities under EUT. Section 4 presents the experimental design; Section 5 discusses the hypotheses, and Section 6 presents the results. Section 7 concludes with a further discussion.

Act separability and correlation neglect
In the core version of CBDT presented by Gilboa and Schmeidler (1995), the primitives are problems, acts and results. A problem is interpreted as a description of a unique decision situation in which the agent faces a non-empty set of acts, one of which must be chosen. The choice of a particular act in a particular problem leads to a result. A case is a triple (q, a, r) where q is a problem faced, a is the act chosen in that problem, and r is the result of that choice. The agent's memory M is a set of cases, interpreted as those cases that he has in fact experienced. There is a function u(·) which assigns a real-valued utility u(r) to every possible result r. A utility value of zero is interpreted as the agent's aspiration level, in that if the choice of an act results in zero utility, that result is treated as giving neutral information, and has no effect on the evaluation of the act. There is a function s(·, ·) which assigns a real-valued non-negative similarity coefficient s(p, q) to every ordered pair of problems (p, q). This coefficient is interpreted as a measure of the similarity of q to p, viewed from p. The theory is concerned with the behaviour of an agent who faces a new problem p, given a memory M. The agent chooses the available act a that maximises
$$U(a) = \sum_{(q, a, r) \in M} s(p, q)\, u(r),$$
where the summation over the empty set is treated as having a value of zero.
Now consider how this model deals with a simple case involving assets with returns that are potentially correlated. Suppose there are two lotteries $L_1$ and $L_2$. The agent faces a sequence of decision problems; in each problem, he faces one of the two lotteries and chooses whether or not to play it. Each time he plays either of these lotteries, the result is either hit, with constant utility $u_H > 0$, or miss, with constant utility $u_M < 0$. Not betting leads to a utility of zero. If he bets, he immediately learns whether the outcome was hit or miss.
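As an illustration only, the evaluation rule can be sketched in Python, assuming the standard reading in which an act's value is the similarity-weighted sum of the utilities of the remembered cases in which that act was chosen. The names `Case`, `cbdt_value` and `cbdt_choice`, the problem labels and the utility numbers are hypothetical and do not come from the paper.

```python
from collections import namedtuple

# A case records the problem faced, the act chosen and the utility of the result.
Case = namedtuple("Case", ["problem", "act", "utility"])

def cbdt_value(act, new_problem, memory, similarity):
    """Similarity-weighted sum of utilities over remembered cases in which
    `act` was chosen. An act never chosen before is valued at 0.0."""
    return sum(
        (similarity(new_problem, case.problem) * case.utility
         for case in memory if case.act == act),
        0.0,
    )

def cbdt_choice(acts, new_problem, memory, similarity):
    """Pick the available act with the highest case-based value."""
    return max(acts, key=lambda a: cbdt_value(a, new_problem, memory, similarity))

# Hypothetical example: problems are labelled by type, and only problems of
# the same type are regarded as similar (coefficient 1, otherwise 0).
def same_type(p, q):
    return 1.0 if p == q else 0.0

memory = [
    Case("type_A", "play", 1.0),    # a hit
    Case("type_A", "play", -0.5),   # a miss
    Case("type_B", "play", -0.5),   # a miss in a dissimilar problem
]

value_play = cbdt_value("play", "type_A", memory, same_type)
best_act = cbdt_choice(["play", "pass"], "type_A", memory, same_type)
```

Note that nothing in `cbdt_value` looks at cases in which a different act was chosen, which is the act separability property discussed in the text.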
Consider an agent who has faced at least 2n such prior problems (where n ≥ 1), and has chosen to play each lottery exactly n times. 3 He has experienced $h_1$ hits from $L_1$ and $h_2$ hits from $L_2$. As a final problem, he faces some lottery $L_i$, $i \in \{1, 2\}$, and has to choose whether or not to play it.
A natural CBDT representation of this situation (Model 1) would treat "play $L_1$" as an act $a_1$ and "play $L_2$" as an act $a_2$, with $a_0$ denoting the act of not playing a lottery. Let $\mathcal{L}_i$, $i \in \{1, 2\}$, denote the set of problems in which lottery $L_i$ is offered. It would also be natural to assume that there is some parameter $\sigma > 0$ such that, for $i \in \{1, 2\}$ and for any two problems $p, q \in \mathcal{L}_i$, $s(p, q) = \sigma$, while $s(p, q) = 0$ for all other pairs of problems p and q. Then whether the agent chooses to bet in the final problem is determined by the sign of
$$U(a_i) = \sigma \left[ h_i u_H + (n - h_i) u_M \right].$$
Notice that $U(a_i)$ is independent of $h_j$ for $j \neq i$, and that $\partial U(a_i)/\partial h_i = \sigma (u_H - u_M) > 0$. Thus, the final choice is determined solely by the number of hits that have been experienced on whichever lottery is faced in the final problem.
An alternative representation (Model 2) would treat "play some lottery" as a single act and model the difference between $L_1$ and $L_2$ in terms of a difference in similarity. 4 Assume there are parameters $\sigma$, $\sigma'$, with $\sigma > 0$ and $\sigma \geq \sigma' \geq 0$, such that for all problems $p_i$, $q_i$, $q_j$, where $i \neq j$, $s(p_i, q_i) = \sigma$ and $s(p_i, q_j) = \sigma'$. Then whether the agent chooses to bet in the final problem is determined by the sign of
$$U(a) = \sigma \left[ h_i u_H + (n - h_i) u_M \right] + \sigma' \left[ h_j u_H + (n - h_j) u_M \right].$$
Thus, the final choice is potentially determined by the numbers of hits that have been experienced on both lotteries. Hits on the lottery that is faced in the final problem have a strictly positive weight that is strictly greater than the weight for hits on the other lottery; the latter weight may be zero but cannot be negative.
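The contrast between the two representations can be sketched numerically. The snippet below assumes that, in each model, the value of playing is the similarity-weighted sum of hit and miss utilities over the n remembered plays of each lottery; the function names, the parameter names `sigma_same` and `sigma_diff`, and all numbers are hypothetical illustrations, not values from the paper.

```python
def model1_value(h_own, n, sigma, u_hit, u_miss):
    """Model 1: only plays of the lottery now faced are recalled."""
    return sigma * (h_own * u_hit + (n - h_own) * u_miss)

def model2_value(h_own, h_other, n, sigma_same, sigma_diff, u_hit, u_miss):
    """Model 2: plays of both lotteries are recalled, with a (weakly)
    smaller similarity weight on the lottery not currently faced."""
    own = h_own * u_hit + (n - h_own) * u_miss
    other = h_other * u_hit + (n - h_other) * u_miss
    return sigma_same * own + sigma_diff * other

# Hypothetical parameters: five plays of each lottery, hit worth 4, miss worth -1.
n, u_hit, u_miss = 5, 4.0, -1.0

# Model 1 ignores the other lottery entirely.
v1 = model1_value(2, n, 1.0, u_hit, u_miss)

# Model 2 moves in the same direction as hits on the other lottery,
# whatever the true correlation between the lotteries is.
v2_few_other_hits = model2_value(2, 0, n, 1.0, 0.5, u_hit, u_miss)
v2_many_other_hits = model2_value(2, 5, n, 1.0, 0.5, u_hit, u_miss)
```

Setting `sigma_diff` to zero makes Model 2 coincide with Model 1. In neither model can more hits on the other lottery lower the value of playing, which is why negative correlation is the informative case for distinguishing case-based from Bayesian reasoning.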
Notice that, in deriving the conditions under which the agent chooses to bet in the final problem, we have made no assumptions about correlation between the two lotteries. In Model 1, the case-based reasoner's final choice depends only on his experience of the lottery he is actually facing. This is compatible with Bayesian reasoning only in the case of a Bayesian agent with a prior belief that the two lotteries are independent. In Model 2, the case-based reasoner's final choice may also be influenced by his experience of the other lottery, but only in the direction that corresponds with a Bayesian belief in positive correlation, and only to the extent that the two lotteries are perceived as subjectively similar. To assume that similarity judgements systematically incorporate prior beliefs about correlation would be inconsistent with one of the fundamental principles of CBDT, namely that decision-makers use only knowledge that is derived from direct experience. This thought points to a class of situations in which the underlying intuitions of CBDT suggest that there might be systematic violations of EUT. These are situations in which there is a mismatch between salient similarity cues and information about correlation that is not embedded in experiences of decision-making. Our experimental design attempts to create such situations.
It is already known that people have a tendency to neglect asset correlation, but the possible connection between this tendency and CBDT has not been explored. For example, in an asset allocation experiment, Kallir and Sonsino (2009) found that participants focused their attention on individual asset returns and that the resulting portfolio decisions did not take into account return correlations. Eyster and Weizsäcker (2011) found that even when equipped with correlation information, participants regarded assets independently and resorted to the 1/n heuristic (or naïve diversification) when allocating investment funds to individual securities. Similarly, in a hypothetical investment choice experiment, Hedesström et al. (2006) observed that participants focused on individual asset volatility rather than on portfolio volatility. Resulting portfolios were inappropriately diversified and had higher volatility. Correlation neglect has been found to be sensitive to the magnitude of the stakes involved in the decisions. In portfolio experiments with low stakes, Kroll et al. (1988) found that while participants were aware of the correlation in stock returns, correlation information was not reflected in their portfolio choices. However, when the stakes were significantly increased, participants managed to effectively diversify their asset holdings and the resulting portfolio choices were closer to the predictions of mean-variance optimisation. In contrast to the experiments we have just described, our experiment uses a design that allows us to investigate attitudes to similarity while controlling and manipulating similarity cues.

Stated probabilities and revealed probabilities
EUT legitimates a simple certainty equivalence procedure for eliciting an agent's subjective ranking of the probabilities of two events. Fix any two (non-null) events $E_1$ and $E_2$. Consider consequences that are measured in money units, and assume that larger consequences are always preferred to smaller ones. Choose any two consequences x and y such that x > y. Let $xE_iy$ ($i \in \{1, 2\}$) denote the act that gives x if $E_i$ obtains and y otherwise. Given a suitable continuity assumption, EUT implies that there exist $z_1, z_2 \in (y, x)$ such that the agent has the preferences $z_1 \sim xE_1y$ and $z_2 \sim xE_2y$. (We use ∼ to denote indifference.) Then the subjective probability of $E_1$ is greater than (equal to, less than) that of $E_2$ if and only if $z_1$ is greater than (equal to, less than) $z_2$. Variants of this procedure are widely used in experimental economics to elicit probability judgements.
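To see why the ranking of certainty equivalents reveals the ranking of probabilities under EUT, the following short derivation may help. It assumes a continuous, strictly increasing utility function u over money and writes $p_i$ for the agent's subjective probability of $E_i$; both symbols are introduced here for illustration.

```latex
\begin{align*}
u(z_i) &= p_i\,u(x) + (1 - p_i)\,u(y), \qquad i \in \{1, 2\},\\
z_i &= u^{-1}\bigl(u(y) + p_i\,[u(x) - u(y)]\bigr).
\end{align*}
```

Because $u(x) > u(y)$ and $u^{-1}$ is strictly increasing, $z_i$ is strictly increasing in $p_i$, so $z_1 \geq z_2$ holds exactly when $p_1 \geq p_2$, whatever the curvature of u.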
This procedure is valid for almost all recognised forms of non-expected utility theory. 5 Apart from continuity, to legitimate this procedure a theory must satisfy the following monotonicity property. Fix any event $E_i$ and any consequences x > z > y such that $z \sim xE_iy$. Then for any event $E_j$, $xE_jy$ is strictly preferred to (indifferent to, strictly less preferred than) z if and only if $E_j$ is more probable than (equally probable as, less probable than) $E_i$. Many theories of choice under uncertainty have this property, even those admitting violations of either the independence or transitivity axioms of EUT.
However, CBDT does not legitimate the certainty equivalence procedure. Since CBDT explains individuals' decisions without making any reference to events or probabilities, there is no way of using that theory to read off the probabilities of events from observations of decisions. Indeed, the logic of CBDT suggests that the idea of trying to infer attitudes to events from decisions is misguided. Notice that, if EUT holds, we can use any pair of non-indifferent consequences to elicit an agent's probability ranking of $E_1$ and $E_2$, and the resulting ranking will be the same. In this sense, the certainty equivalence procedure elicits attitudes to events that are independent of the decision problems in which those attitudes are elicited. But this need not be true for a CBDT agent. For such an agent, indifference between two acts in a particular decision problem is determined by the agent's memory of cases that were similar to that problem; if the problem is changed, the relevantly similar cases can change too.
Once one recognises that there are theories of choice under uncertainty that make no use of probabilities, it becomes a significant research question to ask whether individuals' stated probability judgements are the same as the revealed probabilities that are elicited by the certainty equivalence procedure. Viewed within the conceptual framework of EUT, systematic inconsistencies between stated and revealed probabilities would be surprising, and would raise doubts about the construct validity of the concept of probability used in EUT. There would be particular reason for this kind of doubt if, in a setting in which correct probabilities could be formed by Bayesian reasoning, stated probabilities had Bayesian properties but revealed probabilities did not.
Our experiment is designed to detect instances of correlation neglect that might be induced by case-based reasoning. Such behaviour, were it to occur, would be picked up as non-Bayesian properties of revealed probabilities. As an additional diagnostic tool, we investigate the extent to which revealed probabilities are consistent with stated probabilities.
In experimental economics, it is standard practice to incentivise survey questions by reformulating them as decision problems with material (usually monetary) consequences. In the present case, however, the whole point of the enquiry is to discover whether stated probabilities differ from those that are revealed in decision problems. We believe that this is one of the class of "significant problems in economics that appear to be capable of experimental investigation only in nonincentivised designs" identified by Bardsley et al. (2010, pp. 336-337). The logic of our investigation requires that the elicitation of stated probabilities is not incentivised.

Experimental design
Our experiment used a set-up similar to that analysed in Section 2. In designing the experimental interface, 6 we tried to set a level playing field for investigating the prevalence of Bayesian and case-based reasoning. We avoided all explicit references to probability. Information about probability was always conveyed by describing physical randomising devices, but these descriptions were designed to make the translation between physical properties and objective probabilities as simple as possible.
Judgements about the "chances" of a winning outcome were elicited on a qualitative scale. Thus, participants were not primed to think about probability, but no obstacles were placed in the way of participants who were predisposed to think in this way. The two lotteries seen by any participant were displayed and described in exactly the same way, except for two differentiating features: colour (blue or yellow) and position (left or right on the participant's computer screen). Our background assumption was that, for participants who used Bayesian reasoning, it would be obvious that colour and position provided no information about probabilities or payoffs. However, by making these irrelevant features visually salient, we made it more likely that participants who used case-based reasoning would treat the two lotteries as distinct acts when encoding results in memory.
Each participant was informed about two lotteries, described as the blue game board and the yellow game board. Within an experimental session, the boards were the same for all participants. At appropriate times, these boards were displayed on participants' screens, vibrantly coloured in blue or yellow. The blue board always appeared on the left side of the participant's screen and the yellow board always appeared on the right. Each game board had 100 numbered boxes, corresponding with different numbered balls that might be drawn from a bingo cage (the same cage for both boards). Each box on each board had a predetermined value of either GBP 20 (a winning box) or zero (a losing box), but this value was not visible to participants until the box was "opened." At the start of the session, all boxes were closed.
The experiment had two parts. In Part 1, each participant played ten sample rounds, five using the blue board and five using the yellow board, in random order. These were described as "samples that will give you the opportunity to learn as much as you can about the game boards." In each sample round, the relevant game board was displayed on participants' screens. One ball was drawn from the bingo cage, without replacement. The corresponding box on the board was opened to show its value, with a green background if it was a winning box and a red background if it was a losing box. It remained open only until the end of the round. In this way, each participant learned the values of five of the 100 boxes on each board, selected at random subject to the constraint that all ten opened boxes would have different numbers. Because no more than one box was open at any time, participants could access the information revealed in the sample rounds only by attending to each round as it occurred and by memorising its outcome. This design feature ensured that participants accumulated memory through experience over time, as is usually assumed in interpretations of CBDT. Although the sample rounds were not decision problems in the strict sense of CBDT, the framing was designed to encourage participants to think of each round as a demonstration of what they might in fact experience, were they to choose to play the relevant game board. At the end of Part 1, all balls were returned to the bingo cage. 7
In Part 2, each participant faced a valuation task relating to one of the two game boards, selected at random, independently for each participant. She was told that she had the opportunity to play this board, that is, to receive the value of one box on that board, determined by one draw from the bingo cage. The mechanism of Becker et al. (1964) was used to elicit the minimum amount of money that each participant was willing to accept in return for giving up this opportunity.
Each participant considered thirty-five possible offer prices, ranging from GBP 0.20 to GBP 20, and reported whether she was willing to accept each price. In effect, each participant faced thirty-five binary choice problems, each involving a choice between playing the game board and receiving some amount of money with certainty. No feedback on the outcome of any of these choices was provided until the end of the experiment, when one of the offer prices (selected at random) was revealed as the actual offer price, and participants' decisions conditional on that price were implemented. Irrespective of whether she had chosen to keep or sell the opportunity to play, each participant saw an independent draw (with replacement) from the bingo cage. This determined the number of one box on her board, which was then opened. If she had chosen to keep the opportunity, she was paid the value of this box; if not, she was paid the offer price. All participants received an additional participation fee of GBP 2.
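The payment rule just described can be sketched as follows. This is a minimal illustration, not the experimental software: the function names are hypothetical, and the price grid in the example is an invented evenly spaced list, since the actual thirty-five offer prices are not given here.

```python
import random

def lowest_accepted_price(prices, accepts):
    """Return the lowest accepted offer price when the responses are
    consistent (no price is rejected above an accepted one).
    Returns None if no price is accepted or the pattern is inconsistent."""
    pairs = sorted(zip(prices, accepts))
    answers = [accepted for _, accepted in pairs]
    # Consistent patterns are all rejections followed by all acceptances.
    if answers != sorted(answers):
        return None
    for price, accepted in pairs:
        if accepted:
            return price
    return None

def settle(prices, accepts, box_value, rng):
    """Select one offer price at random and implement the decision made
    at that price: pay the price if accepted, else the opened box's value."""
    k = rng.randrange(len(prices))
    return prices[k] if accepts[k] else box_value

# Hypothetical grid and responses: sell the opportunity at GBP 4 or more.
prices = [float(p) for p in range(1, 21)]
accepts = [p >= 4.0 for p in prices]

stated_valuation = lowest_accepted_price(prices, accepts)
payment = settle(prices, accepts, box_value=20.0, rng=random.Random(0))
```

Because the realised offer price does not depend on the participant's answers, misreporting can only change which outcome is implemented at a given price, never the price itself.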
Notice that the valuation task is an instance of the certainty equivalence procedure, as described in Section 3. Thus, if participants behaved according to EUT (or, indeed, according to any of a wide class of non-expected utility theories), the reported valuation of any given participant would be an increasing function of the subjective probability she assigned to winning the final lottery.
At the start of each sample round in Part 1, and also immediately before the elicitation task in Part 2, each participant reported her judgement about "the chance that this game board [i.e., the board relevant for that round or task] will reveal a winning box in this round" on a ten-point Likert scale with end-points labelled "very low" and "very high". These judgement tasks were not incentivised.
We used a between-subjects design with three treatments, implementing different properties of correlation between the lotteries. Each session was pre-assigned to one of the three treatments. The differences between treatments can be described in terms of the proportions $\pi_B$ and $\pi_Y$ of winning boxes on the blue and yellow boards respectively. In each treatment, the values of $\pi_B$ and $\pi_Y$ were determined by a random draw from a joint distribution of $(\pi_B, \pi_Y)$. Participants were fully informed about the prior distribution, but were not informed about the actual draw. Thus, they were given sufficient information to construct objective prior probabilities of winning on each board, which could then be updated by Bayesian inference in light of the outcomes of the sample rounds. The method by which this information was communicated to participants is explained in Appendix A.
In all three treatments, and in all realisations of the random mechanism, each board was assigned either ten or thirty winning boxes. We refer to a game board with thirty winning boxes as being a high type board (type H), and a game board with ten winning boxes as a low type board (type L). Ex ante, a given board was equally likely to be of the high or low type, and therefore the ex ante chance of a given box on the board being a winning box was 0.2. Thus, colours and box numbers had no information content in themselves.
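The ex ante figure follows directly from averaging over the two equally likely board types, each with 100 boxes:

```latex
\Pr(\text{win}) = \tfrac{1}{2}\cdot\tfrac{30}{100} + \tfrac{1}{2}\cdot\tfrac{10}{100} = 0.15 + 0.05 = 0.2
```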
The treatments differed in the joint probability distribution used to assign the types to the two boards.
• In the independent treatment, the types of the boards were drawn independently.
• In the positive correlation treatment, either both boards were type H, or both boards were type L. Each possibility was equally likely.
• In the negative correlation treatment, one board was always of type H and the other of type L. It was equally likely that the blue board was type H and the yellow type L, or vice versa.
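How these three joint distributions shape Bayesian updating can be sketched with a simplified calculation. The code below is an approximation under stated assumptions: it treats each sample draw as an independent draw with win probability 0.3 (type H) or 0.1 (type L), and so ignores both sampling without replacement and any information carried by box numbers. It should therefore not be expected to reproduce the exact posteriors reported in the paper. The function names are hypothetical.

```python
P_HIGH, P_LOW = 0.3, 0.1  # win probability per draw on a type H / type L board

def likelihood(hits, draws, p_win):
    """Probability of a particular sequence with `hits` wins in `draws`
    independent draws (binomial coefficients cancel in the posterior)."""
    return p_win ** hits * (1 - p_win) ** (draws - hits)

def posterior_high(h_own, h_other, treatment, draws=5):
    """Approximate posterior probability that the board being valued is
    type H, given hits on that board and on the other board."""
    if treatment == "independent":
        # The other board carries no information about this one.
        like_h = likelihood(h_own, draws, P_HIGH)
        like_l = likelihood(h_own, draws, P_LOW)
    elif treatment == "positive":
        # Both boards share a type, so all ten draws are pooled.
        like_h = likelihood(h_own + h_other, 2 * draws, P_HIGH)
        like_l = likelihood(h_own + h_other, 2 * draws, P_LOW)
    elif treatment == "negative":
        # The boards have opposite types.
        like_h = likelihood(h_own, draws, P_HIGH) * likelihood(h_other, draws, P_LOW)
        like_l = likelihood(h_own, draws, P_LOW) * likelihood(h_other, draws, P_HIGH)
    else:
        raise ValueError(f"unknown treatment: {treatment}")
    # The two types are equally likely ex ante, so the priors cancel.
    return like_h / (like_h + like_l)
```

Under this approximation, hits on the other board leave the posterior unchanged in the independent treatment, raise it in the positive correlation treatment, and lower it in the negative correlation treatment.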
For a fully rational Bayesian reasoner, all relevant information about correlation is contained in the initial joint distribution of board types. However, the assignment of winning boxes to boards had certain additional features, designed to help participants to understand the correlation properties of each treatment.
In the positive correlation treatment, the winning numbers were the same for both boards. For example, consider a participant who observes a win on box 36 of the blue board in a sample round of the positive treatment. By abstract Bayesian reasoning from the knowledge that there is perfect positive correlation between $\pi_B$ and $\pi_Y$, she can deduce that the observed win is just as informative about the probability of winning on the yellow board as it is about the probability of winning on the blue board. But our design allows her to make a more direct and more concrete inference, from the knowledge that box 36 on the blue board is a winning box to the conclusion that box 36 on the yellow board is a winning box. In the negative correlation treatment, the winning numbers were different for the two boards. Thus, from the knowledge that box 36 on the blue board was a winning box, a participant could infer that box 36 on the yellow board was a losing box. In the independent treatment, the assignments of winning numbers to the two boards were independent of one another.
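One way to picture the assignment of winning boxes and the Part 1 sample rounds is the simulation sketch below. It rests on assumptions not confirmed by the text: in particular, "different winning numbers" in the negative correlation treatment is read as two disjoint sets of winning numbers, and all function names are hypothetical.

```python
import random

N_BOXES, HIGH, LOW = 100, 30, 10

def draw_boards(treatment, rng):
    """Return the sets of winning box numbers for the blue and yellow boards."""
    numbers = range(1, N_BOXES + 1)
    if treatment == "independent":
        blue = set(rng.sample(numbers, rng.choice([HIGH, LOW])))
        yellow = set(rng.sample(numbers, rng.choice([HIGH, LOW])))
    elif treatment == "positive":
        # Same type and same winning numbers on both boards.
        blue = set(rng.sample(numbers, rng.choice([HIGH, LOW])))
        yellow = set(blue)
    elif treatment == "negative":
        # Opposite types; winning numbers assumed not to overlap.
        blue_size = rng.choice([HIGH, LOW])
        blue = set(rng.sample(numbers, blue_size))
        remaining = [k for k in numbers if k not in blue]
        yellow = set(rng.sample(remaining, HIGH + LOW - blue_size))
    else:
        raise ValueError(f"unknown treatment: {treatment}")
    return blue, yellow

def sample_rounds(blue, yellow, rng):
    """Ten sample rounds: five per board in random order, with ten distinct
    ball numbers. Each round is (board, box number, is_winning_box)."""
    numbers = rng.sample(range(1, N_BOXES + 1), 10)
    boards = ["blue"] * 5 + ["yellow"] * 5
    rng.shuffle(boards)
    winners = {"blue": blue, "yellow": yellow}
    return [(b, k, k in winners[b]) for b, k in zip(boards, numbers)]
```

A participant's memory in the sense used later, the pair of hit counts on the two boards, can be obtained by counting the winning rounds for each board in the list returned by `sample_rounds`.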
As a further aid to understanding, each sample round ended with a screen, described as "summaris[ing] what you learned about the game boards in that round." On this screen, participants saw the game board they had just played, with the box just opened coloured green or red and showing "GBP 20" or "GBP 0." At the bottom of the board there was a message reinforcing this information. On the other side of the screen there was a message about the corresponding box on the other game board. For example, consider a round involving the blue game board. Suppose the announced box number was 4 and its value was GBP 20. Then (irrespective of the treatment) the message on the blue game board would be "4 is a winning box on the blue game board." In the positive correlation treatment, the message on the other side of the screen would be "4 is a winning box on the yellow game board." In the negative correlation treatment, it would be "4 is a losing box on the yellow game board." In the independent treatment, it would be "4 may be a winning box or may be a losing box on the yellow game board." Throughout Part 1, the computer screen also displayed a header that constantly reminded participants of the correlation between the two game boards.
Full instructions for the experiment, including examples of screenshots, are given in an Online Appendix.
Let $i$ index the game board offered to the participant in Part 2, and $j$ denote the other game board. Therefore, $i, j \in \{\text{blue}, \text{yellow}\}$, with $i \neq j$. Let $h_i$ denote the number of winning boxes observed among draws from board $i$ in Part 1, and $h_j$ the number of winning boxes observed among draws from board $j$ in Part 1. We refer to the pair $(h_i, h_j)$ observed by a participant as their memory. The values of $h_i$ and $h_j$ are sufficient to compute the Bayesian posterior probability that board $i$ is type H, which we denote $\rho_i(h_i, h_j)$. Table 1 presents these posteriors for each treatment, for the memories realised in our data. 8 For a given participant, EUT predicts that the stated valuation of a game board will be increasing in the posterior. However, because participants may vary in terms of risk attitudes, this relationship need not be linear, nor the same for all participants. Furthermore, it is unreasonable to expect participants to assess posteriors in a numerically precise way. Therefore, for each treatment I (independent), P (positive correlation), and N (negative correlation), we define corresponding reflexive, complete, and transitive orderings $\succsim_{E_I}$, $\succsim_{E_P}$, and $\succsim_{E_N}$ over memories which are based on the qualitative properties of the Bayesian posterior. For any two memories $(h_i, h_j)$ and $(h_i', h_j')$ we define these orderings as follows:
• In the independent treatment, the posterior is determined entirely by the number of winning boxes $h_i$ observed on board $i$. We therefore define $\succsim_{E_I}$ by
$$(h_i, h_j) \succsim_{E_I} (h_i', h_j') \iff h_i \geq h_i'.$$
• In the positive correlation treatment, the posterior is determined entirely by the total number of winning boxes $h_i + h_j$ observed, irrespective of board. We therefore define $\succsim_{E_P}$ by
$$(h_i, h_j) \succsim_{E_P} (h_i', h_j') \iff h_i + h_j \geq h_i' + h_j'.$$
• In the negative correlation treatment, the posterior is increasing in the number of winning boxes observed on board $i$, and decreasing in the number of winning boxes observed on board $j$. The difference of the number of winning boxes between the boards, $h_i - h_j$, is not alone sufficient to determine the posterior. However, among memories for which $h_i - h_j = m$ for some $m$, the posterior varies by only a small amount, compared to the variability of the posterior between two memories for which $h_i - h_j \neq h_i' - h_j'$. We therefore define $\succsim_{E_N}$ by
$$(h_i, h_j) \succsim_{E_N} (h_i', h_j') \iff h_i - h_j \geq h_i' - h_j'.$$
We derive relations of strict ordering $\succ_E$ and equivalence $\sim_E$ from $\succsim_E$ in the usual way.
The ranking of a memory is therefore determined by a treatment-specific summary statistic, which we write $M_I(h_i, h_j) = h_i$ for the independent treatment; $M_P(h_i, h_j) = h_i + h_j$ for the positive correlation treatment; and $M_N(h_i, h_j) = h_i - h_j$ for the negative correlation treatment. The statistic $M_t$ (for $t \in \{I, P, N\}$) therefore represents the ordering $\succsim_{E_t}$. The experimental design assigns each participant at random to a memory $(h_i, h_j)$ determined by the randomly-selected sample box numbers and the game board offered. Each participant $k$ reports a valuation, which we denote $v_k(h_i, h_j)$. Under EUT, and indeed any decision theory in which valuations are monotonic in the Bayesian posterior, valuations will increase in the summary statistic. The following hypothesis is therefore an implication of EUT.

Hypothesis 1 For each treatment $t$, the reported valuations are increasing in $M_t$, the summary statistic of the memory.
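As a minimal sketch, the summary statistic can be written as a small Python function. The forms used here follow the verbal descriptions of the three treatments: the count on the offered board, the total count, and the difference in counts. The function names and the treatment labels "I", "P" and "N" are illustrative choices only.

```python
def summary_statistic(treatment, h_i, h_j):
    """Treatment-specific statistic M_t of a memory (h_i, h_j)."""
    if treatment == "I":      # independent: only wins on the offered board matter
        return h_i
    if treatment == "P":      # positive correlation: total wins on both boards
        return h_i + h_j
    if treatment == "N":      # negative correlation: difference in wins
        return h_i - h_j
    raise ValueError(f"unknown treatment: {treatment!r}")


def weakly_ranked_above(treatment, memory_a, memory_b):
    """True if memory_a is ranked at least as high as memory_b under EUT."""
    return summary_statistic(treatment, *memory_a) >= summary_statistic(treatment, *memory_b)
```

For example, under the negative correlation treatment the memories (1, 2) and (0, 1) share the statistic -1, so each is weakly ranked above the other and they fall in the same indifference class.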
Our interest is in whether actual behaviour deviates from EUT in directions that would be indicative of case-based reasoning. Our design is based on the working assumption that, in all three treatments, participants perceive plays on boards of the same colour as more similar to one another than to plays on boards whose colours are different. Given this assumption, the underlying principles of CBDT suggest the following case-based weighting conjecture: when participants report valuations of the offered board $i$, they give greater weight to winning boxes observed on that board than to winning boxes observed on board $j$. (Wins on board $j$ might be given zero weight, as in Model 1 of Section 2, or positive but lower weight than wins on board $i$, as in Model 2.) Following the methodological strategy of looking for exhibits (see Section 1), we focus on cases in which this conjecture implies unambiguous biases in behaviour relative to EUT predictions. Such biases are implied only in the positive and negative correlation treatments.
We formalise the case-based weighting conjecture by defining an alternative ranking of memories in the positive correlation and negative correlation treatments.
• In the positive correlation treatment, fix some $m \in \mathbb{Z}_+$ and consider the set of memories $\{(h_i, h_j) : h_i + h_j = m\}$. This set is an indifference class of the ordering $\succsim_{E_P}$. The case-based weighting conjecture implies a strict ranking $\succ_{C_P}$ of the members of this set,
$$(h_i, h_j) \succ_{C_P} (h_i', h_j') \iff h_i > h_i'.$$
• In the negative correlation treatment, fix some $m \in \mathbb{Z}$ and consider the set of memories $\{(h_i, h_j) : h_i - h_j = m\}$. This set is an indifference class of the ordering $\succsim_{E_N}$. The case-based weighting conjecture implies a strict ranking $\succ_{C_N}$ of the members of this set,
$$(h_i, h_j) \succ_{C_N} (h_i', h_j') \iff h_i > h_i'.$$
The following hypothesis is an implication of the case-based weighting conjecture:

Hypothesis 2 For a given treatment $t \in \{P, N\}$, consider any set of memories $\{(h_i, h_j) : M_t(h_i, h_j) = m\}$ for some $m$. Within this set, reported valuations are increasing in $h_i$, the number of winning boxes observed on the offered board.
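The structure of Hypothesis 2 can be illustrated with a short Python sketch that groups memories into the EUT indifference classes and then orders each class by $h_i$. The function name and treatment labels are hypothetical, and the statistics used for the two treatments (sum and difference of wins) are taken from the verbal definitions of the orderings.

```python
from collections import defaultdict


def case_based_ranking(treatment, memories):
    """Group memories (h_i, h_j) by the EUT summary statistic, then sort each
    class by h_i, lowest first. Under the case-based weighting conjecture,
    later members of a class receive higher valuations; under EUT all members
    of a class are valued alike."""
    statistic = {
        "P": lambda h_i, h_j: h_i + h_j,   # positive correlation
        "N": lambda h_i, h_j: h_i - h_j,   # negative correlation
    }[treatment]
    classes = defaultdict(list)
    for h_i, h_j in memories:
        classes[statistic(h_i, h_j)].append((h_i, h_j))
    # Within a class h_i pins down h_j, so tuple order is order by h_i.
    return {m: sorted(members) for m, members in sorted(classes.items())}
```

Classes with a single member carry no test of the conjecture; only classes with two or more distinct memories allow a within-class comparison.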
Recall that each participant reported judgements about the "chance" of seeing a winning number prior to each box being opened, both in Part 1 and Part 2. To allow meaningful comparisons across participants, we normalise each participant's use of the Likert scale relative to that participant's judgement of the chance of winning on the first Part 1 game board. At the stage of the experiment at which this first judgement was reported, all participants had the same information, and that information implied that the objective probability of seeing a winning box was 0.2. For each participant, we define the variable expectation difference, which takes on the value +1 when the participant reports a higher chance in Part 2 than at the start of Part 1; −1 when the participant reports a lower chance in Part 2 than at the start of Part 1; and 0 when the chances reported in Part 2 and at the start of Part 1 are the same. Expectation difference can be interpreted as a self-reported judgement about whether the probability of seeing a winning box on the relevant board $i$, given the participant's memory $(h_i, h_j)$, is greater than, less than, or equal to 0.2. Thus, if stated probabilities are consistent with Bayesian reasoning, the following hypothesis will hold:

Hypothesis 3 For each treatment $t$, the reported expectation differences are increasing in $M_t$, the summary statistic of the memory.
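The coding of the expectation difference variable amounts to taking the sign of a change in rating. A minimal Python sketch follows, assuming (hypothetically) that Likert responses are stored as numbers in which a larger value means a higher stated chance of winning.

```python
def expectation_difference(first_part1_rating, part2_rating):
    """Return +1, 0 or -1 according to whether the Part 2 rating is higher
    than, equal to, or lower than the first Part 1 rating.

    Assumes ratings are numeric, with larger values meaning a higher
    stated chance of winning."""
    return (part2_rating > first_part1_rating) - (part2_rating < first_part1_rating)
```

Only the direction of the change is kept, so two participants who use the scale differently remain comparable.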
Because CBDT makes no reference to probability, it does not support any particular conjecture about stated probabilities.

Results
We conducted a total of thirty sessions in March 2014; there were ten sessions for each treatment. Each session had six to eight participants and lasted 45 minutes. Average earnings were GBP 8.40, and ranged from GBP 2.00 to GBP 22.00. All 226 participants (119 male, 107 female) were recruited from the standing participant pool maintained by the Centre for Behavioural and Experimental Social Science at the University of East Anglia, managed via ORSEE (Greiner 2015).
The experimental design assigned participants randomly to experimental sessions, and therefore to treatments. Furthermore, because the sample results from the game boards were themselves determined at random, the design randomly assigned participants to realised memories $(h_i, h_j)$. Therefore, the data for any pair of memories can be treated as independent samples.
Our hypotheses make predictions about how valuations and expectation differences change as a function of the observed memory. We base our statistical analysis on nonparametric tests using rank orders. These tests are well-suited to our hypotheses in that our hypotheses only concern trends in valuations as a function of $h_i$ and $h_j$. In addition, because our design uses a discrete and predetermined set of questions to elicit valuations, we obtain only a bracketing interval around each participant's valuation. These intervals are non-overlapping and can therefore be ranked from high to low, which is sufficient for the rank order approach.
Our choice architecture did not require participants to give a monotonic response to the questions implementing the Becker et al. procedure. Participants could indicate simultaneously that they accepted some price while rejecting a strictly higher price. Of the 226 participants, 210 provided monotonic responses to the valuation questions. In addition, 4 participants gave responses which were monotonic with the exception of one isolated price, such that, if the response to that price were inverted, the resulting schedule would be monotonic. We neglect the isolated non-monotonic response for those 4 participants and include them in our analysis. We drop the remaining 12 participants. For each of the 214 participants in our sample so defined, we define their valuation as the lowest accepted price. 9
Table 2 summarises the data on stated valuations. For each treatment and memory, we report the median valuation, as well as the first and third quartiles. We group memories for each treatment $t$ according to $\succsim_{E_t}$, and provide the expected payoff of a single play of board $i$ given the posterior. We note that this expected payoff is lower than the median stated valuation for most memories. This tendency to overvalue a gamble with low probability of winning a large amount is observed, for example, in preference-reversal experiments such as Seidl (2002), and overweighting of small probabilities of gains is a central feature of prospect theory (Kahneman and Tversky 1979). In the independent and positive treatments, median valuations generally increase in the posterior probability; the median valuation is roughly 100 pence above the expected payoff.
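The screening of response schedules described above can be sketched in Python. This is an illustrative reading of the rule, not the authors' code: the data layout (a list of price and accept/reject pairs) and all function names are assumptions. A schedule counts as monotonic when no accepted price lies below a rejected one.

```python
def is_monotonic(responses):
    """responses: list of (price, accepted) pairs, one per price offered.
    Monotonic means that once a price is accepted, every higher price is too."""
    seen_accept = False
    for _, accepted in sorted(responses):
        if accepted:
            seen_accept = True
        elif seen_accept:
            return False     # a rejection above an accepted price
    return True


def lowest_accepted_price(responses):
    """Valuation of a monotonic schedule: the lowest accepted price, or None."""
    accepted = [price for price, ok in responses if ok]
    return min(accepted) if accepted else None


def classify(responses):
    """Return 'monotonic', 'one isolated inversion', or 'dropped'."""
    if is_monotonic(responses):
        return "monotonic"
    ordered = sorted(responses)
    for k, (price, ok) in enumerate(ordered):
        flipped = ordered[:k] + [(price, not ok)] + ordered[k + 1:]
        if is_monotonic(flipped):
            return "one isolated inversion"
    return "dropped"
```

The sketch deliberately stops at classification. For a schedule with one inversion there can be more than one single response whose inversion restores monotonicity, and the text does not say how such cases were resolved.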
In contrast, in the negative treatment valuations do not appear to vary systematically as a function of the posterior. For example, given the memory $(h_i, h_j) = (2, 0)$, the expected payoff to a play of board $i$ is 578 pence, contrasting with an expected payoff of 223 pence given the memory $(h_i, h_j) = (0, 2)$. However, the median valuation in $(2, 0)$ is actually lower than that in $(0, 2)$.
Table 2 (negative correlation treatment)
(h_i, h_j)   (0,3)  (0,2)  (1,2)  (0,1)  (1,1)  (2,2)  (1,0)  (2,1)  (2,0)  (3,0)
N                2      9      3      9      7      7     11      4     11      4
Quartile 1     280    400    100    200    500    400    300    550    350    450
Median         490    600    500    450    750    550    500    700    500    500
Quartile 3     700    750    500    600   1200   1100    750    900    800    600
N is the number of participants to observe each memory. Posterior refers to the Bayesian posterior, given the memory, that the offered board $i$ is of the high type, and RNEUT gives the expected payoff of a single play of board $i$ given the posterior. All values given in pence.
To give a more complete picture of how the distribution of valuations changes as a function of the memory, we provide measurements of effect sizes for each pair of memories within a treatment. Consider two memories $(h_i, h_j)$ and $(h_i', h_j')$, and let $v$ be a randomly-chosen valuation observed at $(h_i, h_j)$ and $v'$ a randomly-chosen valuation observed at $(h_i', h_j')$. The effect size is computed as $\Pr(v > v') + \tfrac{1}{2}\Pr(v = v')$. These are reported in Table 3, in which each cell is the probability that a valuation drawn from the row memory is greater than a valuation drawn from the column memory. This effect size is directly related to the test statistic used in the Mann-Whitney-Wilcoxon (MWW) test for equality of distributions. As a convenience, we indicate in Table 3 the pairs of memories for which the effect size is statistically different from one-half using the MWW test.
Notes to Table 3: Each cell is the probability that a randomly-selected valuation reported given the memory in the row is greater than a randomly-selected valuation reported given the memory in the column. Asterisks indicate significance of the difference using the Mann-Whitney-Wilcoxon test; * at 10%, ** at 5%, *** at 1%.
Notes to Table 4: Each row reports the effect size of the pairwise comparison of valuations given two summary statistics. Asterisks indicate significance of the pairwise difference using the Mann-Whitney-Wilcoxon test; * at 10%, ** at 5%, *** at 1%.
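The pairwise effect size used in these comparisons is simple to compute directly from two samples. The Python sketch below is illustrative only; the function name and the example numbers in the usage note are hypothetical and are not taken from the data.

```python
def effect_size(row_values, column_values):
    """Pr(v > v') + 0.5 * Pr(v = v'), where v is drawn at random from
    row_values and v' at random from column_values."""
    greater = sum(v > w for v in row_values for w in column_values)
    ties = sum(v == w for v in row_values for w in column_values)
    return (greater + 0.5 * ties) / (len(row_values) * len(column_values))
```

Multiplying the result by the product of the two sample sizes gives the Mann-Whitney U statistic for the row sample, which is the sense in which the effect size is related to the MWW test. A value of one-half means neither sample tends to exceed the other; for instance `effect_size([5, 6], [1, 2])` is 1.0 because every row value exceeds every column value.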
Because memories are sorted in order of the posterior probability that board i is a high-type board, it would be expected under EUT to see effect sizes less than one-half towards the top-right of the matrix. This pattern is observed broadly in the independent and positive correlation treatments. However, no clear pattern emerges in the negative correlation treatment.
The MWW test is suitable for comparing the distributions of valuations in two groups. In some circumstances, our hypotheses require comparing across three or more groups. For this purpose we adopt the test for trend of Cuzick (1985). This test extends the rank-order calculation of the MWW test to three or more groups; for two groups Cuzick's test coincides with MWW exactly. This test requires ordering the groups being compared. The null hypothesis is that there is no trend (increasing or decreasing) in the data across the groups, against the alternative hypothesis of a trend.
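For readers unfamiliar with the test for trend, the following Python sketch shows the rank-sum calculation behind it. It is a simplified version under stated assumptions: groups are scored 1, 2, 3, ... in the order supplied, the p-value uses the normal approximation, and no correction for ties is applied to the variance, so results may differ slightly from those of a full implementation. The function names are illustrative.

```python
import math


def midranks(values):
    """Ranks 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1       # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def cuzick_trend(groups):
    """groups: list of samples in their hypothesised order.
    Returns (z, two_sided_p); z > 0 indicates an increasing trend."""
    pooled = [v for group in groups for v in group]
    ranks = midranks(pooled)
    n_total = len(pooled)
    t_stat = score_sum = score_sq_sum = 0.0
    position = 0
    for score, group in enumerate(groups, start=1):
        rank_sum = sum(ranks[position:position + len(group)])
        position += len(group)
        t_stat += score * rank_sum           # T = sum of score * rank sum
        score_sum += score * len(group)      # L = sum of score * group size
        score_sq_sum += score * score * len(group)
    mean = (n_total + 1) * score_sum / 2
    variance = (n_total + 1) * (n_total * score_sq_sum - score_sum ** 2) / 12
    z = (t_stat - mean) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))
```

With exactly two groups the statistic reduces to the standardised Wilcoxon rank-sum statistic, which is the coincidence with the MWW test noted above.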

Result 1
In the independent and positive correlation treatments, valuations are increasing as a function of the summary statistic $M_t(h_i, h_j)$ of the memory. In the negative correlation treatment, we cannot reject the null hypothesis of no trend.
Support. Table 4 reports an aggregation of the pairwise comparisons as shown in Table 3, where pairwise comparisons are made after grouping valuations by the corresponding summary statistic of the memory. If valuations are increasing in the summary statistic, effect sizes below .5 are expected. This is observed in the pairwise comparisons under the independent and positive correlation treatments. Cuzick's test confirms the trend is significant (p = .049 for the independent treatment and p < .001 for the positive correlation treatment). In contrast, the direction of the pairwise comparisons is mixed for the negative correlation treatment, with two of the six pairwise comparisons going in the opposite direction. The null hypothesis of no systematic trend cannot be rejected (p = .58).
Notes to Table 5: The pairwise effect sizes are for comparisons as reported in Table 3. The p-value for each conditional test for trend is reported in each row. The overall p-value reported is calculated using Fisher's combined method.

Result 2
There is no significant evidence of the systematic deviations from EUT implied by the case-based weighting conjecture.
Support. Hypothesis 2 proposes that, in the positive and negative correlation treatments and conditional on a realised value $m$ for the summary statistic, valuations will be increasing in the number $h_i$ of winning boxes observed on the offered board.
Conditional on a value of $m$, we conduct Cuzick's test for trend in $h_i$, and report the results in Table 5. Each row corresponds to one conditional test. For each row, we repeat the pairwise effect sizes drawn from Table 3. If an increasing trend in $h_i$ were present, we would expect effect sizes in excess of .5. No individual pairwise comparison, nor any conditional test for trend, is significant. Because of the random assignment of participants to memories, each of the conditional tests is independent of the others, and therefore we can combine the results of the conditional tests into a grand test using Fisher's combined probability method. We cannot reject the null hypothesis of no trend overall (p = .33 for positive correlation and p = .41 for negative correlation).
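Fisher's combined probability method, used above to pool the conditional tests, can be sketched with the standard library alone. The statistic is minus twice the sum of the log p-values, which under the joint null follows a chi-square distribution with 2k degrees of freedom for k independent tests; for even degrees of freedom the tail probability has a closed form. The function name is illustrative, and the example inputs in the usage note are hypothetical rather than the p-values from Table 5.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values (each in (0, 1]) by Fisher's method."""
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total
```

With a single p-value the function returns that p-value unchanged, and `fisher_combined_p([0.5, 0.5])` is about 0.597, so two unremarkable tests remain unremarkable when pooled.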
We now turn to the data on stated judgements. Table 6 shows, for each Bayesian posterior, the breakdown of expectation differences across participants. Overall there is a general trend of decreased optimism, in that around half of the participants report a less optimistic judgement prior to Part 2 than at the start of Part 1. Given that the initial probability of seeing a winning box was only 0.2, the direction of this trend is consistent with the known tendency for rare events to be given greater weight when decisions are made from description (that is, on the basis of a priori information about the properties of random mechanisms) than when they are made from experience, on the basis of trial-by-trial experience of the realisations of random mechanisms (Hertwig et al. 2004). In our experiment, judgements about the first Part 1 game board could be made only from description, while judgements about the Part 2 board could take account of both description and experience.
Table 7 presents effect size comparisons on the judgements data. Because we code only the sign of the change in reported judgement from the first period to the last, our measure of judgement is coarse. We therefore aggregate judgements by the level sets $\sim_E$ given by EUT. All treatments show a general pattern of small effect sizes in the upper-right of the table, which indicates a broad consistency of judgements with Bayesian posteriors.

Result 3
In all treatments, judgements are more optimistic, as measured by expectation differences, when the Bayesian posterior probability is higher.
Support. We follow the same approach as in Result 1, using the expectation differences in place of valuations. The details for Cuzick's test for trend are summarised in Table 8. If judgements are more optimistic when the Bayesian posterior is higher, then we expect effect sizes to be less than .5. This is generally the case for all three treatments. Notably, for the negative correlation treatment, this contrasts sharply with the data for the corresponding Result 1 test in Table 4. We reject the null hypothesis of no trend in all treatments, at the 10% level for the independent treatment (p = .084) and at the 5% level for the positive correlation (p = .048) and negative correlation (p = .030) treatments.
Notes to Table 7: Each cell is the probability that a randomly-selected judgement reported after the outcome in the row is greater than a randomly-selected judgement reported after the outcome in the column. Asterisks indicate significance of the difference using the Mann-Whitney-Wilcoxon test; * at 10%, ** at 5%, *** at 1%.
Our data therefore show a contrast between the patterns in participants' valuations and the patterns in their expectation differences in the negative treatment. Table 2 suggests stated valuations remain high in the negative treatment even when the Bayesian posterior gives a low chance that the offered board is of the high type, and Result 1 formalises this statement. However, Result 3 shows expectation differences are more optimistic in situations in which the Bayesian posterior advises they should indeed be more optimistic.
Notes to Table 8: Each row reports the effect size of the pairwise comparison of judgements given two summary statistics. Asterisks indicate significance of the pairwise difference using the Mann-Whitney-Wilcoxon test; * at 10%, ** at 5%, *** at 1%.

Discussion and conclusions
Our experiment was motivated by the conjecture that human decision-makers have a tendency to use case-based reasoning even when events are well-defined and when the objective probabilities of those events can be found by Bayesian reasoning from prior information. More specifically, we tested the "case-based weighting conjecture" that systematic violations of EUT occur when information about correlation is not embedded in experienced decision outcomes and when Bayesian reasoning involves inferences between lotteries that are saliently dissimilar. We designed the experiment in the belief that this conjecture was consistent with the psychological intuitions of CBDT. Had the evidence confirmed that conjecture, we would have interpreted our results as providing support for CBDT. In fact, we found no significant evidence of the effects implied by the conjecture. If we are right to claim that our conjecture reflects the assumed psychology of case-based reasoning, we must conclude that our results provide weak evidence against CBDT. However, we should point out that an experiment with some similarities to ours has found evidence of a type of correlation neglect which, although not the type that we tested for, is also consistent with CBDT. Charness and Levin (2005) report an individual choice experiment in which there were two urns ("left" and "right") from which balls were drawn with replacement. Balls were either "white" (losing balls) or "black" (winning balls). A prior random event, not revealed to subjects, determined the distribution of balls in the two urns in such a way that a winning draw from one urn increased the posterior probability of a winning draw from the other. (Compare our positive correlation treatment.) 
In the context of CBDT, the most interesting of Charness and Levin's treatments were those in which subjects were required to play one lottery on a specified urn and then, after the outcome of that lottery was revealed, played a second lottery on whichever urn they chose. The parameters of the experiment were fixed so that, if the first lottery was on the left urn, the Bayes-rational response was to stay with the left urn after a loss, but to shift to the right urn after a win. In fact, subjects' responses showed a strong tendency towards the opposite pattern, as implied by the win-stay-lose-shift heuristic (Robbins 1952). This heuristic is consistent with CBDT if the utility of a win is higher than the subject's aspiration level. 10 A significant difference between our experiment and Charness and Levin's is that in ours, subjects did not choose between "staying" and "shifting"; they saw a fixed number of trials of each lottery and then reported a valuation for one of the lotteries, which had been selected at random. Thus, our experiment tests for a possible effect of CBDT in a situation in which the win-stay-lose-shift heuristic is not applicable.
Although our experiment finds no evidence of the specific form of correlation neglect implied by our conjecture, it is clear our participants found it difficult to perform Bayesian reasoning about decisions that involved negative correlation. More precisely, when reporting valuations of an offered lottery i, participants were generally able to recognise the irrelevance of outcomes of lotteries that were independent of i, and to recognise the relevance of outcomes of lotteries that were positively correlated with i; but they failed to recognise the relevance of negative correlation. Surprisingly, however, their stated probability judgements about lottery i (expressed in qualitative statements about the "chance" of winning) showed Bayesian responses to independence, positive correlation, and negative correlation. Our post-hoc conjecture is that the problem was one of cognitive overload. Intuitively, negative correlation is a more difficult concept than independence or positive correlation, and working out one's willingness to exchange a lottery for money is more difficult than merely judging the chance of winning if one plays it. The combination of these two sources of difficulty may have been too challenging for our participants.
This surprising finding raises doubts about the almost universal practice in experimental economics of treating incentivised decision problems as the gold standard for the elicitation of participants' beliefs. If linking the elicitation of beliefs with problems of decision under uncertainty can lead to cognitive overload, experimentalists need to consider the possibility that direct, nonincentivised questions about beliefs might produce more accurate data.
In calculating the posteriors, recall that the samples from the board are drawn without replacement. Therefore, the distribution of hits is given by a suitable parameterisation of the hypergeometric distribution.
In the independent treatment, let H denote the state in which board $i$ is of the high type, and L the state in which it is low. If board $i$ is of the high type, then $h_i$ follows the hypergeometric distribution,
$$P(h_i \mid H) = \frac{\binom{30}{h_i}\binom{70}{5-h_i}}{\binom{100}{5}}.$$
The calculation of probabilities for the negative treatment deserves more detailed comment. Let H denote the state in which board $i$ is high and board $j$ is low, and L denote the state in which board $i$ is low and board $j$ is high. Suppose that among the five numbers selected as trials for board $i$ there were $h_i$ winning numbers on board $i$, and $s_j$ winning numbers on board $j$. Among the five numbers selected as trials for board $j$, suppose there were $h_j$ winning numbers on board $j$, and $s_i$ winning numbers on board $i$. In our setup, participants do observe $h_i$ and $h_j$, but do not observe $s_i$ and $s_j$. Because in our design the winning numbers on the two boards were guaranteed to be distinct, the random variables $h_i$ and $h_j$ are not independent conditional on H or L.
The probability of having $h_i$ winning numbers on board $i$ and $s_j$ winning numbers on board $j$ among the five numbers revealed for board $i$ is given by the multivariate hypergeometric distribution,
$$P(h_i, s_j \mid H) = \frac{\binom{30}{h_i}\binom{10}{s_j}\binom{60}{5-h_i-s_j}}{\binom{100}{5}}.$$
By symmetry, $P(h_i, h_j \mid L) = P(h_j, h_i \mid H)$. Using these formulas it is straightforward to tabulate the posterior probabilities as given in Table 1. The calculation above takes care to reflect the fact that the sets of winning numbers on the two boards are disjoint. As an alternative simplifying assumption, one could assume that $h_i$ and $h_j$ are independent conditional on the assignment of the board types. 11 Then, conditional on the board type, $h_i$ and $h_j$ would follow independent hypergeometric distributions, and the posterior probabilities would follow from Bayes' rule applied to the product of the two distributions. These differ only slightly from the results of the precise calculation given in Table 1. The ranking of memories by posterior is not affected.
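The simplified calculation can be sketched in Python as follows. Several inputs are assumptions rather than facts stated in this appendix: a board size of 100 boxes (inferred from the ex ante chance of 0.2 with ten or thirty winning boxes), equal prior probabilities for the two states, and an expected payoff computed as if the Part 2 box were drawn from the whole board. The negative treatment function uses the independence simplification, not the precise calculation behind Table 1, and all function names are illustrative.

```python
from math import comb

BOARD_SIZE = 100                 # assumption: inferred from the 0.2 ex ante chance
DRAWS = 5                        # trials observed per board
WINNERS = {"H": 30, "L": 10}     # winning boxes by board type
PRIZE_PENCE = 2000               # GBP 20


def hypergeom_pmf(hits, winners, draws=DRAWS, size=BOARD_SIZE):
    """P(hits winning boxes among `draws` boxes sampled without replacement)."""
    return comb(winners, hits) * comb(size - winners, draws - hits) / comb(size, draws)


def posterior_independent(h_i):
    """Posterior that board i is type H in the independent treatment."""
    high = hypergeom_pmf(h_i, WINNERS["H"])
    low = hypergeom_pmf(h_i, WINNERS["L"])
    return high / (high + low)   # equal priors cancel


def posterior_negative_approx(h_i, h_j):
    """Posterior that board i is type H in the negative treatment, treating
    h_i and h_j as independent conditional on the state (simplification)."""
    state_h = hypergeom_pmf(h_i, WINNERS["H"]) * hypergeom_pmf(h_j, WINNERS["L"])
    state_l = hypergeom_pmf(h_i, WINNERS["L"]) * hypergeom_pmf(h_j, WINNERS["H"])
    return state_h / (state_h + state_l)


def expected_payoff_pence(posterior):
    """Risk-neutral value of one play, ignoring boxes already opened."""
    p_win = (posterior * WINNERS["H"] + (1 - posterior) * WINNERS["L"]) / BOARD_SIZE
    return PRIZE_PENCE * p_win
```

Under these assumptions the memory (2, 0) in the negative treatment gives an expected payoff of roughly 577 pence and the memory (0, 2) roughly 223 pence, close to the 578 and 223 pence reported in the Results, which is in line with the remark that the simplification changes the posteriors only slightly.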