Aggregating Imprecise or Conflicting Beliefs: An Experimental Investigation Using Modern Ambiguity Theories

Two experiments show that violations of expected utility due to ambiguity, found in general decision experiments, also affect belief aggregation. Hence we use modern ambiguity theories to analyze belief aggregation, thus obtaining more refined and empirically more valid results than traditional theories can provide. We can now confirm more reliably that conflicting (heterogeneous) beliefs where some agents express certainty are processed differently than informationally equivalent imprecise homogeneous beliefs. We can also investigate new phenomena related to ambiguity. For instance, agents who express certainty receive extra weight (a cognitive effect related to ambiguity-generated insensitivity) and generate extra preference value (source preference; a motivational effect related to ambiguity aversion). Hence, incentive compatible belief elicitations that prevent manipulation are especially warranted when agents express certainty. For multiple prior theories of ambiguity, our findings imply that the same prior probabilities can be treated differently in different contexts, suggesting an interest in corresponding generalizations.


Introduction
People's beliefs are mostly shaped by what they learn from other people. For many important decisions, advice from others, preferably experts, is aggregated into a final judgment. Hence, a rich literature has developed on belief aggregation (Clemen & Winkler 1999; Cooke 1991; Dietrich 2010). This literature has mostly used Savage's (1954) subjective probabilities to quantify degrees of belief, implemented in his Bayesian (expected utility) model for decision making.
Belief aggregation typically concerns events with unknown probabilities. Such events, commonly called ambiguous, are known to generate non-Bayesian behavior (Ellsberg 1961; Keynes 1921; Knight 1921). Our paper will show that such deviations from Bayesianism are relevant for belief aggregation. We thus contribute to recent literature using ambiguity models rather than Bayesian models to analyze belief aggregation (Baraldi & Zio 2010; Gajdos et al. 2008; Zimper and Ludwig 2009; Teper 2010). These recent papers added decision models to earlier studies that investigated the aggregation of imprecise probabilities in statistics, fuzzy set theory, and artificial intelligence (Nau 2002 and its references). Whereas these recent papers and their predecessors were theoretical, our contribution will be empirical. By recognizing the empirical violations of Bayesianism, we obtain results for belief aggregation that are empirically more valid than those obtained before in Bayesian analyses. We can identify and isolate the relevant factors and their effects more reliably.
This paper will use Abdellaoui et al.'s (2011) source method to analyze ambiguity. This method is based on axiomatized decision models (Gilboa 1987; Gilboa & Schmeidler 1989; Schmeidler 1989; Tversky & Kahneman 1992), and its tradeoff between parsimony and fit suits our purposes well. In particular, we will use Abdellaoui et al.'s indexes of pessimism and insensitivity, and will adapt them to our direct measurements of ambiguity. As explained by these authors, pessimism (ambiguity aversion in our case) is a motivational component, related to a general disliking or liking of ambiguity. Insensitivity (in our case ambiguity-generated, referred to as a-insensitivity) is another, cognitive, component (Kunreuther, Novemsky, and Kahneman 2001) prior to any preference and orthogonal to the aversion/seeking component. It reflects a lack of understanding of uncertainty and is needed, besides ambiguity aversion, to explain the ambiguity attitudes that are found empirically. It explains, for instance, that people take uncertainty too much as fifty-fifty, and do not sufficiently discriminate between different levels of likelihood. A-insensitivity is the extension to ambiguity of the well-known inverse-S shaped probability weighting. The two components, ambiguity aversion and a-insensitivity, depend on the source of uncertainty considered, and can, for example, be different when the source of uncertainty concerns domestic stocks or foreign stocks.
For the sake of clarity, our paper will study the simplest possible situations of belief aggregation, where there is only one event to be judged by a decision maker and there are only two agents (we will use this term henceforth) whose judgments are aggregated by the decision maker. We also assume that there is no interaction between the agents themselves, or between the agents and the decision maker, so that no group process is involved.
We will investigate how decision makers aggregate belief judgments for three sources of uncertainty. The first source serves as a control treatment. Here both agents are Bayesian and agree with each other (and everyone else). This is the common case of generally accepted objective probabilities, with no ambiguity involved. We call this source risk.
For the second source of uncertainty, each agent alone fully satisfies Bayesianism, with a precise probability judgment. However, the two agents give different judgments, generating ambiguity for the decision maker aggregating their beliefs. 1 This source of uncertainty, which is characterized by between-agent ambiguity (heterogeneous beliefs), is called conflict (C-)ambiguity in this paper.
The third source of uncertainty is characterized by within-agent ambiguity and relates to the situation where each agent gives an imprecise probability judgment. This situation is closest to ambiguity as mostly studied in the literature. 2 In this paper, this third source of uncertainty is called imprecision (I-)ambiguity. To keep our analysis as simple as possible, we assume homogeneous beliefs in the I-ambiguity case; i.e., the two agents agree. Smithson (1999) and Cabantous (2007) found differences between the second and third sources of uncertainty (conflict versus imprecision) in experiments, but Cabantous et al. (2011) found no clear differences. Our paper reconsiders the case using ambiguity theories.
1 Studies with varying degrees of conflicting information include Budescu et al. (2003), Cameron (2005), Dean and Shepherd (2007), Einhorn and Hogarth (1985), Kopylov (2008), Kunreuther et al. (1995), Lobo & Yao (2010), and Smithson (1999). Viscusi and Chesson (1999) and Viscusi (1997) showed how conflicting information from different sources is especially prone to generating irrational behavior.
Our experiment concerns loss outcomes. Risk and ambiguity attitudes for losses are subject to debate and, hence, their study is of special interest. Classical economics assumes universal risk aversion, but Kahneman and Tversky's (1979) prospect theory argued for risk seeking for losses (reviewed by Wakker 2010 p. 264). Most theoretical studies assume universal aversion to ambiguity, but most empirical studies find prevailing ambiguity seeking for losses (Viscusi and Chesson 1999;Wakker 2010 p. 354). Although losses and gains are equally important for applications, academic studies have focused almost exclusively on gains. This paper focuses on risk and ambiguity for losses.
In two experiments, we measure certainty equivalents of risky prospects and find the usual violations of expected utility, with inverse-S shaped probability weighting. This finding underscores the desirability of using nonexpected utility in our descriptive analysis. We then measure matching probabilities (objective-probability gambles equivalent to gambles on conflict or imprecision ambiguity).
For I-ambiguity, which is close to the usual form of ambiguity, we find the common amplification of the overweighting of extreme events, reflecting increased insensitivity. For C-ambiguity, if analyzed in the usual way (taking midpoints of probability intervals), we find the opposite, with reduced insensitivity. We do not interpret the latter finding as a violation of common views on ambiguity, but rather as evidence against taking probability midpoints: In C-ambiguity, experts expressing certainty are believed more than experts expressing doubts. This interpretation is supported by direct measurements of belief that were added in the second experiment.

Prospects and their evaluation
The preferences of a decision maker concern prospects. A prospect yields outcome x or y, where it is uncertain which of the two will result. We assume x ≤ y ≤ 0 throughout. Hence, outcomes are losses (if negative) or 0. We assume that two agents have given their judgment on the likelihood of the outcomes, and we consider the following three situations, formally referred to as sources (of uncertainty).
• Risk: x p y denotes a prospect yielding x with known objective probability p and y with probability 1−p. Everyone agrees about the probabilities, including the two agents.
• Imprecision (I) ambiguity: The agents are not able to give a precise probability judgment, and they only indicate a probability interval. We assume that they give the same interval [ℓ,h]. x [ℓ,h] y denotes the resulting prospect. We assume 0 ≤ ℓ ≤ h ≤ 1 throughout.
• Conflict (C) ambiguity: x {ℓ,h} y denotes the prospect with no known probabilities available. Both agents give a precise probability judgment, but the two judgments are different. One agent judges the probability to be ℓ, whereas the other judges it to be h. We again assume 0 ≤ ℓ ≤ h ≤ 1 throughout.
Our study only uses prospects that yield no more than two outcomes, both nonpositive. Virtually all decision models existing today agree on this domain. 3 They all amount to the following evaluation, which we call binary rank-dependent utility. When choosing between prospects, the one with the highest evaluation is preferred.
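$$x_p y \mapsto w(p)\,U(x) + (1 - w(p))\,U(y); \qquad (1.1)$$
$$x_{[\ell,h]} y \mapsto W[\ell,h]\,U(x) + (1 - W[\ell,h])\,U(y); \qquad (1.2)$$
$$x_{\{\ell,h\}} y \mapsto W\{\ell,h\}\,U(x) + (1 - W\{\ell,h\})\,U(y). \qquad (1.3)$$
Here U is the utility function, w is the probability weighting function for risk, and W is the weighting function for the ambiguous sources.
In our experiments, we measure certainty equivalents of risky prospects. The certainty equivalent (CE) of a prospect is the sure amount that is equally preferred (indifferent, denoted ~) to the prospect. That is, U(CE) is equal to the above evaluation of the prospect. We compare different sources of uncertainty. For example, we define the matching probability of [ℓ,h] as the probability r such that x [ℓ,h] y ~ x r y for all x,y, and the matching probability of {ℓ,h} as the probability r such that x {ℓ,h} y ~ x r y for all x,y. A matching probability always exists and is unique.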
Because an indifference for any one pair x < y implies the same indifference for all such x,y, we can use any such pair to find matching probabilities.
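The binary rank-dependent evaluation and the definition of the certainty equivalent can be illustrated with a minimal sketch. The function names and the linear utility in the example are assumptions made only for illustration; they are not the specification estimated in the experiments.

```python
from typing import Callable


def binary_rdu(x: float, y: float, weight_on_x: float,
               utility: Callable[[float], float]) -> float:
    """Evaluate a prospect yielding x with decision weight weight_on_x, else y."""
    if not (x <= y <= 0):
        raise ValueError("expected nonpositive outcomes with x <= y <= 0")
    if not (0.0 <= weight_on_x <= 1.0):
        raise ValueError("decision weight must lie in [0, 1]")
    return weight_on_x * utility(x) + (1.0 - weight_on_x) * utility(y)


def certainty_equivalent(x: float, y: float, weight_on_x: float,
                         utility: Callable[[float], float],
                         inverse_utility: Callable[[float], float]) -> float:
    """Sure amount whose utility equals the evaluation of the prospect."""
    return inverse_utility(binary_rdu(x, y, weight_on_x, utility))


# Illustration only (assumed values): linear utility, weight 0.1 on losing 1000.
identity = lambda v: v
example_ce = certainty_equivalent(-1000.0, 0.0, 0.1, identity, identity)
```

The decision weight is passed in directly, so the same function covers risk (weight w(p)) and the two ambiguous sources (weight W of the interval or of the pair).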

Properties of probability weighting functions (risk attitude)
Figure 1 depicts some possible properties of weighting functions w(p) for risk. We will later consider similar properties for other functions (matching probabilities). Fig. 1a depicts overweighting of losses, implying pessimism and enhancing risk aversion. Figure 1c shows an inverse-S shape that combines optimism and pessimism, with overweighting of small probabilities and underweighting of large probabilities. All weights are then moved towards 0.5, suggesting a lack of sensitivity and of discriminatory power. It is a move in the direction of taking everything as fifty-fifty.

INSERT FIGURE 1 ABOUT HERE
Although there have not yet been many empirical studies into risk attitudes for losses, the prevailing shape so far has been the one in Fig. 1d (Wakker 2010 §9.5). It is the combination of some optimism and insensitivity. It is similar to the prevailing shape for gains (where the low elevation reflects pessimism rather than optimism), but is closer to linearity. The underweighting of high probabilities of worst outcomes implies, by complementarity, that small probabilities of good outcomes are overweighted. In what follows, the expression that extreme and rare events are overweighted refers to both these phenomena. For the purposes of this paper we need not define the aforementioned properties formally, because we can use graphs to illustrate them. Formal definitions are in Wakker (2010 Chs. 6 and 7).

Properties of matching probabilities (ambiguity attitude)
We now turn to ambiguity and events E with unknown probabilities. Properties similar to the ones explained for w(p) can be defined for general weighting functions W(E), even though we cannot draw graphs for W (Wakker 2010 Ch. 10). For the special case studied in this paper, graphs can still be devised and used, as will be done in what follows.
We will consider events related to probability intervals and probability pairs, with
$$W[p-r,\, p+r] = w_i(p) \qquad (1.4)$$
and
$$W\{p-r,\, p+r\} = w_c(p). \qquad (1.5)$$
Here w_i is the imprecision weighting function and w_c is the conflict weighting function. In the literature, it is common to relate [p−r, p+r] and {p−r, p+r} to a degree of belief equal to the midpoint probability p, 4 and to interpret r as a measure of ambiguity. We take this approach as our working hypothesis, we test its plausibility, and we will later discuss deviations.
Weighting functions W can conveniently be summarized in terms of w, the weighting function for risk, and matching probabilities m, because, by the evaluations above,
$$W = w(m). \qquad (1.6)$$
For later purposes, we rewrite it as
$$m = w^{-1}(W). \qquad (1.7)$$
Thus, the general attitude towards uncertainty consists of the risk attitude comprised by w(p), and added on top of that, the ambiguity attitude comprised by the matching probability function m. Ambiguity is the difference between uncertainty and risk, and is thus captured by m. Before ambiguity theories became popular, matching probabilities were widely used in expected utility to measure subjective probabilities (Arrow 1951, Footnote 4; Holt 2007 §30.5; Raiffa 1968; Winkler 1972 p. 272).
Given a fixed r, we can define matching probabilities m_i(p) and m_c(p) as the matching probabilities for I- and C-ambiguity (details follow below). They now are maps on the unit interval, and graphs can be drawn as before to depict their properties. We have
$$W[p-r,\, p+r] = w_i(p) = w(m_i(p)); \qquad (1.8)$$
$$W\{p-r,\, p+r\} = w_c(p) = w(m_c(p)); \qquad (1.9)$$
$$m_i(p) = w^{-1}(w_i(p)) \quad \text{and} \quad m_c(p) = w^{-1}(w_c(p)). \qquad (1.10)$$
Pessimism of the matching-probability function m reflects higher pessimism for uncertainty than for risk, i.e., ambiguity aversion. Insensitivity of m similarly reflects higher insensitivity for uncertainty than for risk, which we call a(mbiguity-generated) insensitivity. In other words, the effects of ambiguity (matching probabilities) reinforce those of risk.
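The relation between the weight of an ambiguous event and its matching probability can be spelled out in a short derivation. This is a sketch under the binary rank-dependent evaluation, assuming strictly increasing U and w; the notation x_E y for a prospect yielding x under an ambiguous event E (and y otherwise) is introduced here only for illustration.

```latex
\begin{align*}
x_E y \sim x_r y
  &\iff W(E)\,U(x) + (1 - W(E))\,U(y) = w(r)\,U(x) + (1 - w(r))\,U(y) \\
  &\iff \bigl(W(E) - w(r)\bigr)\,\bigl(U(x) - U(y)\bigr) = 0 \\
  &\iff W(E) = w(r) \qquad \text{(because } U(x) \neq U(y) \text{ for } x < y\text{)} \\
  &\iff r = w^{-1}\bigl(W(E)\bigr).
\end{align*}
```

The outcomes x and y drop out in the third step, which is why an indifference found for one pair x < y determines the matching probability for every such pair.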

Indexes of aversion and insensitivity towards ambiguity
This paper quantitatively analyses ambiguity attitudes through matching probabilities, using Abdellaoui et al.'s (2011) indexes of pessimism and insensitivity. We modify these indexes regarding two aspects of our study. First, we deal with losses. Consequently, overweighting captures pessimism, and not optimism as for gains. We therefore multiply the original pessimism index by −1 so that it still corresponds with pessimism. Second, we consider these indexes for matching probabilities (as functions of midpoint probabilities) rather than for regular weighting functions. This means that the risk component w has been removed. Thus, the index of pessimism represents the extra pessimism generated by ambiguity on top of the pessimism for risk. That is, it reflects ambiguity aversion. Similarly, the index of a-insensitivity captures the extra insensitivity generated by ambiguity.
To compute the indexes, we use linear regression to find linear functions
$$c + sp \quad \text{(truncated at values 0 and 1)} \qquad (1.11)$$
that best fit the (data points observed regarding the) matching probabilities. We emphasize that this regression line should not be interpreted as a statistical estimation. It only serves to recode data using mathematical calculations. Thus, we choose c and s in Eq. 1.11 to minimize a squared distance without any reference to an underlying statistical model. Similarly, the linear regression should not be interpreted as any commitment to the neo-additive weighting functions of Eq. 1.11. It can be applied to any weighting function chosen by a researcher, also if not neo-additive, and to any set of data points, so as to obtain indexes of ambiguity attitude.
Abdellaoui et al. (2011) considered the indexes for source functions, which transform additive subjective probabilities into decision weights and capture all deviations from expected utility. Ambiguity attitudes, reflecting differences between known and unknown probabilities, could then be derived from differences between source functions for unknown probabilities and those for known probabilities. We consider the indexes for matching probabilities. As Eqs. 1.6-1.10 show, the risk component has then been removed and, hence, our indexes directly reflect ambiguity.
It is a common working hypothesis in studies of probability intervals that matching probabilities equal to the midpoints of those intervals reflect ambiguity neutrality, as in Fig. 2e.
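The recoding of matching probabilities into two indexes can be sketched as follows. The index formulas are an assumption: they follow the neo-additive indexes commonly attributed to Abdellaoui et al. (2011), insensitivity 1 − s and pessimism 1 − s − 2c, with the pessimism index multiplied by −1 for losses as described above. The function names are hypothetical, and the fit ignores the truncation at 0 and 1, which is harmless only when the fitted line stays inside the unit interval.

```python
from typing import Dict, Sequence, Tuple


def fit_line(ps: Sequence[float], ms: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept c and slope s of m = c + s * p."""
    n = len(ps)
    mean_p = sum(ps) / n
    mean_m = sum(ms) / n
    sxx = sum((p - mean_p) ** 2 for p in ps)
    sxy = sum((p - mean_p) * (m - mean_m) for p, m in zip(ps, ms))
    s = sxy / sxx
    c = mean_m - s * mean_p
    return c, s


def ambiguity_indexes(ps: Sequence[float], ms: Sequence[float]) -> Dict[str, float]:
    """Assumed index definitions (losses): aversion = 2c + s - 1, insensitivity = 1 - s."""
    c, s = fit_line(ps, ms)
    return {"aversion": 2.0 * c + s - 1.0, "insensitivity": 1.0 - s}
```

With matching probabilities equal to the midpoints, both indexes are 0, which corresponds to the ambiguity-neutral working hypothesis. A flatter line such as m = 0.1 + 0.8p gives insensitivity 0.2 and aversion 0.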

Experiment A: Measuring risk attitudes and ambiguity attitudes for imprecise and conflicting sources of information
This first, explorative, experiment examines risk and ambiguity attitudes for I- and C-sources.
Analysis. We measure certainty equivalents of several prospects. We first derive utility and probability weighting for risk. Then we study ambiguity attitudes by analyzing how the two ambiguous sources differ from risk. The latter is done by analyzing how their matching probabilities differ from the corresponding midpoint probabilities. In what follows, we derive matching probabilities from parametric fitting. Appendix A3 reports on the results of an alternative, parameter-free analysis, based on direct comparisons of CEs, which gives results consistent with those reported in the main text. We use power utility, and Goldstein and Einhorn's (1987) probability weighting function, to fit data for risk. These families are commonly used. Utility u is concave (u´ decreasing as the negative x increases towards 0), enhancing risk aversion, whenever β ≥ 1, and it is convex whenever β ≤ 1.
The larger δ is, the more elevated the probability weighting curve is, generating more pessimism. The larger γ is, the more sensitive the curve is. If we calculate the pessimism and insensitivity indexes for the probability weighting functions for risk considered here 5 , then δ will be closely related to the pessimism index and γ will be closely related to the insensitivity index.
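The two parametric families can be written down in a minimal sketch. The functional forms below are the standard textbook versions (power utility for losses and the two-parameter Goldstein and Einhorn form) and are an assumption here, since the explicit equations are not reproduced in this section. The parameter names β, δ, and γ follow the text.

```python
def power_utility(x: float, beta: float) -> float:
    """Assumed power utility for losses: U(x) = -(-x)**beta for x <= 0."""
    if x > 0:
        raise ValueError("only nonpositive outcomes are considered")
    return -((-x) ** beta)


def goldstein_einhorn(p: float, delta: float, gamma: float) -> float:
    """Assumed form: w(p) = delta * p**gamma / (delta * p**gamma + (1 - p)**gamma)."""
    if not (0.0 <= p <= 1.0):
        raise ValueError("p must be a probability")
    numerator = delta * p ** gamma
    return numerator / (numerator + (1.0 - p) ** gamma)
```

Under these forms, delta = gamma = 1 gives w(p) = p, a larger delta raises the whole curve, and gamma below 1 produces the inverse-S shape (for example, w(0.1) > 0.1 and w(0.9) < 0.9 when delta = 1 and gamma = 0.5). With beta above 1, utility is concave on the loss domain, in line with the statement in the text.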
In both experiments, we used a standard nonlinear least squares regression (Levenberg-Marquardt algorithm) to simultaneously obtain the estimates of the utility and probability weighting parameters. Throughout this paper, t-tests are two-sided unless stated otherwise.
Subjects. N = 61 post-graduate students (60 male, median age = 22) in civil engineering at Arts et Métiers ParisTech, Paris, France 6 were invited by email. None of them had participated in an experiment on decision making before.
Stimuli. We measured the certainty equivalents of the 20 prospects in Table 1.
Throughout this paper, the probability spread r is 0.1. For each source, we considered five different (midpoint) probability levels p, namely 0.1, 0.3, 0.5, 0.7, and 0.9. For example, prospect number A11 (−1000 [0,0.2] 0) was the second one presented to the subjects, as indicated by rank 2 in the table.

INSERT TABLE 1 ABOUT HERE
Incentives. The subjects were given a fixed €10 participation fee. Our use of hypothetical choice rather than real incentives is discussed in §4.
5 Here they do not reflect ambiguity attitudes but risk attitudes.
6 The instructions and the presentations of the prospects in the experiment were accordingly written in French.
Procedure. The prospects were presented to the subjects on a computer screen in a fixed random order (see the column Rank in Table 1). The sure loss was always displayed on the right-hand side of the screen and the other prospect was shown on the left-hand side. The subjects had to choose between these two options. The following text was given to the subjects to describe the risky prospects, where the agents were called experts: "The two experts have exactly the same best estimate of the risk. Each expert confidently estimates that there is a (100p)% risk of losing €x (otherwise, the loss is €y)" (screenshot A in Figure A.1 in the appendix). We substituted the appropriate numerical values for p. Screenshots of typical choice tasks are available in the appendix.
The explanation for C-ambiguity was as follows: "The two experts do not agree on the risk. They have different best estimates: Expert A confidently estimates that there is a 100(p−0.1)% risk of losing €1000 (otherwise, the loss is €0). Expert B confidently estimates that there is a 100(p+0.1)% risk of losing €1000 (otherwise, the loss is €0)" (screenshot B in Figure A.1). We substituted the appropriate numerical values for p + 0.1 and p − 0.1. In addition, we displayed two different pie graphs, one for each expert's prediction, to visually make clear that the two experts did not have the same estimate of the probability of the loss.
The explanation for I-ambiguity was as follows: "The two experts have exactly the same best estimate of the risk. Each expert confidently estimates that the risk of losing €x ranges from 100(p−0.1)% to 100(p+0.1)% (otherwise, the loss is €0)." A dynamic pie was shown on the screen to convey the imprecision of the forecast, with the size of the sectors of the pie chart slowly changing between the two bounds of the interval.
Measuring certainty equivalents. For each of the 20 prospects, the subjects were asked to make approximately 5 binary choices between the prospect and a sure loss in a bisection procedure. In this procedure, the midpoint between the best sure loss less preferred and the worst sure loss more preferred than the prospect is taken as the CE of the prospect. Details are in the appendix.
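The bisection procedure can be sketched as below. The exact protocol is in the appendix, so this is only an illustration under stated assumptions: the first sure loss offered is supplied by the caller (for instance the expected value), each later offer is the midpoint of the current bounds, and the hypothetical callback prefers_sure stands in for the subject's choice.

```python
from typing import Callable


def bisection_ce(worst: float, best: float, first_offer: float,
                 prefers_sure: Callable[[float], bool],
                 n_choices: int = 5) -> float:
    """Narrow down the certainty equivalent with n_choices binary choices."""
    lower, upper = worst, best  # the CE is bracketed by [lower, upper]
    offer = first_offer
    for _ in range(n_choices):
        if prefers_sure(offer):
            upper = offer  # sure loss preferred to the prospect: CE is below it
        else:
            lower = offer  # prospect preferred to the sure loss: CE is above it
        offer = (lower + upper) / 2.0
    # Midpoint between the best sure loss less preferred than the prospect
    # and the worst sure loss more preferred than the prospect.
    return offer


# Simulated subject (assumption): chooses as if the true CE were -146.
simulated_ce = bisection_ce(-1000.0, 0.0, -100.0, lambda sure: sure > -146.0)
```

With five choices on a €1000 range, the final bracket in this simulated run is [−156.25, −100], so the reported CE is its midpoint, −128.125. More choices would tighten the bracket.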
Checking consistency. At the end of the experiment, the subjects were asked to give their preferences between 6 prospects (A1, A3, A16, A18, A11, and A13) and their expected value a second time (the first time was as the first preference question in bisection). Table 2 gives the consistency rates for the six questions presented twice. The consistency rates vary between 69% and 89%, with an average of 77.32%. In other words, approximately three-quarters of the subjects gave the same answer the second time. This rate agrees with common findings in the field (Abdellaoui 2000;Camerer 1989). Table 3 displays the results from data fitting for risk.
Details are in the appendix.

Ambiguity attitudes from parametric fitting using matching probabilities
Having estimated the utility function U, weights W(E) can be obtained from the certainty equivalents, because U(CE) = W(E)U(x) + (1−W(E))U(y). With all weighting functions available, we obtain matching probabilities through Eq. 1.10. Figure 3 reports their mean values and 95% confidence intervals. This figure provides a graphical illustration of I- and C-ambiguity attitudes. The figures are similar to Figs. 2a and 2f, but are closer to linearity.
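The two steps described here, from a certainty equivalent to a weight and from a weight to a matching probability, can be sketched as follows. The closed-form inverse assumes the standard two-parameter Goldstein and Einhorn form w(p) = δp^γ / (δp^γ + (1−p)^γ). The function names and the numbers in the example are hypothetical and are not estimates from the experiments.

```python
from typing import Callable


def weight_from_ce(ce: float, x: float, y: float,
                   utility: Callable[[float], float]) -> float:
    """Solve U(CE) = W * U(x) + (1 - W) * U(y) for the weight W on x."""
    ux, uy = utility(x), utility(y)
    if ux == uy:
        raise ValueError("outcomes must have different utilities")
    return (utility(ce) - uy) / (ux - uy)


def inverse_goldstein_einhorn(weight: float, delta: float, gamma: float) -> float:
    """Probability p with w(p) = weight under the assumed weighting form."""
    if weight <= 0.0:
        return 0.0
    if weight >= 1.0:
        return 1.0
    # w / (1 - w) = delta * (p / (1 - p))**gamma, solved for the odds p / (1 - p).
    odds = (weight / ((1.0 - weight) * delta)) ** (1.0 / gamma)
    return odds / (1.0 + odds)


def matching_probability(ce: float, x: float, y: float,
                         utility: Callable[[float], float],
                         delta: float, gamma: float) -> float:
    """m = w^{-1}(W), with W recovered from the certainty equivalent."""
    return inverse_goldstein_einhorn(weight_from_ce(ce, x, y, utility), delta, gamma)


# Hypothetical numbers: linear utility and linear weighting, CE of -200 for a loss of 1000.
example_m = matching_probability(-200.0, -1000.0, 0.0, lambda v: v, 1.0, 1.0)
```

In this linear special case the matching probability is simply CE/x = 0.2. With estimated β, δ, and γ, the same two steps remove the risk component and leave the ambiguity attitude.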

INSERT FIGURE 3 ABOUT HERE
The ambiguity aversion index for I is positive (mean = 0.06, p < 0.01, t_60 = 2.94).
These differences are influenced by the probability level (p < 0.01, F_205.17 = 9.24). Using paired t-tests adjusted with the Bonferroni correction, we find that m_i(0.1) > m_c(0.1) (p < 0.01, t_60 = −6.71) and m_i(0.9) < m_c(0.9) (p < 0.01, t_60 = 3.50). Other differences are not significant. To illustrate the size of the differences found, consider a loss of €1000 with a midpoint probability p = 0.10. The average CE is €102 for conflict-ambiguity, €146 for risk, and €221 for imprecision-ambiguity. Such small-probability losses are relevant for insurance, where the different sources generate big differences in insurance premiums.

Summary and discussion of results of Experiment A
Risk attitudes. Our results for probability weighting under risk agree with the prevailing findings in the literature. We find an inverse-S shape with overweighting of small probabilities and underweighting of moderate and large probabilities. The latter enhances optimism and risk seeking for losses. We find weakly concave utility.
Many papers have found that utility for losses is close to linear and preferences are close to risk neutrality (reviewed by Wakker 2010 p. 264). Abdellaoui, Bleichrodt, and L'Haridon (2008) also found weakly concave utility in combination with weakly prevailing risk seeking for losses. Viscusi, Phillips, & Kroll (2011) showed that observations of other people's choices do influence own decisions even when the own risks are fully known, so that the choices of the other people are not informative. This effect can play a role in our experiment if the experts' information is taken as reflecting the experts' decisions.
However, this effect is intrinsic in belief aggregation, and we do not consider it to be a distortion.
Evidence against the Bayesian model. As just explained, the probability weighting functions for risk deviate significantly from the Bayesian identity function w(p) = p, falsifying expected utility. The deviations of the curves in Figure 3 by themselves could be accommodated by the Bayesian model, by assuming that (Bayesian) subjective probabilities deviate from the midpoints of the intervals.
However, these deviations are too pronounced at the extremes, especially for I-ambiguity, to be plausible. Experiment B will provide further evidence. Viscusi 1989; Weber 1994 pp. 237-238). For losses, we are aware of only one study examining a-insensitivity (Abdellaoui, Vossmann, & Weber 2005); they confirmed it. It leads, for instance, to lower insurance premiums rather than the higher ones predicted by the universal ambiguity aversion often assumed in theoretical studies.

I-ambiguity aversion and
The findings for C-ambiguity regarding insensitivity do not agree with the common finding of a-insensitivity when analyzed in the usual way. We find less, rather than more, insensitivity than under risk. This finding is hard to reconcile with modern views on ambiguity, and suggests that background assumptions are violated.
We will explore and discuss this suggestion in more detail in Experiment B.
Explanation for over-sensitivity and less ambiguity aversion in C-ambiguity. In general, there must be a control for degrees of belief to measure ambiguity attitudes.
For example, if we find a preference for gambling on an ambiguous rather than on an unambiguous event and want to explain this finding as ambiguity aversion, then the two events must have the same degree of belief/likelihood in some sense. In situations of ambiguity as considered here, comparisons are usually made between events with the same midpoint probabilities, where the latter should provide the required control. We have followed this averaging tradition in our analysis for both I- and C-ambiguity.
For extreme events in C-ambiguity, the above belief/likelihood control may be problematic though. If the first agent assigns probability 1 to some event and the second agent assigns probability 0.8, then the first agent is apparently sure whereas the second is uncertain. It then makes sense to assign more confidence weight (Nau 2002) to the first, sure, agent's judgment than to the second, insecure, judgment. The perceived likelihood will then exceed the midpoint 0.9. Such a processing of information is perfectly sensible, and leads to the high weight assigned to such events.
It can be captured by rational Bayesian decision models, irrespective of any ambiguity. Our finding then implies that agents who express certainty receive extra weight in conflicting-belief aggregation. This is consistent with findings by Budescu & Yu (2006, 2007), Yates et al. (1996), and Keren and Teigen (2001). It generates an effect counter to a-insensitivity, and, if it is not recognized, it may seem that neutrality or even oversensitivity was found, as happened in our experiment. happen. It will enhance S-shaped judged probabilities (as functions of midpoint probabilities) for C-ambiguity. If these conjectures are correct, then direct probability judgments can support them.

Experimental method
In most respects, this experiment is like Experiment A. We focus on the differences in what follows.
Protocol. N = 63 bachelor and master students (36 male, median age = 20.5, 40 Dutch) at Erasmus University, Rotterdam, the Netherlands participated, taken from an email list of students willing to participate in experiments on decision making. They were guaranteed a €15 flat participation fee. The experiment was conducted in six sessions of 10 or 11 subjects. 7 We first asked the subjects to answer binary choice questions, as in Experiment A, using the same software. We elicited CEs of the 18 prospects displayed in Table 4.
The order of the prospects was randomized for each subject.
Unlike in Experiment A, we also asked the subjects to give their judged beliefs for the prospects B13-B18. We randomly selected one prospect to check the stability of the answers. Two subjects gave erratic judged beliefs, showing lack of understanding, and were removed from the sample.

INSERT TABLE 4 ABOUT HERE
Stimuli. To elicit judged beliefs, we presented the subjects with a figure displaying a prospect (I-ambiguity or C-ambiguity) on the left-hand side of the screen.
On the right-hand side, there was an input box where subjects could type their best estimate (on a 0-1 scale) of the probability of losing €1000. As soon as the best estimate was entered, a pie appeared to visually represent the probability. Figure A.2 in the appendix displays an example of a screenshot.
Checking consistency. Unlike in Experiment A, in Experiment B we did not measure consistency by asking the subjects to repeat choices between some prospects and their expected value. Instead, we randomly selected two prospects and asked the subjects to go through the whole CE elicitation process again. As explained before, we also repeated the elicitation of one judged belief per subject.

Consistency checks
We find no difference between the two CEs elicited for consistency checks (p = 0.18, t_121 = −1.36). The repeated judged beliefs did not differ either (p = 0.49, t_60 = 0.69).
7 In this second experiment, unlike Experiment A, subjects were not individually interviewed by the experimenter. They could ask questions as often as needed. Instructions were in English, as is often done in the Netherlands where many students are non-Dutch.

Risk attitudes from parametric fitting
We estimated the parameters of the utility function and the probability weighting function (Table 5) using the twelve risky prospects B1-B12 in Table 4. The estimate of β exceeds 1 (p < 0.01, t_60 = 4.36), indicating concave utility. This result is consistent with Experiment A.

INSERT TABLE 5 ABOUT HERE
As in Experiment A, the mean value of δ is smaller than 1 (p = 0.02, t_60 = −2.36), enhancing optimism. Unlike in Experiment A, the mean value of γ is not different from 1 here (t-test: p = 0.24, t_60 = 1.18). Experiment B's probability weighting function therefore is closer to linear, which is not uncommon for losses.

Ambiguity attitudes from parametric fitting using matching probabilities
We derived matching probabilities as in Experiment A. For I-ambiguity, the a-insensitivity index is positive (mean = 0.12, p < 0.01, t_60 = 3.64) and the ambiguity aversion index is positive but only marginally significant (mean = 0.03, p = 0.08, t_60 = 1.77). I-ambiguity therefore exhibits the same pattern as in Experiment A (i.e., Figure 4 is similar to Figs. 2a/2d). For C-ambiguity, neither index differs from 0 (i.e., Fig. 2e). The I-index of a-insensitivity exceeds the C-index (p < 0.01, t_60 = 3.31), but the I-index of ambiguity aversion is not larger (p = 0.28, t_60 = 1.09).

For I-ambiguity, we find overestimation for midpoint probability 0.1 (mean = 0.16, p < 0.01, t_60 = 3.30), and we find underestimation for midpoint probability 0.9 (mean = 0.86, p = 0.01, t_60 = −2.57). The matching probabilities for C-ambiguity do not deviate from the midpoint probabilities 0.1 and 0.9. They are only somewhat higher for midpoint probability 0.5 (mean = 0.54, p < 0.01, t_60 = 2.77). As in Experiment A, C-matching probabilities do not exhibit an inverse-S shape.

INSERT FIGURE 4 ABOUT HERE
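As an illustration of how an a-insensitivity index can be obtained from matching probabilities, the following Python sketch fits a least-squares line through matching probabilities plotted against midpoint probabilities and takes one minus its slope. This definition is an assumption in the spirit of Abdellaoui et al.'s (2011) indexes and need not coincide with the exact index used here; the function name is hypothetical. The inputs are the mean I-ambiguity matching probabilities reported above for midpoint probabilities 0.1 and 0.9.

```python
def a_insensitivity_index(midpoints, matching_probs):
    """One minus the least-squares slope of matching probabilities on midpoints.

    Assumed definition (hypothetical helper): 0 means full sensitivity,
    positive values mean a-insensitivity, negative values over-sensitivity.
    """
    n = len(midpoints)
    mean_p = sum(midpoints) / n
    mean_m = sum(matching_probs) / n
    # Covariance and variance terms of the ordinary least-squares slope.
    cov = sum((p - mean_p) * (m - mean_m)
              for p, m in zip(midpoints, matching_probs))
    var = sum((p - mean_p) ** 2 for p in midpoints)
    return 1.0 - cov / var


# Mean I-ambiguity matching probabilities at midpoints 0.1 and 0.9.
index_i = a_insensitivity_index([0.1, 0.9], [0.16, 0.86])
print(round(index_i, 3))
```

Under this assumed definition the index computed from the reported means is about 0.125, close to the reported mean index of 0.12. With three symmetric midpoints (0.1, 0.5, 0.9) the least-squares slope depends only on the two extreme matching probabilities, so adding the value at 0.5 would not change the result.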
An ANOVA corrected for repeated measures and with two factors (the three probability levels and the two types of ambiguity) and their interaction reveals that, in addition to the probability level (p < 0.01, F 1.89 = 757.32), the interaction term is significant (p < 0.01, F 1.80 = 7.96). Using paired t-tests adjusted with the Bonferroni correction, we find m_i(0.1) > m_c(0.1) (p < 0.01, t 60 = -4.38) as in Experiment A.

Figure 5 displays the mean values of the I- and C-judged beliefs as a function of midpoint probabilities. It shows that the judged beliefs never differ from the midpoint probabilities for I-ambiguity. However, they exceed the midpoint probability at probability 0.9 for C-ambiguity (mean = 0.91, p < 0.01, t 60 = 2.72). By t-tests, only the a-insensitivity index for C-ambiguity is marginally significant, and it is negative (mean = -0.01, p = 0.09, t 60 = 1.75), suggesting over-sensitivity. Apart from this, perceived probabilities seem to agree well with midpoint probabilities.

INSERT FIGURE 5 ABOUT HERE
We next compare matching probabilities with judged beliefs. Matching probabilities exhibit more insensitivity. The difference is significant for I-ambiguity (mean difference = 0.11, p < 0.01, t 60 = 3.44) but not for C-ambiguity (mean difference = 0.04, p = 0.12, t 60 = 1.58). An ANOVA with two factors (the elicitation technique, i.e., matching probabilities vs. judged beliefs, and the type of ambiguity) and their interaction confirms the results of the t-tests: the main effect of elicitation technique is significant (p < 0.01, F 0.1 = 8.39), as are the source (p < 0.01, F 1 = 14.46) and the interaction term (p = 0.02, F 1 = 5.90). The same analysis on the ambiguity aversion indexes does not give significant results.

Summary of the results of Experiment B
The results of Experiment B are consistent with those of Experiment A for risk and ambiguity (matching probabilities) attitudes. Again, we find significant violations of expected utility for risk and we find the usual a-insensitivity for I-ambiguity but not for C-ambiguity. For I-ambiguity, the judged beliefs agree with midpoint probabilities, supporting the use of midpoint probabilities as levels of belief. This supports our claim in the discussion of Experiment A, under "evidence against the Bayesian model," that the weights in Figures 3 and 4 do not reflect (just) subjective probabilities, but that something else is going on: nonneutral attitudes towards ambiguity, deviating from expected utility.
The judged beliefs for C-ambiguity deviate from the midpoint probabilities. The judged belief at 0.9 exceeded 0.9, and the index of a-insensitivity was (marginally) below the neutral value 0. This extra sensitivity went against the usual a-insensitivity, and these two effects together gave the end result of no a-insensitivity for C-ambiguity.

There are extra reasons for using hypothetical choice when studying losses, which are empirically important. First, implementing real losses by making the subjects lose their own money is ethically questionable and hard to implement. The common implementation of losses, with prior endowments from which subjects pay back, has another serious drawback. Many subjects will integrate the payments and will not perceive any losses. Even if, as some studies have found, such subjects are a minority, say one-third, this minority may still generate large distortions in the experiment, large enough to be responsible for any significant effects found. One-third of the subjects misperceiving the stimuli entails too big a distortion, and is too high a price to pay for implementing real incentives. Another drawback of prior endowments is that they may generate house money effects (Thaler and Johnson 1990).
Directly measuring matched probabilities. We measured matching probabilities m(E) indirectly from decision weights, through m(E) = w^{-1}(W(E)). Matching probabilities can also be inferred directly from equivalences (x_q 0) ~ (x_E 0). Substitution of Eqs. 1.1-1.3 then gives m(E) = q. We tried such direct measurements in pilots.
Unfortunately, many subjects would routinely take q equal to the midpoint probability, not as an expression of true preference but as an easy heuristic. For this reason, and because the measurement of w and W is useful anyhow, we decided not to measure matching probabilities directly.
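The indirect route, recovering a matching probability by applying the inverse of the risky weighting function to a decision weight, can be sketched in Python as follows. The sketch assumes a two-parameter weighting function w(p) = δp^γ / (δp^γ + (1 − p)^γ); this family is a common choice for parameters named δ and γ, but its use here is an assumption, and the function names and parameter values in the example are hypothetical.

```python
def weight(p, delta, gamma):
    """Assumed risky weighting function w(p) for 0 < p < 1."""
    num = delta * p ** gamma
    return num / (num + (1.0 - p) ** gamma)


def inverse_weight(w, delta, gamma):
    """Closed-form inverse of `weight` for 0 < w < 1.

    From w / (1 - w) = delta * (p / (1 - p)) ** gamma we solve for the
    odds p / (1 - p) and convert the odds back to a probability.
    """
    odds = (w / (delta * (1.0 - w))) ** (1.0 / gamma)
    return odds / (1.0 + odds)


def matching_probability(decision_weight, delta, gamma):
    """m(E): the probability whose risky weight equals the decision weight W(E)."""
    return inverse_weight(decision_weight, delta, gamma)


# Hypothetical parameters: an event whose decision weight equals w(0.3)
# has matching probability 0.3 by construction.
delta, gamma = 0.8, 0.7
W_E = weight(0.3, delta, gamma)
m_E = matching_probability(W_E, delta, gamma)
```

Because the assumed family is linear in log-odds, the inverse is available in closed form and no numerical root finding is needed. For other weighting families a bisection search on p would serve the same purpose.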
Alternative statistics for measuring ambiguity: avoiding commitment to parametric families. We could have chosen several equivalent ways to report the results from the three sources and the comparisons between them. We first reported absolute results for risk. For uncertainty, we reported differences with risk, i.e. how uncertainty deviates from risk. Those differences are referred to as ambiguity, an important and popular topic in the literature today. Absolute results about decisions under uncertainty can now be derived indirectly, as when taking compositions of the curves in Figures 1 and 2. For example, we find some overweighting for risk, and some additional overweighting due to ambiguity for midpoint probability 0.1.
Together this means that there is considerable overweighting under uncertainty.
To analyze ambiguity, we could also have investigated differences of certainty equivalents of the risky and the uncertain prospects. This analysis is consistent with the one reported here and is in the appendix. Thus, our conclusions do not depend on the particular parametric families that we chose to fit the risk attitudes.
Formal difference with the source method of Abdellaoui et al. (2011). In Abdellaoui et al.'s (2011) source method, sources refer to different algebras, i.e. different collections of subsets (events) of the state space. We, instead, compared the same (algebra of) events under different informational circumstances. Although our sources are formally different, we can nevertheless readily use the same techniques of comparing ambiguity attitudes by comparing differences in subjective-probability weighting. Abdellaoui et al.'s (2011) analysis of source functions rather than matching probabilities was discussed in §1.
Specifying outcomes when measuring beliefs. We decided to specify outcomes in the elicitation of beliefs, to stay close to the other stimuli in the experiment. This implies that some subjects may have taken these questions as preference questions, directly measuring the matching probabilities. We explicitly asked for judged likelihoods and not for preferences, hoping to trigger enough direct belief perceptions to obtain significant effects, which we did indeed obtain. Any distortion from subjects misperceiving the belief questions as decision questions works against our findings; hence, our significance inferences are not invalidated but are conservative.
The two-stage model. Several authors have studied a two-stage decomposition W(E) = w̃(J(E)) (Eq. 4.1), where J denotes directly-judged belief and w̃ is the function carrying those judgments to decision weights ( 1999). Earlier allusions to the two-stage decomposition with directly-judged beliefs J are in Fellner (1961 p. 672) and Kahneman & Tversky (1975 pp. 14, 15). The two-stage decomposition differs from our decomposition W(E) = w(m(E)), where m denotes matched probability and w is the risky weighting function. We can study the two-stage model in Experiment B, where we measured judged probabilities. For C-ambiguity, w̃ of Eq. 4.1 then exhibits the usual inverse-S shape. This illustrates once more that our findings for C-ambiguity are due to beliefs deviating from midpoint probabilities rather than to unusual ambiguity attitudes.

Conclusion
This paper introduced modern ambiguity models into the empirical study of belief aggregation. These ambiguity models are descriptively more accurate than the common classical Bayesian models, and explain the violations of the latter that we found in belief aggregation. They allow the distinction of cognitive factors (ability to discriminate different levels of likelihood) and motivational factors (aversion to ambiguity), and the analysis of their separate effects. In between-agent uncertainty (conflict; C-ambiguity), extreme beliefs generate extra preference value (source preference) and, as regards cognitive effects, are overweighted in belief aggregation.
These phenomena do not occur in within-agent uncertainty (imprecision; I-ambiguity). The latter is closer to traditional ambiguity. For C-ambiguity, the cognitive effects entail an empirical violation of the commonly assumed averaging of beliefs.
An implication of our findings for belief aggregation is that agents may want to overstate their certainty. Hence, it is especially warranted for principals to implement incentive-compatible bonuses. An implication for modern theories of ambiguity is that identical (convex hulls of) possible priors can be treated differently by the same individual depending on the source of uncertainty. We conclude that modern ambiguity theory has allowed a more refined, and empirically more valid, analysis of belief aggregation in our paper than would have been possible using traditional theories.

Appendix. Experimental details and further results
This appendix gives experimental details and further results. We first describe details pertaining to both experiments.

A1. Experimental details for both experiments
Stimuli for eliciting certainty equivalents (CEs)
Figure A.1 displays some stimuli.

INSERT FIGURE A.1 ABOUT HERE
To simplify the subjects' task, the screenshots for risky, I and C sources of uncertainty had exactly the same structure. Risky/I/C prospects (Option 1) were systematically displayed on the left-hand side and the sure loss (Option 2) was displayed on the right-hand side of the computer screen. Whatever the source of uncertainty, x (high loss) was assigned to purple and y (small loss) to yellow. There was no time pressure. Subjects were given the time they needed and were encouraged to think carefully about the questions. The software allowed the subjects to modify their answers if they wished, by going backward.

Procedure for eliciting certainty equivalents
We developed computerized bisection software to estimate the CEs. Bisection does not require subjects to state precise indifference values. It involves choices only, and generates more reliable data than direct matching (Bostic et al. 1990; Fischer et al. 1999; Noussair, Robin, & Ruffieux 2004).
Each CE measurement started with a choice between the prospect considered and a sure loss equal to the expected value under the (midpoint) probability. A preference for the sure loss (the prospect) generated a higher (lower) sure loss in the next question. The new sure loss was the midpoint of the highest sure loss accepted up to that point (or the best outcome of the prospect if no sure loss had been accepted) and the lowest sure loss rejected up to that point (or the worst outcome of the prospect if no sure loss had been rejected).
Subjects were asked to make choices until a sure loss resulted with a precision of ±1% of the difference between the two outcomes of the prospect, and this sure loss was taken as the CE. Between three and seven choices were usually required to estimate CEs. The precision was implemented as the stopping rule of the bisection process. For instance, if Option 1 involved outcomes €0 and €1000, the program stopped when the subjects had rejected a sure loss, but had accepted another sure loss that was €20 lower. Table A.1 gives an example.

INSERT TABLE A.1 ABOUT HERE
This elicitation process prevented subjects from evaluating prospects higher (lower) than their best (worst) outcomes. It also ensured that a sure loss would not be rejected when a higher one had been already accepted (by not asking such a question). It did not preclude all violations of stochastic dominance. For example, subjects could assign a lower certainty equivalent to x_p y than to x_p' y despite p' > p.
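The bisection procedure described above can be sketched in Python as follows. This is a minimal illustration, not the software used in the experiments. Losses are represented by their magnitudes (a sure loss of 500 means losing €500), and the sketch assumes that the initial bounds are the smallest and the largest loss of the prospect, which is the reading consistent with a rejected sure loss being followed by a lower one. The function name, the callback, and the simulated subject are hypothetical.

```python
from typing import Callable, Tuple


def bisect_ce(small_loss: float, large_loss: float, first_sure_loss: float,
              accepts_sure_loss: Callable[[float], bool],
              precision: float = 0.01) -> Tuple[float, int]:
    """Return (certainty equivalent, number of choices), in loss magnitudes.

    `accepts_sure_loss(s)` is True when the subject prefers losing s for sure
    to playing the prospect. `first_sure_loss` (the expected loss under the
    midpoint probability) must lie between `small_loss` and `large_loss`.
    """
    lo, hi = small_loss, large_loss  # highest accepted / lowest rejected so far
    # Stop when accepted and rejected sure losses differ by at most 2% of the
    # outcome range, i.e. the CE is known to within +/- 1% of that range.
    tolerance = 2.0 * precision * (large_loss - small_loss)
    sure_loss = first_sure_loss
    n_choices = 0
    while hi - lo > tolerance:
        n_choices += 1
        if accepts_sure_loss(sure_loss):
            lo = sure_loss   # accepted: the next sure loss will be higher
        else:
            hi = sure_loss   # rejected: the next sure loss will be lower
        sure_loss = (lo + hi) / 2.0
    return sure_loss, n_choices


# Simulated subject who accepts any sure loss below 430 (hypothetical value),
# facing a prospect with outcomes 0 and 1000 and midpoint probability 0.5.
ce, n_choices = bisect_ce(0.0, 1000.0, 500.0, lambda s: s < 430.0)
```

For this simulated subject the procedure stops after six choices, within the range of three to seven choices mentioned above, and the resulting CE lies within €10 of the subject's indifference value. Because each new question is the midpoint of the current bounds, a sure loss can never be proposed outside the range of the prospect's outcomes.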

A2. Experimental details for Experiment B
Stimuli for eliciting judged beliefs

Consistency checks
The main text reports consistency checks.

Certainty equivalents
In Tables A.2, A.3, and A.4, each line has 61 observations and each test has 60 degrees of freedom. EV means expected value and StD means standard deviation.

Risk, uncertainty, and ambiguity attitudes directly inferred from certainty equivalents
The main text reported tests of risk and ambiguity attitudes using the parameters estimated through parametric fitting. We can also analyze risk and ambiguity attitudes by directly comparing CEs with expected values and other CEs. These tests are consistent with the results in the main text, and are reported next.

Experiment A
To test risk attitudes, we compare CEs with expected values for the risky prospects A1-A10 using t-tests. We find that people are mainly risk averse for low probabilities (A1: p = 0.02) and risk seeking for moderate and high probabilities (A4-A10: p < 0.01). Risk neutrality cannot be rejected for the other two prospects. These results are consistent with the common findings for prospect theory for risk (Wakker 2010 §9.5). For the other prospects, both risk and ambiguity attitudes play a role. We use the term uncertainty attitude for the combination of the two. We use midpoint probabilities for I and C to obtain (analogs of) expected values.
We find ambiguity aversion for I-ambiguity in the sense that CE < EV for low probabilities (A11-A12: p < 0.01) and ambiguity seeking for moderate and high probabilities (A14, A15: p < 0.01). Neutrality cannot be rejected for A13. These results confirm the prediction of prospect theory that phenomena for risk similarly occur for ambiguity, even in an amplified manner (referenced in the main text). The latter prediction is less clearly confirmed for C-ambiguity. We only find ambiguity aversion for one low probability (A17: p = 0.04). We do find ambiguity seeking for moderate and high probabilities (A19, A20: p < 0.01). Ambiguity neutrality cannot be rejected for A16 and A18. These results show again that other factors play a role for C-ambiguity.
We can directly infer ambiguity attitudes from CEs by comparing the CEs elicited for I- or C-ambiguity (A11-A15 and A16-A20) with those elicited under risk (A1-A5). Although this analysis will, obviously, not give exactly the same results and significance values as the analysis in the main text (which involves more data, since we ignore A6-A10 here, and nonlinear transformations), most results and all main phenomena described in the main text remain the same. First, an ANOVA corrected for repeated measures (with the Greenhouse-Geisser correction) with two factors (the five probability levels and the two types of ambiguity plus risk) confirms that CEs depend on the probability level (p < 0.01), which is obvious, but they also depend on the type of ambiguity or risk (p < 0.01). The interaction term is also significant (p < 0.01), confirming that the degree of ambiguity aversion depends on how likely the occurrence of the bad outcome is. These results are the same as those obtained for matching probabilities in the main text.
Second, we can use CEs to analyze overall ambiguity aversion, similarly as we

Third, pairwise comparisons of CEs at each probability level are also instructive for observing the effect of a-insensitivity. Consistent with the a-insensitivity index being negative for C-ambiguity, CEs tend to be higher under C-ambiguity than under risk at r = 0.1 (p = 0.06) and lower at r = 0.9 (p < 0.01). This indeed reveals that CEs vary more under C-ambiguity than under risk when the midpoint probability varies, suggesting a-generated over-sensitivity for C-ambiguity. However, CEs vary less under I-ambiguity than under risk when probabilities vary: they are significantly lower at 0.1 (p = 0.01) and not significantly different at 0.9 (p = 0.72). Again, this is consistent with the positive a-insensitivity index that we obtain for I-ambiguity.
Finally, we report in the main text that m_i(0.1) > m_c(0.1) (p < 0.01) and m_i(0.9) < m_c(0.9) (p < 0.01). We similarly find that the CEs are lower under I- than under C-ambiguity at 0.1 (p < 0.01) and higher under I- than under C-ambiguity at 0.9 (p = 0.04). This inversion, which appears both for matching probabilities and CEs, can be explained by more a-insensitivity under I- than under C-ambiguity (as we found using indexes for matching probabilities).

Experiment B
As in Experiment A, an ANOVA with two factors (the five probability levels and the two types of ambiguity plus risk) shows that the probability factor and the interaction term are significant (p < 0.01).
The effect of the type of ambiguity or risk (not distinguishing probability levels; i.e., studying the main effect of the first factor of the ANOVA) does not differ from one type to the others (p > 0.4 for all pairwise comparisons). This is consistent with the finding that the ambiguity aversion index is not significantly different from 0 for C-ambiguity and only marginally significant for I-ambiguity.
In the main text, we report a-insensitivity for I-ambiguity and more a-insensitivity for I-ambiguity than for C-ambiguity. This is consistent with CEs under I-ambiguity being significantly lower than CEs under risk and C-ambiguity at midpoint 0.1 (p = 0.04 and p < 0.01) and not significantly different at midpoint 0.9 (p = 1 and p = 0.4). For low likelihoods, our subjects indeed disliked the I-ambiguity prospect more than the other prospects, but this difference disappeared for higher likelihoods (and was even reversed, though not significantly, in CEs). Consequently, as for matching probabilities, the CEs at 0.1 and 0.9 tended to be closer to each other for I-ambiguity than they were for C-ambiguity or risk. This property for matching probabilities is what we defined as a-insensitivity.
In conclusion, for both Experiment A and Experiment B, all phenomena observed for matching probabilities are consistent with those derived from the CEs.
Therefore, the parametric family that we used to obtain matching probabilities does not influence the results. However, the CE analysis is more complex, for instance involving multiple, simultaneous comparisons to observe a-insensitivity. This is why we decided to report the analysis in terms of matching probability in the main text.

Risk attitude
In both Experiments A and B, the mean estimate of the parameter of the utility function significantly exceeds 1, indicating concavity of the utility function.
Although in the loss domain individuals are supposed to mostly have convex utility functions (Tversky and Kahneman, 1992), experimental studies in the loss domain have reported mixed attitudes. Some studies have reported convex utility functions, at least at the individual level, while others have reported linear (Abdellaoui, Bleichrodt and L'Haridon, 2008) and concave utility functions (Abdellaoui, Bleichrodt and Paraschiv, 2007; Fennema and Van Assen, 1998; Etchart-Vincent, 2004).
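To see why a utility parameter above 1 indicates concavity for losses, consider as a sketch the power family for losses written below. The functional form is an assumption made for illustration; β stands for the utility parameter reported in the main text.

```latex
u(x) = -(-x)^{\beta}, \quad x < 0, \qquad
u'(x) = \beta\,(-x)^{\beta-1} > 0, \qquad
u''(x) = -\beta(\beta-1)\,(-x)^{\beta-2}.
```

Under this assumed family, u''(x) < 0 for β > 1 (concave utility for losses), u''(x) > 0 for 0 < β < 1 (convex utility, as in Tversky and Kahneman, 1992), and utility is linear for β = 1.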
In both experiments, the probability weighting function exhibits some optimism. This result is consistent with other studies in the loss domain (Abdellaoui, 2000;Etchart-Vincent 2004). As regards sensitivity, the probability weighting function of Experiment A exhibits the usual inverse S-shape whereas the function is convex in Experiment B. INSERT