1 Introduction

There are many reasons we might want to take the opinions of various individuals and pool them to give the opinions of the group they constitute. They might be demographic modellers, and we wish to summarise their views for policymakers. Or they might be ice sheet modellers and we wish to pool the probabilities they assign to various future sea level scenarios in order to include these in our global climate models (Bamber & Aspinall, 2013; Bamber et al., 2019). We might be producing a textbook on the epidemiology of respiratory viruses, and we wish to present something that we might legitimately call the view of the scientific community (French, 1987, 2011). Or we might be the lead author on a scientific paper with many co-authors and we wish to ensure that the conclusions presented in the paper are genuinely those of the entire group of authors (Bright et al., 2017; Dang, 2019). Outside science, the individuals whose opinions we wish to pool might be employees of a company or institution whose collective opinion we wish to assess in order to determine liability for some harm, such as the board members of tobacco, oil, or social media companies, or the senior management of a university or a police force (Lackey, 2020). Or they might be superforecasters, renowned for the accuracy of their previous predictions of future political or sporting events, and we wish to learn what they, as a group, think about the outcome of a forthcoming election or the next World Cup (Tetlock & Gardner, 2015). And so on.

If all the individuals in the group have probabilistic opinions about the same propositions, there is a host of pooling functions we might deploy. For instance, linear pooling takes the group’s probability for a proposition to be the arithmetic mean of the probabilities that its members assign to that proposition. Or, to calculate the group’s probabilities for the possible states of the world, geometric pooling takes, for each state, the geometric mean of the probabilities that its members assign to that state, and then normalizes the results to ensure the pooled probabilities for the possible states sum to one. And so on. Each of these methods has its own desirable and undesirable features, which have been explored extensively (Genest & Zidek, 1986; Dietrich & List, 2015).

However, there are also cases where different members of the group assign probabilities to different sets of propositions, and these sets might overlap a lot, a little, or not at all. Indeed, unless the probabilities are elicited by asking the same roster of questions to each individual in the group, this is the situation we are most likely to encounter in the wild. For instance, if we glean the probabilities that academic experts assign by looking at what they report in their scholarly publications, we will find that they do not all report probabilities for the same propositions. One climate scientist might assign a probability to sea levels rising by at least 60 cm by 2100, but nothing more fine-grained, while another might assign probabilities to it rising by 60–80 cm, 80–100 cm, and more than 100 cm by that date. As they are usually formulated, most pooling functions don’t cover these cases; more precisely, they don’t tell us which credence the group assigns to a proposition to which some of its members fail to assign a credence. In this paper, I explore how we might fill that gap.

In Sect. 2, I’ll introduce the formal framework in which we’ll explore our problem. In Sects. 3–6, I’ll consider four proposals and argue that they don’t work. Some of these exist in the literature explicitly as an answer to our question; some exist as answers to different questions, but are naturally repurposed to address ours; and some simply occur to us naturally when we consider the question. Because so little has been written on this question, I begin with these four unsatisfactory proposals partly in order to clear the ground. But we will also see that, by doing so, an alternative proposal suggests itself. This is described in Sect. 8. It is designed to cover those situations in which our purpose in pooling the opinions of the individuals in the group is to assign an opinion to the group itself, considered as an agent in its own right.

2 The formal framework

Let me begin by laying out the formal framework we’ll be working within.

  • Individuals Let’s assume there are \(n \ge 2\) individuals whose opinions we wish to pool.

  • Propositions Let \(\mathcal {F}_i\) be the set of propositions to which individual i assigns subjective probabilities or degrees of belief, which we will call credences throughout. We might call \(\mathcal {F}_i\) their agenda. Let \(\mathcal {F}= \bigcup ^n_{i=1} \mathcal {F}_i\) be the union of all the individuals’ agendas. Throughout, we assume that each \(\mathcal {F}_i\) is finite, and therefore \(\mathcal {F}\) is finite too.

  • Possible states of the world Let \(\mathcal {W}\) be the set of possible worlds grained just finely enough to assign truth values to each proposition in \(\mathcal {F}\). We might represent \(\mathcal {W}\) as the set of classically consistent assignments of truth values to the propositions in \(\mathcal {F}\). Since each \(\mathcal {F}_i\) is finite and therefore \(\mathcal {F}\) is finite, \(\mathcal {W}\) is also finite. If a proposition X in \(\mathcal {F}\) is true at world w in \(\mathcal {W}\), we write \(w \models X\), and we represent X by the set \(\{w \in \mathcal {W}: w \models X\}\) of worlds at which it is true.

  • Subjective probabilities/credences Let \(P_i\) record the credences that individual i assigns to the propositions in \(\mathcal {F}_i\). We’ll call this their credence function. For X in \(\mathcal {F}_i\), \(P_i(X)\) is the credence that individual i assigns to X. It is at least 0 and at most 1. We assume that these credence functions are coherent: that is, if \(\mathcal {F}^+_i\) is the smallest Boolean algebra that includes \(\mathcal {F}_i\), then it is possible to extend \(P_i\) to a credence function \(P^+_i\) on \(\mathcal {F}^+_i\) that satisfies the probability axioms; that is, \(P^+_i\) assigns credence 1 to the tautology, 0 to the contradiction, and the credence it assigns to a disjunction of pairwise incompatible propositions is the sum of the credences it assigns to the disjuncts.

  • Pooling functions A pooling function \(\Delta \) takes a sequence of n credence functions, \(P_1, \ldots , P_n\), where \(P_i\) assigns credences to the propositions in \(\mathcal {F}_i\), and returns a credence function \(\Delta (P_1, \ldots , P_n)\), which assigns credences to the propositions in \(\mathcal {F}= \bigcup ^n_{i=1} \mathcal {F}_i\). In this definition, we don’t assume that a pooling function must give a coherent output for any sequence of coherent inputs, but this is a desirable feature and in fact nearly all the examples we consider boast it.

Existing accounts of probabilistic opinion pooling deal with the particular case in which \(\mathcal {F}_1 = \ldots = \mathcal {F}_n = \mathcal {F}\). They often also assume that \(\mathcal {F}\) is a Boolean algebra.Footnote 1 In such cases, for every world w in \(\mathcal {W}\), there is a proposition in \(\mathcal {F}\) that is true at w and only at w—these are sometimes called the atoms of the Boolean algebra \(\mathcal {F}\). We abuse notation and write w for that proposition. We can then define linear and geometric pooling as follows:

Linear pooling Suppose \(P_1, \ldots , P_n\) are defined on the same agenda \(\mathcal {F}\). Then, for each X in \(\mathcal {F}\),

$$\begin{aligned} \Delta _{\mathrm {LP}}(P_1, \ldots , P_n)(X) = \frac{1}{n}\sum ^n_{i=1} P_i(X) \end{aligned}$$

That is, the credence that the linear pool of \(P_1, \ldots , P_n\) assigns to a proposition is the arithmetic mean of the credences that each \(P_i\) assigns to it.

Geometric pooling Suppose \(P_1, \ldots , P_n\) are defined on the same agenda \(\mathcal {F}\), which is a Boolean algebra. And suppose that there is w in \(\mathcal {W}\) such that, for each \(P_i\), \(P_i(w) > 0\). Then, for each w in \(\mathcal {W}\),

$$\begin{aligned} \Delta _{\mathrm {GP}}(P_1, \ldots , P_n)(w) = \frac{\root n \of { \prod ^n_{i=1} P_i(w) }}{\sum _{w' \in \mathcal {W}} \root n \of { \prod ^n_{i=1} P_i(w') }} \end{aligned}$$

And, for X in \(\mathcal {F}\),

$$\begin{aligned} \Delta _{\mathrm {GP}}(P_1, \ldots , P_n)(X) = \sum _{w \models X} \Delta _{\mathrm {GP}}(P_1, \ldots , P_n)(w) \end{aligned}$$

That is, the credence that the geometric pool of \(P_1, \ldots , P_n\) assigns to a possible world (or, more precisely, the corresponding atom of the algebra) is the normalized geometric mean of the credences that each \(P_i\) assigns to it; and the credence it assigns to a proposition is the sum of the credences it assigns to the worlds at which the proposition is true (or, more precisely, the atoms that entail the proposition).
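The two definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: credence functions are represented as dictionaries from worlds to credences, the function names `linear_pool`, `geometric_pool`, and `credence_in` are hypothetical, and the input numbers are chosen purely for illustration.

```python
from math import prod

def linear_pool(credences):
    """Arithmetic mean, item by item; all inputs share the same keys."""
    n = len(credences)
    return {x: sum(p[x] for p in credences) / n for x in credences[0]}

def geometric_pool(credences):
    """Normalized geometric mean over worlds (the atoms of the algebra)."""
    n = len(credences)
    raw = {w: prod(p[w] for p in credences) ** (1.0 / n) for w in credences[0]}
    total = sum(raw.values())
    if total == 0:
        # The definition requires a world to which everyone gives positive credence.
        raise ValueError("no world receives positive credence from every individual")
    return {w: v / total for w, v in raw.items()}

def credence_in(pool, proposition):
    """Credence in a proposition, represented as the set of worlds where it is true."""
    return sum(pool[w] for w in proposition)

# Illustrative inputs (an assumption, not data from any study): two individuals, three worlds.
P1 = {"w1": 0.1, "w2": 0.4, "w3": 0.5}
P2 = {"w1": 0.2, "w2": 0.6, "w3": 0.2}

lp = linear_pool([P1, P2])
gp = geometric_pool([P1, P2])
print(lp)
print(gp, credence_in(gp, {"w1", "w2"}))
```

Here the linear pool is applied to worlds only for convenience; as defined above, it applies directly to any proposition in the shared agenda.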

A few things to note:

  • Since we assume throughout that each \(P_i\) is coherent, so is their linear pool and so is their geometric pool. In fact, we needn’t even assume that each \(P_i\) is coherent in order to ensure that their geometric pool is coherent, but we do in order to ensure their linear pool is.

  • Linear pooling is defined directly for each proposition in \(\mathcal {F}\); as a result, we need not assume anything about the structure of \(\mathcal {F}\).

  • Geometric pooling is defined first for the states of the world in \(\mathcal {W}\), and then for each proposition in \(\mathcal {F}\); as a result, we must assume that \(\mathcal {F}\) contains the proposition w for each w in \(\mathcal {W}\).

In this paper, we ask: how should we pool in other cases? That is, how should we pool when two individuals have different agendas; that is, when \(\mathcal {F}_i \ne \mathcal {F}_j\) for some individuals i and j?

In the following four sections, I consider different answers to this question. None of them work. I consider them partly to situate my proposal within the literature and clear the ground, but also because solving the problem that rules out the first two proposals motivates the account that I will go on to give in the remainder of the paper. The third proposal also attempts to solve that problem. It fails for a different reason, but one that is equally illuminating. Those impatient to hear the solution I propose for a particular important case can skip to Sect. 8.

3 Extending linear and geometric pooling

As we saw in the previous section, linear and geometric pooling are only defined in the special case in which \(\mathcal {F}_1 = \ldots = \mathcal {F}_n = \mathcal {F}\); moreover, geometric pooling requires that \(\mathcal {F}\) is a Boolean algebra. But perhaps we might generalize them so that they apply when \(\mathcal {F}_i \ne \mathcal {F}_j\) for some individuals i and j?

For instance, suppose \(\{X, Y, Z\}\) is a three-cell partition, and suppose the first of two individuals assigns credences to X, Y, and Z, so that \(\mathcal {F}_1 = \{X, Y, Z\}\), while the second assigns credences only to X and Y, so that \(\mathcal {F}_2 = \{X, Y\}\). Suppose their probability assignments are as follows:

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline P_1 &{} 0.1 &{} 0.4 &{} 0.5 \\ P_2 &{} 0.2 &{} 0.6 &{} - \end{array} \end{aligned}$$

Then extending linear pooling to this case and taking the arithmetic means of the credences assigned to each gives:

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline \Delta _{\mathrm {LP}'}(P_1, P_2) &{} 0.15 &{} 0.5 &{} 0.5 \end{array} \end{aligned}$$

But that’s not coherent: the credences in X, Y, and Z sum to more than 1.

On the other hand, extending geometric pooling to this case and taking the geometric mean of the probabilities assigned to X, Y, and Z, and then normalizing, gives:

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline \Delta _{\mathrm {GP}'}(P_1, P_2) &{} 0.125 &{} 0.433 &{} 0.442 \end{array} \end{aligned}$$

Obviously that is coherent, because geometric pooling requires us to normalise the geometric means; so the result will always be coherent.

Perhaps we should follow the lead of geometric pooling and do this for our extended version of linear pooling in such cases as well? So first we take the arithmetic means, and then we normalise the result. That would give:

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline \Delta _{\mathrm {LP}''}(P_1, P_2) &{} 0.1304 &{} 0.4347 &{} 0.4347 \end{array} \end{aligned}$$

Unfortunately, both normalized extended linear pooling (\(\Delta _{\mathrm {LP}''}\)) and extended geometric pooling (\(\Delta _{\mathrm {GP}'}\)) violate a principle that I take to govern judgment pooling in the cases we are considering, where the agendas of some of our individuals differ.

Extension Invariance (EI) If, for each individual i, there is a unique coherent credence function \(P^\star _i\) defined on \(\mathcal {F}= \bigcup ^n_{i=1} \mathcal {F}_i\) that extends \(P_i\), then \(\Delta (P_1, \ldots , P_n) = \Delta (P^\star _1, \ldots , P^\star _n)\).Footnote 2

The point is well illustrated by the example we’ve been considering in this section. While \(P_2\) does not assign a credence to Z, it does assign credences to X and Y and together those determine the credence it would have to assign to Z in order to remain coherent—since X, Y, Z form a partition, it must assign 0.2. Extension Invariance (EI) says that, in cases like this, where the probabilities that an individual assigns to the propositions in \(\mathcal {F}_i\) determine the probabilities they must assign to the remaining propositions in \(\mathcal {F}\), the result of pooling the original probability assignments on \(\mathcal {F}_1, \ldots , \mathcal {F}_n\) should be the same as the result of pooling the probability functions on \(\mathcal {F}\) that are obtained by filling in the gaps in the way that coherence requires. The idea is that, if the credences you have reported commit you to further credences, then adding those further credences explicitly shouldn’t change the outcome of pooling your credences with the credences of others. We will offer a partial accuracy-based justification of the principle in Sect. 7 below.

Thus, return to our case above:

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline P_1 &{} 0.1 &{} 0.4 &{} 0.5 \\ P_2 &{} 0.2 &{} 0.6 &{} - \\ P^\star _2 &{} 0.2 &{} 0.6 &{} 0.2 \end{array} \end{aligned}$$

So (EI) says that \(\Delta (P_1, P_2) = \Delta (P_1, P^\star _2)\). But notice that neither normalized extended linear pooling (\(\Delta _{\mathrm {LP}''}\)) nor extended geometric pooling (\(\Delta _{\mathrm {GP}'}\)) delivers this:

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline \Delta _{\mathrm {LP}''}(P_1, P_2) &{} 0.1304 &{} 0.4347 &{} 0.4347 \\ \Delta _{\mathrm {LP}}(P_1, P^\star _2) &{} 0.15 &{} 0.5 &{} 0.35 \\ \hline \Delta _{\mathrm {GP}'}(P_1, P_2) &{} 0.125 &{} 0.433 &{} 0.442\\ \Delta _{\mathrm {GP}}(P_1, P^\star _2) &{} 0.149 &{} 0.517 &{} 0.333 \end{array} \end{aligned}$$
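The four rows of this table can be reproduced with a short Python sketch. It assumes that the agenda is a partition, that every cell is reported by at least one individual, and that a missing credence is represented by an absent dictionary key; the function names are hypothetical.

```python
from math import prod

def normalize(c):
    """Rescale so that the values sum to one."""
    total = sum(c.values())
    return {x: v / total for x, v in c.items()}

def extended_linear(credences, cells):
    """LP': average each cell over the individuals who report a credence in it."""
    out = {}
    for x in cells:
        vals = [p[x] for p in credences if x in p]
        out[x] = sum(vals) / len(vals)
    return out

def extended_geometric(credences, cells):
    """GP': geometric mean over the individuals who report, then normalize."""
    raw = {}
    for x in cells:
        vals = [p[x] for p in credences if x in p]
        raw[x] = prod(vals) ** (1.0 / len(vals))
    return normalize(raw)

cells = ["X", "Y", "Z"]
P1 = {"X": 0.1, "Y": 0.4, "Z": 0.5}
P2 = {"X": 0.2, "Y": 0.6}                      # no credence in Z
P2_star = dict(P2, Z=1 - P2["X"] - P2["Y"])    # the unique coherent extension

lp2 = normalize(extended_linear([P1, P2], cells))   # LP''
lp_star = extended_linear([P1, P2_star], cells)     # ordinary LP on full agendas
gp1 = extended_geometric([P1, P2], cells)           # GP'
gp_star = extended_geometric([P1, P2_star], cells)  # ordinary GP on full agendas

for name, pool in [("LP''", lp2), ("LP", lp_star), ("GP'", gp1), ("GP", gp_star)]:
    print(name, {x: round(v, 4) for x, v in pool.items()})
```

When every individual reports every cell, the two extended functions reduce to ordinary linear and geometric pooling, which is why the same code serves for the rows involving \(P^\star _2\).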

(EI) will cause problems for the proposal we consider in the following section as well. But before we move on to that, there is another problem with our attempt to extend linear and geometric pooling to the case in which \(\mathcal {F}_i \ne \mathcal {F}_j\) for some individuals i and j. Suppose \(\mathcal {F}_1 = \{X \vee Y\}\) and \(\mathcal {F}_2 = \{Y \vee Z\}\), where again X, Y, and Z form a partition. And suppose \(P_1\) assigns a credence only to the proposition in \(\mathcal {F}_1\), while \(P_2\) assigns a credence only to the proposition in \(\mathcal {F}_2\). In particular,

$$\begin{aligned} \begin{array}{c|cc} &{} X \vee Y &{} Y \vee Z \\ \hline P_1 &{} 0.2 &{} - \\ P_2 &{} - &{} 0.3 \\ \end{array} \end{aligned}$$

Now, first try to apply the extended linear pooling operator, \(\Delta _{\mathrm {LP}'}\). By averaging the credences in each proposition, we get:

$$\begin{aligned} \begin{array}{c|cc} &{} X \vee Y &{} Y \vee Z \\ \hline \Delta _{\mathrm {LP}'} &{} 0.2 &{} 0.3 \\ \end{array} \end{aligned}$$

But that is incoherent: the credences in \(X \vee Y\) and \(Y \vee Z\) must sum to at least 1. So now we need to normalize. But how to do this? To normalize a credence function, we need to know the credences it assigns to the possible worlds. But in this case, we don’t know that. So \(\Delta _{\mathrm {LP}''}(P_1, P_2)\) is undefined. And of course the same fate befalls \(\Delta _{\mathrm {GP}'}(P_1, P_2)\): indeed, it can’t even get started, since it is defined initially on possible worlds, and then only at the second stage on logically weaker propositions.
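The incoherence claim at the start of the previous paragraph can be spelled out in one step. Since X, Y, and Z form a partition, any coherent credence function P on the algebra they generate satisfies

$$\begin{aligned} P(X \vee Y) + P(Y \vee Z)&= (P(X) + P(Y)) + (P(Y) + P(Z)) \\&= 1 + P(Y) \ge 1 \end{aligned}$$

whereas the averaged credences sum to only \(0.2 + 0.3 = 0.5\).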

3.1 A concern about extension invariance

You might think it is not reasonable to demand that our pooling function satisfy (EI). After all, it is easy to imagine cases in which, were an individual i to assign credences to all the propositions in \(\mathcal {F}\), rather than merely those in \(\mathcal {F}_i\), the credences they would assign to the propositions in \(\mathcal {F}_i\) would be different from the ones they actually assign. Here are two reasons this might happen. First, it might be the sort of case discussed in the literature on awareness growth, in which the individual becomes aware of a possibility they hadn’t considered before and this leads them to reevaluate their opinions about the possibilities that they had considered before (Karni & Vierø, 2013; Wenmackers & Romeijn, 2016; Bradley, 2017; Steele & Stefánsson, 2021; Mahtani, 2021). For instance, I might assign credences only to the propositions It will rain tomorrow and It will be sunny tomorrow, and assign credence 50% to each, but then come to consider a third possibility, namely, It will be misty tomorrow, and that might lead me to reduce my credence in the original two propositions in order to assign some credence to this new one. Secondly, it might be a case in which, at the nearest world in which individual i has agenda \(\mathcal {F}\) rather than \(\mathcal {F}_i\), their evidence is different. For instance, consider the example from the introduction in which one climate scientist assigns a credence only to the possibility that sea levels will rise by at least 60 cm by 2100, but nothing more fine-grained, while another assigns credences to it rising by 60–80 cm, 80–100 cm, and more than 100 cm by that date. Now, it might be that it is only climate scientists who work specifically on sea level modelling who assign credences to these more fine-grained possibilities. And it might be that such modellers have substantially different evidence from others. So, let’s suppose that, at the actual world, the first climate scientist, who assigns credences only to the coarse-grained possibility, is not a sea level modeller. And now consider the nearest possible world in which they assign credences to the more fine-grained possibilities. In that world, they are a sea level modeller and so their evidence is very different from what it is in the actual world. And that might lead them, in that world, to assign different credences to the coarse-grained possibility.

These situations are indeed possible. However, (EI) holds of the individuals in them all the same. After all, it is not justified by saying that, whenever there is, for each individual i, a unique coherent \(P^\star _i\) defined on \(\mathcal {F}\) that extends \(P_i\), this \(P^\star _i\) gives the credences that the individual i would assign were their agenda \(\mathcal {F}\) instead of \(\mathcal {F}_i\). It is not a counterfactual claim at all. Rather, as I sketched the justification above, (EI) is justified by noting that the credences that \(P_i\) assigns to the propositions in \(\mathcal {F}_i\) commit individual i to the credences that \(P^\star _i\) assigns to the propositions in \(\mathcal {F}\). So it does not say that a pooling function should give the same result whether applied to the individuals’ actual credence functions or the credence functions they would have were they all to have \(\mathcal {F}\) as their agenda; it says that a pooling function should give the same result whether applied to the individuals’ actual credence functions or to the credence functions on \(\mathcal {F}\) to which their actual credences commit them.

Notice that this justification for (EI) applies equally whether we use our pooling function to provide what Christian List (2014) calls aggregate or corporate group opinions. In List’s terminology, an aggregate collective attitude provides a summary of the attitudes of the members of the collective, while a corporate collective attitude treats the group as an agent in its own right and ascribes to that agent the attitude in question. When we determine the sort of summary that is encoded in an aggregate group opinion, we surely wish to include not only the credences that the individuals have explicitly, but also those to which they are committed by those they have explicitly. And we surely do not wish to include the opinions they would have had in some nearby possible world in which they do explicitly assign credences to these other propositions. After all, we are summarising the group’s actual opinions, not their counterfactual ones. And when we treat the group as an agent, we want to include in the supervenience basis for that group agent’s opinions not only the credences the members have explicitly, but also those to which they are committed.

We will return to (EI) below. So far, we have appealed only to its intuitive plausibility. In Sect. 7, we will compare the accuracy of the credences you obtain if you use it with the accuracy of the credences you obtain if you violate it in various ways.

4 The coherent approximation principle

In Sect. 3, we saw that it is difficult to extend linear and geometric pooling so that they apply to the problem of pooling credence functions defined on different agendas, that is, when \(\mathcal {F}_i \ne \mathcal {F}_j\) for some individuals i and j. In this section, we turn to one of the few treatments of the current problem from the literature. It is due to Daniel Osherson and Moshe Vardi (Osherson & Vardi, 2006).

In fact, Osherson and Vardi treat two problems at once. Not only do they not assume that the individuals to be pooled assign credences to the same propositions; they also do not assume that those individuals assign coherent credences. So they seek a pooling function that takes possibly incoherent credence functions over possibly different agendas and pools them into a coherent credence function on the union of the agendas. Their approach, which draws on the pioneering work of Sébastien Konieczny and Ramón Pino Pérez, is distance-based (Konieczny & Pino Pérez, 1998, 1999). That is, we begin by identifying a measure of distance from one credence to another. We then take the pool of a set of credence functions to be the coherent credence function for which the sum, over all the individuals, of the summed distances from the credences it assigns to the credences that individual assigns is minimal. Osherson and Vardi consider two such measures of distance:

Absolute deviation For credences \(0 \le p, q \le 1\),

$$\begin{aligned} \mathrm {AD}(p, q) = |p - q| \end{aligned}$$

Squared deviation For credences \(0 \le p, q \le 1\),

$$\begin{aligned} \mathrm {SD}(p, q) = |p - q|^2 \end{aligned}$$

And there are many others, including the popular Kullback-Leibler divergence:

Kullback-Leibler divergence For credences \(0 \le p\le 1\) and \(0 < q \le 1\),

$$\begin{aligned} \mathrm {KL}(p, q) = p \log \frac{p}{q} - p + q \end{aligned}$$

We say that a measure \(\mathfrak {d}\) of distance from one credence to another is a divergence if (i) \(\mathfrak {d}(p, q) \ge 0\) for all \(0 \le p, q \le 1\) and (ii) \(\mathfrak {d}(p, q) = 0\) iff \(p = q\). \(\mathrm {AD}\), \(\mathrm {SD}\), and \(\mathrm {KL}\) are all divergences. Now, given a divergence \(\mathfrak {d}\), here is Osherson and Vardi’s pooling function, where \(\mathcal {P}_\mathcal {F}\) is the set of coherent credence functions on \(\mathcal {F}= \bigcup ^n_{i=1} \mathcal {F}_i\):

Coherent Approximation Principle\(_\mathfrak {d}\) (CAP\(_\mathfrak {d}\)) For \(P_i\) defined on \(\mathcal {F}_i\),

$$\begin{aligned} \Delta ^\mathfrak {d}_\mathrm {CAP}(P_1, \ldots , P_n) = \mathop {\mathrm {arg\,inf}}\limits _{P \in \mathcal {P}_\mathcal {F}} \sum ^n_{i=1} \sum _{X \in \mathcal {F}_i} \mathfrak {d}(P(X), P_i(X)) \end{aligned}$$

That is, \(\Delta ^\mathfrak {d}_\mathrm {CAP}(P_1, \ldots , P_n)\) is the coherent credence function for which the sum of the sums of the divergences from its credences to the credences assigned by \(P_1, \ldots , P_n\) is minimal.Footnote 3

In fact, if we wish the minimizer to be unique here, we must restrict the divergences that we use. For instance, recall our example from the previous section:

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline P_1 &{} 0.1 &{} 0.4 &{} 0.5 \\ P_2 &{} 0.2 &{} 0.6 &{} - \\ P^\star _2 &{} 0.2 &{} 0.6 &{} 0.2 \end{array} \end{aligned}$$

Then, if we use the absolute deviation to measure the distance from one credence to another—that is, if \(\mathfrak {d}= \mathrm {AD}\)—then, providing \(0.1 \le P(X) \le 0.2\), \(0.4 \le P(Y) \le 0.6\), and \(0.2 \le P(Z) \le 0.5\), P minimises the average distance to \(P_1\) and \(P^\star _2\). Presumably for this reason, when Osherson writes about CAP again with different co-authors, they focus on squared deviation (Predd et al., 2008). We’ll focus on squared deviation and Kullback-Leibler divergence for the moment.Footnote 4 Here are the results of pooling \(P_1\) and \(P_2\) using \(\Delta ^\mathrm {SD}_\mathrm {CAP}\) and using \(\Delta ^\mathrm {KL}_\mathrm {CAP}\), and the results of pooling \(P_1\) and \(P^\star _2\) using \(\Delta ^\mathrm {SD}_\mathrm {CAP}\) and using \(\Delta ^\mathrm {KL}_\mathrm {CAP}\).

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P_2) &{} 0.1125 &{} 0.4625 &{} 0.425 \\ \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P^\star _2) &{}0.15 &{} 0.5 &{} 0.35 \\ \hline \Delta ^\mathrm {KL}_\mathrm {CAP}(P_1, P_2) &{} 0.13 &{} 0.45 &{} 0.42 \\ \Delta ^\mathrm {KL}_\mathrm {CAP}(P_1, P^\star _2) &{} 0.15 &{} 0.52 &{} 0.33 \\ \end{array} \end{aligned}$$

Since the first and second row differ, \(\Delta ^\mathrm {SD}_\mathrm {CAP}\) violates (EI); since the third and fourth row differ, \(\Delta ^\mathrm {KL}_\mathrm {CAP}\) violates (EI).
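The squared-deviation rows can be checked with a small Python sketch. It relies on two assumptions that hold in this example but not in general: the agenda is a partition, so coherence amounts to the credences summing to one, and the constrained minimizer has no negative entries, so the non-negativity constraints can be ignored. Under those assumptions the objective reduces, up to a constant, to \(\sum _X c_X (P(X) - m_X)^2\), where \(c_X\) is the number of individuals who report a credence in X and \(m_X\) is the mean of those reports, and a single Lagrange multiplier gives the minimizer in closed form. The function name `cap_sd_partition` is hypothetical.

```python
def cap_sd_partition(credences, cells):
    """CAP with squared deviation on a partition (closed form, see assumptions above)."""
    # c_X: how many individuals report a credence in cell X.
    counts = {x: sum(1 for p in credences if x in p) for x in cells}
    # m_X: mean of the reported credences in cell X.
    means = {x: sum(p[x] for p in credences if x in p) / counts[x] for x in cells}
    # Multiplier chosen so that the pooled credences sum to one.
    mu = (sum(means.values()) - 1.0) / sum(1.0 / counts[x] for x in cells)
    pool = {x: means[x] - mu / counts[x] for x in cells}
    if min(pool.values()) < 0:
        raise ValueError("closed form invalid here: a non-negativity constraint binds")
    return pool

cells = ["X", "Y", "Z"]
P1 = {"X": 0.1, "Y": 0.4, "Z": 0.5}
P2 = {"X": 0.2, "Y": 0.6}
P2_star = {"X": 0.2, "Y": 0.6, "Z": 0.2}

pool_partial = cap_sd_partition([P1, P2], cells)
pool_full = cap_sd_partition([P1, P2_star], cells)
print(pool_partial)
print(pool_full)
```

The two outputs differ, which is the violation of (EI) recorded in the first two rows of the table.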

Now, you might try to save the Coherent Approximation Principle in one of two ways. First, you might seek a divergence \(\mathfrak {d}\) for which \(\Delta ^\mathfrak {d}_\mathrm {CAP}\) satisfies (EI). However, the following fact shows that this is impossible:

Proposition 1

If \(\mathfrak {d}\) is differentiable in its first argument, \(\Delta ^\mathfrak {d}_\mathrm {CAP}\) violates (EI).

(The proof is given in the Appendix.)

Second, you might think that the problem arises because the single credence assigned to Z is given exactly as much weight as the two credences assigned to X and the two credences assigned to Y. But it’s easy to check that assigning twice as much weight to \(\mathfrak {d}(P(Z), P_1(Z))\) as to \(\mathfrak {d}(P(X), P_1(X))\) or \(\mathfrak {d}(P(Y), P_1(Y))\) doesn’t bring the Coherent Approximation Principle into agreement with (EI). For instance,

$$\begin{aligned} ( \mathrm {SD}(P(X), 0.1)&+ \mathrm {SD}(P(X), 0.2)) + (\mathrm {SD}(P(Y), 0.4) \\&+ \mathrm {SD}(P(Y), 0.6)) + 2 \times \mathrm {SD}(P(Z), 0.5) \end{aligned}$$

is minimized among coherent functions at \(P = (0.1, 0.45, 0.45)\), while

$$\begin{aligned}&( \mathrm {SD}(P(X), 0.1) + \mathrm {SD}(P(X), 0.2)) + \\&\quad (\mathrm {SD}(P(Y), 0.4) + \mathrm {SD}(P(Y), 0.6)) + \\&\quad (\mathrm {SD}(P(Z), 0.5) + \mathrm {SD}(P(Z), 0.2)) \end{aligned}$$

is minimized among coherent credence functions at \(P = (0.15, 0.5, 0.35)\).
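For readers who wish to check these two minimizers, here is a sketch of the calculation. Write \(x = P(X)\), \(y = P(Y)\), \(z = P(Z)\), impose \(x + y + z = 1\) with a Lagrange multiplier \(\lambda \), and ignore the non-negativity constraints, which turn out not to bind. For the first, weighted expression, the first-order conditions are

$$\begin{aligned} 4x - 0.6 = \lambda , \quad 4y - 2 = \lambda , \quad 4z - 2 = \lambda \end{aligned}$$

so that \(x = 0.15 + \lambda /4\), \(y = 0.5 + \lambda /4\), and \(z = 0.5 + \lambda /4\). Summing gives \(1.15 + 3\lambda /4 = 1\), so \(\lambda /4 = -0.05\) and \(P = (0.1, 0.45, 0.45)\). For the second expression, the first two conditions are unchanged and the third becomes \(4z - 1.4 = \lambda \), so that \(z = 0.35 + \lambda /4\). Summing now gives \(1 + 3\lambda /4 = 1\), so \(\lambda = 0\) and \(P = (0.15, 0.5, 0.35)\).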

5 Pooling the sets of coherent credence functions that extend the individuals’ credence functions

Like the Coherent Approximation Principle, the third proposal we’ll consider asks us to pool by minimizing the average distance from some representation of the individuals’ opinions. But whereas CAP represents individual i by the precise credences they explicitly assign to the propositions in \(\mathcal {F}_i\), the third proposal represents them by the imprecise credences they assign to the propositions in \(\mathcal {F}\). That is, instead of representing individual i by the single credence function \(P_i\) defined on \(\mathcal {F}_i\), we represent them by the following set of credence functions defined on \(\mathcal {F}\):

$$\begin{aligned} R_i = \{P \in \mathcal {P}_\mathcal {F}\, |\, (\forall X \in \mathcal {F}_i)[P(X) = P_i(X)]\} \end{aligned}$$

where \(\mathcal {P}_\mathcal {F}\) is the set of coherent credence functions defined on \(\mathcal {F}\), as above. So, \(R_i\) is the set of coherent extensions of \(P_i\) to \(\mathcal {F}\). And we pool \(P_1, \ldots , P_n\) by pooling \(R_1, \ldots , R_n\). And we pool \(R_1, \ldots , R_n\) by finding a credence function P that minimizes the average distance from P to the \(R_i\)s. Now, there are two natural definitions of the distance from P to \(R_i\). On the first, it is the minimum distance between P and a member of \(R_i\); on the second, it is the maximum distance between P and a member of \(R_i\). I’ll consider both.

For many divergences and many \(P_1, \ldots , P_n\), these minimization problems will have a unique solution. In that case, we use the first definition of distance and define:

$$\begin{aligned} \Delta _{\mathrm {MW}}^{\mathfrak {d}, \inf }(P_1, \ldots , P_n) = \mathop {\mathrm {arg\,inf}}\limits _{P \in \mathcal {P}_\mathcal {F}} \sum ^n_{i=1}\left( \inf _{Q \in R_i} \sum _{X \in \mathcal {F}} \mathfrak {d}(P(X), Q(X)) \right) \end{aligned}$$

And we use the second definition of distance and define:

$$\begin{aligned} \Delta _{\mathrm {MW}}^{\mathfrak {d}, \sup }(P_1, \ldots , P_n) = \mathop {\mathrm {arg\,inf}}\limits _{P \in \mathcal {P}_\mathcal {F}} \sum ^n_{i=1}\left( \sup _{Q \in R_i} \sum _{X \in \mathcal {F}} \mathfrak {d}(P(X), Q(X)) \right) \end{aligned}$$

I use the subscript ‘MW’ for these pooling functions because this general method for combining sets of probability functions is proposed by Martin Adamčík and George Wilmers (Adamčík & Wilmers, 2014; Wilmers, 2015). Seamus Bradley (2019) criticizes it as a pooling function for sets of probability functions that represent uncertainty in the imprecise credence framework. But his criticisms are less worrying when it is used to pool sets of probability functions that represent gaps in credal reporting, as we do here, so I won’t repeat them.

It is easy to see that these two pooling functions will satisfy (EI). After all, if there is a unique coherent credence function \(P^\star _i\), defined on \(\mathcal {F}\), that extends \(P_i\), which is defined on \(\mathcal {F}_i\), then the set of coherent probability functions that extends \(P_i\) is the same as the set of coherent probability functions that extends \(P^\star _i\)—both contain only \(P^\star _i\). That is:

$$ \begin{aligned} R_i&= \{P : \mathcal {F}\rightarrow [0, 1]\, |\, P \in \mathcal {P}_\mathcal {F}\ \& \ (\forall X \in \mathcal {F}_i)[P(X) = P_i(X)]\} = \\ \{P^\star _i\}&= \{P : \mathcal {F}\rightarrow [0, 1]\, |\, P \in \mathcal {P}_\mathcal {F}\ \& \ (\forall X \in \mathcal {F}_i)[P(X) = P^\star _i(X)]\} = R^\star _i \end{aligned}$$

So these proposals do not suffer from the same problem as the previous two. But they do face a problem: they give implausible answers in reasonably straightforward cases. For instance, suppose \(\mathcal {F}= \{X, Y, Z\}\), where X, Y, and Z form a partition, and \(\mathcal {F}_1 = \{X\}\) and \(\mathcal {F}_2 = \{Y\}\). And suppose

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline P_1 &{} 0.8 &{} - &{} - \\ P_2 &{} - &{} 0.8 &{} - \\ \end{array} \end{aligned}$$

So:

  • \(R_1 = \{P \in \mathcal {P}_\mathcal {F}: P(X) = 0.8\}\)

  • \(R_2 = \{P \in \mathcal {P}_\mathcal {F}: P(Y) = 0.8\}\)

Fig. 1

The barycentric plot of the 2-simplex with (1, 0, 0) at bottom left, (0, 1, 0) at bottom right, and (0, 0, 1) at the top. The dotted lines represent \(R_1\) and \(R_2\), respectively. And the result of applying \(\Delta ^{\mathfrak {d}, \inf }_\mathrm {MW}\) and \(\Delta ^{\mathfrak {d}, \sup }_\mathrm {MW}\) to \(P_1\) and \(P_2\) is plotted

We can illustrate these two sets of probabilities by plotting them within the 2-simplex on a barycentric plot (see Fig. 1). The problem is that, if \(\mathfrak {d}\) is squared deviation (\(\mathrm {SD}\)) or Kullback-Leibler divergence (\(\mathrm {KL}\)), then \(\Delta ^{\mathfrak {d}, \inf }_\mathrm {MW}(P_1, P_2)\) and \(\Delta ^{\mathfrak {d}, \sup }_\mathrm {MW}(P_1, P_2)\) are as follows:

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline \Delta _{MW}^{\mathfrak {d}, \inf }(P_1, P_2) &{} 0.5 &{} 0.5 &{} 0 \\ &{}&{}&{}\\ \Delta _{MW}^{\mathfrak {d}, \sup }(P_1, P_2) &{} 0.4 &{} 0.4 &{} 0.2 \end{array} \end{aligned}$$

These are plotted on the simplex as well. The problem here is that both seem too extreme. \(\Delta _{MW}^{\mathfrak {d}, \inf }(P_1, P_2)\) assigns credence 0 to Z, even though nothing in the opinions of either agent forces that. It is the same pool we would obtain if both agents were to assign credence 0 to Z and fill in their one remaining credence (in Y for agent 1, in X for agent 2) in such a way that they remained coherent. That is,

$$\begin{aligned} \Delta _{\mathrm {MW}}^{\mathfrak {d}, \inf }(P_1, P_2) = \Delta ^\mathfrak {d}_\mathrm {MW}(P^\circ _1, P^\circ _2) \end{aligned}$$

where

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline P^\circ _1 &{} 0.8 &{} 0.2 &{} 0 \\ P^\circ _2 &{} 0.2 &{} 0.8 &{} 0 \end{array} \end{aligned}$$

And \(\Delta _{MW}^{\mathfrak {d}, \sup }(P_1, P_2)\) assigns credence 0.2 to Z, even though nothing in the opinions of either agent forces that. It is the same pool we would obtain if agent 1 were to assign credence 0 to Y and fill in Z in such a way that they remain coherent, and agent 2 were to assign credence 0 to X and fill in Z in such a way that they remain coherent. That is,

$$\begin{aligned} \Delta _{\mathrm {MW}}^{\mathfrak {d}, \sup }(P_1, P_2) = \Delta ^\mathfrak {d}_\mathrm {MW}(P^\dag _1, P^\dag _2) \end{aligned}$$

where

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline P^\dag _1 &{} 0.8 &{} 0 &{} 0.2 \\ P^\dag _2 &{} 0 &{} 0.8 &{} 0.2 \end{array} \end{aligned}$$
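The two pools in this example can be checked numerically. The following is only a rough sketch, under assumptions of my own: it takes \(\mathfrak {d}\) to be squared deviation (it does not cover \(\mathrm {KL}\)), it replaces \(\mathcal {P}_\mathcal {F}\), \(R_1\), and \(R_2\) with a finite grid of step 1/20, and the names `mw_pool`, `GRID`, and `sq_dev` are purely illustrative.

```python
N = 20  # grid resolution: credences are multiples of 1/N (an illustrative choice)

# All coherent credence functions on the partition (X, Y, Z), in units of 1/N.
GRID = [(i, j, N - i - j) for i in range(N + 1) for j in range(N + 1 - i)]

# R_1 fixes P(X) = 0.8 and R_2 fixes P(Y) = 0.8 (0.8 = 16/20 on this grid).
FIXED = 4 * N // 5
R1 = [p for p in GRID if p[0] == FIXED]
R2 = [p for p in GRID if p[1] == FIXED]

def sq_dev(p, q):
    """Summed squared deviation between two grid points, in units of 1/N**2."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mw_pool(sets, select):
    """Grid point minimising the sum, over agents, of select(...) applied to the
    distances from the candidate pool to the members of that agent's set.
    select=min gives the inf version; select=max gives the sup version."""
    def cost(p):
        return sum(select(sq_dev(p, q) for q in r) for r in sets)
    best = min(GRID, key=cost)
    return tuple(x / N for x in best)

pool_inf = mw_pool([R1, R2], min)
pool_sup = mw_pool([R1, R2], max)
print(pool_inf, pool_sup)
```

On this grid the inf version lands on the pool that gives Z no credence at all, and the sup version on the pool that gives Z credence 0.2, in line with the table above. Working in integer units avoids spurious ties from floating-point rounding.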

6 Maximising entropy within the set of possible pools

Here’s another proposal that arises naturally. Let

$$\begin{aligned} R_{LP} = \{\Delta _{\mathrm {LP}}(P'_1, \ldots , P'_n) : P'_1 \in R_1, \ldots , P'_n \in R_n\} \end{aligned}$$

That is, \(R_{LP}\) is the set of linear pools of coherent extensions of the individuals’ credence functions. Then let the pool of \(P_1, \ldots , P_n\) be the credence function in \(R_{LP}\) with maximum entropy.Footnote 5 First, define the Shannon entropy of a probability function P defined over a set \(\mathcal {W}\) of possible worlds as follows (Shannon, 1948):

$$\begin{aligned} H(P) = -\sum _{w \in \mathcal {W}} P(w) \log P(w) \end{aligned}$$

Then let

$$\begin{aligned} \Delta _{\mathrm {ME}}^{\mathrm {LP}}(P_1, \ldots , P_n) := \mathop {\mathrm {arg\,max}}\limits _{P \in R_{LP}} H(P) \end{aligned}$$

The problem with this approach is that it gives the same implausible answer as \(\Delta ^{\mathfrak {d}, \sup }_{\mathrm {MW}}\) gave in the case we considered in the previous section. That is, if

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline P_1 &{} 0.8 &{} - &{} - \\ P_2 &{} - &{} 0.8 &{} - \\ \end{array} \end{aligned}$$

Then

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline \Delta _{\mathrm {ME}}^{\mathrm {LP}}(P_1, P_2) &{} 0.4 &{} 0.4 &{} 0.2 \\ \end{array} \end{aligned}$$

Again, we illustrate this in a barycentric plot—see Fig. 2.

Fig. 2

The barycentric plot of the 2-simplex with (1, 0, 0) at bottom left, (0, 1, 0) at bottom right, and (0, 0, 1) at the top. The dotted lines represent \(R_1\) and \(R_2\), respectively. And the result of applying \(\Delta ^\mathrm {LP}_\mathrm {ME}\) to \(P_1\) and \(P_2\) is plotted
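To see where the value of \(\Delta _{\mathrm {ME}}^{\mathrm {LP}}(P_1, P_2)\) comes from, here is a small numerical sketch. It is an approximation under my own assumptions: each agent's spare credence of 0.2 is split in steps of 0.01, so \(R_1\), \(R_2\), and \(R_{LP}\) are finite grids rather than the full sets, and the helper names are illustrative only.

```python
from math import log

def entropy(p):
    """Shannon entropy of a distribution, with the convention 0 * log 0 = 0."""
    return -sum(x * log(x) for x in p if x > 0)

def linear_pool(*ps):
    """Arithmetic mean of the credence functions, cell by cell."""
    return tuple(sum(cell) / len(ps) for cell in zip(*ps))

# Grid approximations of R_1 (X fixed at 0.8) and R_2 (Y fixed at 0.8):
# t hundredths of the spare 0.2 go to the other named cell, the rest to Z.
R1 = [(0.8, t / 100, (20 - t) / 100) for t in range(21)]
R2 = [(s / 100, 0.8, (20 - s) / 100) for s in range(21)]

# R_LP: every linear pool of one extension of P_1 with one extension of P_2.
R_LP = [linear_pool(p1, p2) for p1 in R1 for p2 in R2]

pool_me = max(R_LP, key=entropy)
print(pool_me)
```

Every member of \(R_{LP}\) gives X and Y at least 0.4 each and Z at most 0.2, so the entropy maximiser is the member that pushes as much mass as possible onto Z.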

7 Extension invariance and the accuracy of pooling functions

In Sects. 3 and 4, we criticized the extensions of linear and geometric pooling, \(\Delta _{\mathrm {LP}''}\) and \(\Delta _{\mathrm {GP}'}\), and the Coherent Approximation Principle, \(\Delta ^\mathfrak {d}_\mathrm {CAP}\), because they both violate (EI), the principle that says that, when there’s a unique coherent extension of each credence function to the full algebra, pooling those extensions should give the same result as pooling the original credence functions. At that point, I merely appealed to the intuitive force of (EI); I gave no further argument in its favour. But there is something to be said for pooling functions that satisfy it, at least when they are compared with CAP.

Let’s begin with a slight adaptation of the simple example from above:

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline P_1 &{} 0.1 &{} 0.4 &{} - \\ P_2 &{} 0.2 &{} 0.6 &{} - \\ P^\star _1 &{} 0.1 &{} 0.4 &{} 0.5 \\ P^\star _2 &{} 0.2 &{} 0.6 &{} 0.2 \end{array} \end{aligned}$$

(EI) says that pooling \(P_1\) and \(P_2\) should give the same result as pooling \(P_1\) and \(P^\star _2\), which should give the same result as pooling \(P^\star _1\) and \(P_2\), which should give the same result as pooling \(P^\star _1\) and \(P^\star _2\). That is, if \(\Delta \) is our pooling function,

$$\begin{aligned} \Delta (P_1, P_2) = \Delta (P_1, P^\star _2) = \Delta (P^\star _1, P_2) = \Delta (P^\star _1, P^\star _2) \end{aligned}$$

But let’s apply CAP using the squared deviation:

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline &{} &{} &{} \\ \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P_2) &{} \frac{12}{80} &{} \frac{40}{80} &{} \frac{28}{80} \\ &{} &{} &{} \\ \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P^\star _2) &{} \frac{15}{80} &{} \frac{43}{80} &{} \frac{22}{80} \\ &{} &{} &{} \\ \Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P_2) &{} \frac{9}{80} &{} \frac{37}{80} &{} \frac{34}{80} \\ &{} &{} &{} \\ \Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P^\star _2) &{}\frac{12}{80} &{} \frac{40}{80} &{} \frac{28}{80} \end{array} \end{aligned}$$

Now, notice that \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P^\star _2)\) is the same as \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P_2)\), and both are the midpoint between \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P^\star _2)\) and \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P_2)\). That is,

$$\begin{aligned} \Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P^\star _2)&= \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P_2) \\&= \frac{1}{2}\left( \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P^\star _2) + \Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P_2)\right) \\&= \frac{1}{4} \left( \Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P^\star _2) + \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P_2) + \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P^\star _2) + \Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P_2)\right) \end{aligned}$$

That is, when we include the credal assignment to Z that \(P_1\) determines, but not the assignment that \(P_2\) determines, \(\Delta ^\mathrm {SD}_\mathrm {CAP}\) pulls the pool towards \(P_1\) and away from \(P_2\); and, mutatis mutandis, if we include the credal assignment to Z that \(P_2\) determines, but not the one that \(P_1\) determines. And, moreover, the pull is the same but in opposite directions in the two cases. So, when we average them, we obtain what we would have obtained if we’d left out both assignments to Z (and pooled \(P_1\) and \(P_2\)) or if we’d included both assignments to Z (and pooled \(P^\star _1\) and \(P^\star _2\)).
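The four entries in the table, and the midpoint relation between them, can be reproduced with exact arithmetic. The sketch below assumes that CAP with squared deviation selects the coherent credence function that minimises the total squared deviation from the credences each individual actually reports, with unreported credences contributing nothing. It searches only a 1/80 grid, which happens to contain all four answers; the function name `cap_sd` is hypothetical.

```python
from fractions import Fraction

N = 80  # the table's values are all multiples of 1/80, so this grid contains them

# Coherent credence functions on the partition (X, Y, Z), restricted to the grid.
GRID = [(Fraction(i, N), Fraction(j, N), Fraction(N - i - j, N))
        for i in range(N + 1) for j in range(N + 1 - i)]

def cap_sd(*credences):
    """Grid point minimising total squared deviation from the reported credences.
    A credence of None marks a proposition outside that individual's agenda."""
    def cost(p):
        return sum((p[k] - c[k]) ** 2
                   for c in credences for k in range(3) if c[k] is not None)
    return min(GRID, key=cost)

P1 = (Fraction(1, 10), Fraction(4, 10), None)
P2 = (Fraction(2, 10), Fraction(6, 10), None)
P1_star = (Fraction(1, 10), Fraction(4, 10), Fraction(5, 10))
P2_star = (Fraction(2, 10), Fraction(6, 10), Fraction(2, 10))

results = [cap_sd(P1, P2), cap_sd(P1, P2_star),
           cap_sd(P1_star, P2), cap_sd(P1_star, P2_star)]

# Cell-by-cell average of the four CAP outputs.
average = tuple(sum(r[k] for r in results) / 4 for k in range(3))
print(results, average)
```

The average of the four outputs coincides with the first and last of them, which is the relation displayed above.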

What does this tell us? Well, suppose our favoured pooling function for those cases in which all individuals have the same agenda is linear pooling; and suppose we extend that pooling function in line with (EI). Then we can say the following in favour of our approach and against CAP. First, we note the following corollary of the Diversity Prediction Theorem (Galton, 1907; Page, 2007):

Theorem 2

For any \(\mathcal {F}\) and credence functions \(Q, Q_1, \ldots , Q_n\) defined on \(\mathcal {F}\),

$$\begin{aligned} \sum _{X \in \mathcal {F}} \mathrm {SD}(\Delta _\mathrm {LP}(Q_1, \ldots , Q_n)(X), Q(X) ) < \frac{1}{n}\sum ^n_{i=1} \sum _{X \in \mathcal {F}} \mathrm {SD}(Q_i(X), Q(X)) \end{aligned}$$

This says that, for any credence function Q and any set of credence functions \(Q_1, \ldots , Q_n\) all defined on the same set of propositions, the distance of the linear pool of \(Q_1, \ldots , Q_n\) from Q is always less than the average distance of the \(Q_i\)s from Q, when the distance between credences is measured using squared deviation.Footnote 6

How does this help? Well, given a possible world w, let \(V_w\) be the credence function that assigns maximal credence to all propositions that are true at w and minimal credence to all propositions that are false at w: that is, \(V_w(X) = 1\) if X is true at w, and \(V_w(X) = 0\) if X is false at w. We might call \(V_w\) the omniscient credence function. It is natural to say that the ideal credence function for an individual to have at a world is the omniscient credence function at that world, and that a credence function is more accurate the closer it lies to that omniscient credence function. So we might say that the inaccuracy of a credence function \(Q_i\) at world w is the sum of the squared deviations between the credences it assigns and the credences that \(V_w\) assigns: we call this the Brier score of inaccuracy. So, if P is defined on \(\mathcal {F}\),

$$\begin{aligned} \mathfrak {B}(P, w) = \sum _{X \in \mathcal {F}} (P(X) - V_w(X))^2 \end{aligned}$$

And we might think that a credence function is doing better, epistemically speaking, the greater its accuracy, and so the lower its Brier score. That is, P is better than Q at w just in case \(\mathfrak {B}(P, w) < \mathfrak {B}(Q, w)\) (Brier, 1950; Rosenkrantz, 1981; Pettigrew, 2016). Then, by Theorem 2,

Corollary 3

For any \(\mathcal {F}\), any world w, and any credence functions \(Q_1, \ldots , Q_n\) defined on \(\mathcal {F}\),

$$\begin{aligned} \mathfrak {B}(\Delta _\mathrm {LP}(Q_1, \ldots , Q_n), w ) < \frac{1}{n} \sum ^n_{i=1} \mathfrak {B}\left( Q_i, w \right) \end{aligned}$$

That is, the inaccuracy of the linear pool of \(Q_1, \ldots , Q_n\) is less than the average inaccuracy of the \(Q_i\)s.
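A quick numerical check may help to make Corollary 3 concrete. The sketch below uses the two complete credence functions \(P^\star _1\) and \(P^\star _2\) from the example above as \(Q_1\) and \(Q_2\), and treats a world as the index of the partition cell that is true there; the helper names are my own. It also checks the identity behind the Diversity Prediction Theorem: the gap between the average Brier score and the pool's Brier score is the average squared deviation of the \(Q_i\) from their pool, so the inequality is strict whenever the \(Q_i\) do not all agree.

```python
def linear_pool(*qs):
    """Arithmetic mean of the credence functions, cell by cell."""
    return tuple(sum(cell) / len(qs) for cell in zip(*qs))

def brier(p, world):
    """Brier score of p at the world where partition cell `world` is true."""
    return sum((x - (1.0 if k == world else 0.0)) ** 2 for k, x in enumerate(p))

def diversity(*qs):
    """Average summed squared deviation of the Q_i from their linear pool."""
    pool = linear_pool(*qs)
    return sum(sum((a - b) ** 2 for a, b in zip(q, pool)) for q in qs) / len(qs)

Q1 = (0.1, 0.4, 0.5)
Q2 = (0.2, 0.6, 0.2)
pool = linear_pool(Q1, Q2)

for w in range(3):
    average = (brier(Q1, w) + brier(Q2, w)) / 2
    # The pool's Brier score, the individuals' average, and the gap between them.
    print(w, brier(pool, w), average, average - brier(pool, w))
```

The gap printed in the last column is the same at every world, because it depends only on how far the individuals lie from their pool and not on which world is actual.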

Now, recall that the linear pool of \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P^\star _2)\), \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P_2)\), \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P^\star _2)\), and \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P_2)\) is just \(\Delta _\mathrm {LP}(P^\star _1, P^\star _2)\). Then it follows that, for any world, the inaccuracy of \(\Delta _\mathrm {LP}(P^\star _1, P^\star _2)\) at that world is less than the average inaccuracy of \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P^\star _2)\), \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P_2)\), \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P^\star _2)\), and \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P_2)\) at that world. That is,

$$\begin{aligned}&\mathfrak {B}\left( \Delta _\mathrm {LP}(P^\star _1, P^\star _2), w \right) \\&\quad < \frac{1}{4} \left( \mathfrak {B}\left( \Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P^\star _2), w \right) + \mathfrak {B}\left( \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P_2), w \right) \right. \\&\qquad \left. + \mathfrak {B}\left( \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P^\star _2), w \right) + \mathfrak {B}\left( \Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P_2), w \right) \right) \end{aligned}$$

So pooling in line with (EI) is more accurate than pooling in line with CAP, at least in expectation and if you are equally likely to find yourself pooling \(P_1\) and \(P_2\) as you are to find yourself pooling \(P_1\) and \(P^\star _2\), or \(P^\star _1\) and \(P_2\), or \(P^\star _1\) and \(P^\star _2\).

Does this generalise beyond the specific case of \(P_1\) and \(P_2\)? Yes, as the following theorem shows:

Theorem 4

Suppose \(\mathcal {F}, \mathcal {F}'\) are two sets of propositions and \(\mathcal {F}' \subseteq \mathcal {F}\). Suppose \(P_1\) is a credence function on \(\mathcal {F}'\) and \(P^\star _1\) is the unique coherent extension of \(P_1\) to \(\mathcal {F}\); and suppose \(P_2\) is a credence function on \(\mathcal {F}'\) and \(P^\star _2\) is the unique coherent extension of \(P_2\) to \(\mathcal {F}\). Then

$$\begin{aligned}&\Delta _\mathrm {LP}(P^\star _1, P^\star _2) \\&\quad = \frac{1}{4}\left( \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P_2) + \Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P^\star _2) + \Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P_2) + \Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P^\star _2) \right) \end{aligned}$$

Now, suppose you enter a pooling task knowing only that the individuals will assign credences either to the propositions in \(\mathcal {F}'\) or to the propositions in \(\mathcal {F}\), where \(\mathcal {F}' \subseteq \mathcal {F}\). If we assume that there is no correlation between the particular credences the individuals assign and whether they assign them only to the propositions in \(\mathcal {F}'\) or to the propositions in \(\mathcal {F}\), then it is as likely that the group whose opinions you wish to pool consists of \(P_1\) and \(P^\star _2\) as it is that it will consist of \(P^\star _1\) and \(P_2\), and as likely that it consists of \(P_1\) and \(P_2\), and as likely that it consists of \(P^\star _1\) and \(P^\star _2\). And if that’s right then the expected inaccuracy of using a rule that respects (EI) and thus sets \(\Delta (P_1, P_2) = \Delta (P^\star _1, P_2) = \Delta (P_1, P^\star _2) = \Delta (P^\star _1, P^\star _2) = \Delta _{\mathrm {LP}}(P^\star _1, P^\star _2)\) is lower than the expected inaccuracy of using CAP, which will give each of \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P_2)\), \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P_1, P^\star _2)\), \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P_2)\), and \(\Delta ^\mathrm {SD}_\mathrm {CAP}(P^\star _1, P^\star _2)\) a probability of 25%.
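For the running example, the expected-inaccuracy comparison can be worked through exactly. The sketch below takes the four CAP outputs straight from the table in this section, weights them at 25% each as the argument assumes, and compares the result with the Brier score of \(\Delta _\mathrm {LP}(P^\star _1, P^\star _2)\) at each world. The dictionary keys and helper names are illustrative labels of my own.

```python
from fractions import Fraction

def brier(p, world):
    """Brier score of p at the world where partition cell `world` is true."""
    return sum((x - (1 if k == world else 0)) ** 2 for k, x in enumerate(p))

def eightieths(x, y, z):
    """Credence function on (X, Y, Z) given in units of 1/80, as in the table."""
    return (Fraction(x, 80), Fraction(y, 80), Fraction(z, 80))

# CAP outputs for the four equally likely pooling tasks (values from the table).
cap_outputs = {
    "P1, P2": eightieths(12, 40, 28),
    "P1, P2*": eightieths(15, 43, 22),
    "P1*, P2": eightieths(9, 37, 34),
    "P1*, P2*": eightieths(12, 40, 28),
}

# A rule that respects (EI) returns the linear pool of the extensions every time.
ei_output = eightieths(12, 40, 28)

gaps = []
for w in range(3):
    expected_cap = sum(brier(p, w) for p in cap_outputs.values()) / 4
    gaps.append(expected_cap - brier(ei_output, w))
print(gaps)
```

In this example the advantage of the (EI)-respecting rule is small but positive, and it is the same at every world, since it depends only on how far the four CAP outputs are spread around their average.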

8 Beyond extension invariance

Extension Invariance (EI) tells us how our pooling function should work when, for each individual i, there is a unique coherent extension \(P^\star _i\) of \(P_i\) from \(\mathcal {F}_i\) to \(\mathcal {F}\). In such a case, (EI) tells us, you pick the pooling function you favour for those cases in which all the credence functions to be pooled are defined on the same set of propositions, and you apply it to the extended credence functions \(P^\star _1, \ldots , P^\star _n\), which are all defined on \(\mathcal {F}\). As it stands, however, (EI) does not tell us how to proceed when, for some individual i, there is more than one coherent credence function that extends \(P_i\) from \(\mathcal {F}_i\) to \(\mathcal {F}\). In this final section, I consider an important sort of case in which we face this problem and propose a solution for that case.

In the case I want to consider, we pool credences in order to give what Christian List (2014) calls a corporate collective attitude. Recall from above: in List’s terminology, when we ascribe a corporate collective attitude, we assert first that the group counts as an agent in its own right and second that this group agent has the attitude in question. From the examples given in the introduction, these are the ones I envisage falling into this category: the epidemiologists of viruses whose views we wish to present as the view of the scientific community in our textbooks; the co-authors on a multi-authored scientific paper whose group view as a collective author we wish to present to the scientific community; and the employees of a company or institution whose collective view we wish to identify in order to assess liability for some harm. In all three cases, the group agents play a role in some normative enterprise. In the first two, it is the enterprise of science, which has norms that govern the assertions included in textbooks and scientific papers. In the third, it is the legal system, and there are norms here that govern the beliefs we ascribe to an individual whose liability for some harm we are assessing.

While these normative enterprises and the roles within them that the groups play are quite different, I will argue that a similar norm governs how we should pool the credences of the individuals in such a group to give the group’s credences. It is a conservative norm. It says that we should first pick, for each individual i, a particular credence function \(P^\star _i\) that extends \(P_i\) from \(\mathcal {F}_i\) to \(\mathcal {F}\); in particular, it says that we should pick \(P^\star _i\) in the most conservative and unopinionated way possible; that is, we should introduce as little in the way of further opinions as we can when we extend; and then, second, we should aggregate these extended credence functions using whatever pooling function we favour for those cases in which all credence functions are defined on the same agenda—perhaps linear pooling, perhaps geometric pooling, perhaps something else.

Why is this the appropriate norm in the scientific case? In particular, why does the norm require us to extend \(P_i\) to \(P^\star _i\) in the most conservative way possible? In fact, I think there are two reasons. The first reason is the duty of the textbook’s author or the paper’s lead author to represent fairly the views of the individuals on behalf of whom they write. The textbook’s author presents the views of that part of the scientific community; the lead author presents the views of their fellow co-authors. In both cases, they have a duty not to impute to those individuals any further opinions beyond what is necessary to extend their credences to the full set \(\mathcal {F}\). The second reason is the duty of scientific authors to their audience. Now, I don’t think it is the duty of each scientist not to form opinions beyond what is strictly implied by their evidence. Over years of training and experience in their field, scientists gain an ability to form opinions on the basis of the evidence that sometimes seems to go beyond what the non-expert might conclude, and yet which it is legitimate to report in a scientific publication because of the expertise of the scientist. Nonetheless, when the scientist hasn’t formed any opinion about a proposition and when we must nonetheless ascribe an opinion to them in order to carry out the pooling, we are obliged to make that opinion as conservative as possible. In other words, deviations from a sort of Cliffordian conservatism about opinion are permitted, but only when they are made explicitly by the scientist, and not when they are made by a textbook author or lead author on a paper who is filling in the gaps in another scientist’s opinions.

Why is conservatism the appropriate norm in the legal case? Here, I think the key lies in the legal notion of the ‘reasonable person’. Often this abstract individual is invoked to personify a certain standard of proof that is required in order to find a defendant liable or guilty. On the websites of many US police departments, you will find a definition of ‘probable cause’ in terms of what a ‘reasonable person’ would believe on the basis of the evidence in hand. But it is also used to determine when a defendant’s actions are reasonable. For instance, in Brown vs. Kendall, Chief Justice Shaw determined that the ‘ordinary care’ that is necessary for the defendant to avoid liability is “that kind and degree of care, which prudent and cautious men would use”.Footnote 7 And in Commonwealth vs. Horsfall, Chief Justice Rugg declared that “every traveller upon a highway is bound to exercise the care of the ordinarily prudent and cautious person under all circumstances”.Footnote 8 In both of these cases, we see that the ‘reasonable person’ is identified with the ‘prudent and cautious person’. In the cases cited, the prudence and caution relate to the individual’s practical choices about their actions; but it seems reasonable to infer that the same condition is placed on the individual’s beliefs. Take the case of Commonwealth vs. Horsfall, where a car on a public highway hit an individual, who then died from their injuries. The individual who was killed was stationary, and the driver had seen them from some distance off and sounded their horn. While there was plenty of room to pass, the driver didn’t take it, presumably thinking that the person would move out of the way at the sound of the horn. Even if it might have been rationally permissible to have a high credence that the person would move out of the way, given the driver’s evidence, if that high credence is unusually high or incautious or unreasonable, it seems that its rationality would not exculpate them. Rather, when they are assessed for liability, their action is assessed from the point of view of a person who is cautious in both their beliefs and the actions they perform on the basis of those beliefs.

Now let me explain how we might respect these conservative norms formally. If we wish to extend a credence function in the most conservative way possible, it’s natural to appeal to the Principle of Maximum Entropy (Jaynes, 2003; Paris & Vencovská, 1990; 1997; Williamson, 2010). Typically, that principle applies to an individual whose evidence constrains their credences to some extent, but still permits a range of different credence functions. It is then used to pick out a single credence function from among those: it picks the one that has maximal Shannon entropy.Footnote 9 The idea is this: Shannon entropy measures how unopinionated a probability distribution is. The higher its entropy, the less opinionated it is. Thus, a uniform distribution over a finite partition, which is maximally unopinionated, receives the highest entropy among probability functions over that partition, while a probability function that places all of its mass on a single possible world, and is therefore maximally opinionated, receives the lowest entropy. The idea is that your credence function should respect your evidence; but among credence functions that do this, it should be the least opinionated. In this sense, it should not go beyond the evidence; it should not encode opinions that aren’t demanded by the evidence.Footnote 10

In our case, the situation is a little different. It is not only the individual’s evidence that constrains how we might extend their credences to the propositions that lie in \(\mathcal {F}\) but not in \(\mathcal {F}_i\). It is also the credences that they assign to the propositions in \(\mathcal {F}_i\). So we might imagine that each individual has their own body of evidence \(\mathbf {E}_i\), and we might model this as the set of credence functions on \(\mathcal {F}\) that respect that evidence. Thus, for instance, if among individual i’s body of evidence is the fact that the coin in their pocket is fair, then each credence function in \(\mathbf {E}_i\) should assign credence 50% to that coin landing heads if tossed; and so on. Now, just as we are supposing that all individuals have coherent credence functions, so we might suppose that they all have credence functions that respect their evidence. Thus, for all i, \(R_i\) and \(\mathbf {E}_i\) overlap. Then we might say: when we extend individual i’s credence function from \(\mathcal {F}_i\) to \(\mathcal {F}\), we should ascribe the credence function \(P^{\mathrm {ME}}_i\), which is defined on \(\mathcal {F}\) as follows:

$$\begin{aligned} P^{\mathrm {ME}}_i = \mathop {\mathrm {arg\,sup}}\limits _{P \in \mathbf {E}_i \cap R_i} H(P) \end{aligned}$$

where, recall:

  • \(\mathbf {E}_i\) is the set of credence functions on \(\mathcal {F}\) that respect the evidence that individual i has;

  • \(R_i\) is the set of coherent credence functions on \(\mathcal {F}\) that extend \(P_i\); and

  • H(P) is the Shannon entropy of P.

The motivation is the same as in the standard application of maximal entropy reasoning, where an individual’s credences are constrained only by their evidence, and we demand that they pick, among those that satisfy the constraints, the one that is least opinionated. Similarly here, where both the individual’s evidence and their existing credences impose constraints, we ascribe to them the credence function that is least opinionated among those that satisfy both constraints. Thus, we define

$$\begin{aligned} \Delta _{\mathrm {ME}^*}(P_1, \ldots , P_n) = \Delta (P^{\mathrm {ME}}_1, \ldots , P^{\mathrm {ME}}_n) \end{aligned}$$

where \(\Delta \) is our favoured pooling function for credence functions defined on the same set of propositions—e.g., linear pooling (\(\Delta _\mathrm {LP}\)) or geometric pooling (\(\Delta _\mathrm {GP}\)).

Figure 3 illustrates the result of this process in the case we’ve considered before where:

  • each individual has no evidence, so that \(\mathbf {E}_1 = \mathbf {E}_2 = \mathcal {P}_\mathcal {F}\); and

  • the propositions X, Y, and Z form a partition and the individuals’ credences are as follows:

    $$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline P_1 &{} 0.8 &{} - &{} - \\ P_2 &{} - &{} 0.8 &{} - \\ \end{array} \end{aligned}$$

Then

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline P^{ME}_1 &{} 0.8 &{} 0.1 &{} 0.1 \\ P^{ME}_2 &{} 0.1 &{} 0.8 &{} 0.1 \\ \end{array} \end{aligned}$$

So, if our favoured pooling function \(\Delta \) is linear pooling, \(\Delta _{\mathrm {LP}}\), we obtain

$$\begin{aligned} \begin{array}{c|ccc} &{} X &{} Y &{} Z \\ \hline \Delta _{\mathrm {ME}^*}^{\mathrm {LP}}(P_1, P_2) &{} 0.45 &{} 0.45 &{} 0.1 \\ \end{array} \end{aligned}$$

Fig. 3

The barycentric plot of the simplex with (1, 0, 0) at bottom right, (0, 1, 0) at bottom left, and (0, 0, 1) at the top

It’s worth noting that, when we combine this with the illustration from above, we see that taking the credence function that maximises entropy among all linear pools of the possible extensions of \(P_1\) and \(P_2\) is not the same as taking the linear pool of the extensions of \(P_1\) and \(P_2\) that maximise entropy. That is, \(\Delta _{\mathrm {ME}}^{\mathrm {LP}}(P_1, P_2) \ne \Delta _{\mathrm {ME}^*}^{\mathrm {LP}}(P_1, P_2)\). And, it seems to me at least, the latter gives the more sensible result.
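The two-step recipe behind \(\Delta _{\mathrm {ME}^*}^{\mathrm {LP}}\) is short enough to sketch in code. The sketch assumes the simple setting of the example: the propositions form a partition and the individuals have no evidence, so that \(\mathbf {E}_i = \mathcal {P}_\mathcal {F}\). In that setting, maximising entropy spreads whatever credence is left over evenly across the cells the individual left blank. The function names are hypothetical, and the shortcut would not apply once \(\mathbf {E}_i\) imposes constraints of its own.

```python
from fractions import Fraction

def maxent_extend(partial):
    """Maximum-entropy coherent extension over a partition, assuming no evidence.
    `partial` maps each cell to a credence, or to None where none was reported;
    the unassigned mass is spread evenly over the cells left blank."""
    blanks = [cell for cell, value in partial.items() if value is None]
    remaining = 1 - sum(value for value in partial.values() if value is not None)
    return {cell: (remaining / len(blanks) if value is None else value)
            for cell, value in partial.items()}

def linear_pool(*credences):
    """Arithmetic mean of complete credence functions, cell by cell."""
    return {cell: sum(c[cell] for c in credences) / len(credences)
            for cell in credences[0]}

P1 = {"X": Fraction(4, 5), "Y": None, "Z": None}
P2 = {"X": None, "Y": Fraction(4, 5), "Z": None}

# Step 1: extend each individual conservatively. Step 2: pool the extensions.
P1_me, P2_me = maxent_extend(P1), maxent_extend(P2)
pool = linear_pool(P1_me, P2_me)
print(pool)
```

Extending first and pooling second leaves Z with credence 1/10, whereas maximising entropy over the whole set of possible pools pushes it to 1/5, which is the difference between the two proposals noted above.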

9 Conclusion

We’ve met a lot of different pooling functions that purport to cover those cases in which the individuals in the group in question have different agendas. I have argued that they all fail except \(\Delta _{\mathrm {ME}^*}\), which I introduced in the previous section. There, I argued that it is the pooling function we ought to use when we wish to determine the corporate credences of a scientific community in order to present them in a textbook, or the corporate credences of a company or institution we are assessing for liability.

Perhaps there are other situations in which it is the pooling function we ought to use, or at least one of the pooling functions we are permitted to use? I think that may well be true when our purpose is not to determine the corporate credences of a group but to determine its aggregate credences. Recall, in Christian List’s terminology, the aggregate credences of a group provide a condensed summary of the credences of the individuals that make it up. In this case, there is no suggestion that the group is an agent in its own right. It seems right to say that, when we summarise the credences of a group of individuals, and we need to fill in a particular individual’s credence in some proposition in order to perform the summary, we should add as little by way of new opinion as we can. But I don’t think this mere appeal to intuition is as convincing as an argument, so I leave this case to future work, when more compelling considerations might be adduced.