Learning and Pooling, Pooling and Learning

Abstract

We explore which types of probabilistic updating commute with convex IP pooling (Stewart and Ojea Quintana 2017). Positive results are stated for Bayesian conditionalization (and a mild generalization of it), imaging, and a certain parameterization of Jeffrey conditioning. This last observation is obtained with the help of a slight generalization of a characterization of (precise) externally Bayesian pooling operators due to Wagner (Log J IGPL 18(2):336–345, 2009). These results strengthen the case that pooling should go by imprecise probabilities since no precise pooling method is as versatile.



Notes

  1.

    Not all merging of opinions results require probabilities to converge to certainty (Blackwell and Dubins 1962). Under certain conditions, Bayesian conditionalizing can bring probabilities close even if they do not converge to 1 or 0.

  2.

    \(\Omega \) may be thought of as a partition of a space of agent-relative serious possibilities determined by consistency with a state of full belief. As is a state of full belief, \(\Omega \) is open to being revised, refined, etc., as judged appropriate (Levi 1980).

  3.

    Notice that, due to the way geometric pooling is defined, there are profiles for which \(F(\varvec{p}_1,\ldots , \varvec{p}_n)(\omega ) = 0\) for all \(\omega \in \Omega \)—in violation of the probability axioms. Such a situation arises if for each \(\omega \in \Omega \) there is a \(\varvec{p}_i \in (\varvec{p}_1,\ldots , \varvec{p}_n)\) such that \(\varvec{p}_i(\omega ) = 0\). Circumventing this problem, Wagner restricts the domain of pooling operators to the set of profiles for which this does not happen. That is, the domain of a pooling function is the set of profiles such that there is some \(\omega \in \Omega \) for which \(\varvec{p}_i(\omega ) > 0\) for all \(i=1,\ldots , n\).

  4.

    See Schervish and Seidenfeld (1990), Herron et al. (1997) for studies of convergence relevant to IP.

  5.

    Within the IP research community, convexity is a matter of some controversy. For attacks on the requirement, see Seidenfeld et al. (1989, 2010), Kyburg and Pittarelli (1992). For defenses, see Levi (1990, 2009).

  6.

    In the IP setting, conditionalization can actually lead to greater uncertainty in the short-run, a very interesting phenomenon known as dilation (Seidenfeld and Wasserman 1993; Pedersen and Wheeler 2014).

  7.

    For any \(A \in \mathscr {A},\quad \varvec{p}^E(A) = \frac{\varvec{p}(A \cap E)}{\varvec{p}(E)} = \frac{\sum _{\omega \in A \cap E}\varvec{p}(\omega )}{\sum _{\omega \in E}\varvec{p}(\omega )}\). By the definition of a probability measure, \(\varvec{p}(A) = \sum _{\omega \in A} \varvec{p}(\omega )\),   so \(\sum _{\omega \in A} \varvec{p}^\lambda (\omega ) = \frac{\sum _{\omega \in A} \varvec{p}(\omega )\lambda (\omega )}{\sum _{\omega ' \in \Omega } \varvec{p}(\omega ')\lambda (\omega ')}\) gives us \(\varvec{p}^\lambda (A)\). We show that these two fractions are equal by showing the equality of both the numerators and denominators. Since, for all \(\omega \in A\), \(\varvec{p}(\omega )\lambda (\omega ) = \varvec{p}(\omega )\) if \(\omega \in E\) and 0 otherwise, \(\sum _{\omega \in A}\varvec{p}(\omega )\lambda (\omega ) = \sum _{\omega \in A \cap E} \varvec{p}(\omega ) = \varvec{p}(A \cap E)\). Hence, the numerators are equal. And since, for all \(\omega ' \in \Omega , \varvec{p}(\omega ')\lambda (\omega ') = \varvec{p}(\omega ')\) if \(\omega ' \in E\) and 0 otherwise, we have \(\sum _{\omega ' \in \Omega } \varvec{p}(\omega ')\lambda (\omega ') = \sum _{\omega ' \in E} \varvec{p}(\omega ') = \varvec{p}(E)\). Hence, the denominators are equal, too. So, \(\varvec{p}^E = \varvec{p}^\lambda \).

  8.

    Thanks to Paul Pedersen for emphasizing this point to us.

  9.

    Wagner contends that identical learning should be thought of as identical Bayes factors rather than identical posteriors. One alleged reason is that posteriors are tainted by the prior, whereas Bayes factors are an uncontaminated measure of the impact of the evidence. How do Bayes factors measure the impact of the evidence in isolation from the prior? Consider the case in which \(\varvec{q}\) comes from \(\varvec{p}\) by Bayesian conditionalization on E. Then,

    $$\begin{aligned} \varvec{q}(A)/\varvec{q}(B) = \frac{\varvec{p}(A|E)}{\varvec{p}(B|E)} \end{aligned}$$

    and

    $$\begin{aligned} {\mathcal {B}}(\varvec{q}, \varvec{p}; A:B) = \frac{\varvec{p}(A|E)/\varvec{p}(B|E)}{\varvec{p}(A)/\varvec{p}(B)}. \end{aligned}$$

    So, \({\mathcal {B}}(\varvec{q}, \varvec{p}; A:B)\) is a measure of the change the evidence, E, induces in favor of A over B. \({\mathcal {B}}(\varvec{q}, \varvec{p}; A:B)\) can also be rearranged using Bayes’ theorem.

    $$\begin{aligned} \frac{\varvec{q}(A)}{\varvec{q}(B)} = \frac{\varvec{p}(A|E)}{\varvec{p}(B|E)} = \frac{\frac{\varvec{p}(A)\varvec{p}(E|A)}{\varvec{p}(E)}}{\frac{\varvec{p}(B)\varvec{p}(E|B)}{\varvec{p}(E)}} = \frac{\varvec{p}(A)\varvec{p}(E|A)}{\varvec{p}(B)\varvec{p}(E|B)} = \frac{\varvec{p}(A)}{\varvec{p}(B)} \times \frac{\varvec{p}(E|A)}{\varvec{p}(E|B)} \end{aligned}$$

    Dividing now by \(\frac{\varvec{p}(A)}{\varvec{p}(B)}\), the denominator of \({\mathcal {B}}(\varvec{q}, \varvec{p}; A:B)\), gives us

    $$\begin{aligned} {\mathcal {B}}(\varvec{q}, \varvec{p}; A:B) = \frac{\varvec{p}(E|A)}{\varvec{p}(E|B)} \end{aligned}$$

    The quantity \(\varvec{p}(E|A) \big / \varvec{p}(E|B)\) is sometimes referred to as the likelihood ratio. So, the Bayes factor is a ratio of the non-prior quantities involved in Bayes’ theorem, the quantities that revise the prior.

  10.

    Wagner’s version of commutativity with Jeffrey conditionalization involves some additional technical assumptions. First, that \(\varvec{p}_i(E_k) > 0\) for all i and all k. Second, that \(b_1 = 1\) and \(\sum _k b_k \varvec{p}_i(E_k) < \infty \) for \(i = 1,\ldots , n\). Third, where \(\varvec{q}_i(\omega ) = \frac{\sum _k b_k \varvec{p}_i(\omega )[\omega \in E_k]}{\sum _k b_k \varvec{p}_i(E_k)}\), it is the case that \(0< \sum _k b_k F(\varvec{p}_1,\ldots , \varvec{p}_n)(E_k) < \infty \). In the IP setting, this last assumption may be adjusted to be a requirement for each \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\).

  11.

    In finite spaces, any revision method can be represented as conditionalization in a richer space via superconditioning, provided the posterior probability is absolutely continuous with respect to the prior.

  12.

    A metaphysically deflationary conception of possible worlds has it that a possible world is just a maximally complete set of sentences in some propositional language, instead of a “possible totality of facts.”

  13.

    Others, however, have offered more uniform accounts of supposition (e.g., Levi 1996).

  14.

    Though, as Diaconis and Zabell’s aforementioned result shows us, in a range of cases there is no mathematical necessity in adopting Jeffrey conditionalization in order to obtain the results of Jeffrey conditionalization.

  15.

    Though it is not uncontroversial that conditionalization or some other type of updating represents learning. Isaac Levi, for instance, writes, “All conditions of rationality are equilibrium conditions. In a sense they are synchronic conditions [...] Furthermore, in stating conditions of rational equilibrium, no prescription is made regarding the psychological path to be taken in moving from disequilibrium or from one equilibrium position to another. In other words, there are no norms prescribing rational learning processes” (Levi 1970).
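
The identities in notes 7 and 9 can be checked numerically. The following is a minimal sketch, not part of the original argument: the four-point space, the prior `p`, and the events `E`, `A`, and `B` are hypothetical values chosen only for illustration. It confirms that conditionalizing on \(E\) coincides with updating on the indicator likelihood of \(E\), and that the Bayes factor \({\mathcal {B}}(\varvec{q}, \varvec{p}; A:B)\) equals the likelihood ratio \(\varvec{p}(E|A)/\varvec{p}(E|B)\) when \(\varvec{q}\) comes from \(\varvec{p}\) by conditionalization.

```python
from fractions import Fraction as F

# ASSUMPTION: an illustrative four-point space and prior (not from the paper).
OMEGA = ["w1", "w2", "w3", "w4"]
p = {"w1": F(1, 10), "w2": F(2, 10), "w3": F(3, 10), "w4": F(4, 10)}


def prob(pmf, event):
    """Probability of an event (a set of points) under a pmf."""
    return sum(pmf[w] for w in event)


def conditionalize(pmf, event):
    """Bayesian conditionalization on an event, computed pointwise."""
    p_event = prob(pmf, event)
    return {w: (pmf[w] / p_event if w in event else F(0)) for w in pmf}


def update_on_likelihood(pmf, lam):
    """p^lambda(w) = p(w) lam(w) / sum_w' p(w') lam(w')."""
    norm = sum(pmf[w] * lam[w] for w in pmf)
    return {w: pmf[w] * lam[w] / norm for w in pmf}


E = {"w1", "w2", "w3"}   # hypothetical evidence
A = {"w1", "w2"}         # hypothetical hypotheses compared by the Bayes factor
B = {"w3", "w4"}

# Note 7: conditionalizing on E is updating on the indicator function of E.
indicator = {w: (F(1) if w in E else F(0)) for w in OMEGA}
q = conditionalize(p, E)
assert q == update_on_likelihood(p, indicator)

# Note 9: the Bayes factor equals the likelihood ratio p(E|A) / p(E|B).
bayes_factor = (prob(q, A) / prob(q, B)) / (prob(p, A) / prob(p, B))
likelihood_ratio = (prob(p, A & E) / prob(p, A)) / (prob(p, B & E) / prob(p, B))
assert bayes_factor == likelihood_ratio
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities are checked exactly rather than up to floating-point error.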

References

  1. Arló-Costa, H. (2007). The logic of conditionals. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Summer 2014 ed.). Stanford University: Metaphysics Research Lab.

  2. Baratgin, J., & Politzer, G. (2010). Updating: A psychologically basic situation of probability revision. Thinking & Reasoning, 16(4), 253–287.

  3. Blackwell, D., & Dubins, L. (1962). Merging of opinions with increasing information. The Annals of Mathematical Statistics, 33, 882–886.

  4. Christensen, D. (2009). Disagreement as evidence: The epistemology of controversy. Philosophy Compass, 4(5), 756–767.

  5. de Finetti, B. (1964). Foresight: Its logical laws, its subjective sources. In H. E. Kyburg & H. E. Smokler (Eds.), Studies in Subjective Probability. Hoboken: Wiley.

  6. Diaconis, P., & Zabell, S. L. (1982). Updating subjective probability. Journal of the American Statistical Association, 77(380), 822–830.

  7. Dietrich, F., & List, C. (2014). Probabilistic opinion pooling. In A. Hájek & C. Hitchcock (Eds.), Oxford Handbook of Probability and Philosophy. Oxford: Oxford University Press.

  8. Elga, A. (2007). Reflection and disagreement. Noûs, 41(3), 478–502.

  9. Elkin, L., & Wheeler, G. (2016). Resolving peer disagreements through imprecise probabilities. Noûs. doi:10.1111/nous.12143.

  10. Field, H. (1978). A note on Jeffrey conditionalization. Philosophy of Science, 45, 361–367.

  11. Gaifman, H., & Snir, M. (1982). Probabilities over rich languages, testing and randomness. The Journal of Symbolic Logic, 47(03), 495–548.

  12. Gaifman, H., & Vasudevan, A. (2012). Deceptive updating and minimal information methods. Synthese, 187(1), 147–178.

  13. Gärdenfors, P. (1982). Imaging and conditionalization. The Journal of Philosophy, 79, 747–760.

  14. Genest, C. (1984). A characterization theorem for externally Bayesian groups. The Annals of Statistics, 12, 1100–1105.

  15. Genest, C., McConway, K. J., & Schervish, M. J. (1986). Characterization of externally Bayesian pooling operators. The Annals of Statistics, 14, 487–501.

  16. Genest, C., & Wagner, C. G. (1987). Further evidence against independence preservation in expert judgement synthesis. Aequationes Mathematicae, 32(1), 74–86.

  17. Genest, C., & Zidek, J. V. (1986). Combining probability distributions: A critique and an annotated bibliography. Statistical Science, 1, 114–135.

  18. Girón, F. J., & Ríos, S. (1980). Quasi-Bayesian behaviour: A more realistic approach to decision making? Trabajos de Estadística y de Investigación Operativa, 31(1), 17–38.

  19. Good, I. J. (1983). Good Thinking: The Foundations of Probability and Its Applications. Minneapolis: U of Minnesota Press.

  20. Hájek, A., & Hall, N. (1994). The hypothesis of the conditional construal of conditional probability. In E. Eells & B. Skyrms (Eds.), Probability and conditionals: Belief revision and rational decision (pp. 75–112). Cambridge: Cambridge University Press.

  21. Hartmann, S. (2014). A new solution to the problem of old evidence. In Philosophy of Science Association 24th Biennial Meeting, Chicago, IL.

  22. Herron, T., Seidenfeld, T., & Wasserman, L. (1997). Divisive conditioning: Further results on dilation. Philosophy of Science, 64, 411–444.

  23. Huttegger, S. M. (2015). Merging of opinions and probability kinematics. The Review of Symbolic Logic, 8(04), 611–648.

  24. Jeffrey, R. (2004). Subjective Probability: The Real Thing. Cambridge: Cambridge University Press.

  25. Joyce, J. M. (1999). The Foundations of Causal Decision Theory. Cambridge: Cambridge University Press.

  26. Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79–86.

  27. Kyburg, H. E. (1987). Bayesian and non-Bayesian evidential updating. Artificial Intelligence, 31(3), 271–293.

  28. Kyburg, H. E., & Pittarelli, M. (1992). Some problems for convex Bayesians. In Proceedings of the Eighth International Conference on Uncertainty in Artificial Intelligence (pp. 149–154). Morgan Kaufmann Publishers Inc.

  29. Leitgeb, H. (2016). Imaging all the people. Episteme. doi:10.1017/epi.2016.14.

  30. Levi, I. (1967). Probability kinematics. British Journal for the Philosophy of Science, 18(3), 197–209.

  31. Levi, I. (1970). Probability and evidence. In M. Swain (Ed.), Induction, Acceptance, and Rational Belief (pp. 134–156). New York: Humanities Press.

  32. Levi, I. (1978). Irrelevance. In C. Hooker, J. Leach, & E. McClennen (Eds.), Foundations and Applications of Decision Theory (Vol. 1, pp. 263–273). Boston: Springer.

  33. Levi, I. (1980). The Enterprise of Knowledge. Cambridge, MA: MIT Press.

  34. Levi, I. (1985). Consensus as shared agreement and outcome of inquiry. Synthese, 62(1), 3–11.

  35. Levi, I. (1990). Pareto unanimity and consensus. The Journal of Philosophy, 87(9), 481–492.

  36. Levi, I. (1996). For the Sake of the Argument: Ramsey Test Conditionals, Inductive Inference and Nonmonotonic Reasoning. Cambridge: Cambridge University Press.

  37. Levi, I. (2009). Why indeterminate probability is rational. Journal of Applied Logic, 7(4), 364–376.

  38. Lewis, D. (1976). Probabilities of conditionals and conditional probabilities. The Philosophical Review, 85, 297–315.

  39. Madansky, A. (1964). Externally Bayesian Groups. Santa Monica, CA: RAND Corporation.

  40. Nau, R. F. (2002). The aggregation of imprecise probabilities. Journal of Statistical Planning and Inference, 105(1), 265–282.

  41. Pedersen, A. P., & Wheeler, G. (2014). Demystifying dilation. Erkenntnis, 79(6), 1305–1342.

  42. Raiffa, H. (1968). Decision analysis: Introductory lectures on choices under uncertainty. Random House.

  43. Ramsey, F. P. (1990). Truth and probability. In D. H. Mellor (Ed.), Philosophical Papers (pp. 52–109). Cambridge University Press.

  44. Russell, J. S., Hawthorne, J., & Buchak, L. (2015). Groupthink. Philosophical Studies, 172(5), 1287–1309.

  45. Savage, L. (1972, originally published in 1954). The Foundations of Statistics. New York: Wiley.

  46. Schervish, M., & Seidenfeld, T. (1990). An approach to consensus and certainty with increasing evidence. Journal of Statistical Planning and Inference, 25(3), 401–414.

  47. Seidenfeld, T. (1986). Entropy and uncertainty. Philosophy of Science, 53, 467–491.

  48. Seidenfeld, T., Kadane, J. B., & Schervish, M. J. (1989). On the shared preferences of two Bayesian decision makers. The Journal of Philosophy, 86(5), 225–244.

  49. Seidenfeld, T., Schervish, M. J., & Kadane, J. B. (2010). Coherent choice functions under uncertainty. Synthese, 172(1), 157–176.

  50. Seidenfeld, T., & Wasserman, L. (1993). Dilation for sets of probabilities. The Annals of Statistics, 21(3), 1139–1154.

  51. Skyrms, B. (1986). Choice and Chance: An Introduction to Inductive Logic (3rd ed.). Belmont: Wadsworth Publishing Company.

  52. Spohn, W. (2012). The Laws of Belief: Ranking Theory and Its Philosophical Applications. Oxford: Oxford University Press.

  53. Stewart, R. T., & Ojea Quintana, I. (2017). Probabilistic opinion pooling with imprecise probabilities. Journal of Philosophical Logic. doi:10.1007/s10992-016-9415-9.

  54. van Fraassen, B. C. (1989). Laws and Symmetry. Oxford: Clarendon Press.

  55. Wagner, C. (2002). Probability kinematics and commutativity. Philosophy of Science, 69(2), 266–278.

  56. Wagner, C. (2009). Jeffrey conditioning and external Bayesianity. Logic Journal of IGPL, 18(2), 336–345.

  57. Williams, P. M. (1980). Bayesian conditionalisation and the principle of minimum information. British Journal for the Philosophy of Science, 31, 131–144.


Acknowledgements

The bulk of this work was done while we were on a Junior Group Visiting Fellowship at the Munich Center for Mathematical Philosophy. The paper benefited from conversations with Stephan Hartmann and Hannes Leitgeb. We would especially like to thank Greg Wheeler for feedback, numerous relevant discussions, and support. We are grateful to Matt Duncan, Robby Finley, Arthur Heller, Isaac Levi, Michael Nielsen, Rohit Parikh, Paul Pedersen, Teddy Seidenfeld, and Reuben Stern for their excellent comments on drafts or presentations of the paper. Finally, thanks to an anonymous referee for his or her meticulous and valuable review.

Author information


Corresponding author

Correspondence to Rush T. Stewart.

Appendices

Appendix: Proofs

Proof of Proposition 2

Proof

We follow through Wagner’s proof for the precise case (2009, Theorem 3.3), adapting it for IP where necessary.

\((\Rightarrow )\) Assume that \({\mathcal {F}}\) is externally Bayesian, i.e., for all profiles and any likelihood function, \({\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n) = {\mathcal {F}}(\varvec{p}_1^\lambda ,\ldots , \varvec{p}_n^\lambda )\). We want to show that, for all partitions \(\varvec{E} = \{E_k\}\) of \(\Omega \) and all profiles in \({\mathbb {P}}^n\),

$$\begin{aligned} {\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n)= & {} \left\{ \dfrac{\sum _k b_k \varvec{p}[\cdot \in E_k]}{\sum _k b_k \varvec{p}(E_k)}: \varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\right\} \\= & {} {\mathcal {F}}\left( \dfrac{\sum _k b_k \varvec{p}_1[\cdot \in E_k]}{\sum _k b_k \varvec{p}_1(E_k)},\ldots , \dfrac{\sum _k b_k \varvec{p}_n[\cdot \in E_k]}{\sum _k b_k \varvec{p}_n(E_k)}\right) \\= & {} {\mathcal {F}}(\varvec{p}_{1J}^{\varvec{E}},\ldots , \varvec{p}_{nJ}^{\varvec{E}}) \end{aligned}$$

where the first and last equalities are definitional. Recall the definition of \(b_k\): \(b_k = {\mathcal {B}}(\varvec{q},\varvec{p};E_k:E_1) = \dfrac{\varvec{q}(E_k)/\varvec{q}(E_1)}{\varvec{p}(E_k)/\varvec{p}(E_1)}\), \(k = 1, 2,\ldots \). Set \(\lambda (\omega ) = \sum _k b_k [\omega \in E_k]\). Wagner observes that the following chain of equalities then holds for each \(\varvec{p}_i\), \(i = 1,\ldots , n\) (2009, (3.10), p. 342):

$$\begin{aligned} (\star )\quad \sum _{\omega \in \Omega } \lambda (\omega )\varvec{p}_i(\omega ) = \sum _{\omega \in \Omega }\varvec{p}_i(\omega )\sum _k b_k [\omega \in E_k] = \sum _k b_k \sum _{\omega \in \Omega }\varvec{p}_i(\omega )[\omega \in E_k] = \sum _k b_k \varvec{p}_i(E_k) \end{aligned}$$

Since each of the terms \(b_k \varvec{p}_i(E_k)\) is positive and \(\sum _k b_k \varvec{p}_i(E_k) < \infty \), \(\lambda \) is a likelihood function for \(\varvec{p}_i,\) with \(\varvec{p}^{\lambda}_{i}\) a defined, updated pmf for \(i = 1,\ldots , n.\) Using \((\star )\), we can obtain

$$\begin{aligned} {\mathcal {F}}(\varvec{p}_{1J}^{\varvec{E}},\ldots , \varvec{p}_{nJ}^{\varvec{E}}) = {\mathcal {F}}\left( \frac{\varvec{p}_1\lambda (\cdot )}{\sum _{\omega ' \in \Omega }\varvec{p}_1(\omega ')\lambda (\omega ')},\ldots , \frac{\varvec{p}_n\lambda (\cdot )}{\sum _{\omega ' \in \Omega } \varvec{p}_n(\omega ')\lambda (\omega ')}\right) \end{aligned}$$

by substituting, for each \(i=1,\ldots , n\), \(\lambda (\cdot )\) for \(\sum _k b_k [\omega \in E_k]\) in the numerator and \(\sum _{\omega ' \in \Omega } \varvec{p}_i(\omega ')\lambda (\omega ')\) for \(\sum _k b_k \varvec{p}_i(E_k)\) in the denominator. But by definition,

$$\begin{aligned} {\mathcal {F}}\left( \frac{\varvec{p}_1\lambda (\cdot )}{\sum _{\omega ' \in \Omega }\varvec{p}_1(\omega ')\lambda (\omega ')},\ldots , \frac{\varvec{p}_n\lambda (\cdot )}{\sum _{\omega ' \in \Omega } \varvec{p}_n(\omega ')\lambda (\omega ')}\right) = {\mathcal {F}}(\varvec{p}_1^\lambda ,\ldots , \varvec{p}_n^\lambda ) \end{aligned}$$

and by assumption \({\mathcal {F}}(\varvec{p}_1^\lambda ,\ldots , \varvec{p}_n^\lambda )={\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n)\). By definition, \({\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n) = \{\varvec{p}^\lambda : \varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\}\). But, for all \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\), \(\varvec{p}^\lambda = \frac{\sum _k b_k \varvec{p}[\cdot \in E_k]}{\sum _k b_k \varvec{p}(E_k)}\). Hence, \({\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n) = {\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n)\). So, \({\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n) = {\mathcal {F}}(\varvec{p}_{1J}^{\varvec{E}},\ldots , \varvec{p}_{nJ}^{\varvec{E}})\) follows from the assumption.

\((\Leftarrow )\) Suppose that \({\mathcal {F}}\) satisfies \(\textit{CJC}_W\) and that \(\lambda \) is a likelihood function for \(\varvec{p}_i, i = 1,\ldots , n\). Let \((\omega _1, \omega _2,\ldots )\) be a list of all of those \(\omega \in \Omega \) such that \(\lambda (\omega ) > 0\), and let \(\varvec{E} = \{E_1, E_2,\ldots \},\) where \(E_i:\,= \{\omega _i\}.\) Setting \(b_k = \frac{\lambda (\omega _k)}{\lambda (\omega _1)}\) for \(k = 1, 2,\ldots \), it follows that \(b_k>0\) and that \(b_1=1\). Since \(\lambda \) is a likelihood for \(\varvec{p}_i, i = 1,\ldots , n,\) we have \(\sum _k b_k \varvec{p}_i(E_k)<\infty , i = 1,\ldots , n,\) and that \((\varvec{q}_1,\ldots , \varvec{q}_n) \in {\mathbb {P}}^n,\) where \(\varvec{q}_i(\omega ):\,= \frac{\sum _k b_k \varvec{p}_i(\omega )[\omega \in E_k]}{\sum _k b_k \varvec{p}_i(E_k)}.\) From \(\textit{CJC}_W\), it follows that \(1)\ 0< \sum _k b_k \varvec{p}(E_k) < \infty \) for all \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n),\) and that \(2)\ {\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n) = {\mathcal {F}}(\varvec{p}_{1J}^{\varvec{E}},\ldots , \varvec{p}_{nJ}^{\varvec{E}})\). 1) implies that \(0<\sum _{\omega \in \Omega } \lambda (\omega ) \varvec{p}(\omega ) < \infty \) for all \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\), and 2) implies that \({\mathcal {F}}^\lambda (\varvec{p}_1,\ldots , \varvec{p}_n) = {\mathcal {F}}(\varvec{p}_1^\lambda ,\ldots , \varvec{p}_n^\lambda )\) (since substituting the definition of \(b_k\) in terms of \(\lambda \) in \(\frac{\sum _k b_k \varvec{p}_i(\omega )[\omega \in E_k]}{\sum _k b_k \varvec{p}_i(E_k)}\), the formula for obtaining the \(\varvec{q}_i\), reduces that formula to the formula for updating on that \(\lambda \)). \(\square \)
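
The key step in both directions is that Jeffrey conditioning parameterized by Bayes factors \(b_k\) is the same operation as updating on the likelihood \(\lambda (\omega ) = \sum _k b_k [\omega \in E_k]\). The following is a minimal numerical sketch of that step, not part of the proof: the partition, the Bayes factors, and the two pmfs are hypothetical values chosen for illustration. It checks the identity \((\star )\), the agreement of the two update formulas, and, for a single mixture, that updating a pooled pmf yields a reweighted mixture of the updated pmfs.

```python
from fractions import Fraction as F

OMEGA = ["w1", "w2", "w3", "w4"]
# ASSUMPTION: a hypothetical partition {E_1, E_2, E_3} and Bayes factors with b_1 = 1.
PARTITION = [{"w1", "w2"}, {"w3"}, {"w4"}]
B = [F(1), F(3), F(1, 2)]


def lam(w):
    """lambda(w) = sum_k b_k [w in E_k]."""
    return sum(b_k * (1 if w in cell else 0) for b_k, cell in zip(B, PARTITION))


def mass(pmf, cell):
    """p(E_k) for a cell E_k of the partition."""
    return sum(pmf[w] for w in cell)


def jeffrey_bayes_factor(pmf):
    """q(w) = sum_k b_k p(w)[w in E_k] / sum_k b_k p(E_k)."""
    norm = sum(b_k * mass(pmf, cell) for b_k, cell in zip(B, PARTITION))
    return {
        w: sum(b_k * pmf[w] * (1 if w in cell else 0)
               for b_k, cell in zip(B, PARTITION)) / norm
        for w in OMEGA
    }


def update_on_likelihood(pmf):
    """p^lambda(w) = p(w) lambda(w) / sum_w' p(w') lambda(w')."""
    norm = sum(pmf[w] * lam(w) for w in OMEGA)
    return {w: pmf[w] * lam(w) / norm for w in OMEGA}


def mix(alpha, pa, pb):
    """Convex combination alpha * pa + (1 - alpha) * pb."""
    return {w: alpha * pa[w] + (1 - alpha) * pb[w] for w in OMEGA}


# ASSUMPTION: two illustrative pmfs.
p1 = dict(zip(OMEGA, [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]))
p2 = dict(zip(OMEGA, [F(1, 8), F(1, 2), F(1, 4), F(1, 8)]))

norms = []
for p in (p1, p2):
    lhs = sum(lam(w) * p[w] for w in OMEGA)
    rhs = sum(b_k * mass(p, cell) for b_k, cell in zip(B, PARTITION))
    assert lhs == rhs                                   # the identity (star)
    assert jeffrey_bayes_factor(p) == update_on_likelihood(p)
    norms.append(lhs)

# Updating a mixture gives a mixture of the updates, with adjusted weight.
alpha = F(1, 2)
n1, n2 = norms
alpha_new = alpha * n1 / (alpha * n1 + (1 - alpha) * n2)
pooled_then_updated = update_on_likelihood(mix(alpha, p1, p2))
updated_then_pooled = mix(alpha_new, update_on_likelihood(p1), update_on_likelihood(p2))
assert pooled_then_updated == updated_then_pooled
```

The weight changes from `alpha` to `alpha_new`, but the updated mixture stays inside the convex hull of the updated pmfs. A single check of this kind illustrates the set-level equality and does not establish it.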

Proof of Proposition 5

Proof

We provide a case in which convex IP pooling and Jeffrey conditionalization as standardly construed do not commute. Let \(\varvec{q}\) be a common posterior distribution over the partition \(\varvec{E}\) for \(\varvec{p}_i\), \(i = 1,\ldots , n\), and let \(\varvec{q}_i\) come from \(\varvec{p}_i\) by Jeffrey conditionalization using \(\varvec{q}\). Let \({\mathcal {F}}_{J}^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) come from \({\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) by Jeffrey conditionalizing each \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) using \(\varvec{q}\). We offer a counterexample to commutativity in which \({\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1,\ldots , \varvec{p}_n) \ne {\mathcal {F}}(\varvec{q}_1,\ldots , \varvec{q}_n)\).

Let \(\Omega = \{\omega _1, \omega _2, \omega _3, \omega _4\}\), and consider the two pmfs listed in Table 2. Let \(\varvec{E} = \{E_1, E_2\}\) with \(E_1 = \{\omega _1, \omega _2\}\) and \(E_2 = \{\omega _3, \omega _4\}\) be a partition of \(\Omega \). Jeffrey updating both pmfs using \(\varvec{q}\), where \(\varvec{q}(E_1) = 2/3\) and \(\varvec{q}(E_2) = 1/3\), we obtain the posteriors listed in Table 3.

Table 2 Priors
Table 3 Posteriors

Consider the equal-weight mixture of \(\varvec{p}_1\) and \(\varvec{p}_2\), \(\varvec{p}^\star = 0.5\varvec{p}_1 + 0.5\varvec{p}_2\). It is clear that \(\varvec{p}^\star \in {\mathcal {F}}(\varvec{p}_1, \varvec{p}_2)\). Jeffrey conditionalizing \(\varvec{p}^\star \) with \(\varvec{q}\) gives us \(\varvec{q}^\star \). In particular, \(\varvec{q}^\star (\omega _1) = 2/9\) and \(\varvec{q}^\star (\omega _3) = 4/21\). By the definition of \({\mathcal {F}}_J^{\varvec{E}}\), \(\varvec{q}^\star \in {\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1, \varvec{p}_2)\). Any \(\varvec{q}_\star \in {\mathcal {F}}(\varvec{q}_1, \varvec{q}_2)\) is of the form \(\varvec{q}_\star = \alpha \varvec{q}_1 + (1 - \alpha ) \varvec{q}_2\) for \(\alpha \in [0, 1]\).

Suppose that \({\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1, \varvec{p}_2) = {\mathcal {F}}(\varvec{q}_1, \varvec{q}_2)\). Then, there is a \(\varvec{q}_\star \in {\mathcal {F}}(\varvec{q}_1, \varvec{q}_2)\) such that \(\varvec{q}^\star = \varvec{q}_\star \). In particular, \(\varvec{q}_\star (\omega _1) = 2/9\) and \(\varvec{q}_\star (\omega _3) = 4/21\). Letting \(\varvec{q}_\star (\omega _1) = 2/9\), we can compute \(\alpha \).

$$\begin{aligned} \tfrac{2}{9} = \varvec{q}_\star (\omega _1) = \alpha \varvec{q}_1(\omega _1) + (1 - \alpha )\varvec{q}_2(\omega _1) = \alpha \cdot \tfrac{1}{3} + (1 - \alpha ) \cdot \tfrac{2}{15} \end{aligned}$$

Solving, we get \(\alpha = 4/9\). However, we are supposed to have \(\varvec{q}_\star (\omega _3) = 4/21\). For \(\alpha = 4/9\), that is not the case.

$$\begin{aligned} \varvec{q}_\star (\omega _3) = \alpha \varvec{q}_1(\omega _3) + (1 - \alpha ) \varvec{q}_2(\omega _3) = \tfrac{4}{9} \cdot \tfrac{1}{6} + \tfrac{5}{9} \cdot \tfrac{2}{9} = \tfrac{16}{81} > \tfrac{4}{21} = \varvec{q}^\star (\omega _3) \end{aligned}$$

It follows that \({\mathcal {F}}_J^{\varvec{E}}(\varvec{p}_1, \varvec{p}_2) \ne {\mathcal {F}}(\varvec{q}_1, \varvec{q}_2)\). \(\square \)
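
The arithmetic of the counterexample can be replayed in exact rational arithmetic. The following is a sketch under one loudly flagged assumption: Table 2 is not reproduced here, so the priors `p1` and `p2` below are not taken from it. They were chosen because they yield the posterior values quoted in the proof (\(\varvec{q}_1(\omega _1) = 1/3\), \(\varvec{q}_2(\omega _1) = 2/15\), \(\varvec{q}_1(\omega _3) = 1/6\), \(\varvec{q}_2(\omega _3) = 2/9\), \(\varvec{q}^\star (\omega _1) = 2/9\), \(\varvec{q}^\star (\omega _3) = 4/21\)). The helper names are likewise illustrative.

```python
from fractions import Fraction as F

OMEGA = ["w1", "w2", "w3", "w4"]
E1, E2 = {"w1", "w2"}, {"w3", "w4"}
# Common posterior over the partition, as in the proof: q(E1) = 2/3, q(E2) = 1/3.
Q_ON_PARTITION = [(E1, F(2, 3)), (E2, F(1, 3))]

# ASSUMPTION: priors inferred from the posterior values quoted in the proof,
# not copied from Table 2.
p1 = dict(zip(OMEGA, [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]))
p2 = dict(zip(OMEGA, [F(1, 8), F(1, 2), F(1, 4), F(1, 8)]))


def jeffrey(pmf):
    """Standard Jeffrey conditionalization: q(w) = p(w | E_k) * q(E_k) for w in E_k."""
    out = {}
    for cell, target in Q_ON_PARTITION:
        cell_mass = sum(pmf[w] for w in cell)
        for w in cell:
            out[w] = pmf[w] / cell_mass * target
    return out


def mix(alpha, pa, pb):
    """Convex combination alpha * pa + (1 - alpha) * pb."""
    return {w: alpha * pa[w] + (1 - alpha) * pb[w] for w in OMEGA}


q1, q2 = jeffrey(p1), jeffrey(p2)
q_star = jeffrey(mix(F(1, 2), p1, p2))      # pool first, then update

# Solve q_star(w1) = alpha * q1(w1) + (1 - alpha) * q2(w1) for alpha.
alpha = (q_star["w1"] - q2["w1"]) / (q1["w1"] - q2["w1"])
candidate = mix(alpha, q1, q2)              # the only mixture matching on w1

print(alpha, candidate["w3"], q_star["w3"])  # prints: 4/9 16/81 4/21
```

Since the only mixture of \(\varvec{q}_1\) and \(\varvec{q}_2\) that agrees with \(\varvec{q}^\star \) on \(\omega _1\) disagrees with it on \(\omega _3\), no element of \({\mathcal {F}}(\varvec{q}_1, \varvec{q}_2)\) equals \(\varvec{q}^\star \).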

Proof of Proposition 6

Proof

We want to show that \({\mathcal {F}}(\varvec{q}_1,\ldots , \varvec{q}_n) = {\mathcal {F}}_I^E(\varvec{p}_1,\ldots , \varvec{p}_n)\), where \(\varvec{q}_i\) comes from \(\varvec{p}_i\) by general imaging on E, and \({\mathcal {F}}_I^E(\varvec{p}_1,\ldots , \varvec{p}_n)\) comes from \({\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) by general imaging each \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) on E. Again, we show both inclusions. In the proofs, we appeal to the fact that any element of a convex set is some convex combination of the generating, extreme points: For any \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n), \varvec{p}=\sum _{i=1}^n \alpha _i\varvec{p}_i\), where \(\alpha _i \ge 0\) for \(i = 1,\ldots , n\), and \(\sum _{i=1}^n \alpha _i= 1\) (see, e.g., Stewart and Ojea Quintana 2017, Lemma 1).

Let \(\varvec{q}\in {\mathcal {F}}(\varvec{q}_1,\ldots , \varvec{q}_n)\). So, \(\varvec{q}= \sum _{i=1}^n \alpha _i\varvec{q}_i\). Since \(\varvec{q}\) is a linear pool of \(\varvec{q}_i\) for \(i = 1,\ldots , n\), by Gärdenfors’ result, Theorem 5, \(\varvec{q}\) is also the result of imaging \(\varvec{p}= \sum _{i=1}^n\alpha _i\varvec{p}_i\) on E, because linear pooling and general imaging commute. Since \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\), it follows that \(\varvec{q}\in {\mathcal {F}}_I^E(\varvec{p}_1,\ldots , \varvec{p}_n)\).

For the other direction, assume that \(\varvec{q}\in {\mathcal {F}}_I^E(\varvec{p}_1,\ldots , \varvec{p}_n)\). So, \(\varvec{q}\) is the result of general imaging some \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n)\) on E. For any \(\varvec{p}\in {\mathcal {F}}(\varvec{p}_1,\ldots , \varvec{p}_n), \varvec{p}= \sum _{i=1}^n\alpha _i\varvec{p}_i\). By Gärdenfors’ result, \(\varvec{q}= \sum _{i=1}^n \alpha _i \varvec{q}_i\), where the \(\varvec{q}_i\) come from the \(\varvec{p}_i\) by general imaging on E, because general imaging and linear pooling commute. But then it follows that \(\varvec{q}\in {\mathcal {F}}(\varvec{q}_1,\ldots , \varvec{q}_n)\). \(\square \)
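
Both inclusions rest on the fact that general imaging is linear in the prior, so it passes through convex combinations with the weights unchanged. The following is a minimal sketch of that fact, assuming a particular representation that the text does not spell out: imaging on \(E\) is modeled by a hypothetical transfer table `TRANSFER`, where `TRANSFER[w][v]` is the share of the mass at `w` that moves to the \(E\)-world `v`. The space, the table, the pmfs, and the weights are all illustrative.

```python
from fractions import Fraction as F

OMEGA = ["w1", "w2", "w3"]
E = {"w1", "w2"}

# ASSUMPTION: a hypothetical transfer table for imaging on E. Each row sums
# to 1 and is supported on E; worlds already in E keep all of their mass.
TRANSFER = {
    "w1": {"w1": F(1)},
    "w2": {"w2": F(1)},
    "w3": {"w1": F(1, 3), "w2": F(2, 3)},
}


def image(pmf):
    """Image a pmf on E by moving each world's mass along TRANSFER."""
    out = {w: F(0) for w in OMEGA}
    for w, mass in pmf.items():
        for v, share in TRANSFER[w].items():
            out[v] += mass * share
    return out


def mix(weights, pmfs):
    """Linear pool: sum_i alpha_i * p_i, computed pointwise."""
    return {w: sum(a * p[w] for a, p in zip(weights, pmfs)) for w in OMEGA}


# ASSUMPTION: two illustrative pmfs and pooling weights.
p1 = dict(zip(OMEGA, [F(1, 2), F(1, 4), F(1, 4)]))
p2 = dict(zip(OMEGA, [F(1, 10), F(3, 10), F(6, 10)]))
weights = [F(1, 3), F(2, 3)]

pooled_then_imaged = image(mix(weights, [p1, p2]))
imaged_then_pooled = mix(weights, [image(p1), image(p2)])
assert pooled_then_imaged == imaged_then_pooled
```

Unlike the likelihood update in the proof of Proposition 2, no renormalization is involved here, which is why the mixture weights carry over unchanged.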


Cite this article

Stewart, R.T., Quintana, I.O. Learning and Pooling, Pooling and Learning. Erkenn 83, 369–389 (2018). https://doi.org/10.1007/s10670-017-9894-2
