A Further Look at the Bayes Blind Spot

  • Original Research
  • Erkenntnis

Abstract

Gyenis and Rédei (G&R) have shown that any prior p on a finite algebra A, however chosen, significantly restricts the set of posteriors derivable from p by Jeffrey conditioning (JC) on a nontrivial measurable partition (i.e., a partition consisting of members of A, at least one of which is not an atom of A). They support this claim by proving that the set of potential posteriors not derivable from p in this way, which they call the Bayes blind spot of p, is large, having cardinality c and normalized Lebesgue measure 1, as well as being of second Baire category for a natural metrizable topology. In the present paper, we establish results analogous to those of G&R for probability measures on any infinite sigma algebra of subsets of a countably infinite set (which requires distinctly different treatments of the topological and measure-theoretic cases). We also show, in both the finite and infinite cases, that all of the limitative results for a single prior p continue to hold for the intersection of the Bayes blind spots of countably many priors. This leads us to reject the claim of G&R that the large size of blind spots in the single prior case is attributable to the limitations imposed by priors. We argue instead that it is the so-called rigidity property of JC that accounts for the large size of Bayes blind spots. G&R also prove that any potential posterior q can be derived from a prior p by at most two applications of JC on nontrivial partitions. But they remark that their particular two-stage derivation of an \(r \in BS(p)\) from p is barely distinguishable from a single, complete reassessment on a trivial partition. We show, however, that there are two-stage derivations of r from p that require no direct assessment of the probability of any atom at either stage. 
Finally, in order to situate the aforementioned limitative results in the proper context, we demonstrate that a probability revision that would amount to complete reassessment within a Jeffrey evidentiary framework may arise, in a different evidentiary framework, in a way that involves no direct assessment of the probability of any atom, as, for example, in a generalization of Jeffrey’s solution to the problem of old evidence and new explanation.

Notes

  1. This assumption, which is also made in G&R (2021), allows us to avoid continually having to specify in various definitions and theorems that certain probabilities are nonzero. In particular, if p is strictly coherent, then every probability measure q is absolutely continuous with respect to p (i.e., if \(p(A) = 0\), then \(q(A) = 0\)), and hence not ruled out a priori as coming from \(p\) by strict conditioning or by Jeffrey conditioning. Recall that if probabilities are construed as the (linear utility) prices one is willing to pay for certain bets, then conforming your probabilities to the axioms of finitely additive probability (so-called coherent probabilities) protects you against a Dutch book, i.e., accepting a finite sequence of bets on which you are sure to sustain a net loss. It is commonly believed that probability measures must be strictly coherent in order to avoid accepting bets on which a net gain is impossible, but a net loss is possible (a so-called weak Dutch book). But see Wagner (2007) regarding a slightly modified conception of subjective probability in which mere additivity suffices not only to protect you against sure loss, but also against vulnerability to a weak Dutch book.

  2. This demonstrates that the assertion, once made to one of us, that “requiring rigidity as a precondition for employing JC turns it into a tautology,” is ill-founded. For you can always go wrong in judging that the rigidity assumption is warranted.

  3. As an illustration of the perils of parochialism, see Weisberg’s (2009) critique of JC, based on a parochial construal of JC, and its rebuttal in Wagner (2013).

  4. Jeffrey conditioning may also arise by strict conditioning on a single event E, as illustrated by the following example: Suppose that \(\mathbf{A}\) is the algebra generated by the partition \(\mathbf{E} = \{ E_{1} ,...,E_{n} \}\) of X, along with the hypothesis \(H \subset X\) and some \(E \subset X.\) Let p be a probability measure on \(\mathbf{A}\) such that \(p(EE_{i} ) > 0\) for \(i = 1,...,n,\) with \(0 < p(H) < 1,\) and let \(q(A) := p(A|E)\), for all \(A \in \mathbf{A}\). Suppose that H and E are p-conditionally independent, given \(E_{i}\) (i.e., \(p(H|EE_{i} ) = p(H|E_{i} )\)), for \(i = 1,...,n.\) Then, for \(i = 1,...,n,\) \(q(H|E_{i} ) = p(H|EE_{i} ) = p(H|E_{i} )\) and, indeed, \(q(A|E_{i} ) = p(A|E_{i} )\), for all \(A \in \mathbf{A}^{*}\), the subalgebra of \(\mathbf{A}\) generated by \(\mathbf{E}\) and H. So, on \(\mathbf{A}^{*}\), q comes from p by JC on \(\mathbf{E}\). Jeffrey’s motivation for developing probability kinematics was of course that the experience prompting the revision of p often fails to be representable as the occurrence of a single event E. But it is sometimes useful, when thinking about rigidity, to imagine such a notional “phenomenological event” E, and to think about conditional independence instead. When \(\mathbf{A}\) is finite, this is always a formal possibility (see Diaconis and Zabell (1982, Theorem 2.1) for full details).

  5. If \(\mathbf{E}\) and \(\mathbf{F}\) are distinct partitions of a set X, then \(\mathbf{F}\) is coarser than \(\mathbf{E}\) if, for every \(E \in \mathbf{E}\), there exists an \(F \in \mathbf{F}\) such that \(E \subseteq F\) (equivalently, if every \(F \in \mathbf{F}\) is a union of members of \(\mathbf{E}\)).

  6. It may be worth recalling here that a finite algebra of subsets of an arbitrary set is ipso facto a sigma algebra, since every countable union of such subsets is equal to some finite union of those subsets. Similarly, every finitely additive probability measure on a finite algebra is countably additive, since, in every infinite sequence of pairwise disjoint sets from that algebra, all but finitely many are equal to the empty set.

  7. Since \(S \subseteq [0,1]^{P}\), where \(P = \{ 1,2,...\}\), \(|S| \le c^{\aleph_{0}} = (2^{\aleph_{0}})^{\aleph_{0}} = 2^{\aleph_{0} \times \aleph_{0}} = 2^{\aleph_{0}} = c.\)

    Also, \(S \supseteq \{ (x,1 - x,0,0,...):0 \le x \le 1\}\), and so \(|S| \ge c\).

  8. Their proof extends easily to a countably infinite set \(X = \{ a,b,x_{1} ,x_{2} ,...\}\), using the partitions

    E \(= \{ \{ a\} ,\{ b\} ,\{ x_{1} ,x_{2} ,...\} \}\) and F \(= \{ \{ a,b\} ,\{ x_{1} \} ,\{ x_{2} \} ,...\}\).

  9. The best known example is Einstein’s explanation of the previously observed advance in the perihelion of Mercury (E) by means of the general theory of relativity (H). See Weinberg (1992, p. 94).

  10. Glymour actually identified more than one old evidence problem. The problem considered here is what Garber (1983) calls the historical problem of old evidence. Jeffrey (1995) calls it the problem of new explanation.

  11. So the conditions under which generalized reparation is applied (unconditional probabilities remain unchanged, conditional probabilities are revised) are antipodal to those under which JC is applied (unconditional probabilities are revised, conditional probabilities remain unchanged).

  12. See Wagner (2003) for an elaboration of this point, and additional references.

  13. It is easy to show that (5.1) is equivalent to the simpler conditions

    \(\frac{r(A)/r(HE)}{{p(A)/p(HE)}} = \frac{{p_{1} (A)/p_{1} (HE)}}{{p_{0} (A)/p_{0} (HE)}},\, {\text{for}}\;{\text{all}}\;A \in \{ HE^{c} ,H^{c} E,H^{c} E^{c} \} .\)

  14. Especially in cases involving measure and category, disagreements abound. For example, the “fat” Cantor set \(C_{n}\) in \([0,1]\) has measure greater than \(1 - n^{ - 1}\). So \(\cup_{n \ge 1} C_{n}\) has measure 1, yet is a countable union of nowhere dense sets, hence of first category. On the other hand, let \((r_{i} )_{i \ge 1}\) be a list of the rational numbers in the interval (0,1). For all \(n,m \ge 1\), let \(K_{n,m} := (r_{n}-2^{-(n+m)} ,r_{n}+2^{-(n+m)} ) \cap (0,1)\), and for all \(m \ge 1\), let \(K_{m} := \cup_{n \ge 1} K_{n,m} .\) Finally, let \(K := \cap_{m \ge 1} K_{m} .\) Then \(K\) is of second category in [0,1] with the usual topology (since its complement is of first category, being a countable union of nowhere dense sets), yet has measure zero, since each \(K_{m}\) has measure at most \(2^{1-m}\).

  15. By comparison, there are obviously second category subsets of a topological space whose complements are also second category, and sets of cardinality c whose complements in some superset also have cardinality c.
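The example in Note 4 can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a two-block partition \(E_{1}, E_{2}\) and hypothetical values for \(p(E_{i})\), \(p(H|E_{i})\) and \(p(E|E_{i})\), builds a prior p under which H and E are conditionally independent given each \(E_{i}\), conditions strictly on E, and confirms that the resulting q is rigid on the partition while the block probabilities change. All names (`w`, `h`, `e`, `prob`, `cond`) are illustrative.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical inputs (assumptions for illustration only):
w = [F(1, 2), F(1, 2)]   # p(E_i)
h = [F(1, 3), F(3, 4)]   # p(H | E_i)
e = [F(1, 5), F(4, 5)]   # p(E | E_i)

# Atoms are triples (i, in_H, in_E). The product form below makes H and E
# conditionally independent given E_i, as Note 4 requires.
p = {}
for i, in_H, in_E in product(range(2), (True, False), (True, False)):
    p[(i, in_H, in_E)] = (
        w[i] * (h[i] if in_H else 1 - h[i]) * (e[i] if in_E else 1 - e[i])
    )

def prob(measure, event):
    """Probability of the set of atoms satisfying the predicate `event`."""
    return sum(v for atom, v in measure.items() if event(atom))

def cond(measure, event, given):
    """Conditional probability of `event` given `given`."""
    return prob(measure, lambda a: event(a) and given(a)) / prob(measure, given)

# Strict conditioning on the single event E.
p_E = prob(p, lambda a: a[2])
q = {atom: (v / p_E if atom[2] else F(0)) for atom, v in p.items()}

in_H = lambda a: a[1]
block = lambda i: (lambda a: a[0] == i)

# Rigidity: q(H | E_i) = p(H | E_i) for each block.
rigid = all(cond(q, in_H, block(i)) == cond(p, in_H, block(i)) for i in range(2))

# Jeffrey's formula: q(H) = sum_i q(E_i) p(H | E_i).
jeffrey_q_H = sum(prob(q, block(i)) * cond(p, in_H, block(i)) for i in range(2))
```

With these inputs the block probabilities move from (1/2, 1/2) to (1/5, 4/5), while the conditional probabilities of H given each block are unchanged, which is exactly the rigidity condition under which JC applies.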

References

  • Diaconis, P., & Zabell, S. (1982). Updating subjective probability. Journal of the American Statistical Association, 77(380), 822–830.

  • Friedman, A. (1982). Foundations of modern analysis. Dover.

  • Garber, D. (1983). Old evidence and logical omniscience. In J. Earman (Ed.), Testing scientific theories (pp. 99–131). University of Minnesota Press.

  • Glymour, C. (1980). Theory and evidence. Princeton University Press.

  • Gyenis, Z., & Rédei, M. (2017). General properties of Bayesian learning as statistical inference determined by conditional expectations. The Review of Symbolic Logic, 10(4), 719–737.

  • Jeffrey, R. (1983). The logic of decision (2nd ed.). University of Chicago Press.

  • Jeffrey, R. (1992). Probability and the art of judgment. Cambridge University Press.

  • Jeffrey, R. (1995). Probability reparation: The problem of new explanation. Philosophical Studies, 77, 97–102.

  • Rédei, M., & Gyenis, Z. (2021). Having a look at the Bayes blind spot. Synthese, 198(4), 3801–3832.

  • Rényi, A. (1970). Foundations of probability. Holden-Day.

  • Van Fraassen, B. (1980). Rational belief and probability kinematics. Philosophy of Science, 47, 165–187.

  • Wagner, C. (1997). Old evidence and new explanation. Philosophy of Science, 64, 677–691.

  • Wagner, C. (1999). Old evidence and new explanation II. Philosophy of Science, 66, 283–288.

  • Wagner, C. (2001). Old evidence and new explanation III. Philosophy of Science, 68, S165–S175.

  • Wagner, C. (2003). Commuting probability revisions: The uniformity rule. Erkenntnis, 59, 349–364.

  • Wagner, C. (2007). The Smith–Walley interpretation of subjective probability: An appreciation. Studia Logica, 86, 343–350.

  • Wagner, C. (2013). Is conditioning really incompatible with holism? Journal of Philosophical Logic, 42, 409–414.

  • Weinberg, S. (1992). Dreams of a final theory. Pantheon Press.

  • Weisberg, J. (2009). Commutativity or holism? A dilemma for conditionalizers. The British Journal for the Philosophy of Science, 60, 793–812.

Author information

Corresponding author

Correspondence to Carl Wagner.

Ethics declarations

Conflict of interest

The authors have no financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Mathematical Appendix

Theorem 3.1.

Let \(\mathbf{P}\) be a nonempty, countable set of strictly coherent probability distributions on the countable set X, with \(|X| \ge 2.\) Then the cardinality of \(BS(\mathbf{P})\) is equal to c.

Proof.

We treat only the case where X is denumerably infinite, as the simple modifications required to accommodate the finite case of X will be apparent. We show first that \(BS(\mathbf{P})\) is nonempty. Let \(\mathbf{P} = \{ {\mathbf{p}}^{(1)} ,{\mathbf{p}}^{(2)} ,...\}\), where \({\mathbf{p}}^{(k)}= (p_{i}^{(k)} )_{i \ge 1}\) with \(\sum\nolimits_{i \ge 1} {p_{i}^{(k)} } = 1\) and \(p_{i}^{(k)} > 0\) for all \(i\), for each \(k \ge 1.\) Let \(m_{1}\) be any positive real number. If \(i \ge 2\), and the positive real numbers \(m_{1} ,...,m_{i - 1}\) have been chosen, choose \(m_{i}\) such that (i) \(0<{m}_{i}<{2}^{-i}\) and (ii) for all \(1 \le j \le i - 1\) and \(k \ge 1,\) \(m_{i} /p_{i}^{(k)} \ne m_{j} /p_{j}^{(k)} .\) Note that such a sequence can be constructed since, at each step, only a countable number of possible values for \(m_{i}\) are being excluded from the (uncountable) set \((0,2^{ - i} )\). Further, the series \(\sum\nolimits_{i \ge 1} {m_{i} }\) is convergent, with, say, \(\sum\nolimits_{i \ge 1} {m_{i} } = m\). Define \(\mathbf{q} = (q_{i} )_{i \ge 1}\) by setting \(q_{i} = m_{i} /m\). Then the sequence \((q_{i} /p_{i}^{(k)} )_{i \ge 1}\) has all its terms distinct, for each \(k \ge 1\), by construction, with \(\sum\nolimits_{i \ge 1} {q_{i} } = 1\), so \(\mathbf{q} \in BS(\mathbf{P})\).

Let \(\varepsilon = \min \{ 1 - q_{1} ,q_{2} \} .\) Given \(0 < \delta < \varepsilon\), let \(s_{\delta }^{(k)}\) be the sequence defined by \(\frac{{q_{1} + \delta }}{{p_{1}^{(k)} }},\frac{{q_{2} - \delta }}{{p_{2}^{(k)} }},\frac{{q_{3} }}{{p_{3}^{(k)} }},\frac{{q_{4} }}{{p_{4}^{(k)} }},...\) for each \(k \ge 1\). Note that \(\mathbf{q} \in BS(\mathbf{P})\) implies that there are only a countable number of values of \(\delta\) for which \(s_{\delta }^{(j)}\) fails to have all its terms distinct for some \(j \ge 1.\) Then for all other values of \(\delta\) in the interval \((0,\varepsilon )\), we have \(\mathbf{q}_{\delta } = (q_{1} + \delta ,q_{2} - \delta ,q_{3} ,q_{4} ,...) \in BS(\mathbf{P})\). Hence \(|BS(\mathbf{P})| \ge c\), as it contains a set of cardinality \(c\). On the other hand, \(BS(\mathbf{P}) \subset [0,1]^{P}\), where \(P = \{ 1,2,...\}\), and so \(|BS(\mathbf{P})| \le |[0,1]|^{|P|} = c\), whence \(|BS(\mathbf{P})| = c.\)\(\square\)
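The recursive choice of the \(m_{i}\) in the first part of the proof can be sketched in code. This is an illustrative finite truncation only, under several assumptions not in the paper: X has n points, there are finitely many priors, all values are exact rationals, and candidate values for \(m_{i}\) are tried in the hypothetical order \(2^{-i}/2, 2^{-i}/3, \ldots\) until no ratio collides. The function names are made up for this sketch.

```python
from fractions import Fraction as F

def in_blind_spot(q, p):
    """True if the ratios q_i / p_i are pairwise distinct (finite case)."""
    ratios = [qi / pi for qi, pi in zip(q, p)]
    return len(set(ratios)) == len(ratios)

def blind_spot_point(priors):
    """Build q lying in the blind spot of every prior in `priors`.

    `priors` is a list of equal-length lists of positive Fractions,
    each summing to 1.
    """
    n = len(priors[0])
    m = []
    for i in range(n):
        # Position i here corresponds to index i + 1 in the proof,
        # so the bound 2^{-(i+1)} plays the role of 2^{-i} there.
        upper = F(1, 2 ** (i + 1))

        def collides(value):
            # A collision means value / p_i equals m_j / p_j for some
            # earlier j and some prior p.
            return any(
                value / p[i] == m[j] / p[j] for p in priors for j in range(i)
            )

        candidate, d = upper / 2, 3
        while collides(candidate):
            # Only finitely many values are excluded, so this terminates.
            candidate, d = upper / d, d + 1
        m.append(candidate)
    total = sum(m)
    return [x / total for x in m]  # normalize so the q_i sum to 1

example_priors = [
    [F(1, 2), F(1, 4), F(1, 8), F(1, 8)],
    [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
]
q_example = blind_spot_point(example_priors)
```

Normalizing by the total rescales every ratio \(m_{i}/p_{i}^{(k)}\) by the same constant, so distinctness is preserved, which mirrors the step \(q_{i} = m_{i}/m\) in the proof.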

Theorem 3.2.

Let \(\mathbf{p} = (p_{i} )_{i \ge 1} \in S\), with \(p_{i} > 0\) for all i. Then, in the \(l^{1}\)-norm topology on S, \(BS(\mathbf{p})\) (i) is of the second category, (ii) is dense in S, and (iii) has an empty interior.

Proof.

(i) Given \(1 \le i < j\), let \(S_{i,j}\) denote the set consisting of those members \(\mathbf{q} = (q_{k} )_{k \ge 1}\) of S such that \(q_{i} /p_{i} = q_{j} /p_{j}\). Note that \(S_{i,j}\) is a closed subset of S having an empty interior, as one can clearly find \(\mathbf{v} \in S - S_{i,j}\) with \(|\mathbf{v} - \mathbf{q}| < \varepsilon\) for any given \(\varepsilon > 0\) and \(\mathbf{q} \in S_{i,j}\) (upon slightly perturbing the entries \(q_{i}\) and \(q_{j}\) in \(\mathbf{q}\)). Since the closure of \(S_{i,j}\) has empty interior, \(S_{i,j}\) is nowhere dense for each i and j, and thus

$$ BS({\mathbf{p}})^{c} = \bigcup\limits_{1 \le i < j} {S_{i,j} } $$

is a countable union of nowhere dense sets. Hence, by definition, \(BS(\mathbf{p})^{c}\) is of the first category. Now the fact that S is complete implies that it is of the second category, by the Baire category theorem (Friedman, 1982, p. 106). But then the fact that \(S = BS(\mathbf{p}) \cup BS(\mathbf{p})^{c}\) implies that \(BS(\mathbf{p})\) must be of the second category, for otherwise \(BS(\mathbf{p}) \cup BS(\mathbf{p})^{c}\) would be of the first category, being a countable union of nowhere dense sets.

(ii) Let \(\mathbf{q} \in S\). We will find a member of \(BS(\mathbf{p})\) whose \(l^{1}\)-distance from \(\mathbf{q}\) is arbitrarily small. Let \(0 < \varepsilon < 1/2\) be given. Define a sequence \(\mathbf{r} = (r_{1} ,r_{2} ,...)\) of nonnegative real numbers as follows: choose any \(r_{1}\), and for \(n > 1,\) let \(r_{n} \ge 0\) be such that \(r_{n} /p_{n}\) is distinct from all of the values \(r_{i} /p_{i}\), for \(1 \le i \le n - 1.\) Further, we may assume for all \(n \ge 1\) that \(|r_{n} - q_{n} | < \varepsilon \cdot 2^{ - n}\). Then \(\sum\nolimits_{n \ge 1} {r_{n} }\) is a convergent series with

$$ |{\mathbf{r}} - {\mathbf{q}}| < \sum\limits_{n \ge 1} {\varepsilon \cdot 2^{ - n} } = \varepsilon , $$

and hence \(\big| |\mathbf{q}| - |\mathbf{r}| \big| < \varepsilon\) implies that \(1/2 < 1 - \varepsilon < |\mathbf{r}| < 1 + \varepsilon\). Let \(r_{n}^{\prime } = r_{n} /|\mathbf{r}|\) and \({\mathbf{r}}^{\prime} = (r_{n}^{\prime } )_{n \ge 1} .\) Then \({\mathbf{r}}^{\prime} \in BS(\mathbf{p})\) and

$$ \begin{aligned} |{\mathbf{r}}^{\prime} - {\mathbf{q}}| & = |{\mathbf{r}}|^{ - 1} |{\mathbf{r}} - |{\mathbf{r}}| \cdot {\mathbf{q}}| \le 2|{\mathbf{r}} - |{\mathbf{r}}| \cdot {\mathbf{q}}| \\ & \le 2(|{\mathbf{r}} - {\mathbf{q}}| + |1 - |{\mathbf{r}}|| \cdot |{\mathbf{q}}|) < 2(\varepsilon + \varepsilon \cdot 1) \\ & = 4\varepsilon , \\ \end{aligned} $$

whence BS(p) is dense in S.

(iii) Let \(\mathbf{q} = (q_{i} )_{i \ge 1} \in BS(\mathbf{p})\). We will find \({\mathbf{q}}^{\prime} = (q_{i}^{\prime } )_{i \ge 1} \in S - BS(\mathbf{p})\) such that \(|\mathbf{q} - {\mathbf{q}}^{\prime}|\) is arbitrarily small. Let \(0 < \varepsilon < q_{2}\), where we assume for now that \(q_{2} > 0\). Let \(n \ge 1\) be large enough so that \(\max \{ q_{i} , p_{i} /p_{1} \} < \varepsilon\) for all \(i \ge n\). First assume \(q_{n} /p_{n} - q_{1} /p_{1} > 0\) and let \(\delta = q_{n} - p_{n} q_{1} /p_{1} > 0.\) Let the sequence \({\mathbf{q}}^{\prime} = (q_{i}^{\prime } )_{i \ge 1}\) be defined by \(q_{i}^{\prime } = q_{i}\) if \(i \ne n,n + 1,\) with \(q_{n}^{\prime } = q_{1} p_{n} /p_{1}\) and \(q_{n + 1}^{\prime } = q_{n + 1} + \delta .\) One may verify that \({\mathbf{q}}^{\prime} \in S - BS(\mathbf{p})\), since \(q_{n}^{\prime } /p_{n} = q_{1}^{\prime } /p_{1}\). Then \(p_{n} q_{1} /p_{1} \le \varepsilon q_{1} < \varepsilon\) implies \(|q_{n} - q_{n}^{\prime } | < \varepsilon ,\) being the difference between two nonnegative real numbers less than \(\varepsilon\), and, further, \(|q_{n + 1} - q_{n + 1}^{\prime } | = \delta < \varepsilon .\) Thus we get \(|\mathbf{q} - {\mathbf{q}}^{\prime}| = |q_{n} - q_{n}^{\prime } | + |q_{n + 1} - q_{n + 1}^{\prime } | < 2\varepsilon\). Now assume \(q_{n} /p_{n} - q_{1} /p_{1} < 0\). Let \(\rho = p_{n} q_{1} /p_{1} - q_{n} > 0\) and note that \(\rho \le p_{n} q_{1} /p_{1} \le \varepsilon q_{1} < q_{2}\). Define the \(q_{i}^{\prime }\) in this case by \(q_{i}^{\prime } = q_{i}\) if \(i \ne 2,n\), with \(q_{2}^{\prime } = q_{2} - \rho\) and \(q_{n}^{\prime } = p_{n} q_{1} /p_{1}\) (note that \(n > 2\) since \(\varepsilon < q_{2}\) by assumption). Then we have \(|q_{2} - q_{2}^{\prime } |\), \(|q_{n} - q_{n}^{\prime } | < \varepsilon\) and thus \(|\mathbf{q} - {\mathbf{q}}^{\prime}| < 2\varepsilon\). On the other hand, if \(q_{2} = 0\), then \(\mathbf{q} \in BS(\mathbf{p})\) implies \(q_{3} > 0\), and one may proceed similarly as before with \(q_{3}\) in place of \(q_{2}\), upon requiring \(0 < \varepsilon < q_{3}\). This implies in all cases that \(\mathbf{q}\) cannot be an interior point of \(BS(\mathbf{p})\), whence \(BS(\mathbf{p})\) has an empty interior.\(\square\)
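As a worked instance of the construction in part (iii), consider the hypothetical choices \(p_{i} = 2^{-i}\) and \(q_{i} = 2 \cdot 3^{-i}\) (these sequences are assumptions made here for illustration and do not appear in the paper). The ratios \(q_{i} /p_{i} = 2(2/3)^{i}\) are strictly decreasing, so \(\mathbf{q} \in BS(\mathbf{p})\). Taking \(\varepsilon = 1/6 < q_{2} = 2/9\), the condition \(\max \{ q_{i} ,p_{i} /p_{1} \} < \varepsilon\) for all \(i \ge n\) holds with \(n = 4\), and \(q_{4} /p_{4} = 32/81 < 4/3 = q_{1} /p_{1}\), so the second case of the proof applies:

$$ \begin{aligned} \rho & = \frac{p_{4} q_{1}}{p_{1}} - q_{4} = \frac{1}{12} - \frac{2}{81} = \frac{19}{324}, \\ q_{2}^{\prime} & = q_{2} - \rho = \frac{2}{9} - \frac{19}{324} = \frac{53}{324}, \qquad q_{4}^{\prime} = \frac{p_{4} q_{1}}{p_{1}} = \frac{1}{12}, \\ \frac{q_{4}^{\prime}}{p_{4}} & = \frac{16}{12} = \frac{4}{3} = \frac{q_{1}^{\prime}}{p_{1}}, \qquad |{\mathbf{q}} - {\mathbf{q}}^{\prime}| = 2\rho = \frac{19}{162} < 2\varepsilon = \frac{1}{3}. \\ \end{aligned} $$

So \({\mathbf{q}}^{\prime}\) has two equal ratios and lies outside \(BS(\mathbf{p})\), at \(l^{1}\)-distance less than \(2\varepsilon\) from \(\mathbf{q}\). Repeating the computation with smaller \(\varepsilon\) forces a larger n and a smaller \(\rho\).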

The preceding result may be extended as follows:

Theorem 3.3.

With respect to the \(l^{1}\)-norm topology on S, the blind spot BS(P), where P is a nonempty countable subset of S, (i) is of the second category, (ii) is dense in S, and (iii) has an empty interior.

Proof.

We make simple modifications to the preceding proof where required. To establish (i), note that a countable intersection of sets, each of whose complements is of first category, again has a complement of first category, and hence is of second category, since S is. For (ii), observe that the same proof applies when there are countably many distributions p, since for each \(n > 1\), one has an interval of potential values for \(r_{n}\) from which at most countably many values are excluded. Property (iii) follows from Theorem 3.2(iii), since BS(P) is contained in BS(p) for each p in P and the interior operator respects subset inclusions.\(\square\)

Discussion: By property (iii) of Theorem 3.2, the countably infinite case differs fundamentally from the finite case treated by G&R, in which BS(p) was shown to be an open dense subset of S. (For a more basic example of a subset T of a metric space Y such that T is dense in Y and of second category, yet has an empty interior, consider the irrationals within the reals.) Further, BS(p) is neither open nor closed when X is denumerably infinite, since both BS(p) and its complement have empty interior.

Recall that, for each \(p \ge 1\), the \(l^{p}\)-norm of \(\mathbf{s} = (s_{i} )_{i \ge 1}\) is defined by \(\left( {\sum\nolimits_{i \ge 1} {|s_{i} |^{p} } } \right)^{1/p}\), with the limiting case as \(p \to \infty\), denoted by \(l^{\infty }\), given by \(\max \{ |s_{i} | :i \ge 1\}\). Since the topology of the \(l^{p}\)-norm strictly refines that of the \(l^{q}\)-norm for \(1 \le p < q \le \infty ,\) with the \(l^{p}\)-metric complete for each \(p \ge 1\), Theorems 3.2 and 3.3 also hold for all of the \(l^{p}\)-norms on S. Further, the results of these theorems also hold for the topology induced by the complete metrics \(d(\mathbf{u},\mathbf{v}) := |\mathbf{u} - \mathbf{v}|/(1 + |\mathbf{u} - \mathbf{v}|)\), where \(| \cdot |\) denotes any \(l^{p}\)-norm. Note that d is bounded by 1, yet induces the same topology on S as the corresponding \(l^{p}\)-norm.

Finally, Theorem 3.2 can be generalized in another way, as follows: For all \(l \ge 1\), let \(S_{l}\) be the subset of S consisting of those distributions \(\mathbf{q} = (q_{i} )_{i \ge 1}\) for which there are at least l pairs \((n,m)\), with \(1 \le n < m\), such that \(q_{n} /p_{n} = q_{m} /p_{m}\). Then, for all \(l \ge 1\), \(S_{l + 1} \subset S_{l}\), and it can be shown that \(S_{l}\) is dense in S (the proof of which we leave to interested readers). Clearly, \(S_{1}^{c} = BS(\mathbf{p})\), and, for all \(l \ge 1\), \(S_{l + 1}^{c} \supset S_{l}^{c}\). Since \(S_{l}^{c} \supseteq BS(\mathbf{p})\), \(S_{l}^{c}\) is trivially dense in S, and of the second category. More significantly, since \(S_{l}\) is dense in S, each \(S_{l}^{c}\) has an empty interior. Note that, in enlarging \(BS(\mathbf{p})\), \(S_{l}^{c}\) is implicitly enlarging the set consisting only of the trivial partition of \(X\) to include partitions of \(X\) with a limited number of non-singleton blocks.

Open questions: (1) Is \(\bigcap\nolimits_{l \ge 1} {S_{l} }\) dense in S? (2) To what degree can the results in Theorems 3.2 and 3.3 be extended beyond the class of topologies on S corresponding to \(l^{p}\)-norms? In particular, is it possible to find general sufficient (more desirably, necessary and sufficient) conditions on a complete metric d on S which would ensure that BS(p) is of second category in (S, d) for all p?

Theorem 3.4.

There exists a probability measure on S such that, for each strictly coherent \(\mathbf{p} = (p_{i} )_{i \ge 1}\), \(BS(\mathbf{p})\) has probability 1 with respect to this measure.

Proof.

Let \(X_{1}\) be the uniform random variable on the interval \([0,1)\). Define the random variables \(X_{i}\) for \(i > 1\) recursively by letting \(X_{i}\) be uniform on the interval \([0,1 - \sum\nolimits_{j = 1}^{i - 1} {x_{j} } )\), where \(X_{j} = x_{j}\) for \(1 \le j \le i - 1.\) Then \(X_{1} + X_{2} + \cdots \to 1\) almost surely, and we consider the set of possible outcomes \((x_{1} ,x_{2} ,...)\) wherein \(X_{i} = x_{i}\) for all i, which may be identified with members of the set S. Note that the outcomes \((x_{1} ,x_{2} ,...)\) with only finitely many nonzero entries have probability zero of occurring, and hence the subset of all such members of S has measure zero.

Given integers \(1 \le a < b\), consider \(P(X_{a} = cX_{b} )\), where \(c = c_{a,b}\) is defined by \(c = p_{a} /p_{b}\). Then we have

$$ \begin{aligned} P(X_{a} = cX_{b} ) & = \int \int P\left( X_{a} = ct \,\Big|\, \sum\limits_{i = 1}^{a - 1} {X_{i} } = s\;{\text{and}}\;X_{b} = t \right) \\ & \quad \times \left( {\text{joint density of}}\;\sum\limits_{i = 1}^{a - 1} {X_{i} } \;{\text{and}}\;X_{b} \;{\text{evaluated}}\;{\text{at}}\;s\;{\text{and}}\;t \right) ds\,dt. \\ \end{aligned} $$

Note that the probability \(P(X_{a} = ct|...)\) in the above integral is zero since for each s and t, it can be shown that \(X_{a}\) has a (conditional) density, whence \(P(X_{a} = cX_{b} ) = 0.\) Considering all pairs \((a,b)\) where \(a < b\) (a countable number of possibilities) implies that the probability that some \((x_{1} ,x_{2} ,...)\) is not in BS(p) is zero. Thus, BS(p) has probability 1 in this measure on S.\(\square\)
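The recursive sampling scheme in the proof can be simulated. The sketch below is illustrative only: it truncates the process after n steps, uses floating-point arithmetic (so it can suggest, but not prove, the measure-zero claim), and fixes a hypothetical prior \(p_{i} = 2^{-i}\). The helper names and the seed are assumptions of this sketch.

```python
import random

def stick_breaking(n, seed=0):
    """Draw x_1, ..., x_n with x_i uniform on [0, 1 - (x_1 + ... + x_{i-1}))."""
    rng = random.Random(seed)  # local generator, so the draw is reproducible
    remaining = 1.0
    xs = []
    for _ in range(n):
        x = rng.uniform(0.0, remaining)
        xs.append(x)
        remaining -= x  # length of the interval left for the next draw
    return xs

def ratios_distinct(xs, ps):
    """True if the ratios x_i / p_i are pairwise distinct."""
    ratios = [x / p for x, p in zip(xs, ps)]
    return len(set(ratios)) == len(ratios)

# Hypothetical prior, truncated to 12 coordinates: p_i = 2^{-i}.
prior = [2.0 ** -(i + 1) for i in range(12)]
sample = stick_breaking(12, seed=1)
sample_in_blind_spot = ratios_distinct(sample, prior)
```

Each coordinate is drawn from a continuous distribution given the earlier ones, which is the property the proof uses to conclude that a coincidence \(X_{a} = cX_{b}\) has probability zero.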

Discussion: Since the complement of BS(p) has probability 0 with respect to the measure M defined on S in the preceding proof, so does the complement of BS(P), where P consists of a countable number of strictly coherent distributions p, as it is a countable union of such complements. Hence, BS(P) has probability 1 with respect to M.

Note that there is apparently not a straightforward extension to the infinite case of the argument from G&R (2021) for X finite, which demonstrated, when \(|X|\) \(= n\), that the complement of BS(p) has measure zero with respect to the Lebesgue measure on \(\mathbb{R}^{n-1}\). This is due in part to the fact that there is apparently not an analogous measure on \(\mathbb{R}^{\infty }\) that permits a comparable analysis. It should be remarked that the proof of Theorem 3.4 can be applied when X is finite by setting \(X_{n} = 1 - \sum\nolimits_{i = 1}^{n - 1} {x_{i} }\), and terminating the recursive procedure at that point. Note, however, that the resulting measure on the set of probability distributions on X differs from that supplied by (normalized) Lebesgue measure on \(\mathbb{R}^{n-1}\).

Further, different measures on S for which BS(p) has probability 1 can be obtained by allowing the \(X_{i}\) to assume other continuous distributions on a finite interval. For example, one could let \(X_{1}\) have a beta distribution on \([0,1)\) and then define the \(X_{i}\) for \(i > 1\) as appropriately scaled versions of the \(X_{1}\) distribution on intervals of decreasing length. Note that it is not a requirement that the \(X_{i}\) all have the same kind of distribution, provided they are continuous and \(\sum\nolimits_{i \ge 1} {X_{i} }\) converges to 1 almost surely.

Open question: Is it possible to find a general criterion for a probability measure on S to ensure that BS(p) has probability 1 for every strictly coherent p?

Theorem 4.2.

Suppose that \(X = \{ x_{1} ,x_{2} ,x_{3} ,x_{4} \}\), that p and r are strictly coherent probability distributions on X, and that \(\mathbf{E} = \{ \{ x_{1} ,x_{2} \} ,\{ x_{3} ,x_{4} \} \}\) and \(\mathbf{F} = \{ \{ x_{1} ,x_{3} \} ,\{ x_{2} ,x_{4} \} \}\). Then there exists a (unique) strictly coherent probability distribution q on X such that q comes from p by JC on \(\mathbf{E}\), and r comes from q by JC on \(\mathbf{F}\), if and only if

$$ \frac{{p_{1} p_{4} }}{{p_{2} p_{3} }} = \frac{{r_{1} r_{4} }}{{r_{2} r_{3} }}. $$
(1)

Proof.

Necessity was proved in the main body of the paper. Sufficiency. To determine a strictly coherent \(q = (q_{1} ,q_{2} ,q_{3} ,q_{4} )\) such that q comes from p by JC on E, it suffices, by (2.3) and Theorem 2.1(iii), to specify a value \(t = q_{1} + q_{2}\) such that \(0 < t < 1\) and

$$ \frac{{p_{1} }}{{q_{1} }} = \frac{{p_{2} }}{{q_{2} }}\quad {\text{and}}\quad \frac{{p_{3} }}{{q_{3} }} = \frac{{p_{4} }}{{q_{4} }}. $$
(2)

Of course, this alone will not imply that r comes from q by JC on F. But, as we show below, setting

$$ t: = \frac{1}{{1 + \frac{{p_{1} r_{2} r_{3} + r_{1} p_{2} r_{4} }}{{r_{1} r_{2} (p_{1} + p_{2} )}}}}, $$
(3)

along with (1) and (2), implies that

$$ \frac{{q_{1} }}{{r_{1} }} = \frac{{q_{3} }}{{r_{3} }}\quad {\text{and}}\quad \frac{{q_{2} }}{{r_{2} }} = \frac{{q_{4} }}{{r_{4} }}, $$
(4)

so that, again by Theorem 2.1(iii), r comes from q by JC on F. To prove (4), note that we are given two linear equations for \(q_{1} ,q_{2}\) in terms of the coordinates of p and r, and likewise for \(q_{3} ,q_{4} .\) Solving for \(q_{1} ,q_{2}\) explicitly then gives

$$ q_{1} + q_{2} = q_{1} + \frac{{p_{2} }}{{p_{1} }}q_{1} = t, $$

which implies

$$ q_{1} = \frac{{p_{1} }}{{p_{1} + p_{2} }}t\quad {\text{and}}\quad q_{2} = \frac{{p_{2} }}{{p_{1} + p_{2} }}t, $$
(5)

where t is given by (3). Similarly, we have

$$ q_{3} + q_{4} = q_{3} + \frac{{p_{4} }}{{p_{3} }}q_{3} = \frac{{p_{3} + p_{4} }}{{p_{3} }}q_{3} = 1 - t, $$

which implies

$$ q_{3} = \frac{{p_{3} }}{{p_{3} + p_{4} }}(1 - t)\quad {\text{and}}\quad q_{4} = \frac{{p_{4} }}{{p_{3} + p_{4} }}(1 - t). $$
(6)

So we need to show that the \(q_{i}\), for \(1 \le i \le 4\), given by formulas (5) and (6) satisfy both equations in (4). For the first, observe that, by (3),

$$ \frac{1 - t}{t} = \frac{1}{t} - 1 = \frac{{p_{1} r_{2} r_{3} + r_{1} p_{2} r_{4} }}{{r_{1} r_{2} (p_{1} + p_{2} )}}, $$

and thus

$$ \begin{aligned} \frac{{q_{3} }}{{q_{1} }} = & \frac{{\frac{{p_{3} }}{{p_{3} + p_{4} }}(1 - t)}}{{\frac{{p_{1} }}{{p_{1} + p_{2} }}t}} = \frac{{p_{3} (p_{1} + p_{2} )}}{{p_{1} (p_{3} + p_{4} )}} \cdot \frac{{1 - t}}{t} = \frac{{p_{3} (p_{1} r_{2} r_{3} + r_{1} p_{2} r_{4} )}}{{p_{1} r_{1} r_{2} (p_{3} + p_{4} )}} \\ = & \frac{{p_{1} r_{2} p_{3} r_{3} + r_{1} p_{2} p_{3} r_{4} }}{{p_{1} r_{1} r_{2} (p_{3} + p_{4} )}} = \frac{{p_{1} r_{2} p_{3} r_{3} + p_{1} r_{2} r_{3} p_{4} }}{{p_{1} r_{1} r_{2} (p_{3} + p_{4} )}} = \frac{{r_{3} }}{{r_{1} }}, \\ \end{aligned} $$
(7)

as desired, where in the penultimate equality we used (1). The uniqueness of q follows from the uniqueness of t. [Since the function \(t \mapsto 1/t - 1\) is injective on \((0,1)\), any t different from that given by formula (3) would, in light of formula (1), fail to yield the first equation in (4).] For the second equation in (4), note that

$$ \frac{{q_{4} }}{{q_{2} }} = \frac{{\frac{{p_{4} }}{{p_{3} }}q_{3} }}{{\frac{{p_{2} }}{{p_{1} }}q_{1} }} = \frac{{p_{1} p_{4} }}{{p_{2} p_{3} }} \cdot \frac{{q_{3} }}{{q_{1} }} = \frac{{p_{1} r_{3} p_{4} }}{{r_{1} p_{2} p_{3} }} = \frac{{r_{4} }}{{r_{2} }}, $$

by (2), (7), and (1).\(\square\)
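The sufficiency argument can be checked on a concrete case. The following sketch uses exact rational arithmetic and hypothetical inputs (a uniform p and a product-form r, chosen here so that condition (1) holds); it computes t from formula (3), the intermediate distribution q from formulas (5) and (6), and then tests the ratio conditions (2) and (4). The function name is made up for this illustration.

```python
from fractions import Fraction as F

def intermediate_distribution(p, r):
    """Return (t, q) from formulas (3), (5) and (6), given that (1) holds."""
    p1, p2, p3, p4 = p
    r1, r2, r3, r4 = r
    # Condition (1), cross-multiplied to avoid division.
    if p1 * p4 * r2 * r3 != r1 * r4 * p2 * p3:
        raise ValueError("condition (1) fails: no such q exists")
    # Formula (3).
    t = 1 / (1 + (p1 * r2 * r3 + r1 * p2 * r4) / (r1 * r2 * (p1 + p2)))
    # Formulas (5) and (6).
    q = [
        p1 / (p1 + p2) * t,
        p2 / (p1 + p2) * t,
        p3 / (p3 + p4) * (1 - t),
        p4 / (p3 + p4) * (1 - t),
    ]
    return t, q

# Hypothetical example: both sides of (1) equal 1.
p = [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]
r = [F(1, 12), F(1, 4), F(1, 6), F(1, 2)]
t, q = intermediate_distribution(p, r)

# (2): q agrees with p in ratio within each block of E.
rigid_on_E = p[0] * q[1] == p[1] * q[0] and p[2] * q[3] == p[3] * q[2]
# (4): r agrees with q in ratio within each block of F.
rigid_on_F = q[0] * r[2] == q[2] * r[0] and q[1] * r[3] == q[3] * r[1]
```

For these inputs t = 1/3 and q = (1/6, 1/6, 1/3, 1/3), and both ratio checks hold, so r is reached from p in two JC steps without directly assessing the probability of any atom at the first stage.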

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shattuck, M., Wagner, C. A Further Look at the Bayes Blind Spot. Erkenn (2024). https://doi.org/10.1007/s10670-023-00770-8

  • DOI: https://doi.org/10.1007/s10670-023-00770-8
