## Abstract

It has been argued that if the rigidity condition is satisfied, a rational agent operating with uncertain evidence should update her subjective probabilities by Jeffrey conditionalization (JC); otherwise, a series of bets resulting in a sure loss could be made against her (the Dynamic Dutch Book Argument). We show, however, that even if the rigidity condition is satisfied, it is not always safe to update probability distributions by JC: there exist sequences of non-misleading uncertain observations for which it can be foreseen that an agent who updates her subjective probabilities by JC will end up nearly certain that a false hypothesis is true. We analyze the features of JC that lead to this problem, specify the conditions under which it arises, and respond to potential objections.


## Notes

1. When we say that the evidence is non-misleading, we mean that when an agent becomes more certain of some evidence *E* than of its negation \(\lnot E\), *E* is actually the case.

2. Although an agent cannot know whether some evidence is misleading in the described sense, a well-performing update rule should at least not lead to problematic outcomes when the evidence is not misleading.

3. She is, after all, named after Richard Jef*frey*.

4. The rigidity condition is satisfied when \({\text {Pr}}^*(H|E_k)={\text {Pr}}(H|E_k)\) for all *k*, where \({\text {Pr}}^*(\cdot )\) represents the posterior and \({\text {Pr}}(\cdot )\) the prior probability function (Jeffrey 1983, 174).

5. If Freya did not always inspect different parts of her sample, she would only update her subjective probability distribution on the first inspection because her subsequent observations would contain no new evidence. I am thankful to an anonymous referee for raising this point.

6. Freya’s priors are largely irrelevant for her subsequent high probability of the false \(H_A\), as we show below (Theorem 1).

7. This does not mean that the observations are probabilistically independent. Rather, the conditional probability of \(E_n\) given each hypothesis remains fixed throughout the process. If the reader finds such a conditional independence assumption unrealistic, it should be noted that the example may be rephrased into one with a series of biased coin tosses or dice throws instead of microbiological samples of different strains (for similar examples see, e.g., Douven 2013; Trpin and Pellert 2018). This conditional independence is important because \({\text {Pr}}(E_n|H_i)\) is one of the key parameters in conditionalization.

8. \({\text {Pr}}_n(\cdot )\) represents the prior probability of some proposition before the *n*th update and \({\text {Pr}}^*_n(\cdot )\) the posterior probability after the *n*th update. We represent the likelihood of evidence being present in the *n*th part of the sample given some hypothesis by \({\text {Pr}}(E_n|H_i)\) for all *i*, *n*. Note that, given any hypothesis *i*, this likelihood is constant for all *n* because the presence of *E* in any part is conditionally independent of its presence in the other parts given the hypothesis under consideration. Learning about the presence of *E* in the *n*th part of the sample therefore does not affect *the conditional probability* of *E* being present in the \((n+1)\)th part of the sample given any hypothesis. The set \(\{H_i\}\), where *i* is either *A* or *B*, is a set of two mutually exclusive and jointly exhaustive hypotheses corresponding to the two strains. JC is more generally defined as \({\text {Pr}}^*(H)=\sum {\text {Pr}}(H|E_i){\text {Pr}}^{*}(E_i)\), \(E_i \in E\), where *E* is a partition with non-zero probabilities.

9. The rigidity condition is also satisfied. Schwan and Stern (2017) provide a convincing Causal Updating Norm (CUN) according to which rigidity is satisfied when *D* (a dummy variable representing the ineffable learning experience) and any arbitrary *A* are d-separated by an initial partition of propositions *B* (Schwan and Stern 2017, 11). The causal network in Freya’s example can be represented as \(S \rightarrow E \rightarrow D\), where *S* is a variable representing the two strains, which causes \(E_n\), the presence or absence of characteristic *E* in the *n*th part, which in turn causes *D*, the learning experience, because it affects whether Freya observes the characteristic in such a way that she is less than fully certain about it. CUN (and, therefore, rigidity) is satisfied because *S* is d-separated from *D* by *E*.

10. JC would also not lead to a problematic outcome if the largest likelihood of evidence were less than 1, provided that the posterior probability of evidence were equal to it or greater; e.g., if she was inspecting strain *B* and \({\text {Pr}}(E_n|H_A)=0.7; {\text {Pr}}(E_n|H_B)=0.8\) and \({\text {Pr}}^*_n(E_n)=0.9\) for all *n*. In this case, \({\text {Pr}}_n(H_B)\) would (accurately) converge toward 1 with increasing *n*. Note that Theorem 1 does not present a problem for standard Bayesian conditionalization because the posterior probability of evidence (i.e., \({\text {Pr}}_n^*(E_n)=1\)) is always greater than or equal to the largest likelihood (see also Fig. 1).

11. Specifically, it takes her 20 observations of *E* with 0.9 certainty to assign the highest probability of all to the exceptional strain *A* hypothesis, 27 observations to assign it a probability above 0.5, and 177 observations to assign it a very high probability (i.e., above 0.9).

12. We calculate the prior probability of evidence by the law of total probability: \({\text {Pr}}_n(E_n)=\sum _{i=1}^m{\text {Pr}}_n(H_i){\text {Pr}}(E_n|H_i)\), where *m* is the number of hypotheses.

13. As we show below, the results are independent of the priors. That is, if Freya’s prior probabilities for the strains were different, it would merely take her a different number of updates by JC to assign an arbitrarily high probability to a false hypothesis.

14. This is because Freya’s posterior probabilities of *E* shift in a less ordered fashion, so \(H_D\) does not always provide the correct prediction. \({\text {Pr}}(H_D)\) thus also decreases, but less than the probabilities of the other hypotheses (by Theorem 3). Such uniformly random levels of evidential uncertainty, however, typically require that Freya inspect the sample about 7000 times before her probability of the false \(H_D\) reaches a very high level of 0.99. In 1000 simulations of this scenario, she needed to inspect the sample 6853 times on average \((\sigma = 106)\) before \({\text {Pr}}^*(H_D)>0.99\).

15. The value of *k* cannot be 0 or 1 because *k* is the mean of different (i.e., shifting) values.

16. Proof omitted, although the reasoning is sketched in the previous paragraph. Note that the posterior probability of evidence needs to shift often enough.

17. Equation 8 implies that standard Bayesian conditionalization will never lead an agent astray because it simplifies to \({{\text {Pr}}(E_n|H_i)= {\text {Pr}}(E_n|H_j)}\) when \({{\text {Pr}}_m^*(E_n)=1}\) for all *m* and \({\text {Pr}}(E_m|H_j)\) is constant for all *m*. In other words, the hypothesis to which the agent who conditions on \(E_n\) ascribes a higher probability, \(H_i\), is the true hypothesis \(H_j\).

18. Thanks to an anonymous referee for this journal and an anonymous referee for the Formal Epistemology Workshop 2019 for bringing these objections to my attention.

19. “Binary” *E* here represents a simple partition into *E* and \(\lnot E\), the only kind of evidential partitions we deal with in this paper.

20. \({\text {Pr}}(red)={\text {Pr}}(\text {2R4B}){\text {Pr}}(red|\text {2R4B}) + {\text {Pr}}(\text {6B}){\text {Pr}}(red|\text {6B}) = 0.9 \times \frac{1}{3}+0.1\times 0=0.3={\text {Pr}}^*(red)\)

21. We are assuming that agents only become fully certain if something is actually the case. See, however, Rescorla (2019) for an interesting discussion of non-factive aspects of conditionalisation.

22. The term “significant evidence” here simply means substantial or important evidence. It is in no way related to statistical significance or low *p* values.

23. Thanks to an anonymous referee for this journal for pointing out this assumption.

24. Suppose \({\text {Pr}}(H|E)={\text {Pr}}^*(H|E)\) and \({\text {Pr}}(H|\lnot E)={\text {Pr}}^*(H|\lnot E)\), i.e. the rigidity condition. It is then trivial to show that \({\text {Pr}}^*(H)={\text {Pr}}(H|E){\text {Pr}}^*(E)+{\text {Pr}}(H|\lnot E){\text {Pr}}^*(\lnot E)\), i.e. the rule of Jeffrey conditionalization for binary *E* (e.g. Jeffrey 1983, 169).

## References

Armendt, B. (1980). Is there a Dutch book argument for probability kinematics? *Philosophy of Science*, *47*(4), 583–588.

Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. *Monthly Weather Review*, *78*(1), 1–3.

Climenhaga, N. (2017). Inference to the best explanation made incoherent. *Journal of Philosophy*, *114*(5), 251–273.

Douven, I. (2013). Inference to the best explanation, Dutch books, and inaccuracy minimisation. *Philosophical Quarterly*, *63*(252), 428–444.

Jeffrey, R. (1983). *The logic of decision*. Chicago: University of Chicago Press.

Jeffrey, R. (1992). *Probability and the art of judgment*. Cambridge: Cambridge University Press.

Rescorla, M. (2019). On the proper formulation of conditionalization. *Synthese*. https://doi.org/10.1007/s11229-019-02179-9

Schwan, B., & Stern, R. (2017). A causal understanding of when and when not to Jeffrey conditionalize. *Philosophers’ Imprint*, *17*(8), 1–21.

Skyrms, B. (1987). Dynamic coherence and probability kinematics. *Philosophy of Science*, *54*(1), 1–20.

Trpin, B., & Pellert, M. (2018). Inference to the best explanation in uncertain evidential situations. *The British Journal for the Philosophy of Science*. https://doi.org/10.1093/bjps/axy027

## Acknowledgements

Thanks to Kristijan Armeni, Riccardo Baratella, Mariangela Zoe Cocchiaro, Anton Donchev, Branden Fitelson, Mario Günther, Ben Levinstein, Max Pellert, Vlasta Sikimić, Reuben Stern, anonymous reviewers, and especially Jan Sprenger for their comments and suggestions on earlier versions of this paper. I also thank the audiences of the 10th Arché Graduate Conference in St Andrews, UK (2017), the Workshop in Philosophy of Science at the University of Turin, Italy (2018), EENPS 2018, Bratislava, Slovakia, and the Formal Epistemology Workshop in Turin, Italy (2019), where I presented the paper in its various stages, for their feedback. Finally, I want to thank Robbie Hopper for her help with proofreading. I am also grateful that the research was supported in part by an Ernst Mach Grant, Ernst Mach Worldwide (ICM-2018-10093), and by the Alexander von Humboldt Foundation.


## Appendix: Proofs

### Derivation of inequality 1

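First note that \({\text {RelFact}}_{n,i}>1\) is equivalent to:

\[\frac{{\text {Pr}}(E_n|H_i){\text {Pr}}_n^*(E_n)}{{\text {Pr}}_n(E_n)}+\frac{{\text {Pr}}(\lnot E_n|H_i){\text {Pr}}_n^*(\lnot E_n)}{{\text {Pr}}_n(\lnot E_n)}>1 \tag{9}\]

where \(0<{\text {Pr}}_n(E_n)<1\). Because \({\text {Pr}}_n(E_n){\text {Pr}}_n(\lnot E_n)>0\), we derive:

\[{\text {Pr}}(E_n|H_i){\text {Pr}}_n^*(E_n){\text {Pr}}_n(\lnot E_n)+{\text {Pr}}(\lnot E_n|H_i){\text {Pr}}_n^*(\lnot E_n){\text {Pr}}_n(E_n)>{\text {Pr}}_n(E_n){\text {Pr}}_n(\lnot E_n) \tag{10}\]

After substituting \({\text {Pr}}(\lnot E_n|H_i)\) with \(1-{\text {Pr}}(E_n|H_i)\), \({\text {Pr}}_n^*(\lnot E_n)\) with \(1-{\text {Pr}}_n^*(E_n)\) and \({\text {Pr}}_n(\lnot E_n)\) with \(1-{\text {Pr}}_n(E_n)\) and rearranging Inequality 10, we obtain:

\[{\text {Pr}}(E_n|H_i){\text {Pr}}_n^*(E_n)-{\text {Pr}}(E_n|H_i){\text {Pr}}_n(E_n)-{\text {Pr}}_n(E_n){\text {Pr}}_n^*(E_n)+{\text {Pr}}_n(E_n)^2>0 \tag{11}\]

It is trivial to see that Inequality 11 is an expanded form of Inequality 1. This concludes our derivation.

Note that by following the same procedure, we also derive that \({\text {RelFact}}_{n,i}<1\) is equivalent to:

\[\big({\text {Pr}}(E_n|H_i)-{\text {Pr}}_n(E_n)\big)\big({\text {Pr}}_n^*(E_n)-{\text {Pr}}_n(E_n)\big)<0 \tag{12}\]

Similarly, \({\text {RelFact}}_{n,i}=1\) is equivalent to:

\[\big({\text {Pr}}(E_n|H_i)-{\text {Pr}}_n(E_n)\big)\big({\text {Pr}}_n^*(E_n)-{\text {Pr}}_n(E_n)\big)=0 \tag{13}\]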

### Proof of Theorem 1

**Theorem 1**
*If the posterior probability of evidence is less (greater) than or equal to the lowest (greatest) likelihood of evidence according to some hypothesis, then JC prescribes an increase in the probability of the hypothesis according to which the likelihood of evidence is the lowest (the highest).*

### Proof

Suppose that the likelihood of evidence is the lowest according to hypothesis \(H_k\) and the highest according to \(H_q\). We first observe the following simple consequence of the law of total probability (assuming no hypothesis is certain):

\[{\text {Pr}}(E_n|H_k)<{\text {Pr}}_n(E_n)<{\text {Pr}}(E_n|H_q) \tag{14}\]

Hence, if the posterior probability of evidence is less than or equal to the lowest likelihood of evidence, then the posterior probability of evidence is also less than the prior probability of evidence:

\[{\text {Pr}}_n^*(E_n)\le {\text {Pr}}(E_n|H_k)<{\text {Pr}}_n(E_n) \tag{15}\]

Recall that, for any *i*, the probability of \(H_i\) increases if the following condition is satisfied:

\[{\text {Pr}}(E_n|H_i)<{\text {Pr}}_n(E_n) \text { and } {\text {Pr}}_n^*(E_n)<{\text {Pr}}_n(E_n) \tag{3}\]

It then follows from Condition 3 and Inequality 15 that \({{\text {Pr}}^*_n(H_k)>{\text {Pr}}_n(H_k)}\) when \({{\text {Pr}}_n^*(E_n)\le {\text {Pr}}(E_n|H_k)}\).

Similarly, if the posterior probability of evidence is greater than or equal to the highest likelihood of evidence, it is also greater than the prior probability of evidence:

\[{\text {Pr}}_n(E_n)<{\text {Pr}}(E_n|H_q)\le {\text {Pr}}_n^*(E_n) \tag{16}\]

Recall that, for any *i*, the probability of \(H_i\) increases if the following condition is satisfied:

\[{\text {Pr}}(E_n|H_i)>{\text {Pr}}_n(E_n) \text { and } {\text {Pr}}_n^*(E_n)>{\text {Pr}}_n(E_n) \tag{2}\]

It then follows from Condition 2 and Inequality 16 that \({\text {Pr}}_n^*(H_q)>{\text {Pr}}_n(H_q)\) when \({\text {Pr}}_n^*(E_n)\ge {\text {Pr}}(E_n|H_q)\). This concludes our proof of Theorem 1. \(\square\)
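Theorem 1 can also be spot-checked numerically. The following sketch is only a sanity check under stated assumptions, not part of the proof: the names `jc_update` and `check_theorem1` are hypothetical, the three likelihoods are drawn from an arbitrary grid so that they are distinct, and the priors are random but bounded away from 0 so that no hypothesis is certain.

```python
import random


def jc_update(priors, likelihoods, post_e):
    """One Jeffrey update on the binary partition {E, not-E}."""
    prior_e = sum(p * l for p, l in zip(priors, likelihoods))
    return [
        p * (l * post_e / prior_e + (1 - l) * (1 - post_e) / (1 - prior_e))
        for p, l in zip(priors, likelihoods)
    ]


def check_theorem1(trials=1000, seed=0):
    """Return True if both halves of Theorem 1 hold in every random trial."""
    rng = random.Random(seed)
    grid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    for _ in range(trials):
        likelihoods = sorted(rng.sample(grid, 3))      # distinct, lowest first
        weights = [rng.uniform(0.1, 1.0) for _ in range(3)]
        priors = [w / sum(weights) for w in weights]    # no hypothesis is certain
        low, high = likelihoods[0], likelihoods[-1]

        # Posterior of evidence at or below the lowest likelihood: H_k gains.
        post_low = low * rng.uniform(0.1, 1.0)
        if not jc_update(priors, likelihoods, post_low)[0] > priors[0]:
            return False

        # Posterior of evidence at or above the highest likelihood: H_q gains.
        post_high = high + (1 - high) * rng.uniform(0.0, 1.0)
        if not jc_update(priors, likelihoods, post_high)[-1] > priors[-1]:
            return False
    return True


print(check_theorem1())
```

A check of this kind cannot replace the argument above, but it makes the claim concrete: the hypothesis that gains probability is selected by the position of the posterior probability of evidence relative to the likelihoods, whatever the priors happen to be.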

### Proof of Theorem 2

**Theorem 2** *If the hypothesis space consists of two mutually exclusive and jointly exhaustive hypotheses and the posterior probability of different pieces of evidence with the same likelihood is constant and greater than their likelihood according to one hypothesis but less than according to the other hypothesis, then, if no hypothesis is certain, JC prescribes that the agent update in such a way that the probability of the first hypothesis converges toward* \(b/(a+b)\) *and the probability of the second toward* \(a/(a+b)\), *where* *a* *and* *b* *are the absolute differences between the agent’s posterior probability of the pieces of evidence and their likelihood according to the first and the second hypothesis, respectively.*

### Proof

Suppose that the posterior probability of the *n*th piece of evidence is greater than the likelihood of evidence according to \(H_k\) and less than the likelihood of evidence according to \(H_q\):

\[{\text {Pr}}(E_n|H_k)<{\text {Pr}}_n^*(E_n)<{\text {Pr}}(E_n|H_q) \tag{17}\]

Note that we can represent the likelihood \({\text {Pr}}(E_n|H_k)\) as \({\text {Pr}}_n^*(E_n)-a, a>0\) and the likelihood \({\text {Pr}}(E_n|H_q)\) as \({\text {Pr}}_n^*(E_n)+b, b>0\). Additionally, note that \({\text {Pr}}_n(H_q)=1-{\text {Pr}}_n(H_k)\). Finally, note that because \(H_k\) and \(H_q\) are mutually exclusive and jointly exhaustive, \({\text {Pr}}_n(E_n)={\text {Pr}}(E_n|H_k){\text {Pr}}_n(H_k) + {\text {Pr}}(E_n|H_q) {\text {Pr}}_n(H_q)\).

Suppose \({\text {Pr}}_n^*(E_n)<{\text {Pr}}_n(E_n)\). Hence:

\[{\text {Pr}}_n^*(E_n)<\big({\text {Pr}}_n^*(E_n)-a\big){\text {Pr}}_n(H_k)+\big({\text {Pr}}_n^*(E_n)+b\big)\big(1-{\text {Pr}}_n(H_k)\big) \tag{18}\]

Because \(a+b>0\), it follows that:

\[{\text {Pr}}_n(H_k)<\frac{b}{a+b} \tag{19}\]

We have now shown that \({\text {Pr}}(E_n|H_k)<{\text {Pr}}_n^*(E_n)<{\text {Pr}}_n(E_n)\) implies that \({\text {Pr}}_n(H_k)<b/(a+b)\). Recall, again, that by Condition 3, \({\text {Pr}}(E_n|H_i)<{\text {Pr}}_n(E_n)\) and \({\text {Pr}}_n^*(E_n)<{\text {Pr}}_n(E_n)\) imply \({\text {Pr}}_n^*(H_i)>{\text {Pr}}_n(H_i)\) for any *i*. We thus conclude that \({\text {Pr}}(E_n|H_k)<{\text {Pr}}_n^*(E_n)<{\text {Pr}}_n(E_n)\) also implies \({\text {Pr}}_n^*(H_k)>{\text {Pr}}_n(H_k)\). The probability of \(H_k\), therefore, increases if it is less than \(b/(a+b)\).

It is trivial to see that \({\text {Pr}}(E_n|H_k)<{\text {Pr}}_n(E_n)<{\text {Pr}}_n^*(E_n)\), on the other hand, implies \({\text {Pr}}_n(H_k)>b/(a+b)\). Recall that by Condition 4, \({\text {Pr}}(E_n|H_i)<{\text {Pr}}_n(E_n)\) and \({\text {Pr}}^*_n(E_n)>{\text {Pr}}_n(E_n)\) imply \({\text {Pr}}_n^*(H_i)<{\text {Pr}}_n(H_i)\) for any *i*. We conclude that \({\text {Pr}}(E_n|H_k)<{\text {Pr}}_n(E_n)<{\text {Pr}}_n^*(E_n)\) also implies \({\text {Pr}}_n^*(H_k)<{\text {Pr}}_n(H_k)\). The probability of \(H_k\), therefore, decreases if it is greater than \(b/(a+b)\). Because we already know that the probability of \(H_k\) increases if it is less than \(b/(a+b)\), we conclude that it converges toward \(b/(a+b)\).

To finish our proof we need to show that the probability of \(H_q\) converges toward \(a/(a+b)\). Because \({\text {Pr}}_n(H_q) = 1- {\text {Pr}}_n(H_k)\), it immediately follows that the probability of \(H_q\) converges toward \(1-(b/(a+b))=a/(a+b)\). Because *a* and *b* are constant, this concludes our proof of Theorem 2. \(\square\)
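The convergence claimed in Theorem 2 can be illustrated with a short simulation. The sketch below assumes hypothetical values (likelihoods 0.6 and 0.9, a constant posterior probability of evidence of 0.8, hence \(a=0.2\) and \(b=0.1\)); the names `jc_update` and `limit_probability` are likewise illustrative and do not come from the paper.

```python
def jc_update(priors, likelihoods, post_e):
    """One Jeffrey update on the binary partition {E, not-E}."""
    prior_e = sum(p * l for p, l in zip(priors, likelihoods))
    return [
        p * (l * post_e / prior_e + (1 - l) * (1 - post_e) / (1 - prior_e))
        for p, l in zip(priors, likelihoods)
    ]


def limit_probability(prior_k, lik_k, lik_q, post_e, rounds=1000):
    """Probability of H_k after repeatedly updating on the same Pr*(E)."""
    probs = [prior_k, 1 - prior_k]
    for _ in range(rounds):
        probs = jc_update(probs, [lik_k, lik_q], post_e)
    return probs[0]


# Hypothetical parameters with Pr(E|H_k) < Pr*(E) < Pr(E|H_q).
lik_k, lik_q, post_e = 0.6, 0.9, 0.8
a = post_e - lik_k   # distance of the posterior from the lower likelihood
b = lik_q - post_e   # distance of the posterior from the higher likelihood

for prior_k in (0.05, 0.5, 0.95):
    reached = limit_probability(prior_k, lik_k, lik_q, post_e)
    print(f"prior {prior_k:.2f} -> Pr(H_k) = {reached:.6f}, b/(a+b) = {b / (a + b):.6f}")
```

Starting from very different priors, the probability of \(H_k\) settles at the same value \(b/(a+b)\), which is exactly the weight at which the prior probability of evidence equals its posterior probability, so that further updates change nothing.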

### Proof of Theorem 3

**Theorem 3**
*JC prescribes that, if any updating takes place, the probability of the hypothesis according to which the likelihood of evidence is closer to the prior probability of evidence changes less, relative to its prior probability, than other hypotheses.*

### Proof

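We need to inspect the absolute difference \(|{\text {RelFact}}_{n,i} - 1|\) to see how much \({\text {Pr}}^*_n(H_i)\) changes in relation to its prior probability \({\text {Pr}}_n(H_i)\) (regardless of direction). We first obtain:

\[|{\text {RelFact}}_{n,i}-1|=\left| \frac{\big({\text {Pr}}(E_n|H_i)-{\text {Pr}}_n(E_n)\big)\big({\text {Pr}}_n^*(E_n)-{\text {Pr}}_n(E_n)\big)}{{\text {Pr}}_n(E_n){\text {Pr}}_n(\lnot E_n)}\right| \tag{20}\]

We can now compare which of the two hypotheses (\(H_i\) or \(H_j\)) relatively changes more in the same round. By applying the rules of absolute products and absolute quotients, we obtain that, whenever updating takes place (i.e., \({\text {Pr}}_n^*(E_n)\ne {\text {Pr}}_n(E_n)\)), \(|{\text {RelFact}}_{n,i}-1|<|{\text {RelFact}}_{n,j}-1|\) is equivalent to:

\[|{\text {Pr}}(E_n|H_i)-{\text {Pr}}_n(E_n)|<|{\text {Pr}}(E_n|H_j)-{\text {Pr}}_n(E_n)| \tag{21}\]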

This means that the posterior probability of the hypothesis according to which the likelihood of evidence is closer to the prior probability of evidence will change less relative to its prior probability. This concludes our proof. \(\square\)
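A small numeric example may help to see Theorem 3 at work. The sketch below uses three hypotheses with made-up priors and likelihoods (all values are assumptions chosen for illustration) and a hypothetical helper `relative_changes`. It compares \({\text {RelFact}}_{n,i}-1\), computed directly from the JC rule, with the factored form \(({\text {Pr}}(E_n|H_i)-{\text {Pr}}_n(E_n))({\text {Pr}}_n^*(E_n)-{\text {Pr}}_n(E_n))/({\text {Pr}}_n(E_n){\text {Pr}}_n(\lnot E_n))\), which follows from the definition of the relative factor by elementary algebra.

```python
def relative_changes(priors, likelihoods, post_e):
    """Return Pr_n(E_n) and the list of RelFact_{n,i} - 1 for each hypothesis."""
    prior_e = sum(p * l for p, l in zip(priors, likelihoods))
    changes = []
    for l in likelihoods:
        # RelFact is the ratio Pr*_n(H_i) / Pr_n(H_i) under binary JC.
        rel_fact = l * post_e / prior_e + (1 - l) * (1 - post_e) / (1 - prior_e)
        changes.append(rel_fact - 1)
    return prior_e, changes


# Hypothetical inputs: three hypotheses and an uncertain observation of E.
priors = [0.2, 0.3, 0.5]
likelihoods = [0.2, 0.5, 0.9]
post_e = 0.7

prior_e, changes = relative_changes(priors, likelihoods, post_e)
for l, change in zip(likelihoods, changes):
    factored = (l - prior_e) * (post_e - prior_e) / (prior_e * (1 - prior_e))
    print(f"Pr(E|H)={l:.1f}  distance={abs(l - prior_e):.2f}  "
          f"RelFact-1={change:+.4f}  factored form={factored:+.4f}")
```

In this example the hypothesis whose likelihood lies closest to the prior probability of evidence shows the smallest relative change, and the one furthest away shows the largest, in line with the theorem.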

### Proof of Theorem 4

**Theorem 4** *JC prescribes such updates that, if any updating takes place, the prior probability of the next piece of evidence* \(E_{n+1}\) *shifts toward the posterior probability of the previous piece of evidence* \(E_{n}\).

### Proof

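We know that no updating takes place if \({\text {Pr}}_n^*(E_n)={\text {Pr}}_n(E_n)\) (see Eq. 13). Hence, we need to show that when \({\text {Pr}}_n^*(E_n)\) is greater (less) than \({\text {Pr}}_n(E_n)\), \({\text {Pr}}_{n+1}(E_{n+1})\) increases (decreases) at most to the level of \({\text {Pr}}_n^*(E_n)\):

\[{\text {Pr}}_n(E_n)<{\text {Pr}}_n^*(E_n) \Rightarrow {\text {Pr}}_n(E_n)<{\text {Pr}}_{n+1}(E_{n+1})\le {\text {Pr}}_n^*(E_n) \tag{22}\]

\[{\text {Pr}}_n(E_n)>{\text {Pr}}_n^*(E_n) \Rightarrow {\text {Pr}}_n(E_n)>{\text {Pr}}_{n+1}(E_{n+1})\ge {\text {Pr}}_n^*(E_n) \tag{23}\]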

Let us start by proving that (22) holds. Assume \({\text {Pr}}_n(E_n)<{\text {Pr}}_n^*(E_n)\). We will first show that it follows that \({\text {Pr}}_n(E_n)<{\text {Pr}}_{n+1}(E_{n+1})\).

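We know that when the posterior probability of evidence is greater than the prior probability of evidence (our assumption), the posterior probability of hypotheses according to which the evidence is more likely than its prior probability increases (Inequality 1) and, similarly, the posterior probability of hypotheses according to which the evidence is less likely than its prior probability decreases (Inequality 12). Suppose the hypotheses are ordered according to the likelihood of evidence, with the hypothesis with the lowest likelihood being \(H_1\) and the hypothesis with the highest likelihood \(H_m\). Further, suppose the likelihood of evidence according to hypotheses \(H_1\) to \(H_{n-1}\) is less than its prior probability in round *n*, equal to the prior probability of evidence according to \(H_n\), and greater than the prior probability of evidence for hypotheses \(H_{n+1}\) to \(H_m\). Recall also that for all *i*, *n*, \({\text {Pr}}(E_{n+1}|H_i)={\text {Pr}}(E_n|H_i)\) (the assumption of independence of various pieces of evidence). Hence, the prior probability of the next piece of evidence (\(E_{n+1}\)) in round \(n+1\) updates to:

\[{\text {Pr}}_{n+1}(E_{n+1})={\text {Pr}}(E_n|H_1)\big({\text {Pr}}_n(H_1)-a\big)+\cdots +{\text {Pr}}(E_n|H_{n-1})\big({\text {Pr}}_n(H_{n-1})-b\big)+{\text {Pr}}(E_n|H_n){\text {Pr}}_n(H_n)+{\text {Pr}}(E_n|H_{n+1})\big({\text {Pr}}_n(H_{n+1})+c\big)+\cdots +{\text {Pr}}(E_n|H_m)\big({\text {Pr}}_n(H_m)+d\big) \tag{24}\]

where \(a, \dots , b, c, \dots , d\) are all greater than 0. We can simplify (24) to:

\[{\text {Pr}}_{n+1}(E_{n+1})={\text {Pr}}_n(E_n)-a{\text {Pr}}(E_n|H_1)-\cdots -b{\text {Pr}}(E_n|H_{n-1})+c{\text {Pr}}(E_n|H_{n+1})+\cdots +d{\text {Pr}}(E_n|H_m) \tag{25}\]

We can now show that \({\text {Pr}}_{n+1}(E_{n+1})>{\text {Pr}}_n(E_n)\). Assume the following inequality for reductio ad absurdum:

\[{\text {Pr}}_{n+1}(E_{n+1})\le {\text {Pr}}_n(E_n) \tag{RA1}\]

Hence (from Eq. 25 and Inequality RA1 after canceling out \({\text {Pr}}_n(E_n)\)):

\[-a{\text {Pr}}(E_n|H_1)-\cdots -b{\text {Pr}}(E_n|H_{n-1})+c{\text {Pr}}(E_n|H_{n+1})+\cdots +d{\text {Pr}}(E_n|H_m)\le 0 \tag{26}\]

We know that hypotheses are jointly exhaustive, hence the changes in their probability sum to 0. That is: \(-a-\cdots -b+c+\cdots +d=0\). We then substitute *c* with \(a+\cdots +b-\cdots -d\). After some rearrangements of Inequality 26, we obtain:

\[a\big({\text {Pr}}(E_n|H_{n+1})-{\text {Pr}}(E_n|H_1)\big)+\cdots +b\big({\text {Pr}}(E_n|H_{n+1})-{\text {Pr}}(E_n|H_{n-1})\big)+\cdots +d\big({\text {Pr}}(E_n|H_m)-{\text {Pr}}(E_n|H_{n+1})\big)\le 0 \tag{27}\]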

All terms are positive because \(a,\ldots ,b,\ldots ,d\) are positive and \({\text {Pr}}(E_n|H_{n+1})>{\text {Pr}}(E_n|H_{i})\) for all \(i, i<n+1\), and \({\text {Pr}}(E_n|H_{n+1})<{\text {Pr}}(E_n|H_{i})\) for all \(i, i>n+1\). Inequality 27, and hence our reductio assumption (RA1), must therefore be false. We conclude our reductio with \({\text {Pr}}_{n}(E_n)<{\text {Pr}}_{n+1}(E_{n+1})\). We have now shown the first part of Inequality 22.

Note that we can also show the first part of Inequality 23, i.e., \({\text {Pr}}_n(E_n)>{\text {Pr}}_{n+1}(E_{n+1})\) given \({\text {Pr}}_n^*(E_n)<{\text {Pr}}_n(E_n)\), by following the same method with different assumptions (proof omitted).

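To prove that Inequality 22 holds, we need to also show that, given \({\text {Pr}}_n(E_n)<{\text {Pr}}_n^*(E_n)\), the following inequality holds:

\[{\text {Pr}}_{n+1}(E_{n+1})\le {\text {Pr}}_n^*(E_n) \tag{28}\]

To show Inequality 28, first note that because for all *i*, \({\text {Pr}}_{n+1}(H_i)={\text {Pr}}_n^*(H_i)\), \({\text {Pr}}_{n+1}(E_{n+1})\) may be calculated in the following way by the law of total probability:

\[{\text {Pr}}_{n+1}(E_{n+1})=\sum _{i=1}^m{\text {Pr}}_n^*(H_i){\text {Pr}}(E_n|H_i)\]

where *m* is the number of hypotheses. Recall the Jeffrey conditionalization rule for binary partitions of evidence. For all *i*, \({\text {Pr}}_n^*(H_i)\) updates to:

\[{\text {Pr}}_n^*(H_i)=\frac{{\text {Pr}}_n(H_i){\text {Pr}}(E_n|H_i){\text {Pr}}_n^*(E_n)}{{\text {Pr}}_n(E_n)}+\frac{{\text {Pr}}_n(H_i){\text {Pr}}(\lnot E_n|H_i){\text {Pr}}_n^*(\lnot E_n)}{{\text {Pr}}_n(\lnot E_n)}\]

Hence:

\[{\text {Pr}}_{n+1}(E_{n+1})=\sum _{i=1}^m{\text {Pr}}(E_n|H_i)\left( \frac{{\text {Pr}}_n(H_i){\text {Pr}}(E_n|H_i){\text {Pr}}_n^*(E_n)}{{\text {Pr}}_n(E_n)}+\frac{{\text {Pr}}_n(H_i){\text {Pr}}(\lnot E_n|H_i){\text {Pr}}_n^*(\lnot E_n)}{{\text {Pr}}_n(\lnot E_n)}\right)\]

Recall that, for all *i*, \({\text {Pr}}(\lnot E_n|H_i) = 1-{\text {Pr}}(E_n|H_i)\), and that \({\text {Pr}}_n(E_n)=\sum _{i=1}^m{\text {Pr}}_n(H_i) {\text {Pr}}(E_n|H_i)\). After some algebra we then obtain:

\[{\text {Pr}}_{n+1}(E_{n+1})=\frac{{\text {Pr}}_n^*(E_n)}{{\text {Pr}}_n(E_n)}\sum _{i=1}^m{\text {Pr}}_n(H_i){\text {Pr}}(E_n|H_i)^2+\frac{{\text {Pr}}_n^*(\lnot E_n)}{{\text {Pr}}_n(\lnot E_n)}\left( {\text {Pr}}_n(E_n)-\sum _{i=1}^m{\text {Pr}}_n(H_i){\text {Pr}}(E_n|H_i)^2\right)\]

Assume the following inequality for reductio ad absurdum:

\[{\text {Pr}}_{n+1}(E_{n+1})>{\text {Pr}}_n^*(E_n) \tag{RA2}\]

After multiplying both sides of Inequality (RA2) by \({\text {Pr}}_n(E_n) {\text {Pr}}_n(\lnot E_n)\), replacing all \({\text {Pr}}_n^*(\lnot E_n)\) with \(1-{\text {Pr}}_n^*(E_n)\) and \({\text {Pr}}_n(\lnot E_n)\) with \(1-{\text {Pr}}_n(E_n)\), and a few lines of algebra, we then obtain:

\[\big({\text {Pr}}_n(E_n)-{\text {Pr}}_n^*(E_n)\big)\left( {\text {Pr}}_n(E_n)-\sum _{i=1}^m{\text {Pr}}_n(H_i){\text {Pr}}(E_n|H_i)^2\right) >0 \tag{32}\]

We know that the left term is negative (recall our initial assumption \({\text {Pr}}_n^*(E_n)>{\text {Pr}}_n(E_n)\)). We can also show that the second term is non-negative:

\[{\text {Pr}}_n(E_n)-\sum _{i=1}^m{\text {Pr}}_n(H_i){\text {Pr}}(E_n|H_i)^2=\sum _{i=1}^m{\text {Pr}}_n(H_i){\text {Pr}}(E_n|H_i)\big(1-{\text {Pr}}(E_n|H_i)\big)\ge 0\]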

Hence, Inequality 32 cannot be true. We thus conclude that our reductio assumption (RA2) is false, so \({\text {Pr}}_{n+1}(E_{n+1})\le {\text {Pr}}_n^*(E_n)\) is true. Inequality 22, therefore, holds.

Note that we can also prove \({\text {Pr}}_{n+1}(E_{n+1})\ge {\text {Pr}}_n^*(E_n)\) given \({\text {Pr}}_n(E_n)>{\text {Pr}}_n^*(E_n)\) by following the same method (proof omitted). Inequality 23, therefore, also holds. This concludes our proof of Theorem 4.

\(\square\)
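The statement of Theorem 4 can be checked on a concrete case. The sketch below is an illustration under assumed inputs: the three-hypothesis priors and likelihoods are invented, and `next_prior_of_evidence` is a hypothetical helper that performs one JC update and then recomputes the prior probability of the next piece of evidence by the law of total probability, using the assumption that the likelihoods stay fixed across rounds.

```python
def next_prior_of_evidence(priors, likelihoods, post_e):
    """Return (Pr_n(E_n), Pr_{n+1}(E_{n+1})) for one binary JC update."""
    prior_e = sum(p * l for p, l in zip(priors, likelihoods))
    # Jeffrey update of each hypothesis on the partition {E, not-E}.
    posteriors = [
        p * (l * post_e / prior_e + (1 - l) * (1 - post_e) / (1 - prior_e))
        for p, l in zip(priors, likelihoods)
    ]
    # Likelihoods are constant across rounds, so the updated hypothesis
    # probabilities give the prior of the next piece of evidence.
    next_prior_e = sum(p * l for p, l in zip(posteriors, likelihoods))
    return prior_e, next_prior_e


# Hypothetical inputs.
priors = [0.2, 0.3, 0.5]
likelihoods = [0.2, 0.5, 0.9]

for post_e in (0.7, 0.5):
    prior_e, next_prior_e = next_prior_of_evidence(priors, likelihoods, post_e)
    print(f"Pr_n(E)={prior_e:.4f}  Pr*_n(E)={post_e:.4f}  Pr_n+1(E)={next_prior_e:.4f}")
```

In both runs the new prior probability of evidence lies between the old prior and the posterior just adopted: it moves toward the posterior without overshooting it.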

## About this article

### Cite this article

Trpin, B. Jeffrey conditionalization: proceed with caution. *Philos Stud* **177**, 2985–3012 (2020). https://doi.org/10.1007/s11098-019-01356-3

### Keywords

- Jeffrey conditionalization
- Belief updating
- Uncertain evidence
- Epistemic inaccuracy
- Formal epistemology