Tacking by conjunction, genuine confirmation and convergence to certainty

Tacking by conjunction is a well-known problem for Bayesian confirmation theory. In the first section, disadvantages of existing Bayesian solution proposals to this problem are pointed out and an alternative solution proposal is presented: that of genuine confirmation (GC). In the second section, the notion of GC is briefly recapitulated and three versions of GC are distinguished: full (qualitative) GC, partial (qualitative) GC and quantitative GC. In the third section, the application of partial GC to pure post-facto speculations is explained. In the fourth section it is demonstrated that full GC is a necessary condition for Bayesian convergence to certainty based on the accumulation of conditionally independent pieces of evidence. It is found that whenever a hypothesis is equivalent to a disjunction of more fine-grained hypotheses conveying different probabilities to the evidence, then conditional independence of the evidence fails. This failure occurs typically for unspecific negations of hypotheses. A refined version of the convergence to certainty theorem that overcomes this difficulty is developed in the final section.

(3). The diminished-confirmation proposal is measure-sensitive in the sense that it holds only for some of the prominent Bayesian confirmation measures, but is violated for others (cf. Schippers & Schurz, 2020, see observations 5, 8 and 10).
The idea of genuine confirmation is based on the following observation: E increases the probability of E∧X only because E is a content element of E∧X and increases its own probability to 1 (P(E|E) = 1), but E does not increase the probability of the content element X that logically transcends E, in the sense that X is not entailed by E. Gemes and Earman (Earman, 1992, 98n5) have called this type of pseudo-confirmation "confirmation by (mere) content-cutting". To avoid this problem one has to require that for E to count as genuine confirmation of a hypothesis, E has to confirm those content parts of the hypothesis that transcend the evidence (Schurz, 2014a).
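The phenomenon of confirmation by mere content-cutting can be verified with a toy probability model. The following sketch (with illustrative numbers; E and X are assumed probabilistically independent) shows that E raises the probability of the conjunction E∧X while leaving the E-transcending content element X untouched:

```python
# Pseudo-confirmation by content-cutting: E raises P(E∧X) without raising P(X).
# Toy joint distribution over two independent propositions E and X
# (all numbers are illustrative assumptions).
P_E, P_X = 0.5, 0.2

def prob(e, x):
    """Probability of the world where E has truth value e and X has value x."""
    return (P_E if e else 1 - P_E) * (P_X if x else 1 - P_X)

P_EX = prob(True, True)                 # P(E∧X) = 0.1
P_EX_given_E = prob(True, True) / P_E   # P(E∧X|E) = P(X) = 0.2
P_X_given_E = P_X                       # X independent of E: P(X|E) = P(X)

assert P_EX_given_E > P_EX   # E confirms the conjunction in the ordinary sense...
assert P_X_given_E == P_X    # ...but the E-transcending content element X is untouched
```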

Genuine confirmation: Definition and applications
The notion of genuine confirmation is based on the notions of a content element and a content part. A definition of these notions for predicate languages has been given in Schurz (2014a, def. (5)) and Schippers and Schurz (2017, def. 4.2) as follows:

Definition 1
1.1 C is a content element of (hypothesis) H iff
(i). H logically entails C (H |== C),
(ii). C is elementary in the sense that C is not L(ogically) equivalent with a conjunction C1∧C2 of conjuncts both of which are shorter than C, and
(iii). no predicate or subformula in C is replaceable (on some of its occurrences) by an arbitrary new predicate or subformula, salva validitate of the entailment of C by H.

1.2 A content part of H is a non-redundant conjunction of content elements of H.
The following properties of content elements are important: (1). The set of content elements CE(H) of a hypothesis H preserves H's logical content: CE(H) ==| |== H. (2). The shortness criterion in def. 1(ii) is related to the concept of minimal description length in machine learning (Grünwald, 2000). The criterion is relativized to a language with ¬, ∧, ∨, ∃ and ∀ as logical primitives; defined symbols are eliminated. 2 (3). Condition (iii) of def. 1 excludes irrelevant disjunctive weakenings (p |== p∨q) as content elements. For example, every hypothesis H confirmed by E has H∨¬E as an (irrelevant) logical consequence that is disconfirmed by E; but this consequence does not count as a content part, because "¬E" is replaceable in "H |== H∨¬E" salva validitate by any other formula. Therefore, (H∨E)∧(H∨¬E) is not an admissible conjunctive decomposition of H. This avoids the Popper and Miller (1983) objection to inductive confirmation (Schurz, 2014b, 320). Other technical definitions of content elements are possible; examples are Friedman's (1974) "independently acceptable elements", Gemes' (1993) "content parts" and Fine's (2017) "verifiers". The technical details don't matter as long as the core idea is captured, namely the decomposition of a hypothesis into a set of smallest content elements.
The notion of genuine confirmation, henceforth abbreviated as GC, has been explicated in three versions: full (qualitative) GC, partial (qualitative) GC and quantitative GC (cf. Schippers & Schurz, 2020):

Definition 2 Assume E does not entail H. 3 Then:
2.1 Full (qualitative) GC: E fully genuinely confirms H iff P(C|E) > P(C) holds for all E-transcending content parts C of H.
2.2 Partial (qualitative) GC: E partially genuinely confirms H iff (i) E fully genuinely confirms at least some E-transcending content part of H, and (ii) E does not diminish the probability of any content part of H.
2.3 Quantitative GC: The degree of GC that E provides for H is the sum of the confirmation degrees, conf(E,C), over all pairwise non-equivalent E-transcending content parts C of H, divided by their number (where "conf(E,H)" is one of the standard Bayesian confirmation measures).
Note that the notion of genuine confirmation has to be formulated not merely in terms of content elements, but in terms of content parts, as it may be that H = H 1 ∧H 2 and the evidence E confirms both H 1 and H 2 , but disconfirms the conjunction H 1 ∧H 2 .
In Schippers and Schurz (2020) it is shown that genuine confirmation has a number of attractive features that will be briefly summarized. Two further important applications of this notion, the handling of post facto speculations and in particular the enabling of Bayesian convergence to certainty, are elaborated in the remainder of this paper.
Applications of quantitative GC Quantitative GC solves the problem of measure sensitivity mentioned above. Schippers and Schurz (2020) demonstrate that the diminishing effect on the genuine confirmation of hypotheses containing irrelevant conjuncts holds for the quantitative GC-measures obtained with all pertinent quantitative Bayesian confirmation measures (ibid., see observations 6, 9 and 11). Also, note that partial GC implies positive quantitative GC; thus the qualitative and quantitative notion of GC are in coherence.
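The diluting effect of an irrelevant conjunct on quantitative GC can be made concrete with a small model. The sketch below uses the difference measure conf(E,C) = P(C|E) − P(C) as the underlying confirmation measure; the joint distribution (Y positively correlated with E, X independent of both) is an illustrative assumption:

```python
from itertools import product

# Worlds: truth-value assignments to E, Y, X. Assumed toy probabilities:
# Y is positively correlated with E, X is independent of both.
P_E, P_X = 0.5, 0.3
P_Y_given_E, P_Y_given_notE = 0.8, 0.4

def prob(world):
    e, y, x = world
    p = P_E if e else 1 - P_E
    py = P_Y_given_E if e else P_Y_given_notE
    p *= py if y else 1 - py
    p *= P_X if x else 1 - P_X
    return p

worlds = list(product([True, False], repeat=3))

def P(pred):
    return sum(prob(w) for w in worlds if pred(*w))

def conf(c):  # difference measure: conf(E,C) = P(C|E) - P(C)
    return P(lambda e, y, x: e and c(e, y, x)) / P(lambda e, y, x: e) - P(c)

# E-transcending content parts of Y: just Y itself.
gc_Y = conf(lambda e, y, x: y)
# E-transcending content parts of Y∧X: Y, X, Y∧X -> average their conf values.
parts = [lambda e, y, x: y, lambda e, y, x: x, lambda e, y, x: y and x]
gc_YX = sum(conf(c) for c in parts) / len(parts)

assert abs(conf(lambda e, y, x: x)) < 1e-9   # irrelevant conjunct not confirmed
assert gc_YX < gc_Y                          # tacking X dilutes quantitative GC
```

With these numbers gc_Y = 0.2, while the tacked hypothesis Y∧X only reaches an average of about 0.087, illustrating the diminishing effect.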
Applications of partial (qualitative) GC Partial (qualitative) genuine confirmation rules out the special case of tacking by conjunction in which the irrelevant hypothesis X is directly tacked onto the evidence. This includes the important subcase of Goodman-type counter-inductive generalizations. To make this precise, let "Ox" stand for "x is observed", "Ex" for "x is an emerald", "Gx" for "x is green" and "Bx" for "x is blue". So the evidence E is L(ogically) equivalent with ∀x(Ex∧Ox→Gx). Now, let "I" stand for the inductive projection of E, "if x is an unobserved emerald, it is green" (∀x(Ex∧¬Ox→Gx)), and "CI" for the respective counter-inductive projection, "if x is an unobserved emerald, then it is blue" (∀x(Ex∧¬Ox→Bx)). So the inductive generalization of the evidence E is L-equivalent with E∧I, and the counter-inductive (Goodman-type) generalization of E is L-equivalent with E∧CI. This representation of the inductive and counter-inductive generalization of E makes it clear why they can be subsumed under the special case of tacking by conjunction. Although the probability of both hypotheses is raised by E, neither of the two is genuinely confirmed by E, not even partially, because E confirms neither I nor CI, as long as the probability measure P does not satisfy special inductive principles that go beyond the basic probability axioms. On the other hand, if P satisfies additional inductive principles, such as de Finetti's principle of exchangeability (invariance of P under permutation of individual constants), then P(I|E) > P(I) and P(CI|E) < P(CI) hold; thus E genuinely confirms E∧I (since E confirms I) and genuinely disconfirms E∧CI (since E disconfirms CI).
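The effect of exchangeability can be checked exactly in a minimal de Finetti-style model. The sketch below assumes (as an illustrative stand-in for a uniform density over chances) that the chance θ of an emerald being green is one of {0, 1/2, 1}, each with prior 1/3, with emeralds iid given θ; I and CI concern a single unobserved emerald:

```python
from fractions import Fraction

# Minimal exchangeable prior: the chance θ that any emerald is green is
# one of {0, 1/2, 1}, each with prior weight 1/3, and emeralds are iid
# given θ (an illustrative discrete stand-in for de Finetti's mixture).
thetas = [Fraction(0), Fraction(1, 2), Fraction(1)]
w = Fraction(1, 3)

P_E       = sum(w * t     for t in thetas)  # observed emerald is green
P_E_and_I = sum(w * t * t for t in thetas)  # observed and unobserved both green
P_I  = P_E                                  # by exchangeability
P_CI = 1 - P_I                              # unobserved emerald is blue

P_I_given_E  = P_E_and_I / P_E              # = 5/6
P_CI_given_E = 1 - P_I_given_E              # = 1/6

assert P_I_given_E > P_I      # E confirms the inductive projection I
assert P_CI_given_E < P_CI    # E disconfirms the counter-inductive projection CI
```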
For partial GC it is not only required that the hypothesis H must have an E-transcending content part C whose probability is raised by E, but, stronger, that C must be fully genuinely confirmed by E (see definition 2.2). 4 The idea underlying this requirement is that H must have at least one E-transcending content part that can be sustainably confirmed in the sense of enabling convergence to certainty (section 4), and this requires the full GC of this content part. An important application of partial GC that would not be possible without this requirement is the treatment of so-called 'pure post-facto speculations' explained in section 3.
Applications of full (qualitative) GC Full GC seems to be a rather strong condition. Nevertheless it turns out that this condition has important applications. Full GC guarantees that the probability increase that E conveys to H spreads to all content parts of H. This is neither true for ordinary Bayesian confirmation nor for merely partial GC. For practical applications to scientific knowledge this feature is of obvious importance. Scientists base their predictions on those hypotheses that are well confirmed, which means that they infer relevant consequences, i.e. content parts, from them, thereby assuming that these consequences are themselves well confirmed. This assumption presupposes that confirmation is closed under content parts.
The most important application of full GC is its function as a necessary condition for Bayesian convergence to certainty. This application is worked out in sections 4-5.

Partial GC and pure post-facto speculations
In this section we explain why pure post-facto speculations are not genuinely confirmed by the evidence towards which they are fitted. In the simplest case, a pure post-facto speculation explains a piece of evidence E in hindsight by a postulated unobservable 'power' (ψ) that produced it. More generally speaking, a hypothesis H is a pure post-facto speculation in regard to a given body of evidence E iff (i) H contains theoretical concepts ψ that are not present in the evidence E, (ii) the values of these concepts have been obtained by fitting the values of a theoretical variable X (of an underlying unfitted background hypothesis H unfit ) towards E so that H implies E, and (iii) H unfit could have been equally fitted to any other possible evidence of the same type. Condition (iii) is necessary for calling H a pure post-facto speculation; if H unfit cannot be fitted to some possible alternative evidence E*, then it contains at least some independent (use-novel) empirical content. In this case H is a post-facto speculation that is not a 'pure' one, but can be partially genuinely confirmed.
To illustrate how a pure post-facto speculation works, consider the pseudo-explanation of the following fact:

(E) There is an economic recession

by the hypothesis

(H) God wants that there is an economic recession, and whatever God wants, happens,

where "W(X)" stands for "God wants that X" and X is a (logical) second-order variable ranging over propositions. Since H entails E and P(H∧E) > 0, E confirms H in the orthodox Bayesian sense. Of course, there are not only religious kinds but many other kinds of pure post-facto speculations, for example 'conspiracy' theories based on various sorts of hidden super-powerful intentional agents.
In the hypothesis H, God's wish-that-E is the theoretical (latent) parameter that has been obtained by fitting the theoretical variable X (God's wishes) post-facto towards the evidence E, the economic recession. The unfitted background hypothesis in this example is:
(H unfit ) There is some X that God wants, and whatever God wants, happens, formalized as ∃XW(X)∧∀X(W(X) → X).
Thus we propose that H unfit is generated from H by existentially quantifying over H's theoretical variable. H is the result of fitting H unfit to E; so we may also write H = H fit . The fitting operation consists in omitting the existential quantifier and replacing the free variable X by the proposition that one wants to explain. H unfit is a content part of H, since H entails H unfit . Observe that H unfit is not a simple conjunct of H; the considered case is more intricate than tacking by conjunction.
What is important: the unfitted background hypothesis H unfit could be fitted equally well to any other possible alternative evidence E' whatsoever. This implies that H unfit cannot increase the prior probability of any possible alternative evidence; thus P(E|H unfit ) = P(E) and P(E'|H unfit ) = P(E') (for any E') must hold (Schurz, 2014a, sec. 3.3). This in turn entails P(H unfit |E) = P(H unfit ), i.e., H unfit 's probability cannot be raised by E. H unfit is a conjunction of two content elements, C 1 : ∃XW(X) and C 2 : ∀X(W(X) → X). Since C 1 can be fitted equally well to any other possible evidence E', unconditionally as well as conditionally on C 2 , it follows that neither the probability of C 1 nor that of C 2 is raised by E. 5 In conclusion, neither H unfit nor its conjuncts are confirmed by E in the ordinary Bayesian sense. In contrast, H is ordinarily confirmed by E. But it is not genuinely confirmed, not even partially. To see this, note that H has the form W(E)∧C 2 . H's E-transcending content parts are W(E), C 2 and W(E)∧C 2 . C 2 is not even ordinarily confirmed by E. W(E) is not fully genuinely confirmed by E, since its content element C 1 is not ordinarily confirmed. For the same reason, W(E)∧C 2 is not fully genuinely confirmed by E. So H is not partially genuinely confirmed by E. This means, for example, that creationism is not genuinely confirmed by a list of post-facto 'explanations' of the form "E i happened because God wanted it and whatever God wants, happens" (for i = 1, 2, …), not even in a partial sense, and not even if this list is very long. This is the intended result.
Only if the fitted hypothesis H is confirmed by a second piece of evidence E* to which H unfit has not been fitted and which H could have predicted (in the epistemic sense of 'prediction') can H unfit be said to be confirmed, via the confirmation of H by E and E*. For obviously it is not possible to fit H unfit to a particular evidence E and then confirm the so-obtained H by just any other evidence E* whatsoever. The evidence E* is use-novel in the sense of Worrall (2006). Therefore this consideration provides a Bayesian justification of a special version of Worrall's idea of independent confirmation by use-novel evidence, a version for which it is essential that the hypothesis contains a theoretical concept that is not part of the evidence. 6 In contrast to pure post-facto speculations, accepted scientific hypotheses involving theoretical concepts are highly confirmed by use-novel evidence. An example is the chemical explanation of the combustion of inflammable materials such as wood in air by the hypothesis H Ox = "wood contains carbon which, when ignited, reacts with oxygen in the air, and this reaction underlies combustion" (with H Ox,unfit arising from H Ox by replacing the theoretical concepts "carbon" and "oxygen" by existentially quantified variables). A variety of use-novel empirical predictions are derivable from H Ox within the given chemical background theory, for example P* = "combustion of wood produces carbon dioxide and consumes oxygen, which impedes breathing and increases the greenhouse effect". If P* is confirmed by new evidence E*, then H Ox is genuinely confirmed by {E,E*}. However, outdated scientific theories that are false in the light of contemporary evidence were also often successful at their time, i.e., they entailed use-novel empirical consequences and thus were not post-hoc.
An example is the phlogiston hypothesis of combustion, H Phlog , according to which inflammable materials contain a specific substance named phlogiston which, when ignited, leaves the wood in the form of a hot flame. Phlogiston theory successfully predicted the processes of calcination (roasting) of metals and of salt-formation of metals in acids, as well as the inversion of these reactions (Ladyman, 2011; Schurz, 2009). Nevertheless the existence of phlogiston was later rejected because of various conflicts with observations, e.g., the fact that some substances gain weight after losing their phlogiston.

5 By our arguments we have (i) P(C 1 ∧C 2 |E) = P(C 1 ∧C 2 ) (since C 1 ∧C 2 = H unfit ), (ii) P(C 1 |E) = P(C 1 ) and (iii) P(C 1 |E∧C 2 ) = P(C 1 |C 2 ). This implies P(C 2 |E) = P(C 2 ).

6 For example, the fitting of the parameter "population-mean" to an observed sample mean is not a case of theoretical parameter fitting. For this reason, Howson's (1990) counterargument to use-novelty doesn't apply to our account.
The above argument that the probability of H unfit is raised by use-novel probability-increasing evidence was based on intuition. Probability theory itself does not tell us how the probability increase of a hypothesis by an evidence spreads to its content parts. Based on the above considerations we suggest the following rationality criteria for the spread of the evidence-induced probability increase from H to its E-transcending content elements:

Necessary criteria for spread of probability increase: If H increases E's probability, then the resulting probability increase of H by E spreads to an E-transcending content element C of H (P(C|E) > P(C)) only if:
(1). C is necessary within H to make E probable, i.e., there exists no conjunction H* of content elements of H that makes E at least equally probable (P(E|H*) ≥ P(E|H)) but does not entail C, and
(2). it is not the case that C is an existential quantification, C = ∃XH(X), and H results from fitting the value of X in H(X) towards E, such that an equally good fitting of H(X) would have been possible for every possible alternative evidence E'.
We finally note that the use-novelty criterion is not a purely philosophical invention. A statistical method corresponding to the use-novelty criterion is cross-validation (Shalev-Shwartz & Ben-David, 2014, sec. 11.2). Here one starts with one (big) data set E, splits E randomly into two disjoint data sets E 1 and E 2 , fits the unfitted hypothesis to E 1 and tests the resulting fitted hypothesis with E 2 . By repeating this procedure and calculating the average probability of E 2 conditional on H-fitted-to-E 1 , one obtains a highly reliable confirmation score. An important domain of this method is curve fitting (applied to statistically correlated variables X, Y with values x i , y i ). Here, one approximates a finite set of data points E = {<x i ,y i >: 1 ≤ i ≤ n} by an optimal curve Y = f(X) with a remainder dispersion around it as small as possible. It is a well-known fact that any set of data points can be approximated by fitting the parameters c i of a polynomial function H: Y = c 0 + c 1 ·X + … + c n ·X n of a sufficiently high degree n. Here, the existential quantification over this function, ∃c 0 …c n H(c 0 ,…,c n ), plays the role of H unfit . Merely fitting the parameters of H unfit to the data set E is not enough for confirming it. The approximation success of a high-degree polynomial may also be due to overfitting the data, i.e., H unfit may have been fitted on random accidentalities of the sample (cf. Hitchcock & Sober, 2004). Only if the curve H with its parameters fitted towards E successfully approximates a use-novel data set E*, one to which its parameters have not been adjusted, is it genuinely confirmed by E and E*.
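The cross-validation procedure just described can be sketched in a few lines. The data, split, and polynomial degrees below are illustrative assumptions; the point is only that a high-degree polynomial fits the fitting set E1 better than a low-degree one, which is why success on E1 alone carries no confirmational weight:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: a noisy linear relationship between X and Y.
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(0, 0.3, size=x.size)

# Split E randomly into a fitting set E1 and a use-novel test set E2.
idx = rng.permutation(x.size)
fit_idx, test_idx = idx[:10], idx[10:]

def mse(deg):
    """Fit a degree-`deg` polynomial to E1; return (error on E1, error on E2)."""
    coeffs = np.polyfit(x[fit_idx], y[fit_idx], deg)
    err = (np.polyval(coeffs, x) - y) ** 2
    return err[fit_idx].mean(), err[test_idx].mean()

lin_fit, lin_test = mse(1)
poly_fit, poly_test = mse(7)

# The degree-7 polynomial can track the accidentalities of E1...
assert poly_fit < lin_fit
# ...so only the error on the use-novel set E2 is a reliable confirmation score.
print(f"linear: test MSE {lin_test:.3f}; degree-7: test MSE {poly_test:.3f}")
```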

Full GC and Bayesian convergence to certainty
Convergence theorems are an important part of Bayesian epistemology. According to them, the conditional probability of a hypothesis can be driven to near certainty if many confirming pieces of evidence for this hypothesis are accumulated (Earman, 1992, 141ff.). Most versions of Bayesian convergence theorems have been formulated for hypotheses not containing theoretical concepts (or latent variables), typically hypotheses that are obtainable from the evidence by enumerative induction. For example, if P is countably additive, then lim n→∞ P(p(Fx) = r | E 1 ∧…∧E n ) = 1, where each E i is Fa i or ¬Fa i and F's frequency limit in the E i -sequence is r (a consequence of the theorem of Gaifman & Snir, 1982). More important, however, are convergence theorems for hypotheses containing theoretical concepts. A well-known convergence result for this case is based on the confirmation by an increasing number of conditionally independent pieces of evidence, as follows:

Theorem 1 - convergence to certainty Assume an infinite sequence of pieces of evidence E 1 , E 2 ,… and a hypothesis H satisfying the following conditions (where P(H), P(E i ) ∉ {0,1}):
(a). H makes each piece of evidence more probable than ¬H by an amount of at least δ, i.e., for each i∈ω: P(E i |H) ≥ P(E i |¬H) + δ,
(b). each piece of evidence, E n , is predecessor-independent conditional on H, in the sense that for every n∈ω, P(E n |H∧E 1 ∧…∧E n-1 ) = P(E n |H),
(c). and likewise E n is predecessor-independent conditional on ¬H.
Then: (i). P(H|E 1 ∧…∧E n ) ≥ P(H)/(P(H) + P(¬H)·(1-δ)^n), and (ii). lim n→∞ P(H|E 1 ∧…∧E n ) = 1.

Corollary 1 In the special case where P(E i |H) = p and P(E i |¬H) = q < p hold for all i∈ω, P(H|E 1 ∧…∧E n ) = P(H)·p^n/(P(H)·p^n + P(¬H)·q^n), which converges to 1 for n→∞.

Corollary 2 If the pieces of evidence are mutually independent conditional on H and on ¬H, in the sense that P(E n |±H∧E) = P(E n |±H) holds for every ±H ∈ {H,¬H}, n∈ω, E n and conjunction E of n-1 of the remaining E i 's, then results (i) and (ii) of theorem 1 hold for every sequential ordering of the pieces of evidence.

Proof see appendix.
Convergence to certainty in spite of a small prior probability is the ideal case of scientific confirmation. The confirmation of Darwinian evolution theory by multiple pieces of evidence constitutes an example. In theorem 1, the likelihoods P(E i |H) may be different for the different pieces of evidence and may even be smaller than 0.5; all that is required about them is condition (a). Corollary 1 expresses the special case where the likelihoods are equal and given by p and q; this result is found in Bovens and Hartmann (2003, 62, (3.17) and (3.19)). 7 Theorem 1 is related to Condorcet's jury theorem for conditionally independent witnesses, where E i stands for "witness i has reported that H" (Bovens & Hartmann, 2003, present corollary 1 in this context). While theorem 1 deals with the special case in which all informants (pieces of evidence E i ) are speaking in favor of H, Condorcet's theorem considers a majority of 'correct' informants (E i ) and a minority of incorrect ones (¬E i ). To handle this case Condorcet's theorem assumes that P(E i |H) = P(¬E i |¬H) > 1/2 > P(¬E i |H) = P(E i |¬H), which entails a version of corollary 1 where n stands for the surplus of correct over incorrect informants (cf. the derivation in List, 2004, and in Bovens & Hartmann, 2003, 82).
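The special case with equal likelihoods p and q can be checked numerically. A minimal sketch with illustrative numbers, assuming conditionally iid evidence, using the standard likelihood-ratio form of the posterior; note that even p < 0.5 suffices, as long as p > q:

```python
# Posterior of H after n pieces of evidence, each with P(Ei|H)=p, P(Ei|¬H)=q,
# conditionally independent given H and given ¬H (illustrative numbers).
def posterior(prior, p, q, n):
    return prior * p**n / (prior * p**n + (1 - prior) * q**n)

prior, p, q = 0.01, 0.6, 0.4   # small prior; likelihoods need only satisfy p > q

for n in (0, 10, 50, 200):
    print(n, posterior(prior, p, q, n))

assert posterior(prior, p, q, 200) > 0.999   # convergence to certainty
assert posterior(prior, p, q, 10) > prior    # monotone increase along the way
```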
Surprisingly, it turns out that a necessary condition for convergence to certainty is full GC. The existence of only one E-transcending content element of H, call it C, that is not confirmed by any elements of the evidence sequence is sufficient to prevent convergence to certainty. Since C's probability is not raised by any conjunction of E i 's, it holds that P(C|E 1 ∧…∧E n ) = P(C). But P(C) is an upper bound of P(H|E 1 ∧…∧E n ), since H entails C. Therefore P(H|E 1 ∧…∧E n ) cannot approach certainty but is forced to stay at or below P(C), which is small.
Theorem 2 - failure of convergence to certainty 8 Assume an infinite sequence of pieces of evidence E 1 ,E 2 ,… and a hypothesis H that satisfies condition (a) of theorem 1, but contains an irrelevant evidence-transcending content part C, i.e., one that is not confirmed by any (consecutive) conjunction of pieces of evidence: P(C|E k+1 ∧…∧E k+n ) = P(C) for all k∈ω and n∈ω. Then: (i). lim n→∞ P(H|E 1 ∧…∧E n ) ≤ P(C), and (ii). a failure of condition (b) or of condition (c) occurs for some piece of evidence E m+1 with m ≤ |log(P(H)·P(¬C)/(P(¬H)·P(C)))|/|log(1-δ)|, and occurs infinitely many times thereafter.
Corollary 3 If a hypothesis H satisfies conditions (a), (b) and (c) of theorem 1, then H is fully genuinely confirmed by the evidence sequence, in the sense that each of its content parts is confirmed by at least some conjunction of pieces in the sequence.
Proof see appendix.
Theorem 2 is not in conflict with continuous probability increase in the sense that P(H|E n+1 ∧E n ∧…∧E 1 ) > P(H|E n ∧…∧E 1 ). Assume the hypothesis contains an irrelevant E-transcending content part C (in the sense of theorem 2); in the simplest case, the hypothesis has the form H∧C, where H is fully genuinely confirmed by the E i and the priors of H and C are low. Then the probability of H∧C is continuously increasing conditional on accumulating evidence; however, it does not converge to 1, but to P(C). This is illustrated in Fig. 1.
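The plateau at P(C) can be reproduced numerically. A minimal sketch with illustrative numbers, modeling C as probabilistically independent of H and of every E i , and using the equal-likelihood closed form for H's posterior:

```python
# H is fully genuinely confirmed by the Ei; C is an irrelevant conjunct,
# modeled as independent of H and of every Ei (illustrative numbers).
def posterior_H(prior, p, q, n):
    return prior * p**n / (prior * p**n + (1 - prior) * q**n)

P_H, P_C, p, q = 0.05, 0.3, 0.7, 0.3

# P(H∧C|E1..En) = P(C) * P(H|E1..En), by the independence of C.
post = [P_C * posterior_H(P_H, p, q, n) for n in range(30)]

assert all(b > a for a, b in zip(post, post[1:]))   # continuously increasing...
assert post[-1] < P_C                               # ...but bounded by P(C)
assert post[-1] > 0.2999                            # and converging to P(C)
```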
In conclusion, full GC is a precondition for sustainable confirmation, in the sense of convergence to certainty conditional on accumulating independent evidence. The proof of theorem 2(i) is obvious, but that of 2(ii) is not trivial. All that can be inferred from theorem 2(i) and theorem 1 is that after some finite number of pieces of evidence, whose upper bound can be calculated from theorem 1(i), condition (b) or condition (c) fails, and thereafter the conjunction of (b) and (c) must fail, not necessarily always, but infinitely many times (since otherwise theorem 1(i) would still hold). 9 It would be good to know which of the two conditions, (b) or (c), typically fails. Moreover, we would like to know under which strengthened assumptions this condition must fail for each piece of evidence. Both questions are answered by the following observation. Whenever the negation of the hypothesis, ¬H, can be decomposed into a partition of finer hypotheses that convey different probabilities to the evidence, and conditional on which the pieces of evidence are independent, then independence of the pieces of evidence conditional on ¬H must fail. This observation is formally proved in theorem 3 for an arbitrary hypothesis; thus it applies likewise to the positive hypothesis H. However, in many typical cases the positive hypothesis specifies an approximately causally complete scenario for the evidence, in which case such a partition does not exist. The scenario described by the negated hypothesis, however, will typically be causally incomplete, leaving open several different alternative causes conveying different probabilities to the evidence, in which case conditional independence must fail for ¬H. Here is an informal explanation of this result: Let ¬H split into two disjoint hypotheses N 1 , N 2 , i.e. P(E 1 |¬H) = P(E 1 |N 1 ∨̇ N 2 ) < P(E 1 ) (since P(E 1 |H) > P(E 1 )). Assume that P(E i |N 1 ) is much larger than P(E i |N 2 ) (for all i).
Then P(E 2 |¬H∧E 1 ) > P(E 2 |¬H) will hold, because the fact that E 1 obtains makes it more probable that N 1 rather than N 2 obtains, which in turn makes E 2 more probable. For illustration, assume that H is the abovementioned hypothesis "God exists and wants a recession". Then ¬H is equivalent with the exclusive disjunction N 1 ∨̇ N 2 , where N 1 stands for "God does not exist" and N 2 for "God does not want a recession". Let E i stand for "there is a recession in country i". Then the evidence E i makes it more probable that N 2 is false but N 1 is true, and this diminishes the lowering effect of ¬H (that it is false that God wants a recession) on the probability of a recession in some other country (E j ), i.e. P(E 2 |¬H∧E 1 ) > P(E 2 |¬H) holds. This insight is elaborated in the next two theorems.

9 I am indebted to a referee for detecting a mistake in a previous version of theorem 2 in which I forgot to relativize the failure of conditions (b) or (c) to "sufficiently late" members of the evidence sequence. For a small number of pieces of evidence a countermodel (satisfying the conditions of theorem 1 for a hypothesis H∧C with irrelevant C) has been generated with the probabilistic consistency checker program PrSAT (cf. Fitelson, 2008). The countermodel search with PrSAT for the strengthened conditions of theorems 3 and 4 leads to the expected inconsistency result.

Fig. 1 Continuous probability increase of a hypothesis H that is fully genuinely confirmed by an evidence sequence, and of H tacked-on with an irrelevant conjunct C

Theorem 3 Assume an evidence sequence (E 1 , E 2 ,…) and a hypothesis H (with P(H), P(E i ) ∉ {0,1}) that decomposes analytically into a disjunction of fine-grained hypotheses H 1 , H 2 conveying different probabilities to the pieces of evidence, which in turn are predecessor-independent conditional on H 1 and on H 2 .
Or formally: H ↔ H 1 ∨̇ H 2 , P(E i |H 1 ) > P(E i |H 2 ) (for all i∈ω), and P(E n |H k ∧E 1 ∧…∧E n-1 ) = P(E n |H k ) for all k∈{1,2} and n∈ω. Then for all n ≥ 2, P(E n |H∧E 1 ∧…∧E n-1 ) > P(E n |H) holds, i.e., conditional independence fails.
Proof see appendix.
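The mechanism behind theorem 3 can be verified with a two-cell partition and illustrative numbers, where the pieces of evidence are iid given each cell but with different likelihoods:

```python
# Theorem 3's mechanism: a hypothesis D equivalent to N1 ∨̇ N2, where the Ei
# are iid given N1 and given N2 but with different likelihoods (assumed numbers).
w1, w2 = 0.5, 0.5          # P(N1|D), P(N2|D) for D = N1 ∨̇ N2
p1, p2 = 0.9, 0.1          # P(Ei|N1), P(Ei|N2)

P_E2_given_D = w1 * p1 + w2 * p2                   # = 0.5 (same for E1)
P_E1E2_given_D = w1 * p1**2 + w2 * p2**2           # = 0.41
P_E2_given_D_E1 = P_E1E2_given_D / P_E2_given_D    # = 0.82

# Conditional independence fails: E1 raises the probability of the
# high-likelihood disjunct N1 inside D, which in turn raises P(E2|D∧E1).
assert P_E2_given_D_E1 > P_E2_given_D
```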
We finally apply theorem 3 to the case of a hypothesis H containing a (strongly) irrelevant content element C: H = H 1 ∧C, where H 1 satisfies the conditions of theorem 1 but both C and ¬C are irrelevant to the evidence, conditional on H 1 and ¬H 1 . In this case the negation of the hypothesis, ¬(H 1 ∧C) splits into the finer partition H 1 ∧¬C and ¬H 1 . While P(E i |¬H 1 ) < P(E i ) holds, H 1 ∧¬C increases E i 's probability, and this destroys the independence of the pieces of evidence conditional on ¬(H 1 ∧C).
Theorem 4 Assume an evidence sequence (E 1 , E 2 ,…) and a hypothesis H = H 1 ∧C (with P(H), P(E i ) ∉ {0,1}), where H 1 satisfies the conditions of theorem 1 and both C and ¬C are irrelevant to the evidence conditional on H 1 and on ¬H 1 . Then: (i). ¬H decomposes into the partition ¬H 1 ∨̇ (H 1 ∧¬C), which satisfies the assumption of theorem 3, and (ii). ¬H violates conditional independence (condition (c) of theorem 1) for every piece of evidence E n with n ≥ 2.
Proof see appendix.
Theorem 4 demands the irrelevance of C conditional on ±H 1 , which is stronger than the unconditional irrelevance of C that is entailed by the assumption of theorem 2. If C is unconditionally irrelevant but conditionally relevant to the evidence, the conclusion of theorem 4 may fail. In addition, theorem 4 requires the irrelevance of ¬C conditional on ±H 1 (which given the assumptions is equivalent with the unconditional irrelevance of C).

A generalized convergence-to-certainty theorem: Conclusion and outlook
Often it will be the case that the negation of a given hypothesis decomposes into a long disjunction of possible alternative hypotheses that convey different probabilities to the evidence. Since theorem 4 applies to these cases, this theorem seems to imply a severe restriction of the applicability of standard versions of the convergence-to-certainty theorem 1, and a similar diagnosis applies to Condorcet jury theorems. It would seem that these theorems could no longer be applied to realistic cases of hypotheses.
Fortunately one can devise a generalized version of theorem 1 that is relativized to a given, possibly large partition of hypotheses that are assumed to be sufficiently fine-grained to guarantee mutual conditional independence of the pieces of evidence. What one needs to assume in this case is that the hypothesis under consideration conveys a higher likelihood to the pieces of evidence, not only compared to its negation, but compared to any one of the alternative hypotheses:

Theorem 5 - generalized convergence to certainty Assume an infinite sequence of pieces of evidence E 1 , E 2 ,… and a hypothesis H 1 that belongs to a partition of hypotheses {H 1 ,…,H m } satisfying the following conditions (where P(H k ), P(E i ) ∉ {0,1}):
(a). H 1 makes each piece of evidence more probable than every other hypothesis by at least δ (for some δ > 0), i.e., P(E i |H 1 ) ≥ P(E i |H k ) + δ for all k > 1 and i∈ω, and
(b). the pieces of evidence are mutually independent conditional on every H k , i.e., P(E n |H k ∧E 1 ∧…∧E n-1 ) = P(E n |H k ) holds for all k∈{1,…,m} and n∈ω.
Then conclusions (i) and (ii) of theorem 1 hold for H = H 1 .
Corollary 4 In the special case where for each i∈ω, P(E i |H 1 ) = p and P(E i |H k ) = q < p for all k∈{2,…,m}, the conclusion of corollary 1 holds for H = H 1 . Proof see appendix.
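Corollary 4's special case can be checked numerically. A minimal sketch with an illustrative partition of m hypotheses, uniform priors, and conditionally iid evidence within each cell:

```python
# Generalized convergence: partition {H1,...,Hm} with P(Ei|H1) = p and
# P(Ei|Hk) = q < p for k > 1, evidence conditionally iid within each cell
# (all numbers illustrative).
m, p, q = 5, 0.6, 0.45
priors = [1 / m] * m                     # uniform priors over the partition

def posterior_H1(n):
    likes = [p**n] + [q**n] * (m - 1)    # P(E1..En|Hk), by conditional iid
    joint = [pr * lk for pr, lk in zip(priors, likes)]
    return joint[0] / sum(joint)

assert posterior_H1(200) > 0.999         # H1's posterior converges to certainty
assert posterior_H1(10) > posterior_H1(0)
```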
If we apply theorem 5 to hypotheses that are conjunctions of several content elements, H = H 1 ∧…∧H k , then the smallest partition of competing hypotheses that has to be checked in regard to conditional independence of the evidence is the partition {±H 1 ∧…∧±H k : ±H i ∈ {H i ,¬H i }, 1 ≤ i ≤ k}, which contains 2^k elements. In conclusion, for complex hypotheses consisting of many content elements, the preconditions of Bayesian convergence to certainty are much stronger than usually presented.
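The exponential growth of this partition can be made concrete by enumerating it directly (a direct transcription of the set given above, with the cells rendered as strings):

```python
from itertools import product

# The smallest partition to check for H = H1∧...∧Hk: all sign combinations
# ±H1∧...∧±Hk, one cell per assignment of signs to the content elements.
def sign_partition(k):
    return [''.join(('' if s else '¬') + f'H{i+1}' + ('∧' if i < k - 1 else '')
                    for i, s in enumerate(signs))
            for signs in product([True, False], repeat=k)]

cells = sign_partition(3)
assert len(cells) == 2**3        # 2^k cells for k content elements
print(cells[0], '...', cells[-1])
```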
6 Appendix: Proofs of theorems

6.1 Proof of theorem 1 and corollaries 1 and 2

Theorem 1 and corollary 1 follow from theorem 5 and corollary 4 by assuming m = 2 and substituting {H, ¬H} for {H1,…,Hm}. Then all conditions of theorem 1 hold. The conclusion of theorem 1 is identical with the conclusion of theorem 5 (for H = H1), and the conclusion of corollary 1 is identical with the conclusion of corollary 4. Corollary 2 is an obvious consequence of theorem 1. Q.E.D.

6.2 Proof of theorem 2 and corollary 3
For theorem 2: Since H entails C, P(H|E1∧…∧En) ≤ P(C|E1∧…∧En) must hold for every n∈ω. By the irrelevance of C, P(C|E1∧…∧En) = P(C), which implies P(H|E1∧…∧En) ≤ P(C). So all members of the sequence (P(H|E1∧…∧Ei) : i∈ω) are smaller than P(C), which implies (i) lim n→∞ P(H|E1∧…∧En) ≤ P(C). Thus consequence (ii) of theorem 1 fails (since by assumption P(C) < 1). Since condition (a) of theorem 1 holds, this entails, by simple modus tollens, that either condition (b) or condition (c) must fail for at least some members of the evidence sequence. By exploiting result (i) of theorem 1 we can prove an upper bound on the number m of pieces of evidence that may occur before condition (b) or (c) is violated for the first time. This upper bound is given by the inequality P(C)/P(¬C) ≥ (P(H)/P(¬H))·(1−δ)^−m, which gives by algebraic transformations: m·log(1−δ) ≥ log(P(H)·P(¬C)/(P(¬H)·P(C))). Since P(C) ≥ P(H), P(¬H) ≥ P(¬C) follows, so P(H)·P(¬C)/(P(¬H)·P(C)) ≤ 1 and the log on the right side is negative; likewise log(1−δ) is negative. Division by the negative log(1−δ) turns ≥ into ≤ and we obtain m ≤ |log(P(H)·P(¬C)/(P(¬H)·P(C)))| / |log(1−δ)| ("|…|" for "absolute value"), as announced. After the first failure of condition (b) or (c) at position m+1, the conjunction of (b) and (c) need not always fail, but it must fail infinitely many times. For if it failed only finitely many times, the proofs of theorems 1 and 5 would still go through. To see this, consider the line in the proof of theorem 5 annotated by "(by factoring out)": if there were only a finite subsequence of pieces of evidence for which (b)∧(c) fails, we could re-index the evidence sequence by enumerating all members for which condition (b)∧(c) holds.
In the equation for this modified situation, we would have to multiply the product Π in the numerator and on the left side of the denominator by a small number x, with 0 < x ≤ 1, where x is the product of the terms P(En|Hk∧E1∧…∧En-1) for the finitely many En's for which (b) fails (note that each of these terms must be greater than zero, since otherwise the terms P(En*|Hk∧E1∧…∧En*-1) for later n* > n for which (b)∧(c) holds could not be greater than zero). Likewise, the right side of the denominator would have to be multiplied by a small number y, with 0 < y ≤ 1, being the product of the terms P(En|¬Hk∧E1∧…∧En-1) for the finitely many En's for which (c) fails. This multiplication just amounts to a change of the priors; but since the theorem holds for all priors, it would still go through. Corollary 3 is an obvious consequence of theorem 2 and the definition of "full genuine confirmation". Q.E.D.
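The upper bound on m derived in the proof of theorem 2 is easy to evaluate numerically. The following sketch computes it for hypothetical inputs (P(H) = 0.2, P(C) = 0.6, δ = 0.1, none taken from the paper):

```python
import math

def max_confirmations(p_h, p_c, delta):
    """Upper bound on the number m of pieces of evidence that can satisfy
    conditions (b) and (c) of theorem 1 before the first violation, per
    the proof of theorem 2:

        m <= |log(P(H)*P(not-C) / (P(not-H)*P(C)))| / |log(1 - delta)|

    Requires P(C) >= P(H) (since H entails C) and 0 < delta < 1.
    """
    ratio = (p_h * (1 - p_c)) / ((1 - p_h) * p_c)  # <= 1 since P(C) >= P(H)
    return abs(math.log(ratio)) / abs(math.log(1 - delta))

print(max_confirmations(0.2, 0.6, 0.1))
```

With these numbers the bound is about 17, i.e., after at most 17 pieces of evidence condition (b) or (c) must be violated. Note that the bound shrinks as δ grows (stronger evidential discrimination forces an earlier violation) and vanishes when P(H) = P(C).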

6.3 Proof of theorem 4
We prove that the partition {H1∧¬C, ¬H1} of ¬H satisfies the conditions of theorem 3 (thus, the ¬H of theorem 4 instantiates the H of theorem 3). From this it follows that our ¬H violates conditional independence (condition (c) of theorem 1) for every piece of evidence En with n ≥ 2. To show that the conditions of theorem 3 are satisfied, we must prove:
(a). P(Ei|H1∧¬C) > P(Ei|¬H1) holds for all i∈ω,
(b). the pieces of evidence are predecessor-independent conditional on ¬H1,
(c). they are predecessor-independent conditional on H1∧¬C.
For (a): For all i∈ω: P(Ei|H1∧¬C) = P(Ei|H1) holds by the irrelevance of ¬C conditionally on H1. (Since we required P(H1∧¬C) > 0, P(Ei|H1∧¬C) is defined.) Moreover P(Ei|H1) > P(Ei) > 0 since H1 satisfies the conditions of theorem 1. This implies that