1 Introduction

Direct inferences identify values of some probabilistic credences with values of objective chances or relative frequencies. The main idea has been around for a long time. It goes by various names and has been articulated in a variety of ways.Footnote 1 Peirce calls it “probable deduction.” Contemporary logicians sometimes call it “statistical syllogism.” David Lewis’s Principal Principle is perhaps the most widely known version of an explicit direct inference principle (Lewis 1980).

Accounts of direct inference usually draw on two distinct notions of probability: an object-language notion, either relative frequency or some notion of objective chance, and a higher-level metalinguistic notion that applies to object-language expressions, usually characterized as some kind of logical probability or as a probabilistic measure of rational credence. Carnap (1962), for instance, calls the object-language notion \(probability_{2}\), and takes it to represent relative frequencies of attributes among members of populations. He calls the metalanguage notion \(probability_{1}\), and takes it to be a kind of degree of logical entailment, which he calls “degree of confirmation.”

For notational convenience we write ‘P’ for the \(probability_1\) notion and ‘ch’ for the \(probability_2\) notion. Although we will often take the ch function to represent some kind of objective chance, in most contexts the reader may interpret it to be either a chance function or a relative frequency function. In either case, expressions involving the function ch will take the form: ‘\(ch(Ax,Rx)=r\)’. On a reading of ‘ch’ as relative frequency, this expression says that the frequency of objects (or systems, or events) possessing attribute A among those in reference class R is r. On the reading of ‘ch’ as chance, this expression says that the chance that a system in initial state R will acquire attribute A is r.

Letting P represent the \(probability_{1}\) notion and taking ch to represent the \(probability_{2}\) notion, here is a generic version of a direct inference principle. Later we’ll extend it to more complex chance hypotheses.Footnote 2

Generic Direct Inference Principle—G-DIP:Footnote 3

Let P be an “appropriate” probability function on a language that contains chance (or frequency) claims. Let ‘\(ch(Ax,Rx)=r\)’ be an object-language statement that says that the chance that a system in state R acquires attribute A is r (alternatively, that the frequency of possessing attribute A among objects in reference class R is r), where r is a standard term for a real number between 0 and 1 (inclusive). Let ‘Rc’ say that system c is in state (or reference class) R, and let ‘Ac’ say that system c acquires (or possesses) attribute A. Then,

$$\begin{aligned} P[Ac \,|\, ch(Ax,Rx)=r \cdot Rc\cdot E] = r, \end{aligned}$$

provided that E is both consistent with \((ch(Ax,Rx)=r \cdot Rc)\) and admissible with respect to \((ch(Ax,Rx)=r \cdot Rc)\) (where tautologies are always considered admissible).Footnote 4

We won’t attempt to spell out an account of admissibility. Doing so is a complex and controversial undertaking, and for our purposes no specific account of admissibility need be supposed. Thau’s proposal works well enough here: “A proposition is inadmissible if it provides direct information about what the outcome of some chance event is.” (Thau 1994, p. 500, emphasis added)

Since tautologies are always admissible, G-DIP yields \(P[Ac \,|\, ch(Ax,Rx)=r \cdot Rc] = r\); and since it also yields \(P[Ac \,|\, ch(Ax,Rx)=r \cdot Rc \cdot E] = r\) for any admissible E, the admissibility of any other statement E requires that E be probabilistically independent of Ac, given \((ch(Ax,Rx)=r \cdot Rc)\) (for P). However, admissibility does not simply reduce to probabilistic independence; rather, it is designed to motivate probabilistic independence in appropriate cases. For instance, Lewis’s substantive account (in Lewis 1980) declares a statement admissible for a direct inference provided that it contains only information about particular matters of fact that occur before the time at which the associated chance outcome occurs. On this account, all statements about particular matters of fact after that time are inadmissible, even those that may happen to be probabilistically independent of Ac given chance claim \((ch(Ax,Rx)=r\cdot Rc)\).Footnote 5

When a statement D fails to be probabilistically independent of Ac, given \((ch(Ax,Rx)=r \cdot Rc\cdot E)\) for admissible E (for probability function P), then we say that D defeats the corresponding direct inference. That is, defeat of a direct inference by D just means that \(P[Ac \,|\, D \cdot ch(Ax,Rx)=r \cdot Rc \cdot E] \ne P[Ac \,|\, ch(Ax,Rx)=r \cdot Rc \cdot E] = P[Ac \,|\, ch(Ax,Rx)=r \cdot Rc] = r\) for admissible E.
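To make the definition of defeat concrete, here is a minimal Python sketch over a finite toy model. It is a hypothetical illustration, not part of the formal apparatus above: each world is a pair of truth values for Ac and a candidate statement D, and the weights stand in for credences already conditioned on \((ch(Ax,Rx)=r \cdot Rc \cdot E)\). The helper names `prob`, `cond`, and `defeats` are chosen for this sketch only. Exact `Fraction` arithmetic is used so that the inequality test is not disturbed by floating-point rounding.

```python
from fractions import Fraction

# Index positions inside a world tuple (ac, d).
AC, D = 0, 1

def prob(weights, event):
    """Total weight of the worlds where `event` holds."""
    return sum(p for world, p in weights.items() if event(world))

def cond(weights, event, given):
    """P[event | given]; raises ValueError if P[given] is 0."""
    denom = prob(weights, given)
    if denom == 0:
        raise ValueError("conditioning on a zero-probability statement")
    return prob(weights, lambda w: event(w) and given(w)) / denom

def defeats(weights, d):
    """True iff conditioning on d moves the credence in Ac off its unconditional value."""
    ac = lambda w: w[AC]
    return cond(weights, ac, d) != cond(weights, ac, lambda w: True)

r = Fraction(1, 6)
# D has credence 1/3 whether or not Ac holds: independent of Ac.
independent = {(True, True): r / 3, (True, False): 2 * r / 3,
               (False, True): (1 - r) / 3, (False, False): 2 * (1 - r) / 3}
# D is much likelier when Ac holds: correlated with Ac.
correlated = {(True, True): r / 2, (True, False): r / 2,
              (False, True): (1 - r) / 10, (False, False): 9 * (1 - r) / 10}

d_holds = lambda w: w[D]
print(defeats(independent, d_holds))  # False
print(defeats(correlated, d_holds))   # True
```

In the first model, conditioning on D leaves the credence in Ac at 1/6; in the second, it raises that credence to 1/2, so D defeats the direct inference.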

Notice that if D is a defeater, then on any adequate account of admissibility, \((D \cdot E)\) must be inadmissible for the direct inference, since failure of probabilistic independence is a sure-fire way for admissibility to fail. But it is also possible for admissibility to fail in cases where probabilistic independence remains intact. In such a case, although D (or \((D \cdot E)\)) is inadmissible, D does not count as a direct inference defeater, not as we use that term in this paper. Thus, as we use the term, a direct inference defeater is a particularly strong kind of inadmissible statement.Footnote 6

We will investigate several kinds of cases where, whenever P satisfies the classical axioms of probability, direct inference outcomes must, on purely logical grounds, fail to be probabilistically independent of a statement D. Thus, any account of direct inference based on G-DIP will rule the defeating statement D to be inadmissible, regardless of the particular account of admissibility employed. These troubles pose significant challenges for an agent who wants to apply such probability functions in her actual epistemic situation, for instance to determine her current credence via the total evidence requirement.

For Bayesians, the logic of credence functions (or confirmation functions) is captured by the way in which the axioms of probability theory constrain the numerical values of \(P[A \,|\, B]\) for the range of statements A and B, often under conditions (or suppositions) that constrain the probability values of other statements. Logically speaking, a direct inference rule such as G-DIP is merely an additional axiomatic constraint. Any function P that satisfies the other axioms, but violates the direct inference rule, is “ruled out” for failing to be an “appropriate” credence (or confirmation) function.Footnote 7 However, the further issue of how a rational agent is supposed to apply these functions, given the situation in which she finds herself, including her current state of knowledge, is not a purely logical matter. Carnap realized this long ago. His Requirement of Total Evidence is merely a way to make explicit our usual implicit assumptions about how an agent is supposed to apply her credence (or confirmation) function. Here is a fairly close paraphrase of Carnap’s requirement, adapted to apply to the P functions of G-DIP.

Total Evidence Requirement: Suppose that the logic of credence functions (or confirmation functions) supplies a result of form ‘\(P[A \,|\, B] = r\)’, where A and B are statements, r is a real number between 0 and 1, and P is the rational initial credence function (or the confirmation function) for an agent. If B expresses this agent’s total available evidence at the time t, then she is justified at t in believing A to the degree r, and hence in betting that A is true with a betting quotient no higher than r.Footnote 8 (Compare Carnap 1962, p. 211.)

For an agent to apply our version of the direct inference principle, G-DIP, the agent’s total evidence should be captured by ‘\((Rc \cdot E)\)’. What about the chance claim ‘\(ch(Ax,Rx)=r\)’ (the chance claim X, for Lewis)? Applications of the direct inference principle need not require that the chance claim itself be part of the agent’s total evidence, nor need the agent know it to be true. Here is a close paraphrase of what Lewis says about this point (Lewis 1980, p. 267 continued):

If in addition you are sure that the chance claim \(ch(Ax,Rx)=r\) is true (i.e. if \(P[ch(Ax,Rx)=r \,|\, Rc \cdot E]=1\), where \((Rc \cdot E)\) is your total evidence), it follows also that \(r = P[Ac \,|\, Rc \cdot E]\) is your present unconditional degree of belief that Ac is true. More generally, whether or not you are sure about the chance claim \(ch(Ax,Rx)=r\), your unconditional degree of belief that Ac is given by summing over alternative hypotheses about chance:

\(P[Ac \,|\, Rc \cdot E] = \sum _{q}\,\, q\times P[ch(Ax,Rx)=q \,|\, Rc \cdot E].\)Footnote 9
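The summation over chance hypotheses can be illustrated with a short Python sketch. It assumes finitely many competing chance hypotheses, each paired with admissible total evidence so that G-DIP applies to it. The function name `credence_in_outcome` and the numbers in the example are hypothetical.

```python
from fractions import Fraction

def credence_in_outcome(chance_credences):
    """Return the sum of q * P[ch(Ax,Rx)=q | Rc . E] over the chance hypotheses.

    `chance_credences` maps each hypothesized chance value q to the agent's
    credence in the hypothesis ch(Ax,Rx)=q; these credences must sum to 1.
    """
    total = sum(chance_credences.values())
    if total != 1:
        raise ValueError(f"credences over chance hypotheses sum to {total}, not 1")
    return sum(q * p for q, p in chance_credences.items())

# Hypothetical agent: credence 9/10 that the chance of seven is 1/6 (fair dice),
# and 1/10 that it is 1/4 (loaded dice).
beliefs = {Fraction(1, 6): Fraction(9, 10), Fraction(1, 4): Fraction(1, 10)}
print(credence_in_outcome(beliefs))  # 7/40
```

When the agent is certain of a single hypothesis, for example `{Fraction(1, 6): Fraction(1)}`, the sum collapses to that chance value, which is the special case Lewis mentions first.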

We investigate several kinds of cases where, on purely logical grounds, direct inference outcomes must fail to be probabilistically independent of a statement D. Thus, any adequate account of admissibility should rule the defeating statement D to be inadmissible. We call such statements logically inadmissible with respect to the direct inferences they defeat. In some cases we show precisely how much the addition of these defeaters to the premises of a direct inference must divert the credence value from the associated chance value. We argue that some of these logically inadmissible statements may be easily acquired by an agent, thus tainting her total evidence and inhibiting her warrant to engage in legitimate direct inferences about these chance events.

Here is how we’ll proceed. In Sect. 2 we prove resultsFootnote 10 that show that material conditional and biconditional statements involving the conclusions of direct inferences must be inadmissible on purely logical grounds. This may present some surprising challenges for Bayesian direct inference principles.

In Sect. 3 we show that in an important class of cases the evidential relevance of a statement D to an outcome Ac implies the logical inadmissibility of D. It seems to be relatively easy for an agent to acquire this kind of information. Thus, an agent’s ability to engage in direct inferences is shown to be somewhat fragile.

In Sect. 4 we consider some fairly mild conditions on credence functions that make them “inappropriate” for G-DIP: any credence function that satisfies these conditions must get straightforward direct inferences wrong.

In Sect. 5 we discuss direct inferences in cases where several reference classes may compete. We argue that direct inference probabilities are best characterized as expected values over credences of possible observational statements or over extensive chance theories. We show how this fact is problematic for Bayesian direct inference principles.

The authors of this paper are divided over what these results show. One of us (Wallmann) thinks that many of these logically inadmissible statements should not defeat direct inferences. Rather, an agent who has such information as part of her total evidence should still conform her rational credences, and her betting behavior, to the objective chances. Therefore, this author reads these troubles as showing that the Bayesian account of direct inference fails: having P satisfy the axioms of conditional probability is incompatible with a correct account of direct inference. The other author thinks that the logically inadmissible statements explored in this paper should indeed defeat direct inferences, so the Bayesian account gets it right. We will elaborate our reasons for disagreement in the main body of the paper. In any case, the paper explores a wide range of statements of a kind that must turn out to be inadmissible on any Bayesian account of direct inference.

2 Logical Admissibility Troubles

The troubles we will raise for direct inference principles in this section and the next are quite general. They plague all Bayesian accounts where the P notion satisfies the usual axioms of conditional probability, regardless of whether the conception of objective chance applies to full propositions (as does Lewis’s Principal Principle) or is couched in terms of generic probabilities (containing only open sentences, as in G-DIP, above). All the admissibility failures we’ll discuss draw on cases where probabilistic independence must fail on purely logical grounds. We will first investigate several kinds of such logically inadmissible statements. Section 3 will go on to provide a more general characterization of an important class of logically inadmissible statements.

2.1 Logically Inadmissible Biconditionals

Consider the following situation. John and Maria are standing next to the craps table watching the action. Let H represent the chance hypotheses associated with a fair pair of dice tossed onto a flat surface in the usual (fair) way. In particular, R says that a pair of fair dice is tossed onto a flat surface in the usual (fair) way, and A says that the outcome of a toss is seven. According to chance hypothesis H, the chance of outcome A for a system in state R is 1 / 6, \(ch(Ax,Rx)=1/6\), which is the usual objective chance for getting seven on a (fair) toss of a pair of fair dice. Let c be the event consisting of the next toss of the dice, so Rc says that the next toss is that of a pair of fair dice (fairly) tossed onto a flat surface, and Ac says that the next toss comes up seven. Let E represent Maria’s background knowledge about dice and craps tables, and perhaps about human relationships, and about anything else that may be relevant to the following situation (including the fact that Maria trusts John to keep his word). Surely E is itself admissible with respect to possible chance outcomes for \((H \cdot Rc)\); otherwise we will already have trouble applying direct inference principles to this kind of chance situation. Thus, we should have the direct inference \(P[Ac \,|\, H \cdot Rc \cdot E] = 1/6\), where P is Maria’s (initial) credence function.

Now, John says to Maria, “I’ll buy you dinner this evening if, but only if, the next toss comes up seven.” That is, John sincerely asserts a statement of form \((F \equiv Ac)\), where Maria understands F to say that John will pay for Maria’s dinner this evening (provided that no extraordinary circumstance arises, e.g. provided that Maria permits it, John doesn’t fall ill beforehand, etc.).Footnote 11

Taking John at his word, Maria adds \((F \equiv Ac)\) to her total body of evidence. Thus, the premise for the direct inference regarding Ac, based on her total body of evidence, becomes \((H \cdot Rc \cdot E \cdot (F \equiv Ac))\). Should Maria’s rational credence that the dice will come up seven on the next toss now differ from the objective chance value? That is, does \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (F \equiv Ac)]\) differ from 1 / 6? Or has Maria’s total information, \((E \cdot (F \equiv Ac))\), become inadmissible, undermining the direct inference? More urgently, should Maria still be willing to bet on the next toss turning up seven at the usual fair odds (5 to 1 against, corresponding to a chance of occurrence of 1 / 6)? You might well think so!Footnote 12

As it happens, probability theory itself guarantees that this kind of biconditional information is almost always logically inadmissible for the relevant direct inference. For, whenever \(P[F \,|\, H \cdot Rc \cdot E \cdot Ac]\ne P[\lnot F \,|\, H \cdot Rc \cdot E \cdot \lnot Ac]\), Ac cannot be probabilistically independent of \((F \equiv Ac)\) given \((H \cdot Rc \cdot E)\). And any such failure of probabilistic independence entails inadmissibility. Worse yet, we will see that, according to her credence function, the odds at which Maria should be willing to bet that seven turns up may differ significantly from the usual fair betting-odds suggested by the objective chance.

Theorem 1

Inadmissible Biconditionals.

Let r be any real number such that \(0< r < 1\). Suppose \(P[Ac \,|\, H \cdot Rc \cdot E] = r\) and \(1> P[(F \equiv Ac) \,|\, H \cdot Rc \cdot E] > 0\). Then both \(P[F \,|\, H \cdot Rc \cdot E \cdot Ac] = s\) and \(P[\lnot F \,|\, H \cdot Rc \cdot E \cdot \lnot Ac] = t\) are well-defined (for some s and t), and

  (1) either \(s > 0\) or \(t > 0\), and either \(s < 1\) or \(t < 1\), and

  (2) \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (F \equiv Ac)] = 1 \,/\, [1 \,+\, ((1-r)/r)\times (t/s)]\).

Furthermore,

\(P[Ac \,|\, H \cdot Rc \cdot E \cdot (F \equiv Ac)] > r\) if and only if \(s > t\),

\(P[Ac \,|\, H \cdot Rc \cdot E \cdot (F \equiv Ac)] < r\) if and only if \(s < t\),

\(P[Ac \,|\, H \cdot Rc \cdot E \cdot (F \equiv Ac)] = r\) if and only if \(s = t\).

If, in addition, \(P[Ac \,|\, H \cdot Rc \cdot E \cdot F] = P[Ac \,|\, H \cdot Rc \cdot E]\) (i.e. if Ac is probabilistically independent of F given \(H \cdot Rc \cdot E\)), then \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (F \equiv Ac)] = r\) if and only if \(P[F \,|\, H \cdot Rc \cdot E] = 1/2\).
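Clause (2) can be checked numerically. The Python sketch below is an illustration, not part of the original argument: it compares the closed form with a brute-force computation over a joint distribution for Ac and F built from r, s, and t (all credences conditional on \((H \cdot Rc \cdot E)\)). It writes the formula as \(rs/(rs + (1-r)t)\), which agrees with clause (2) whenever \(s > 0\) and remains defined when \(s = 0\). The function names are hypothetical.

```python
from fractions import Fraction

def biconditional_credence(r, s, t):
    """P[Ac | H.Rc.E.(F == Ac)] via Theorem 1, in the form r*s / (r*s + (1-r)*t).

    Equal to 1/[1 + ((1-r)/r)*(t/s)] whenever s > 0; clause (1) guarantees
    that s and t are not both 0, so the denominator is positive.
    """
    return r * s / (r * s + (1 - r) * t)

def brute_force(r, s, t):
    """The same credence, computed from the joint distribution over (Ac, F)."""
    joint = {
        (True, True): r * s,               # Ac and F
        (True, False): r * (1 - s),        # Ac and not-F
        (False, True): (1 - r) * (1 - t),  # not-Ac and F
        (False, False): (1 - r) * t,       # not-Ac and not-F
    }
    # The material biconditional F == Ac holds where the two truth values agree.
    agree = {w: p for w, p in joint.items() if w[0] == w[1]}
    return sum(p for w, p in agree.items() if w[0]) / sum(agree.values())

r = Fraction(1, 6)  # chance of seven on a fair toss of two fair dice
for s, t in [(Fraction(1, 2), Fraction(1, 2)),
             (Fraction(9, 10), Fraction(1, 10)),
             (Fraction(1, 10), Fraction(9, 10))]:
    value = biconditional_credence(r, s, t)
    assert value == brute_force(r, s, t)
    print(f"s={s}, t={t}: credence {value}")
```

With r = 1/6, the credence equals r when s = t = 1/2, rises to 9/14 when s = 9/10 and t = 1/10, and falls to 1/46 when s = 1/10 and t = 9/10, matching the three “if and only if” clauses.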

Thus, when John says to Maria, “I’ll buy you dinner this evening if, but only if, the next roll comes up seven”, almost everyone who overhears this assertion, and who takes John to be sincere, should employ credences, based on the total available evidence, that fail to match the objective chances of the dice coming up seven on the next roll. Only one kind of exception is possible. Those individuals whose credences remain faithful to the objective chance are just those individuals who, before hearing John’s statement, happen to find the conditional credibility of the claim “John will buy Maria dinner this evening” given seven comes up on the next roll (i.e. \({P[F \,|\, H \cdot Rc \cdot Ac \cdot E]}\)) equal to the conditional credibility of the claim “John won’t buy Maria dinner this evening” given seven does not come up on the next roll (i.e. \({P[\lnot F \,|\, H \cdot Rc \cdot \lnot Ac \cdot E]}\)) — where both credence conditions include the agent’s total available evidence E together with the relevant chance claims, \((H \cdot Rc)\).

Indeed, before hearing John’s statement (\(F \equiv Ac\)), perhaps Maria and most bystanders will have taken “seven comes up on the next roll” to be probabilistically independent of “John buys Maria dinner this evening”, given \((H \cdot Rc \cdot E)\). Such an agent cannot have her credence that “the next roll turns up seven” remain faithful to the objective chance unless she happens to assign \(P[F \,|\, H \cdot Rc \cdot E] = 1/2\). Thus, the Bayesian account of direct inference apparently implies a form of the principle of indifference (Hawthorne et al. 2017). However, it seems highly doubtful that most agents will assign the value 1 / 2 to \(P[F \,|\, H \cdot Rc \cdot E]\). For, in place of F, John might well have asserted biconditionals involving any number of distinct alternative conditions, \(F_1\), \(F_2\), \(F_3\), ..., etc. (e.g., “I’ll buy you dinner at McDonald’s”, “I’ll buy you dinner at Chez Panisse”, ..., etc.). But the statements \(F_k\) for the resulting biconditional claims, (\(F_k \equiv Ac\)), cannot all have conditional credence values \(P[F_k \,|\, H \cdot Rc \cdot E] = 1/2\): insofar as these alternatives are mutually exclusive, their credences sum to at most 1, so at most two of them can receive the value 1/2. Thus, the agent’s direct inference credence \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (F_k \equiv Ac)]\) must deviate from the objective chance value 1 / 6 for almost all such claims, \(F_k\).
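Under the independence assumption, \(s = P[F \,|\, H \cdot Rc \cdot Ac \cdot E] = p\) and \(t = 1 - p\), where p is the agent’s credence \(P[F \,|\, H \cdot Rc \cdot E]\), so Theorem 1 reduces to a function of p alone. The Python sketch below uses a hypothetical function name and takes r = 1/6 as in the dice example. It shows that only p = 1/2 returns the chance value.

```python
from fractions import Fraction

def credence_under_independence(r, p):
    """P[Ac | H.Rc.E.(F == Ac)] when F is independent of Ac and P[F | H.Rc.E] = p.

    Independence gives s = p and t = 1 - p, so Theorem 1 becomes
    r*p / (r*p + (1-r)*(1-p)).
    """
    return r * p / (r * p + (1 - r) * (1 - p))

r = Fraction(1, 6)  # chance of seven on a fair toss
for p in [Fraction(1, 2), Fraction(1, 4), Fraction(1, 10), Fraction(1, 100)]:
    print(f"P[F] = {p}: credence in seven = {credence_under_independence(r, p)}")
```

The loop reports 1/6 for p = 1/2, then 1/16, 1/46, and 1/496 for p = 1/4, 1/10, and 1/100. The less credible the alternative F, the further the credence in seven falls below the chance.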

When the value of \(s = P[F \,|\, H \cdot Rc \cdot Ac \cdot E]\) is very small relative to the value of \(t = P[\lnot F \,|\, H \cdot Rc \cdot \lnot Ac \cdot E]\), so that t/s is large, the value of \(P[Ac \,|\, H \cdot Rc \cdot E\cdot (F \equiv Ac)]\) must be very close to 0, as the theorem shows.Footnote 13 So, if Maria (and eavesdropping bystanders) takes John’s offer to be very unlikely before he asserts it, then her total-evidence credence for seven on the next toss should be very close to 0! Thus, if the objective chance values provide the correct betting odds, then Maria (and bystanders) should be willing to accept wagers against seven at incorrect odds that are extremely unfavorable to themselves. This is true regardless of whether there is any evidence available to Maria (or the bystanders) that justifies assigning low credence to John paying for the dinner. We will discuss situations in which credences based on no evidence whatsoever lead to defeat of direct inferences in more detail in Sect. 5.2.
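To make “very close to 0” precise, here is a simple bound that follows from clause (2) of Theorem 1, rewritten as \(rs/(rs + (1-r)t)\) and assuming \(s > 0\) and \(t > 0\):

$$\begin{aligned} P[Ac \,|\, H \cdot Rc \cdot E \cdot (F \equiv Ac)] = \frac{rs}{rs + (1-r)t} < \frac{rs}{(1-r)t} = \frac{r}{1-r}\times \frac{s}{t}. \end{aligned}$$

With \(r = 1/6\) the bound is \(s/(5t)\). For instance, the illustrative values \(s = 0.01\) and \(t = 0.5\) (chosen here only for the arithmetic) give \((1/600)/(1/600 + 250/600) = 1/251\), just under the bound \(1/250\).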

2.2 Some Other Logically Inadmissible Statements

Like biconditionals, material conditionals and disjunctions involving the outcome Ac must be logically inadmissible. Here we characterize precisely the extent to which the resulting probabilities deviate from the corresponding direct inference probabilities. We also prove a result for the case where adding a further statement to the body of evidence defeats a defeater and restores the original direct inference.

A statement is a defeater just in case its negation is also a defeater. The only exceptions are cases where the candidate statement has probability 1 or 0, given the premise of the direct inference. This suggests an easy algorithm for generating a host of inadmissible statements: (1) find an obvious inadmissible statement (e.g. \((\lnot F \cdot \lnot Ac)\)); then (2) take its negation (e.g. \(\lnot (\lnot F \cdot \lnot Ac)\), which is logically equivalent to \((F \vee Ac)\)). The following result establishes this claim.

Theorem 2

Defeater just when Negation-Defeater.

\(P[D \,|\, H \cdot Rc \cdot E] > 0\) and \(P[Ac \,|\, H \cdot Rc \cdot E \cdot D] \ne P[Ac \,|\, H \cdot Rc \cdot E]\) if and only if \(P[\lnot D \,|\, H \cdot Rc \cdot E] > 0\) and \(P[Ac \,|\, H \cdot Rc \cdot E \cdot \lnot D] \ne P[Ac \,|\, H \cdot Rc \cdot E]\).
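One way to see why Theorem 2 holds is sketched here, with the abbreviation \(K = (H \cdot Rc \cdot E)\) introduced only for readability. By the law of total probability, for any D with \(0 < P[D \,|\, K] < 1\),

$$\begin{aligned} 0 = P[D \,|\, K]\,\big(P[Ac \,|\, K \cdot D] - P[Ac \,|\, K]\big) + P[\lnot D \,|\, K]\,\big(P[Ac \,|\, K \cdot \lnot D] - P[Ac \,|\, K]\big). \end{aligned}$$

Both weights are positive, so the first difference is nonzero exactly when the second is. The boundary cases take care of themselves. If \(P[D \,|\, K] = 1\), then \(P[Ac \,|\, K \cdot D] = P[Ac \,|\, K]\), so D is not a defeater, and the same reasoning applies to \(\lnot D\).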

It follows immediately that whenever \(0< P[D \,|\, H \cdot Rc \cdot E] < 1\), we have \(P[Ac \,|\, H \cdot Rc \cdot E \cdot D] \ne P[Ac \,|\, H \cdot Rc \cdot E]\) if and only if \(P[Ac \,|\, H \cdot Rc \cdot E \cdot \lnot D] \ne P[Ac \,|\, H \cdot Rc \cdot E]\). It also follows immediately that disjunctions and material conditionals involving the outcome Ac are inadmissible. From \(P[Ac \,|\, H \cdot Rc \cdot E] = r > 0\) and \(P[\lnot (Ac \vee F)\,|\, H \cdot Rc \cdot E] > 0\), we have \(0 = P[Ac \,|\, H \cdot Rc \cdot E \cdot \lnot (Ac \vee F)] \ne P[Ac \,|\, H \cdot Rc \cdot E]\); so (via the previous result) \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (Ac \vee F)] \ne P[Ac \,|\, H \cdot Rc \cdot E]\). Similarly, from \(P[Ac \,|\, H \cdot Rc \cdot E] = r < 1\) and \(P[\lnot (Ac \supset F)\,|\, H \cdot Rc \cdot E] > 0\), we have \(1 = P[Ac \,|\, H \cdot Rc \cdot E \cdot \lnot (Ac \supset F)] \ne P[Ac \,|\, H \cdot Rc \cdot E]\); so (via the previous result) \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (Ac \supset F)] \ne P[Ac \,|\, H \cdot Rc \cdot E]\). The following theorem extends this result by showing more precisely the degree to which \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (F \vee Ac)]\) differs from \(P[Ac \,|\, H \cdot Rc \cdot E]\).

Theorem 3

Inadmissible Disjunctions.

Let r be any real number such that \(0< r < 1\).

Suppose \(P[Ac \,|\, H \cdot Rc \cdot E] = r\) and \(P[(Ac \vee F) \,|\, H \cdot Rc \cdot E] < 1\).

Then \(P[F \,|\, H \cdot Rc \cdot E \cdot \lnot Ac] = s\) is well-defined for some value of \(s < 1\), and \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (Ac \vee F)] = 1 \,/\, [1 \,+\, ((1-r)/r) \times s] > r\).
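Here is a short derivation of the formula in Theorem 3, again writing \(K\) for \((H \cdot Rc \cdot E)\) as a local abbreviation. Since Ac entails \((Ac \vee F)\), and \(P[(Ac \vee F) \,|\, K] = P[Ac \,|\, K] + P[\lnot Ac \cdot F \,|\, K] = r + (1-r)s\), we get

$$\begin{aligned} P[Ac \,|\, K \cdot (Ac \vee F)] = \frac{P[Ac \,|\, K]}{P[(Ac \vee F) \,|\, K]} = \frac{r}{r + (1-r)s} = \frac{1}{1 + ((1-r)/r)\times s}. \end{aligned}$$

Because \(s < 1\), the denominator \(r + (1-r)s\) is less than 1, which is why the result exceeds r.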

It follows immediately that:

Corollary 4

Inadmissible Material Conditionals.

Let r be any real number such that \(0< r < 1\) and suppose \(P[Ac \,|\, H \cdot Rc \cdot E] = r\).

  1. If \(P[Ac \supset F \,|\, H \cdot Rc \cdot E] < 1\), then \(P[F \,|\, H \cdot Rc \cdot E \cdot Ac] = s\) is well-defined for some \(s < 1\), and \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (Ac \supset F)] = 1 / [1 + ((1-r)/r)/s] < r\).

  2. If \(P[\lnot Ac \supset F \,|\, H \cdot Rc \cdot E] < 1\), then \(P[F \,|\, H \cdot Rc \cdot E \cdot \lnot Ac] = s\) is well-defined for some \(s < 1\), and \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (\lnot Ac \supset F)] = 1 / [1 + ((1-r)/r)\times s] > r\).

  3. If \(P[F \supset Ac \,|\, H \cdot Rc \cdot E] < 1\), then \(P[F \,|\, H \cdot Rc \cdot E \cdot \lnot Ac] = 1-s\) is well-defined for some \(s < 1\), and \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (F\supset Ac)] = 1 / [1 + ((1-r)/r)\times s] > r\).

This corollary characterizes additional counter-intuitive defeaters for Bayesian direct inference. Suppose that in our craps example from Sect. 2.1 John says “If seven comes up on the next toss, I’ll buy you dinner this evening”. Then, where \(r=1/6\), for \(s=0.5\), \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (Ac \supset F)] = 1/11\). Furthermore, if, believing that John is stingy, Maria considers “John buys Maria dinner this evening”, F, to be highly unlikely given \((H \cdot Rc \cdot E \cdot Ac)\), say \(s=.01\), then \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (Ac \supset F)] = 1/501 \ll 1/6\). Thus, such (material) conditional claims turn out to overwhelmingly defeat the direct inference. This is true regardless of whether Maria has any evidence that justifies her in considering John stingy.
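The two figures can be reproduced with exact arithmetic. In the Python sketch below (hypothetical function names), `conditional_credence` uses \(rs/(rs + 1 - r)\), which equals the formula of Corollary 4(1) for \(s > 0\). `brute_force` recomputes the same value from a joint distribution over Ac and F, where the credence u in F given \(\lnot Ac\) is an arbitrary placeholder that turns out not to matter.

```python
from fractions import Fraction

def conditional_credence(r, s):
    """Credence in Ac given the material conditional 'if Ac then F', i.e. (not-Ac or F).

    Here s = P[F | H.Rc.E.Ac]. Written as r*s / (r*s + (1 - r)); this equals
    1/[1 + ((1-r)/r)/s] for s > 0 and is 0 when s == 0.
    """
    return r * s / (r * s + (1 - r))

def brute_force(r, s, u):
    """Same credence from a joint over (Ac, F); u = P[F | H.Rc.E.not-Ac] is arbitrary."""
    joint = {(True, True): r * s, (True, False): r * (1 - s),
             (False, True): (1 - r) * u, (False, False): (1 - r) * (1 - u)}
    # The material conditional is false only in the world (Ac, not-F).
    kept = {w: p for w, p in joint.items() if w != (True, False)}
    return sum(p for w, p in kept.items() if w[0]) / sum(kept.values())

r = Fraction(1, 6)
print(conditional_credence(r, Fraction(1, 2)))    # 1/11
print(conditional_credence(r, Fraction(1, 100)))  # 1/501
```

Because the worlds where Ac is false always satisfy the material conditional, their total weight \(1 - r\) survives intact whatever u is. Only the Ac-worlds get thinned out, by the factor s.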

In some cases a defeated direct inference may be restored by the addition of information. Consider, for example, the case where \((Ac \vee F)\) is a defeater for the direct inference to Ac, but where F is not itself a defeater. In that case, although \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (Ac \vee F)] \ne P[Ac \,|\, H \cdot Rc \cdot E]\), adding F as a premise restores the direct inference, since \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (F \vee Ac)\cdot F] = P[Ac \,|\, H \cdot Rc \cdot E \cdot F] = P[Ac \,|\, H \cdot Rc \cdot E]\). In this case the statement F is a defeater–defeater for the defeater \((Ac \vee F)\). An earlier result (Theorem 2) showed that the negation of a defeater must also be a defeater. So, one may well wonder whether the negation of a defeater–defeater may also be a defeater–defeater. The following theorem shows that this never happens. The negation of a defeater–defeater can never restore the previously defeated direct inference.

Theorem 5

Negations of Defeater–Defeaters cannot be Defeater–Defeaters.

Suppose \(P[Ac \,|\, H \cdot Rc \cdot E \cdot D] \ne P[Ac \,|\, H \cdot Rc \cdot E]\), but for G such that \(1> P[G \,|\, H \cdot Rc \cdot E \cdot D] > 0\) we have \(P[Ac \,|\, H \cdot Rc \cdot E \cdot D\cdot G] = P[Ac \,|\, H \cdot Rc \cdot E]\) (i.e. suppose that D defeats the direct inference \(P[Ac \,|\, H \cdot Rc \cdot E] = r\) but G defeats the defeater, restoring the direct inference). Then \(1> P[\lnot G \,|\, H \cdot Rc \cdot E \cdot D] > 0\) and \(P[Ac \,|\, H \cdot Rc \cdot E \cdot D \cdot \lnot G] \ne P[Ac \,|\, H \cdot Rc \cdot E]\) (i.e. \(\lnot G\) cannot also defeat the defeater D).
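The proof idea can be sketched by conditioning on D throughout, with \(K = (H \cdot Rc \cdot E)\) as a local abbreviation. Since \(P[G \,|\, K \cdot D] + P[\lnot G \,|\, K \cdot D] = 1\) and \(P[Ac \,|\, K \cdot D \cdot G] = P[Ac \,|\, K]\), the law of total probability gives

$$\begin{aligned} P[\lnot G \,|\, K \cdot D]\,\big(P[Ac \,|\, K \cdot D \cdot \lnot G] - P[Ac \,|\, K]\big) = P[Ac \,|\, K \cdot D] - P[Ac \,|\, K] \ne 0. \end{aligned}$$

So \(P[\lnot G \,|\, K \cdot D] > 0\), and \(P[Ac \,|\, K \cdot D \cdot \lnot G]\) must differ from \(P[Ac \,|\, K]\).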

The next subsection provides an important example of a defeater–defeater.

2.3 Escape from These Troubles via Stronger Conditionals

The craps table examples presented in Sects. 2.1 and 2.2 show how easy it can be to taint an agent’s total body of evidence with statements that defeat her direct inferences. But perhaps our way of interpreting these examples is mistaken. For, although direct inferences are indeed defeated by such material conditionals and biconditionals (in which the antecedents are the target statement of the direct inference, or its negation, Ac or \(\lnot Ac\)), perhaps such defeating conditionals and biconditionals may not be so easily introduced into an agent’s total body of evidence in such a way that they function as defeaters. If this suggestion is right, then although the formal results about material conditional and biconditional defeaters are correct, the intuitive examples we used to illustrate the impact of these formal results may be misleading. Properly represented, the intuitive examples might not give rise to direct inference defeaters after all. Here is what we have in mind.

We first treat the case of simple conditional statements, before turning to the biconditional case. Consider John’s conditional assertion to Maria, “If seven comes up on the next toss, I’ll buy you dinner this evening.” As usually understood, such an assertion suggests a clear causal asymmetry between John’s dinner offer (i.e. “I will buy Maria dinner this evening”) and the outcome of the dice roll (i.e. “seven comes up on the next toss”). John may wait for the outcome of the toss and may then act in such a way that the conditional will be true. So, perhaps the representation of the example in terms of a mere material conditional is inadequate. Perhaps the conditional involved is more adequately represented by some stronger kind of indicative or causal conditional. Let’s formally represent John’s assertion this way: \((Ac \rightarrow F)\), where \(\rightarrow \) represents some kind of strong, causal or indicative conditional. Then, the central issue is whether or not \(P[Ac \,|\, H \cdot Rc \cdot E \cdot (Ac \rightarrow F)] = P[Ac \,|\, H \cdot Rc \cdot E]\) may hold for direct inference \(P[Ac \,|\, H \cdot Rc \cdot E] = r\). The following result will prove useful.

\(P[Ac \,|\, H \cdot Rc \cdot E \cdot (Ac \rightarrow F)] = P[Ac \,|\, H \cdot Rc \cdot E]\)—i.e. the direct inference remains undefeated by \((Ac \rightarrow F)\)—whenever

\(P[(Ac \rightarrow F) \,|\, H \cdot Rc \cdot Ac \cdot E] = P[(Ac \rightarrow F) \,|\, H \cdot Rc \cdot E]\)—i.e. whenever Ac provides no evidence for (or against) \((Ac \rightarrow F)\), given \((H \cdot Rc \cdot E)\).Footnote 14
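The reason is Bayes’ theorem. Writing \(K\) for \((H \cdot Rc \cdot E)\) and \(C\) for \((Ac \rightarrow F)\) as local abbreviations, and assuming \(P[C \,|\, K] > 0\), we have

$$\begin{aligned} P[Ac \,|\, K \cdot C] = \frac{P[C \,|\, K \cdot Ac]}{P[C \,|\, K]}\times P[Ac \,|\, K]. \end{aligned}$$

So when \(P[C \,|\, K \cdot Ac] = P[C \,|\, K]\) the ratio equals 1, and the direct inference survives.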

Arguably, in the craps-table example the claim Ac (given \((H \cdot Rc \cdot E)\)) does not provide evidence for or against a strong (causal or indicative) conditional claim of form \((Ac \rightarrow F)\).Footnote 15 Thus, our example of easy defeat for an agent’s direct inference may be side-stepped. Supplying the agent with a convincing conditional claim involving the target statement of her direct inference, Ac, need not defeat her direct inference after all, unless that convincing conditional claim is merely a material conditional claim. A truly convincing example of easy defeat via the acquisition of knowledge of a conditional claim will have to show how the rational agent may (easily) become convinced of the material conditional claim in cases where she is not also convinced of the corresponding strong conditional claim.Footnote 16

All of the previous points carry over fairly directly to the case of the biconditional defeater. In this context, John’s biconditional assertion to Maria, “I’ll buy you dinner this evening if, but only if, seven comes up on the next toss”, clearly suggests a causal asymmetry between John’s dinner offer and the outcome of the dice roll. So, perhaps John’s biconditional assertion is not adequately captured by the material biconditional. Perhaps it is more adequately represented by a conjunction of stronger, indicative or causal conditional claims, as follows: \(((Ac \rightarrow F)\cdot (\lnot Ac \rightarrow \lnot F))\), where \(\rightarrow \) again represents some kind of strong, causal or indicative conditional. Then, the issue is whether or not \(P[Ac \,|\, H \cdot Rc \cdot E \cdot ((Ac \rightarrow F)\cdot (\lnot Ac \rightarrow \lnot F))] = P[Ac \,|\, H \cdot Rc \cdot E]\) may hold, where \(P[Ac \,|\, H \cdot Rc \cdot E] = r\) is a direct inference. The following result should help.

The direct inference remains undefeated by the strong biconditional (i.e. \(P[Ac \,|\, H \cdot Rc \cdot E \cdot ((Ac \rightarrow F)\cdot (\lnot Ac \rightarrow \lnot F))] = P[Ac \,|\, H \cdot Rc \cdot E]\)) whenever Ac and \(\lnot Ac\) each provide the same evidence for (or against) \(((Ac \rightarrow F)\cdot (\lnot Ac \rightarrow \lnot F))\), given \((H \cdot Rc \cdot E)\) (i.e. whenever \(P[((Ac \rightarrow F)\cdot (\lnot Ac \rightarrow \lnot F)) \,|\, H \cdot Rc \cdot E \cdot Ac] = P[((Ac \rightarrow F)\cdot (\lnot Ac \rightarrow \lnot F)) \,|\, H \cdot Rc \cdot E \cdot \lnot Ac]\)).Footnote 17
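Here is why, using the local abbreviations \(K = (H \cdot Rc \cdot E)\) and \(B = ((Ac \rightarrow F)\cdot (\lnot Ac \rightarrow \lnot F))\), with \(r = P[Ac \,|\, K]\) such that \(0 < r < 1\). Let b be the common value of \(P[B \,|\, K \cdot Ac]\) and \(P[B \,|\, K \cdot \lnot Ac]\), assumed positive. The law of total probability gives

$$\begin{aligned} P[B \,|\, K] = r\,b + (1-r)\,b = b = P[B \,|\, K \cdot Ac]. \end{aligned}$$

So B is probabilistically independent of Ac given K, and by Bayes’ theorem \(P[Ac \,|\, K \cdot B] = P[Ac \,|\, K] \times P[B \,|\, K \cdot Ac]/P[B \,|\, K] = P[Ac \,|\, K]\).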

Arguably, in the context of the craps-table example, the claims Ac and \(\lnot Ac\) should (given \((H \cdot Rc \cdot E)\)) each provide the same amount of evidence for or against a strong (causal or indicative) biconditional claim of form \(((Ac \rightarrow F)\cdot (\lnot Ac \rightarrow \lnot F))\).Footnote 18 Thus, the prospect of easy defeat for an agent’s direct inference about a future chance event, via the easy acquisition of a biconditional, may be averted. Informing the agent of a convincing biconditional claim need not defeat her direct inference, unless that convincing biconditional claim involves only a material biconditional, rather than conditionals of some stronger kind.

None of this is to suggest that defeat via material conditionals and biconditionals is unimportant to Bayesian direct inferences; only that such conditionals and biconditionals should not be so easily acquired as the craps-table examples suggest. Furthermore, in cases where the chance event Ac has already occurred, when the agent’s total available evidence remains admissible for the relevant direct inference, her chance claims may continue to guide her credence that Ac holds via the usual kind of direct inference. However, in such cases an agent may more easily become informed of a material conditional or biconditional statement that informationally ties Ac to another statement F. When that happens, this additional information may well defeat her chance-based direct inference regarding the chance event Ac, as indicated by the defeater theorems presented in this section. From a Bayesian perspective, this may sound plausible. When F and Ac are informationally tied together by a material conditional or biconditional claim, and that claim is added to the agent’s total evidence, then whatever credence F itself already had will drag the credence of Ac away from its direct inference value.Footnote 19 This is true, however, even for the case where no evidence is available for or against F. In this case, it seems that defeat by biconditionals may be problematic. We will discuss situations in which credences based on no evidence whatsoever lead to defeat of direct inferences in more detail in Sect. 5.2.
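The drag just described can be made concrete under an illustrative assumption: that F is probabilistically independent of Ac given \((H \cdot Rc \cdot E)\), with credence q in F. The material biconditional between Ac and F is then true exactly in the cases \((Ac \cdot F)\) and \((\lnot Ac \cdot \lnot F)\), so Bayes' theorem gives the updated credence \(rq/(rq + (1-r)(1-q))\), which equals r (for \(0< r < 1\)) just when \(q = 1/2\). The sketch below computes this; the numbers are hypothetical.

```python
from fractions import Fraction


def credence_after_material_biconditional(r, q):
    """P[Ac | K . (Ac <-> F)], where K abbreviates (H . Rc . E), r = P[Ac | K], q = P[F | K].
    Assumes (for illustration) that F is probabilistically independent of Ac given K.
    The material biconditional holds exactly in the (Ac, F) and (not-Ac, not-F) cases."""
    both_true = r * q
    both_false = (1 - r) * (1 - q)
    return both_true / (both_true + both_false)


r = Fraction(1, 6)
for q in (Fraction(1, 2), Fraction(9, 10), Fraction(1, 10)):
    # Prints 1/6 (no defeat), then 9/14 and 1/46 (credence dragged toward or away from Ac).
    print(q, credence_after_material_biconditional(r, q))
```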

3 Evidential Relevance and Admissibility

It is commonly supposed that chance hypotheses screen off “many propositions that one can easily come to know and that would otherwise be relevant to the proposition A under discussion.” (Schwarz 2014, p. 82). When this is so, the direct inference from the chance hypothesis is said to be resilient.Footnote 20 A high degree of resiliency for direct inferences is crucial. Otherwise, they may be largely inapplicable, given the total evidence available to agents. In this section we will characterize a broad class of statements that, on logical grounds, must defeat direct inferences. Thus, to the extent that such information is readily available to agents, direct inferences may turn out to be rather less resilient than usually supposed.

We investigate some quite general conditions under which a statement D may defeat direct inferences. Our results are general enough to apply to extensive chance hypotheses—i.e. chance hypotheses (and theories) that entail chance claims for an algebra of outcomes of initial chance states R, and may do so for any number of distinct initial chance states. We’ll say more about the nature of extensive chance hypotheses below.

We will characterize some classes of statements that must defeat direct inferences, and so must be inadmissible on any account. For example, one of our main results shows that, under very commonly met assumptions, evidential support by a statement D for Ac implies that D is inadmissible in direct inferences for Ac. It goes like this:

Let \(A_1c\) and \(A_2c\) be any two possible chance outcomes of initial state R for chance system c, and suppose E is admissible for the direct inferences from H to each of these two outcomes. Consider a statement D to which each of the possible chance events \((Rc \cdot A_1c)\) and \((Rc \cdot A_2c)\) is directly relevant. Indeed, suppose that each of these possible chance events is so directly relevant to D that it overrides (or screens-off) whatever relevance H might have to D, given E (for credence function P). Then, provided that D is more likely according to one of these two chance events than according to the other, given E (for P), D must defeat either the direct inference from \((H \cdot Rc \cdot E)\) to \(A_1c\) or the direct inference from \((H \cdot Rc \cdot E)\) to \(A_2c\) (for P). Thus, any such statement D, in conjunction with the admissible statement E, must be inadmissible for direct inferences from \((H \cdot Rc)\).

This section is mainly devoted to explicating several results of this kind.

We proceed by first characterizing extensive chance hypotheses, and generalizing the principle of direct inference, G-DIP, to cover them. Then we identify an important class of statements D that turn out to defeat direct inferences from chance hypothesis H: statements D to which some of H’s chance outcomes are “more directly relevant” than is H itself. We provide an illustrative example of such a case. Finally, we establish two general results that show the logical inadmissibility of such statements. The first result, stated informally above, provides sufficient conditions for such statements to defeat direct inferences. The second result provides necessary and sufficient conditions for such statements to defeat direct inferences, but under slightly stricter conditions (involving partitions of chance outcomes) than supposed by the first result.

3.1 Extensive Chance Hypotheses and Algebras of Attributes

Sophisticated chance hypotheses (or chance theories) entail chance claims for all Boolean combinations of possible outcome attributes of an initial chance state (or reference class) R. That is, whenever the hypothesis entails chance claims of form \(ch(Ax,Rx)=r\) and \(ch(Bx,Rx)=s\), it also entails chance claims of form \(ch(\lnot Ax,Rx)=p\), \(ch((Ax\vee Bx),Rx)=q\), and \(ch((Ax\cdot Bx),Rx)=t\), where p, q, r, s, t are standard terms for real numbers between 0 and 1. Thus, associated with each chance state Rx is a Boolean algebra of outcome attributes \(\Theta _R\) for R, where, whenever \(\Theta _R\) contains Ax and Bx, it also contains \(\lnot Ax\), \((Ax \vee Bx)\), and \((Ax \cdot Bx)\); and where \(\Theta _R\) contains no other expressions.Footnote 21 Furthermore, for each initial state (or reference class) R treated by H, the associated chance function \(ch(\ ,Rx)\) should satisfy the usual axioms of probability theory for its algebra of attributes, \(\Theta _R\).Footnote 22 An extensive chance theory of this kind will often cover a variety of distinct initial states (or reference classes) Rx, and provide chance claims for Boolean algebras of outcomes, \(\Theta _R\), for each such R.
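For a finite algebra, one convenient (though not the only) way to picture \(\Theta _R\) is as the power set of a set of mutually exclusive, jointly exhaustive "atomic" outcome attributes, with \(ch(\ ,Rx)\) fixed by the chances of the atoms. The Python sketch below assumes a hypothetical fair-die toss as the initial state R; negation, disjunction and conjunction become set complement, union and intersection, and the probability axioms are checked exhaustively.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical initial state R: a fair die is tossed; the atoms are the six faces.
ATOM_CHANCE = {face: Fraction(1, 6) for face in range(1, 7)}
ATOMS = frozenset(ATOM_CHANCE)


def ch(attribute):
    """Chance of an outcome attribute, represented as a frozenset of atoms."""
    return sum((ATOM_CHANCE[atom] for atom in attribute), Fraction(0))


def neg(a):
    return ATOMS - a


def disj(a, b):
    return a | b


def conj(a, b):
    return a & b


even = frozenset({2, 4, 6})
high = frozenset({4, 5, 6})
print(ch(neg(even)), ch(disj(even, high)), ch(conj(even, high)))  # 1/2 2/3 1/3

# The algebra closed under neg, disj and conj is the power set of ATOMS (64 attributes).
algebra = [frozenset(c) for k in range(len(ATOMS) + 1) for c in combinations(sorted(ATOMS), k)]
assert len(algebra) == 64
assert ch(ATOMS) == 1 and all(ch(a) >= 0 for a in algebra)
# Finite additivity, in inclusion-exclusion form.
assert all(ch(disj(a, b)) == ch(a) + ch(b) - ch(conj(a, b)) for a in algebra for b in algebra)
```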

One more bit of notation will prove useful. When a particular chance system c is in an initial chance state R, we denote the algebra of chance outcomes for event Rc by the term ‘\(\Theta _R(c)\)’, which represents the algebra of outcome attributes for R, \(\Theta _R\), applied to the individual system c. That is, when Rc holds, for each Ax in \(\Theta _R\), there is an associated possible outcome of Rc, Ac, in the algebra of associated outcomes \(\Theta _R(c)\).

Throughout the remainder of this paper our treatment of chance and direct inference will apply to the kind of extensive chance hypotheses just described. We’ll use ‘H’ to represent chance hypotheses of this kind. Here is a generalization of the direct inference principle that applies to direct inferences from extensive chance hypotheses.

Generalized Generic Direct Inference Principle—GG-DIP:

Let P be an appropriate classical probability function (credence function) on a language that contains chance (or frequency) statements. Let H be any extensive chance hypothesis: that is, for each initial state (or reference class) R treated by H, for each \(A_j\) in the associated Boolean algebra, \(\Theta _R\), of possible outcome attributes for systems in state R, H entails a chance claim of form \(ch(A_jx, Rx) = r_j\), where \(r_j\) is a standard term for a real number between 0 and 1 (inclusive), and where each chance function \(ch(\ , Rx)\) satisfies the usual axioms of probability theory on \(\Theta _R\). Then, for each outcome attribute \(A_j\) in \(\Theta _R\), for each chance system c,

$$\begin{aligned} P[A_jc \,|\, H \cdot Rc \cdot E] = r_j, \end{aligned}$$

provided that E is both consistent with \((H \cdot Rc)\) and admissible with respect to \((H \cdot Rc)\) over \(\Theta _R(c)\) (where tautologies are always considered admissible).

A statement E may defeat some of the direct inferences based on \((H \cdot Rc)\), while leaving others intact. That is, we may have \(P[A_jc \,|\, H \cdot Rc \cdot E] = r_j\) for some possible outcomes \(A_jc\), while \(P[A_kc \,|\, H \cdot Rc \cdot E] \ne r_k\) for some other possible outcomes. In that case E should count as inadmissible for the direct inferences from \((H \cdot Rc)\) to the outcomes in \(\Theta _R(c)\), regardless of the fact that some of these chance outcomes happen to be probabilistically independent of E. For, when an agent’s total body of evidence consists of \((Rc \cdot E)\) and she is contemplating bets on outcomes of Rc, no proper account of admissibility should count her total evidence as admissible for some of the possible outcomes, but inadmissible for others—admissible for the dice coming up six, but inadmissible for coming up nine. Any proper account of admissibility involves more than mere probabilistic independence. Any specific notion of admissibility is supposed to provide a rationale for probabilistic independence in direct inference contexts, and that rationale should apply to all the possible outcomes of an initial chance state Rc for a chance system c.

At the beginning of this section we introduced the notion of resiliency for direct inferences. The idea is that the alignment of credences with chances should not be undermined by the addition of easily acquired information. Otherwise, the ability to apply direct inferences becomes unstable. Resiliency is meant to capture this kind of desired stability for direct inferences. A direct inference is highly stable provided that nearly all of the kinds of information that might become available to an agent who is in a position to apply that direct inference fall within its “sphere of resiliency”. It will prove useful to specify this notion formally.

Definition 6

Resiliency Spheres.

For a credence function P, an extensive chance hypothesis H, and a chance system c in initial state R covered by chance claims in H, the resiliency sphere for direct inferences from \((H \cdot Rc)\) is the collection of statements E such that, for every outcome Ac in algebra \(\Theta _R(c)\) of outcomes for Rc (according to H), \(P[Ac \,|\, H \cdot Rc \cdot E] = P[Ac \,|\, H \cdot Rc]\).

Notice that a resiliency sphere surrounds not merely individual chance outcomes, taken one at a time, but the whole algebra of outcomes of chance state Rc. A statement E that is probabilistically independent of one outcome of Rc, given \((H \cdot Rc)\), but fails to be probabilistically independent of another of its outcomes, falls outside the resiliency sphere.Footnote 23
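Definition 6 can be checked mechanically in a finite case. The sketch below assumes that the outcome algebra is generated by finitely many atoms and takes the credences \(P[\ \,|\, H \cdot Rc]\) and likelihoods \(P[E \,|\, atom \cdot H \cdot Rc]\) as hypothetical inputs. The second example is a statement E that is independent of outcome \(A_1c\) yet still falls outside the sphere, because it is relevant to \(A_2c\).

```python
from fractions import Fraction
from itertools import combinations


def in_resiliency_sphere(prior, likelihood_e):
    """prior: dict atom -> P[atom | H . Rc]; likelihood_e: dict atom -> P[E | atom . H . Rc].
    True iff P[A | H . Rc . E] == P[A | H . Rc] for every attribute A in the algebra
    generated by the atoms (every nonempty subset; the empty attribute is trivially fine)."""
    p_e = sum(prior[a] * likelihood_e[a] for a in prior)
    atoms = sorted(prior)
    for k in range(1, len(atoms) + 1):
        for subset in combinations(atoms, k):
            p_a = sum(prior[a] for a in subset)
            p_a_given_e = sum(prior[a] * likelihood_e[a] for a in subset) / p_e
            if p_a_given_e != p_a:
                return False
    return True


third = Fraction(1, 3)
prior = {"A1": third, "A2": third, "A3": third}
flat = {a: Fraction(1, 2) for a in prior}
mixed = {"A1": Fraction(1, 2), "A2": Fraction(1, 4), "A3": Fraction(3, 4)}
print(in_resiliency_sphere(prior, flat))   # True
print(in_resiliency_sphere(prior, mixed))  # False, although P[A1 | H . Rc . E] = 1/3 = P[A1 | H . Rc]
```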

The resiliency sphere for \((H \cdot Rc)\) will usually be broader than its class of admissible statements, depending on how the notion of admissibility is specified. To see why, notice how GG-DIP (and G-DIP) is supposed to work. Any application of GG-DIP presupposes some concrete notion of admissibility, specified in advance of identifying associated credence functions P. That is, a concrete notion of admissibility specifies, for each chance statement in H and its initial state Rc (for arbitrary systems c), exactly what statements E are to count as admissible. It will usually do so in terms of the information carried by the chance claims in H, the information carried by Rc and its associated chance outcomes in \(\Theta _R(c)\), and the information carried by statements E. This will usually involve conditions that take into account whether the information in E is (or is not) “directly relevant” to outcomes Ac in \(\Theta _R(c)\).Footnote 24 The specification of admissibility doesn’t depend in any way on the particular credence function considered. Rather, after a specific account of admissibility is spelled out, GG-DIP (or G-DIP) does its work by ruling out those credence functions P that either fail to make \(P[Ac \,|\, H \cdot Rc] = r\) when H entails \(ch(Ax,Rx)=r\), or that fail to make \(P[Ac \,|\, H \cdot Rc \cdot E] = P[Ac \,|\, H \cdot Rc]\) when E has been deemed admissible by the account of admissibility on offer. All credence functions P that are not ruled out in this way may count as “appropriate” for some agent, provided that they satisfy whatever other constraints are deemed proper (e.g. for Lewis they must also satisfy regularity). The point is that, for a credence function P that passes these hurdles, and so succeeds in satisfying GG-DIP, there may well be a number of statements E not designated as admissible that still yield \(P[Ac \,|\, H \cdot Rc \cdot E] = P[Ac \,|\, H \cdot Rc] = r\) for all Ac in \(\Theta _R(c)\). 
Thus, the resiliency sphere of \((H \cdot Rc)\) for P may well contain more than the class of admissible statements for \((H \cdot Rc)\) specified by a specific account of admissibility. However, any statement E that falls outside the resiliency sphere of \((H \cdot Rc)\) for P must be inadmissible for \((H \cdot Rc)\) according to every possible coherent account of admissibility.

3.2 When Chance Outcomes of a Hypothesis H Override Its Relevance to a Statement D

Typically, the relevance of a chance hypothesis H to a statement D will be overridden by outcomes of an initial chance state Rc in the following kind of situation. Statement D contains information about possible chance outcome Ac (and its alternatives), so Ac is evidentially relevant to D given \((Rc \cdot E)\). And because hypothesis H is relevant to chance outcome Ac, it will be relevant to (information in) D as well. But, the chance claim \(ch(Ax,Rx)=r\) entailed by H is more directly about outcome Ac than about D, so the relevance of H to D derives from its relevance to Ac. When that’s the case, the information contained in outcomes Ac and \(\lnot Ac\) may override what information H contains (about possible outcomes) that is relevant to D, given \((Rc\cdot E)\), because the information Ac and \(\lnot Ac\) contain is more directly tied to D than the information contained in H. Thus:

$$\begin{aligned}&P[D \,|\, Ac\cdot H \cdot Rc \cdot E] = P[D \,|\, Ac\cdot Rc \cdot E]~ \hbox {and}\\&\quad P[D \,|\, \lnot Ac\cdot H \cdot Rc \cdot E] = P[D \,|\, \lnot Ac\cdot Rc \cdot E]. \end{aligned}$$

In such cases let’s say that the relevance of chance hypothesis H to statement D is overridden by the associated chance outcomes of chance state Rc. It turns out that whenever this condition holds and D is evidentially relevant to Ac (\(P[D \,|\, Ac\cdot H \cdot Rc \cdot E] \ne P[D \,|\, Rc \cdot E]\)), D (together with admissible E) must defeat the direct inference from \((H \cdot Rc)\) to Ac.

Definition 7

Chance Outcomes with Overriding Relevance to D.

The relevance of chance hypothesis H to statement D is overridden by its direct inference outcomes in \(\Gamma _R(c) = \{A_ic, A_jc, \dots , A_kc\}\) (which is some subset of the algebra of outcomes \(\Theta _R(c)\) for chance state Rc), given admissible E, just in case for each of the chance outcomes \(A_jc\) in \(\Gamma _R(c)\) (associated with direct inferences based on \((H \cdot Rc)\), for admissible E), \(P[D \,|\, A_jc\cdot H \cdot Rc \cdot E] = P[D \,|\, A_jc\cdot Rc \cdot E]\).

Here is an illustration of a case where chance outcomes \(\{Ac, \lnot Ac\}\) of a chance hypothesis H are overridingly relevant to a statement D.

Let H be a theory about the chances that people who fit some particular profile R have the attribute, “will develop Alzheimer’s disease by age 70”, attribute A. Thus, H entails \(ch(Ax, Rx)=r\), for some specific value r (e.g. perhaps \(r = .83\)). Suppose that a 50-year-old male named Chuck, c, fits the profile, so Rc holds. Thus, for admissible background information E, \(P[Ac \,|\, H \cdot Rc \cdot E] = r\) is a perfectly good direct inference about Chuck’s chances of developing Alzheimer’s by age 70. E may include whatever admissible background information we may know about medical conditions and medical testing (including brain imaging), about the chance theory H, about Chuck himself, etc.

We may be interested in other indications of whether Chuck will develop Alzheimer’s by age 70, indications that are independent of the information provided by chance theory H. Suppose that by means of an imaging technique it is possible to detect brain plaque of the kind usually associated with Alzheimer’s. The detection of a “moderate accumulation” of this plaque (in a patient like Chuck) does not guarantee that the patient will acquire Alzheimer’s as he ages, but it is an indication of a significantly increased risk of developing the disease. Included among the admissible background knowledge E may be information about this technique and its implications. Let statement Fc state the fact that Chuck undergoes the imaging technique at age 50, and let statement D say that the image of Chuck’s brain shows that a “moderate accumulation” of plaque is present. Presumably, absent the result D, Fc taken together with the other information in E is admissible, so let’s suppose that Fc is included within E. However, the result of this procedure, D, may well be evidentially relevant to whether or not Chuck will develop Alzheimer’s by age 70. Suppose it indicates an increased likelihood of the onset of Alzheimer’s by age 70: \(P[Ac \,|\, D \cdot Rc \cdot E] > P[Ac \,|\, Rc \cdot E]\).

Regardless of whatever relevance a person’s chances of developing Alzheimer’s by age 70, H, may have to the likelihood of his exhibiting a “moderate accumulation” of brain plaques by age 50, D, the relevance of that chance claim H to image result D is overridden by the claim that the individual will indeed develop Alzheimer’s by age 70, Ac. That is, the fact that a person will develop the disease, Ac, is predictive enough about the amount of plaque buildup over time that it overrides the relevance of the chances of developing the disease (expressed by H) to the likelihood of outcome D from a brain scan at age 50. Thus, \(P[D \,|\, Ac \cdot H \cdot Rc \cdot E] = P[D \,|\, Ac \cdot Rc \cdot E]\). Similarly, the fact that a person will not develop the disease, \(\lnot Ac\), is predictive enough about the amount of plaque buildup over time that it overrides the relevance of the chances of developing the disease (expressed by H) to the likelihood of outcome D from a brain scan at age 50. Thus, \(P[D \,|\, \lnot Ac \cdot H \cdot Rc \cdot E] = P[D \,|\, \lnot Ac \cdot Rc \cdot E]\).

Thus, in the order discussed, we have the following:

  1. \(P[Ac \,|\, H \cdot Rc \cdot E] = r\) is a direct inference about Chuck’s chances of developing Alzheimer’s by age 70, given he fits profile R.

  2. \(1> P[Ac \,|\, D \cdot Rc \cdot E]> P[Ac \,|\, Rc \cdot E] > 0\): given membership in risk group R, the fact that a person’s brain scan at age 50 shows a “moderate accumulation” of plaque is positive evidence that the person will develop Alzheimer’s by age 70.

  3. \(P[D \,|\, Ac \cdot H \cdot Rc \cdot E] = P[D \,|\, Ac \cdot Rc \cdot E]\) and \(P[D \,|\, \lnot Ac \cdot H \cdot Rc \cdot E] = P[D \,|\, \lnot Ac \cdot Rc \cdot E]\): relevance of the chances of developing Alzheimer’s by age 70 (according to hypothesis H) to whether a person’s brain scan at age 50 shows a “moderate accumulation” (statement D) is overridden by the claim that the person will (or will not) develop Alzheimer’s by age 70 (the direct inference outcomes of H in \(\{Ac, \lnot Ac\}\)), given admissible E.

Therefore, the claim that Chuck’s brain scan shows a “moderate accumulation” of plaque, D, defeats the direct inference regarding Chuck’s chances, r, of developing Alzheimer’s by age 70: \(P[Ac \,|\, D \cdot H \cdot Rc \cdot E] \ne P[Ac \,|\, H \cdot Rc \cdot E] = r\), for admissible E. Thus, D (in conjunction with E) must be inadmissible for this direct inference.
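The three conditions can be instantiated in a small hypothetical model, sketched below in Python. Apart from \(r = .83\), every number is invented for illustration: H competes with one alternative hypothesis H_alt (chance 1/2), each has credence 1/2, and the scan result D is made to depend on Chuck's outcome alone, which is one simple way to secure condition 3. Everything is tacitly conditional on \((Rc \cdot E)\).

```python
from fractions import Fraction
from itertools import product

CHANCE_OF_A = {"H": Fraction(83, 100), "H_alt": Fraction(1, 2)}  # chance of A under each hypothesis
PRIOR = {"H": Fraction(1, 2), "H_alt": Fraction(1, 2)}           # credence in each hypothesis
P_D_GIVEN_A = {True: Fraction(3, 5), False: Fraction(1, 5)}      # scan result D depends on A only

# A world is (hypothesis, A is true, D is true).
WORLDS = {}
for hyp, a, d in product(PRIOR, (True, False), (True, False)):
    p_a = CHANCE_OF_A[hyp] if a else 1 - CHANCE_OF_A[hyp]
    p_d = P_D_GIVEN_A[a] if d else 1 - P_D_GIVEN_A[a]
    WORLDS[(hyp, a, d)] = PRIOR[hyp] * p_a * p_d


def prob(event, given=lambda w: True):
    """P[event | given], with event and given as predicates on worlds."""
    den = sum(p for w, p in WORLDS.items() if given(w))
    num = sum(p for w, p in WORLDS.items() if given(w) and event(w))
    return num / den


def H(w):
    return w[0] == "H"


def A(w):
    return w[1]


def D(w):
    return w[2]


def not_(f):
    return lambda w: not f(w)


def and_(*fs):
    return lambda w: all(f(w) for f in fs)


assert prob(A, H) == Fraction(83, 100)                # condition 1: direct inference
assert 1 > prob(A, D) > prob(A) > 0                   # condition 2: D is positive evidence for A
assert prob(D, and_(A, H)) == prob(D, A)              # condition 3: override by A ...
assert prob(D, and_(not_(A), H)) == prob(D, not_(A))  # ... and by not-A
print(prob(A, H), prob(A, and_(D, H)))                # 83/100 249/266: D defeats the inference
```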

Here is the relevant formal result. It shows that whenever a chance hypothesis H satisfies the above “overridden relevance to D” condition for its outcomes \(\{Ac, \lnot Ac\}\), given \((Rc \cdot E)\), statement D must defeat the direct inference from \((H \cdot Rc \cdot E)\) to Ac if and only if D is evidentially relevant to Ac, given \((Rc \cdot E)\).

Corollary of Theorem 9

Inadmissible Evidence for Outcomes.Footnote 25

We assume throughout that \(P[D \cdot H \cdot Rc \cdot E] > 0\) (so that all the conditional probabilities are well-defined).

Let \(P[Ac \,|\, H \cdot Rc \cdot E] = r\), for \(0< r < 1\), be a direct inference for admissible E.

figure a

3.3 The Main Results

The next two theorems provide the main formal results of this section. Each result has two parts. Near the beginning of this section we summarized the first part of the first theorem. Here is an interpretive account of both parts of the first theorem.

Let P be any classical probability function (or rational credence function) that satisfies GG-DIP for the direct inferences from \((H \cdot Rc \cdot E)\) for admissible E. Let \(A_1c\) and \(A_2c\) be any two possible chance outcomes of initial state R for chance system c. Suppose that (according to the credences represented by function P) each of these two chance events overrides (or screens-off) whatever relevance H might have to D, given E (according to P):

\(P[D \,|\, A_kc \cdot H \cdot Rc \cdot E] = P[D \,|\, A_kc \cdot Rc \cdot E]\) for \(k=1, 2\). Then:

  (1) If D is more likely according to \((A_1c \cdot Rc \cdot E)\) than according to \((A_2c \cdot Rc \cdot E)\) (as represented by P), then \((D \cdot E)\) must defeat one of the direct inferences based on \((H \cdot Rc)\)—i.e. \((D \cdot E)\) falls outside the resiliency sphere of \((H \cdot Rc)\) for P.

  (2) If, given \((Rc \cdot E)\), either \(A_1c\) is positively supported by D and \(A_2c\) is not positively supported by it, or \(A_2c\) is negatively supported by D and \(A_1c\) is not negatively supported by it, then \((D \cdot E)\) must defeat one of the direct inferences based on \((H \cdot Rc)\)—i.e. \((D \cdot E)\) falls outside the resiliency sphere of \((H \cdot Rc)\) for P.

Here is the formal statement of this result.

Theorem 8

Sufficient Condition for Inadmissible Evidence.

We assume throughout that \(P[D \cdot H \cdot Rc \cdot E] > 0\) (so that all conditional probabilities are well-defined).

Let \(A_1c\) and \(A_2c\) be any two outcomes of initial state Rc such that, for admissible E, the following direct inferences hold:

$$\begin{aligned} P[A_kc \,|\, H \cdot Rc \cdot E] = P[A_kc \,|\, H \cdot Rc] = r_k,\quad \hbox {where}~ 1> r_k > 0,\,\, \hbox {for}~ k=1, 2. \end{aligned}$$

Suppose, for \(k=1, 2\): \(P[D \,|\, A_kc \cdot H \cdot Rc \cdot E] = P[D \,|\, A_kc \cdot Rc \cdot E]\).

figure b
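One way to see why part (1) must hold (a reconstruction from the stated suppositions, offered as a sketch rather than as the proof behind the theorem): by Bayes' theorem and the override supposition, for \(k=1, 2\),

$$\begin{aligned} P[A_kc \,|\, D \cdot H \cdot Rc \cdot E] = \frac{P[D \,|\, A_kc \cdot H \cdot Rc \cdot E]\; P[A_kc \,|\, H \cdot Rc \cdot E]}{P[D \,|\, H \cdot Rc \cdot E]} = r_k\, \frac{P[D \,|\, A_kc \cdot Rc \cdot E]}{P[D \,|\, H \cdot Rc \cdot E]}. \end{aligned}$$

Since \(P[D \cdot H \cdot Rc \cdot E] > 0\) and \(r_k > 0\), the direct inference to \(A_kc\) survives the addition of D exactly when \(P[D \,|\, A_kc \cdot Rc \cdot E] = P[D \,|\, H \cdot Rc \cdot E]\). If both survived, D would be equally likely on \((A_1c \cdot Rc \cdot E)\) and on \((A_2c \cdot Rc \cdot E)\); so whenever it is more likely on one than on the other, at least one of the two direct inferences is defeated.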

Whereas the first theorem applies to any two chance outcomes of chance hypothesis H, the next theorem relies on outcomes that form a partition. The payoff for this stronger supposition is a biconditional connection between support for (or by) D and the failure of direct inferences.

The first part of this theorem shows that whenever, for each \(B_jc\) in a partition of outcomes of Rc, the support for D by chance hypothesis H is overridden by the support afforded to D by \((Rc \cdot B_jc)\), according to P, the following result holds:

D falls outside the resiliency sphere for the direct inferences based on \((H \cdot Rc \cdot E)\) if and only if D is supported more (or less) by \(B_ic\) than by \(B_jc\), given \((Rc \cdot E)\), for some \(B_ic\) and \(B_jc\) in the partition.

The second part of this theorem shows that under the same conditions stated above for the first part, the following result holds:

D falls outside the resiliency sphere for the direct inferences based on \((H \cdot Rc \cdot E)\) if and only if \(B_kc\) is either positively or negatively supported by D, given \((Rc \cdot E)\), for at least one of the \(B_kc\) in the partition.

Theorem 9

Necessary and Sufficient Condition for Inadmissible Evidence.

We assume throughout that \(P[D \cdot H \cdot Rc \cdot E] > 0\) (so that all conditional probabilities are well-defined).

Let \(\Delta _R(c) = \{B_1c, B_2c, \dots \}\) be some partition of outcomes of initial state Rc for \(P[\ \,|\, H \cdot Rc \cdot E]\) such that, for each \(B_kc\) in \(\Delta _R(c)\), the following direct inferences hold for admissible E:

$$\begin{aligned} P[B_kc \,|\, H \cdot Rc \cdot E] = P[B_kc \,|\, H \cdot Rc] = r_k, \quad \hbox {for}~\, r_k > 0. \end{aligned}$$

Suppose, for each \(B_kc\) in \(\Delta _R(c)\), \(P[D \,|\, B_kc \cdot H \cdot Rc \cdot E] = P[D \,|\, B_kc \cdot Rc \cdot E]\).

Then we have the following result:

figure c
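A minimal Python sketch of the partition case under Theorem 9's override supposition: the chances \(r_k\) and the likelihoods \(P[D \,|\, B_kc \cdot Rc \cdot E]\) are hypothetical inputs, and the updated values \(P[B_kc \,|\, D \cdot H \cdot Rc \cdot E]\) are computed by Bayes' theorem. The check compares only the direct inferences to the partition cells.

```python
from fractions import Fraction


def updated_direct_inferences(chances, likelihoods):
    """P[B_k c | D . H . Rc . E] for each partition cell, assuming the override condition,
    so that P[D | B_k c . H . Rc . E] = likelihoods[k] = P[D | B_k c . Rc . E]."""
    p_d = sum(r * l for r, l in zip(chances, likelihoods))  # P[D | H . Rc . E]
    return [r * l / p_d for r, l in zip(chances, likelihoods)]


def defeats_some_cell(chances, likelihoods):
    """True iff adding D moves the credence in at least one cell away from its chance."""
    return updated_direct_inferences(chances, likelihoods) != list(chances)


chances = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(defeats_some_cell(chances, [Fraction(2, 5)] * 3))                           # False
print(defeats_some_cell(chances, [Fraction(2, 5), Fraction(2, 5), Fraction(4, 5)]))  # True
```

With equal likelihoods D is no evidence about which cell obtains, and every direct inference survives; as soon as one cell makes D more (or less) likely than another, the normalizing factor \(P[D \,|\, H \cdot Rc \cdot E]\) no longer matches every cell's likelihood.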

4 “Inappropriate” Credence Functions

It should be pretty clear that, given a specific account of admissibility, not all credence functions are “appropriate” in the way required by G-DIP and GG-DIP. Our next result shows that the axioms of classical probability put tight constraints on precisely which credence functions can get direct inference right. Let P be any “appropriate” initial credence function, which gets direct inferences from \((H \cdot Rc \cdot E)\) to chance outcomes \(A_jc\) right, where E is admissible. Let Q be any credence function that varies from P by even a small shift in the non-direct inference credence for a chance outcome—i.e., such that \(Q[A_jc \,|\, Rc \cdot E] \ne P[A_jc \,|\, Rc \cdot E]\). Then, provided that Q satisfies an additional weak condition, it cannot get all the direct inferences right.

One example of the additional weak condition is that P and Q agree on the amount of evidential support that \((Rc \cdot A_kc \cdot E)\) would provide to H, for each \(A_kc\) in a partition. Another example is where Q comes from P via certain instances of Jeffrey Conditionalization (see Jeffrey 1990). Thus, some rather minor variants of credence functions that satisfy GG-DIP (including some that come about via the kinematics of Jeffrey updating) must fail to satisfy GG-DIP—they fail to count among the “appropriate” credence functions for direct inferences.

For the sake of clarity, we first present our results for binary chance outcomes, Ac and \(\lnot Ac\). We generalize these results in a later subsection.

4.1 Examples of “Inappropriate” Credence Functions

Consider the Alzheimer’s example described in Sect. 3. Chance hypothesis H says that the chance of an individual in reference class R getting Alzheimer’s by age 70 is r; Rc says that Chuck is in reference class R; and Ac says that Chuck will get Alzheimer’s by age 70. Suppose that Maria and John agree on the amount of evidential support that \((Rc \cdot Ac)\), were it true, would supply to chance hypothesis H, given all their other relevant evidence E (on which they completely agree): \(Q[H \,|\, Rc \cdot Ac \cdot E] = P[H \,|\, Rc \cdot Ac \cdot E]\), where P is Maria’s credence function and Q is John’s credence function. And also suppose they agree on the amount of evidential support that \((Rc \cdot \lnot Ac)\), were it true, would supply to chance hypothesis H, given all their other relevant evidence E: \(Q[H \,|\, Rc \cdot \lnot Ac \cdot E] = P[H \,|\, Rc \cdot \lnot Ac \cdot E]\). However, Maria is somewhat more optimistic than John about Chuck’s future health, particularly his prospects of getting Alzheimer’s by age 70; thus, \(P[Ac \,|\, Rc \cdot E] < Q[Ac \,|\, Rc \cdot E]\).

Although neither Maria nor John is confident that chance hypothesis H is true, both want to draw the correct direct inference value, r, when H is added to their total admissible evidence \((Rc \cdot E)\): \(P[Ac \,|\, H \cdot Rc \cdot E] = r\) and \(Q[Ac \,|\, H \cdot Rc \cdot E] = r\). However, it turns out that at least one of them must get the direct inference wrong, since: \(P[Ac \,|\, H \cdot Rc \cdot E] \ne Q[Ac \,|\, H \cdot Rc \cdot E]\). That is, if Maria gets the direct inference right, then John must get it wrong.

Corollary of Theorem 10

“Inappropriate” Credence Functions.

Suppose, for admissible E, \(P[Ac \,|\, H \cdot Rc \cdot E] = r > 0\) is a direct inference. (We assume \(P[H \cdot Rc \cdot Ac \cdot E] > 0\) and \(P[H \cdot Rc \cdot \lnot Ac \cdot E] > 0\), so that all the conditional probabilities are well-defined.)

Suppose that probability function Q is related to P in the following way (where \(Q[Rc \cdot Ac \cdot E] > 0\) and \(Q[Rc \cdot \lnot Ac \cdot E] > 0\)):

\(Q[H \,|\, Rc \cdot Ac \cdot E] = P[H \,|\, Rc \cdot Ac \cdot E]\) and \(Q[H \,|\, Rc \cdot \lnot Ac \cdot E] = P[H \,|\, Rc \cdot \lnot Ac \cdot E]\).

Then, \(Q[Ac \,|\, Rc \cdot E] \ne P[Ac \,|\, Rc \cdot E]\) if and only if \(Q[Ac \,|\, H \cdot Rc \cdot E] \ne P[Ac \,|\, H \cdot Rc \cdot E]\).

And, if \(Q[H \,|\, Rc \cdot E] \ne P[H \,|\, Rc \cdot E]\), then \(Q[Ac \,|\, H \cdot Rc \cdot E] \ne P[Ac \,|\, H \cdot Rc \cdot E]\).

Proof

Follows immediately from setting \(\Delta (c)=\{Ac, \lnot Ac\}\) in the more general Theorem 10 below. \(\square \)
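The corollary can be illustrated numerically. Because Maria and John share the values of \(P[H \,|\, Rc \cdot Ac \cdot E]\) and \(P[H \,|\, Rc \cdot \lnot Ac \cdot E]\), Bayes' theorem makes each one's value for \(P[Ac \,|\, H \cdot Rc \cdot E]\) a function of their credence in Ac given \((Rc \cdot E)\) alone. The shared support values and the two credences below are hypothetical.

```python
from fractions import Fraction


def direct_inference_value(h_given_a, h_given_not_a, p_a):
    """P[Ac | H . Rc . E] by Bayes' theorem, from h_given_a = P[H | Rc . Ac . E],
    h_given_not_a = P[H | Rc . not-Ac . E] and p_a = P[Ac | Rc . E]."""
    num = h_given_a * p_a
    return num / (num + h_given_not_a * (1 - p_a))


h1, h0 = Fraction(3, 10), Fraction(1, 10)                # shared support values (hypothetical)
maria = direct_inference_value(h1, h0, Fraction(2, 5))   # P[Ac | Rc . E] = 2/5
john = direct_inference_value(h1, h0, Fraction(1, 2))    # Q[Ac | Rc . E] = 1/2
print(maria, john)  # 2/3 3/4: at most one of them can equal r
```

When both support values are positive, this function is strictly increasing in its last argument, so different credences in Ac given \((Rc \cdot E)\) always yield different values for the direct inference, and vice versa.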

Jeffrey Conditionalization is the best-known approach to the representation of learning based on uncertain new evidence. It deals with cases where, rather than learning by becoming certain of new information F, the agent has an experience or an insight that directly changes her confidence in the truth of each alternative among some range of possibilities, \(\{F_1, F_2, \dots , F_n\}\). Formally, when P is the agent’s initial credence function, her new information induces a new credence function Q that directly assigns new credence values to the directly affected alternative possibilities in \(\{F_1, F_2, \dots , F_n\}\) as follows: \(Q[F_i \cdot F_k] = 0\) for \(i \ne k\) (they are mutually exclusive alternatives), and \(\sum _{j=1}^n Q[F_j] = 1\) (they are a complete collection of alternative possibilities). The relationship between the old credence function P and the new credence function Q is this: \(Q[G \;|\, F_j] = P[G \;|\, F_j]\), for all statements G, for each \(F_j\) in \(\{F_1, F_2, \dots , F_n\}\). That is, were the agent to become certain of any one of the statements \(F_j\), her new credence value, \(Q[G \;|\, F_j]\), should be identical to the old credence value, \(P[G \;|\, F_j]\) (for each statement G). It follows immediately that, for each statement G, the new credence value is given by \(Q[G] = \sum _{j=1}^n P[G \;|\, F_j] \times Q[F_j]\).Footnote 26 We now consider a case where Jeffrey Conditionalization (or a similar update method) induces a new credence function that must get direct inferences wrong.
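The update rule just described can be written out for a finite set of possible worlds. The Python sketch below assumes that the partition cells are mutually exclusive, jointly exhaustive, and each of positive prior credence; the worlds and numbers in the usage lines are hypothetical.

```python
from fractions import Fraction


def jeffrey_update(prior, partition, new_weights):
    """Jeffrey conditionalization on a finite set of worlds.
    prior: dict world -> P[world]; partition: predicates F_1..F_n (mutually exclusive,
    jointly exhaustive, each with positive prior); new_weights: the values Q[F_j], summing to 1.
    Returns Q with Q[w] = P[w | F_j] * Q[F_j] for the cell F_j containing w."""
    assert sum(new_weights) == 1
    posterior = {}
    for f, q in zip(partition, new_weights):
        p_f = sum(p for w, p in prior.items() if f(w))
        for w, p in prior.items():
            if f(w):
                posterior[w] = p / p_f * q
    return posterior


# Worlds are (G is true, F1 is true); F2 is the negation of F1.
prior = {(True, True): Fraction(1, 10), (True, False): Fraction(3, 10),
         (False, True): Fraction(4, 10), (False, False): Fraction(2, 10)}
q = jeffrey_update(prior, [lambda w: w[1], lambda w: not w[1]], [Fraction(4, 5), Fraction(1, 5)])
print(sum(p for w, p in q.items() if w[0]))  # Q[G] = (1/5)(4/5) + (3/5)(1/5) = 7/25
```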

Consider once again the Alzheimer’s example from Sect. 3. As before, chance hypothesis H says that the chance of an individual in reference class R getting Alzheimer’s by age 70 is r; Rc says that Chuck is in reference class R; and Ac says that Chuck will get Alzheimer’s by age 70; statement D says that Chuck’s brain scan at age 50 shows a “moderate accumulation” of plaque. Suppose (this time) that Maria considers the relevance of the chance claim H to brain imaging result D to be overridden by the claim, “Chuck gets Alzheimer’s by age 70” (if added as a premise): \(P[D \,|\, H \cdot Rc \cdot Ac \cdot E] = P[D \,|\, Rc \cdot Ac \cdot E]\). Similarly, suppose Maria considers the relevance of the chance claim H to brain imaging result D to be overridden by the claim, “Chuck does not get Alzheimer’s by age 70” (if added as a premise): \(P[D \,|\, H \cdot Rc \cdot \lnot Ac \cdot E] = P[D \,|\, Rc \cdot \lnot Ac \cdot E]\). Furthermore, suppose Maria isn’t privy to the result of Chuck’s brain scan, but she overhears two technicians talking about it. What she hears is vague (mostly tone of voice), but her impression changes her credence from \(P[Ac \,|\, Rc \cdot E] = s\) to \(Q[Ac \,|\, Rc \cdot E] = t > s\). Maria updates her credences via Jeffrey Conditionalization, according to (1) and (2) below. Thus, her new credence function must get the direct inference (concerning Chuck having Alzheimer’s by age 70) wrong: \(Q[Ac \,|\, H \cdot Rc \cdot E]\ne r = P[Ac \,|\, H \cdot Rc \cdot E]\), for admissible E.

Corollary of Theorem 11

“Inappropriate” Credence Functions, Extended.

[Display: statement of the corollary, including conditions (1) and (2).]

Proof

Follows immediately from setting \(\Delta (c)=\{Ac, \lnot Ac\}\) and \(\Gamma =\{D,\lnot D\}\) in the more general Theorem 11 below. \(\square \)

Our result here fits the pattern of Jeffrey Conditionalization, but our result is more general. For, the result itself doesn’t assume that every statement is updated via the Jeffrey update formula; it only supposes that the update formula applies to \((Rc \cdot Ac)\), \((Rc \cdot \lnot Ac)\), \((H \cdot Rc \cdot Ac)\), and \((H \cdot Rc \cdot \lnot Ac)\). Furthermore, the result itself says nothing about updating, and need not be interpreted that way. Rather, the result applies to any pair of credence functions, Q and P, whatever their origins. The result says that for any credence function P that satisfies the initial suppositions, and for any credence function Q related to P as specified by conditions (1) and (2), when they disagree on the credence values for chance outcome Ac based on \((Rc \cdot E)\) alone, then (and only then) at least one of them must get the direct inference wrong; so at least one of them must be an “inappropriate” credence function according to GG-DIP.

4.2 Generalization to Algebras of Outcomes

We now state the main results of this section in a more general form. The corollaries stated earlier follow directly from these.

Theorem 10

“Inappropriate” Credence Functions.

Let \(\Delta (c) = \{B_1c, B_2c, \dots \}\) be a partition for \(P[\ \,|\, Rc \cdot E]\), where according to H the members of \(\Delta = \{B_1, B_2, \dots \}\) are chance outcome attributes for systems in state R, and where, for admissible E, \(P[B_kc \,|\, H \cdot Rc \cdot E] = r_k > 0\) are direct inferences. (We assume \(P[H \cdot Rc \cdot B_kc \cdot E] > 0\) for each \(B_kc\) in \(\Delta (c)\), so that all the conditional probabilities are well-defined.)

Suppose probability function Q is related to P in the following way, where \(Q[Rc \cdot B_kc \cdot E] > 0\) for each \(B_kc\) in \(\Delta (c)\):

[Display: conditions and conclusion of Theorem 10.]

The next theorem applies to all cases where probability function Q comes from function P via Jeffrey Conditionalization, but it applies to lots of other Q functions as well. Conditions (3.1) and (3.2) of the theorem only require the weaker claim that the \(Q[\ \,|\, E]\) values for expressions \((Rc \cdot B_jc)\) and \((H\cdot Rc \cdot B_jc)\) (for each \(B_jc\) in \(\Delta (c)\)) are related to their \(P[\ \,|\, E]\) values by Jeffrey’s formula on partition \(\Gamma = \{D_1, D_2, \dots \}\). Full Jeffrey Conditionalization would require the stronger claim that the \(Q[\ \,|\, E]\) values for all expressions are related to their \(P[\ \,|\, E]\) values via Jeffrey’s formula on partition \(\Gamma = \{D_1, D_2, \dots \}\). When full Jeffrey Conditionalization applies, the supposition that \(\Delta (c)\) is a partition for \(Q[\ \,|\, Rc \cdot E]\) (supposition (2)) is derivable from Jeffrey’s formula, since \(\Delta (c)\) is a partition for \(P[\ \,|\, Rc \cdot E]\).

Theorem 11

“Inappropriate” Credence Functions, Extended.

Let \(\Delta (c) = \{B_1c, B_2c, \dots \}\) be a partition for \(P[\ \,|\, Rc \cdot E]\), where according to H the members of \(\Delta = \{B_1, B_2, \dots \}\) are chance outcome attributes for systems in state R, and where, for admissible E, \(P[B_kc \,|\, H \cdot Rc \cdot E] = r_k > 0\) are direct inferences. (We assume \(P[H \cdot Rc \cdot B_kc \cdot E] > 0\) for each \(B_kc\) in \(\Delta (c)\), so that all the conditional probabilities are well-defined.)

Suppose probability function Q is related to P in the following way, where \(Q[Rc \cdot B_kc \cdot E] > 0\) for each \(B_kc\) in \(\Delta (c)\), and where \(\Gamma = \{D_1, D_2, \dots \}\) is a partition for \(Q[\ \,|\, E]\), with each \(Q[D_i \,|\, E] > 0\):

[Display: conditions and conclusion of Theorem 11.]

5 Reference Class Problems

Accounts of direct inference, Bayesian or not, often run into trouble with overlapping reference classes or initial chance states. Lots of ink has been spilt trying to sort out these problems.Footnote 27 In this section we raise some troubles for Bayesian accounts. We focus on issues that arise when the object-language notion, ch, is some kind of objective chance. (Frequency accounts have distinct troubles of their own.) We will suggest some ways a Bayesian account may deal with these troubles.

5.1 Defeat by Outcome Attributes

Consider the case where an extensive chance hypothesis H entails chances for at least two distinct outcome attributes, Ax and Bx, for initial state R—i.e. Ax and Bx are members of the algebra of outcome attributes \(\Theta _R\). Then it will usually be the case that possible outcome Bc for system c defeats the direct inference from \((H \cdot Rc \cdot E)\) to outcome Ac, for admissible E:Footnote 28

$$\begin{aligned} P[Ac \,|\, H \cdot Rc \cdot Bc \cdot E] \ne P[Ac \,|\, H \cdot Rc \cdot E] = r. \end{aligned}$$

Defeat of this kind turns out to be easy to finesse. Indeed, when H is an extensive chance hypothesis, as defined earlier, defeat of this kind turns into a direct inference success. For, whenever an extensive chance hypothesis H entails \(ch(Ax,Rx) =r\), and Bx is a chance attribute for Rx according to H, then H must also entail \(ch(Bx,Rx)=s\) and \(ch(Ax\cdot Bx,Rx)=t\), where s and t are standard terms for real numbers. Thus, for admissible E, the following two direct inferences result:

$$\begin{aligned} P[Bc \,|\, H \cdot Rc \cdot E] = s ~\hbox {and}~ P[Ac\cdot Bc \,|\, H \cdot Rc \cdot E] = t. \end{aligned}$$

So, although Bc defeats the simple direct inference to Ac, we still obtain the direct inference we should want, but we get it via the following “complex direct inference”:

$$\begin{aligned} P[Ac \,|\, H \cdot Rc \cdot Bc \cdot E] = P[Ac\cdot Bc \,|\, H \cdot Rc \cdot E] / P[Bc \,|\, H \cdot Rc \cdot E] = t/s. \end{aligned}$$
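The complex direct inference can be checked numerically. In the sketch below, an illustration under assumptions rather than the paper's own construction, \((Rc \cdot E)\) is held fixed in the background and the credence function is built as a mixture of the joint chance distributions over the outcome attributes A and B given by H and by one hypothetical rival hypothesis, so the simple direct inferences for Ac, Bc and \(Ac \cdot Bc\) hold by construction. The chance values (r = 3/10, s = 2/5, t = 1/5) and the prior are invented. No conditional chance of the form \(ch(Ax,Rx\cdot Bx)=q\) appears anywhere; \(P[Ac \,|\, H \cdot Rc \cdot Bc \cdot E]\) comes from ordinary conditioning.

```python
from fractions import Fraction as Fr

# Joint chances of (A, B) outcomes for a system in state R, according to H (hypothetical values).
ch_H = {(True, True): Fr(1, 5), (True, False): Fr(1, 10),
        (False, True): Fr(1, 5), (False, False): Fr(1, 2)}
# A rival extensive hypothesis (hypothetical): A and B independent, each with chance 1/2.
ch_rival = {ab: Fr(1, 4) for ab in ch_H}

prior = {"H": Fr(1, 3), "rival": Fr(2, 3)}
chances = {"H": ch_H, "rival": ch_rival}

# Credence over worlds (hypothesis, a, b), with (Rc . E) held fixed in the background.
P = {(hyp, a, b): prior[hyp] * chances[hyp][(a, b)]
     for hyp in prior for (a, b) in ch_H}

def prob(pred):
    return sum((p for w, p in P.items() if pred(w)), Fr(0))

def cond(pred, given):
    return prob(lambda w: pred(w) and given(w)) / prob(given)

def H(w):
    return w[0] == "H"

r = cond(lambda w: w[1], H)                                 # P[Ac | H.Rc.E]
s = cond(lambda w: w[2], H)                                 # P[Bc | H.Rc.E]
t = cond(lambda w: w[1] and w[2], H)                        # P[Ac.Bc | H.Rc.E]
defeated = cond(lambda w: w[1], lambda w: H(w) and w[2])    # P[Ac | H.Rc.Bc.E]

print(r, s, t, defeated)   # 3/10 2/5 1/5 1/2
```

Here Bc defeats the simple direct inference (1/2 rather than 3/10), yet the defeated value is exactly t/s, as the complex direct inference says it should be.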

This is exactly the value we should want for \(P[Ac \,|\, H \cdot Rc \cdot Bc \cdot E]\). And we’ve gotten it without complicating the account of chance by taking on a notion of conditional chance. That is, when Bx is an outcome attribute for Rx, the Bayesian machinery yields the desired direct inference value for Ax without needing to draw on chance expressions of form \(ch(Ax ,Rx\cdot Bx)=q\).Footnote 29 This approach avoids drawing on the notion of conditional chance, and the attendant difficulties identified by Humphreys (1985). It also benefits by not requiring the account of chance to make sense of expressions that conditionalize on outcome attributes: when Bx is an outcome attribute for Rx, what does an expression of form \(ch(Ax ,Rx\cdot Bx)=q\) say?Footnote 30

One more point before moving on. The treatment described above works well for extensive chance hypotheses. But what about cases where H is not extensive, say, where H entails only one of \(ch(Bx,Rx)=s\) or \(ch(Ax\cdot Bx,Rx)=t\)? In that case, although Bc should defeat the direct inference to Ac,

$$\begin{aligned} P[Ac \,|\, H \cdot Rc \cdot Bc \cdot E] \ne P[Ac \,|\, H \cdot Rc \cdot E] = r, \end{aligned}$$

the Bayesian direct inference approach doesn’t produce a chance-based value for \({P[Ac \,|\, H \cdot Rc \cdot Bc \cdot E]}\). Is this a problem for the Bayesian account?

By Bayesian lights, not at all. The incomplete, non-extensive chance hypothesis cannot supply the desired direct inference, but this is just as it should be! First, recall that the present account of direct inference doesn’t suppose that the agent is certain of the chance hypothesis involved. Application of the Bayesian direct inference principle (GG-DIP or G-DIP) only supposes that the agent’s total evidence is expressed by \((Rc \cdot E)\), or by \((Rc \cdot Bc \cdot E)\) in this case, and contemplates the appropriate credence value when a chance hypothesis is added (as an additional premise) to this evidence. It does not suppose that the agent’s total evidence contains the chance hypotheses on which the direct inferences depend. The main issue for the theory of direct inference is to determine the conditions under which the addition of a chance hypothesis (however well confirmed) to an agent’s total evidence specifies appropriate direct inferences to possible outcomes. In this regard, the direct inference principle does not privilege any one chance hypothesis over another.

So, one plausible Bayesian line goes like this. It is not at all surprising that an incomplete chance hypothesis may fail to produce a direct inference when it fails to specify appropriate chance claims. The failure of the Bayesian account to produce direct inferences in such cases is not a fault of the account. Indeed, when hypothesis H doesn’t include the chance claim \(ch(Ax,Rx)=r\), it is no fault of the Bayesian account that it fails to produce the direct inference \(P[Ac \,|\, H \cdot Rc \cdot E] = r\). Similarly, when an incomplete chance hypothesis H fails to supply one of the chance claims \(ch(Bx,Rx)=s\) or \(ch(Ax\cdot Bx,Rx)=t\), it is no fault of the Bayesian account that it fails to produce one of the direct inferences \(P[Bc \,|\, H \cdot Rc \cdot E] = s\) or \(P[Ac\cdot Bc \,|\, H \cdot Rc \cdot E] = t\), and so fails to produce the appropriate direct inference \(P[Ac \,|\, H \cdot Rc \cdot Bc \cdot E] = t/s\). In such a case, a more filled-out extension of H hypothesizes specific chance values for \(ch(Bx,Rx)\) and \(ch(Ax\cdot Bx,Rx)\), and can thereby supply the appropriate direct inferences. If an agent lacks confidence in any of the filled-out extensions of H, then she simply needs to acquire more evidence for (or against) them, in the usual Bayesian way.

5.2 Competing Chance Claims

We now turn to cases where two chance claims may compete for direct inference priority. This can only happen when two chance claims about the same outcome attribute have “overlapping reference classes”—i.e. when some chance systems can be in two distinct initial chance states, R and S, at the same time, and where both initial chance states provide chances for the same outcome attribute, A. Bayesian direct inference runs into some trouble in trying to accommodate this situation. We’ll suggest some ways that the Bayesian account may deal with these troubles.

Let \(P[Ac \,\,|\,\, ch(Ax,Rx)=r \cdot Rc \cdot E] = r\) be a perfectly good direct inference (for admissible E). Then, presumably, \(P[Ac \,\,|\,\, ch(Ax,Rx)=r \cdot Rc \cdot ch(Ax, Sx)=s \cdot E] = r\), where \(s\ne r\), should also be a perfectly good direct inference. The addition of some chance claim \(ch(Ax,Sx)=s\) should not be problematic for such straightforward direct inferences. Otherwise, extended chance hypotheses, involving multiple chance claims, would be unable to ground direct inferences. Now, the usual way to raise “multiple reference class problems” for direct inference goes like this. Suppose we add Sc as a premise to this direct inference. This clearly must defeat the direct inference, since we have two equally good but incompatible direct inferences:

$$\begin{aligned} P[Ac \,\,|\,\, ch(Ax,Rx)=r \cdot Rc \cdot ch(Ax,Sx)=s \cdot Sc \cdot E] = r \ne s = P[Ac \,\,|\,\, ch(Ax,Rx)=r \cdot Rc \cdot ch(Ax,Sx)=s \cdot Sc \cdot E]. \end{aligned}$$

Thus, we must have

\(P[Ac \,\,|\,\, ch(Ax,Rx)=r \cdot Rc \cdot ch(Ax,Sx)=s \cdot Sc \cdot E] \ne r\) (or \(\ne s\)).Footnote 31

What happens when \(\lnot Sc\), instead of Sc, is added as a premise? Since Sc defeats the direct inference, its negation must also defeat it (see Theorem 2), so: \(P[Ac \,\,|\,\, ch(Ax,Rx)=r \cdot Rc \cdot ch(Ax,Sx)=s \cdot \lnot Sc \cdot E] \ne r\).
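The step from defeat by Sc to defeat by \(\lnot Sc\) rests on the law of total probability. As a short worked version, write K as shorthand (introduced only here) for \((ch(Ax,Rx)=r \cdot Rc \cdot ch(Ax,Sx)=s \cdot E)\), and assume \(0< P[Sc \,\,|\,\, K] < 1\). The direct inference \(P[Ac \,\,|\,\, K] = r\) then gives

$$\begin{aligned} r = P[Ac \,\,|\,\, K] = P[Ac \,\,|\,\, K \cdot Sc] \times P[Sc \,\,|\,\, K] + P[Ac \,\,|\,\, K \cdot \lnot Sc] \times (1 - P[Sc \,\,|\,\, K]). \end{aligned}$$

If \(P[Ac \,\,|\,\, K \cdot \lnot Sc]\) were equal to r, this would reduce to \(r \times P[Sc \,\,|\,\, K] = P[Ac \,\,|\,\, K \cdot Sc] \times P[Sc \,\,|\,\, K]\), hence \(P[Ac \,\,|\,\, K \cdot Sc] = r\), contrary to the defeat by Sc. So \(P[Ac \,\,|\,\, K \cdot \lnot Sc] \ne r\).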

Now, on the usual story, this kind of defeat may be averted when state Sx is a sub-state of Rx, that is, when every possible system in state Sx must also be in state Rx. We may express this as \(\forall x (Sx\supset Rx)\) if the quantifier is taken to range over all possible systems, or modally as \(\Box \forall x (Sx\supset Rx)\) when the quantifier is more restricted. The sub-state claim can then be expressed by adding this statement to the premise of the direct inference. However, for our purposes the same idea can be expressed by replacing the chance claim \(ch(Ax,Sx)=s\) in the above example with the claim \(ch(Ax,Rx\cdot Sx)=s\). With this replacement, the following should be a perfectly good direct inference: \(P[Ac \,\,|\,\, ch(Ax,Rx)=r \cdot Rc \cdot ch(Ax,Rx\cdot Sx)=s \cdot Sc \cdot E] = s\).Footnote 32

That’s the usual idea. But it presents problems in a Bayesian context. Here is why. Let H be \((ch(Ax,Rx)=r \cdot ch(Ax,Rx\cdot Sx)=s)\). Let’s suppose (as seems reasonable) that s can be quite far away from r.Footnote 33 Consider the following equation, which follows from the axioms of probability theory, assuming that \(0< P[H \cdot Rc \cdot Sc \cdot E] < 1\) and \(0< P[H \cdot Rc \cdot \lnot Sc \cdot E] < 1\):

$$\begin{aligned} r= & {} P[Ac \,\,|\,\, H \cdot Rc \cdot E] \\= & {} P[Ac \,\,|\,\, H \cdot Rc \cdot Sc \cdot E] \times P[Sc \,\,|\,\, H \cdot Rc \cdot E] + P[Ac \,\,|\,\, H \cdot Rc \cdot \lnot Sc \cdot E] \\&\times \, (1 - P[Sc \,\,|\,\, H \cdot Rc \cdot E]) \\= & {} s \times P[Sc \,\,|\,\, H \cdot Rc \cdot E] + P[Ac \,\,|\,\, H \cdot Rc \cdot \lnot Sc \cdot E] \\&\times \, (1 - P[Sc \,\,|\,\, H \cdot Rc \cdot E]), \end{aligned}$$

provided that \((Rc \cdot E)\) and \((Rc \cdot Sc \cdot E)\), respectively, are admissible for the two direct inferences \(P[Ac \,\,|\,\, H \cdot Rc \cdot E] = r\) and \(P[Ac \,\,|\,\, H \cdot Rc \cdot Sc \cdot E] = s\). However, in the normal course of events an agent’s total evidence may push the value of her credence, \(P[Sc \,\,|\,\, H \cdot Rc \cdot E]\), close to 1. When that happens, the value of \(P[Ac \,\,|\,\, H \cdot Rc \cdot E]\) must approach s.Footnote 34 This contradicts the supposition that \(P[Ac \,\,|\,\, ch(Ax,Rx)=r \cdot Rc \cdot ch(Ax,Rx\cdot Sx)=s \cdot E]\) equals r, the value the direct inference should apparently have.
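The equation above can be solved for its remaining term, which shows how far the agent's credence in Sc can go before the two direct inferences become jointly incoherent. Writing p for \(P[Sc \,\,|\,\, H \cdot Rc \cdot E]\), keeping both direct inferences forces \(P[Ac \,\,|\,\, H \cdot Rc \cdot \lnot Sc \cdot E] = (r - s \times p)/(1 - p)\), and coherence requires this value to lie in [0, 1]. That holds exactly when \(p \le r/s\) (if s > r) or \(p \le (1-r)/(1-s)\) (if s < r). The sketch below computes both quantities; the function names are hypothetical and the chance values invented.

```python
from fractions import Fraction as Fr

def implied_not_sc(r, s, p):
    """P[Ac | H.Rc.~Sc.E] forced by r = s*p + x*(1 - p); requires 0 < p < 1."""
    return (r - s * p) / (1 - p)

def max_coherent_p(r, s):
    """Largest credence p in Sc that keeps the implied value inside [0, 1]."""
    bounds = [Fr(1)]
    if s > r:
        bounds.append(r / s)               # beyond this, the implied value drops below 0
    if s < r:
        bounds.append((1 - r) / (1 - s))   # beyond this, the implied value rises above 1
    return min(bounds)

r, s = Fr(3, 10), Fr(4, 5)   # hypothetical: the more specific state S raises the chance of A
print(max_coherent_p(r, s))             # 3/8
print(implied_not_sc(r, s, Fr(1, 2)))   # -1/5, so no coherent credence function allows p = 1/2
```

On these numbers, any credence in Sc above 3/8 is incompatible with retaining the direct inference based on \(ch(Ax,Rx)=r\), which is one way of seeing the constraint on credences for Sc discussed below.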

Notice that this analysis doesn’t really depend on whether E itself provides evidence for or against Sc. Even in cases where the evidence E says nothing about Sc, the value an agent assigns to \(P[Sc \,\,|\,\, H \cdot Rc \cdot E]\) (perhaps only due to her gut feeling) may force her credence value for Ac to significantly depart from the direct inference value based on \(ch(Ax,Rx)=r\).

One Bayesian response to this problem is to restrict the agent’s possible credence values for Sc so as not to permit the defeat of the direct inference unless E contains explicit evidence for or against Sc. Direct inference restricts other credence values, including the value for Sc. That should not be at all surprising. Any axiom or constraint added to the usual axioms for conditional probabilities is bound to result in the propagation of constraints on credence values throughout the system. Given the way that the credence value for Sc depends on the recommended direct inference value for Ac, one may simply maintain that the direct inference rule provides a kind of objectivist Bayesian constraint on what credence values Sc may take.Footnote 35

However, Bayesians are also free to reject this kind of constraint on credence values for Sc, provided they can find some other way to accommodate the above analysis. For instance, they may adopt a more straightforward response to this problem: simply void (or invalidate) the direct inference \(P[Ac \,\,|\,\, H \cdot Rc \cdot E] = r\) in all cases where H contains a chance claim, \(ch(Ax,Rx\cdot Sx)=s\), based on a more specific initial chance state than Rx. This may be a more coherent view than the “objectivist” approach described above. For, clearly in some cases the credence for Sc may be near 1 based on good evidence, stated within E. In such cases the agent’s credence for Ac should be close to s rather than r. But then, precisely how much evidence, and of what kind, must occur within E to warrant a value of \(P[Sc \,\,|\,\, H \cdot Rc \cdot E]\) that can break the direct inference based on \(ch(Ax,Rx)=r\)? Rather than try to parse this tricky issue (which may have no clear solution), it may make better sense to simply let the presence of the more specific chance claim override the weaker chance claim, as the above analysis seemed to initially suggest.

One of the authors (Wallmann) takes the overall thrust of this analysis to show that Bayesian direct inference cannot work properly—that it should be rejected in favor of some more lenient, more intuitively plausible account of direct inference. The idea that a direct inference based on \((ch(Ax,Rx)=r \cdot Rc)\) should be defeated simply by the presence of some additional chance claim that draws on a more specific chance state, \(ch(Ax,Rx\cdot Sx)=s\), absent an assertion of the applicability of that chance claim, \((Rc\cdot Sc)\), just seems too implausible. The other author finds the above Bayesian response both acceptable and reasonable, although he finds it somewhat surprising that the Bayesian account of direct inference leads to this view.

A further move in the spirit of the “straightforward approach” suggested above is a Bayesian approach that rules out the very possibility of overlapping initial chance-states that have outcome attributes in common.Footnote 36 This chance-state overlap restriction has an important precedent. Our best indeterministic scientific theory, quantum theory, does not draw on overlapping initial quantum states. Each quantum system is in precisely one basic quantum state at any given time, and that state completely accounts for chances of quantum outcomes (upon system collapse, or upon “measurement”). To make good on this view, we need an account of how the usual kinds of chance models of macro-systems can be accommodated within the Bayesian direct inference framework without drawing on overlapping initial states that have outcome attributes in common.

When a chance hypothesis asserts that the chance of Ax (dying by age 75) for systems in chance state Rx (male in good health at age 50) is r, the applicability of the chance claim \(ch(Ax,Rx)=r\) is of little import if it fails to account for important risk factors. For instance, if it hasn’t taken into account whether (and how much) an individual smokes, Sx, then it doesn’t tell you much of anything about anyone’s individual chances. So, perhaps \(ch(Ax,Rx\cdot Sx)=s\) is the more relevant chance claim for Chuck. And if state Sx is relevant, so is state \(\lnot Sx\), which yields some chance claim \(ch(Ax,Rx\cdot \lnot Sx)=t\). Indeed, the amount an individual smokes is relevant, so instead of Sx and \(\lnot Sx\), perhaps a range of alternatives, describing amount smoked, and for how many years, is in order: \(ch(Ax,Rx\cdot S_jx)=s_j\) for a range of categories \(S_jx\). So, supposing Chuck is a 50-year-old male in good health who has never smoked, does \(ch(Ax,Rx\cdot S_0x)=s_0\) capture his chances of dying by age 75? How much does Chuck drink? Is he engaged in a particularly hazardous occupation? The point is that Chuck’s chances depend on the most specific relevant chance state to which he belongs, according to the most specific, accurate chance hypothesis we can develop (and evidentially support) about people in various initial states of health. Anything less is at best an approximation of Chuck’s real chances.Footnote 37

A Bayesian approach that excludes overlapping initial chance states will need to draw on hypotheses about approximate chance models, where these chance models rely on most basic initial chance states, that is, chance states that are most basic according to the model. Associated with any given chance model is a chance hypothesis that asserts that the model fits the real world to some specified degree of approximation. Fitting the world means capturing the most significant causal factors and their associated chances for producing various kinds of outcomes. Evidence for such hypotheses confirms those that do the best job of capturing the most significant causal factors. Such approximations of chance mechanisms are the best we can hope for within the special sciences. So, the fact that a Bayesian approach to direct inference needs to draw on hypotheses about chance models for macroscopic systems (and the basic initial chance states posited by such models) is no defect. Any theory of direct inference, Bayesian or not, will need to accommodate hypotheses about approximate chance models, since that’s the best the special sciences can offer. And each such model will have chance states that are most basic for that model.

6 Conclusion

In this paper we’ve identified a variety of different kinds of statements that are logically inadmissible for Bayesian direct inference. Such statements must defeat direct inferences on any coherent Bayesian account. In particular, whenever such information is available to the Bayesian agent, it supplies credence values for chance outcomes that significantly depart from the fair betting odds represented by objective chance statements. One of the authors (Wallmann) finds these results so counter-intuitive that he advocates giving up Bayesian direct inference.Footnote 38 He favors some alternative account on which direct inferences remain intact when faced with such information. The other author thinks that whenever an agent is in possession of such information, those deviations from objective chance values required by the Bayesian account make good sense. We agree that the Bayesian account places severe constraints on the theory of chance. Whether the costs imposed by these constraints are paid for by the avowed Bayesian benefits remains unresolved, for now.