1 Introduction

Suppose that a rational agent receives evidence E. According to the Bayesian model of reasoning (hereafter: Bayesianism), the agent will update her degree of belief in H by conditioning on E. According to the inference to the best explanation (hereafter: IBE) model, if a hypothesis H provides an adequate explanation for E, while the rival hypotheses in consideration do not, then she has a reason to favor H.

Bayesianism and IBE have been the most widely adopted models of scientific reasoning. However, van Fraassen (1989) argued that they are mutually incompatible and that, since Bayesianism is correct, IBE must be wrong.

Of course, proponents of IBE reject this conclusion. One possible reply is simply to say that IBE is right and Bayesianism is wrong. This is not entirely implausible, especially because Bayesianism has been criticized on independent grounds: it suffers from the old evidence problem (Glymour, 1980) and asks the agent to assign a single real number to every proposition (Kyburg & Pittarelli, 1996; Joyce, 2010; Bradley, 2019), among other issues.

However, most advocates of IBE have adopted a different strategy: they have defended the compatibility of IBE and Bayesianism (e.g., Lipton, 2004; Weisberg, 2009; Henderson, 2013). Unfortunately, the simplest form of a hybrid model has a serious problem (van Fraassen, 1989, 131–82). According to this model, a rational agent is supposed to assign a higher credence to a hypothesis H than the agent would by conditioning on E if H provides an adequate explanation for E while its competitors fail to do so. The problem is that if the agent actually updates her credence in H in this way, she will become susceptible to a diachronic Dutch book. Therefore, to defend the compatibilist view, a more sophisticated hybrid model is needed.

This paper provides such a hybrid model, with two novel features. First, it combines IBE with imprecise Bayesianism rather than the standard form (Kyburg & Pittarelli, 1996; Joyce, 2010; Bradley, 2019).Footnote 1 Second, in this new model, the domain of the agent’s credence function can be extended (Halpern, 2005). As a result, we can deal with cases in which the agent updates her credence in H by considering a newly introduced sentence, X. In particular, X can be a sentence about the explanatory relationship between H and E.Footnote 2

The remainder of this paper proceeds as follows. Section 2 presents IBE and imprecise Bayesianism. Section 3 presents a new model of scientific reasoning, which incorporates both explanationist and probabilist elements, called “explanationist imprecise Bayesianism” (EIB). In Sect. 4, EIB is compared with Douven’s (2013) extra boost view, which has been criticized for its vulnerability to a diachronic Dutch book.Footnote 3 Interestingly, EIB does not have this problem. Moreover, when both EIB and the extra boost view can be used for updating, EIB maximizes the expected practical and epistemic utilities of future actions and credences. Section 5 compares EIB with constraint-based compatibilism. To adjudicate between the two, the paper discusses two purposes served by epistemological theories and argues that EIB serves these purposes better than constraint-based compatibilism. In Sect. 6, EIB is defended against Roche and Sober’s general objection to probabilistic forms of IBE. Section 7 concludes that EIB is the best of the three hybrid models discussed in this paper.

2 Preliminaries

As previously mentioned, the ultimate goal of this paper is to construct a new hybrid model of scientific reasoning from IBE and imprecise Bayesianism. To begin, IBE and imprecise Bayesianism must first be explained.

To understand IBE, one must first understand what an explanation is and what makes one explanation better than another. However, there are many competing accounts of explanation in the literature: the covering law account (Hempel, 1965), the causal account (Lewis, 1986), and the unification account (Kitcher, 1989), to name a few. Furthermore, although most philosophers agree that simplicity and coherence are explanatory virtues, they disagree about the complete list.Footnote 4 This paper does not seek to settle these debates. Instead, two minimal assumptions about explanation and explanatory virtues are introduced: a hypothesis may explain evidence, and sometimes one hypothesis explains the given evidence better than another. For example, suppose that we adopt the causal account of explanation and simplicity as an explanatory virtue. Then, if A and B each include information about the causal history of E and both explain evidence E, the simpler of the two is the better explanation of E. The same can be said about other combinations.Footnote 5

Given the above assumption, one may try to formulate IBE as follows: let \(H_{1},\ldots ,H_{n}\) be the competing hypotheses; then, one can infer \(H_{i\in \left\{ 1,\ldots ,n\right\} }\) from evidence E if \(H_{i}\) is the best explanation of E among \(H_{1},\ldots ,H_{n}\). However, there is a catch. If none of \(H_{1},\ldots ,H_{n}\) provides a sufficiently good explanation of E, then even the best \(H_{i}\) might be unlikely to be true (van Fraassen, 1989, 142–3). Another potential problem is that the above formulation only applies to the case where E has already been received, not to, for example, the case where a scientist ponders which hypothesis would explain her experiment’s possible future result E. To fix these flaws, let us say that a hypothesis \(H_{i}\) adequately explains E iff, if \(H_i\) and E were true, then \(H_{i}\) would be the best and sufficiently good explanation of E among \(H_{1},\ldots ,H_{n}\). Given this definition, we can better formulate IBE: one can infer \(H_{i\in \left\{ 1,\ldots ,n\right\} }\) from evidence E iff \(H_{i}\) provides an adequate explanation of E among \(H_{1},\ldots ,H_{n}\) (Lipton, 2004, 148–50).Footnote 6

To formulate imprecise Bayesianism, probability is defined as follows. Consider a language \({\mathcal {L}}\). Its syntax is not specified in detail, but it is assumed that \({\mathcal {L}}\) has truth functional operators and \({\mathcal {L}}\)-sentences include explanatory sentences, interpreted as “\(\alpha\) adequately explains \(\beta\),” where \(\alpha\) and \(\beta\) denote a hypothesis and a piece of evidence, respectively. For any set \({\mathfrak {F}}\) of \({\mathcal {L}}\)-sentences, we will say that \({\mathfrak {F}}\) is an algebra (defined over \({\mathcal {L}}\)-sentences) iff it is closed under disjunction and negation. Next, p is a probability function (defined over \({\mathfrak {F}}\)) iff \(p:{\mathfrak {F}}\rightarrow \left[ 0,1\right]\) and p satisfies non-negativity, normality, and additivity (formally: \(p(\cdot )\ge 0\); for any \({\mathcal {L}}\)-sentence X, if \(\models X\), then \(p(X)=1\); and for any \({\mathcal {L}}\)-sentences Y and Z, if \(\models \lnot (Y \& Z)\), then \(p(Y\vee Z)=p(Y)+p(Z)\)).Footnote 7 Then, the precise version of Bayesianism (hereafter: precise Bayesianism) states that a rational agent a’s doxastic state at \(t_{_{k}}\) is modeled by a single probability function \(c_{k}\left( \cdot \right)\), and a updates her doxastic states by conditioning on total evidence E (formally, \(c_{k+1}\left( \cdot \right) =c_{k}\left( \cdot |E\right)\)).
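To fix ideas, here is a minimal computational sketch of precise conditioning over a toy four-atom algebra; the numbers are purely illustrative (they echo the priors used in Example 2 below):

```python
from fractions import Fraction

# A toy precise credence function over the four state descriptions of two
# sentences, H and E (the atoms of a small algebra); the values are illustrative.
c1 = {('H', 'E'): Fraction(2, 5), ('H', '~E'): Fraction(1, 5),
      ('~H', 'E'): Fraction(1, 5), ('~H', '~E'): Fraction(1, 5)}

def prob(c, pred):
    """The probability c assigns to the disjunction of atoms satisfying pred."""
    return sum(v for atom, v in c.items() if pred(atom))

def conditioned(c, pred_E):
    """Precise conditioning: c_{k+1}(.) = c_k(. | E)."""
    mass = prob(c, pred_E)
    return {atom: (v / mass if pred_E(atom) else Fraction(0)) for atom, v in c.items()}

is_E = lambda atom: atom[1] == 'E'
is_H = lambda atom: atom[0] == 'H'

c2 = conditioned(c1, is_E)
print(prob(c1, is_H), prob(c2, is_H))   # 3/5 before, 2/3 after conditioning on E
```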

Precise Bayesianism is a very powerful tool for solving many philosophical problems. However, as already mentioned, it has problematic features. Most notably, it implies that a rational agent assigns a single real number to every sentence. To see why this is problematic, consider the following example (Halpern, 2005, 25).

Example 1 Alex draws a random ball from an urn. He knows that 30 percent of the balls in the urn are blue and the remainder are either red or yellow. Thus, Alex knows that the randomly chosen ball is (B) blue, (R) red, or (Y) yellow. However, he does not have any hint of the ratio of red balls to yellow ones. In this situation, it is natural to think that Alex should assign .3 to B, but it is unclear which credence he ought to assign to R or to Y. Since the situation is symmetrical, should he assign .35 to each of R and Y? However, Alex does not appear to have statistical information that justifies the assignment of .35 to R or to Y.

Instead, we may regard Alex as justified in having an indefinite opinion about the probabilities of R and Y. To capture this idea, a more complex model of a probabilistic opinion has been used. Define a credal state \({\mathfrak {C}}\) as a collection of probability functions p defined over an algebra \({\mathfrak {F}}\). Given this definition, the whole credal state \({\mathfrak {C}}_{k}\), not any particular member of it, shall be used to model a rational agent’s indefinite doxastic state at \(t_{k}\). In addition, for any \(X,Y\in {\mathfrak {F}}\), we define \({\mathfrak {C}}_{k}\left( X\right)\) as \(\left\{ p\left( X\right) | p\in {\mathfrak {C}}_{k}\right\}\) and \({\mathfrak {C}}_{k}\left( X|Y\right)\) as \(\left\{ p\left( X|Y\right) | p\in {\mathfrak {C}}_{k}\right\}\). When these values are not singletons, we shall call them “imprecise (conditional) credences.” For any \(X,Y\in {\mathfrak {F}}\), we define \(\overline{{\mathfrak {C}}_{k}}\left( X\right)\) as \(\sup _{p\in {\mathfrak {C}}_{k}}p\left( X\right)\) and \(\underline{{\mathfrak {C}}_{k}}\left( X\right)\) as \(\inf _{p\in {\mathfrak {C}}_{k}}p\left( X\right)\). We define \(\overline{{\mathfrak {C}}_{k}}\left( \cdot |\cdot \right)\) and \(\underline{{\mathfrak {C}}_{k}}\left( \cdot |\cdot \right)\) in the same way. Finally, we shall call the common domain \({\mathfrak {F}}_{k}\) of all \(p\in {\mathfrak {C}}_{k}\) “the domain of \({\mathfrak {C}}_{k}\).”
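As a minimal illustration of the notation just introduced, Example 1 can be rendered as a (discretized) credal state; the grid step is an arbitrary choice made for this sketch:

```python
# Example 1 as a discretized credal state: every assignment over {B, R, Y}
# that gives B exactly 0.3 and splits the remaining 0.7 between R and Y.
C1 = [{'B': 0.3, 'R': i / 100, 'Y': 0.7 - i / 100} for i in range(71)]

lower_R = min(p['R'] for p in C1)   # the lower credence in R
upper_R = max(p['R'] for p in C1)   # the upper credence in R
print(lower_R, upper_R)             # Alex's imprecise credence in R spans [0, 0.7]
```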

In most models of imprecise credal states, the domain of the probability functions is assumed to be fixed. In this paper, however, this assumption is loosened: the domain of an earlier credal state should be a subset of but not necessarily identical to that of a later credal state. (Formally, \({\mathfrak {F}}_{0}\subseteq {\mathfrak {F}}_{1} \subseteq {\mathfrak {F}}_{2}\subseteq \ldots \subseteq {\mathfrak {F}}_{k} \subseteq {\mathfrak {F}}_{k+1}\subseteq \ldots\).) This kind of formal framework is especially useful when modeling an agent who comes to form an opinion regarding a sentence X that she had absolutely no opinion about beforehand.

Another important issue is how to update imprecise credal states. A popular suggestion is to use a slightly modified form of conditioning (Halpern, 2005, 81):

$$\begin{aligned} {\mathfrak {C}}_{k+1}=\left\{ p\left( \cdot |E\right) |p\in {\mathfrak {C}}_{k},p\left( E\right) >0\right\} , \end{aligned}$$

where E is the given agent’s evidence at \(t_{k+1}\). This paper also adopts it as the rule for updating imprecise credences but not exactly in the above form because this form of conditioning assumes that \({\mathfrak {C}}_{k}\) and \({\mathfrak {C}}_{k+1}\) share the same domain. Later, we will discuss how to fix this flaw.
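Continuing the urn sketch from above, the rule can be applied to a hypothetical piece of evidence, invented purely for illustration, namely that the drawn ball is not yellow:

```python
# The discretized credal state from Example 1.
C1 = [{'B': 0.3, 'R': i / 100, 'Y': 0.7 - i / 100} for i in range(71)]

def condition(p, evidence_cells):
    """Condition a single probability function on the disjunction of the cells."""
    mass = sum(p[c] for c in evidence_cells)
    return {c: (p[c] / mass if c in evidence_cells else 0.0) for c in p}

# Update on "the ball is not yellow" (B or R), dropping members that assign it 0.
C2 = [condition(p, {'B', 'R'}) for p in C1 if sum(p[c] for c in {'B', 'R'}) > 0]
print(min(p['B'] for p in C2), max(p['B'] for p in C2))   # roughly 0.3 and 1.0
```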

3 Explanationist Imprecise Bayesianism

According to Lipton, IBE and Bayesianism serve different purposes (2004, 105). The former is meant to be a descriptive theory. It is designed to mimic the common patterns of a real agent’s inferences. By contrast, the latter is intended to be a normative theory. As such, it does not state how an ordinary agent actually updates her credal state. Rather, it states how an agent ought to respond to her newly learned data. For example, a normal agent may fail to update her credal state by conditioning, but even in such a case, it still holds true that she ought to have updated it by conditioning.Footnote 8

If so, what is the point of trying to combine the two models? After all, we cannot expect that real agents’ doxastic states will comply with the axioms of probability theory perfectly or that they will always change their degrees of belief by conditioning on evidence. Conversely, even if IBE provides an excellent model for real agents’ inference patterns, we may still wonder how they should have updated their doxastic states. Perhaps it is better to keep the two theories separate.

However, it is not always easy to draw a clear line between normative and descriptive theories of reasoning. Consider the following case. An agent learns evidence E, and based on this, she tries to judge which of the hypotheses A and B is more probable. In making this judgment, she wants to take the following factors into consideration: (\(X_{A}\)) A is an adequate explanation of E, but it is not the case that (\(X_{B})\) B is an adequate explanation of E. One may wonder why such factors are relevant to A versus B, but it is difficult to deny that, in reality, many ordinary agents would regard \(X_{A} \& \lnot X_{B}\) as positively relevant to A. Suppose that the given agent does, too. Given this fact, what norm ought she to observe in judging which hypothesis is more probable? Clearly, she has learned not only E but also \(X_{A} \& \lnot X_{B}\). Hence, the agent should update her credences in A and B by conditioning on \(E \& X_{A} \& \lnot X_{B}\). Of course, she will have to judge that A is more probable than B if her posterior credence in A turns out to be higher than her posterior credence in B. In this way, a hybrid model with both probabilistic and explanationist elements can help us understand epistemic norms under specific factual conditions.

Still, this approach has a potential problem. Typically, an ordinary agent thinks about how well hypotheses A and B explain E only after she actually acquires E as evidence. In such a case, the domain of the agent’s prior credence function will not include the relevant explanatory sentences \(X_{A}\) and \(X_{B}\), and her prior credence in A conditional on \(E \& X_{A} \& \lnot X_{B}\) will be undefined; the same is the case for the agent’s prior conditional credence in B.

Fortunately, Halpern suggested a plausible modification of Bayesianism for the cases of domain extension (2005, 28). Let \({\mathfrak {C}}_{1}\) be an agent’s credal state at \(t_{1}\) with \({\mathfrak {F}}_{1}\) as the domain. Suppose that at \(t_{1},\) the agent already considered hypotheses A and B and thought that she might receive E as evidence in the future. However, she neither thought about whether \(X_{A}\) holds nor did she think about whether \(X_{B}\) holds. Thus, \(A,B,E\in {\mathfrak {F}}_{1}\), but \(X_{A},X_{B}\notin {\mathfrak {F}}_{1}\). This means that her prior credence in A conditional on \(E \& X_{A} \& \lnot X_{B}\) was not defined. Let \({\mathfrak {F}}_{2}\) be the smallest algebraic superset of \({\mathfrak {F}}_{1}\cup \left\{ X_A,X_B\right\} .\)

At \(t_{2}>t_{1}\), the agent receives evidence E and realizes that \(X_{A} \& \lnot X_{B}\). In this case, she can update her credence in A using the following procedure. First, the agent extends her prior credal state \({\mathfrak {C}}_{1}\) into a credal state \(U\left( {\mathfrak {C}}_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\) with the domain \({\mathfrak {F}}_{2}\), where \(U\left( \cdot ,\cdot ,\cdot \right)\) is tentatively defined as follows:

\(U\left( {\mathfrak {C}}_{1},\mathfrak {F}_{1},{\mathfrak {F}}_{2}\right) =_{df.}\{p\subset {\mathfrak {F}}_{2}\times \left[ 0,1\right] |\) p is a probability function and \(\exists q\in {\mathfrak {C}}_{1}\forall X\in {\mathfrak {F}}_{1}p\left( X\right) =q\left( X\right)\) \(\}.\)Footnote 9

In other words, \(U\left( {\mathfrak {C}}_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\) is the set of extensions of any \(q\in {\mathfrak {C}}_{1}\) to the enlarged domain \({\mathfrak {F}}_{2}\). In such a case, we shall call \(U\left( {\mathfrak {C}}_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\) “the extended prior credal state for \({\mathfrak {F}}_{2}\)” and its members “the extended priors for \({\mathfrak {F}}_{2}\).” For brevity, we will abbreviate “\(U\left( \left\{ c_{1}\right\} ,{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\)” into “\(U\left( c_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\),” where \(c_{1}\left( \cdot \right)\) is a precise credence function. Second, the agent acquires a new credal state by conditioning \(U\left( {\mathfrak {C}}_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\) on \(E \& X_{A} \& \lnot X_{B}\).

However, the above model will lead to an unacceptable consequence unless the extended prior credal state is appropriately restricted. By definition, it includes every coherent probability function assigning the same values as those assigned by some member of the prior credal state. Thus, it will include a probability function p satisfying the following condition:

$$\begin{aligned} p\left( A|E \& X_{A} \& \lnot X_{B}\right) \le p\left( A|E \& \lnot X_{A} \& X_{B}\right) . \end{aligned}$$

This is an abnormal result. On the one hand, \(E \& X_{A} \& \lnot X_{B}\) states that A adequately explains E but B fails to do so; on the other hand, \(E \& \lnot X_{A} \& X_{B}\) states that A fails to explain E adequately but B offers an adequate explanation for it. In many cases, human agents will regard A as more probable under the former condition.Footnote 10 To rule out such a result, this constraint will be imposed on any \(p\in U\left( {\mathfrak {C}}_{1},\mathfrak {{\mathfrak {F}}}_{1},{\mathfrak {F}}_{2}\right)\):

$$\begin{aligned} p\left( A|E \& X_{A} \& \lnot X_{B}\right) >p\left( A|E \& \lnot X_{A} \& X_{B}\right) . \end{aligned}$$
(1)

For similar reasons, the following constraints will be imposed on \(U\left( {\mathfrak {C}}_{1},\mathfrak {{\mathfrak {F}}}_{1},{\mathfrak {F}}_{2}\right)\). For any \(p\in U\left( {\mathfrak {C}}_{1},\mathfrak {{\mathfrak {F}}}_{1},{\mathfrak {F}}_{2}\right)\),

$$\begin{aligned} p\left( A|E \& X_{A} \& \lnot X_{B}\right) >p\left( A|E \& \lnot X_{A} \& \lnot X_{B}\right) \end{aligned}$$
(2)

and

$$\begin{aligned} p\left( A|E \& X_{A} \& \lnot X_{B}\right) >p\left( A|E \& X_{A} \& X_{B}\right) . \end{aligned}$$
(3)

Concerning each pair of conditions, a normal agent will consider A to be more probable under the first condition than under the second.Footnote 11 Hence, the earlier tentative definition of \(U\left( {\mathfrak {C}}_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\) should be revised as follows:

\(U\left( {\mathfrak {C}}_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right) =_{df.}\{p\subset {\mathfrak {F}}_{2}\times \left[ 0,1\right] |\) p is a probability function, \(\exists q\in {\mathfrak {C}}_{1}\forall X\in {\mathfrak {F}}_{1}p\left( X\right) =q\left( X\right)\), and p satisfies (1)–(3)\(\}.\)

From this official definition, it follows that for any \(p\in U\left( {\mathfrak {C}}_{1},\mathfrak {{\mathfrak {F}}}_{1},{\mathfrak {F}}_{2}\right)\),Footnote 12

$$\begin{aligned} p\left( A|E \& X_{A} \& \lnot X_{B}\right) >p\left( A|E\right) . \end{aligned}$$
(4)

Thus, for any \(x\in {\mathfrak {C}}_{2}\left( A\right)\), \(x>\underline{{\mathfrak {C}}_{1}}\left( A|E\right)\), where \({\mathfrak {C}}_{2}\) is the result of conditioning \(U\left( {\mathfrak {C}}_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\) on \(E \& X_{A} \& \lnot X_{B}\). This indicates that learning E can strengthen a rational agent’s belief in A to a greater degree in the present updating model than in the standard one. Still, this updating method is plausible because the additional strength seems to come from the agent’s realization that \(X_{A} \& \lnot X_{B}\).Footnote 13
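To see why (4) follows from (1)–(3), here is a sketch of the reasoning under the added assumption (mine, for simplicity) that p assigns positive probability to each conjunction \(E \& \pm X_{A} \& \pm X_{B}\): by the law of total probability,

$$\begin{aligned} p\left( A|E\right) =\sum p\left( A|E \& \pm X_{A} \& \pm X_{B}\right) p\left( \pm X_{A} \& \pm X_{B}|E\right) <p\left( A|E \& X_{A} \& \lnot X_{B}\right) , \end{aligned}$$

where the sum ranges over the four combinations of \(\pm X_{A}\) and \(\pm X_{B}\). The strict inequality holds because the weights \(p\left( \pm X_{A} \& \pm X_{B}|E\right)\) sum to one and, by (1)–(3), the value for the \(X_{A} \& \lnot X_{B}\) cell strictly exceeds the values for the other three cells, each of which carries positive weight under the positivity assumption; cases in which some cell has probability zero require separate handling.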

Exactly what will the agent’s posterior credal state be like? Consider the following examples of precise prior credal states.

Example 2 Becky is a physicist in the early 20th century. Her friend, Albert Einstein, tells her about his new theory of special relativity (hereafter: S), but she is not sure whether S is a better theory than classical mechanics (hereafter: C). Then, she learns about the result of the Michelson–Morley experiment (hereafter: M). At the same time, she realizes that (\(X_{S}\)) S offers an adequate explanation for M, but (\(\lnot X_{C}\)) C does not. Let \(t_{1}\) be a moment when Becky thinks about which is more probable between S and C but has neither learned M nor thought about its relationship with those theories yet, and \(t_{2}\) be the moment when she learns M and realizes that \(X_{S} \& \lnot X_{C}\). Let \(c_{1}\left( \cdot \right)\) be her precise credence function at \(t_{1}\) and \({\mathfrak {C}}_{2}\) be her credal state at \(t_{2}\). Thus, \(c_{1}:{\mathfrak {F}}_{1}\rightarrow \left[ 0,1\right]\), where \({\mathfrak {F}}_{1}\) is the smallest algebra that includes M, S, and C, and for any \(p\in {\mathfrak {C}}_{2}\), \(p:{\mathfrak {F}}_{2}\rightarrow \left[ 0,1\right]\), where \({\mathfrak {F}}_{2}\) is the smallest algebraic superset of \({\mathfrak {F}}_{1}\cup \left\{ X_S,X_C\right\} .\) For an easier discussion, suppose that \(c_1\left( S\right) +c_1\left( C\right) =1\) and \(c_1\left( S \& C\right) =0\). Thus, any \(c_{1}\left( \cdot \right)\) can be characterized by assigning a probability to each of the conjunctions \(M \& S\), \(M \& \lnot S\), \(\lnot M \& S\), and \(\lnot M \& \lnot S\). For specificity, assume that

$$\begin{aligned} c_{1}\left( M \& S\right) =\frac{2}{5},\text { }c_{1}\left( M \& \lnot S\right) =c_{1}\left( \lnot M \& S\right) =c_{1}\left( \lnot M \& \lnot S\right) =\frac{1}{5}. \end{aligned}$$
(5)

Consequently, \(c_{1}\left( S|M\right) =\frac{2}{3}>\frac{3}{5}=c_{1}\left( S\right) .\) By definition, for any \(p\in U\left( c_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\),

$$\begin{aligned}&p\left( S|M \& X_{S} \& \lnot X_{C}\right) >p\left( S|M \& \lnot X_{S} \& X_{C}\right) , \end{aligned}$$
(6)
$$\begin{aligned}&p\left( S|M \& X_{S} \& \lnot X_{C}\right) >p\left( S|M \& \lnot X_{S} \& \lnot X_{C}\right) , \end{aligned}$$
(7)

and

$$\begin{aligned} p\left( S|M \& X_{S} \& \lnot X_{C}\right) >p\left( S|M \& X_{S} \& X_{C}\right) . \end{aligned}$$
(8)

From (6)–(8), it follows that for any \(p\in U\left( c_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\),Footnote 14

$$\begin{aligned} p\left( S|M \& X_{S} \& \lnot X_{C}\right) >p\left( S|M\right) =c_{1}\left( S|M\right) =\frac{2}{3}. \end{aligned}$$
(9)

According to our model,Footnote 15

$$\begin{aligned} {\mathfrak {C}}_{2}\left( S\right) =\left( U\left( c_{1}, {\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right) \right) \left( S|M \& X_{S} \& \lnot X_{C}\right) =\left( \frac{2}{3},1\right] . \end{aligned}$$
(10)

Clearly, Becky’s belief in S has been strengthened. This is a highly intuitive result because M is not only positively relevant to S but is also adequately explained by S. One may think that the strengthening of her belief in S is just a result of M’s positive relevance to S, but observe that, for all \(r\in {\mathfrak {C}}_2\left( S\right)\), \(r>c_1\left( S|M\right)\). In this sense, Becky’s belief in S has become stronger than it would have by just conditioning on M.
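A minimal numerical sketch of Example 2 may help make this concrete: it samples extensions of \(c_{1}\) to the sixteen-atom algebra generated by M, S, \(X_{S}\), and \(X_{C}\) (treating C as \(\lnot S\), as the example allows), keeps those satisfying (6)–(8), and checks that the resulting conditional credence in S always exceeds \(\frac{2}{3}\). The sampling scheme and the sample size are arbitrary choices made for this illustration:

```python
import itertools, random

# Atoms: truth-value assignments to (M, S, X_S, X_C); sixteen cells in all.
ATOMS = list(itertools.product([True, False], repeat=4))
M, S, XS, XC = (lambda a: a[0]), (lambda a: a[1]), (lambda a: a[2]), (lambda a: a[3])

def random_extension():
    """Sample one extension of Becky's precise prior c_1 from (5): a joint
    distribution over the sixteen atoms whose (M, S)-marginal matches c_1."""
    c1_marginal = {(True, True): 2/5, (True, False): 1/5,
                   (False, True): 1/5, (False, False): 1/5}
    p = {}
    for ms, mass in c1_marginal.items():
        cells = [a for a in ATOMS if (a[0], a[1]) == ms]
        weights = [random.random() for _ in cells]
        for cell, w in zip(cells, weights):
            p[cell] = mass * w / sum(weights)
    return p

def cond(p, target, given):
    """p(target | given), where target and given are predicates on atoms."""
    den = sum(v for a, v in p.items() if given(a))
    num = sum(v for a, v in p.items() if target(a) and given(a))
    return num / den

def satisfies_6_to_8(p):
    best = cond(p, S, lambda a: M(a) and XS(a) and not XC(a))
    rivals = [cond(p, S, lambda a: M(a) and not XS(a) and XC(a)),
              cond(p, S, lambda a: M(a) and not XS(a) and not XC(a)),
              cond(p, S, lambda a: M(a) and XS(a) and XC(a))]
    return all(best > r for r in rivals)

posteriors = []
while len(posteriors) < 2000:
    p = random_extension()
    if satisfies_6_to_8(p):
        posteriors.append(cond(p, S, lambda a: M(a) and XS(a) and not XC(a)))

print(min(posteriors), max(posteriors))   # every sampled value lies in (2/3, 1]
```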

In the above discussion, we assumed that the given agent receives a piece of evidence and figures out its explanatory relation to the hypotheses at the same moment. However, this will not always be the case. It is very common for a scientist to prepare an experiment with a possible result E and, before carrying it out, to realize that, if E indeed results from the experiment, A will be an adequate explanation of E but B will not (\(=X_{A} \& \lnot X_{B}\)). Call this moment “\(t_1\).” Later, the scientist conducts the experiment and acquires E as the actual result. Call this moment “\(t_2\).” In this scenario, no domain extension occurs on learning E at \(t_2\) because she has already thought about \(X_{A}\) and \(X_{B}\). Thus, one may think that EIB does not apply here.Footnote 16 However, the problem of undefined conditional credence still arises because the agent’s domain must have been extended at an earlier moment, \(t_1\). For, the scientist ought to update her credences in two steps: by conditioning on \(X_{A} \& \lnot X_{B}\) at \(t_1\) and by conditioning on E at \(t_2\). Of course, the scientist cannot take the first step if, as is most likely, she has never thought about \(X_{A}\) and \(X_{B}\) before preparing for the experiment. Fortunately, our model enables the scientist to take both steps: first, she updates by conditioning the extended prior credal state on \(X_{A} \& \lnot X_{B}\) at \(t_1\) and then reupdates by conditioning on E at \(t_2\).

To see how this procedure works, consider the following example.

Example 3 Charles is a young physicist in the early 20th century. Indeed, he is a graduate student of Michelson and Morley and is helping them design a new experiment. Let S, C, M, \(X_{S}\), and \(X_{C}\) be the same as in the previous example. At \(t_{1}\), Charles assigns \(\frac{2}{5}\) to \(M \& S\) and the same precise credence, \(\frac{1}{5}\), to \(M \& \lnot S\), \(\lnot M \& S\), and \(\lnot M \& \lnot S.\) At \(t_{2}>t_{1}\), while Charles is designing the experiment, he realizes that S will provide an adequate explanation for M but C will not (\(=X_{S} \& \lnot X_{C}\)). After this, he actually carries out the experiment. At \(t_{3}>t_{2}\), he acquires M as the result. Let \(c_{1}\) be Charles’s credence function at \(t_{1}\) and let \({\mathfrak {C}}_{2}\) and \({\mathfrak {C}}_{3}\) be his credal states at \(t_{2}\) and \(t_{3}\). Let \({\mathfrak {F}}_{1}\) be the domain of \(c_{1}\) and \({\mathfrak {F}}_{2}\) be that of \({\mathfrak {C}}_{2}.\) Clearly, \(X_{S},X_{C}\notin {\mathfrak {F}}_{1}\) but \(X_{S},X_{C}\in {\mathfrak {F}}_{2}\) because Charles did not think about \(X_{S}\) and \(X_{C}\) at \(t_{1}\) but he started thinking about them at \(t_{2}.\)Footnote 17 In this situation, at \(t_{2}\), he updated to a new credal state \({\mathfrak {C}}_{2}\) by conditioning \(U\left( c_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right)\) on \(X_{S} \& \lnot X_{C}\), and at \(t_{3}\), he reupdated by conditioning \(\mathfrak {C}_{2}\) on M. Therefore,

$$\begin{aligned} {\mathfrak {C}}_{3}\left( S\right) ={\mathfrak {C}}_{2}\left( S|M\right) =\left( U\left( c_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2} \right) \right) \left( S|M \& X_{S} \& \lnot X_{C}\right) , \end{aligned}$$
(11)

which must be \(\left( \frac{2}{3},1\right]\) by the same reasoning as in Example 2. Since \(c_{1}\left( S\right) =c_{1}\left( M \& S\right) +c_{1}\left( \lnot M \& S\right) =\frac{3}{5}\), Charles’s belief in S has been strengthened.
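The two-step procedure agrees with the one-step conditioning in (11) because, for any extended prior p with \(p\left( M \& X_{S} \& \lnot X_{C}\right) >0\),

$$\begin{aligned} \frac{p\left( S \& M|X_{S} \& \lnot X_{C}\right) }{p\left( M|X_{S} \& \lnot X_{C}\right) }=\frac{p\left( S \& M \& X_{S} \& \lnot X_{C}\right) }{p\left( M \& X_{S} \& \lnot X_{C}\right) }=p\left( S|M \& X_{S} \& \lnot X_{C}\right) ; \end{aligned}$$

that is, conditioning first on \(X_{S} \& \lnot X_{C}\) and then on M yields the same value as conditioning once on the whole conjunction.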

Generally,

if an agent (i) has never thought about which hypothesis would explain E until \(t_{1}\), (ii) realizes at \(t_{2}\) that A provides an adequate explanation for E but B does not, and (iii) receives E as evidence at \(t_{3}\ge t_{2}\), then

$$\begin{aligned} {\mathfrak {C}}_{3}\left( A\right) =\left( c_{1}\left( A|E\right) ,1\right] , \end{aligned}$$

where \(\left\{ c_{1}\left( \cdot \right) \right\}\) and \({\mathfrak {C}}_{3}\) are her credal states at \(t_{1}\) and \(t_{3}\).Footnote 18

From now on, we will call this updating procedure “explanationist imprecise Bayesian model” (EIB). It follows that

if (i)–(iii) hold and (iv) the agent did not regard E as negatively relevant to A at \(t_{1}\), then

$$\begin{aligned} {\mathfrak {C}}_{3}\left( A\right) =\left( c_{1}\left( A\right) +d,1\right] \end{aligned}$$

for some \(d\ge 0\) (where \(d=c_{1}\left( A|E\right) -c_{1}\left( A\right)\)).

It is not always easy to compare a single real number with an interval, for example, when the real number lies strictly between the interval’s endpoints. However, in the above case it is clear that the given agent’s credence in A has increased.

What if the agent’s prior credal state was imprecise?Footnote 19 Even in such a case, the following general fact will still hold:

if (i)–(iii) hold and (iv’) E was not negatively relevant to A according to any \(p\in {\mathfrak {C}}_{1}\), then

$$\begin{aligned} {\mathfrak {C}}_{3}\left( A\right) =\left( \underline{{\mathfrak {C}}_{1}} \left( A|E\right) ,1\right] =\left( \underline{{\mathfrak {C}}_{1}}\left( A\right) +d,1\right] \end{aligned}$$

for some \(d\ge 0\), where \({\mathfrak {C}}_{1}\) and \({\mathfrak {C}}_{3}\) are the credal states at \(t_{1}\) and \(t_{3}\).Footnote 20

Suppose that (i)–(iii) and (iv’) hold and the agent has updated her credal state in accordance with EIB. Then, either \(\overline{{\mathfrak {C}}_{1}}\left( A\right) <1\), or \(d>0\) and \(\overline{{\mathfrak {C}}_{1}}\left( A\right) =1\), or \(d=0\) and \(\overline{{\mathfrak {C}}_{1}}\left( A\right) =1\). In the first case, \(\underline{{\mathfrak {C}}_{1}}\left( A\right) \le \underline{{\mathfrak {C}}_{3}}\left( A\right)\) and \(\overline{{\mathfrak {C}}_{1}}\left( A\right) <\overline{{\mathfrak {C}}_{3}}\left( A\right)\) and in the second, \(\underline{{\mathfrak {C}}_{1}}\left( A\right) <\underline{{\mathfrak {C}}_{3}}\left( A\right)\) and \(\overline{{\mathfrak {C}}_{1}}\left( A\right) \le \overline{{\mathfrak {C}}_{3}}\left( A\right)\). In these two cases, it is still intuitive that the agent’s opinion has become more favorable to A as a result of updating. In the third, we cannot say the same because \(\underline{{\mathfrak {C}}_{1}}\left( A\right) =\underline{{\mathfrak {C}}_{3}}\left( A\right)\) and \(\overline{{\mathfrak {C}}_{1}}\left( A\right) =\overline{{\mathfrak {C}}_{3}}\left( A\right)\). However, this appears to be normal because, from the beginning, \(\underline{{\mathfrak {C}}_{1}}\left( A\right) =\underline{{\mathfrak {C}}_{1}}\left( A|E\right)\) and \(\overline{{\mathfrak {C}}_{1}}\left( A\right) =1\). Therefore, even if EIB is applied to the case of imprecise priors, it will result in the confirmation of a hypothesis offering an adequate explanation with a reasonable exception.

4 Extra Boost View

Surely, EIB is not the first attempt to combine Bayesianism with IBE. Indeed, van Fraassen (1989) discussed another such model (although he did not endorse it). The core idea was simple: whenever a rational agent receives E as evidence, she will first figure out how she would update her credence in a hypothesis H by standard conditioning, but she will adjust that quantity by taking how well H explains E into consideration. The resulting value will be her actual posterior credence in H. Douven (2013) formalizes this idea as follows. Let \(\left\{ H_{i}\right\} _{i\in I}\) be a partition whose members are competing hypotheses. Let \(c_{1}\left( \cdot \right)\) be an agent’s credence function at \(t_{1}.\) Later, at \(t_{2}>t_{1},\) the agent receives evidence E. According to the traditional form of Bayesianism, she ought to assign \(\frac{c_{1}\left( H_{j}\right) c_{1} \left( E|H_{j}\right) }{\sum _{k\in I}c_{1}\left( H_{k}\right) c_{1}\left( E|H_{k}\right) }\) to \(H_{j}\) for each \(j\in I.\) However, the agent realizes at \(t_{2}\) that a particular hypothesis, \(H_{i},\) provides an adequate explanation for E. Seemingly, it is reasonable to adjust the quantities in the fraction so that

$$\begin{aligned} c_{2}\left( H_{i}\right) =\frac{c_{1}\left( H_{i}\right) c_{1} \left( E|H_{i}\right) +f\left( H_{i},E\right) }{\sum _{j\in I}c_{1} \left( H_{j}\right) c_{1}\left( E|H_{j}\right) +f\left( H_{j},E\right) }, \end{aligned}$$
(12)

where \(f\left( \cdot ,\cdot \right)\) is a function that returns the proper amount of extra boost for explaining the given piece of evidence. Let us call this the “extra boost view.”

Why did van Fraassen reject this view? It is because, if an agent updates her credence in \(H_{i}\) in this way, it will make her vulnerable to a diachronic Dutch book (van Fraassen 1989). Suppose that an agent always updates her credences in accordance with (12) and that \(c_{1}\left( \cdot \right)\) is her credence function at \(t_{1}.\) Knowing these, a cunning bookie can predict at \(t_{1}\) that, if the agent receives E at \(t_{2},\) then \(c_{2}\left( H_{j}\right) \ne\) \(c_{1}\left( H_{j}|E\right)\) for some \(j\in I.\) Since \(\left\{ H_{i}\right\} _{i\in I}\) is a partition, there is \(i\in I\) such that \(c_{2}\left( H_{i}\right) >c_{1}\left( H_{i}|E\right) .\) At \(t_{1},\) the bookie buys

(Bet1) [$1 if \(H_{i} \& E\); $0 if \(\lnot H_{i} \& E\); $\(c_{1}\left( H_{i}|E\right)\) if \(\lnot E\)]

at the price of $\(c_{1}\left( H_{i}|E\right)\) and sells the agent

(Bet2) [$ \(\left( c_{2}\left( H_{i}\right) -c_{1}\left( H_{i}|E\right) \right)\) if E; $0 if \(\lnot E\)]

for $ \(c_{1}\left( E\right) \left( c_{2}\left( H_{i}\right) -c_{1}\left( H_{i}|E\right) \right) .\) At \(t_{2},\) if E turns out to be true, then he sells the agent

(Bet3) [$1 if \(H_{i}\); $0 if \(\lnot H_{i}\)]

for $\(c_{2}\left( H_{i}\right) .\) Regardless of the result, the agent will suffer a net loss of $\(c_{1}\left( E\right) (c_{2}\left( H_{i}\right) -c_{1}\left( H_{i}|E\right) )>0.\)
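The bookkeeping can be checked mechanically. The following sketch computes the agent’s net payoff from the three bets in each of the three relevant scenarios; the particular values chosen for \(c_{1}\left( H_{i}|E\right)\), \(c_{1}\left( E\right)\), and \(c_{2}\left( H_{i}\right)\) are illustrative:

```python
def agent_net_payoff(world, q, e, r):
    """The agent's total net payoff from the three bets in a given world.
    world is one of 'E&H', 'E&~H', '~E'; q = c1(H_i|E), e = c1(E),
    and r = c2(H_i), with r > q (the extra-boosted posterior)."""
    E = world != '~E'
    H = world == 'E&H'
    # Bet1: the agent sells [1 if H&E; 0 if ~H&E; q if ~E] to the bookie for q.
    bet1 = q - ((1 if H else 0) if E else q)
    # Bet2: the agent buys [(r - q) if E; 0 if ~E] from the bookie for e*(r - q).
    bet2 = ((r - q) if E else 0) - e * (r - q)
    # Bet3: if E, the agent buys [1 if H; 0 if ~H] from the bookie for r.
    bet3 = ((1 if H else 0) - r) if E else 0
    return bet1 + bet2 + bet3

q, e, r = 0.6, 0.5, 0.7          # illustrative numbers with r > q
for world in ('E&H', 'E&~H', '~E'):
    print(world, round(agent_net_payoff(world, q, e, r), 10))
# Each line prints -e*(r - q) = -0.05: a sure loss, whatever happens.
```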

Is there a similar Dutch book against EIB? No. There is a proof that, if no agent in a (possibly imprecise) credal state falls prey to synchronic Dutch books, then any such agent will be immune to diachronic Dutch books against EIB.Footnote 21 Also, Bradley (2012, pp. 8–11) argues, convincingly in my opinion, that a rational agent in an imprecise credal state is vulnerable to synchronic Dutch books only if her preference among bets satisfies several formal conditions, and that one of these conditions, called “complementarity,” will not be satisfied by any such agent. Since Lehman (1955) proved that there is no synchronic Dutch book against a precise credal state, an agent will be immune to synchronic Dutch books as long as her credal state consists of (a) probability function(s).

However, it is unlikely that some clever bookies are really trying to earn money from us (Douven, 2013, p. 431). Thus, although EIB is immune to diachronic Dutch books and the extra boost view is not, this fact by itself does not make the former much more practically useful. Moreover, Douven has argued that the extra boost view’s vulnerability to diachronic Dutch books is compensated by its rapid convergence to the truth and its tendency to yield more accurate posterior credences (Douven, 2013; Douven & Wenmackers, 2017). In response, Pettigrew (2021) recently argued that any such merits of the extra boost view are illusory.

Their debate is concerned with the extra boost view versus precise conditioning (that is, conditioning precise priors on total evidence), but it is also relevant to the comparison between the extra boost view and EIB. To see why, we need to distinguish three cases. Let \({\mathfrak {C}}_{1}\) be an agent’s credal state at \(t_{1}\) defined on an algebra \({\mathfrak {F}}_{1}.\) Let \(E^{*}\) be her total evidence at \(t_{2},\) which may include some explanatory information.Footnote 22 In the extension case, at \(t_{2},\) the agent acquires some explanatory information, such as \(X_{A} \& \lnot X_{B}\), in addition to observational data, but until \(t_{1}\), she never thought about the former. So \(E^{*}\notin {\mathfrak {F}}_{1}.\) Hence, the domain of her credal state needs to be extended to \({\mathfrak {F}}_{2},\) which includes \(E^{*}.\) In the imprecise non-extension case, \({\mathfrak {C}}_{1}\) is imprecise, and at \(t_{2}\), she either acquires no explanatory information or learns such information but the relevant explanatory sentence was already in \({\mathfrak {F}}_{1}.\) So \(E^{*}\in {\mathfrak {F}}_{1}.\) The precise non-extension case is the same except that \({\mathfrak {C}}_{1}\) was precise.

When applied to the extension case, EIB yields the result that \({\mathfrak {C}}_{2}= \{ p(\cdot |\) \(E \& X_{A} \& \lnot X_{B})|p\in U({\mathfrak {C}}_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2})\}\), where E and \(X_{A} \& \lnot X_{B}\) are the purely empirical and explanatory parts of \(E^{*}.\) So far, I have said nothing about how EIB is supposed to operate in the non-extension cases, but a careful reader might have noticed that EIB is meant to generalize standard conditioning, not to replace it. So, at this point, I make it explicit that EIB subsumes precise (imprecise) conditioning as the special rule for the precise (imprecise) non-extension case. Hence, the result of applying EIB to the imprecise non-extension case is that \({\mathfrak {C}}_{2}=\left\{ p\left( \cdot |E^{*}\right) |p\in {\mathfrak {C}}_{1}\right\}\) and that of applying EIB to the precise non-extension case is that \(c_{2}\left( \cdot \right) =c_{1}\left( \cdot |E^{*}\right) .\) By contrast, the extra boost view does not apply to the extension and imprecise non-extension cases. It is impossible to compare two updating models by judging how well they do in a case to which only one of them applies. Alternatively, we can say that, if one updating model applies to a certain case but the other does not, then the former is clearly better for that case. So, to establish that EIB is better in some aspect (e.g., in the epistemic aspect, in the practical aspect) than the extra boost view, it suffices to show that the result of applying EIB to the precise non-extension case is better in that aspect. Since EIB subsumes precise conditioning for the precise non-extension case, it suffices to show that applying precise conditioning to that case tends to produce better results in the relevant aspect.

However, Douven (2013) claims that, in some case, the extra boost view produces better results. He uses the following experiment:

  • There are two agents, the precise Bayesian and the extra booster. The precise Bayesian always updates her precise credences by conditioning on total evidence, and the extra booster always updates in accordance with the extra boost view.

  • There is a coin, c, whose bias for heads is one of \(0,\frac{1}{10},\ldots ,\frac{9}{10},1\), but the exact value is unknown to either agent. Correspondingly, define \(H_{i}\) as the hypothesis that c’s bias for heads is \(\frac{i}{10}\) (\(i=0,1,\ldots ,10\)).

  • In this experiment, c is tossed many times. Let \(E_{j}\) be the conjunction of the outcome of the first toss, that of the second toss,..., and that of the j-th toss. We will say that \(H_{i}\) best explains \(E_{j}\) iff, for any \(k\in \left\{ 0,\ldots ,10\right\}\) such that \(k\ne i\), the frequency of heads among the outcomes of the first j tosses is at least as close to \(\frac{i}{10}\) as to \(\frac{k}{10}\).

  • When \(H_{i}\) is the best explanation of \(E_{j}\), then \(f\left( H_{i},E_{j}\right) =0.1\). If \(H_{i}\) and \(H_{k}\) are the best explanations of \(E_{j}\), then \(f\left( H_{i},E_{j}\right) =0.05=f\left( H_{k},E_{j}\right)\). When \(H_{i}\) is not one of the best explanations of \(E_{j}\), then \(f\left( H_{i},E_{j}\right) =0\).

  • We will say that the extra booster’s credences converge to the truth faster than the precise Bayesian’s iff, for some number k, the former’s credence in the true hypothesis \(H_{i}\) becomes equal to or larger than 0.99 at the k-th toss of c and, for any \(l<k\), neither agent’s credence in \(H_{i}\) was equal to or larger than 0.99 at the l-th toss of c.

Douven ran 9000 simulated tosses, the first 1000 tosses with c’s bias for heads being 0.1, the second 1000 tosses with it being 0.2, and so on. In almost all cases, the extra booster’s credences converged to the truth faster.Footnote 23
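Below is a minimal sketch of this kind of simulation, not Douven’s actual protocol; the true bias, the number of tosses, and the random seed are arbitrary, so outputs will vary, but the sketch shows the bonus-augmented rule (12) running alongside precise conditioning:

```python
import random

BIASES = [i / 10 for i in range(11)]          # the biases posited by H_0, ..., H_10

def bonus(freq):
    """f(H_i, E_j): 0.1 to the hypothesis whose bias is closest to the observed
    frequency of heads so far, split as 0.05/0.05 when two hypotheses tie."""
    dists = [abs(b - freq) for b in BIASES]
    winners = [i for i, d in enumerate(dists) if d == min(dists)]
    return [0.1 / len(winners) if i in winners else 0.0 for i in range(11)]

def run(true_bias, n_tosses, seed=0):
    rng = random.Random(seed)
    bayes = [1 / 11] * 11                     # the precise Bayesian's credences
    boost = [1 / 11] * 11                     # the extra booster's credences
    heads = 0
    for j in range(1, n_tosses + 1):
        h = rng.random() < true_bias
        heads += h
        lik = [b if h else 1 - b for b in BIASES]
        # Precise conditioning on the latest toss.
        unnorm = [c * l for c, l in zip(bayes, lik)]
        bayes = [x / sum(unnorm) for x in unnorm]
        # The extra boost rule (12), with the bonus keyed to the running frequency.
        f = bonus(heads / j)
        unnorm = [c * l + fi for c, l, fi in zip(boost, lik, f)]
        boost = [x / sum(unnorm) for x in unnorm]
    return bayes, boost

bayes, boost = run(true_bias=0.3, n_tosses=200)
print("Bayesian credence in H_3:     ", round(bayes[3], 3))
print("Extra booster credence in H_3:", round(boost[3], 3))
```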

However, he admits that there is a “possible downside to the apparent success of” the extra boost view (2013, p. 434). He writes:

The reason why [the extra booster] so often beats the [precise] Bayesian in these simulations in assigning a high probability to the truth is that she is, in a clear sense, bolder in her responses to new information, due to the fact that she adds a bonus to the best explanation. ... the same feature makes [the extra booster] more prone to assign a high probability to some false hypothesis: a row of consecutive tosses producing a subsequence in which the relative frequency of heads starkly deviates from the probability for heads is more likely to push the explanationist’s probability for some false bias hypothesis over the .99 threshold ... (Douven, 2013, p. 434)

So, the above result should not be interpreted as simply suggesting that the extra boost view is epistemically better than precise conditioning; for, while the extra booster’s credences tend to converge to the truth faster, she also takes an increased risk of strongly believing a false hypothesis. Even so, does the faster convergence to the truth not indicate that the extra boost view can offer an important advantage for practically more important matters than the bias of a coin? Douven writes:

...imagine that the hypotheses concern some scientifically interesting quantity—such as the success rate of a medical treatment, or the probability of depressive relapse—rather than the bias of a coin ... Which researcher would not want to use an update rule that increases her chances of being in a position to make public a scientific theory, or a new medical treatment, before [the competitor who uses precise conditioning] is? (Douven, 2013, p. 433)

Seemingly, finding a medical treatment for, say, a pandemic is a practically important goal, for which speed is a key factor. So, the extra boost view’s faster convergence to the truth looks like an important practical merit. However, we should be cautious. First, at best, the above result will only show that, in cases similar to the above experiment, an extra booster’s credal states tend to get to the truth faster. Second, it is based upon raw intuition rather than a rigorous theory of practical rationality.

In response, Pettigrew (2021) points out that an agent who updates her credences by precise conditioning will come to choose a future action that is at least as good, in expectation, as any action she would choose if she updated differently. More precisely, he offers the proof of the following claim:

  • Let W be the set of possible worlds. For convenience, we assume the finiteness of W and identify each \(w\in W\) with a big sentence in the domain of the given agent’s credence functions.

  • Let \({\mathcal {A}}_{k+1}\) be the set of the given agent’s possible actions at \(t_{k+1}\).

  • Let \({\mathcal {E}}_{k+1}\) be a partition such that each \(E\in {\mathcal {E}}_{k+1}\) is evidence that she may receive at \(t_{k+1}\). For any \(w\in W,\) define \(E_{w}\) as the \(E\in {\mathcal {E}}_{k+1}\) that she receives at \(t_{k+1}\) in w.

  • Let \(u\left( \cdot ,\cdot \right)\) be her utility function. So, for any action \(a\in {\mathcal {A}}_{k+1}\) and any possible world \(w\in W\), \(u\left( a,w\right)\) is the utility in w of a.

  • Let \(c_{k}\) be her credence function at \(t_{k}\).

  • Define \(p^{\left\langle \alpha ,E\right\rangle }\) as the result of applying an updating rule \(\alpha\) to a probability function p with evidence E. For any \(E_{w}\in {\mathcal {E}}_{k+1},\) we write \(p^{\left\langle \alpha ,w\right\rangle }\) for \(p^{\left\langle \alpha ,E_{w}\right\rangle }.\)

  • For any probability function \(p\left( \cdot \right)\), define \(a^{p}\) as an act in \({\mathcal {A}}_{k+1}\) such that, for any \(a\in {\mathcal {A}}_{k+1},\) \(\sum _{w\in W}p\left( w\right) u\left( a^p,w\right) \ge \sum _{w\in W}p\left( w\right) u\left( a,w\right) .\)

Then, for any updating rule \(\alpha ,\)

$$\begin{aligned} \sum _{w\in W}c_{k}\left( w\right) u\left( a^{c_{k}^{\left\langle \beta ,w\right\rangle }},w\right) \ge \sum _{w\in W}c_{k}\left( w\right) u\left( a^{c_{k}^{\left\langle \alpha ,w\right\rangle }},w\right) , \end{aligned}$$
(13)

where \(\beta\) is precise conditioning. This can be strengthened to strict inequality unless, for every \(w\in W\), if \(c_{k}\left( w\right) >0\), \(u\left( a^{c_{k}^{\left\langle \beta ,w\right\rangle }},w\right) =u\left( a^{c_{k}^{\left\langle \alpha ,w\right\rangle }},w\right) .\)Footnote 24 Barring such an exception, if an agent updates her present precise credences in any other way than conditioning, then she will come to choose a future action whose expected utility (calculated from her present credences) is not maximal. In sum, the extra boost view’s faster convergence to the truth does not mean that it is always practically better than precise conditioning and, indeed, the standard theory of practical rationality can be used to argue that precise conditioning is generally a practically better updating rule (at least for maximizing the expected practical utility of one’s next action).
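The content of (13) can be illustrated with a toy decision problem. The sketch below compares precise conditioning with a hypothetical “sluggish” rule that moves only part of the way toward the conditioned credences; the worlds, utilities, and the alternative rule are all invented for this illustration:

```python
# Four worlds, a two-cell evidence partition, and two available actions.
W = ['w1', 'w2', 'w3', 'w4']
prior = {'w1': 0.4, 'w2': 0.1, 'w3': 0.2, 'w4': 0.3}        # c_k
E_of = {'w1': 'E', 'w2': 'E', 'w3': 'notE', 'w4': 'notE'}    # E_w for each world
actions = ['a', 'b']
u = {('a', 'w1'): 10, ('a', 'w2'): 0, ('a', 'w3'): 2, ('a', 'w4'): 1,
     ('b', 'w1'): 3,  ('b', 'w2'): 5, ('b', 'w3'): 0, ('b', 'w4'): 6}

def condition(p, E):                  # the rule beta
    mass = sum(v for w, v in p.items() if E_of[w] == E)
    return {w: (v / mass if E_of[w] == E else 0.0) for w, v in p.items()}

def sluggish(p, E):                   # a stand-in for some alternative rule alpha
    c = condition(p, E)
    return {w: 0.2 * c[w] + 0.8 * p[w] for w in W}

def best_act(p):                      # a^p: maximizes expected utility by p's lights
    return max(actions, key=lambda a: sum(p[w] * u[(a, w)] for w in W))

def value_of_rule(rule):              # the two sides of (13)
    return sum(prior[w] * u[(best_act(rule(prior, E_of[w])), w)] for w in W)

print(round(value_of_rule(condition), 3))   # 5.8
print(round(value_of_rule(sluggish), 3))    # 4.7: strictly smaller, as (13) permits
```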

Douven (2013) also reports the result of another simulated experiment, and based on it, he argues that, in many cases, the extra boost view might serve some epistemic goals better than precise conditioning. Before evaluating this argument, remember that there is a well-known proof of the following inequality (Pettigrew, 2016; Greaves & Wallace, 2006): for any updating rule \(\alpha\),

$$\begin{aligned} \sum _{w\in W}c_{k}\left( w\right) {\mathfrak {B}}\left( c_{k}^{\left\langle \beta ,w\right\rangle },w\right) \ge \sum _{w\in W}c_{k}\left( w\right) {\mathfrak {B}}\left( c_{k}^{\left\langle \alpha ,w\right\rangle },w\right) , \end{aligned}$$
(14)

where \(\beta\) is precise conditioning and \({\mathfrak {B}}\left( \cdot ,\cdot \right)\) measures the accuracy of a credence function at a world as the negative of its Brier score, the most popular inaccuracy measure.Footnote 25 This result can be strengthened to strict inequality unless, for any \(w\in W\), if \(c_{k}\left( w\right) >0,\) then \({\mathfrak {B}}\left( c_{k}^{\left\langle \beta ,w\right\rangle },w\right) ={\mathfrak {B}}\left( c_{k}^{\left\langle \alpha ,w\right\rangle },w\right) .\) Except in such a case, precise conditioning is the only updating rule that maximizes the expected accuracy of the posterior credence function.
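Relatedly, (14) can be illustrated on the same toy setup, now scoring posterior credences rather than chosen actions. Since the Brier score measures inaccuracy, maximizing expected accuracy amounts to minimizing expected Brier inaccuracy; the worlds, prior, and alternative rule below are the invented ones from the previous sketch:

```python
W = ['w1', 'w2', 'w3', 'w4']
prior = {'w1': 0.4, 'w2': 0.1, 'w3': 0.2, 'w4': 0.3}
E_of = {'w1': 'E', 'w2': 'E', 'w3': 'notE', 'w4': 'notE'}

def condition(p, E):
    mass = sum(v for w, v in p.items() if E_of[w] == E)
    return {w: (v / mass if E_of[w] == E else 0.0) for w, v in p.items()}

def sluggish(p, E):
    c = condition(p, E)
    return {w: 0.2 * c[w] + 0.8 * p[w] for w in W}

def brier_inaccuracy(p, w_true):
    # Squared distance from the "vindicated" credences: 1 at w_true, 0 elsewhere.
    return sum((p[w] - (1.0 if w == w_true else 0.0)) ** 2 for w in W)

def expected_inaccuracy(rule):
    return sum(prior[w] * brier_inaccuracy(rule(prior, E_of[w]), w) for w in W)

print(round(expected_inaccuracy(condition), 3))   # 0.4: lower, hence better
print(round(expected_inaccuracy(sluggish), 3))    # 0.592: higher, hence worse
```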

Returning to the second computer simulation discussed by Douven (2013): it is similar to the first, but this time he pays attention to the accuracies of the two agents’ posterior credences in \(H_{0},\ldots ,H_{10}\) rather than the speed of convergence to the truth. Due to limitations of space, I will omit the details, but the result clearly shows that, in most cases, the extra booster’s posterior credences in \(H_{0},\ldots ,H_{10}\) are more accurate (that is, closer to the truth) than those of the precise Bayesian. Seemingly, the extra boost view serves the epistemic goals of seeking true beliefs and avoiding false ones better than precise conditioning. However, Douven himself admits that, in the small number of cases in which the extra booster’s credences are less accurate, they are dramatically inaccurate, while in the other cases, her credences are only slightly more accurate. Hence, Douven’s second simulated experiment does not decisively tell in favor of the extra boost view. In fact, (14) still supports the epistemic superiority of precise conditioning (at least for maximizing the expected accuracy of posterior credences).

To be fair, Douven only claims that, if one cares about being more accurate most of the time but not much about having the greatest expected accuracy of credences, then one should be an extra booster. However, Pettigrew argues that it is unreasonable to “care about the probability of comparative performance and ignore the distribution of absolute performance” (Pettigrew, 2021, p. 14230). I find this argument persuasive.

In summary: (i) The extra boost view is vulnerable to diachronic Dutch books but EIB is immune to them. (ii) Based on the results of some computer simulations, Douven claims that, in many cases, an extra booster’s credences get to the truth more quickly and are inclined to be more accurate than the precise Bayesian’s, but these results do not decisively show that the extra boost view is practically or epistemically better because the extra booster takes an increased risk of inaccurate credences. (iii) Moreover, the practical utility and epistemic accuracy arguments support Pettigrew’s claim that precise conditioning is generally superior to the extra boost view in the relevant aspects. As mentioned, EIB coincides with precise conditioning in the precise non-extension case and the extra boost view does not apply to the other cases. Judging from (i)–(iii), I conclude that EIB is an overall better updating model than the extra boost view.

At this point, one may ask: is it not possible to develop a similar updating model to EIB without introducing imprecise credal states?Footnote 26 It is. By Appendix C and Lehman’s (1955) converse Dutch book theorem, if an agent updates her credences by conditioning a single extended prior credence function on empirical evidence conjoined with relevant explanatory information, then she does not have to worry about diachronic Dutch books. Formally, if \(c_{2}\left( \cdot \right) =p\left( \cdot |X_{A} \& \lnot X_{B} \& E\right)\) for some \(p\left( \cdot \right) \in U\left( c_{1},{\mathfrak {F}}_{1},{\mathfrak {F}}_{2}\right) ,\) then there is no sure loss strategy against this updating method, where \({\mathfrak {F}}_{1}=\text {dom}\left( c_{1}\right)\) and \({\mathfrak {F}}_{2}\) is the minimal algebraic superset of \({\mathfrak {F}}_{1}\cup \left\{ X_{A},X_{B}\right\} .\) Call such a model “Explanationist Precise Bayesianism” (EPB). Also, it is natural to let EPB subsume precise conditioning as a special rule for the precise non-extension case. So, the above-mentioned practical utility and epistemic accuracy arguments will support EPB’s superiority to the extra boost view in the relevant aspects. However, the full development of EPB would require us, among other things, to find and justify a mechanism for selecting a single extended prior credence function for conditioning, a seemingly daunting task. This would be an interesting project but will not be pursued in this paper.

5 Constraint Compatibilism

In his well-known book, Lipton (2004) offered a different hybrid model. He proposed that a rational agent should restrict the prior likelihoods of hypotheses by taking explanatory factors into consideration. As a result of subsequent updating, she will come to have a credal opinion favorable to a hypothesis offering an adequate explanation for evidence.

To make this proposal clearer, let \(t_{0}\) be when an agent forms her initial credence function \(c_{0}\), before the agent receives any evidence. Let \(t_{1}\) be when the agent receives her first evidence, \(t_{2}\) be when the agent receives her second evidence, and so on. For any \(k\in {\mathbb {N}}\), \(c_{k}:{{\mathfrak {F}}} \rightarrow \left[ 0,1\right]\) is the agent’s credence function at \(t_{k}\) (where \({\mathfrak {F}}\) is the fixed domain). Let \(\left\{ E_{i}^{1}\right\} _{i\in I}\) be the set of sentences that the agent might learn the truth of as her first evidence, \(\left\{ E_{i}^{2}\right\} _{i\in I}\) be the set of sentences that the agent might learn the truth of as her second evidence, and so on. Thus, by each \(t_{k}>t_{0},\) the agent will have learned \(E_{i_{1}}^{1} \& \ldots \& E_{i_{k}}^{k}\) for some \(\left\langle i_{1},\ldots ,i_{k} \right\rangle \in I^{k}\).

Naturally, the agent will keep thinking about which of the competing hypotheses is more probable. Of course, there might be many groups of mutually competing hypotheses, but for simplicity, assume that the agent considers only one such group, consisting of A and B. By Bayes’s theorem,

$$\begin{aligned} \begin{aligned} c_{k}\left( A\right)&=c_{0}\left( A|E_{i_{1}}^{1} \& \ldots \& E_{i_{k}}^{k}\right) \\&=\frac{c_{0}\left( E_{i_{1}}^{1} \& \ldots \& E_{i_{k}}^{k}|A\right) c_{0}\left( A\right) }{c_{0}\left( E_{i_{1}}^{1} \& \ldots \& E_{i_{k}}^{k}|A\right) c_{0}\left( A\right) +c_{0}\left( E_{i_{1}}^{1} \& \ldots \& E_{i_{k}}^{k}|B\right) c_{0}\left( B\right) }. \end{aligned} \end{aligned}$$
(15)

Now, we impose the following constraint on \(c_{0}\left( \cdot \right)\):

(EC) for any \(\left\langle i_{1},\ldots ,i_{k}\right\rangle \in I^{k}\), if A explains \(E_{i_{1}}^{1} \& \ldots \& E_{i_{k}}^{k}\) adequately but B does not, then \(c_{0}\left( E_{i_{1}}^{1} \& \ldots \& E_{i_{k}}^{k}|A\right) >c_{0}\left( E_{i_{1}}^{1} \& \ldots \& E_{i_{k}}^{k}|B\right)\).

This ensures that, regardless of which pieces of evidence the agent receives up to \(t_{k}\), the agent will assign, at \(t_{k}\), a higher credence to A than to B if she initially judged that A explains them adequately but B does not and the agent’s initial credence in A was not lower than her initial credence in B. Following Henderson (2013), we will call this model “constraint(-based) compatibilism.”
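A toy instance of how EC plays out through (15) may be useful; the numbers are invented for this illustration, with the likelihood inequality being the one EC imposes:

```python
# Suppose the agent judges that A adequately explains the evidence received up
# to t_k while B does not, and that her initial credences in A and B are equal.
c0_A, c0_B = 0.5, 0.5       # initial credences, with c0_A >= c0_B
lik_A, lik_B = 0.7, 0.3     # EC: c0(evidence | A) > c0(evidence | B)

# Posterior credence in A by Bayes's theorem, as in (15).
ck_A = lik_A * c0_A / (lik_A * c0_A + lik_B * c0_B)
print(round(ck_A, 3))       # 0.7 > 0.5: the adequately explaining hypothesis is favored
```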

Constraint compatibilism has two merits. First, as mentioned, an agent who updates in accordance with it tends to favor a hypothesis offering an adequate explanation of evidence. Second, unlike the extra boost view, constraint compatibilism does not ask the agent to violate conditioning, which would expose her to the danger of a diachronic Dutch book.Footnote 27

However, remember that EIB also has these advantages. If so, which theory is better, EIB or constraint compatibilism? Presumably, the answer will depend on the purpose of using these theories. First, they may be used to describe how our credal states are really updated. For this purpose, EIB does a better job. According to constraint compatibilism, we set up our initial priors in accordance with EC. Afterwards, we keep updating our credences by conditioning. According to EIB, we did not necessarily set up our initial priors in that way, but we nonetheless continue to update our credal states by extending our prior credal state and conditioning it on evidence and explanatory information. In reality, when we were born, we did not think about, for example, which of classical mechanics and special relativity would provide a better explanation of the Michelson-Morley experiment’s result. Nor did we assign a higher initial likelihood to that result on the assumption that special relativity is true than on the assumption that classical mechanics is true. Similar examples abound. Thus, EIB provides a more realistic description of our updating pattern.

Second, the two theories can be used to know how our credal states should be updated. For this purpose, there are reasons to prefer EIB to constraint compatibilism. Consider the following desideratum for any normative theory T of credal updating:

(DES) T must tell us statements in the form of “a ought to assign such and such a credence to A in such and such an evidential situation at \(t_k\).”

In other words, a normative theory of credal updating must tell us how to decide our credences in a specific evidential situation. Prima facie, this is a plausible desideratum. So we can use DES to judge which theory is better between EIB and constraint compatibilism. Now, consider the following instances of EIB and constraint compatibilism:

(\(I_{EIB}\)) If a was in the credal state \({\mathfrak {C}}_{k-1}\) at \(t_{k-1}\) and never thought about which hypothesis would explain E, then she ought to assign \(\left( \underline{{\mathfrak {C}}_{k-1}}\left( A|E \right) ,1\right]\) to A, upon learning \(E \& X_A \& \lnot X_B\) at \(t_{k}\).

(\(I_{CC}\)) If a set up her initial credence function \(c_{0}\left( \cdot \right)\) in accordance with EC, initially judged that A explains \(E_{1} \& \ldots \& E_{k}\) adequately but B does not, and did not assign a lower initial credence to A than to B, then she ought to assign a higher credence to A than to B, having received \(E_{1} \& \ldots \& E_{k}\) until \(t_k\).

Let \(A_{EIB}\) and \(A_{CC}\) be the antecedents of \(I_{EIB}\) and \(I_{CC}\) (in that order), \(C_{EIB}\) be that a assigns \(\left( \underline{{\mathfrak {C}}_{k-1}}\left( A|E\right) , 1\right]\) to A upon receiving \(E \& X_A \& \lnot X_B\) at \(t_k\), and \(C_{CC}\) be that a assigns a higher credence to A than to B, having received \(E_{1} \& \ldots \& E_{k}\) until \(t_k\). We use “O” as the unary obligation operator of standard deontic logic (SDL). Now, two questions must be asked. First, can we derive \(O\left( C_{EIB}\right)\), that is,

a ought to assign \(\left( \underline{{\mathfrak {C}}_{k-1}}\left( A|E\right) ,1\right]\) to A upon learning \(E \& X_A \& \lnot X_B\) at \(t_{k}\).

from \(I_{EIB}\)? If yes, then EIB meets the desideratum DES. Second, can we infer \(O\left( C_{CC}\right)\), that is,

a ought to assign a higher credence to A than to B, having received \(E_{1} \& \ldots \& E_{k}\) until \(t_k\).

from \(I_{CC}\)? If so, constraint compatibilism satisfies DES.

To answer, two general issues must be discussed about conditional oughts, or sentences in the form of “If \(\phi\), then it ought to be the case that \(\psi\).” In the rest of this paper, such a sentence will be abbreviated into \(O\left( \psi |\phi \right)\). Since Chisholm’s seminal paper was published in 1963, there have been active debates about, first, how to interpret \(O\left( \psi |\phi \right)\) and, second, how to derive \(O\left( \psi \right)\) from \(O\left( \psi |\phi \right)\) (e.g., Greenspan, 1975; Bonevac, 1998; Horty, 2001). According to the narrow-scope interpretation, \(O\left( \psi |\phi \right)\) should be interpreted as meaning \(\phi \rightarrow O\left( \psi \right)\), but according to the wide-scope interpretation, \(O\left( \psi |\phi \right)\) should be understood as meaning \(O\left( \phi \rightarrow \psi \right)\) (McNamara & van De Putte, 2021).Footnote 28 Since SDL is a normal modal logic,

(Deontic Detachment) \(O\left( \phi \rightarrow \psi \right) , O\left( \phi \right) \models _{SDL}O\left( \psi \right) .\)

So the proponents of the wide-scope interpretation (hereafter: wide-scopers) adopt Deontic Detachment as a valid inference rule.Footnote 29 However, it is implausible that \(O\left( \psi \right)\) can always be derived from \(O\left( \psi |\phi \right)\) with the help of \(O\left( \phi \right)\) alone, because unconditional obligations obviously depend on factual conditions as well. For example, David ought to take care of Elisabeth because she is his child, and Fred ought to pay tax to the government of Canada because he is Canadian. Yet it makes sense to say that only David has a real duty, because Fred has the option of emigrating to the USA. In general, it is uncontroversial that \(O(\phi \rightarrow \psi )\) implies \(O(\psi )\) when \(\phi\) is unalterably true. Thus, SDL is usually extended into a logic SDL+ such that \(O\left( \phi \rightarrow \psi \right) ,U\left( \phi \right) \models _{SDL+}O\left( \psi \right)\), where “U” is the unary unalterable-truth operator (Greenspan, 1975; Horty, 2001). Since unalterable truth entails truth, \(U(\phi )\models _{SDL+}\phi .\) Hence, whether \(O\left( \psi |\phi \right)\) is narrowly or widely interpreted,

(Factual Detachment) \(O\left( \psi |\phi \right) ,U\left( \phi \right) \models _{SDL+}O\left( \psi \right) .\)

Most philosophers, narrow-scopers (i.e., the proponents of the narrow-scope interpretation) and wide-scopers alike, consider Factual Detachment, as defined above, to be a valid inference rule.Footnote 30
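To see why Factual Detachment holds on either reading, the following sketch (stated here only for convenience, using the rules already introduced) suffices. On the narrow-scope reading, \(O\left( \psi |\phi \right)\) just is \(\phi \rightarrow O\left( \psi \right)\), and since \(U(\phi )\models _{SDL+}\phi\), modus ponens yields \(O\left( \psi \right)\); on the wide-scope reading, \(O\left( \psi |\phi \right)\) just is \(O\left( \phi \rightarrow \psi \right)\), and the SDL+ rule above does the work:

$$\begin{aligned} \phi \rightarrow O\left( \psi \right) ,\ U\left( \phi \right)&\models _{SDL+}O\left( \psi \right) ,\\ O\left( \phi \rightarrow \psi \right) ,\ U\left( \phi \right)&\models _{SDL+}O\left( \psi \right) . \end{aligned}$$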

Let us return to the issue of which theory, EIB or constraint compatibilism, satisfies DES. Of course, \(O(C_{EIB})\) cannot be derived from \(I_{EIB}\) alone. An additional premise is needed. Given the above discussion on conditional obligation, the proponent of EIB needs to choose one of the two options below:

(Option 1) Defend \(O(A_{EIB})\) and derive \(O(C_{EIB})\) from \(O(A_{EIB}\rightarrow C_{EIB})\) and \(O(A_{EIB})\).

(Option 2) Defend \(U(A_{EIB})\) and derive \(O(C_{EIB})\) from \(O(C_{EIB}|A_{EIB})\) and \(U(A_{EIB})\).

A few clarifications are due. First, “defend \(\phi\) and derive \(O(\psi )\)” means defending that many instances of \(\phi\) are true and deriving the corresponding instances of \(O(\psi )\) from the given premises. Second, if we defend many instances of \(O(\psi )\) in this way, by deriving them from a general updating theory T, then T satisfies DES and can tell us how to update our credal states in many cases. Third, the present issue is whether such a theory T will satisfy DES if T is true. Thus, when we discuss this issue regarding EIB or compatibilism, its truth will be simply assumed.

Remember that the narrow-scope versus wide-scope debate is an ongoing one. Obviously, Option 1 is incompatible with the narrow-scope interpretation. Hence, if Option 1 were her only available option and the narrow-scope interpretation turned out to be correct, the advocate of EIB could not defend her theory’s satisfaction of DES.

Fortunately, Option 2 is compatible with both interpretations. Suppose that the proponent of EIB chooses this option. Then she need only defend \(U(A_{EIB})\), which is not difficult. As discussed above, when we learn \(E \& X_A \& \lnot X_B\) and update our credal states accordingly, it is often true that (\(A_{EIB}\)) (i) we were previously in such and such a credal state \({\mathfrak {C}}_{k-1}\) and (ii) we had not thought about which theory would adequately explain E until the previous moment, \(t_{k-1}\). From our present point of view at \(t_k\), (i) and (ii) are unalterably true because they describe our past epistemic states. By Factual Detachment, \(O(C_{EIB})\) can be derived from \(I_{EIB}\), which is identical to \(O(C_{EIB}|A_{EIB})\), together with the thus-defended \(U(A_{EIB})\). Example 2 is a typical case. From Becky’s point of view at \(t_2\), it is unalterably true that her previous credal state was \(\{c_1(\cdot )\}\) and that she had not thought about which theory, S or C, would explain M adequately. From this and the relevant instance of EIB, it follows that she ought to assign \(\left( \frac{2}{3},1\right]\) to S at \(t_2\), as the schematic derivation below illustrates. Clearly, there are many similar cases, and EIB can tell us how to update our credal states in such cases. Therefore, the proponent of EIB can argue that her theory satisfies DES, whether the narrow-scope or the wide-scope interpretation is true.
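Schematically, and only as an illustration in the present notation, the derivation in Becky’s case is an instance of Factual Detachment:

$$\begin{aligned} O\left( C_{EIB}|A_{EIB}\right) ,\ U\left( A_{EIB}\right) \models _{SDL+}O\left( C_{EIB}\right) , \end{aligned}$$

where \(A_{EIB}\) is instantiated by Becky’s having been in the credal state \(\{c_1(\cdot )\}\) at \(t_1\) without having considered which of S and C explains M, and \(C_{EIB}\) by her assigning \(\left( \frac{2}{3},1\right]\) to S at \(t_2\).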

Next, to argue that constraint compatibilism satisfies DES, the compatibilist needs to select one of these two options:

(Option 3) Defend \(O(A_{CC})\) and derive \(O(C_{CC})\) from \(O(A_{CC}\rightarrow C_{CC})\) and \(O(A_{CC})\).

(Option 4) Defend \(U(A_{CC})\) and derive \(O(C_{CC})\) from \(O(C_{CC}|A_{CC})\) and \(U(A_{CC})\).

The compatibilist must not choose Option 4. As discussed, none of us actually set up our initial priors in accordance with EC. A fortiori, it is not unalterably true that we set up our initial priors in that way. Thus, \(U\left( A_{CC}\right)\) cannot be defended.

This leaves Option 3. Like Option 1, it is incompatible with the narrow-scope interpretation of conditional oughts. So, assume that the wide-scopers win and that the compatibilist chooses Option 3. Consider a case where an agent a has received \(E_1 \& \ldots \& E_k\) and A provides a better explanation of \(E_1 \& \ldots \& E_k\) than B does. Then, the compatibilist wants to explain why it ought to be the case that

(\(C_{CC}\)) a assigns a higher credence to A than to B at \(t_k\).

For this, she needs to defend that it ought to be the case that

(\(A_{CC}\)) a had an ideal initial credence function \(c_0^i(\cdot )\) such that

  • \(c_0^i(\cdot )\) satisfies EC,

  • a initially judged that A provides a better explanation of \(E_1 \& \ldots \& E_k\) than B, and

  • \(c_0^i(A)\ge c_0^i(B)\).

Assuming that this succeeds, the compatibilist can derive \(O(C_{CC})\) from the thus-defended \(O(A_{CC})\) and the widely interpreted instance of compatibilism, \(O( A_{CC}\rightarrow C_{CC} )\). In this way, it can be shown that constraint compatibilism satisfies DES.
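Written out, and again only as an illustration, the derivation is an instance of Deontic Detachment:

$$\begin{aligned} O\left( A_{CC}\rightarrow C_{CC}\right) ,\ O\left( A_{CC}\right) \models _{SDL}O\left( C_{CC}\right) . \end{aligned}$$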

Of course, the big question is whether \(O(A_{CC})\) can really be defended. We will not try to answer it in this paper. Even so, we can identify a few reasons to prefer EIB. First, as we saw above, the compatibilist depends on the wide-scope interpretation in showing that her theory satisfies DES. This is a potential problem because some authors endorse the narrow-scope interpretation (e.g., Bonevac, 1998). Second, if we really obey constraint compatibilism in updating our credences, we will become vulnerable to a diachronic Dutch book. Returning to the case in the previous paragraph, let \(c_0(\cdot )\) be a’s actual initial credence function. If a is like most of us, \(c_0(\cdot )\) did not satisfy EC. Hence, \(c_0(\cdot )\ne c_0^i(\cdot )\). Therefore, it is likely that \(c_{k-1}(\cdot |E_k)\ne c_k^i(\cdot )\), where \(c_{k-1}(\cdot )=c_0(\cdot |E_1 \& \ldots \& E_{k-1})\) and \(c_k^i(\cdot )= c_0^i(\cdot |E_1 \& \ldots \& E_{k-1} \& E_k)\). Assuming that a has actually updated her credal state by conditioning, \(c_{k-1}(\cdot )\) is her actual credence function at \(t_{k-1}\), and \(c_k^i(\cdot )\) is the credence function that she would have at \(t_k\) if she actually updated in accordance with compatibilism at \(t_k\). It is easy to arrange a diachronic Dutch book that exploits the difference between \(c_{k-1}(\cdot |E_k)\) and \(c_k^i(\cdot )\).
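To illustrate how such a book can be arranged (the following is a standard construction sketched here for concreteness; the particular bets and the abbreviations e, p, and q are not taken from the compatibilist literature), write \(e=c_{k-1}(E_k)\), pick a proposition A with \(c_{k-1}(A|E_k)=p\ne q=c_k^i(A)\), and suppose \(q>p\). At \(t_{k-1}\), a bookie buys from a, at her fair price p, a 1-unit bet on A that is called off (the price refunded) if \(\lnot E_k\) obtains, and sells her, at price \((q-p)e\), a bet that pays \((q-p)\) if \(E_k\) obtains. At \(t_k\), if \(E_k\) has been learned, the bookie sells her a 1-unit bet on A at her new fair price q. a’s net payoffs are then

$$\begin{aligned} E_k \& A:&\quad (p-1)+\left[ (q-p)-(q-p)e\right] +(1-q)=-(q-p)e,\\ E_k \& \lnot A:&\quad p+\left[ (q-p)-(q-p)e\right] -q=-(q-p)e,\\ \lnot E_k:&\quad 0-(q-p)e=-(q-p)e, \end{aligned}$$

so a loses \((q-p)e>0\) however the world turns out. If \(q<p\), the directions of all the bets are simply reversed.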

6 Roche and Sober’s Objection to IBE

In their 2013 paper, Roche and Sober (hereafter: R&S) provide an example in which frequency data “screen off” explanatory information. It can easily be modified into a counterexample to EIB. In this section, we first discuss how to revise EIB so that R&S’s example is accommodated as a harmless exception, and we then argue that there are realistic cases in which statistical data do not screen off explanatory information.

First, consider this example. Example 4. Until \(t_{1},\) Gloria has observed a sufficiently large set S of men. Some of S’s members were habitual smokers before 50 and some of them developed lung cancer after 50. Let \(L\subset S\) be the set of S’s members who got lung cancer after 50. According to her observations, the frequency of habitual smoking (before 50) in L was higher than the frequency of habitual smoking in S. At \(t_{2},\) Gloria meets Harry and learns that (Ca) he got lung cancer after 50. At the same time, she realizes that (\(X_{Sm}\)) Sm is an adequate causal explanation of Ca but (\(\lnot X_{\lnot Sm}\)) \(\lnot Sm\) is not, where Sm is that Harry was a habitual smoker before 50.Footnote 31 Before \(t_{2},\) she never thought about \(X_{Sm}\) and \(X_{\lnot Sm}.\)

To see why this is a counterexample to EIB, let r be a reasonable value for her credence at \(t_1\) in Sm conditional on Ca, and let s be a reasonable value for her credence at \(t_1\) in Sm. Given the frequencies of habitual smoking in S and L, \(r>s.\) That is,

$$\begin{aligned} c_{1}\left( Sm|Ca \& K\right) >c_{1}\left( Sm|K\right) , \end{aligned}$$
(16)

where \(c_{1}\left( \cdot \right)\) is her credence function at \(t_{1}\) defined on an algebra \({\mathfrak {F}}_{1}\) and K is her background knowledge at \(t_{1}.\) By definition,

$$\begin{aligned} p\left( Sm|Ca \& K\right) >p\left( Sm|K\right) , \end{aligned}$$
(17)

for any extension \(p\left( \cdot \right)\) of \(c_{1}\left( \cdot \right)\) to \({\mathfrak {F}}_{2},\) the smallest algebraic superset of \({\mathfrak {F}}_{1}\cup \left\{ X_{Sm},X_{\lnot Sm}\right\} .\) In (17), if \(X_{Sm} \& \lnot X_{\lnot Sm}\) is added to \(Ca \& K\), will it increase the value of the conditional credence in Sm on the left-hand side? According to R&S, the frequency of habitual smokers in L provides a good estimate of \(p\left( Sm|Ca \& K\right)\), and adding \(X_{Sm} \& \lnot X_{\lnot Sm}\) “does not change what the estimate should be” (2013: p. 661). Thus,

$$\begin{aligned} p\left( Sm|Ca \& X_{Sm} \& \lnot X_{\lnot Sm} \& K\right) =p\left( Sm|Ca \& K\right) , \end{aligned}$$
(18)

(for any such extension \(p\left( \cdot \right)\)). If correct, this result contradicts (4), which derives from (1)–(3). Thus, Gloria cannot update her credal state in accordance with EIB, which presupposes that the extended prior functions satisfy (1)–(3). Hence, Example 4 is a counterexample to EIB.
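For concreteness, suppose (with purely hypothetical numbers not given by R&S) that the observed frequency of habitual smoking is 0.6 in L and 0.25 in S. Then a reasonable assignment is

$$\begin{aligned} p\left( Sm|Ca \& K\right) =0.6>0.25=p\left( Sm|K\right) , \end{aligned}$$

and, on R&S’s view, adding \(X_{Sm} \& \lnot X_{\lnot Sm}\) to the conditioning proposition leaves the value at 0.6, i.e., \(p\left( Sm|Ca \& X_{Sm} \& \lnot X_{\lnot Sm} \& K\right) =0.6\), whereas, as just noted, the extended priors that EIB presupposes via (1)–(3) would have to push that value strictly above 0.6.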

To accommodate such cases as legitimate exceptions, EIB needs to be modified as follows. First, we introduce a new technical concept, E-inadmissibility. Roughly, empirical evidence E is E-inadmissible with respect to explanatory information X when E directly determines the credence in A conditioned on (or after learning) \(E \& X\), and perhaps background knowledge K, overriding the confirmative effect of X on A. In Example 4, Ca directly determines her conditional credence in Sm, thanks to the frequency data included in K. That is, Ca overrides the confirmative effect of \(X_{Sm} \& \lnot X_{\lnot Sm}\) on Sm. Example 5.Footnote 32 For another example, let E be that Isaac received ten jacks of clubs in a row, A be that he received ten jacks in a row, and B be that he is cheating. Intuitively, (\(\lnot X_A\)) A does not explain E adequately but (\(X_B\)) B adequately explains E. Usually, \(\lnot X_A \& X_B\) would be expected to disconfirm A, but in this case, \(c(A|E \& \lnot X_A \& X_B)=c(A|E)\) because E logically entails A, so both conditional credences equal 1. Equipped with the notion of E-inadmissibility, we are ready to revise the earlier formulation of EIB:

if an agent (i) never thought about which hypothesis would explain E until \(t_{1}\), (ii) realizes at \(t_{2}\) that A provides an adequate explanation for E but B does not, (iii) receives E as evidence at \(t_{3}\ge t_{2}\), and (iv) E is not E-inadmissible w.r.t. \(X_A \& \lnot X_B\), then

$$\begin{aligned} {\mathfrak {C}}_{3}\left( A\right) =\left( \underline{{\mathfrak {C}}_{1}} \left( A|E\right) ,1\right] , \end{aligned}$$

where \({\mathfrak {C}}_{1}\) and \({\mathfrak {C}}_{3}\) are the credal states at \(t_{1}\) and \(t_{3}\).

This new version of EIB does not apply to cases in which an agent receives evidence E that is E-inadmissible w.r.t. explanatory information X, including, but not limited to, those in which E logically entails A under background knowledge K or the given frequency data screen off explanatory information. So, the above-mentioned counterexamples to the old version of EIB are legitimate exceptions to this new version. From now on, we will use “EIB” to refer to the latter.

One may worry that E-inadmissibility is not a clearly defined concept and so EIB is unacceptably obscure. To address this worry, it is useful to compare EIB with Lewis’s (1980) Principal Principle (hereafter: PP), which states that a rational agent ought to assign r to a hypothesis A conditional on \(E \& \left\langle chance(A)=r\right\rangle\), unless empirical evidence E is inadmissible w.r.t. \(\left\langle chance(A)=r\right\rangle\). To mark the distinction, we will use “C-inadmissible” for the chance-related notion of inadmissibility. In formulating PP, Lewis did not provide a definition of C-inadmissibility, but nobody thinks that this makes PP completely useless. This is because, although we do not have a clearly defined necessary and sufficient condition for C-inadmissibility, we can often figure out whether a given piece of evidence is C-inadmissible. For example, if a time traveler from the future told us that a fair coin about to be tossed would land heads, we would know that we had received C-inadmissible evidence. For another example, if E logically entails A, E is C-inadmissible w.r.t. \(\left\langle chance(A)=r\right\rangle\) (\(0<r<1\)). Similarly, although we do not have a definition of E-inadmissibility, we have some level of understanding of when a piece of evidence is E-inadmissible. Examples 4 and 5 are such cases.
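For reference, PP can be stated schematically in the present notation (this compresses Lewis’s own formulation):

$$\begin{aligned} c\left( A|E \& \left\langle chance(A)=r\right\rangle \right) =r, \end{aligned}$$

provided that E is C-admissible w.r.t. \(\left\langle chance(A)=r\right\rangle\), where \(c(\cdot )\) is a reasonable initial credence function.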

Another worry is that the range of EIB’s application might be too narrow. Concerning PP, Lewis says “if nothing is [C-]admissible, then [PP] is vacuous” (Lewis, 1980, p. 272). In a similar vein, if we receive only E-inadmissible evidence in most cases, EIB will be useless. Indeed, R&S claim that the following thesis is true in most realistic cases:

(SOT) Let A be a hypothesis. Let E be empirical evidence at \(t_{1}\). Let X be relevant explanatory information. Then,

$$\begin{aligned} p\left( A|E \& X \& K\right) =p\left( A|E \& K\right) , \end{aligned}$$

where \(p(\cdot )\) is a prior credence function.

In the present context, \(p(\cdot )\) should be interpreted as an extended prior credence function.Footnote 33 Whenever SOT (understood in this way) holds, the value of \(p(A|\ldots )\) is entirely determined by the statistical relation between E and A, and explanatory information, such as \(X_A \& \lnot X_B\), plays no role in determining that value. In such a case, E is E-inadmissible w.r.t. \(X_A \& \lnot X_B\), so EIB does not apply. Hence, if SOT is true in a broad range of realistic cases, then EIB will be almost completely useless.

To establish that SOT is true in many such cases, R&S first point out that it appears to be true at least in Example 4 and then argue that, in many similar cases, frequency data should screen off explanatory information. To block this argumentative strategy, it is crucial to show that there are many realistic cases in which SOT is false. Two such examples are found in the literature. First, Lange (2017) discusses the following example:Footnote 34

Example 6. Suppose that J is that Jones is the person who stole the jewel from the safe, B is that the single strand of hair found inside the safe was blond, and K, a female detective’s background knowledge at \(t_{1}\), tells her that there was exactly one robber and one strand of hair found inside the safe, that Jones has blond hair, and that such a hair has a serious (though not overwhelming) likelihood to have been left by the robber during the robbery (though there are other ways in which the hair could have gotten into the safe). ... (Lange, 2017, p. 305)

Since the person who left the hair in the safe is likely to be the robber, and Jones has blond hair, B confirms J to some degree. Thus,

$$\begin{aligned} c_{1}\left( J|B \& K\right) >c_{1}\left( J|K\right) , \end{aligned}$$
(19)

where \(c_{1}\left( \cdot \right)\) is the detective’s credence function at \(t_{1}\) defined on \({\mathfrak {F}}_{1}\). However, B does not confirm J to a very high degree because she cannot completely rule out that the hair left in the safe was not the robber’s. Now, let \(X_J\) be that J is an adequate causal explanation of B and \(X_{\lnot J}\) be that \(\lnot J\) is an adequate causal explanation of B. Since \(X_{J}\) implies that there was a causal connection between the color of the hair found in the safe and the color of Jones’s hair, \(X_{J}\) indicates that the person who left the hair in the safe was actually Jones. Thus, adding \(X_{J} \& \lnot X_{\lnot J}\) to \(B \& K\) should increase the conditional credence in J.Footnote 35 Hence, it is uncontroversial that

$$\begin{aligned} c_1\left( J|B \& X_{J} \& \lnot X_{\lnot J} \& K\right) >c_1\left( J|B \& K\right) . \end{aligned}$$
(20)

Even if we additionally assume that the detective never thought about \(X_J\) and \(X_{\lnot J}\) before learning B and we replace \(c_1(\cdot )\) with an extension \(p(\cdot )\) of \(c_1(\cdot )\) to \({\mathfrak {F}}_2\) in the above inference, the resulting inference should remain sound, where \({\mathfrak {F}}_2\) is the smallest algebraic superset of \({\mathfrak {F}}_1\cup \left\{ X_J,X_{\lnot J}\right\}\). Hence,

$$\begin{aligned} p\left( J|B \& X_{J} \& \lnot X_{\lnot J} \& K\right) >p\left( J|B \& K\right) , \end{aligned}$$
(21)

(for any such extension \(p(\cdot )\)). This counterexample to SOT is realistic, and there are likely to be many similar cases.

Second, consider the following case (Garber, 1983). Example 7. Let S be the general theory of relativity, C be classical mechanics, and P be Mercury’s perihelion precession. Then \(S\vdash _{K}P\) but \(C\not \vdash _{K}P\), where K is the scientists’ background knowledge in the early 20th century and \(\vdash\) is a logical entailment operator. So, S provides an adequate deductive-nomological explanation for P but C does not. Let \(t_{1}\) be a time after Einstein had proposed the general theory of relativity but before anybody figured out that S entails P (under K).Footnote 36 Let \(t_{2}\) be a time at which scientists finally realized that S entails P. Assume that, before \(t_{2}\), scientists never thought about whether (\(X_{S}\)) S provides an adequate deductive-nomological explanation of P (with the help of K), but at \(t_{2}\), they realized that \(X_{S}\) holds. So, \({\mathfrak {F}}_{1}\not \ni X_{S}\in {\mathfrak {F}}_{2}\), where \({\mathfrak {F}}_{1}\) and \({\mathfrak {F}}_{2}\) are the domains of scientists’ credence functions at \(t_{1}\) and \(t_{2}\).

By \(t_1\), scientists had observed P for a long time, so P was already part of K. Thus,

$$\begin{aligned} c_{1}\left( S|P \& K\right) =c_{1}\left( S|K\right) . \end{aligned}$$
(22)

By definition, \(p\left( S|P \& K\right) =p\left( S|K\right)\) for any extension \(p\left( \cdot \right)\) of \(c_{1}\left( \cdot \right)\) to \({\mathfrak {F}}_{2}\). Also, the fact that \(S\vdash _{K}P\), i.e., the fact that the general theory of relativity entails Mercury’s perihelion precession under K, must have been evidence for S. Since \(X_{S}\) entails that fact,

$$\begin{aligned} p\left( S|P \& X_{S} \& K\right) >p\left( S|P \& K\right) \end{aligned}$$
(23)

(for any such extension \(p\left( \cdot \right)\)). Scientists had long known that (\(\lnot X_{C})\) C fails to provide an adequate deductive-nomological explanation for P. So, K includes \(\lnot X_{C}.\) Thus,

$$\begin{aligned} p\left( S|P \& X_{S} \& \lnot X_{C} \& K\right) >p\left( S|P \& K\right) \end{aligned}$$
(24)

(for any such extension \(p(\cdot )\)). This counterexample to SOT comes from the real history of science, and there are likely to be many similar cases. Indeed, Example 2 was another case in which the special relativity theory was confirmed because it was seen, through logical reasoning, to predict already observed data.Footnote 37

Admittedly, the above defense of EIB needs further elaboration. For instance, we have not discussed whether EIB can be successfully used when an agent realizes that a hypothesis provides an adequately unified explanation of observed phenomena (Kitcher, 1989). Unfortunately, a more detailed discussion of this and other related issues is beyond the scope of this paper. Still, two points should be noted. First, McCain and Poston write:

Given the ubiquity of IBE in everyday life and the sciences, we find it surprising that William Roche and Elliott Sober (2013) have recently attempted to show that explanatoriness is evidentially irrelevant. (McCain & Poston, 2014)

For this reason, it is unlikely that statistical data screen off explanatory information in all ordinary and scientific inquiries. So there must be a nonempty set of realistic cases in which one can update the credal state in accordance with EIB. Second, no matter how broad the range of cases in which such data screen off such information, EIB will be a better hybrid model of inference for the remaining no-screening-off cases than the extra boost view or constraint compatibilism, for the reasons discussed in the earlier sections.

7 Conclusion

This paper defended a new hybrid model of scientific reasoning, EIB. First, I developed it by combining imprecise Bayesianism with IBE. When applied to examples, EIB returned the plausible verdict that evidence tends to favor a hypothesis that provides a better explanation of it. Then, I compared it with two other hybrid models. Unlike the extra boost view, EIB does not expose those who comply with it to the risk of a diachronic Dutch book. Moreover, whenever the given agent’s priors are precise and not extended, EIB maximizes the expected practical and epistemic utilities of her future actions and credences. In comparison with constraint compatibilism, EIB is a better model of reasoning for both descriptive and normative purposes. Finally, EIB applies to all cases in which frequency data do not screen off explanatory information. Overall, EIB is the best of the three models.