A Counterexample to Modus Tollens

This paper defends a counterexample to Modus Tollens, and uses it to draw some conclusions about the logic and semantics of indicative conditionals and probability operators in natural language. Along the way we investigate some of the interactions of these expressions with knows, and we call into question the thesis that all knowledge ascriptions have truth-conditions. A probabilistic dynamic semantics for probability operators, conditionals, and acceptance attitudes is developed around the idea of representing the common ground of a conversation as a set of probability spaces.

A marble is selected at random and placed under a cup. This is all the information given about the situation. Against this background, the following claims about the marble under the cup are licensed: (P1) If the marble is big, then it's likely red. (P2) The marble is not likely red.
However, from these, the following conclusion does not intuitively follow: (C1) The marble is not big.
But this conclusion would follow, were Modus Tollens (MT) valid. So MT is not generally valid.
We do well to put the point in various ways. For example: it is entirely possible to believe (P1) and (P2), but fail to believe (C1), even after full and complete rational reflection. Contrariwise, if one believed only (P1) and (P2), and on the basis of these concluded (C1), one would be making a mistake. Further, it is possible to suppose (P1) and (P2), without (C1) in any way following from one's suppositions. Further, one can be given the information that (P1) and (P2) express, without the information given by (C1)'s being part of one's information in any way.
I take it that the probability operators likely and probably are synonyms, so I will use them interchangeably. Schematically, our case is of this form: φ → probably ψ; ¬probably ψ; therefore ¬φ, where φ, ψ are themselves assumed to be probability operator-free. 1 This argument form is invalid. Since it is just a special case of MT, it is a counterexample to the claim that MT is a generally valid pattern.
Let me stress before continuing that I am by no means the first to advance a counterexample to MT. Lewis Carroll's barbershop paradox [5] effectively supplies a counterexample to MT, one employing a right-nested indicative conditional. 2 Frank Veltman developed a similar counterexample in his pioneering dissertation, citing Carroll for inspiration [30]. On a natural reading, Forrester's gentle murder paradox [7] calls attention to what looks like a counterexample to MT involving deontic modals in the consequent. 3 Building on Forrester, [4] suggests that MT admits of failures in cases wherein the consequent is deontically or epistemically modalized. Kolodny and MacFarlane [15] make the same claim. 4 Aside from examples involving modalized or conditional consequents, we might also reach for examples involving consequents which superficially appear to contain adverbs of quantification. For instance:
It is not the case that the alarm always sounds.
If there is a break-in, the alarm always sounds.
There is no break-in.
If one is worried about a break-in, the premises here are less than comforting. 5
1 And epistemic modal-free. (Probability operators just are epistemic modals, in the relevant semantic respects; see [18,37].) Or at least, where whatever probability operators/epistemic modals there be in φ, ψ are safely nestled in the appropriate embedded context: under 'John believes', for instance. Unless otherwise noted, all of my schematic letters range over 'unhedged' sentences of this sort.
2 Cleaned up, here are Carroll's premises: If Carr is out, then if Allen is out, Brown is in; and it's not the case that if Allen is out, Brown is in. The first we know because we know one of the three is in; the second we know because Allen never leaves without Brown. But we cannot conclude from our premises (by MT) that Carr is in. After all, if Allen is in, Carr might well be out.
My objective in this note is not to defend all of these counterexamples, though I am sympathetic with most of them. Rather, my aim is to defend just one, the one we began with, in a sustained way, and in a way which meets some replies left unmet in existing discussions. I focus on the interaction of the conditional with probability operators for three reasons: first, because the intuition that the relevant patterns are invalid is relatively clear; second, because probability operators don't admit of the diversity of possible readings that many other ordinary modal auxiliaries do; and third, because the role of probabilities in the interpretation of conditionals is of considerable independent interest.
A further advantage of focusing on probability operators is that they embed comparatively easily under knows. Once we motivate our counterexample to MT, we will show how it can be leveraged to learn something about the logic of knowledge operators. Specifically, we will show how it can be leveraged to argue that at least some knowledge ascriptions do not have truth-conditions. 6

Objections and Replies
Here are three possible replies to our counterexample. The first reply is that I have misrepresented the logical form of (P1). The probability operator in the sentence is really taking scope over, not under, the conditional operator; and as a result the pattern is a non-instance of MT. The second reply is that, thanks to the context-sensitivity of likely, its semantic contribution is not constant across the premises, and so what is negated in (P2) is not the consequent of (P1). The third reply is indirect: it merely stresses the abundance of cases wherein MT is obviously valid, and says that it will always be more reasonable to suppose that something is suspect with my example than to give up MT. Let me take these objections in reverse order.
We can be brief with the third response. There are indeed many instances of MT which are semantically valid. 7 Notable are those cases wherein the conditional is free of explicit modals ('bare conditionals'). If we snip the probability operators from our premises, for example: (P1′) If the marble is big, then it is red. (P2′) The marble is not red.
The conclusion (C1) certainly does seem to follow. Does this counteract the force of our counterexample to MT? It is hard to see how. What we should want is simply a semantics and definition of consequence which match the data: an account which will validate the inference from (P1′) and (P2′) to (C1), but which will invalidate the inference from (P1) and (P2) to (C1). (And do the same for all structurally parallel examples.) Now if there were some principled difficulty in giving an account that could do this, that might be a theoretical motivation to reconsider our judgments about our counterexample. But there is no such difficulty. There are existing accounts of indicative conditionals and of probability operators that will do the job. 8 Meanwhile, let us agree wholeheartedly that MT is valid, except for when it is not. 9
7 One valid instance occurred in the first paragraph of this paper, when I concluded that MT is not generally valid.
8 See, inter alia, [9,10,15,30,35,37] for relevant work and the appendix for explicit examples.
9 Could one retreat to the claim that MT should be construed as a generalization only about bare conditionals? This would leave out valid patterns that seem to be instances of MT: for example (where '♦' corresponds to epistemic possibility): 'If φ, then ♦ψ; ¬♦ψ; therefore ¬φ.' We could deny that this is an instance of MT if we like, but it is difficult to see what advantage there is to talking this way.
Turn now to the second reply, that the context-sensitivity of likely is what generates the illusion of a counterexample. Now there is no question that probability operators exhibit context-sensitivity. For instance, when one says a certain outcome is likely, what is said depends on what the alternatives to that outcome are taken to be in context (see [37]). The question is, what positive reason is there to think that there is some illicit context-shifting going on between our two premises (P1) and (P2)? There appears to be little independent motivation for the idea. In conversation I have heard some cite the feeling that the thing being said to be likely in (P1) is a conditional probability, namely, the conditional probability that the marble is red if big, whereas the thing said to be likely in (P2) is not a conditional probability. Or to put essentially the same thought in less loaded terms, the idea is that the likely in (P1) is in some sense semantically evaluated with respect to more information than the likely in (P2). This is taken to be evidence in support of the idea of illicit context-shifting.
It seems to me the intuition that the likely in (P1) is in some sense semantically evaluated with respect to more information than the likely in (P2) is surely correct. But the inference from this intuition to illicit context-shifting is a non sequitur. There is no tension between (i) the intuition that the thing said to be likely in (P1), but not (P2), is a conditional probability, and (ii) the idea that the semantic contribution of the probability operator in (P1) and (P2) is the same. To square these, we merely require the assumption that probability operators are sensitive in their interpretation to a parameter which is semantically shiftable by other linguistic expressions, and in particular by conditional environments. And there is already solid evidence that modal operators are in general sensitive to some such semantically shiftable parameter; indeed, that has claim to being the standard view of the matter. 10 So, bracketing some independent evidence, appealing to context-sensitivity and context-shifting here is quite unmotivated.
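The shiftable-parameter idea can be made concrete with a small sketch. It is only an illustration under stated assumptions: the credal weights, the threshold of one half for likely, and all function names (`prob`, `restrict`, `likely`, `conditional`, `negation`) are hypothetical choices made here, not the paper's official semantics. The point it shows is that one fixed meaning for likely can return different verdicts inside and outside a conditional, because the conditional shifts the information state that likely is evaluated against.

```python
from fractions import Fraction

# Hypothetical credal state: weights over (size, colour) possibilities.
STATE = {
    ("big", "red"): Fraction(2, 10),
    ("big", "blue"): Fraction(1, 10),
    ("small", "red"): Fraction(2, 10),
    ("small", "blue"): Fraction(5, 10),
}

def prob(state, prop):
    """Probability of prop (a predicate on worlds), normalised within the state."""
    total = sum(state.values())
    return sum(p for w, p in state.items() if prop(w)) / total

def restrict(state, prop):
    """Shift the information parameter: keep only the worlds where prop holds."""
    return {w: p for w, p in state.items() if prop(w)}

def likely(prop):
    """A single, fixed meaning for 'likely': a test on whatever state it is given."""
    return lambda state: prob(state, prop) > Fraction(1, 2)

def conditional(antecedent, consequent):
    """'If A, C': evaluate C against the state shifted by A (assumed non-empty)."""
    return lambda state: consequent(restrict(state, antecedent))

def negation(sentence):
    return lambda state: not sentence(state)

is_big = lambda w: w[0] == "big"
is_red = lambda w: w[1] == "red"

likely_red = likely(is_red)           # the very same clause occurs in both premises
p1 = conditional(is_big, likely_red)  # (P1) If the marble is big, it's likely red.
p2 = negation(likely_red)             # (P2) The marble is not likely red.

print(p1(STATE), p2(STATE), prob(STATE, is_big))
```

In this toy state both premises are accepted while the possibility that the marble is big keeps positive weight, and no change in the meaning of `likely_red` is needed to get that result.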
We come finally to the first objection. Again, this objection says that at the relevant level of logical form, (P1) is not well-schematized by φ → probably ψ. Rather, the compositionally revealing logical form is something else, namely: probably (φ → ψ). Hence our first premise is not really a conditional; and hence our argument is not really an instance of MT.
The first point to make here is a clarificatory one. The worry I press about MT depends only on the availability of the narrow scope reading of probably, not on the unavailability of the wide scope reading. So the theorist who insists that probably takes wide scope in (P1) carries a strange burden: he must explain what obligatorily rules out the possibility of the scopal order →, probably (that is, the superficial order).
We should be clear about the nature of this burden. It would be one thing if we could detect some semantic difference between these two allegedly logically possible scopal orders, and then simply declare that intuitively, the scopal order in (P1) is obligatorily probably, →. This style of argument certainly works in principle. 11 But the trouble is, if there are two scopal orders logically possible for sentences like (P1), it seems that they yield semantically equivalent readings. 12 And if they yield equivalent readings, what motivation is there for maintaining that the probability operator takes, and must take, wide scope? For this assumption would not be required to explain how the sentence is interpreted. Rather, what would need explaining from a semantic point of view is just why the two logically possible scopal orders yield equivalent readings.
10 See [18]. For further discussion see [35].
11 To give an example of a case where this style of reasoning works, consider: "Everyone probably lost the lottery". In this sentence there are two logically possible scopal orders for the interpretation of probably and the quantifier everyone, each of which would yield a (truth-conditionally) different reading of the sentence. But intuitively, only the reading with probably taking wide scope is actually available. See [32].
So the burden is on the wide-scoper to motivate her view. (I take it "Otherwise, MT would fail" is not adequate motivation.) Moreover the burden appears to be a very difficult one to carry, for at least the following two reasons.
First, from (P1) together with (1) The marble is big. we can apparently conclude (2) The marble is probably red.
by Modus Ponens (MP). But this could only be an instance of MP if the probability operator in (P1) does not have wide scope, and hence only if the problematic scopal order is in fact available. So the wide-scoper is obliged to deny a whole class of what seem to be routine applications of MP. Second and more damaging, we have problems with conjunctive consequents whose conjuncts are of variable modality. Plainly (3) If Sally is at the party, then Isaac is at the party and Steve is probably at the party.
is not equivalent to: (4) Probably, if Sally is at the party, then Isaac is at the party and Steve is at the party.
For instance (3), together with (5) Sally is at the party. entails (6) Isaac is at the party.
So it is clear that we should not represent the probability operator in (3) as taking widest scope in the sentence (let alone mandatorily taking wide scope). If we are working under the hypothesis that the sentence contains a conditional operator, the natural thing to say is that it takes scope over the probability operator. But if the conditional operator may take scope over the probability operator in (3), we should of course expect that it may in (P1). Pleading wide scope is thus unmotivated. Other things being equal, we may assume that scope-taking expressions can assume the relative scopes that they superficially occupy. If things are not equal here, we are owed some account why.
12 The point that sentences superficially of the form if φ, probably ψ and sentences of the form probably, if φ, ψ generally strike us as semantically equivalent has been made by many others; for instance [29]. See [25] for the view that there are some subtle cases in which these readings can in fact be teased apart.

Flank Attack from the Restrictor Analysis
There is another reason to worry about the general validity of MT. This is a very different worry. It is the worry that the compositional semantics of conditional constructions is such that required notions of antecedent and consequent do not really make sense. Let me explain.
In accord with most philosophical discussions of conditionals, we have so far assumed that conditional meanings are the result of the semantic contribution of a single dyadic conditional operator. But many find a second idea about the semantics of conditionals much more plausible. On this idea, we assume that modals are much like quantifiers, and we suppose that if-clauses are devices for restricting the quantificational force of modals. (The idea is most famously developed by [16,17], building on [20]; see also [12].) Just as quantificational determiners combine with a (perhaps tacit) domain restriction and a nuclear scope, so modals combine with a (perhaps tacit) modal restriction and a matrix clause. Supposing that quantificational sentences have this kind of structure: The idea would be that conditional constructions are analogous: As quantifiers express quantification over individuals, so modals express quantification over possibilities. As quantifiers express relations between properties, so modals express relations between propositions. On this view, it isn't that if adds to natural language a new dyadic modal operator. Rather, various modal operators of natural language, operators traditionally supposed to be monadic, are now hypothesized to be fundamentally dyadic in nature. Moreover, if-clauses mark restrictions and express no modality on their own. Multiple if-clauses may correspond to multiple restrictions, but needn't entail the presence of multiple modals. Where no modal operator is superficially apparent in a conditional construction, the presence of a tacit epistemic necessity modal is typically assumed. 13 As for our (P1) above, the relevant modal operator would be probably.
Now it is clear that if this kind of analysis is correct, the inference we began with is not plausibly an instance of MT. For on such an analysis, what is negated in (P2) is not even a constituent in (P1); a fortiori, what is negated is not the consequent of (P1). (I take it the consequent of a conditional, whatever else it is supposed to be, is at least a constituent of the conditional.) This may appear to be good news for the Modus Tollens Lobby, for it eliminates our counterexample. On the contrary: it is bad news. For the larger upshot of this analysis is that MT is based on a mistake. MT is most naturally taken as a generalization concerning a certain dyadic modal operator, the conditional operator. But there is no such operator according to the restrictor analysis. There is no single sort of dyadic modal operator figuring in every conditional, or even in every indicative conditional. Rather, there are just various dyadic modal operators, corresponding to the various modals of natural language. And MT certainly does not characterize all of these operators.
One way to bring this out is to see that there is no stable notion of 'antecedent' and 'consequent' in the setting of the restrictor analysis which could vindicate MT. Suppose we try to construct an instance of MT from (P1). How to proceed? We need only add a premise which negates its consequent. But what is its consequent? We cannot identify its consequent with its matrix clause, for that would, absurdly, make the following an instance of MT: P1. If the marble is big, then it's likely red. P2′. The marble is not red. C1. The marble is not big.
It would also, absurdly, make the following an instance of MP: P1. If the marble is big, then it's likely red. P2 . The marble is big. C1. The marble is red.
The larger upshot is that if modals are really just quantifiers over worlds and if-clauses just ways of restricting these quantifiers explicitly, it is not obvious that we should expect there to be any distinctive logic of conditionals going beyond the logic of generalized quantifiers.
Note further that the restrictor analysis problematizes MP just as it does MT, for just the same reasons. It seems to me that if there is a good reason to doubt MP over and above MT, it is this abstract one stemming from the consequences of the restrictor analysis. 14 For MP, unlike MT, does not give rise to the same abundance of apparent counterexamples. 15

Truth and Consequence
My brief against MT is essentially complete. Let me close it by anticipating a line of concern I expect some readers will have.
Consider the following argument: (i) The semantic value of an indicative conditional relative to context is, or has, a possible worlds truth-condition. (ii) If the semantic value of an indicative conditional relative to context is, or has, a possible worlds truth-condition, then the only plausible truth-condition for it will be a truth-condition making MT valid. (iii) So MT is valid.
The argument above is of course intuitively valid, 16 so if our preceding conclusions about MT are correct, one or both of the premises must be rejected. Now, it is not our burden to explain where every argument in support of MT goes wrong, and we are not particularly obliged to respond to the above argument until its premises are defended. But as many readers will want to embrace both of the above premises, it is worth pausing to note some respects in which these premises are both highly nontrivial, indeed contentious. This will help clarify the burden on those wishing to defend the premises, and it will help to clarify what rejecting MT does and does not entail.
Some would embrace (i) for the following reason: they believe (plausibly) that indicative conditionals have a compositional semantics, and they also believe: (iv) If indicative conditionals have a compositional semantics, then (i) must be true.
14 Kolodny and MacFarlane [15] reject MP in addition to MT. They do so, not in the face of any worry stemming from the restrictor analysis, but rather en route to solving a certain puzzle about sentences expressing conditional obligation (the miners puzzle). Here I wish to note two points: (i) In connection with solving the miners puzzle, the particular way of defining consequence Kolodny and MacFarlane adopt is not superior to rival definitions which would validate MP, such as the notion called 'informational consequence' in [35]. (ii) Kolodny and MacFarlane's preferred formalization of consequence has a problematic feature: the semantic analogue of the deduction principle (i.e., the principle that if Γ, φ ⊨ ψ, then Γ ⊨ φ → ψ) fails. That suggests an unexpected disconnect between conditionals and consequence. (The rival notion of consequence recently mentioned, by contrast, does vindicate this principle.)
15 The best known putative counterexamples to MP are due to [23]. In his discussion, McGee crucially assumes a certain strong connection obtains between the notion of having good reason to believe and the notion of consequence MP (putatively) characterizes. Had I the space I would question this connection, and I would worry about whether McGee's examples really have the status of linguistic explananda.
16 Not to say that it is valid because MP is; as recently noted, that is controversial.
The thesis (iv) seems to be presupposed with little or no argument in many discussions of conditionals. 17 But it is a quite nontrivial thesis. Indeed, in light of the existence of well-motivated compositional semantic systems for conditionals which effect no straightforward semantic association between indicative conditionals and possible worlds truth-conditions, 18 it appears to be false. The larger point, in any case, is that there is no in-principle tension between accepting compositionality and rejecting (i).
Others would embrace (i) because they believe (plausibly) that indicative conditionals participate nontrivially in valid arguments, and they also believe the following two claims: (v) If indicative conditionals participate nontrivially in valid arguments, then they have truth values. (vi) If indicative conditionals have truth values, then (i) must be true.
One will believe (v) if one assumes that consequence is to be modeled in terms of truth-preservation, roughly along Tarskian lines: a valid argument is one such that if the premises are true, the conclusion must be true. Many discussions of consequence proceed under the assumption that adopting this way of characterizing consequence is a theoretically innocent move. But let us be explicit that this is not so. Whether consequence should be modeled in terms of truth-preservation is a substantive and debated question. Indeed, in the context of natural language semantics, the view that consequence is to be modeled in terms of truth-preservation is really a (high-level) empirical thesis. (One with a number of competitors. 19 ) For a view about the compositional semantics of indicative conditionals in natural language only makes robust predictions when paired with some characterization of consequence. It is really the package of the two that makes predictions. In response to data, either part of the package may in principle be revised. And amongst the data that must be factored into the cost-benefit analysis for any given formal characterization of consequence are our judgments about the argument discussed at the opening of this paper. Thus it would make no sense (indeed, it would get things quite backward) to reject the counterexample merely on the grounds that it sits uneasily with a Tarskian definition of consequence. This is why, incidentally, our counterexample is not an attempt to describe a world with respect to which the premises are true, but the conclusion false. To assume that this is what would be required to refute MT is to illicitly assume something like (i), and a Tarskian account of consequence, in advance. But one of the issues the counterexample raises just is the question whether such a notion of consequence could be adequate for modeling natural language. 20
The thesis (vi) is also substantive, although here the reason is more technical.
A sentence may have a compositional semantic value which determines a truth-value with respect to a point of evaluation (in a model), without the relevant points of evaluation needing themselves to be possible worlds (or context-world pairs). In such a setting, indicative conditionals might have truth-values relative to points of evaluation, but fail to have truth-values with respect to worlds in any interesting sense; that is, they might fail to correspond to a way the world might be. 21 So (vi), too, is nontrivial.
Finally, consider (ii). If validity is understood in terms of truth-preservation, and if (i) is true, then it is indeed difficult to see what the possible worlds truth-conditions of indicative conditionals could be if MT is not valid for indicatives. MT is of course valid on the leading possible world semantics for conditionals, namely the Stalnaker-Lewis analysis [19,27,28]; and it is valid on the two other most widely-discussed truth-conditional accounts, namely the strict conditional analysis and the material conditional analysis. But the difficulty here is mainly for the case in which we assume, additionally, that the conditional involves a dyadic sentential operator. If instead we adopt the Kratzer-Lewis style syntax for conditionals described in the previous section, it is perfectly clear how (i), but not (ii), could hold; just see [17]. Suffice to say there is nothing trivial about (ii).
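The claim that MT is valid on the material and strict conditional analyses can be checked by brute force in small cases. The sketch below is illustrative only. The material check runs over the full truth table. The strict check assumes, as a simplification made here, a universal accessibility relation over three worlds, and so it illustrates the pattern for that toy setting; it is not a general proof. The function names are hypothetical.

```python
from itertools import product

def material(a, c):
    """Material conditional: false only when the antecedent is true and the consequent false."""
    return (not a) or c

def mt_valid_material():
    """Search the truth table for a row with both premises true and the conclusion false."""
    for a, c in product([True, False], repeat=2):
        premises_true = material(a, c) and (not c)
        if premises_true and a:  # conclusion 'not a' would be false here
            return False
    return True

WORLDS = range(3)  # assumption: three worlds, each accessible from every world

def mt_valid_strict():
    """Propositions are tuples of truth values indexed by world."""
    for a in product([True, False], repeat=len(WORLDS)):
        for c in product([True, False], repeat=len(WORLDS)):
            # The strict conditional holds iff the material conditional holds at all worlds.
            strict = all(material(a[v], c[v]) for v in WORLDS)
            for w in WORLDS:
                if strict and (not c[w]) and a[w]:
                    return False  # would be a countermodel to MT at w
    return True

print(mt_valid_material(), mt_valid_strict())
```

No countermodel turns up in either search, which is the truth-preservation behavior the paragraph above attributes to these analyses when the consequent is a plain, unmodalized sentence.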

Knowledge Operators in MT Inferences
The evidence suggests that MT is either not generally valid, or based upon a mistake. We should take this result as a constraint on semantic theory. We should like a semantic theory for indicative conditionals and for probability operators which, together with an appropriate formal characterization of consequence, shows MT to be invalid in the kind of case we have discussed.
In this section I want to consider a question about the scope of the failure of MT. Setting aside the syntactic worry for the moment, let us pretend the notions of 'antecedent' and 'consequent' do make sense. Above we noted that MT inferences wherein the consequent lacks an overt modal generally strike us as valid. We can add that when modals appearing in the consequent are nestled in some appropriate embedded context, the result is also valid. For example the following MT inference, with a probability modal appearing in the consequent under a belief operator, is valid: (P3) If the marble is big, then John believes it is probably red. (P4) John does not believe that the marble is probably red. (C1) The marble is not big.
20 A relevant comparison for the notion of consequence at work here might be with the theoretical notion of grammaticality in natural language syntax. The syntactician will often characterize subjects as judging that some sentences are grammatical and that some are not. When she does this, she is employing a theoretical notion of grammaticality in an empirically driven enterprise, one which may be (usually is) alien to the subjects being described. Her use of this notion, and her characterization of subjects, is justified insofar as it plays a role in a theory which best explains a target range of facts: in this case, certain linguistic capacities of the relevant subjects. I am taking it that the notion of consequence employed in natural language semantics has a parallel theoretical status.
21 See the works cited in footnote 19 above.
What constitutes an "appropriate embedded context"? I will not attempt to answer that question in full generality here, but let us consider the special case of the factive attitude verb knows.
Initial appearances suggest nothing different from believes. Consider: (P5) If the marble is big, then John knows it is probably red. (P6) John does not know that it is probably red. (C1) The marble is not big.
The argument is intuitively valid. Trouble is not far off, however. To see the difficulty, consider first: (8) # If the marble is not big, then it is probably big.
I take it this conditional is incoherent. I take it also that if we add a knowledge operator to the consequent, the result is still marked and uninterpretable: (9) # If the marble is not big, then John knows it is probably big.
That (9) is defective is unsurprising, given that (8) is defective. For (9) seems obviously to entail (8); and sentences that obviously entail defective sentences are generally defective themselves. 22 Predictably, if we negate the consequent of (9), the result is acceptable: (10) If the marble is not big, then John does not know it is probably big. Now to come to the difficulty, fix your attention upon (10). Let us consider an MT inference involving it. We add the relevant minor premise: (11) John knows the marble is probably big. And we ask whether the following conclusion follows: (1) The marble is big.
22 The entailment from (9) to (8) would follow from the factivity of knowledge operators (Kφ ⊨ φ) and from transitivity for indicatives. Transitivity for indicatives is controversial in some quarters; see [3]. The fact that transitivity would help explain the defect in (9) (by reducing it to the defect in (8)) is a reason to favor it.
To clarify, we are asking about arguments fitting this schema: ¬φ → ¬K(probably φ); K(probably φ); therefore φ. The validity of arguments fitting this schema would follow from MT and Double Negation. We have seen no independent reason to question Double Negation, so I will just assume without question that if the argument here is invalid, it is another case of the failure of MT. Intuitive judgments about this example may be less clear than they were with our original counterexample to MT. But consider the following line of thought: If the marble is not big, then (a fortiori) it is not the case that it is probably big. Hence it is not the case that anyone, for instance John, knows that the marble is probably big. So the conditional (10) is really quite trivial. Now to this triviality, let us add an assumption about John's knowledge state, namely the assumption that John knows that the marble is probably big. This assumption obviously does not, by itself, entail that the marble is big. So why think that it would together with a triviality? Thus the inference from (10) and (11) to (1) is invalid; and similarly for any argument of the same schematic form.
If this line of reasoning is not sound, then presumably what must be rejected is the idea that (10) is in some sense trivial. But it is difficult to see why we should not regard it as trivial. Doesn't the marble's not being big preclude its likely being big, and hence preclude anyone's knowing that it is likely big?
In support of the thought, we can note that there is undoubtedly some logical tension between (12) and (13): (12) The marble is not big. (13) John knows the marble is probably big.
The conditional (9) clearly illustrates this. We observe a tension also when we simply attempt to hypothetically entertain their conjunction: (14) # Suppose the marble is not big and John knows the marble is probably big.
As the trouble with (9) is rooted in the factivity of knows together with the badness of (8), so the trouble with (14) is plausibly rooted in the factivity of knows together with the badness of (15): (15) # Suppose the marble is not big and the marble is probably big. (cf. [35]) These considerations favor the idea that the inference from (10) and (11) to (1) is indeed invalid.
Is there a way of pushing back against this conclusion? Consider the following rejoinder: Granted (12) and (13) seem incompatible. If they are incompatible, however, then the truth of the latter would entail the negation of the former. But that would be absurd: If we are given merely that John knows the marble is probably big, it does not follow that the marble is big. So we should resist the superficially compelling idea that (12) and (13) are incompatible.
We should agree that it would be absurd to hold that (13) entails the negation of (12). But why can't we reject that entailment, and also maintain that the two sentences are incompatible? To assume that we cannot, as the italicized remark above does, is to beg the question in favor of a classical account of consequence. And a natural thought is that these data call classicality into question if anything does. On balance, what these data prima facie suggest is simply that we need a semantics and an account of consequence according to which (12) and (13) are incompatible, despite the latter's not entailing the negation of the former. That is, generally speaking, we should like a semantic theory equipped with an account of consequence according to which:
(i) ¬φ, K(probably φ) ⊨ ⊥
(ii) ¬φ ⊨ ¬K(probably φ)
(iii) K(probably φ) ⊭ φ
Here (i) is supported by data like (9) and (14); (ii) is supported by the factivity of knowledge plus the idea, suggested by (8) and (15), that ¬φ ⊨ ¬probably φ; and (iii) is obvious. Or to put it differently, the properties are desiderata for our semantic theory because the following are desiderata:
(i′) ¬φ, probably φ ⊨ ⊥
(ii′) ¬φ ⊨ ¬probably φ
(iii′) probably φ ⊭ φ
and because knows is factive. Obviously a classical account of consequence could not satisfy (i)-(iii) together. (Or (i′)-(iii′) together.) Yalcin [35] notes that it is difficult to see how to reconcile (i′)-(iii′) with possible worlds truth-conditions for probably φ-sentences. Given the way consequence is usually defined in the setting of possible worlds semantics, 23 no truth-conditions could satisfy these demands. To this we can add that the same is true for K(probably φ)-sentences, with respect to (i)-(iii). Given the way consequence is usually defined in the setting of possible worlds semantics, there is no way of associating K(probably φ) with possible worlds truth-conditions that would satisfy these constraints.
23 I have in mind a formalization of consequence along the following lines: φ1, ..., φn ⊨ ψ just in case for any world w, if ⟦φ1⟧^w = 1, ..., ⟦φn⟧^w = 1, then ⟦ψ⟧^w = 1, for all sentences φ1, ..., φn, ψ.
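The conflict with classical consequence can be illustrated with a small exhaustive search. This is a sketch under assumptions made here: a toy space of four worlds, propositions modeled as sets of worlds, and consequence defined as truth-preservation at every world. It looks for a proposition X, standing in for the truth-condition of a sentence like probably φ or K(probably φ), such that X is incompatible with ¬φ and yet X does not entail φ. The names `entails`, `subsets` and `candidates` are hypothetical.

```python
from itertools import combinations

WORLDS = frozenset(range(4))  # assumption: a toy space of four worlds

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def entails(premises, conclusion):
    """Classical consequence: every world verifying all premises verifies the conclusion."""
    common = WORLDS
    for p in premises:
        common = common & p
    return common <= conclusion

def candidates():
    """Pairs (phi, x) where x is incompatible with not-phi yet does not entail phi."""
    found = []
    for phi in subsets(WORLDS):
        not_phi = WORLDS - phi
        for x in subsets(WORLDS):
            incompatible = entails([not_phi, x], frozenset())  # premises entail the absurd
            non_entailing = not entails([x], phi)
            if incompatible and non_entailing:
                found.append((phi, x))
    return found

print(candidates())
```

The search comes back empty, and the reason is general: if no world verifies both X and ¬φ, then every X-world is a φ-world, which is just what it is for X to entail φ on this definition.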
Where does this leave us? First, it leaves us tentatively of the view that the inference from (10) and (11) to (1) is indeed invalid, as is anything of the same schematic form. A probability operator in the consequent position of a conditional will invalidate MT reasoning on that conditional even when the probability operator is embedded under a knowledge operator, which is in turn embedded under a negation operator. Second and more interestingly, it leaves us with some evidence against the idea that all knowledge ascriptions can be associated with possible worlds truth-conditions. Thanks to the factivity of knows, knowledge ascriptions inherit the unusual semantic features of the epistemic modal sentences they embed.

Closing
How to understand the failure of MT, and this surprising conclusion about the truth-conditions for knowledge ascriptions embedding epistemic modals? Our objective here has been entirely negative: the aim was just to shift the burden to those who would take the general validity of MT as a desideratum for a theory of conditionals. We have already noted a number of contemporary frameworks in which MT is rejected. Let me close by sketching the direction of explanation I favor, gestured at in various places above, without pretending to fully motivate it here (see [35,38], and the Appendix below for further discussion).
We observed that it is entirely possible to believe (P1) and (P2) while failing to believe (C1), even after full and complete rational reflection. 24 Let us step back and ask: what exactly is it to believe (P1)? Suggestion: it is, ideally, to be in a credal state giving the outcome that the marble is red better-than-even odds, conditional on the marble's being big. What is it to believe (P2)? Suggestion: roughly, it is to be in a credal state that gives the outcome that the marble is red even-or-lower odds. What is it to believe (C1)? Suggestion: it is to be in a state whose content rules out the possibility that the marble is big.
If these suggestions are on track, then it is clear enough why it can be rational to believe (P1) and (P2), but fail to believe (C1). We can make the matter clearer if we model idealized credal states, and information states generally, as probability spaces. Then trivially there will be credal states which satisfy the requirements that (a) Pr(red | big) > .5 and (b) Pr(red) ≤ .5, but which nevertheless fail to rule out big. (Indeed there will be rational credal states satisfying (a) and (b), and yet also such that Pr(big) > .5.)
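The existence claim in the preceding paragraph can be checked by brute force. The following is a minimal sketch, not part of the formal proposal: it assumes that the two constraints are Pr(red | big) > .5 and Pr(red) ≤ .5 (as suggested by the glosses on believing (P1) and (P2)), it partitions logical space into four hypothetical cells (big or not, red or not), and it only samples credal states whose probabilities are multiples of 1/10. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Four mutually exclusive cells, each a pair (is_big, is_red). Illustrative only.
CELLS = [(True, True), (True, False), (False, True), (False, False)]

def credal_states(steps=10):
    """Yield every probability assignment to CELLS in multiples of 1/steps."""
    for a, b, c in product(range(steps + 1), repeat=3):
        d = steps - a - b - c
        if d >= 0:
            yield {cell: Fraction(n, steps) for cell, n in zip(CELLS, (a, b, c, d))}

def pr(state, predicate):
    """Probability of the set of cells satisfying the predicate."""
    return sum(p for cell, p in state.items() if predicate(cell))

def is_big(cell):
    return cell[0]

def is_red(cell):
    return cell[1]

def satisfies_a(state):
    """Assumed constraint (a): Pr(red | big) > .5, defined only when Pr(big) > 0."""
    pr_big = pr(state, is_big)
    if pr_big == 0:
        return False
    return pr(state, lambda c: is_big(c) and is_red(c)) / pr_big > Fraction(1, 2)

def satisfies_b(state):
    """Assumed constraint (b): Pr(red) <= .5."""
    return pr(state, is_red) <= Fraction(1, 2)

# States meeting both constraints while making 'big' more likely than not.
witnesses = [s for s in credal_states()
             if satisfies_a(s) and satisfies_b(s) and pr(s, is_big) > Fraction(1, 2)]
```

Every state in `witnesses` satisfies both constraints and yet, far from ruling out big, assigns it probability above one half. Exact fractions are used instead of floats so that the boundary comparisons at .5 are not affected by rounding.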
To get from these observations back to semantics, note that on this way of thinking about what it is to believe sentences such as (P1) and (P2), believing what they say is not tantamount to ruling some possibilities in or out. (At least not without nontrivial further assumptions.) It is instead a matter of one's doxastic state's satisfying certain global features, features that do not simply reduce to the way the state represents the world to be. This works nicely with the idea already motivated, namely that the semantic values of probablyφ-sentences are not given by possible worlds truth-conditions (functions from possibilities to truth-values, sets of possibilities, ways of dividing logical space, ways the world might have been, etc.). It points to a different idea about the semantic values of our target sentences, namely the idea that they correspond to constraints (not on the way the world is, but) on states of information. What we should want out of the compositional semantic values of sentences like (P1) and (P2) are just constraints on states of information, of the sort delivered by (a) and (b). Formally such constraints correspond to sets of information states, or sets of probability spaces, if that is how we elect to model states of information like credal states. The simple thought here is that with such sentences we can (inter alia) express aspects of our credal states, aspects which do not correspond in any straightforward way to a view about the way the world is. 25

This gives us a new kind of object to design a compositional semantics around. Instead of possible worlds truth-conditions, we have constraints on states of information, or 'probability conditions', if states of information are probabilistically modelled. (Cf. the notion of a "probasition" in [14].) Now suppose we could compositionally associate sentences in general with something like probability conditions. Then consequence could be modeled as a relation between a set of sentences (premises) and a sentence (conclusion) which holds when the satisfaction of the constraints expressed by the premises suffices for the satisfaction of the constraint expressed by the conclusion, for any given information state. It would be a relation that preserves probability conditions. Such a model of consequence would predict the failure of MT we have pointed to. And it may even allow that although (C1) does not follow from (P1) and (P2), the three statements together are not jointly consistent (i.e., not jointly satisfied by any information state).
The central burden of this approach, of course, is to compositionally associate sentences in general with probability conditions, and to motivate that semantics over rival accounts on a broad array of data. See [35,37,38] and the Appendix for a start on discharging this burden.
Finally, what of the apparently non-truth-conditional character of knowledge ascriptions embedding epistemic modals like probably? We have just suggested that having a view about what is likely is not, or not merely, to have a view about which possibilities are still open; it is also a matter of how one distributes the probabilities over the possibilities one takes to be open. Given this, the same holds about having a view concerning whether some other agent knows, where what is at issue is whether he knows that something is likely. To agree that John knows that the marble is probably big is partly to have the view that the marble is probably big, which is itself not a view about a purely factual matter, that is, not purely a matter of what kind of possibilities one rules out. And this yields a restricted form of nonfactualism about the state of knowing itself. 26

And both employ the notion of an information state:

Def. An information state $i$ in $M$ is a pair $\langle s, \Pr \rangle$ of a set of worlds $s$ (some subset of $W_M$; call it the domain of $i$) and a function $\Pr$ assigning the elements of some Boolean algebra of subsets of $W$ a number in $[0, 1]$ satisfying the following:

An information state is a probability space conditionalized on a primitively given set of possibilities. Semantically these will play the role of epistemic possibilities in each system below.

Probabilistic Static Semantics
Our first formal semantics builds upon [22,35,37]. It is a static account assigning truth values to sentences relative to indices:

Def. An index in $M$ is any world-information state pair $\langle w_M, i_M \rangle$.
The semantics takes the form of a recursive characterization of truth at an index for sentences of L, as follows:

Def. For any $M$, a valuation $[\![\cdot]\!]^{w,i}$ for $M$ is a function assigning either 0 or 1 to each wff relative to each index $\langle w, i \rangle$ in $M$ subject to the following constraints, where α is any propositional letter, φ and ψ are any wffs:

(Where $[\![\varphi]\!]^{i}$ is an abbreviation for $\{w : [\![\varphi]\!]^{w,i} = 1\}$.) Observe negation and conjunction have their classical semantics. To add the indicative conditional operator →, two further definitions will be helpful:

Def. An information state $i$ accepts φ iff $\forall w \in s_i : [\![\varphi]\!]^{w,i} = 1$.
Acceptance is meant to track the intuitive idea of an information state incorporating the information associated with a sentence. We also want the idea of the nearest information state to a given information state accepting a sentence.
Def. The nearest information state to $i$ accepting φ, $i + \varphi$, is defined as follows:

I assume the usual definition of conditional probability in terms of unconditional probability. With these definitions in place, we add the following clause to our recursive characterization of truth with respect to an index:

Observe the connection between indicative conditionals and conditional probabilities this semantics supplies. Finally, we define consequence over the semantics. As in [35], we understand the consequence relation to preserve, not truth with respect to an index, but rather acceptance, as follows:

Def. $\varphi_1, \ldots, \varphi_n \models \psi$ iff no information state which accepts $\varphi_1, \ldots, \varphi_n$ fails to accept $\psi$.

Now it is easy to see why MT fails on this account. Considering our initial counterexample, our semantics associates the premises with the following acceptance conditions:

(P1) If the marble is big, then it's likely red.
$\Pr_i(\mathit{red} \mid \mathit{big}) > .5$
(P2) The marble is not likely red.
$\Pr_i(\mathit{red}) \le .5$

These conditions can be satisfied simultaneously with the constraint corresponding to the negation of the MT conclusion:

(C1) The marble is not big.
Examples of information states with these properties are trivial to construct. 27 For independent motivation of various aspects of the above semantics and further discussion, see [10,15,22,25,26,35,37,38]. 28 For extensions to attitude verbs, see [26,35,38]. Anand and Hacquard [1] contains additional relevant discussion.

27 For the sake of explicitness: consider $i$ where $s_i = \{w_1, w_2, w_3\}$, $\Pr_i(\{w_1\}) = .4$, $\Pr_i(\{w_2\}) = .3$, $\Pr_i(\{w_3\}) = .3$, only $w_1, w_3 \in \mathit{big}$, and only $w_1 \in \mathit{red}$.

28 Note that on the above semantics, expressions with stacked epistemic modals (e.g., ♦♦φ) will generally be semantically equivalent to the corresponding expression with the most narrow modal (♦φ). Stacked epistemic modals often are interpreted as vacuous in this way, especially when the modals stacked are the same (a phenomenon called modal concord); other times, such stacking is just anomalous; still other times, such stacking does allow for coherent interpretations not equivalent to the corresponding expression with the most narrow modal. The latter case is not provided for by the above semantics. In such cases I would be inclined to appeal to tacit shifting of the information state parameter, akin to free indirect discourse. See [35,36] for some discussion. The issue is beyond the scope of this appendix.
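The information state given in footnote 27 can be checked mechanically against the acceptance conditions for (P1), (P2), and (C1). The sketch below is only an illustration: it encodes the state exactly as the footnote describes it, treats propositions as sets of worlds, and assumes that a probability-operator-free sentence such as (C1) is accepted just in case it is true at every world in the domain, as the definition of acceptance together with classical negation would suggest. The variable and function names are hypothetical.

```python
from fractions import Fraction

# Information state from footnote 27: domain s_i and measure Pr_i on singletons.
domain = {"w1", "w2", "w3"}
measure = {"w1": Fraction(4, 10), "w2": Fraction(3, 10), "w3": Fraction(3, 10)}

# Propositions modelled as sets of worlds.
big = {"w1", "w3"}
red = {"w1"}

def pr(prop):
    """Pr_i of a proposition: sum the measure over its worlds in the domain."""
    return sum(measure[w] for w in domain & prop)

def pr_given(prop, condition):
    """Conditional probability by the usual ratio definition."""
    return pr(prop & condition) / pr(condition)

# Acceptance conditions stated in the text for the two premises.
accepts_p1 = pr_given(red, big) > Fraction(1, 2)   # Pr_i(red | big) > .5
accepts_p2 = pr(red) <= Fraction(1, 2)             # Pr_i(red) <= .5

# Assumed acceptance condition for (C1): 'not big' holds at every world in the domain.
accepts_c1 = all(w not in big for w in domain)

print(accepts_p1, accepts_p2, accepts_c1)  # prints: True True False
```

Here Pr_i(red | big) = .4/.7, which exceeds one half, while Pr_i(red) = .4; the state therefore accepts both premises but does not accept (C1), since w1 and w3 are big-worlds in its domain.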

Probabilistic Dynamic Semantics
Our second formal semantics essentially extends the semantics of [31] (see [2] for a nice overview), incorporating ideas described in [34,35,37]; see [40] for detailed discussion. First, rather than taking sentential semantic values to be functions from points of evaluation to truth values, we take them to be functions from information states to information states (update functions, or context change potentials), joining the dynamic tradition going back to [12]. Second, rather than modeling information states as sets of possible worlds (as in Veltman's update semantics) or as files (as in Heim's file change semantics), we continue to take them to be probability spaces as defined above. Sentences are recursively associated with update functions as follows: 29

Def. A dynamic valuation [·] is a function from wffs of L to functions from information states to information states subject to the following constraints, where α is any propositional letter, φ and ψ are any wffs:

The semantics for negation and conjunction here in essence go back to [12,13]. The idea for the treatment of epistemic possibility modals here goes back at least to [31]. The conditional semantics here is a probabilistic analogue of the semantics of [9]. 30 Note the tight connection between indicative conditionals and the corresponding conditional probabilities built into this account.
Consequence may be defined in various ways over this semantics (see [2,31]). A relatively conservative choice, and one adequate for our purposes, would be the following:

Ultimately, the failure of MT here is explained in the same fashion as in the static semantics. An information state which accepts the premises need not accept the conclusion.

29 I am indebted to Justin Bledin here for suggestions which led to a considerable simplification of the following semantics.

30 An alternative clause for the conditional, in the spirit of [13], would be the following:
The dynamic semantics presented here presupposes a thesis I have elsewhere [34,35] called context probabilism:

Context probabilism: the common ground of a conversation is characterizable as a probability space, or as a set of such spaces.
(The static semantics above also naturally lends itself to this thesis, given the definition of consequence assumed there.) Now the question arises, should a state of presupposition be modeled as a probability space (an information state), or a set of such spaces? If we stick to a single probability measure, then if states of presupposition are assumed to be perfectly coordinated in the ideal case, the problem immediately arises how agents can be expected to coordinate on the probabilities of myriad propositions that go undiscussed in context. It seems implausible that propositions which are completely open in conversation must be assigned precise probabilities; among other things, these probabilities would reflect nothing about how we are conversationally coordinated.
We can escape this problem by supposing that states of presupposition, hence informational contexts, are representable by sets of probability spaces (sets of information states). A conversation which presupposes nothing about a proposition p is one which leaves open every possible way of associating a probability with it. That is to say, for every probability n, there will be an information state i in the common ground such that $\Pr_i(p) = n$. On this approach, the common ground is representable as a probability condition. Then we can define an update on sets of information states I in a manner parasitic on our recursive characterization of updates for single information states, as follows: 32

For any set of information states $I$, $I[\varphi] = \{i \in I : i[\varphi] = i\}$.

This has the additional advantage of supplying a more substantive account of the conversational dynamic change initiated by the epistemic language for which we have supplied a 'test'-like semantics (i.e., epistemic modals, probability operators, and indicative conditionals). Appeal to sets of probability spaces may also be ultimately needed to handle disjunction; see [26].

31 I will be harmlessly loose about use and mention for expressions in teletype.

32 I am inspired by a structurally analogous idea developed by [33].
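The update on sets of information states, I[φ] = {i ∈ I : i[φ] = i}, keeps exactly those states that are fixed points of the single-state update. The sketch below illustrates this under loudly stated assumptions that go beyond the text: information states are reduced to a toy two-world representation, the single-state update for "probably p" is assumed to be a pure test that returns the state unchanged when Pr(p) > .5 and a failure value (`None`, a hypothetical stand-in) otherwise, and the common ground is sampled only at multiples of 1/10 instead of containing every probability value.

```python
from fractions import Fraction

def make_state(pr_p):
    """Toy information state: a dict giving the probability of a p-world and a not-p-world."""
    return {"p_world": pr_p, "not_p_world": 1 - pr_p}

def update_probably_p(state):
    """Assumed test-like update: the state itself if Pr(p) > .5, otherwise None (failure)."""
    return state if state["p_world"] > Fraction(1, 2) else None

def update_set(states, update):
    """I[phi] = {i in I : i[phi] = i}: keep the fixed points of the single-state update.

    A list stands in for the set, since dicts are not hashable.
    """
    return [i for i in states if update(i) == i]

# A common ground presupposing nothing about p, coarsely sampled at multiples of 1/10.
common_ground = [make_state(Fraction(k, 10)) for k in range(11)]

# After the update, only states on which p is more likely than not remain.
after = update_set(common_ground, update_probably_p)
```

On this toy picture, asserting a test-like sentence does real work at the level of the common ground: it eliminates information states, even though it never alters any individual state that survives.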
As goes the state of presupposition, so too, I think, should go other acceptance-like attitudes: for example, supposition, belief, and knowledge. We can assume that these states too determine sets of probability spaces. For example, suppose to any agent x and possible world w there corresponds the set of probability spaces $B^{w}_{x}$ which reflect that agent's state of belief, an idea familiar from the formal epistemology literature. Then we can extend our semantics to belief ascriptions as follows: [26], who develops a similar idea in a different semantic setting.) It is easy to imagine an analogous semantics for knowledge ascriptions. There we mainly need to incorporate the fact that knowledge ascriptions generally presuppose their complements. Suppose, in the spirit of [2], we introduce an artificial presupposition operator ∂:

Letting an agent x's state of knowledge in w be the set of information states $K^{w}_{x}$, we can give a dynamic semantics for knowledge ascriptions as follows:

This aims to capture presupposition projection via the percolation of undefinedness in the calculation of a sentence's context change potential, as in [12]. Knowledge ascriptions will go undefined with respect to contexts which do not already accept φ, the desired result.
On what seems to me a perfectly natural interpretation, these formal proposals about what it is to accept and assert probabilistic sentences, whether developed statically or dynamically, yield a kind of expressivism about this fragment of language. If we restrict attention to those cases where the language is used to express or describe doxastic states, we could call the view credal expressivism. Again, to be in a state of mind which accepts a sentence of the form φ is generally not, on this picture, merely a matter of representing the world as being some particular way-not merely a matter of ruling some possibilities in and others out. It is a matter also of how one distributes the probabilities over the possibilities.
Once we see this way of formalizing the idea of expressing credence, another possibility (the idea of expressing preference, or utility) comes into view. Like states of credence, states of preference are not states that merely rule some possibilities in and others out. Just as the preceding story shows how we might use language to coordinate on something other than a way of representing how the world is, so we can imagine an analogous story about the language by which we express preference and utility. As epistemically modalized and probabilistic sentences correspond to conditions on information states, so too might we describe deontically modalized sentences as expressing conditions on the allocations of utility one's state of mind leaves open. As we added probabilistic structure to the common ground, so too might we add utility-theoretic structure. If that worked out, epistemic modals and deontic modals would be, in a sense, the Bayesian modalities, and we would have what we could call Bayesian expressivism. I investigate the possibility of this view elsewhere [39].

33 I am indebted here to conversations with Julien Dutant.