A Minimal Probability Space for Conditionals


One of the central problems in the theory of conditionals is the construction of a probability space in which conditionals can be interpreted as events and assigned probabilities. The problem has been given a technical formulation by van Fraassen [23], who also discussed in great detail the solution in the form of Stalnaker Bernoulli spaces. These spaces are very complex: they have the cardinality of the continuum, even if the language is finite. A natural question is, therefore, whether a technically simpler (in particular finite) partial construction can be given. A partial answer has been given by Węgrecki and Wroński [24].
In this paper we provide a new solution to the problem. We show how to construct a finite probability space S# = (Ω#, Σ#, P#) in which simple conditionals and their Boolean combinations can be interpreted. The structure is minimal in terms of cardinality within a certain, naturally defined class of models; an interesting side-effect is an estimate of the number of non-equivalent propositions in the conditional language. We demand that the structure satisfy certain natural assumptions concerning the logic and semantics of conditionals and also that it satisfy PCCP. The construction can be easily iterated, producing interpretations for conditionals of arbitrary complexity.
The structure of the paper is as follows. In Section 1, Introductory remarks on probabilities of conditionals, we provide the general background and formulate the problem. In Section 2, Toy examples, we present the construction of S# = (Ω#, Σ#, P#) in the simplest cases. In Section 3, The general definition of S#, we present the formal definition of the probability space S# = (Ω#, Σ#, P#). In Section 4, The formal properties of S#, we show that S# indeed has the required properties. In Section 5, Minimality of the model, we show that the model is minimal in the class of models satisfying a natural condition and present the estimates of the number of elements and non-equivalent formulas of L(→). In Section 6, The iteration, we show that the construction can be iterated, so that models for conditionals of arbitrary complexity may be obtained in this way. In Section 7, A short comparison with the Stalnaker Bernoulli model, we make the mentioned comparison. Section 8 is a short summary. Appendix 1 contains the proof that PCCP indeed holds within our model. Appendix 2 contains the proof of the minimality result.

Introductory Remarks on Probabilities of Conditionals
We are confronted with conditionals in scientific discourse, in everyday situations, and of course in philosophical analysis. We make statements like If I had drunk my morning coffee, I would not have a headache or If I became a football player, I would be happy, and, intuitively, we ascribe probabilities to such claims. When discussing scientific matters, we think about If the temperature was higher, this piece of metal would melt, and if we are discussing political fiction we can consider If Reagan worked for the KGB, I'll never find out ([18], 155) or the famous Oswald-Kennedy examples: If Oswald didn't kill Kennedy, someone else did and If Oswald hadn't killed Kennedy, someone else would have, given in Adams [2].
There is a lively and intense discussion on the proper logical and probabilistic description of conditionals. Standard propositional calculus with the conditional interpreted as a material implication is not an attractive solution. It is not clear what the proper logic of conditionals should be, i.e., the appropriate axioms and rules of inference. Is the Import-Export Principle concerning nested conditionals true? Is any version of the probabilistic Independence Principle valid? And presenting an appropriate semantics poses difficulties, in particular difficulties in accounting for the truth values of conditionals; truth conditions themselves also pose notorious problems.
The general problem of the paper might be formulated as follows: Given a set of possible worlds endowed with a probabilistic structure, i.e., S = (Ω, Σ, P), we want to define a probabilistic structure S# = (Ω#, Σ#, P#) in such a way that: (i) simple conditionals can be interpreted in S#; (ii) natural semantic assumptions concerning the conditional connective → are satisfied; and (iii) PCCP holds.
It is far from obvious what the class of natural semantic assumptions is, so we have to make a few choices. Here we accept the assumptions (I)-(IV) which have been discussed, for instance, in van Fraassen [23]; they include, e.g., the identities (A → B) ∧ (A → C) = A → (B ∧ C), A ∧ (A → B) = A ∧ B, and (IV) (A → A) = K (the set of all possible worlds). They are verified for our model in Section 4.
We make the uncontroversial assumption that the conditional is not a material implication. Our aim is modest, so we take A, B, C to be Boolean sentences. Observe that (I)-(IV) are semantic properties, which are independent of the probability measure. PCCP is the only postulate concerning the probability measure:

(PCCP) P(A → B) = P(B|A).

There are several formal constructions of probabilistic models for conditionals in the literature. Stalnaker Bernoulli spaces present an impressive construction [11-14, 23]. They allow one to give an interpretation of all conditionals and to compute their probabilities. However, a Stalnaker Bernoulli space has the cardinality of the continuum, so it is rather big. Elementary events are infinite sequences, and only very few of them have a natural interpretation. The probability measure is defined on a kind of cylindric subset of the set of all sequences. If we are interested, for instance, only in simple conditionals A → B, then this space seems to be much too big for our purposes.

Another example is the Markov graph model presented in Wójtowicz and Wójtowicz [25, 27]. Here, for a given conditional α, a Markov graph G(α) is defined which gives rise to a probability space S(α) which, in turn, allows one to produce a natural interpretation of α and to compute its probability. The original construction is limited to the particular conditional α in question. The generated probability space S(α) is countably infinite, and its structure is very simple.

However, we want to do even better and present a finite, minimal probability space which allows us to model a possibly large class of conditionals. We can say that the ontological commitments are kept minimal in the model to be presented here.

We start with a finite probability space S = (Ω, Σ, P). Ω = {W_1, …, W_n} might be thought of as the set of possible worlds; Σ is a σ-field and P is a probability distribution defined on the subsets of Ω. As Ω is finite, the σ-field Σ is 2^Ω, i.e., the power set of Ω, so there are no problems or subtleties concerning Σ and the measurability of the subsets of Ω. Any subset of Ω can be considered to be a proposition; in particular Ω represents the logical truth T and the empty set ∅ represents the contradiction ⊥. This means that if Ω consists of n elementary events (worlds), the language in question has 2^n propositions. The Boolean connectives ∧ and ¬ correspond to the intersection ∩ and the complement X^c of sets of worlds (events). So what we call the "Boolean language" is the language corresponding to the set of possible worlds, and propositions from that language are identified with the appropriate subsets of Ω.

Notation: We shall use roman letters when referring to events from the space S = (Ω, 2^Ω, P) and italics when referring to the corresponding propositions: for instance, we write B instead of {B} or A ∨ B ∨ C instead of {A, B, C}. In some cases (especially in the examples), we use letters like B, G, R, W to denote elementary events in Ω (i.e., individual worlds). In the general case X, Y will denote arbitrary subsets X, Y ⊆ Ω, and X, Y are the corresponding sentences of the language L. We are interested in the conditional language L(→) (extending L) which contains simple conditionals (and their Boolean combinations).

We know (by definition) when a world W from Ω satisfies a Boolean sentence A ∈ L, i.e., when W ⊨ A. But, apart from trivial cases, the structure S = (Ω, 2^Ω, P) is not suited to provide an interpretation to conditionals.
This means that we need to define a new structure S# = (Ω#, Σ#, P#) in which the conditional language L(→) can be interpreted. In this structure Ω# is the new set of elementary events, Σ# is the σ-field (in our case it will be trivial, i.e., the power set of Ω#), and P# is the probability distribution. We can think of the set of elementary events Ω# as the set of fine-grained possible worlds (which somehow depend on the original worlds from Ω). The term "more fine grained possible worlds" is used in Stalnaker and Jeffrey [22] and refers to non-basic entities that are nonetheless capable of making a conditional true or false. The elementary events ω ∈ Ω# play this role.
Two questions therefore arise: which sentences of L(→) does a given ω ∈ Ω# make true, and what are their probabilities? Answering them amounts to providing interpretations to sentences from the language L(→) within S# and to assigning probabilities to them. We shall use the symbol [α]# to denote the interpretation of α within Ω#, i.e., [α]# = {ω ∈ Ω# : ω ⊨ α}. The probability of α is standardly defined as the probability of the set of worlds which make α true, i.e., as P#([α]#). The probabilities of Boolean sentences X ∈ L are expected to be preserved in S#, which means that P#([X]#) = P(X).
A Short Comment on PCCP

PCCP is the acronym for "Probability of Conditionals is Conditional Probability". Adams's exact definition is as follows: the probability of a conditional P(A → B) is P(B|A) [1-4]. It is important to stress that it is meant as a definition of P(A → B): no probability space is constructed. Adams's proposal is very simple and conforms to our intuitions in a large class of cases. For instance, the probability of the conditional If it is even, then it is a six is intuitively 1/3 (for a fair die), and this is the conditional probability P(It is a six | It is even). However, if we want to use the notion of probability in a mathematically sound fashion, we need to define a probability space in which such intuitive claims are given a sound mathematical underpinning.

Footnote 13: PCCP is also called "Stalnaker's Thesis": Stalnaker [20] has been very influential and important for the discussion. There are differences between Adams's and Stalnaker's versions: Stalnaker speaks of conditional degrees of belief, while Adams originally formulated his claims in terms of assertability. Moreover, Adams's definition is given for simple conditionals only, i.e., X → Y, where X, Y are Boolean sentences (i.e., not containing the conditional operator). Stalnaker's thesis also applies to compound conditionals, in spite of Adams's claim that "we should regard the inapplicability of probability to compounds of conditionals as a fundamental limitation of probability, on a par with the inapplicability of truth to simple conditionals" [3]. See [21] for a non-technical presentation, or Khoo [15]; see also Khoo and Santorio [16] for a comprehensive discussion.
Footnote 14: "What is the probability that I throw a six if I throw an even number, if not the probability that if I throw an even number, it will be a six?" [23, 273].
Footnote 15: Hájek [6] shows that any non-trivial finite-ranged probability function has more distinct conditional probability values than distinct unconditional probability values. This means that if PCCP holds, the original probability space is not the right one. See also Hájek [7, 8] for an extensive discussion of the PCCP problem, including in particular its importance for decision theory. Hájek and Hall [9] discuss Lewis's [17] triviality results and argue that PCCP cannot be upheld as a general rule. However, see also Wójtowicz and Wójtowicz [26] for a discussion of Lewis's proof: the authors suggest that it is based on an unjustified application of the law of total probability.
Before we give the formal definition, we first present some toy examples in order to make it more intuitive.

Three Balls
Consider three balls in an urn: G, R, W (Green, Red, White), with equal probabilities of being drawn (1/3). The sample space S = (Ω, 2^Ω, P) consists of three elementary events: Ω = {G, R, W}. Imagine that someone draws the white ball and considers alternative scenarios:

"If I did not draw the white ball, I would draw a green ball with a probability of ½ or a red ball with a probability of ½." The same applies to people drawing the green and red balls. We can imagine that drawing the white ball is the beginning of a certain process of drawing balls from the urn (without returning the balls). After drawing the white ball, both the green and the red ball can be drawn (with equal probabilities). The sequence ends when the last ball is drawn. We shall denote these scenarios by WGR and WRG. The first letter identifies the color of the first ball, the second letter indicates the second ball, and the third letter indicates the third and final ball. There is a total of six possible sequences in our scenario.
These sequences correspond to the permutations of the set {G, R, W} -and we take these permutations to be elementary events in the new probability space S # which means that Ω # = {GRW, GWR, RGW, RWG, WGR, WRG}.
Intuitively, all the permutations have equal probability, i.e., 1/6. Indeed, each ball has an initial probability of 1/3 to be drawn. And after the first ball is drawn, each of the remaining balls has a probability of ½. The last ball of course has a probability of 1. In the general case the probabilities will be assigned in a slightly more complex way, but the intuitions will remain the same.
Consider the conditional ¬W → G, i.e., the sentence (G ∨ R) → G. When is it true in the new space S# = (Ω#, Σ#, P#)? I.e., which of the elementary events (worlds) in Ω# make (G ∨ R) → G true? We can think about it in terms of winning a bet on (G ∨ R) → G: every permutation ω ∈ Ω# corresponds to a possible game scenario and decides whether the game is won or lost. Imagine we generate at random one of the six elementary events in Ω# (i.e., sequences of balls). If we draw the green ball first (which means that ultimately we can produce one of the sequences GRW or GWR), we certainly win, as we have drawn a green object. If we draw a red ball (i.e., the first element of RGW or RWG), we lose (we have drawn a red object). The interesting cases are the permutations starting with the white ball. Informally speaking, such a permutation can either have the "dominating tendency" towards green or towards red. If we draw the "greenish" sequence WGR, we win. If we draw the red-like sequence WRG, we lose. This means that [¬W → G]# = {GRW, GWR, WGR}. Each of the three permutations has a probability of 1/6, so P#([¬W → G]#) = ½, as expected.
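These numbers are easy to check mechanically. The following Python sketch is ours (the helper name satisfies_not_w_arrow_g is an illustrative assumption, not part of the formal apparatus); it enumerates Ω# for the three-ball example and applies the rule that ¬W → G is settled by the first non-white ball:

```python
# A minimal sketch: the permutation space for the three-ball example,
# checking that P#([¬W -> G]#) = 1/2 under the uniform distribution.
from fractions import Fraction
from itertools import permutations

balls = ["G", "R", "W"]
omega_sharp = ["".join(p) for p in permutations(balls)]   # the 6 permutations

def satisfies_not_w_arrow_g(perm):
    """What-comes-first rule: the first non-white ball in perm is green."""
    first_non_white = next(b for b in perm if b != "W")
    return first_non_white == "G"

event = sorted(p for p in omega_sharp if satisfies_not_w_arrow_g(p))
prob = Fraction(len(event), len(omega_sharp))              # uniform probabilities
print(event)   # ['GRW', 'GWR', 'WGR']
print(prob)    # 1/2
```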

Four Balls
Consider four balls B, G, R, W (to keep matters simple, they are equally likely to be drawn, with a probability of ¼ each). Someone who draws the white ball might think about different scenarios: "If I did not draw the white ball, I would have three remaining options, all equally probable. So I could have drawn green or red or black with equal probabilities of 1/3." But there are 4 colors, so they might also have more complex reflections on possible scenarios, for instance thinking: "If I did not draw white, I could draw green (with a probability of 1/3). But if neither of these happened, there would still be two possibilities left (black, red), and the chances of either of those would be equal." Metaphorically speaking, we might think of the basic color in this case (white) as having a dominating tendency (green), which in turn has some still weaker "subtendency" (black). This possibility is represented by the permutation WGBR: the first ball is white, the second ball, i.e., the "dominating tendency", is green, the third ball, i.e., the "second best tendency", is black. Observe that the last element of the permutation (i.e., red) is uniquely determined by the first three elements. Intuitively, the probability of this sequence is ¼ · 1/3 · ½ · 1 = 1/24. There are six permutations beginning with W, which correspond to different alternative sequences beginning with the White ball. The same applies to the other three balls.
It is convenient to think about the permutations as representing the process of drawing balls from the urn without returning them. At each stage the probabilities change, as there are fewer balls in the urn.

What is ω ⊨ α?
Consider again the conditional (G ∨ R) → G. We have to decide which of the 24 permutations (sequences of balls) make (G ∨ R) → G true. Obviously, when we draw the green ball first, we consider the sentence to be true (regardless of the tendency, subtendency, etc.). This is the case for any of the 6 permutations GBRW, GBWR, GRBW, GRWB, GWBR, GWRB; we consider the conditional to be true (we shall use the symbol G*** to denote any of these six permutations). When we draw the red ball first, we lose.
We also have permutations starting with B or W. For them the decision is also simple: we have to decide whether, metaphorically speaking, they are strongly attracted to green or to red; if the tendency is green, (G ∨ R) → G is true. If they have a red tendency, it is false. This means that we are interested in the order in which the green and the red balls have been drawn. If we choose one of BGRW, BGWR, WGRB, WGBR, BWGR, WBGR, we win. If we choose one of BRGW, BRWG, WRBG, WRGB, BWRG, WBRG, we lose. Finally, [(G ∨ R) → G]# consists of the six permutations G*** together with the six "winning" permutations just listed.

But this means that the set [(G ∨ R) → G]# consists of 12 elements, and its probability is ½, as expected.
The heuristic rule we use in order to decide when ω ⊨ (G ∨ R) → G is simple, but very important: the ball which comes first decides, i.e., ω makes (G ∨ R) → G true exactly when, among the balls G and R, it is G that appears first in ω. In a slightly different formulation, the rule states that the first element of ω which has the property G ∨ R also has the property G. Indeed, [(G ∨ R) → G]# consists of all the permutations where G precedes R. The What-comes-first-rule will be used in the general definition. It is convenient to think of X → Y in terms of "the first element of ω with property X also has property Y" (i.e., "the first X-like element of ω is also Y-like"). For instance, even → prime means that the first even number to appear in the permutation ω is also a prime number.
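The rule is straightforward to implement. The sketch below is our own illustration (the function satisfies_conditional is a hypothetical helper, not defined in the paper); it encodes the What-comes-first-rule and re-counts the 12 permutations making (G ∨ R) → G true:

```python
# A sketch of the What-comes-first-rule:
# omega satisfies X -> Y iff the first element of omega lying in X also lies in Y.
from itertools import permutations

def satisfies_conditional(omega, X, Y):
    """Return True iff the first element of omega with property X has property Y."""
    for element in omega:
        if element in X:
            return element in Y
    return False   # no element of X occurs in omega (only possible for empty X here)

balls = ("B", "G", "R", "W")
X, Y = {"G", "R"}, {"G"}                       # the conditional (G ∨ R) -> G
event = [om for om in permutations(balls) if satisfies_conditional(om, X, Y)]
print(len(event))                              # 12 of the 24 permutations
```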

Arbitrary Probabilities of the Balls B,G,R,W
Consider the four events (balls) B, G, R, W as having arbitrary probabilities in the sample space: for instance, P(B) = s, P(G) = p, P(R) = q, P(W) = r, with 0 < p, q, r, s < 1 and p + q + r + s = 1. The structure of Ω# and the relationship ω ⊨ are fixed and independent of the initial probability distribution on Ω. Our task is therefore to assign probabilities to the permutations, i.e., to the elementary events ω ∈ Ω#.
The probability assignment is very intuitive if we think in terms of constructing a permutation by drawing balls from the urn (of course, without returning the drawn ball). Take for instance BGRW. The probability of drawing B is s. The probability of drawing G from the remaining three balls is p/(p+q+r). The probability of drawing R from the remaining two balls is q/(q+r). Finally, the chance of drawing W from the remaining one ball is 1. This means that the probability of BGRW is s · p/(p+q+r) · q/(q+r).

Footnote 17: We might think in terms of playing a game which decides (G ∨ R) → G. This is done by drawing balls from the urn and waiting for an answer. If we draw a ball which is neither G nor R, we have to continue until a decisive ball is chosen. If it is G, we win. Similarly, once we see an R, we lose.
Footnote 18: Consider the conditional (G ∨ R ∨ W) → (G ∨ R): it is true in ω if the first of the balls G, R, W to appear is green or red (in other words: a green or a red ball comes before the first white ball). The probability of this event is 16/24 = 2/3, as expected. In the case of (G ∨ R) → (G ∨ B) we ask whether, among the G ∨ R balls, it is a G ∨ B ball that comes first, or a non-G ∨ B ball, i.e., an R ∨ W ball. We win if G precedes R in the permutation.
Footnote 19: The same scheme applies to any of the permutations. If we consider permutations which start with a B, we have the following formulas for their probabilities: P#(BGRW) = s · p/(p+q+r) · q/(q+r); P#(BGWR) = s · p/(p+q+r) · r/(q+r); P#(BRGW) = s · q/(p+q+r) · p/(p+r); P#(BRWG) = s · q/(p+q+r) · r/(p+r); P#(BWGR) = s · r/(p+q+r) · p/(p+q); P#(BWRG) = s · r/(p+q+r) · q/(p+q). We can think of these permutations as corresponding to the sequences which begin with the Black ball. Their joint probability is equal to P(B) = s. This is consistent with our toy example: if p = q = r = s = 1/4, every entry has the probability of 1/24.
We know that [(G ∨ R) → G]# consists of the twelve permutations in which G precedes R. After a slightly tedious but simple computation, we obtain the result P#([(G ∨ R) → G]#) = p/(p+q), as expected. This means that P#([(G ∨ R) → G]#) = P({G} | {G, R}). This is not a surprise, as we will show that PCCP holds in the model in general, i.e., that P#([X → Y]#) = P(Y|X) for all X, Y with P(X) ≠ 0.
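The computation can also be delegated to a machine. The following sketch is ours (the numerical values of p, q, r, s are arbitrary choices made only for illustration); it implements the chain-rule probabilities of permutations and confirms that the permutations in which G precedes R have total probability p/(p + q):

```python
# A sketch verifying the toy computation: with arbitrary probabilities p, q, r, s
# the event [(G ∨ R) -> G]# gets probability p/(p + q) = P(G | G ∨ R).
from fractions import Fraction
from itertools import permutations

P = {"B": Fraction(1, 10), "G": Fraction(2, 10),
     "R": Fraction(3, 10), "W": Fraction(4, 10)}     # s, p, q, r (sum to 1)

def perm_prob(omega):
    """Chain rule: draw the balls of omega one by one without replacement."""
    prob, remaining = Fraction(1), Fraction(1)
    for ball in omega:
        prob *= P[ball] / remaining
        remaining -= P[ball]
    return prob

def first_g_or_r_is_g(omega):
    return next(b for b in omega if b in {"G", "R"}) == "G"

total = sum(perm_prob(om) for om in permutations(P) if first_g_or_r_is_g(om))
print(total)                          # 2/5
print(P["G"] / (P["G"] + P["R"]))     # 2/5, i.e. p/(p+q)
```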

The Definition of S #
Consider the general case when Ω = {W_1, …, W_n} and P(W_i) = p_i, for i = 1, …, n. To simplify the notation, we shall denote the elementary events with the numbers 1, 2, …, n. For n = 6, a natural illustration is a die.
Obviously, the Boolean sentences X ∈ L need to be interpretable in S # and their probabilities P(X) need to be preserved.

The Set of Elementary Events Ω #
Take Ω# to be the set of all permutations of the set {1, 2, …, n} (i.e., the set of elementary events is the permutation group S(n)). The σ-field Σ# is trivial and consists of all subsets of Ω#.

The Relation ω ⊨ α
The definition is based on the What-comes-first-rule. Take a conditional X → Y, for X, Y ⊆ Ω. We first consider the case when X is non-empty. The case of X = ∅ is discussed in Section 4.2.4.
Alternatively, we might also say that in the permutation ω the first element of X ∩ Y comes before the first element of X ∩ Y^c. The formalized version of the definition is: ω ⊨ X → Y iff there is an i such that ω(i) ∈ X ∩ Y and ω(j) ∉ X for all j < i (where ω(i) is the i-th element of the permutation ω, for i = 1, …, n). According to the definition, the set [X → Y]# consists of all permutations ω in which the first element with the property X also has property Y. For instance, if Ω = {1, 2, 3, 4, 5, 6}, [even → prime]# consists of all the permutations in which the first even number to appear is also a prime number (i.e., 2). The set [even → prime]# contains 240 permutations (like, for instance, 123645, 524361, 135264), i.e., 1/3 of all of the 720 permutations. For an n-sided die there are n! permutations, but the rules do not change.
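The count of 240 can be confirmed by brute force, as in the following short sketch (ours, with a hypothetical helper name):

```python
# A quick check of the die example: among all 720 permutations of {1,...,6},
# count those in which the first even number is prime (i.e. equals 2).
from itertools import permutations

def first_even_is_prime(omega):
    first_even = next(k for k in omega if k % 2 == 0)
    return first_even == 2       # 2 is the only even prime on a die

count = sum(1 for om in permutations(range(1, 7)) if first_even_is_prime(om))
print(count)                     # 240, i.e. 1/3 of 720
```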
For the sake of clarity, we briefly address the conditional X → ∅ (with X ≠ ∅). Intuitively, the probability of such a conditional should be 0, and this is so in our model. The fact that [X → ∅]# = ∅ is consistent with the What-comes-first-rule: indeed, [X → ∅]# consists of those permutations ω where the first element with the property X has no property at all. Clearly, the set of such permutations is empty.
We also need to stipulate how Boolean sentences have to be interpreted within S#, but this is an obvious matter. For instance, the Boolean sentence B is true of every permutation beginning with a B.

The Probabilities of the Elementary Events ω ∈ Ω #
The general idea of defining the probability of a permutation ω ∈ Ω # has already been presented in Section 2.
As before, think of drawing balls from the urn (without returning them) and producing permutations in this way. For instance, what is the chance of obtaining 123…n? The chance of drawing 1 in the first move is p_1. After the first draw there are (n−1) possibilities left. The chance that the second chosen element is 2 is p_2/(1−p_1). Now there are (n−2) possibilities left. The chance that the third element is 3 is p_3/(1−p_1−p_2). And so on: at each step there are fewer remaining possibilities. At the last step there is only one possibility, and p_n/(1−(p_1+p_2+⋯+p_{n−1})) = 1. This means that the probability of generating the permutation 123…n is:

P#(123…n) = p_1 · p_2/(1−p_1) · p_3/(1−p_1−p_2) · … · p_n/(1−(p_1+⋯+p_{n−1})).

In the general case of the permutation k_1…k_n (k_1, …, k_n being different natural numbers between 1 and n) the notation is slightly more cumbersome, but nothing changes. The probability of the permutation is:

P#(k_1 k_2 … k_n) = p_{k_1} · p_{k_2}/(1−p_{k_1}) · p_{k_3}/(1−p_{k_1}−p_{k_2}) · … · p_{k_n}/(1−(p_{k_1}+⋯+p_{k_{n−1}})).

Formally, this is just the chain rule of elementary probability theory. In this way we define the probability measure P# on Ω#.
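As a concrete instance of the formula (with numbers chosen only for illustration): take Ω = {1, 2, 3} with p_1 = 1/2, p_2 = 1/3, p_3 = 1/6. Then P#(231) = p_2 · p_3/(1−p_2) · p_1/(1−p_2−p_3) = (1/3) · (1/4) · 1 = 1/12, and the six probabilities defined in this way sum to 1, as they should.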

The Imbedding of S Within S #
The old space S can be imbedded into the new space S# in an obvious way: the counterpart of the elementary event A ∈ Ω is the set A# of permutations beginning with A. Of course, P(A) = P#([A]#). If we think in terms of the ball metaphor, the imbedding of a Black ball into Ω# (i.e., in the space of permutations) is the set consisting of all the sequences beginning with the Black ball. For our toy example, B# = {BGRW, BGWR, BRGW, BRWG, BWGR, BWRG}.
In the general case, if X ⊆ Ω then the image i(X) of X is the set of permutations beginning with an element of X, i.e.: i(X) = {ω ∈ Ω# : ω(1) ∈ X} (for any permutation ω, ω(1) is the first element of ω). It is easy to see that P#(i(X)) = P(X), so the probabilities are preserved, as required.
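Indeed, for a single world a ∈ Ω the set {ω : ω(1) = a} has probability p_a (the chain-rule factors for the remaining positions of the permutation sum to 1), so P#(i(X)) = Σ_{a∈X} P#({ω : ω(1) = a}) = Σ_{a∈X} p_a = P(X).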

Conjoined Conditionals and Negation
If α and β are two sentences from L(→), the rules for the interpretation of their Boolean combinations are very simple: [α ∧ β]# = [α]# ∩ [β]#, [α ∨ β]# = [α]# ∪ [β]#, and [¬α]# = Ω# \ [α]#. What is [X → ¬Y]#? It consists of the permutations ω ∈ Ω# such that ω makes X → ¬Y true. So these are precisely the permutations where the first element of ω which has property X also has the property ¬Y, i.e., does not have the property Y. But this means precisely that ω does not make X → Y true, i.e., that ω ⊨ ¬(X → Y).


The Conditions (I)-(IV) and PCCP
We shall prove that the conditions (I)-(IV) given by van Fraassen and PCCP are satisfied. These are the assumptions, relativized to the space S#, which we verify in turn below:

[(A → B)]# ∩ [(A → C)]# = [A → (B ∧ C)]#

[(A → B)]# is the set of permutations where the first A to appear is a B. [(A → C)]# is the set of permutations where the first A to appear is a C. This means that [(A → B)]# ∩ [(A → C)]# is the set of permutations where the first A to appear is both a B and a C. But this is exactly [A → (B ∧ C)]#.

[A ∧ (A → B)] # = [A ∧ B] #
[A ∧ (A → B)] # is the set of permutations which satisfy both sentences A and (A → B), which means that: (i) the first element of ω has property A; (ii) the first element of ω with the property A also has property B. This means exactly that they begin with an element having both properties A and B. But that is just [A∧B] # .

[A → A]# = Ω#
It is often assumed that formulas of the form A → A are tautologies, which in the case of our model means that the identity [A → A]# = Ω# should hold. This is obviously true for A ≠ ∅. Indeed, ω ⊨ A → A if the first A to appear is an A, which is true.
The more problematic case is A = ∅, i.e., the conditional ∅ → ∅. Of course, the "what comes first" rule is vacuously true: the first element of ω which comes from ∅ comes from ∅. Indeed, the material implication x ∈ ∅ ⇒ φ(x) is true for any property φ. This means that we can set [∅ → Y]# = Ω# for any Y ⊆ Ω. However, we need to be careful here if we want to handle the negation of conditionals consistently.
Observe that if we accept both: (1) the negation rule, i.e., that (X → ¬Y) ⇔ ¬(X → Y), and (2) the stipulation that [∅ → Y]# = Ω# for any Y, we might get into trouble. Indeed: by (2), both [∅ → Y]# = Ω# and [∅ → ¬Y]# = Ω#, while by (1), [∅ → ¬Y]# = Ω# \ [∅ → Y]# = ∅. We have a contradiction. This is not a particular feature of the permutation model, but a general phenomenon arising from a careless treatment of negation and conditionals with impossible antecedents. We need to block the disaster. This can be done in a number of ways: (1) by restricting the negation rule; (2) by dropping the stipulation that [∅ → Y]# = Ω#. This means that we can either: (1) claim that the equation P#(X → ¬Y) = 1 − P#(X → Y) holds only for X ≠ ∅; or (2) consider only conditionals X → Y with X ≠ ∅.
We opt to take the second route: we do not discuss the probability of conditionals with impossible antecedents. For instance, we do not discuss the probability of conditionals like If the chosen number is bigger than 50, then it is even in the case of a die.
But we stress that if one felt the need to account for such conditionals, we could incorporate them into our framework at the cost of limiting the scope of some rules. So we could assume that [∅ → Y]# = Ω# for any Y (also for Y = ∅) and that P#(∅ → Y) = 1 for any Y. There are numerous accounts which welcome this stipulation. If we decided to take this approach, we would have to make certain restrictions concerning the negation rule: it would not work for ∅ → Y.

Footnote 25: For instance, ∅ → Y has a probability of 1 in McGee's system from [19] (this follows from condition (C8), p. 504). However, we share McGee's reservation about it: "I include it not because I believe it to be the correct account of conditionals whose antecedents have probability zero, but because I do not have a correct account of such conditionals and it is technically cumbersome to have the probability function be only partially defined. I do not put any store by it, and it will not be crucial to anything that follows" ([19], 506).
Footnote 26: Obviously, in this case the formulation of PCCP needs to be changed, as the conditional probability P(Y|∅) does not make sense (i.e., P#([X → Y]#) = P(Y|X) for all X such that P(X) ≠ 0).


PCCP
PCCP holds in the model, i.e., P#([X → Y]#) = P(Y|X) for all Boolean X, Y with P(X) ≠ 0. The proof is found in Appendix 1.
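Since the full proof is deferred to Appendix 1, a numerical sanity check may be useful. The sketch below is ours (the distribution and the sets X, Y are arbitrary choices); it computes P#([X → Y]#) directly from the definitions of Section 3 and compares it with P(Y|X):

```python
# A numerical sanity check: for an arbitrary finite distribution and arbitrary
# X, Y the permutation model satisfies P#([X -> Y]#) = P(Y | X).
from fractions import Fraction
from itertools import permutations

p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8),
     4: Fraction(1, 16), 5: Fraction(1, 16)}                  # sums to 1
X, Y = {2, 3, 5}, {3, 4, 5}

def perm_prob(omega):
    """Chain rule for drawing worlds without replacement."""
    prob, remaining = Fraction(1), Fraction(1)
    for k in omega:
        prob *= p[k] / remaining
        remaining -= p[k]
    return prob

def sat(omega):
    """What-comes-first rule for X -> Y."""
    return next(k for k in omega if k in X) in Y

lhs = sum(perm_prob(om) for om in permutations(p) if sat(om))
rhs = sum(p[k] for k in X & Y) / sum(p[k] for k in X)          # P(Y | X)
print(lhs == rhs, lhs)                                         # True 3/7
```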

The Truths of S #
It is an interesting prospect to investigate which sentences are true in S# and what the underlying logic is. A detailed discussion exceeds the scope of the present paper; we only note, for example, that the axioms (I)-(IV) and PCCP are among the sentences rendered true in S#.

Minimality of the Model
In this section we make a useful observation: each permutation (i.e., fine-grained world) ω ∈ Ω# is uniquely characterized by a formula φ_ω of the language L(→). We will use this observation to show that the permutation model is minimal within a certain class of models (i.e., models satisfying van Fraassen's conditions and a version of the Independence Principle IND). This means that the cardinality of the universe cannot be made smaller: the number of distinct worlds must be at least as big as the number of formulas φ_ω, i.e., n!.
There is an intense discussion going on concerning the Independence Principle and PCCP.We do not investigate these matters here, as our primary aim is to present a technical result.However, observe that the Independence Principle is assumed for instance in the classic McGee [19] (who motivates it by a fair-bet argument), so it is in no way an exotic assumption.

Toy Example -Four Colors
Take a sample space consisting of four worlds (colors) B, G, R, W. We show that every permutation is uniquely characterized by a certain sentence in L(→). Consider the permutation ω = BGWR. It is uniquely characterized by the following conditions: (i) B comes first in ω; (ii) G comes before R and W in ω; (iii) W comes before R in ω. These facts are expressed by (i) the simple sentence B; (ii) the conditional (G ∨ R ∨ W) → G; (iii) the conditional (R ∨ W) → W. This means that BGWR is uniquely characterized by the formula φ_BGWR = B ∧ ((G ∨ R ∨ W) → G) ∧ ((R ∨ W) → W). Alternatively, φ_BGWR can be written as φ_BGWR = B ∧ (¬B → G) ∧ ((¬B ∧ ¬G) → W). Every permutation ω ∈ Ω# is uniquely characterized by such a formula φ_ω. There are 24 permutations, which means that there are 24 formulas of this kind, each uniquely characterizing the permutation in question: φ_ω ∈ L(→) is true only about ω. Obviously, if ω* ≠ ω, then φ_ω and φ_ω* are different (and mutually inconsistent) formulas.
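This can again be checked mechanically; the following sketch (ours, with hypothetical helper names) tests the three conditions against all 24 permutations:

```python
# A sketch checking that B ∧ ((G∨R∨W) -> G) ∧ ((R∨W) -> W) is true of exactly
# one of the 24 fine-grained worlds, namely BGWR.
from itertools import permutations

def first_in(omega, X):
    return next(b for b in omega if b in X)

def phi_bgwr(omega):
    return (omega[0] == "B"                               # B comes first
            and first_in(omega, {"G", "R", "W"}) == "G"   # (G ∨ R ∨ W) -> G
            and first_in(omega, {"R", "W"}) == "W")       # (R ∨ W) -> W

worlds = ["".join(om) for om in permutations("BGRW") if phi_bgwr(om)]
print(worlds)    # ['BGWR']
```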

The General Case: Ω = {1,2,3…,n}
If 1, 2, …, n are worlds in the initial sample space S, then it is convenient to denote the corresponding formulas by 1, 2, …, n. The permutation ω = k_1 k_2 … k_n is then uniquely characterized by the formula φ_ω = k_1 ∧ (¬k_1 → k_2) ∧ ((¬k_1 ∧ ¬k_2) → k_3) ∧ … ∧ ((¬k_1 ∧ … ∧ ¬k_{n−2}) → k_{n−1}).
Of course, the formulas φ_ω ∈ L(→) have a natural interpretation in the permutation space, but they can also be interpreted in other spaces. We will show that, under certain assumptions, their probabilities in such spaces are preserved.


The Minimality of the Permutation Space
We start with the sample space consisting of n worlds. We claim that any probability space SPACE satisfying: (i) van Fraassen's conditions and (ii) the appropriate (weak) version of the Independence Principle, contains at least n! worlds. This means that, within the class of such spaces, the permutation model has minimal size.
In its general form, the Independence Principle states that P(C ∧ (A_1 → B_1) ∧ ⋯ ∧ (A_n → B_n)) = P(C) · P(A_1 → B_1) · ⋯ · P(A_n → B_n), where A_i, B_i and C are Boolean sentences and C excludes A_i, for i = 1, …, n. In this form it is accepted, for instance, in McGee's system from [19].
In order to exhibit the strength of the needed assumption, we shall use the symbol IND(n) to denote the claim that independence holds for conjunctions of length n, i.e., for formulas of the above form. We might consider PCCP to be IND(0), as it is equivalent to the weaker claim P(¬A ∧ (A → B)) = P(¬A) · P(A → B). So, the Independence Principle(s) might be viewed as a strengthening(s) of PCCP.
We shall prove two facts: 1. If φ_ω and φ_ω* are different formulas, then they cannot both be satisfied in any world W ∈ SPACE. In other words, if W ⊨ φ_ω and W ⊨ φ_ω*, then φ_ω = φ_ω*. 2. If SPACE satisfies the Independence Principle IND(n−3), every formula of the form φ_ω is true in at least one world W ∈ SPACE. This means that: (i) each formula φ_ω ∈ L(→) is true in at least one world W ∈ SPACE, and (ii) different formulas φ_ω and φ_ω* need different worlds W and W* to be interpreted in. As there are n! formulas of this kind, this demonstrates that there must be at least n! worlds in SPACE. This, in turn, shows that the permutation model is minimal among the models satisfying IND(n−3). The proof of the theorem is simple but slightly tedious, so we present it in Appendix 2.
Observe also that any subset X ⊆ Ω# is definable by the disjunction φ_X = ∨{φ_ω : ω ∈ X}. This means that any set of worlds in Ω# has an explicit linguistic representation in L(→). So there are 2^(n!) non-equivalent propositions in L(→) when the number of worlds in Ω is n. This is an interesting estimate. Enriching the language with → has rather dramatic consequences as far as the complexity (and expressive power) of the language is concerned. If we have n possible worlds, the Boolean language L contains 2^n non-equivalent propositions (every subset is a proposition), and the conditional language L(→) contains 2^(n!) propositions. For n = 4 this means that there are 2^4 = 16 propositions in L and 2^(4!) = 2^24 propositions in L(→).

Footnote 29: … From this we have P(A → B) = P(B|A). A similar computation proves IND(0) from PCCP.
Footnote 30: This is obviously not the case, for instance, in Stalnaker-Bernoulli spaces: the cardinality of the set of events, i.e., subsets, is 2^c, and the language in question is countable.

The Iteration
We have started with a pair (S, L), where S = (Ω, Σ, P) is a probability space containing n possible worlds and L is the corresponding Boolean language, and we have constructed a pair (an extension) (S#, L(→)). Here L(→) is the conditional language containing L and all simple conditionals and their Boolean combinations, and S# = (Ω#, Σ#, P#) is a probability space which: (i) allows one to interpret the conditional language L(→); (ii) satisfies certain natural conditions; (iii) makes a natural imbedding of S into S# possible; and (iv) is minimal.
We can repeat the construction starting with the pair (S#, L(→)). We obtain a new pair ((S#)#, L(→)(→)): now (S#)# is the new probability space, and L(→)(→) is the language containing conditionals built from simple conditionals. It can be interpreted in (S#)#, S# is imbedded into (S#)#, all probabilities from S# are preserved, and so on. So if Ω consists of n worlds, then the cardinality of Ω# is n! and the cardinality of (Ω#)# is (n!)!. By induction, we can define the following sequences: S_0 = S, L_0 = L, and S_{k+1} = (S_k)#, L_{k+1} = L_k(→), for k = 0, 1, 2, …. Every conditional (regardless of how nested and complicated it is) sooner or later appears in one of the languages L_k. For instance: (i) all simple conditionals and their Boolean combinations appear in L_1; (ii) right-nested conditionals A → (B → C), left-nested conditionals (A → B) → C, and conditional conditionals (A → B) → (C → D) appear in L_2; (iii) and so on.
The construction is minimal and the imbeddings preserve the probabilities from the lower levels. Of course (I)-(IV) and PCCP also hold at higher levels.

A Short Comparison with Stalnaker Bernoulli Spaces
It is instructive to make a short comparison with Stalnaker Bernoulli spaces. To make the comparison easier, we can use a toy example featuring the space Ω = {W, G, R} consisting of three balls and the conditional ¬W → G.

Types of Objects (i.e., the Universe of the Model)
Both models have a sequential character. In other words, the elements of the universe are sequences built from the objects in Ω. The sequences in the Stalnaker Bernoulli model are infinite. In the permutation model, they have a fixed length; this length equals the cardinality of the sample space. So, in our toy example we have: (i) the set of all infinite sequences built from W, G, R in the Stalnaker Bernoulli space; (ii) the six-element set {GRW, GWR, RGW, RWG, WGR, WRG} in the permutation model.

Cardinality of the Spaces
Stalnaker Bernoulli spaces have the cardinality of the continuum. The permutation model is finite and has a minimal number of elements (in our toy example there are 6 permutations).

The Interpretation of Conditionals
In the Stalnaker Bernoulli space, the interpretation of ¬W → G consists of all sequences in which the first non-white ball is green, i.e., sequences which start with a (possibly empty) run of white balls followed by a green ball, the (infinite) tail being arbitrary. An example is the sequence WWG RWR GGG WRR WGR WGR … This means that the interpretation of ¬W → G is an event which has the cardinality of the continuum. In the permutation model, the interpretation of the conditional ¬W → G is the set [¬W → G]# = {GRW, GWR, WGR}.

Definability of Elementary Events
In the Stalnaker Bernoulli spaces, no elementary event can be described by a single sentence. As the set of elementary events has the cardinality of the continuum, while the number of sentences in the language is countable, most elementary events cannot be defined by any sentence. In particular, in our toy example there is no formula from the language L(→) which defines the elementary event WWG RWR GGG WRR WGR WGR …. In the permutation model, every elementary event corresponds precisely to a single proposition. For instance, the elementary event WGR is uniquely characterized by W ∧ (¬W → G).
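A quick mechanical check of this last claim (our sketch, with a hypothetical helper name):

```python
# Check that W ∧ (¬W -> G) singles out exactly the fine-grained world WGR
# among the six permutations of the three balls.
from itertools import permutations

def phi(omega):
    return omega[0] == "W" and next(b for b in omega if b != "W") == "G"

print(["".join(om) for om in permutations("GRW") if phi(om)])   # ['WGR']
```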

The Scope of the Model
The Stalnaker Bernoulli spaces allow one to interpret all conditionals of arbitrary complexity.The permutation model allows one to interpret all simple conditionals and their Boolean combinations.However, the construction can be simply iterated so as to comprise all conditionals.

Ontological Commitments
We can say that the ontological commitments in the permutation model are minimal: the number of objects is minimal within the class of spaces satisfying IND, and we use only very elementary combinatorics and probability theory. In contrast, in the Stalnaker Bernoulli spaces one has to handle sets which have the cardinality of the continuum and to use theorems concerning the existence of the product measure.

Conclusion
Starting with the finite probability space S = (Ω, Σ, P) in which the Boolean language L is interpreted, we have constructed a new probabilistic structure S# = (Ω#, Σ#, P#) in such a way that:
• All simple conditionals and their Boolean combinations can be interpreted in S#. This means that for every ω ∈ Ω# and every sentence α ∈ L(→) it is clear whether ω ⊨ α or not.
• Apart from the semantic ingredient, S# also has the structure of a probability space, which means that the probabilities of sentences from L(→) can be computed using the probability measure P#. We make the common assumption that the probability of a sentence α is the probability of the set of worlds which make α true.
• The sample space S can be naturally imbedded in S#, and the probabilities of Boolean sentences are preserved.
• The construction satisfies natural assumptions concerning the semantics of the conditional.
• The probability measure P# is defined in such a way that PCCP holds, and this definition of P# is very intuitive if we take the structure of the initial sample space and the Boolean language into account.
• The model is minimal in the class of models satisfying the appropriate variant of the Independence Principle. It has cardinality n! if the cardinality of S is n. This gives an interesting reading of the number of propositions in the conditional language L(→): it is 2^(n!) (for n possible worlds).
• It is possible to iterate the construction an arbitrary (finite) number of times, obtaining interpretations for all conditionals of a prescribed complexity in this way.
Let W_X = {ω : ω(1) ∈ X} and W_non-X = {ω : ω(1) ∉ X}, and let S_X = W_X ∩ [X → Y]#, S_non-X = W_non-X ∩ [X → Y]#. W_X and W_non-X are disjoint, so S_X ∩ S_non-X = ∅. This means that: P#([X → Y]#) = P#(S_X) + P#(S_non-X). Computing P#(S_X) is easy. Indeed: S_X = {ω : ω(1) ∈ X ∩ Y}. This is the set of permutations which begin with an element of X ∩ Y. Therefore: P#(S_X) = P(X ∩ Y). That was the easy part; now we need to compute P#(S_non-X), where S_non-X = {ω : ω(1) ∉ X and ω ⊨ X → Y}. So S_non-X consists of permutations which satisfy two conditions: (i) they begin with an element of X^c = {1, 2, …, k}; (ii) they satisfy X → Y, i.e., the first element of ω which is X is also Y.
Any permutation ω ∈ W_non-X begins with one of the numbers from {1, 2, …, k}. We can therefore define a partition of W_non-X into k disjoint sets W_1, …, W_k, depending on the first element, i.e.: W_j = {ω : ω(1) = j}, for j = 1, …, k. This means that: S_non-X = (W_1 ∩ [X → Y]#) ∪ … ∪ (W_k ∩ [X → Y]#), and this is a union of pairwise disjoint sets. The set W_j ∩ [X → Y]#, for j = 1, …, k, consists of the permutations ω ∈ S(n + 1) such that ω(1) = j and the first element of ω which is X is also Y.
Consider W_1 ∩ [X → Y]#. Obviously: P#(W_1 ∩ [X → Y]#) = P#(W_1) · P#([X → Y]# | W_1). We want to compute P#([X → Y]# | W_1), i.e., the probability that the first element of the permutation ω (of length n + 1) which has property X also has property Y, conditioned on W_1, i.e., on the assumption that ω(1) = 1.
This means that we want to compute the probability that we generate a permutation from [X → Y]#, knowing that ω(1) = 1 (i.e., that it begins with 1). But that is just to say that we have to generate an appropriate tail of length n of ω, i.e., a permutation of length n satisfying the condition that the first X to appear is a Y. We have to generate this permutation using the numbers from the set {2, 3, …, n, n + 1} with their probabilities rescaled (as we have already "used up" object 1). The probability distribution P* on the n-element set {2, 3, …, n, n + 1} is: P*(j) = p_j/(1 − p_1), for j = 2, …, n + 1. By construction, the rescaling factor cancels in conditional probabilities, so P*(Y|X) = P(Y|X). This is the right moment to make use of the induction hypothesis. We know that for any probability space S* = (Ω*, Σ*, P*) of size n, the probability that a permutation ω fulfills X → Y, i.e., that the first element of ω with property X also has property Y, is P*(Y|X). We apply the induction hypothesis to the tails of the permutations, and see that: P#([X → Y]# | W_1) = P*(Y|X) = P(Y|X). The same argumentation applies to W_2, …, W_k, i.e., P#([X → Y]# | W_j) = P(Y|X) for j = 1, …, k. Finally: P#(S_non-X) = Σ_{j=1,…,k} P#(W_j) · P#([X → Y]# | W_j) = P(Y|X) · Σ_{j=1,…,k} p_j = P(Y|X) · P(X^c). But this means that: P#([X → Y]#) = P#(S_X) + P#(S_non-X) = P(X ∩ Y) + P(Y|X) · P(X^c) = P(Y|X) · (P(X) + P(X^c)) = P(Y|X), which completes the proof.
In fact, we will prove a stronger claim, and show that the probability of φ_123…n within SPACE is p_1 · p_2/(1−p_1) · p_3/(1−p_1−p_2) · … · p_n/(1−(p_1+⋯+p_{n−1})), i.e., exactly the probability P#(123…n) assigned to the permutation 123…n in the permutation model.

Toy Example
In order to present the idea of the proof, we start with the toy example taking n = 4.

IND(n-3) is an Essential Assumption
We shall show that IND is an essential assumption, i.e., without it we are able to construct a van Fraassen-like space of size less than n!. This is not possible for n = 3, as PCCP alone already forces 6 worlds. But already for n = 4, we shall construct a space which satisfies van Fraassen's conditions but which has fewer than 24 elements. This is obtained by violating IND(1).
We start with S(4), i.e., with the set of permutations of length 4, but we shall eliminate some of them. Intuitively, some of the permutations will absorb other permutations, so the structure will be "less fine-grained" but still fine enough to conform to PCCP. Intuitively, PCCP is too weak to "provide insight" into the fine-grained structure of the worlds. The set of worlds is the set of all 24 permutations of {1, 2, 3, 4}. Initially, each world has probability 1/24. Observe that within the 1-worlds we can distinguish three parts, corresponding to 1 ∧ (¬1 → 2), 1 ∧ (¬1 → 3) and 1 ∧ (¬1 → 4). Indeed: the 1-worlds whose second element is 2 (i.e., 1234 and 1243) satisfy 1 ∧ (¬1 → 2), those whose second element is 3 satisfy 1 ∧ (¬1 → 3), and those whose second element is 4 satisfy 1 ∧ (¬1 → 4). The same observation applies to the 2-worlds, 3-worlds and 4-worlds. We shall make the necessary rearrangements within the 1-worlds and 2-worlds. Intuitively, some worlds absorb other worlds (and their probabilities), and, in parallel, another compensating absorption occurs, so that PCCP is preserved.
Let the world 1243 "absorb" the world 1234, so that the world 1234 is removed and 1243 now gets the probability 1/24 + 1/24 = 1/12. In parallel, the world 2134 absorbs the world 2143, i.e., 2143 is removed and 2134 gets the probability 1/12.
We have produced a space with 22 worlds. This was possible as IND(1) is violated.