1 Introduction

As is well known, the expansion of the real line by the addition of infinitesimalsFootnote 1 is no longer a Leibnizean pipe-dream: halfway through the last century Abraham Robinson showed that there exists an embedding of the first-order structure of the ordered field of real numbers into an extension containing infinitesimal and infinitely large numbers, satisfying all the first-order properties of the real numbers, in which the infinitesimals form a totally ordered subring and the infinitely large numbers are the reciprocals of the nonzero infinitesimals. The members of such a nonstandard extension, of which there are infinitely many, are known as hyperreals.Footnote 2
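To fix ideas (the symbols ε and ω below are merely illustrative), the standard characterisation is that a hyperreal ε is infinitesimal just in case its absolute value is less than the reciprocal of every positive integer, and ω is infinitely large just in case its absolute value exceeds every positive integer, the nonzero infinitesimals and the infinitely large numbers being each other’s reciprocals:

$$ |\varepsilon| < \tfrac{1}{n} \ \text{for all } n \in \mathbb{N}, \qquad |\omega| > n \ \text{for all } n \in \mathbb{N}, \qquad \omega = 1/\varepsilon \ \text{for infinitesimal } \varepsilon \neq 0. $$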

One of the more persuasive arguments for probabilistic regularity, i.e. the principle that only the impossible event should receive zero unconditional probability, involves an appeal to probability functions whose domain is a standard algebra of events but whose range is the nonstandard unit interval in a hyperreal extension. Probably the most powerful objection to regularity is the mathematical fact that in an uncountable outcome space in which every singleton is assigned a real-valued probability, for example the space of outcome sequences of a fair coin tossed infinitely often (a space with the cardinality of the real numbers), all but countably many of those singletons must receive probability zero. Once the range of the probability function consists of hyperreals, however, the objection loses its force, because all those outcomes can be assigned a positive infinitesimal probability, even the same infinitesimal probability.Footnote 3
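The counting argument behind that objection is worth displaying, since it is exactly what the hyperreal move is designed to evade (P here denotes a real-valued probability function on the space in question). If as many as n singletons each had probability exceeding 1/n, finite additivity would already push the total probability above 1, so each set {x : P({x}) > 1/n} has fewer than n members; hence

$$ \{\, x : \mathrm{P}(\{x\}) > 0 \,\} \;=\; \bigcup_{n=1}^{\infty} \{\, x : \mathrm{P}(\{x\}) > \tfrac{1}{n} \,\} $$

is a countable union of finite sets, and at most countably many singletons can receive positive real-valued probability.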

Thus may the up-to-date defender of regularity argue, and often does. But the defence has been challenged by Timothy Williamson (2007), who used the model of a fair coin tossed infinitely many times to argue that at least one outcome sequence must have probability 0, and hence that regularity must fail. What is most striking about Williamson’s argument is his claim that it is valid even in the context of hyperreal probabilities themselves. In what follows I will argue that it is not valid.Footnote 4 That is the programme. I will start by reviewing Williamson’s argument.

2 The argument

A coin is flipped infinitely many times. Suppose H(1 …) represents the event ‘all the outcomes are heads’, and H(2 …) ‘all the outcomes after the first are heads’ (2007, p.4). Then according to Williamson,

H(1…) and H(2…) are isomorphic events. More precisely, we can map the constituent single-toss events of H(1…) one-one onto the constituent single-toss events of H(2…) in a natural way that preserves the physical structure of the set-up just by mapping each toss to its successor. H(1…) and H(2…) are events of exactly the same qualitative type; they differ only in the inconsequential respect that H(2…) starts one second after H(1…). That H(2…) is preceded by another toss is irrelevant, given the independence of the tosses. Thus H(1…) and H(2…) should have the same probability. (2007, p.5)

But if we assume that the singletons of possible outcome sequences are all assigned the same probability, we quickly infer that that probability must be 0. For

$$ \mathrm{P}(\mathrm{H}(1\dots)) = \mathrm{P}(\mathrm{H}(2\dots) \mid \mathrm{H}(1))\,\mathrm{P}(\mathrm{H}(1)) \qquad (1) $$

where H(1) is the event ‘the first toss lands heads’. So by independence

$$ \mathrm{P}(\mathrm{H}(1\dots)) = \mathrm{P}(\mathrm{H}(2\dots))\,\mathrm{P}(\mathrm{H}(1)) $$

i.e.

$$ \mathrm{P}(\mathrm{H}(1\dots)) = \mathrm{P}(\mathrm{H}(2\dots))/2. $$

But because of the presumed isomorphism of H(1 … ) and H(2 … ), we have

$$ \mathrm{P}(\mathrm{H}(1\dots)) = \mathrm{P}(\mathrm{H}(2\dots)), $$

whence

$$ 2\,\mathrm{P}(\mathrm{H}(1\dots)) = \mathrm{P}(\mathrm{H}(1\dots)), $$

and so, noting that the hyperreals obey the same field axioms as the real numbers themselves, we must have P(H(1 …)) = 0, contradicting regularity.
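For the record, the final step is pure field algebra, and the field axioms transfer to the hyperreals: using the last displayed equation to replace 2P(H(1 …)) by P(H(1 …)),

$$ \mathrm{P}(\mathrm{H}(1\dots)) \;=\; 2\,\mathrm{P}(\mathrm{H}(1\dots)) - \mathrm{P}(\mathrm{H}(1\dots)) \;=\; \mathrm{P}(\mathrm{H}(1\dots)) - \mathrm{P}(\mathrm{H}(1\dots)) \;=\; 0, $$

whether the value in question is a real number or a positive infinitesimal.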

Williamson reinforces his claim that H(1 … ) and H(2 … ) are isomorphic (in a way that demands their being assigned the same probability) by extending the example to include a second coin, in all relevant physical respects identical to the first:

To make the point vivid, suppose that another fair coin, qualitatively identical with the first, will also be tossed infinitely many times at one second intervals, starting at the same time as the second toss of the first coin, all tosses being independent. Let H*(1…) be the event that every toss of the second coin comes up heads, and H*(2…) the event that every toss after the first of the second coin comes up heads. Then H(1…) and H*(1…) should be equiprobable, because the probability that a coin comes up heads on every toss does not depend on when one starts tossing, and there is no qualitative difference between the coins. But for the same reason H*(1…) and H(2…) should also be equiprobable. These two infinite sequences of tosses proceed in parallel, synchronically, and there is no qualitative difference between the coins; in particular, that the first coin will be tossed once before the H(2…) sequence begins is irrelevant. By transitivity, H(1…) and H(2…) should be equiprobable: [hence] Prob(H(1…)) = Prob(H(2…)) (2007, p.6)

The argument, notes Williamson, ‘is neutral between standard and non-standard probabilities. Even when infinitesimal probabilities are allowed, the nature of the case still yields the conclusion that the probability of an infinite sequence of heads is 0.’

Williamson’s argument sounds very plausible. Nevertheless I will argue that it is mistaken, and that the mistake is not one about hyperreals but a confusion about what he calls ‘isomorphic events’, assisted by an inadequate notation. That his argument is indeed wrong is anyway strongly suggested by the classic paper of Bernstein and Wattenberg (cited above), to which he himself refers, where it is shown that there is a uniform positive infinitesimal probability distribution over the members of the standard closed unit interval [0,1].Footnote 5 It is well known that the real numbers in this interval are representable (modulo identification of equivalent sequences) by the infinite binary sequences in {0,1}^ℕ, where ℕ = {1, 2, … , n, …},Footnote 6 a set which can clearly also represent all possible infinite sequences of heads and tails. It would be very strange indeed if, as soon as one interprets the set of sequences in this last way, the positive infinitesimal distribution over its members suddenly became impossible. So what is Williamson’s mistake (if there is a mistake)?
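The representation in question is the familiar binary-expansion map (the notation here is mine): a sequence x = ⟨x_i⟩ in {0,1}^ℕ is sent to the real number

$$ \sum_{i=1}^{\infty} x_i\, 2^{-i} \;\in\; [0,1], $$

a map which is onto [0,1] and one-one except that each dyadic rational in (0,1) has two preimages, one ending in all 0s and the other in all 1s; hence the qualification ‘modulo identification of equivalent sequences’.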

3 The mistake

There is indeed a mistake, and it is a fairly elementary error in Williamson’s purely probabilistic reasoning. To identify it clearly, however, a little more notation is required. In the usual parlance of probabilists, a probability space is a triple (S, F, P), where S is the outcome-set of some experiment (understood loosely), F is an algebra of subsets of S containing S and the empty set, and P is a probability function on F. I shall follow Williamson in assuming that P is hyperreal-valued. In his example of the single coin, S is the set {0,1}^ℕ of all possible infinite sequences of outcomes of tossing the coin (1 for heads, 0 for tails). Here F is generated by the cylinder sets, i.e. the sets of sequences specified by fixing the outcomes at finitely many indices, and is closed under set difference, countable intersections and countable unions. F therefore also contains all the singletons of sequences (as the display below makes explicit), and Bernstein and Wattenberg’s result implies that we can assign a positive infinitesimal probability to each of them.
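That F contains the singletons is immediate from closure under countable intersections: writing C_n(x) for the cylinder set which fixes the first n outcomes of a given sequence x (the notation C_n is mine),

$$ \{x\} \;=\; \bigcap_{n=1}^{\infty} C_n(x), \qquad C_n(x) = \{\, y \in S : y_i = x_i \ \text{for all } i \le n \,\}, $$

and each C_n(x), being a cylinder set, belongs to F.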

The stage is now set to recapitulate Williamson’s argument. His H(1 …) is the singleton {(1, 1, 1, … , 1, …)}. But in this space H(2 …), unlike H(1 …), is a compound event, a fact essential to step (1) of Williamson’s derivation: it is the pair-set whose members are the two outcome sequences (1, 1, 1, … , 1, …) and (0, 1, 1, … , 1, …), representing the disjunction ‘a head occurred first followed by all heads, or a tail occurred first followed by all heads’. Clearly, however, it is strictly nonsensical to say that the singleton {(1, 1, 1, … , 1, …)} is isomorphic to a pair-set (a necessary condition for isomorphism is equality of cardinality), and so the step to P(H(1 …)) = P(H(2 …)), supposedly justified by the appeal to isomorphism, fails, and with it Williamson’s argument.Footnote 7
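In the set notation of the space (S, F, P) the point can be seen at a glance:

$$ \mathrm{H}(1\dots) = \{(1,1,1,\dots)\}, \qquad \mathrm{H}(2\dots) = \{(1,1,1,\dots),\ (0,1,1,\dots)\}, $$

so that H(1 …) is a proper subset of H(2 …) and, by additivity, P(H(2 …)) = P(H(1 …)) + P({(0, 1, 1, …)}), a quantity strictly greater than P(H(1 …)) whenever the second summand is a positive infinitesimal.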

Similar considerations apply to his addition of the second coin to the example. We can represent the two coin-tossing experiments in two probability spaces, all the elements of the second of which, following Williamson’s example of H*(1 …), I shall label with an asterisk (the outcomes of its tosses are also recorded with an asterisk appended).Footnote 8 Apart from the difference in labelling, the two spaces are identical, and so we can agree with Williamson that P(H(1…)) = P*(H*(1…)) ‘because the probability that a coin comes up heads on every toss does not depend on when one starts tossing, and there is no qualitative difference between the coins.’ But when he continues ‘But for the same reason H*(1…) and H(2…) should also be equiprobable’ he is not correct: H(2 …), unlike the singleton H*(1 …), is not a single outcome sequence but a set containing two sequences, one the original all-heads sequence and the other the same sequence with its first member changed to 0. Hence there is no justification for equating P(H(1 …)) and P(H(2 …)), and Williamson’s argument fails.

It might be objected that Williamson has merely made an inappropriate choice of coin-tossing model, since there is one (it is claimed) in which his argument clearly succeeds: that of the same coin being tossed from minus infinity to plus infinity. Formally, this has the outcome space W = {0,1}^ℤ, where ℤ is the set of integers,Footnote 9 and where 1 stands for a head and 0 for a tail. Let x = ⟨x_i⟩ be a sequence in W and let T be the so-called shift transformation of W, i.e. the mapping of W onto itself such that (Tx)_i = x_{i+1}. Williamson’s argument can now go through straightforwardly (claims the objection) in the following way. Consider the measurable event (set) S of sequences x with x_i = 1 for i ≥ 1. The shift transformation takes S to the set S' of T-images of members of S, where each sequence in S' is the isomorphic image of one in S under T, and conversely, since T is invertible. Hence (the objection proceeds) S' and S are isomorphic and so, if we follow the Williamson doctrine about isomorphism, should receive the same positive infinitesimal probability p. Now we just replicate Williamson’s argument, making the appropriate changes.
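Under these definitions the relation between S and S' can be written out explicitly (this merely restates the objection’s set-up in the notation just introduced):

$$ S = \{\, x \in W : x_i = 1 \ \text{for } i \ge 1 \,\}, \qquad S' = TS = \{\, x \in W : x_i = 1 \ \text{for } i \ge 0 \,\} = S \cap \{\, x : x_0 = 1 \,\}, $$

so that S' is a proper subset of S; assuming, with the objection, that the independence of the tosses carries over, a regular P gives P(S') = P(S)/2, and equating P(S) with P(S') on grounds of ‘isomorphism’ then forces P(S) = 0 exactly as before.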

Unfortunately the conclusion that regularity must fail here is as unwarranted as it was in Williamson’s original argument, and the mistake again lies in the appeal to isomorphism. For S and S' are not themselves sequences but unordered sets of sequences, and are isomorphic only in the trivial sense that any two cardinally equal unstructured sets are isomorphic (relations can of course be defined on them, e.g. the full and empty n-ary relations for any n, relative to which the sets are isomorphic, but that is beside the point). To use that fact to justify setting their probabilities equal would, however, clearly be absurd. It cannot be objected that S and S' are physically indistinguishable, since they are different events; nor that they are indistinguishable with regard to their probabilistic properties, for that would simply beg the question. Rather than showing that regularity fails, which it does not, what the argument implicitly reveals is that a fundamental theorem of probabilistic dynamics, namely that shift transformations of Bernoulli processes are measure-preserving, can fail in this hyperreal context. This will probably not worry those working in nonstandard probability theory overmuch, since the object there is not so much, or at all, to regard hyperreal probabilities as on the same footing as real-valued onesFootnote 10 but to use the nonstandard universe simply as an aid to the standard theory, translating standard problems into nonstandard ones by means of the Transfer Principle, where they are often more tractable. That is exactly how Leibniz regarded his ‘ideal elements’, the infinitesimals and their reciprocals.
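The failure of measure preservation can be made precise in a line (using the notation of the doubly infinite model above). With the standard product measure the shift is measure-preserving, and both S and TS receive measure 0; but any regular hyperreal P on this space must satisfy

$$ \mathrm{P}(S) \;=\; \mathrm{P}(TS) + \mathrm{P}(S \setminus TS) \;>\; \mathrm{P}(TS), $$

since S \ TS, the set of sequences showing a tail at index 0 and heads at every positive index, is nonempty and so receives positive probability; the shift therefore cannot preserve P.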