Symmetry arguments against regular probability: A reply to recent objections

A probability distribution is regular if it does not assign probability zero to any possible event. While some hold that probabilities should always be regular, three counter-arguments have been posed based on examples where, if regularity holds, then perfectly similar events must have different probabilities. Howson (2017) and Benci et al. (2018) have raised technical objections to these symmetry arguments, but we see here that their objections fail. Howson says that Williamson’s (2007) “isomorphic” events are not in fact isomorphic, but Howson is speaking of set-theoretic representations of events in a probability model. While those sets are not isomorphic, Williamson’s physical events are, in the relevant sense. Benci et al. claim that all three arguments rest on a conflation of different models, but they do not. They are founded on the premise that similar events should have the same probability in the same model, or in one case, on the assumption that a single rotation-invariant distribution is possible. Since the symmetry arguments cannot be refuted on such technical grounds, one could deny their implicit premises, which is a heavy cost, or adopt varying degrees of instrumentalism or pluralism about regularity, but that would not serve the project of accurately modelling chances.

Such a probability measure can be difficult to arrange, especially where the set of possible outcomes is infinite and all outcomes are equally likely, for then the regular, non-zero probabilities of these outcomes will normally add up to more than one. But we can avoid this problem if we allow P to take infinitesimal values. If we assign a nonzero infinitesimal probability to each outcome in our sample space, these need not add up to more than one. 2 The desideratum of regularity has been the main reason for introducing infinitesimal and hyperreal 3 probabilities.
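The additivity problem just described can be spelled out in a short worked inequality. What follows is only a sketch under standard assumptions (a real-valued, finitely additive P); the symbols ε, n, k and Ω_k are introduced here for illustration and are not the author's notation.

```latex
% Uniform case: each of infinitely many outcomes receives the same real value
% \varepsilon > 0. By the Archimedean property of the reals there is a natural
% number n with n\varepsilon > 1, so finite additivity already gives
\[
  P(\{\omega_1,\dots,\omega_n\}) \;=\; \sum_{i=1}^{n} P(\{\omega_i\})
  \;=\; n\varepsilon \;>\; 1 .
\]
% Non-uniform case on an uncountable \Omega with P(\omega) > 0 for every \omega:
\[
  \Omega \;=\; \bigcup_{k \ge 1} \Omega_k ,
  \qquad
  \Omega_k \;=\; \{\, \omega \in \Omega : P(\omega) > 1/k \,\} .
\]
% A countable union of finite sets is countable, so some \Omega_k is infinite.
% Any k distinct points \omega_1, ..., \omega_k of that \Omega_k satisfy
\[
  P(\omega_1) + \dots + P(\omega_k) \;>\; k \cdot \tfrac{1}{k} \;=\; 1 .
\]
```

If ε is instead a nonzero infinitesimal, nε remains below 1 for every natural number n, which is why infinitesimal values escape the first inequality.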
Some think that this desire for regularity is a naïve mistake, but the arguments for it are serious enough to warrant response. (See above references. Benci et al. 2018 reviews some of the main arguments.) Williamson (2007), Parker (2012), Benci et al. (2018), and others (Bernstein and Wattenberg 1969; Barrett 2010; Pruss 2013) have proposed arguments against regularity based on the fact that, if regularity holds, certain perfectly similar events cannot have the same probability. Howson (2017) and Benci et al. (2018) have recently tried to refute those arguments. Here we will see how their refutations go wrong.⁴ (Thus we will buttress the case against a general requirement of regularity.) We will review the three symmetry arguments that Howson and Benci et al. criticize, fleshing them out in certain respects. We will then consider and rebut Howson and Benci et al.'s objections. Both Howson and Benci et al. focus on the details of formal probability models, claiming that, once Williamson's models are made explicit, his error becomes apparent. Benci et al. claim that this extends as well to my (2012) argument and their own proposal. But we will see that the technical errors alleged by Howson and Benci et al. are not present in the original symmetry arguments.⁵ Those arguments are based on very general principles, which we will make more explicit here, and which, while they are not above doubt, are not beholden to the technical details of any particular probability model. Finally, we will consider what stances a regularist might take given that the objections fail.
1 I do not claim that all of these authors are supporters of regularity, only that the particular works cited at least make suggestions in that direction.
2 For uncountable sample spaces, regularity and finite additivity already require infinitesimals, even if the probabilities are not uniform. It is easy to show that, for any function $P\colon \Omega \to \mathbb{R}^{+}$, where $\Omega$ is uncountable and $\mathbb{R}^{+}$ is the set of positive real numbers, there are finitely many $\omega_0, \omega_1, \ldots, \omega_n \in \Omega$ such that $P(\omega_0) + P(\omega_1) + \cdots + P(\omega_n) > 1$.
3 Hyperreal numbers are the elements of a field generated by real numbers and infinitesimals.
4 Easwaran (2014) defends Williamson against a different kind of objection, which we will not review here. He also argues that the expressiveness promised by regular probabilities is already provided by the non-numerical aspects of probability models, e.g., by inclusion relations between sets of outcomes.
5 As we will see, Benci et al. in a sense misunderstand their own proposed argument against regularity, for they present it as a parallel to Williamson's, and in misconstruing Williamson's argument, they likewise misdiagnose their own.
… and by substitution, $\mathrm{Prob}(H(1\ldots)) = \tfrac{1}{2}\,\mathrm{Prob}(H(1\ldots))$. Thus, the possible event H(1…) has probability zero, so regularity fails.⁷
6 Throughout we use 'Prob' for functions over physical events and 'P' for functions over sets that model physical events.
7 Williamson introduces a third sequence of coin tosses in order to make his point more vivid, but we need not consider it here.
If this argument is sound, it applies equally whether the values of Prob are real or hyperreal, since both number systems have the same first-order properties (those of a real closed field), including all the properties used in the argument.
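To see why the choice of number system makes no difference, the last step of the coin argument can be written out using nothing beyond field axioms. This is a sketch: the abbreviation x for Prob(H(1…)) is introduced here for illustration, and the two premises are the equations Prob(H(1…)) = ½ Prob(H(2…)) and Prob(H(1…)) = Prob(H(2…)) that the text attributes to Williamson's argument.

```latex
% Let x = Prob(H(1...)). The two premises are
%   x = (1/2) Prob(H(2...))   and   Prob(H(2...)) = x .
\begin{align*}
  x &= \tfrac{1}{2}\,x            && \text{(substitution)}\\
  x - \tfrac{1}{2}\,x &= 0        && \text{(subtract } \tfrac{1}{2}x \text{ from both sides)}\\
  \tfrac{1}{2}\,x &= 0            && \text{(distributivity)}\\
  x &= 0                          && \text{(multiply both sides by } 2\text{)}
\end{align*}
```

Each step is valid in any field in which 2 ≠ 0, so the derivation goes through unchanged whether Prob takes real or hyperreal values.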

Physical isomorphism
Before we review the other arguments, let us flesh out Williamson's a little. It clearly relies on the following assumption: Isomorphism Principle (IP): If two events are isomorphic (in the relevant sense), they should have the same probability. Hence, the fact that H(1…) and H(2…) have the same qualitative physical properties is, for Williamson, the reason that they should have the same probability, and presumably this inference is undergirded by a general principle like IP.
Why should one accept IP? An argument for a version of IP might run as follows: (I) The laws of physics are space-time invariant.
(II) The chance of an event is determined by the physical laws and local qualitative circumstances.
Therefore, (IP') Two events that differ at most in where and when they hypothetically occur (and perhaps in matters of bare identity but not in qualitative features) have the same chance.
What I mean by (I) is just that the laws of physics are the same in every place at every time, and they do not have any place- or time-dependent features. Whatever the laws imply about the outcome of an experiment is the same no matter where and when that experiment is conducted, other things being equal.⁸ This in itself does not imply IP', because we might think that chances depend on something other than laws and qualitative circumstances. But if (II) holds as well, then IP' follows (barring any creative concept stretching). The above argument applies directly only to physical chances or propensities. But according to the Principal Principle (Lewis 1980, 1994), our credences should generally track known chances, so we should also assign equal credence to such isomorphic events.
Thus, the regularist is in an awkward dilemma: She must either deny the standard and sensible principle that physical laws are space-time invariant, or deny that chances are determined by local circumstances and laws. Neither is inconceivable, but either is a weighty consequence, perhaps too weighty for the a priori arguments for regularity to sustain.
One might take the view that this argument from space-time invariance is irrelevant, since the stipulation that the individual tosses are independent and identically distributed (iid) already implies that H(1…) and H(2…) have the same “standard” probability.⁹ Under the Kolmogorov axioms (including countable additivity), $\mathrm{Prob}(H(1\ldots)) = \mathrm{Prob}(H(2\ldots)) = \prod_{n \in \mathbb{N}} \tfrac{1}{2} = 0$ (where $\mathbb{N} = \{0, 1, 2, \ldots\}$). However, to appeal to that result would beg the question, for it assumes the standard axioms and number system, which regularists propose to revise. As Williamson's argument shows, the equality $\mathrm{Prob}(H(1\ldots)) = \mathrm{Prob}(H(2\ldots))$ fails under any alternative theory in which $\mathrm{Prob}(H(1\ldots))$ and $\mathrm{Prob}(H(2\ldots))$ are not strictly zero, since $\mathrm{Prob}(H(1\ldots)) = \tfrac{1}{2}\,\mathrm{Prob}(H(2\ldots))$. If one wishes to argue directly from iid to the equality of $\mathrm{Prob}(H(1\ldots))$ and $\mathrm{Prob}(H(2\ldots))$, one must assume either countable additivity, which regularists reject, or some other probability axiom that regularists would likely reject, e.g., that the probability of a conjunction of independent events is entirely determined by the probabilities of the conjuncts. (This is close to what Hofweber (2014) calls conjunctive local determination, which he rejects.) Such a strategy fails because it assumes too much, and since it fails, the appeal to space-time invariance is relevant, if it succeeds. In any case, Williamson does not argue directly from iid to $\mathrm{Prob}(H(1\ldots)) = \mathrm{Prob}(H(2\ldots))$, but appeals instead to the qualitative physical properties of the events.
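The place where the standard result leans on countable additivity can be made visible by writing the infinite product as a limit. The following is a sketch only; the events H_n ("the first n tosses of the sequence all land heads") are hypothetical notation introduced here, not the author's.

```latex
% H_n : the first n tosses of the sequence starting at toss 1 all land heads.
% The H_n decrease to H(1...), and iid fair tosses give Prob(H_n) = 2^{-n}.
\[
  \mathrm{Prob}\bigl(H(1\ldots)\bigr)
  \;=\; \mathrm{Prob}\Bigl(\,\bigcap_{n \ge 1} H_n\Bigr)
  \;=\; \lim_{n \to \infty} \mathrm{Prob}(H_n)
  \;=\; \lim_{n \to \infty} 2^{-n}
  \;=\; 0 .
\]
% The second equality is continuity from above. Given finite additivity, it is
% equivalent to countable additivity, which is the axiom regularists give up.
```

The same computation applies to H(2…), so the standard equality of the two probabilities is obtained only by way of an axiom the regularist does not grant.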

The circle argument
Williamson's H(1…) and H(2…) are conjunctions of infinitely many coin flip outcomes, in the sense that they require infinitely many toss outcomes all to occur. (I do not wish to conflate the physical events H(1…) and H(2…) with conjunctive sentences.) Another argument against regularity (Bernstein and Wattenberg 1969; Barrett 2010; Parker 2012; Pruss 2013) involves disjunctive rather than conjunctive events.
Construct a set of points on the unit circle as follows: Let $p_0 = (1, 0)$ in polar coordinates, i.e., the point on the circle due right of the centre. Let $p_{n+1}$ be the point $(1, n+1)$ on the circle, one radian counter-clockwise from $p_n$. Then let $C_0 = \{p_0, p_1, p_2, \ldots\} = \{(1, n) : n \in \mathbb{N}\}$, and let $C_1 = \{p_1, p_2, p_3, \ldots\} = \{(1, n+1) : n \in \mathbb{N}\}$. Notice that $C_1$ is a rotation of $C_0$ by one radian, but is also a proper subset of $C_0$, since $C_1$ does not contain $p_0$. Now, let us choose a point on the circle randomly, say by throwing a dart at the interior disk and constructing a radius through the centre of the dart shaft to a point on the circle. What is the probability that this point lies in $C_0$, and what is the probability that it lies in $C_1$? We can model this experiment with a probability space $\langle S^1, F, P\rangle$, where $S^1$ is the unit circle and $F$ is an algebra of subsets of the circle that at least includes $C_0$, $C_1$, and the singleton $\{p_0\}$. The event $E_C$ that the point chosen by our experiment lies in a given set $C$ is modelled by that very set, i.e., $\mathrm{Prob}(E_C) = P(C)$, where Prob is the chance or credence of a physical occurrence and $P$ is a function on sets that models Prob. Thus, in our model, the set $C_0$ represents the disjunctive event that the point chosen by the dart throw is $p_0$ or $p_1$ or $p_2$ or… . Now, assume¹⁰ that $P$ is rotationally symmetric. Then $P(C_0) = P(C_1)$. But by finite additivity, $P(C_0) = P(\{p_0\}) + P(C_1)$. Hence, $P(\{p_0\}) = 0$, contradicting regularity. And as with Williamson's argument, this holds whether $P$ takes hyperreal values or only real values.
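The circle argument can be laid out step by step, including the reason why p_0 is not in C_1, which the construction takes for granted. This is a sketch; the symbol ρ for rotation by one radian is introduced here for illustration and is not the author's notation.

```latex
% p_m = p_n  iff  m - n is an integer multiple of 2\pi. For integers m \ne n that
% would make \pi rational, so the points p_n are pairwise distinct; in particular
% p_0 \notin C_1.
\begin{align*}
  C_0 &= \{p_0\} \sqcup C_1                 && \text{(disjoint union)}\\
  P(C_0) &= P(\{p_0\}) + P(C_1)             && \text{(finite additivity)}\\
  P(C_1) &= P(C_0)                          && \text{(rotational symmetry, } C_1 = \rho(C_0)\text{)}\\
  P(\{p_0\}) &= P(C_0) - P(C_1) = 0         && \text{(subtraction in the value field)}
\end{align*}
```

Only finite additivity and subtraction are used, so nothing changes if the values of P are taken from a hyperreal field.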
There are significant differences between this argument and Williamson's. Firstly, the circle experiment takes place in a finite region of space-time. It is just a single dart throw at a finitely bounded disc (or in other versions, a single spin of a spinner or a single quantum vacuum fluctuation). Thus it avoids Williamson's unrealistic hypothesis of an eternal sequence of tosses, in perfect rhythm, of a single, ever unchanging coin. And if one is tempted to dodge Williamson's argument by suggesting that space-time invariance only applies to finite experiments and not to temporally infinite sequences of events, such a dodge will not escape the circle argument.
Secondly, the circle argument does not rely on IP. 11 It simply assumes that the distribution is rotationally symmetric. However, this is only plausible if a dart throw or some other experiment really can be performed with a perfectly symmetric distribution. 12 Intuitively this ought to be possible, but to my knowledge there is no standard physical principle to guarantee it in the way that space-time invariance (along with (II)) guarantees Williamson's result. It does not help much to appeal to empirically confirmed laws that imply symmetry, for a committed regularist will claim that our empirical laws need a slight revision, so that any probabilities that they imply are adjusted by infinitesimal amounts to make them regular. Such subtle revisions are generally compatible with observed frequencies. If we could construct an example where the symmetry is due to some general principle rather than specific laws, the regularists would not have such an easy retort. Parker 2012 attempts to construct such an example involving quantum vacuum fluctuations, but the success of that example is debatable. 13 So an uncontroversially realistic and principled example is yet to be given but is far from being ruled out. 14 Furthermore, Benci et al. hold that a probability theory ought to be able to describe conceptually possible processes such as a fair lottery with infinitely many tickets, and a dart throw with a perfectly symmetric distribution seems at least as conceivable as a fair infinite lottery. 15 So if we need a probability theory that makes sense of the infinite lottery, as Benci et al. claim, then we arguably need one that also accommodates invariant continuous distributions. But the circle argument shows that such a theory cannot be regular. Benci et al. (2018) introduce their own symmetry argument against regularity, intending to refute it and thereby illustrate how the other ones go wrong. 
Their argument is based on a fair infinite lottery, which is also the main motivating example for their Non-Archimedean Probability (NAP) theory , 2018cf. Wenmackers 2011, Wenmackers and. Their argument runs as follows:

The urn argument
Imagine an urn containing a countably infinite collection of tickets and a mechanism to implement a fair lottery on the tickets in the urn.
In situation (1), all tickets are in the urn and we denote the probability of winning of each arbitrary single ticket in such a lottery as $\mathrm{Prob}(E_1)$, leaving open the possibility that this may be an infinitesimal.
In situation (2), one ticket is removed from the urn prior to the drawing of the winning ticket. There is one competing ticket less, so the probability of winning of each remaining ticket is $\mathrm{Prob}(E_2) = \frac{1}{1 - \mathrm{Prob}(E_1)}\,\mathrm{Prob}(E_1)$ (renormalization). Taken in isolation, however, situation (2) looks exactly as before the removal of a ticket, which is situation (1). Because of this isomorphism between situation (1) and situation (2), we find that the probability of winning of each individual ticket is equal to $\mathrm{Prob}(E_2) = \mathrm{Prob}(E_1)$. … Even in a non-Archimedean [hyperreal] field, these equalities can only hold simultaneously if $\mathrm{Prob}(E_1) = \mathrm{Prob}(E_2) = 0$. (Benci et al. 2018)
Thus, according to this argument, $E_1$ and $E_2$ are possible but have probability zero, so again regularity fails.
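The claim that the two equalities can hold together only at zero follows from a short computation. This is a sketch, with x introduced here as shorthand for Prob(E_1) and with the assumption that x ≠ 1, so that the renormalization quotient is defined.

```latex
% Write x = Prob(E_1). Renormalization gives Prob(E_2) = x / (1 - x);
% the isomorphism premise gives Prob(E_2) = x.
\begin{align*}
  \frac{x}{1 - x} &= x              && \text{(both premises combined)}\\
  x &= x\,(1 - x) = x - x^{2}       && \text{(multiply by } 1 - x \ne 0\text{)}\\
  x^{2} &= 0                        \\
  x &= 0                            && \text{(a field has no zero divisors)}
\end{align*}
```

Since a hyperreal field is still a field, a nonzero infinitesimal x has a nonzero square, and so cannot satisfy both equalities.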
14 An anonymous reviewer suggests that none of our examples is physically realistic in any world whose laws are even remotely like our world's, and questions whether we can trust our intuitions about such examples. I am not certain that the examples are so unrealistic. The dart throw and the related vacuum fluctuation of Parker (2012) only require that exact real values are selected from a continuum in a rotation- or translation-invariant way. Quantum mechanics and indeed common sense suggest that we cannot measure such exact values, but they do not imply that no exact values exist in the world. If a quantum vacuum fluctuation does not in fact define any exact point in space and time, perhaps the exact centre of some well-defined lump in the Schrödinger wave function does. We need not be able to measure an exact value in order to argue that it is, or could be, determined by an invariant distribution and therefore contradicts regularity. Furthermore, if we must imagine another world in which such a distribution is possible, we do not need any further intuition about the physics of the process in order to conclude that it would violate regularity.
15 It is not obvious that a fair infinite lottery really is conceptually coherent, but Benci et al. take it to be so.
The relevant “isomorphism” here is expressed in the stipulation that the new situation (2) “looks exactly as before”. The qualitative physical circumstances are the same, or at least, the argument assumes that they are sufficiently alike that the probability of choosing a given ticket should be the same in situations (1) and (2). If we like, we can further stipulate that the remaining tickets in situation (2) shift so that they have exactly the same states as those in situation (1). Then, as in the coin argument, IP implies that the probabilities are the same.

A problem with the urn argument
Below we will turn to Benci et al.'s attempt to refute this argument, but let us note here a problem with it that they do not discuss. The renormalization step, according to which $\mathrm{Prob}(E_2) = \frac{1}{1 - \mathrm{Prob}(E_1)}\,\mathrm{Prob}(E_1)$, is not obviously correct. It assumes that removing a ticket increases the probability of being selected for each remaining ticket, but this need not be so. Removing a ticket changes the physical situation, at least in terms of bare identities, and for the regularist there is no general rule that says this will change the probability for any of the remaining tickets. Regularity does imply that removing a ticket increases the probability that the chosen ticket will lie in the set of all other tickets,¹⁶ but that does not imply that the probability has changed for any particular ticket. It may be tempting to object, “How can the probability increase for the set of remaining tickets if it does not increase for any individual ticket?”, but that objection presupposes countable additivity, or something like it, and proponents of infinitesimal probabilities are already willing to sacrifice countable additivity (e.g., Benci et al. 2013, 2018).
One might think that the renormalization step is justified by conditionalization, as Benci et al. later seem to suggest (p. 527). Let $C_t$ be the event that a ticket $t$ is chosen in situation (1). Since the lottery is fair, $\mathrm{Prob}(C_t) = \mathrm{Prob}(C_u) = \mathrm{Prob}(E_1)$ for any two tickets $t$ and $u$. Now suppose that ticket $t$ is the one removed in situation (2). If we therefore assume that $\mathrm{Prob}(E_2)$ is equal to the conditional probability $\mathrm{Prob}(C_u \mid {\sim}C_t)$, the ratio formula for conditional probability gives us
\[ \mathrm{Prob}(E_2) = \mathrm{Prob}(C_u \mid {\sim}C_t) = \frac{\mathrm{Prob}(C_u \wedge {\sim}C_t)}{\mathrm{Prob}({\sim}C_t)} = \frac{\mathrm{Prob}(C_u)}{1 - \mathrm{Prob}(C_t)} = \frac{\mathrm{Prob}(E_1)}{1 - \mathrm{Prob}(E_1)}, \]
which is precisely the renormalization step. But this assumption that $\mathrm{Prob}(E_2) = \mathrm{Prob}(C_u \mid {\sim}C_t)$ is not obviously correct either. Conditional probability is commonly used to model situations where we have obtained some information about an outcome, such as the news that ticket $t$ was not chosen.
It also models cases where we adopt a policy, e.g., if ticket $t$ is chosen, put it back and repeat the experiment. But Benci et al.'s case is neither of those. It is a case where a ticket has been physically removed, and there is no general rule about how that will affect the probabilities for the remaining tickets. So this application of conditional probability is unjustified.
16 Let $T$ be the set of tickets and $t$ the ticket removed. In situation (1), the probability that some ticket in $T \setminus \{t\}$ is selected is $1 - \mathrm{Prob}(E_1)$. Assuming regularity and that range(Prob) is contained in an ordered field, $1 - \mathrm{Prob}(E_1) < 1$. In situation (2), the probability that a ticket in $T \setminus \{t\}$ is selected is one, which is larger.
Thus, the argument that regularity contradicts IP in this case is incomplete. However, there could conceivably be situations, perhaps specific selection mechanisms, for which the renormalization step or some similar move 17 is correct. As with the circle argument, such a situation is at least as conceivable as the fair infinite lottery itself. But regardless of whether the urn argument can be salvaged, we will see that Benci et al.'s way of countering it, as with the other arguments, is unsuccessful.

Howson's objection to Williamson
Howson claims that Williamson's argument fails due to “a confusion about what he calls 'isomorphic events', assisted by an inadequate notation.” The key point in Howson's argument is that, in an appropriate probability model for Williamson's example, H(1…) is a singleton set, containing just one element of the sample space, while H(2…) is a pair, containing two elements of the sample space. Since a singleton is not isomorphic to a pair, the events are not isomorphic and the argument fails. Howson goes on to consider variations on Williamson's argument, but this is the main thrust of his objection.
In general, two collections are said to be isomorphic if there is a bijection between them that preserves all relevant structure. In algebra, for example, an isomorphism preserves the algebraic relations between elements. But the sets {〈1, 1, 1,… 〉} and {〈1, 1, 1,… 〉, 〈0, 1, 1,… 〉} are not isomorphic in any sense, because there is not even a bijection between them. Thus, according to Howson, Williamson's events are not isomorphic, so his argument is a non-starter.

Reply
What Williamson means by “isomorphic events” does not concern “events” in the jargon of probability theory, i.e., sets in the algebra of a probability space, but physical events, in the ordinary sense of things that happen, or things that might happen. As we saw, Williamson is concerned with “the physical structure of the set-up” and the “qualitative type” of events. Moreover, he explains what he means by 'isomorphic events' in terms of a structure-preserving map, not between subsets of the sample space, but between “the constituent single-toss events” H(1) … (I do not mean a mapping between the symbols that serve as conjuncts in a sentence, but between the physical events that form a larger event.) Furthermore, Williamson's set-up guarantees a mapping of the conjunct events that preserves qualitative physical properties and relations. He specifically stipulates that the coin tosses in his sequences use the same fair coin and that the time intervals between tosses are the same. We could go further and stipulate that all qualitative properties of the tosses and sequences of tosses are exactly the same. Then the events H(1…) and H(2…), construed not as subsets of a sample space but as physical things that might happen, are indeed isomorphic in Williamson's intended sense.
17 Benci et al.'s renormalization step implies that the probabilities for all remaining tickets are affected equally, i.e., multiplied by the same factor, but that is not needed. All that is needed to complete the urn argument is a case where removing a ticket multiplies the probability for some particular ticket by a factor other than one.
18 The assumption that $F$ is an algebra generated by cylinder sets is unnecessary here. If we are willing to relinquish the possibility of translation invariance, as regularists must, we can define $P$ on the entire power set of $2^{\mathbb{N}}$. But the particular domain of $P$ does not bear on Howson's objection so long as it includes {〈1, 1, 1,… 〉} and {〈1, 1, 1,… 〉, 〈0, 1, 1,… 〉}.
On his view, P({〈1, 1, 1,… 〉}) should equal P({〈1, 1, 1,… 〉, 〈0, 1, 1,… 〉}), not because these two sets are isomorphic (which they clearly are not), but because the physical events they model are qualitatively alike.¹⁹ Furthermore, there is an argument available that such isomorphism ought to imply equal probability, namely the argument above from (I) and (II), and this argument appeals only to the physical character of the events-in-the-ordinary-sense, not to the set-theoretic structure of the sets that represent those events in a particular mathematical model. Whatever model we might adopt, the physical argument still holds sway, so far as its premises are plausible. So Howson's set-theoretic objection misses the mark by a wide margin.
To be fair, Williamson's use of 'isomorphism' to express physical similarity shoulders some blame for this misunderstanding. An isomorphism is normally a mapping between sets, not between physical things-that-could-happen. Nonetheless, the intended principle is clear: If the same experiment is conducted under the same conditions at two different times, the probabilities of the outcomes should be the same. Howson seems to have ignored this point in order to raise a technical issue that, in the end, is not relevant.
19 An anonymous reviewer objects that H(1…) and H(2…) are not qualitatively identical because H(2…) accommodates the possibility of a tail at the first flip and H(1…) does not. But as Williamson understands these events, neither accommodates a tail at “the first flip”. Let us label the flips of H(1…) flip 1, flip 2, etc. Then on Williamson's conception, the “first flip” of H(2…) is flip 2, which must come up heads if H(2…) occurs. The reviewer's objection seems to require that H(2…) somehow incorporate the outcome of flip 1, even though flip 1 is logically and probabilistically independent of the occurrence or non-occurrence of H(2…). I think that Williamson, on the other hand, thinks of H(2…) as consisting in certain intrinsic structural properties of the physical motion of a coin beginning at time $t_0 + 1$, and not concerning in any way events occurring before $t_0 + 1$. Thus, H(2…) does not at all concern flip 1, which occurs at time $t_0$. Likewise H(1…), for Williamson, concerns the intrinsic structural properties of the motion of a coin from time $t_0$ onward, and nothing else. Williamson's isomorphism claim is just the claim that these structural properties, those of the coin's motion from $t_0$ onwards in H(1…) and those of its motion from $t_0 + 1$ onwards in H(2…), are qualitatively identical, and this is just true by stipulation.

Events as sentences
Just as Williamson's “isomorphic events” are not sets, they are not sentences either.²⁰ They are physical things-that-might-happen. Thus, as noted above, when we speak of a mapping between the conjuncts that constitute H(1…) and H(2…), this is not a mapping between the symbolic conjuncts in a sentence, but between the physical coin flip events that make up H(1…) and H(2…). Otherwise, one might be tempted to object that H(1…) and H(2…) cannot be isomorphic because the first is just the infinite … But this brings us to the second point: Williamson is not concerned with a mapping between linguistic objects, but between possible physical occurrences. It is not a structural similarity between symbol strings that concerns him, but between physical events that might occur in the world. If you like, these can be understood as mathematical structures that might be instantiated by the behaviour of physical coins. They are not instantiated by letters on paper or words from a mouth. Again, this is clear from Williamson's deliberate specification of the physical conditions and his references to “physical structure” and “qualitative type”. The important thing for Williamson is that, by stipulation, there is no intrinsic qualitative difference between the physical events H(1…) and H(2…). They differ in their times of occurrence, their spatiotemporal relations to each other and other events, and their bare haecceities if you like, but those are precisely the kinds of things that, on the plausible principle IP, make no difference to their probabilities.

Howson, the circle, and the urn
Howson (2017) is concerned only with the coin flip argument and a couple of variations on it, not with the circle argument or the urn. However, it is worth noting that, while Howson's critique of the coin flip argument misses the point, it does not apply at all to the other arguments. In the circle argument, the physical event that the point chosen by the dart throw lies in $C_0$ is represented by the set $C_0$ itself, and likewise for $C_1$. Since $C_1$ is just a rotation of $C_0$, the two sets are set-theoretically, topologically, and metrically isomorphic. But the circle argument does not rely on isomorphism per se. It uses the fact that the two sets are rotations of each other, and assumes explicitly that the probability distribution is rotationally symmetric. So Howson's complaint about Williamson's argument, that the sets representing the events are not isomorphic, is neither true of the circle argument nor relevant to it.
Nor is there a parallel of Howson's critique for the urn argument. There the event of drawing a given ticket is naturally represented by a singleton $\{n\} \subseteq \mathbb{N}$, in both situations (1) and (2). Since any such singletons are isomorphic, the set-theoretic “events” in the models are indeed isomorphic and Howson's objection does not apply. Of course, the fact that two singletons are trivially isomorphic is no reason that they should be assigned the same probability, but the urn argument does not rely on such an isomorphism between subsets of the sample space. Like the coin flip argument, it turns instead on a qualitative similarity between physical situations. So here too, an objection along Howson's line is both false and irrelevant.

Benci et al.'s objection to the urn argument
Benci et al. point out that we can model the urn experiments in different ways, and while we can equate the probability of $E_1$ under one model with that of $E_2$ under another, we should not normally compare the probabilities given by two different models or interpretations. “[C]hanging the sample space mid-game,” as they put it, “is, in general, not allowed.” Specifically, they suggest that we model situation (1) with the sample space $\mathbb{N}$ and a hyperreal-valued function $P$ on the subsets of $\mathbb{N}$ such that $P(\{n\}) = 1/\alpha$ for each $n \in \mathbb{N}$, where $\alpha$ is the size of $\mathbb{N}$ in a non-standard measure (“numerosity”).²¹ Benci et al. call this model A and write $\mathrm{Prob}_A(E_1)$ for the probability that a given ticket is selected in situation (1) under model A.²² According to them, we can represent situation (2) in the same model using conditional probability: The probability of selecting ticket number $n$ given that some other ticket $i$ has been removed is given, they claim,²³ by
\[ \mathrm{Prob}_A(E_2) = P(\{n\} \mid \mathbb{N} \setminus \{i\}) = \frac{P(\{n\})}{P(\mathbb{N} \setminus \{i\})} = \frac{1/\alpha}{1 - 1/\alpha} = \frac{1}{\alpha - 1}. \]
So, on model A, the probability that a given ticket is chosen varies from situation (1) to (2). On the other hand, they point out, we can also represent situation (2) by letting $\mathbb{N}$ represent the remaining tickets, after ticket $i$ has been removed. In that case, a natural model B gives the probability $\mathrm{Prob}_B(E_2) = 1/\alpha$ of choosing a given one of the remaining tickets in situation (2). Thus, $\mathrm{Prob}_A(E_1)$ is equal to $\mathrm{Prob}_B(E_2)$, but, according to Benci et al., A and B are two different models, and the fact that two events have the same probability under two different models does not imply that they must have the same probability in a single model.
21 Benci and others (e.g., Benci 1995; Benci and Di Nasso 2003; Di Nasso and Forti 2010; Benci, Bottazzi and Di Nasso 2014) have developed an alternative theory of set size called numerosity theory, where the size of an infinite set is not a Cantorian cardinal number but a hyper-integer. From this, one can derive a fair NAP distribution by assigning to each element of a sample space the hyperreal probability equal to the reciprocal of the numerosity of the sample space.
22 Remember, $\mathrm{Prob}_A$ is not identical to $P$. $\mathrm{Prob}_A$ applies to physical events, while $P$ applies to subsets of $\mathbb{N}$. If we let $l_A$ be a “labelling” that maps physical events to subsets of $\mathbb{N}$, then $\mathrm{Prob}_A$ can be understood as the composition $P \circ l_A$, i.e., $\mathrm{Prob}_A(E) = P(l_A(E))$.
23 As noted above, this is not obviously correct, since in situation (2) we are not merely conditionalizing but considering an altered set-up, and we cannot assume countable additivity here.
Let us clarify something here: Technically, A and B are not different models. They are two names for the same mathematical model, with the same sample space $\mathbb{N}$ and the same assignments of values to subsets of $\mathbb{N}$ (or at least, Benci et al. do not indicate any difference between the assignments). However, this model is associated with two different interpretations or “labellings”, different ways of associating physical events with sets in the model. For “model A”, each natural number corresponds to one of the original tickets, while for “model B”, each natural number corresponds to one of the remaining tickets, after one has been removed. So the real difference is not between two models but between two labellings. However, Benci et al.'s point is no less valid; the fact that two events have the same probability under different labellings does not imply that they must have the same probability under a single labelling.
Thus, according to Benci et al., the urn argument commits an oversight. We thought we had shown simply that Prob(E 1 ) = Prob(E 2 ), when actually we had only shown that Prob A (E 1 ) = Prob B (E 2 ), for two different labellings A and B, and this does not support the conclusion that regularity fails.
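The conditional-probability step attributed to model A can be spelled out in a few lines. This is only a sketch under the assumptions stated above for that model: P({n}) = 1/α for every n ∈ N, P is finitely additive, and situation (2) is represented by conditioning on the set of remaining tickets. Writing that set as N \ {i}, with n ≠ i, is a notational choice made here for illustration.

```latex
\begin{align*}
P(\mathbb{N}\setminus\{i\}) &= 1 - P(\{i\}) = 1 - \frac{1}{\alpha} = \frac{\alpha-1}{\alpha},\\[4pt]
P\bigl(\{n\}\mid \mathbb{N}\setminus\{i\}\bigr)
  &= \frac{P\bigl(\{n\}\cap(\mathbb{N}\setminus\{i\})\bigr)}{P(\mathbb{N}\setminus\{i\})}
   = \frac{1/\alpha}{(\alpha-1)/\alpha}
   = \frac{1}{\alpha-1}
   > \frac{1}{\alpha} \qquad (n \neq i).
\end{align*}
```

On this reading, the probability of drawing ticket n in situation (2) exceeds its probability in situation (1) by 1/(α(α − 1)), which is infinitesimal because α is infinite.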

Reply
Benci et al. are quite right: The fact that two events have the same probability under two different labellings does not imply that they simply have the same probability. However, there is a further reason to think that, under any accurate model, the probability of drawing a given ticket in situations (1) and (2) should be the same. The reason is that the qualitative physical situation is exactly the same in both cases, and by our principle IP, the same event under the same qualitative circumstances should have the same probability. Benci et al. even make such an argument themselves when they write that "situation (2) looks exactly as before the removal of a ticket,…. Because of this isomorphism between situation (1) and situation (2), we find that the probability of winning of each individual ticket is equal" (2018, my emphasis). Yet, when they come to their reply, they ignore the premise that "isomorphic" events should have the same probability and claim instead that the argument trades on a conflation of two different labellings. In fact, their own presentation of the argument involves no such conflation; it is clearly founded on IP, and as noted, there is an argument for IP from more basic principles. Given IP, any model in which the probability of drawing a given ticket (other than the one removed) varies from situation (1) to situation (2) is an inaccurate model. Benci et al. have tried to show that one can construct such a model, where Prob A (E 2 ) > Prob A (E 1 ), but that does nothing to refute the argument from IP that any such model is inaccurate.

Reply
There is no textual evidence that Williamson conflates two different models or labellings. In fact, there is clear evidence to the contrary: Williamson tells us why H(1…) and H(2…) should have the same probability. It is because they are (physically) isomorphic. That argument does not depend on the particular model or labelling employed. Williamson's point is that, in any accurate model of his proposed experiment, H(1…) and H(2…) will have the same probability. Thus he would insist, not that Prob A (H(1…)) = Prob B (H(2…)), but that Prob A (H(1…)) should equal Prob A (H(2…)), or else A is just not a good model. There is no reason to suppose that this is founded on any slide or conflation. It is clearly founded on the principle that physically isomorphic events should have the same probability, and again, there is a simple argument available for that principle. Thus, contrary to Benci et al.'s claim in the above passage, the contradiction can indeed be obtained in a single model, with a single sample space, if one only takes seriously Williamson's premise that physically isomorphic events have the same probability.
Later (p. 546), Benci et al. acknowledge the physical basis of Williamson's argument, writing, "We know, one might say, that the laws of physics are time-translation invariant." Yet they then complain that "it is still not easy to see why the NAP treatment of Williamson's scenario has to violate time-translation invariance." Well, the reasons are straightforward:
1. Any regular probability model that assigns probabilities to H(1…) and H(2…) must assign a larger probability to H(2…).
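Point 1 can be checked directly. The sketch below assumes that H(1…) is the event that every flip comes up heads and H(2…) the event that every flip after the first comes up heads, both represented in a single finitely additive, regular model P. The symbol T_1, for tails on the first flip, is a label introduced here for illustration and is not the source's notation.

```latex
\begin{align*}
H(2\ldots) &= H(1\ldots) \,\cup\, \bigl(T_1 \cap H(2\ldots)\bigr),
  \qquad H(1\ldots) \cap \bigl(T_1 \cap H(2\ldots)\bigr) = \emptyset,\\
P\bigl(H(2\ldots)\bigr) &= P\bigl(H(1\ldots)\bigr) + P\bigl(T_1 \cap H(2\ldots)\bigr)
  && \text{(finite additivity)},\\
P\bigl(T_1 \cap H(2\ldots)\bigr) &> 0
  && \text{(regularity: the event is possible)},\\
\Longrightarrow\quad P\bigl(H(2\ldots)\bigr) &> P\bigl(H(1\ldots)\bigr).
\end{align*}
```

Time-translation invariance would instead require P(H(2…)) = P(H(1…)), so the two requirements cannot both hold within one such model.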
It is also true that, given a NAP model that assigns a probability to H(1…), one could consider a different NAP model that assigns the same probability to H(2…), but that is irrelevant. The first model on its own must violate time invariance because it assigns a larger probability to H(2…) than H(1…), and the second model must also violate time invariance because it assigns a larger probability to H(3…) (the event that each flip after the second comes up heads) than to H(2…). Moreover, if we want to understand the relation between the probabilities of H(1…) and H(2…), we need to represent them together in one model. If that model is regular, it cannot be time invariant.
24 An anonymous reviewer objects that H(2…) is not a time translation of H(1…) because the former accommodates the possibility of a tail at the first flip and the latter does not. But as argued in note 19, H(2…) can be understood, as Williamson seems to understand it, as a physical event that occurs entirely after the first flip of H(1…) and does not involve that first flip at all.

Benci et al.'s objection to the circle argument
Benci et al. rehearse a version of the circle example, referring to Parker 2013 and others. 25 They then remark, "It will be clear to the reader by now that our diagnosis of the argument from rotational symmetry against infinitesimal probabilities is structurally identical to our diagnosis of Williamson's argument. Hence, we do not describe it in detail here." So let us describe it in detail. The diagnosis of Williamson's argument was that he conflates two different probability models, or more precisely, two different labellings. Presumably, then, Benci et al. would claim that the circle argument tacitly appeals to two different labellings l A and l B , such that Prob A (C 0 ) = P(l A (C 0 )) = P(l B (C 1 )) = Prob B (C 1 ). Then they will say (if the diagnosis is indeed structurally identical to that of Williamson's argument) that the circle argument tacitly switches labellings mid-game, and if we do not conflate Prob A with Prob B , there is no reason to suppose that Prob X (C 0 ) = Prob X (C 1 ) on any one labelling l X .

Reply
In reality, the circle argument stated here explicitly assumes that there is a single finitely additive (and possibly hyperreal) probability function P that assigns values to both C 0 and C 1 and which is rotationally symmetric. 26 It follows trivially that P(C 0 ) = P(C 1 ), because C 1 is a rotation of C 0 . And, as noted, the event E C that the point determined by a dart throw lies in a set C is represented by that very set C. So there is only one labelling in play, namely l(E C ) = C for each subset C of the circle. The argument involves no conflation of models or labellings. It only assumes that the distribution is rotationally symmetric, and hence, that a rotationally symmetric continuous distribution is possible. In the dart throwing implementation, this amounts to assuming that it is possible to throw a dart, or construct a device to throw a dart, in such a way as to yield a rotationally symmetric distribution. A dedicated regularist would have to deny that such a strictly symmetric distribution is possible. But that is a strong claim to make on the back of intuition, conceptual analysis, or theoretical virtues. It is at least conceptually possible that some perfectly symmetric set-up could produce a perfectly symmetric distribution. Benci et al. do not deny this; they only hint that the argument involves a conflation of two different models, and that is simply not the case.
25 The example in Parker 2013 is suggestive but does not concern probability. Rather it is used to argue that "Euclidean" theories of cardinality such as numerosity (see note 21) also violate rotation and translation invariance, and consequently lack certain theoretical virtues. Parker 2012 gives the parallel argument against regular probabilities.
26 Parker 2012 argues contrapositively from the assumption of regularity to the failure of rotation invariance, but again it is explicitly a failure of rotation invariance for a single probability function. Bernstein and Wattenberg 1969, Barrett 2010, and Pruss 2013 also discuss invariance for a single probability function. Of course, Benci et al. could claim that these are all careless glosses, but there is no need for such accusations if the arguments are taken at face value.
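The single-function character of the circle argument can be made explicit in a short derivation. The construction used here is an assumption, since this section does not restate it: take C_0 = {c_0, c_1, c_2, …} to be the orbit of a point c_0 under repeated rotation by a fixed angle that is not a rational multiple of a full turn, so that the points c_n are pairwise distinct, and take C_1 to be the rotation of C_0 by that angle, so that C_1 = C_0 \ {c_0}. P is assumed to be one finitely additive, rotationally symmetric function, possibly hyperreal-valued.

```latex
\begin{align*}
C_0 &= \{c_0\} \cup C_1, \qquad \{c_0\} \cap C_1 = \emptyset,\\
P(C_0) &= P(\{c_0\}) + P(C_1) && \text{(finite additivity)},\\
P(C_0) &= P(C_1) && \text{(rotational symmetry)},\\
\Longrightarrow\quad P(\{c_0\}) &= 0 && \text{(so $P$ is not regular)}.
\end{align*}
```

Only one function P and one labelling appear in this sketch, so no step compares values taken from different models.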
If, as I have argued, Howson and Benci et al.'s replies fail to refute these three arguments, what more could they say in defense of regular and hyperreal probabilities?
Howson might respond by pointing to a merely instrumental role for hyperreal probabilities. He writes, [T]he object there is not so much, or at all, to regard hyperreal probabilities as on the same footing as real-valued ones but to use the nonstandard universe simply as an aid to the standard theory by translating standard problems into nonstandard ones by means of the Transfer Principle, where they are often more tractable….
Consequently, he might say, proponents of hyperreal probabilities will not be troubled by arguments from physical principles. However, this is not how philosophers typically use hyperreal probabilities. Hofweber (2014) defends hyperreal probabilities on conceptual grounds. His Minimal Constraint on probability measures is:
(MC) If the chance of p is 0, then not p. If the chance of p is 1, then p.
"I can't help but to judge," he writes, "that (MC) is a conceptual truth about chance, given that 0 is the lowest and 1 the highest possible measure of chance. … It is a conceptual truth about chance that an event which happens has a better chance of happening than an event which is conceptually incoherent." Lewis expresses a similar sentiment: "Zero chance is no chance, and nothing with zero chance ever happens" (1983, 176). Lewis and Hofweber do not use infinitesimals to facilitate calculations; they just think that infinitesimals correctly represent the structure of chances in the real world (whether as a merely conceptual truth or a more realist metaphysical claim). Benci et al., on the other hand, champion infinitesimal probabilities in order to make better sense of what they consider to be conceptually possible scenarios, such as infinite lotteries. NAP models, they argue, have theoretical virtues over the De Finetti (1974) approach to infinite lotteries (which is essentially just to drop countable additivity) and even over the standard treatment of continuous sample spaces. To an extent, the possibility of calculation is one of their concerns, for it is one of the stated motivations for their generalized continuity axiom. But their primary motivation is not to simplify calculations. It is to find enlightening models, models that can give us a better theoretical handle on problematic hypothetical processes. So Howson's instrumentalist view of hyperreal probabilities is not in line with the philosophical literature. 27 Still, Benci et al. seem inclined to a milder pragmatism. Some of their discussion suggests a general antirealism about probability models.
"[T]here is no reason to assume," they write, "that there is a unique best way to model certain infinite probabilistic situations…" Thus they might counter the arguments against regularity by claiming that, even if the space-time invariance of probabilities is sometimes mandated by plausible or useful principles, the best models overall might involve an infinitesimal deviation from such invariance. Or they might just argue that it is useful to apply various models to a given process, if only to better understand the space of possible models and their virtues and limitations.
Yet, as Benci et al. themselves point out, it could be argued that, "There is such a thing as physical chance. And it is a legitimate task of our mathematical models to track this property." Plausibly, the chances for a given experiment have a definite structure. The outcomes in any sequence of die rolls or coin flips exhibit a distinctive and robust pattern, largely independent of the detailed circumstances or the observer's conceptions. It is one of the main goals of probability theory to accurately characterize and explain such patterns. Benci et al. respond to such a realist viewpoint as follows: "But our models can only track physical chance in a mediated way. In order to describe a physical system and its behaviour, our probabilistic models have to select a sample space and label the point events (that is, establish a connection between reality and point events in the model). For finite sample spaces, the labelling does not matter; but for infinite sample spaces, different labellings can result in different probability assignments. All this induces a degree of relativity in probability values of events" (2018, 542). Thus, according to Benci et al., any probability model with an infinite sample space will involve some arbitrariness, whether it is a standard Kolmogorovian model or a regular one. Their main concern in this passage is arbitrariness related to the choice of labelling and, for NAP models, the choice of an ultrafilter, but it suggests they might take a similarly noncommittal stance toward the choice between regular and space-time invariant models.
The problem with such a stance is that it appears incompatible with the goal of accurately modelling physical chances. For the kinds of experiments discussed here, a model cannot be both regular and space-time invariant. If our goal is to characterize the true structure of the chances in such experiments, we should take into account whether the chances are truly space-time invariant or regular (or neither). This leaves us little freedom to choose; either regular models are accurate or they are not, and the examples discussed here give us some reason to believe that, at least in those cases, they are not. This brings us to another possible position, namely that of a moderate, pluralistic regularist who holds that, in cases where there is a strong argument from IP against regularity, the latter might fail, but otherwise it should hold. However, this position is awkward, especially for Benci et al. Their main application of NAP is to the de Finetti lottery with an infinite number of tickets, but their own urn argument suggests that such a lottery can bring regularity into conflict with IP or other plausible symmetry assumptions (provided there are cases where something like their renormalization step applies). To hold this pluralistic regularist position would mean holding that infinite lotteries are not regular when the specific conditions that justify such a renormalization step hold, but they are generally regular otherwise. If we admit that regularity is false for certain selection mechanisms, why should we expect it to hold for others?
We can make this point more concrete. Suppose we have a lottery machine for which the renormalization step is valid, and suppose the moderate regularist admits that regularity fails for this lottery machine. Now let us add to this machine a component that detects which tickets are present in the urn. If one of the original tickets is removed, it applies a different selection mechanism for which no such renormalization formula applies. For this composite lottery machine, we cannot make Benci et al.'s urn argument. Will the moderate regularist then claim that regularity does hold for the composite machine? Surely, if the composite machine applies the same mechanism as the original machine when all the original tickets are present, then in that case it produces the same distribution as the original machine. Thus, such opportunistic regularism is generally untenable. Similarly, if we accept that there are realizations of the circle example where the distribution is fully rotation-symmetric and regularity fails, we should not expect that regularity holds whenever the distribution is not perfectly symmetric.
What this illustrates is that, if indeed we are concerned with accurately modelling the structure of objective chances, then the question of regularity turns not on theoretical virtues, but on the details of the probabilistic processes under study. If indeed there are cases where regularity does not hold, then (1) there is no sound and fully general argument for regularity, and (2) regularity is not needed to render such experiments conceptually coherent. At most, regular models boast certain theoretical virtues while lacking others, namely those of permitting invariance under various transformations. But if there are any facts about the structure of chances, the model should reflect those facts first, and desirable theoretical virtues only as accuracy permits. Of course, it may be difficult to determine what the most accurate model is in any particular case, but if we have good reason to believe that chances are not regular in certain cases, we can reasonably hypothesize that they are not regular in similar cases either.

Conclusion
We have reviewed three arguments that certain hypothetical experiments exhibit nonregular probabilities. If these arguments succeed, then regularity does not generally hold, and there is little reason to believe that it typically holds for other experiments, nor that we should demand it in our credences. Howson and Benci et al. have attempted to refute those arguments, but their refutations fail. Howson points out that Williamson's events are not in fact isomorphic, because one is a singleton while the other is a pair, but this misses the point. Howson is speaking of the abstract "events" of mathematical probability theory, which are sets, while Williamson is concerned with events in the ordinary sense of things that could happen. When Williamson says that his two coin flip sequences are isomorphic, he does not mean that they are subsets of a sample space that have a one-to-one correspondence; he means that they have all of the same physical qualitative properties, and this is true by hypothesis. Benci et al. claim that I, Williamson, and they themselves found their arguments on a conflation of different probability models. The symmetries between Williamson's coin flip sequences imply that they can be assigned the same probability in different models, but not that they must have the same probability in a single model, and likewise for my point sets and their own lottery draws. But none of these arguments is in fact based on such a conflation of models. Williamson's and Benci et al.'s are both founded on the principle that qualitatively identical events in qualitatively identical circumstances should have the same probability, and mine is based on the plausibility of a perfectly symmetric continuous distribution. All of us claim, not that our parallel events can be given the same probability in different models, but that the parallel events will have the same probability in any one model, if that model is accurate. This is no mere slide.
The principle underlying the coin and urn arguments, that "isomorphic" events have the same probability, is not above dispute, but we have provided here a simple argument from more fundamental hypotheses. If (I) the laws of nature are space-time invariant, and (II) chances are determined by local qualitative circumstances and natural laws, it follows that qualitatively identical events have the same chance, and should also be assigned the same credence insofar as rational credences track chance. One who insists on regularity must therefore deny either the space-time invariance of laws or the grounding of chance in laws and qualitative circumstances.
This leaves the regularist several options, including at least the following: One may take a more or less instrumentalist view that is more concerned with the theoretical virtues of regular probabilities than with accurately modelling chances. One may hold that regularity fails in the cases discussed but is still plausible in other cases, though we have seen that this is an uncomfortable position to hold. Or, one might simply deny IP, as well as the very possibility of a symmetric continuous distribution. Hofweber (2014), at least, prefers the last of these moves, and denies premise (II), that chances are determined by laws and local circumstances. But if regularity requires that so-called objective chance is in reality such a contextual matter, or that the laws of physics are not in fact space-time invariant, then the arguments for regularity should be regarded very sceptically.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.