1 Introduction

Probabilistic notions have been applied to many kinds of mathematical objects and concepts: there are theories of random graphs, random closed sets, random Banach spaces, random series, random fields, random vectors, and so on. The aim of the present article is to apply a notion of probability not to some specific restricted class of mathematical objects, but to the mathematical universe as a whole. More particularly, we wish to explicate what it could mean for a property A of sets to have a probability of being true of a set y in the set theoretic universe V. Properties are identified with their extensions, so that A ranges over all proper and improper classes in V.

The aim is to develop a theory of the probability of events of the form \(A(\tau )\), where A is a class and \(\tau\) is a random variable. Intuitively, then, ‘\(\Pr ( A(\tau ) )\)’ refers to the probability that the random variable \(\tau\) takes a value that has the property A.

Formally, a random variable is a function from an outcome space (also called a state space) \(\Omega\) to another space E, which is often a measurable space (Blitzstein and Hwang 2015, p. 92). The outcome space \(\Omega\) is associated with a probability function. The outcome space \(\Omega\) represents the states that the random variable can be in; the space E represents the values associated with the states that one is interested in.

A familiar simple example is the tossing of a fair coin, where the outcome space is \(\{\text {state of Heads}, \text {state of Tails}\}\), the space it maps into is \(\{0, 1 \}\), and the associated probability function is such that the probability of the coin landing Heads is the same as that of it landing Tails. In this situation (as in many others), the outcome space can be identified with the measurable space. This is also true for the case that we are interested in, where the state space must be large enough to have different states associated with the random variables taking different sets as values, but need not be larger. So we will take the outcome space \(\Omega\) to be V, and we will take the space E that it maps into also to be V.

Without invoking fixed sets of postulates, intuitions about probability have occasionally been used in set theory, for instance to motivate new basic principles (Freiling, 1986).Footnote 1 The article (Freiling, 1986) also provides an instance of an application of the notion of random variable to a class of mathematical entities (the real numbers of the unit interval); so does (Scott, 1967) (random reals). In the light of all this, it is natural to wonder what we might require from probability functions associated with random variables on V.

Surely it would be unreasonable to insist on there being one uniquely correct probability function that yields the probability of a random variable taking a value in a given class of sets. At any rate, finding such a “uniquely correct” or “objective” probability function is not widely regarded as a viable research objective. On the other hand, for our functions to have any hope of meriting the label probability function, the usual rules for manipulating the probabilities of finitely many events must apply. Thus they have to satisfy the calculation rules for finitely additive probability functions. So all of the probability functions that we shall consider are finitely additive probability functions in the sense of Kolmogorov, except for his stipulation that probabilities are measured by elements of the real [0, 1] interval.

From the outset we impose three additional constraints on the class of probability functions that we are interested in:Footnote 2

  1. Totality. The probability functions are defined on all classes.

  2. Uniformity. All singleton events are given the same probability.

  3. Regularity. All singleton events are given non-zero probability.

For familiar reasons these extra constraints are not compatible with infinite sample spaces if the values of probabilities lie in the real [0, 1] interval. Consider the fair lottery on \(\mathbb {N}\). By the extra constraints, there is some \(r \in ]0,1]\) such that for every ticket n, the probability of n being the winning ticket is r. But then, by Finite Additivity and the Archimedean property of \(\mathbb {R}\), there is a natural number k such that for every collection S of k distinct tickets, the probability of the winning ticket belonging to S is greater than 1. This means that the desired probability functions will have to be non-Archimedean.
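To make the clash concrete, here is a worked instance of the argument, with an arbitrary illustrative value of r:

$$\begin{aligned} r = \tfrac{1}{1000} \Rightarrow \Pr (\text {winning ticket} \in \{1,\ldots ,1001\}) = 1001 \cdot \tfrac{1}{1000} > 1 , \end{aligned}$$

and the same calculation goes through for any real \(r > 0\), by taking any collection of more than \(r^{-1}\) tickets.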

As will be explained in detail in Sect. 2, unlike Kolmogorov probability functions, these non-Archimedean probability functions depend on a choice of ultrafilter on the “small” subsets of the sample space.Footnote 3 The resulting probability functions will also turn out not to satisfy \(\sigma\)-additivity. Nonetheless, we will see that they instead satisfy a generalised infinite additivity rule.Footnote 4

Totality means that there are no non-measurable classes. This makes the probabilities that we will consider in a sense maximally informative, which we take to be a desirable feature. Regularity means that our probability functions will be very fine-grained, since they will always distinguish between impossible and contingent events. We take this, too, to be a desirable feature.

Lastly, we turn to Uniformity. Freiling (1986) considers the abstract possibility of randomly throwing darts at the real number line. In this paper, we consider the even more abstract possibility of randomly throwing darts at the set theoretic universe. When considering a dart randomly thrown at \(\mathbb {R}\), it would be unnatural to consider probability functions that take the probability of the dart landing on the number \(\pi ^{-1}\) to be 0.9, say. Indeed, if the event of the randomly thrown dart landing on a given real number receives a probability at all (which it will, in the setting that we shall be considering), then the randomness assumption leads us to expect the probability of it landing on one real number to be the same as the probability of it landing on any other real number.Footnote 5 In the same vein, in our setting the Uniformity property says that the probability of one given set being randomly picked from the set theoretic universe is the same as the probability that some other set is randomly picked. In view of this, we take Uniformity to be desirable for the kinds of abstract processes that we want to model.

Uniformity is commonly seen as a symmetry property for probability functions.Footnote 6 Pruss rightly stresses, however, that Uniformity does not even come close to exhausting the content of the conditions of fairness and symmetry for probabilistic scenarios: “A lottery with our radically skewed probabilities that nonetheless treat[s] all individual integers as equiprobable does not intuitively appear to be fair. Thus, [even] weak translation invariance plus equiprobability of singletons does not appear to be sufficient to capture our intuitions of fairness and symmetry” (Pruss 2021, p. 8521).

In this article, we investigate additional constraints that a finitely additive, total, uniform, and regular probability function might satisfy. In particular, we will consider extra requirements that, given Totality, Regularity, and Uniformity, from a pre-theoretic perspective represent possible or even natural probabilistic scenarios.Footnote 7 For instance, we may ask whether such a probability function might in addition satisfy the principle that if any two sets A and B have the same cardinality, then a random variable on V will be as likely to take a value in A as in B; or we may ask whether such a probability function might in addition satisfy the principle that the probability of a random variable taking the value of an ordinal is infinitesimally close to 0. We will see that some such additional requirements are jointly satisfiable, others not.

The arguments that establish the satisfiability of combinations of such additional constraints turn on a careful selection of ultrafilters on “small” subsets of the state space. A key objective of the present article is to explore the sensitivity of properties of random variables on V to the choice of ultrafilter. We aim to chart, to some extent, the relationships between, and the trade-offs that come with, the various technical choices in the definition of random variables on the set theoretic universe. This strikes us as a worthy foundational enterprise.

The project in which we are engaging in this article is related to the work in (Benci et al., 2007), (Benci et al., 2006). The aim of these articles is to construct a theory of sizes for mathematical universes inspired by the Euclidean principle that the size of the whole is larger than the sizes of its proper parts. Now there is of course a familiar theory of size, namely Cantor’s theory of cardinality, which does not satisfy this Euclidean principle. But Benci and his co-authors prefer the Euclidean theory. In the abstract of (Benci et al., 2006), they write that they maintain the Euclidean principle in exchange for giving up half of Cantor’s equinumerosity principle for sets (Benci et al 2006, p. 43).Footnote 8 So Benci and his co-authors propose their Euclidean theory of size as a rival to Cantor’s theory.

There is a close relationship between the mathematical techniques that are used in the present article and those that are used in (Benci et al., 2007), (Benci et al., 2006). But ideologically, our standpoints are very dissimilar. Whereas Benci and his co-authors in (Benci et al., 2007) and (Benci et al., 2006) reject Cantor’s theory of cardinality in favour of the Euclidean principle, we fully embrace it. Nonetheless, the probability functions that will be constructed satisfy the probabilistic Euclidean principle that the probability of an event is strictly greater than the probability of each of its proper sub-events. This is in contrast with Kolmogorov probability, where it frequently happens that the probability of an event is equal to the probability of one or more of its proper sub-events.

Moreover, what we shall mean by ‘mathematical universe’ is not the same as what is meant in (Benci et al., 2007) by the term. The authors of (Benci et al., 2007) impose mainly algebraic constraints on what counts as a mathematical universe (Benci et al 2007, Introduction). We, in contrast, take the term ‘mathematical universe’ in the set theoretical sense: the mathematical universe is the arena in which all of mathematics can in principle be carried out. Naively, you may take there to be one preferred set theoretic universe: V. But if you are uncomfortable with taking V as given, then you might want to take a mathematical universe to be a rank \(V_{\alpha }\) that constitutes a model of most or perhaps even all of the standard principles of set theory, such as some strongly inaccessible rank. Indeed, we will see that for random variables defined on any large set S, the general idea of equipping them with a probability function will be the same as that for random variables on V.

We will discuss two ways of generating non-Archimedean probability functions for random variables on V. In Sect. 2 a simple way of generating such probability functions (the finite snapshot approach) will be described. In Sect. 3 we go on to discuss how global properties of these probability functions can be made to hold by imposing constraints on the process of generating such functions (choice of ultrafilters). In Sect. 4, a theoretically more satisfying but also more complicated way of generating non-Archimedean probability functions for random variables on V is discussed (the bootstrapping method).

2 The Finite Snapshot Approach

A random variable \(\tau\) on V is a function from the state space to the value space, i.e., an element of \({^V}V\). So there are many random variables on V. Our aim is to associate with elements of \({^V}V\) a notion of probability that meets the minimal constraints (Totality, Uniformity and Regularity) that were described in Sect. 1.

In fact, we want to give precise meaning to conditional probability statements of the form

$$\begin{aligned} {\textsf{Pr}(\sigma \in A \mid \tau \in B) }, \end{aligned}$$

where \(\sigma , \tau \in {^V}V\) and \(A,B \subseteq V\). We will see that it will in fact be sufficient for our purposes to give meaning to unconditional probability statements of the form \({\textsf{Pr}(\sigma \in A )}.\) Given that random variables are elements of \({^V}V\), it is clear that the symbol ‘\(\in\)’ in ‘\({\textsf{Pr}(\sigma \in A )}\)’ does not have its usual literal meaning of membership (in a class), as we will see shortly. Informally, ‘\({\textsf{Pr}(\sigma \in A )}\)’ should be read as: ‘the probability that the random variable \(\sigma\) is in a state where it has as its value some element of A’.

Our fundamental problem amounts to giving meaning to expressions of the form \(\textsf{Pr}(\sigma \in A ).\) Such probability measures will be determined by a choice of a fine ultrafilterFootnote 9 on the collection \([V]^{<\omega }\) of finite subsets of the state space.Footnote 10

The starting point is a fine ultrafilter \(\mathcal {U}\) on \([V]^{< \omega }\). This fine ultrafilter \(\mathcal {U}\) defines a non-Archimedean field \(\mathcal {F}_{\mathcal {U}}\) in the following way.

For any two functions \(f,g: [V]^{< \omega } \rightarrow \mathbb {Q}\) we define:

Definition 1

$$\begin{aligned} f \approx _{\mathcal {U}} g \leftrightarrow \{ T \in [V]^{< \omega } : f(T) = g(T) \} \in \mathcal {U}. \end{aligned}$$

In words: two functions are equivalent under \(\mathcal {U}\) if they coincide on ultrafilter-many finite snapshots of states.

The relation \(\approx _{\mathcal {U}}\) is an equivalence relation, so we can take equivalence classes for which we then have

$$\begin{aligned} {[}f]_{\mathcal {U}} = [g]_{\mathcal {U}} \Leftrightarrow f \approx _{\mathcal {U}} g . \end{aligned}$$

Moreover, it is again a routine exercise to verify that the \([f]_{\mathcal {U}}\)’s form a hyper-rational field \(\mathcal {F}_{\mathcal {U}}\).

Now suppose \(A \subseteq V\) and \(\theta \in {^V}V\). Then we define the function \(f_{\theta \in A}: [V]^{< \omega } \rightarrow \mathbb {Q}\) as follows:

Definition 2

For every non-empty \(T \in [V]^{< \omega } :\)

$$\begin{aligned} f_{\theta \in A}(T) \equiv \frac{\vert \{ s\in T: \theta (s) \in A \} |}{|T |} . \end{aligned}$$

In words: for every finite set of states T, \(f_{\theta \in A}(T)\) is the ratio between the number of states s in T for which \(\theta (s) \in A\) and the number of states in T. In this sense, \(f_{\theta \in A}(T)\) is the probability of \(\theta \in A\) on a finite snapshot of states, i.e., \(\textsf{Pr}(\theta \in A\mid \iota \in T)\), where \(\iota\) is the identity random variable and the probability is given by the ratio formula.
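While a fine ultrafilter on \([V]^{< \omega }\) cannot be exhibited explicitly, the snapshot values \(f_{\theta \in A}(T)\) themselves are elementary to compute. The following minimal Python sketch (a toy illustration with natural numbers standing in for states and values; the function name is ours and is not part of the formal development) implements Definition 2:

    from fractions import Fraction

    def snapshot_prob(theta, A, T):
        # f_{theta in A}(T): the fraction of states s in the finite, non-empty
        # snapshot T whose value theta(s) lies in A (Definition 2).
        if not T:
            raise ValueError("the snapshot T must be non-empty")
        hits = sum(1 for s in T if theta(s) in A)
        return Fraction(hits, len(T))

    # Toy illustration: theta is the identity random variable and A is the
    # set of even numbers below 100.
    theta = lambda s: s
    A = {n for n in range(100) if n % 2 == 0}
    T = {0, 1, 2, 3, 4, 5, 6}
    print(snapshot_prob(theta, A, T))   # prints 4/7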

Similarly, we define the function \(f_{\theta \in A \wedge \nu \in B}\) as follows:

Definition 3

For every non-empty \(T \in [V]^{< \omega } :\)

$$\begin{aligned} f_{\theta \in A \wedge \nu \in B}(T) \equiv \frac{\vert \{ s\in T: \theta (s) \in A \text { and } \nu (s) \in B \} |}{|T |} . \end{aligned}$$

Now we are ready to define the probability of \(\theta \in A\), relative to a fine (and therefore freeFootnote 11) ultrafilter \(\mathcal {U}\) on \([V]^{< \omega }\):

Definition 4

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}} (\theta \in A) \equiv [ f_{\theta \in A} ]_{\mathcal {U}} . \end{aligned}$$

Similarly, we define \(\textsf{Pr}_{\mathcal {U}} (\theta \in A \wedge \nu \in B)\) as \([ f_{\theta \in A \wedge \nu \in B} ]_{\mathcal {U}}\). Thus we have constructed a probability function \(\textsf{Pr}_{\mathcal {U}}\) that takes its values in the hyperrational field \(\mathcal {F}_{\mathcal {U}}\), a field that contains infinitesimal elements. Such probability functions are sometimes called NAP functions (Non-Archimedean Probability functions).

Conditional probability can then be expressed in terms of unconditional probability:

Definition 5

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}} (\theta \in A \mid \nu \in B) \equiv \frac{\textsf{Pr}_{\mathcal {U}} (\theta \in A \wedge \nu \in B)}{\textsf{Pr}_{\mathcal {U}} (\nu \in B)} . \end{aligned}$$
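At the level of a single finite snapshot T this is simply a ratio of counts: since the field operations in \(\mathcal {F}_{\mathcal {U}}\) are computed pointwise, we have, whenever the denominator is non-zero on T,

$$\begin{aligned} \frac{f_{\theta \in A \wedge \nu \in B}(T)}{f_{\nu \in B}(T)} = \frac{\vert \{ s\in T: \theta (s) \in A \text { and } \nu (s) \in B \} |}{\vert \{ s\in T: \nu (s) \in B \} |} , \end{aligned}$$

so Definition 5 is the generalised-limit analogue of the familiar ratio formula for conditional probability.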

Thus we have given a recipe for the construction of probability measures \(\textsf{Pr}_{\mathcal {U}}\) on V that is mathematically coherent. Nonetheless, since ultrafilters on \([V]^{< \omega }\) are hyperclasses (i.e., entities that contain proper classes as elements), it follows from definition 4 that the resulting probability measures are hyperclasses, too.Footnote 12 It is of course a difficult philosophical question whether classes and/or hyperclasses exist, and, if they do, what their nature is. We will not go into this question here, but merely reiterate our earlier observationFootnote 13 that those who are sceptical about classes may take V to be some strongly inaccessible rank.

3 Constraints

From Sect. 1 we know that the aim is not to arrive at a unique (correct) probability function on V. But we did insist from the outset on our probability functions satisfying three global constraints: totality, uniformity, and regularity. It will be shown that these properties are always guaranteed to hold.

There are further global conditions on probability functions on V that seem from a pre-theoretic point of view attractive, and that are not guaranteed to hold without further work. These global constraints will be explored in what follows. We will show that many of them can be forced to hold by imposing constraints on the ultrafilters from which the probability functions are generated.

3.1 Elementary Properties

The definition of \(\textsf{Pr}_{\mathcal {U}}\) is relative to an initial choice of the fine ultrafilter \(\mathcal {U}\). The properties of \(\textsf{Pr}_{\mathcal {U}}\) depend on \(\mathcal {U}\). Nonetheless, certain basic properties of \(\textsf{Pr}_{\mathcal {U}}\) can be easily seen to hold regardless of which fine ultrafilter \(\mathcal {U}\) is chosen. For instance, it is easy to see that \(\textsf{Pr}_{\mathcal {U}}\) is always a total finitely additive probability function (Benci et al 2013, Sect. 4).

Now we define the notion of a bijective random variable:

Definition 6

A random variable \(\theta\) is said to be a bijective random variable if for any set x, there is exactly one element u of the state space such that \(\theta (u) = x\).

In words: a bijective random variable is a random variable that takes every value exactly once. This simply means that bijective random variables have no built-in bias towards taking as their value any particular set.

In terms of the notion of random variable (on V), we define the notions of regularity and uniformity:

Definition 7

(regularity) A probability function \(\textsf{Pr}_{\mathcal {U}}\) is regular if for every bijective random variable \(\theta\) and for every \(x\in V, \textsf{Pr}_{\mathcal {U}}(\theta = x) >0\).

Definition 8

(uniformity) A probability function \(\textsf{Pr}_{\mathcal {U}}\) is uniform if for every bijective random variable \(\theta\) and for all \(x,y \in V:\)

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}}(\theta = x) = \textsf{Pr}_{\mathcal {U}}(\theta = y) . \end{aligned}$$

Proposition 1

For every fine ultrafilter \(\mathcal {U}\):

  1. \(\textsf{Pr}_{\mathcal {U}}\) is regular;

  2. \(\textsf{Pr}_{\mathcal {U}}\) is uniform.

Proof

These properties are proved as propositions 2.5 and 2.6 in (Brickhill 2018, p. 525–526), respectively. \(\square\)

Here we note that it is the fineness of the ultrafilter that guarantees the Regularity and Uniformity of the resulting probability function. So our requirement of fineness on ultrafilters is motivated by our desire to obtain Regularity and Uniformity.Footnote 14

The Euclidean property is formally defined as follows:

Definition 9

(Euclidean) A probability function \(\textsf{Pr}_{\mathcal {U}}\) is Euclidean if for every bijective random variable \(\theta\) and all \(A,B \subseteq V\):

$$\begin{aligned} A \subsetneq B \Rightarrow \textsf{Pr}_{\mathcal {U}} (\theta \in A) < \textsf{Pr}_{\mathcal {U}} (\theta \in B). \end{aligned}$$

Then we have:

Proposition 2

For every fine ultrafilter \(\mathcal {U}\), the probability function \(\textsf{Pr}_{\mathcal {U}}\) is Euclidean.

Proof

By finite additivity and regularity: if \(A \subsetneq B\), pick some \(x \in B \setminus A\); then \(\textsf{Pr}_{\mathcal {U}} (\theta \in B) \ge \textsf{Pr}_{\mathcal {U}} (\theta \in A) + \textsf{Pr}_{\mathcal {U}} (\theta = x) > \textsf{Pr}_{\mathcal {U}} (\theta \in A)\). \(\square\)

Now we turn to infinite additivity. Countable additivity means that the probability of the union of a countable family of disjoint sets is the infinite sum of the probabilities of the elements of the family, where the notion of infinite sum is spelled out in terms of the classical notion of limit. In the present setting, the probability \(Pr_{\mathcal {U}}\) of the union of any family of disjoint sets is also the infinite sum of the probabilities of the elements of the family (Benci et al 2013, Sect. 3.4). But now the notion of infinite sum is spelled out in terms of the generalised notion of limit based on the ultrafilter \(\mathcal {U}\). More precisely, the new notion of infinite sum is defined as follows. Suppose we are given a family \(\{ q_i: i \in S \}\) of rational numbers, and \(I\subseteq S\). Then consider the function \(f: [S]^{< \omega } \rightarrow \mathbb {Q}\) given by

$$\begin{aligned} f(T) = \sum _{i \in I \cap T} q_i . \end{aligned}$$

This function can be seen as giving the value of the infinite sum on all finite parts (“snapshots”) of the index set. So we identify the infinite sum of the family \(\{ q_i: i \in I \}\) of rational numbers with the generalised limit of f according to the ultrafilter \(\mathcal {U}\):

Definition 10

$$\begin{aligned} {\sum _{i\in I}}^* q_i \equiv [f]_{\mathcal {U}}. \end{aligned}$$
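As a simple worked instance of this definition: if \(q_i = 1\) for every \(i \in I\), then \(f(T) = \vert I \cap T |\), so

$$\begin{aligned} {\sum _{i\in I}}^* 1 = [\, T \mapsto \vert I \cap T |\, ]_{\mathcal {U}} , \end{aligned}$$

which for infinite I is an infinite element of \(\mathcal {F}_{\mathcal {U}}\) rather than a divergent series; this is the sense in which the generalised limit extends the classical one.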

Using this notion of infinite sum,Footnote 15 we can express the probability of the union of a disjoint family of sets as the sum of the probabilities of the members of that family:

Proposition 3

For any index set I, if \(A = \bigcup _{i\in I}A_i\), with \(A_i \cap A_j = \emptyset\) for all distinct \(i,j \in I\), then for every random variable \(\tau\):

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}}(\tau \in A) = {\sum _{i\in I}}^*\textsf{Pr}_{\mathcal {U}}(\tau \in A_i). \end{aligned}$$

In sum, \(\textsf{Pr}_{\mathcal {U}}\) has a natural infinite additivity property that is sometimes called perfect additivity.

Proposition 4

For every fine ultrafilter \(\mathcal {U}\), the probability function \(\textsf{Pr}_{\mathcal {U}}\) is perfectly additive.

Proof

This proposition is proved as proposition 8 in (Benci et al 2013, p. 132–133). \(\square\)

3.2 Symmetry Principles

The Euclidean-ness of \(\textsf{Pr}_{\mathcal {U}}\) has implications for symmetry principles. We have already seen that our probability functions satisfy Uniformity, which is naturally regarded as a symmetry principle. However, we will now recount how the Euclidean-ness of \(\textsf{Pr}_{\mathcal {U}}\) entails that certain other symmetry principles fail. The results below complement other results in the literature that indicate that in the presence of Uniformity, it is difficult to satisfy further symmetry principles.Footnote 16

Proposition 5

For every fine ultrafilter \(\mathcal {U}\), the probability function \(\textsf{Pr}_{\mathcal {U}}\) is not invariant under all permutations of V.

Proof

We concentrate on \(\mathbb {N}\) as it is canonically represented in V (by means of the Zermelo ordinals, for instance). Define a permutation \(\pi\) of V as follows:

  • \(\pi (x) = x\) for \(x \in V \setminus \mathbb {N}\); otherwise:

  • \(\pi (x) = x+2\) for x even;

  • \(\pi (1) = 0\);

  • \(\pi ( x) = x -2\) for x odd and \(>1\).

Let \(A \equiv \{0,2,4, \ldots \}\), and let \(\theta\) be a bijective random variable. Then \(\pi (A) \subsetneq A\). Therefore, by the Euclidean principle, \(\textsf{Pr}_{\mathcal {U}}(\theta \in \pi (A) ) < \textsf{Pr}_{\mathcal {U}}( \theta \in A) .\) \(\square\)

This of course entails that there are bijective random variables \(\theta , \theta '\) such that for some \(A\subseteq V\),

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}}( \theta \in A ) \ne \textsf{Pr}_{\mathcal {U}}(\theta ' \in A ). \end{aligned}$$
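The permutation used in the proof can be checked concretely. The following Python sketch (a finite spot-check on an initial segment of \(\mathbb {N}\), not a proof, and with names of our own choosing) verifies that \(\pi\) is injective there and maps the even numbers into a proper subset of themselves:

    def pi(x):
        # The permutation of N from the proof of Proposition 5.
        if x % 2 == 0:
            return x + 2      # even numbers are shifted up by two
        if x == 1:
            return 0          # 1 is sent to 0
        return x - 2          # odd numbers greater than 1 are shifted down by two

    N = 1000
    assert len({pi(x) for x in range(N)}) == N    # pi is injective on {0, ..., N-1}
    evens = {x for x in range(N) if x % 2 == 0}
    image_in_range = {pi(x) for x in evens if pi(x) < N}
    assert image_in_range < evens                 # proper subset: 0 has no preimage among the evens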

One popular global constraint on probability measures is translation-invariance. The Lebesgue measure has this property, and Banach limits seem to occupy a privileged position in the class of generalised limits at least in part because they are translation-invariant. In our context, translation-invariance does not make obvious sense. For an arbitrary class A, it is not clear what ‘\(A + \alpha\)’ (where \(\alpha\) is a number) means. But a clear interpretation of ‘adding an ordinal number’ can of course be given if A is a collection of ordinals:

Definition 11

For A any collection of ordinals:

$$\begin{aligned} A\oplus \alpha \equiv \{ \beta : \exists \gamma \in A \text { such that } \beta = \gamma + \alpha \} . \end{aligned}$$

Then for \(\textsf{Pr}_{\mathcal {U}}\) to be translation-invariant on classes of ordinals means that for every such collection A, all ordinals \(\alpha\), and every \(\theta\),

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}}(\theta \in A) = \textsf{Pr}_{\mathcal {U}}(\theta \in A\oplus \alpha ). \end{aligned}$$

However, even if we consider non-Archimedean measures (of the kind that we have been describing) on ordinals, translation-invariance conflicts with the Euclidean Property of our generalised probability functions. In particular, there is no \(\textrm{NAP}\) probability function \(\textsf{Pr}_{\mathcal {U}}\) on any infinite cardinal \(\kappa\) such that there is even one ordinal \(\alpha\) with \(0< \alpha < \kappa\) and

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}}(\theta \in \kappa ) = \textsf{Pr}_{\mathcal {U}}(\theta \in \kappa \oplus \alpha ). \end{aligned}$$

The reason is simple. We have \(\kappa \oplus \alpha \subseteq \kappa \backslash \alpha \subsetneq \kappa ,\) so if we had \(\textsf{Pr}_{\mathcal {U}}(\theta \in \kappa ) = \textsf{Pr}_{\mathcal {U}}(\theta \in \kappa \oplus \alpha ),\) then we would contradict the Euclidean principle.

As this example shows, such translations are not necessarily one to one, so we may not want full invariance in general. In (Benci et al 2007, section 1.3), Benci, Forti, and Di Nasso explore a restricted notion of translation-invariance of \(\textrm{NAP}\)-like measures on ordinals. We do not pursue this theme further here, but only pause to note that there are other reasonable-looking principles that are hard to satisfy. In the context of their theory of numerosities, Benci, Forti, and Di Nasso consider a principle that in the present context would take the following form:

Definition 12

(Difference Principle)

$$\begin{aligned} \forall A, B \in V: \textsf{Pr}_{\mathcal {U}}(\theta \in A) < \textsf{Pr}_{\mathcal {U}}(\theta \in B) \Rightarrow \exists C \in V: \textsf{Pr}_{\mathcal {U}}(\theta \in B) = \textsf{Pr}_{\mathcal {U}}(\theta \in A) + \textsf{Pr}_{\mathcal {U}}(\theta \in C) . \end{aligned}$$

On countable sample spaces, the difference principle can be made to hold by building \(\textsf{Pr}_{\mathcal {U}}\) from a selective ultrafilter (Benci & Di Nasso, 2003).Footnote 17 But the existence of selective ultrafilters is independent of \(\text {ZFC}\).Footnote 18 As far as we know, it is an open question whether the difference principle can be consistently made to hold for \(\text {NAP}\) probability functions on uncountable sample spaces.

3.3 Probability and Cardinality

In this subsection we investigate the relation between our notion of generalised probability on the one hand, and the familiar notion of cardinality on the other hand.

3.3.1 Hume’s Principle for Probability

One might naively wonder whether the following probabilistic analogue of Hume’s Principle for cardinality can hold:

Definition 13

(Hume’s principle for probability) For all \(A,B \in V\):

$$\begin{aligned} \left| A \right| = \left| B \right| \Rightarrow \textsf{Pr}_{\mathcal {U} }(\tau \in A) = \textsf{Pr}_{\mathcal {U} }(\tau \in B). \end{aligned}$$

But the probability functions \(\textsf{Pr}_{\mathcal {U}}\) that we have been considering cannot satisfy Hume’s principle for probability, as its failure is an immediate consequence of Proposition 5: invariance under permutations and Hume’s principle for probability are mathematically equivalent. However, this was only to be expected. After all, we do not expect Kolmogorov probability (on infinite spaces) to satisfy any such principle.

3.3.2 Superregularity

The hyper-rational field \(\mathcal {F}_{\mathcal {U}}\) in which the probability functions \(\textsf{Pr}_{\mathcal {U}}\) take their values contains infinitesimal numbers; this is what makes it non-Archimedean. We will write \(\textsf{Pr}_{\mathcal {U}}(\sigma \in A)\approx 0\) if \(\textsf{Pr}_{\mathcal {U}}(\sigma \in A) < n^{-1}\) for each \(n \in \mathbb {N}\). And we will write \(\textsf{Pr}_{\mathcal {U}}(\sigma \in A) \ll \textsf{Pr}_{\mathcal {U}}(\tau \in B)\) if

$$\begin{aligned} \frac{\textsf{Pr}_{\mathcal {U}}(\sigma \in A)}{\textsf{Pr}_{\mathcal {U}}(\tau \in B)} \approx 0. \end{aligned}$$

We have seen that \(\textsf{Pr}_{\mathcal {U}}\) cannot satisfy Hume’s principle for probability. But, at least at first sight, it seems that it would be reasonable to ask for:

$$\begin{aligned} \left| A \right| < \left| B \right| \Rightarrow \textsf{Pr}_{\mathcal {U}}(\theta \in A) < \textsf{Pr}_{\mathcal {U}}(\theta \in B) . \end{aligned}$$

Indeed, if in addition \(\left| B \right| \ge \omega\), then we might even seek to demand

$$\begin{aligned} \left| A \right| < \left| B \right| \Rightarrow \textsf{Pr}_{\mathcal {U}}(\theta \in A) \ll \textsf{Pr}_{\mathcal {U}}(\theta \in B). \end{aligned}$$

Further, this may be expected to hold if B is a proper class but A is a set. The result is a size constraint which is a strengthening of the requirement of regularity:

Definition 14

(Superregularity)

$$\begin{aligned} \omega \le \left| A \right| < \left| B \right| \le \left| V \right| \Rightarrow \textsf{Pr}_{\mathcal {U}}(\theta \in A) \ll \textsf{Pr}_{\mathcal {U}}(\theta \in B) . \end{aligned}$$

Note that if A is finite and B is infinite then the consequent holds automatically.
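To see why, note that by Uniformity there is a single value \(u \equiv \textsf{Pr}_{\mathcal {U}}(\theta = x)\) shared by all singletons, so that for finite A and any k-element subset \(B_k \subseteq B\),

$$\begin{aligned} \frac{\textsf{Pr}_{\mathcal {U}}(\theta \in A)}{\textsf{Pr}_{\mathcal {U}}(\theta \in B)} \le \frac{\left| A \right| \cdot u}{\textsf{Pr}_{\mathcal {U}}(\theta \in B_k)} = \frac{\left| A \right| }{k} \end{aligned}$$

for every \(k \in \mathbb {N}\), whence the ratio is infinitesimal.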

By a suitable restriction on admissible ultrafilters \(\mathcal {U}\), superregularity can indeed be made to hold:

Theorem 1

There are fine ultrafilters \(\mathcal {U}\) such that \(\textsf{Pr}_{\mathcal {U}}\) is superregular.

Proof

If \(A,B\in V\) such that \(\omega \le \left| A \right| < \left| B \right|\) are given, then we have \(\textsf{Pr}_{\mathcal {U}}(\theta \in A) \ll \textsf{Pr}_{\mathcal {U}}(\theta \in B)\) if and only if for each \(n \in \mathbb {N}\),

$$\begin{aligned} \{ D \in [V]^{< \omega } : \frac{\textsf{Pr}(\theta \in A\mid \theta \in D)}{\textsf{Pr}(\theta \in B \mid \theta \in D)} \le n^{-1} \} \in \mathcal {U}. \end{aligned}$$

The aim is to build an ultrafilter \(\mathcal {U}\) for which this holds.

For any \(n \in \mathbb {N}\), define

$$\begin{aligned} C^n_{AB} \equiv \{ D \in [V]^{< \omega } : \frac{\textsf{Pr}(\theta \in A\mid \theta \in D)}{\textsf{Pr}(\theta \in B \mid \theta \in D)} \le n^{-1} \}. \end{aligned}$$

Moreover, let

$$\begin{aligned} A_x \equiv \{ D \in [V]^{< \omega } : x \in D \} . \end{aligned}$$

Define also

$$\begin{aligned} \mathcal {F} \equiv \{ C^n_{AB} : n\in \mathbb {N}, \left| A \right| < \left| B \right| \} \cup \{A_x : x \in V \}. \end{aligned}$$

We want to prove that \(\mathcal {F}\) has the finite intersection property. To this end, take any \(x_1,\ldots ,x_k \in V\), and any \(\langle A_1,B_1, n_1 \rangle , \ldots , \langle A_l,B_l,n_l \rangle\) such that \(\left| A_j \right| < \left| B_j \right|\) and \(n_j \in \mathbb {N}\) for \(j \le l.\) Assume for the construction that \(|A_1|\le |A_2|\le \dots \le |A_l|\). For every finite D, if \(\{ x_1,\ldots , x_k \} \subseteq D,\) then \(D \in \bigcap _{i\le k} A_{x_i} .\) So setting \(n=\max \{n_j:j\le l\}\), we will extend \(\{ x_1,\ldots , x_k \}\) to a set in \(C^n_{A_jB_j}\), and hence in \(C^{n_j}_{A_jB_j}\), for each \(j \le l\). Set \(F_0= \{ x_1,\ldots , x_k \}\) and \(a_0=|F_0\cap A_1|\). As \(B_1\) is infinite and of larger cardinality than \(A_1\), we can add \(n \cdot a_0\) elements of \(B_1 \setminus A_1\) to \(F_0\), yielding a finite set \(F_1\). Now set \(a_1=|F_1\cap A_2|\), and add \(n \cdot a_1\) elements of \(B_2\setminus (A_1\cup A_2)\) to \(F_1\) to give \(F_2\). Note that we can find these elements of \(B_2\) because \(|B_2|>|A_2|\ge |A_1|\). Continuing in this manner, set \(F=F_l\). Then we have ensured that for all \(j\le l\)

$$\begin{aligned} \frac{\textsf{Pr}(\theta \in A_j\mid \theta \in F)}{\textsf{Pr}(\theta \in B_j \mid \theta \in F)} \le n^{-1}, \end{aligned}$$

and so we have \(F \in C^n_{A_jB_j},\) and since \(F_0 \subseteq F,\) we also have \(F \in \bigcap _{i\le k} A_{x_i} .\)

So \(\mathcal {F}\) indeed has the finite intersection property, whereby it can be extended to a filter and then further to an ultrafilter \(\mathcal {U}\). By design, then, the resulting probability function \(\textsf{Pr}_{\mathcal {U}}\) is superregular. \(\square\)
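The padding step in this construction is purely combinatorial and can be illustrated on finite stand-ins. The following Python sketch (a toy illustration only: the sets A and B below are finite stand-ins for sets with \(\omega \le \left| A \right| < \left| B \right|\), and the function name is ours) extends a finite snapshot so that the ratio of A-elements to B-elements drops to at most \(n^{-1}\):

    def pad_snapshot(F0, A, B, n):
        # Mimics the step from F_0 to F_1 in the proof of Theorem 1: add enough
        # elements of B \ A to the finite snapshot F0 so that the number of
        # A-elements is at most 1/n times the number of B-elements.
        F = set(F0)
        a = len(F & A)
        fresh = list(B - A - F)
        if len(fresh) < n * a:
            raise ValueError("the toy set B is too small to supply enough witnesses")
        F |= set(fresh[:n * a])
        return F

    A = set(range(0, 50))           # stand-in for the smaller infinite set
    B = set(range(1000, 5000))      # stand-in for the larger infinite set
    F = pad_snapshot({3, 7, 1003}, A, B, n=10)
    assert len(F & A) * 10 <= len(F & B)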

Once again, Hume’s Principle for probability cannot hold for the notion of probability that we are investigating. But this leaves open the question whether the converse of Hume’s Principle for probability can be made to hold. This is called Cantor’s Principle in (Benci et al., 2007), where the authors investigate it in the context of their Euclidean theory of size:

Definition 15

(Cantor’s Principle)

$$\begin{aligned} \text { If } \textsf{Pr}_{\mathcal {U}}(\theta \in A) = \textsf{Pr}_{\mathcal {U}}(\theta \in B) \text {, then } \left| A \right| = \left| B \right| . \end{aligned}$$

Benci, Forti, and Di Nasso prove that ‘Cantor’s Principle’ can be made to hold (Benci et al 2007, Sect. 3.2). It is also clear that Cantor’s Principle (for A, B such that \(\left| A \right| , \left| B \right| \ge \omega\)) follows from superregularity.

3.3.3 The Power Set Principle

The question whether

$$\begin{aligned} \forall A, B\in V: |A| < |B| \Rightarrow |\mathcal {P}(A)| < |\mathcal {P}(B)| \end{aligned}$$

is true is independent of the axioms of set theory. (Of course the principle is true if the Generalised Continuum Hypothesis holds.) Like the cardinality operator, our generalised probability functions are measures of some kind. One might wonder what should follow from \(\textsf{Pr}_{\mathcal {U} }(\theta \in A) < \textsf{Pr}_{\mathcal {U}}( \theta \in B ).\) In particular, given that \(\textsf{Pr}_{\mathcal {U} }\) is intended to be a fine-grained quantitative measure, perhaps probability should be expected to co-vary with the power set operation in some fairly direct manner. In other words, it is natural to ask if the following principle can be made to hold:

Definition 16

(Power Set Condition)

$$\begin{aligned} \forall A,B \in V: \textsf{Pr}_{\mathcal {U} } (\theta \in A)< \textsf{Pr}_{\mathcal {U} } (\theta \in B ) \Leftrightarrow \textsf{Pr}_{\mathcal {U} } (\theta \in \mathcal {P}(A)) < \textsf{Pr}_{\mathcal {U} }(\theta \in \mathcal {P}(B)). \end{aligned}$$

It turns out that the power set condition can indeed be satisfied:

Theorem 2

There are fine ultrafilters \(\mathcal {U}\) such that \(\textsf{Pr}_{\mathcal {U}}\) satisfies the power set condition.

The argument for this is somewhat more involved.

We aim to prove Theorem 2 by building the probability function up from an ultrafilter \(\mathcal {U}\) which is based on a pre-filter \(\mathcal {C}\subseteq \mathcal {P}([V]^{<\omega })\).Footnote 19

The class \(\mathcal {C}\) is built up in stages, and in such a way that it eventually witnesses the truth of the power set condition for all \(A,B \in V\).

Stage 0

The class \(\mathcal {C}_0\) consists of all

$$\begin{aligned} A_x \equiv \{ a\in [V]^{<\omega } : x \in a\}, \end{aligned}$$

for \(x \in V\). This is to ensure that the ultrafilter that will be built from \(\mathcal {C}\) is fine. We know that \(\mathcal {C}_0\) has the finite intersection property.

Limit stages

For limit stages \(\lambda\), we simply set \(\mathcal {C}_{\lambda } \equiv \bigcup _{\beta < \lambda }\mathcal {C}_{\beta }\).

Successor stages

Given fineness, we may, and will, ignore the elements of \(V_{\omega }\). At stage \(\alpha > \omega\), where \(\alpha\) is a successor ordinal, we consider the sets of \(V_{\alpha } \backslash V_{\alpha - 1}\) and ensure that the power set condition eventually holds for all these sets and their power sets, by adding families of finite sets to \(\mathcal {C}_{\alpha - 1}\) in such a way that the finite intersection property is preserved.

As an illustrative and indeed representative example we do the case where \(\alpha = \omega + 1\).

Let there be given an enumeration \(\{ A_1, B_1 \},\hdots , \{ A_{\beta }, B_{\beta } \}, \hdots\) of the pairs of elements of \(V_{\omega + 1} \backslash V_{\omega }\).

For the induction, we assume that, by having added appropriate sets of finite sets to \(\mathcal {C}_0\), the power set condition holds for \(\{ A_1, B_1 \},\hdots , \{ A_{\beta }, B_{\beta } \}\) and their power sets, and that in the process the finite intersection property has been preserved. The aim is now to extend this so that it also holds for \(\{ A_{\beta + 1}, B_{\beta + 1} \}\). In other words, we have constructed \(\mathcal {C}_1^{\beta }\), and we want to obtain \(\mathcal {C}_1^{\beta + 1}\), where \(\mathcal {C}_1^0 \equiv \mathcal {C}_0\).

Definition 17

$$\begin{aligned} C_{A<B} \equiv \left\{ D \in [V]^{< \omega }: \frac{\left| A \cap D \right| }{\left| D \right| } < \frac{\left| B \cap D \right| }{\left| D \right| } \right\} . \end{aligned}$$

Definition 18

$$\begin{aligned} C_{A\ge B} \equiv \left\{ D \in [V]^{< \omega }: \frac{\left| A \cap D \right| }{\left| D \right| } \ge \frac{\left| B \cap D \right| }{\left| D \right| } \right\} . \end{aligned}$$

Claim

Either \(\mathcal {C}_1^{\beta } \cup \{ C_{A_{\beta } < B_{\beta }} \}\) has the finite intersection property, or \(\mathcal {C}_1^{\beta } \cup \{ C_{A_{\beta } \ge B_{\beta }} \}\) has the finite intersection property (or both).

Proof

Suppose not. Then there is a finite intersection F of elements of \(\mathcal {C}_1^{\beta }\) such that \(F \cap C_{A_{\beta } < B_{\beta }} = \emptyset\), and there is a finite intersection \(F'\) of elements of \(\mathcal {C}_1^{\beta }\) such that \(F' \cap C_{A_{\beta } \ge B_{\beta }} = \emptyset\). But then \(( F \cap F') \cap C_{A_{\beta } < B_{\beta }} = \emptyset\) and \((F \cap F') \cap C_{A_{\beta } \ge B_{\beta }} = \emptyset\). But \(C_{A_{\beta }< B_{\beta }} \cup C_{A_{\beta } \ge B_{\beta }} = [V]^{< \omega }.\) So then \(F \cap F' = \emptyset\). But this contradicts the inductive assumption that \(\mathcal {C}_1^{\beta }\) has the finite intersection property. \(\square\)

Thus define \(\mathcal {C}^{\beta +1}_1\) to be \(\mathcal {C}_1^{\beta } \cup \{ C_{A_{\beta } < B_{\beta }} \}\) if this has the finite intersection property, or \(\mathcal {C}_1^{\beta } \cup \{ C_{A_{\beta } \ge B_{\beta }} \}\) otherwise, and by the claim, \(\mathcal {C}^{\beta +1}_1\) has the finite intersection property. Now setting \(\mathcal {C}^-_1 \equiv \bigcup _{\beta }\mathcal {C}^{\beta }_1,\) we may conclude that \(\mathcal {C}^-_1\) has the finite intersection property.

At this point we must extend \(\mathcal {C}^-_1\) by adding to \(\mathcal {C}^-_1\):

  • every set of the form \(C_{\mathcal {P}(A)<\mathcal {P}(B)}\) such that \(C_{A<B} \in \mathcal {C}^-_1\);

  • every set of the form \(C_{\mathcal {P}(A) \ge \mathcal {P}(B)}\) such that \(C_{A\ge B} \in \mathcal {C}^-_1\).

Call the resulting set \(\mathcal {C}_1\). Our aim is to prove that \(\mathcal {C}_1\) has the finite intersection property.

Consider an arbitrary non-empty finite family \(\mathcal {F} \subseteq \mathcal {C}_1\). Without loss of generality we may assume that the ‘judgements’ in \(\mathcal {F}\) of the form \(C_{\mathcal {P}(A)<\mathcal {P}(B)}\) or \(C_{\mathcal {P}(A) \ge \mathcal {P}(B)}\), taken together, describe a finite total pre-ordering relation R on some set \(\{ \mathcal {P}(A_1),\hdots , \mathcal {P}(A_k) \}\). Further, we may also assume that for sets A and B from \(V_{\omega + 1} \backslash V_{\omega }\), \(C_{\mathcal {P}(A)<\mathcal {P}(B)}\in \mathcal {F}\) if and only if \(C_{A<B}\in \mathcal {F}\), and \(C_{\mathcal {P}(A) \ge \mathcal {P}(B)}\in \mathcal {F}\) iff \(C_{A\ge B}\in \mathcal {F}\). Thus \(\mathcal {F}\) contains witnesses for all the relevant judgements we may be interested in.

Let \(\mathcal {F}^- = \mathcal {F} \cap \mathcal {C}^-_1\), so \(\mathcal {F}^-\) consists only of judgements about sets in \(V_{\omega + 1} \backslash V_{\omega }\). Then we know from the foregoing that \(\bigcap \mathcal {F}^- \ne \emptyset\). So take some \(F^- \in \bigcap \mathcal {F}^-\). Our plan is inductively to extend \(F^-\), using the pre-order R, to a finite set \(F \in \bigcap \mathcal {F}\).

We will add to \(F^-\) elements that ensure that the constraints of R are satisfied. Moreover, by choosing the elements to be added to \(F^-\) from \(V_{\omega + 1} \backslash V_{\omega }\),Footnote 20 we ensure that the constraints imposed by \(\mathcal {F}^-\) remain satisfied. As a result, F will satisfy all constraints from \(\mathcal {F}\), so \(\bigcap \mathcal {F}\ne \emptyset\) and hence \(\mathcal {C}_1\) has the finite intersection property.

As an example, suppose that R says that

$$\begin{aligned} \left| \mathcal {P}(A_1) \right|< \left| \mathcal {P}(A_2) \right| < \left| \mathcal {P}(A_3) \right| = \left| \mathcal {P} (A_4) \right| . \end{aligned}$$
  (1)

    We start by ensuring that \(\left| \mathcal {P}(A_1)\right| < \left| \mathcal {P}(A_2) \right|\) is satisfied.

    Suppose that \(F^-\) already contains n elements of \(\mathcal {P}(A_1)\). Since \(C_{A_1 < A_2} \in \mathcal {F}\), there must be an element \(x^-\in A_2 \backslash A_1\). This implies that there are infinitely many infinite sets x in \(\mathcal {P}(A_2) \backslash \mathcal {P}(A_1)\) such that \(x^- \in x\): we add \(n+1\) such elements to \(F^-\), and call the resulting finite set \(F^-_1\).

  (2)

    We proceed in similar fashion to ensure that \(\left| \mathcal {P}(A_2) \right| < \left| \mathcal {P}(A_3) \right|\) is satisfied:

    Suppose that \(F^-_1\) already contains m elements from \(\mathcal {P}(A_2)\), observing that it may be the case that \(m> n+1\), for there may already be a finite number of elements of \(\mathcal {P}(A_2)\) in \(F^-\). Since \(C_{A_2 < A_3} \in \mathcal {F}\), there must be an element \(y^-_1\in A_3 \backslash A_2\), and since \(C_{A_1 < A_3} \in \mathcal {F}\), there must be an element \(y^-_2\in A_3 \backslash A_1\). So there are infinitely many infinite sets y in \(\mathcal {P}(A_3)\) such that \(y^-_1, y^-_2 \in y\): add \(m+1\) such elements to \(F^-_1\), and call the resulting set \(F^-_2\).

  (3)

    Now suppose that there are \(m_1\) elements of \(\mathcal {P}(A_3)\) in \(F^-_2\), and \(m_2\) elements of \(\mathcal {P}(A_4)\) in \(F^-_2\). Moreover, suppose that \(m_2 < m_1\). (The case where \(m_1 < m_2\) is similar.) Since \(C_{A_3 \ge A_4},C_{A_4 \ge A_3}\in \mathcal {F}\), but also \(A_3 \ne A_4\), there must be some \(x_1 \in A_3 \backslash A_4\) and some \(x_2 \in A_4 \backslash A_3\). Moreover, since \(C_{A_1< A_4}, C_{A_2 < A_4} \in \mathcal {F}\), there are elements \(x_3 \in A_4 \backslash A_1, x_4 \in A_4 \backslash A_2\). So \(\mathcal {P}(A_4)\) contains infinitely many infinite sets x such that \(\{ x_2,x_3,x_4 \}\subset x\). Similarly, \(\mathcal {P}(A_3)\) contains infinitely many infinite sets x that are outside \(\mathcal {P}(A_1), \mathcal {P}(A_2), \mathcal {P}(A_4)\). So we add a sufficient number of such elements to \(F^-_2\) so that there are an equal number p of “witnesses” for \(\mathcal {P}(A_3)\) as for \(\mathcal {P}(A_4)\) but where p is larger than the number of witnesses for \(\mathcal {P}(A_2)\). Call the resulting set \(F^-_3\).

  (4)

    To conclude, we set \(F \equiv F_3^-\). It is clear that \(F \in \bigcap \mathcal {F}\).

    This procedure of extending \(F^-\) easily generalises to any finite total pre-ordering on \(\{ \mathcal {P}(A_1),\hdots , \mathcal {P}(A_k) \}\). Thus we have shown that \(\mathcal {C}_1\) has the finite intersection property.

This procedure for extending \(\mathcal {C}_0\) to \(\mathcal {C}_1\) while preserving the finite intersection property also works for larger successor ordinals: at level \(V_{\alpha +1}\) (stage \(\beta +1\) with \(\alpha =\omega +\beta\)) we can extend the corresponding \(F^-\) using subsets of rank \(\alpha\). As we have said above, at limit stages we can simply take unions. Ultimately we set \(\mathcal {C} \equiv \bigcup _{\alpha \in On}\mathcal {C}_{\alpha }\).

The class \(\mathcal {C}\) will then have the finite intersection property, so it can be extended to a filter and then to an ultrafilter \(\mathcal {U}\). The probability function based on \(\mathcal {U}\) will make the power set condition true for all \(A,B \in V\), and this concludes the proof of theorem 2. \(\square\)

With a minimal amount of extra work, our proof can be seen to show something slightly stronger: for all A, B with \(\left| A \right| , \left| B \right| \ge \omega\), we have

$$\begin{aligned} \textsf{Pr}_{\mathcal {U} } (\theta \in A) < \textsf{Pr}_{\mathcal {U} } (\theta \in B ) \Leftrightarrow \textsf{Pr}_{\mathcal {U} } (\theta \in \mathcal {P}(A)) \ll \textsf{Pr}_{\mathcal {U} }(\theta \in \mathcal {P}(B)) . \end{aligned}$$

The reason is that in enlarging the set \(F^-\) we always have infinitely many elements to choose from.

For any probability measure \(\textsf{Pr}_{\mathcal {U}}\) that satisfies the power set condition we also have that \(\forall A,B \in V, \forall n\in \omega\):

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}} (\theta \in A)< \textsf{Pr}_{\mathcal {U}}(\theta \in B ) \Leftrightarrow \textsf{Pr}_{\mathcal {U}}(\theta \in \mathcal {P}^{n}(A)) < \textsf{Pr}_{\mathcal {U}}(\theta \in \mathcal {P}^{n}(B)) \end{aligned}$$

where \(\mathcal {P}^{n}(A)=\mathcal {P}(\mathcal {P}(\dots \mathcal {P}(A)\dots ))\). An easy argument shows this cannot extend to infinite applications of the power set operation: if \(B= \mathcal {P}(A)\) then \(\mathcal {P}^{\omega }(B)= \mathcal {P}^{\omega }(A)\).

One might wonder whether the motivations behind the power set condition should not also support imposing the following condition, in which the power set operation is replaced by the finite power set operation, on \(\textsf{Pr}_{\mathcal {U}}\):Footnote 21

Question 1

Are there probability measures such that

$$\begin{aligned} \forall A,B \in V: \textsf{Pr}_{\mathcal {U}}(\theta \in A)< \textsf{Pr}_{\mathcal {U}}(\theta \in B ) \Leftrightarrow \textsf{Pr}_{\mathcal {U}}(\theta \in [A]^{< \omega })<\textsf{Pr}_{\mathcal {U}}(\theta \in [B]^{< \omega })? \end{aligned}$$

3.4 The Ordinals

For \(\alpha \ge \omega ,\) in each level \(V_{\alpha + 1} \setminus V_{\alpha }\) of the iterative hierarchy one finds only one ordinal, but infinitely many sets that are not ordinals. This might lead one to believe that a probability function on V should satisfy

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}}(\theta \in \textrm{On}) \approx 0 , \end{aligned}$$

where ‘On’ is the class of ordinals.

Just as it seems reasonable to require that the probability of choosing an even natural number from the set of natural numbers must be equal to or infinitesimally close to \(\frac{1}{2}\) (see Wenmackers et al 2013, section 6.2), it seems reasonable to require that

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}}(\theta \in \textrm{Even} \mid \theta \in \textrm{On}) \approx \frac{1}{2} , \end{aligned}$$

where ‘Even’ is the class of even ordinals, which is defined in the obvious way.

Moreover, between any two limit ordinals there are infinitely many successor ordinals, so one might expect

$$\begin{aligned} \textsf{Pr}_{\mathcal {U}}(\theta \in \textrm{Lim} \mid \theta \in \textrm{On}) \approx 0 , \end{aligned}$$

where ‘Lim’ is the class of limit ordinals.

We will sketch how probability functions can be constructed that meet these expectations. Indeed, we will see that there are probability functions that meet these ‘ordinal expectations’ and in addition meet the size constraint of superregularity.

Theorem 3

There is a superregular probability function \(\textsf{Pr}_{\mathcal {U}}\) such that:

  1. \(\textsf{Pr}_{\mathcal {U}}(\theta \in \textrm{On}) \approx 0;\)

  2. \(\textsf{Pr}_{\mathcal {U}}(\theta \in \textrm{Even} \mid \theta \in \textrm{On}) \approx 2^{-1} ;\)

  3. \(\textsf{Pr}_{\mathcal {U}}(\theta \in \textrm{Lim} \mid \theta \in \textrm{On}) \approx 0.\)

Proof

As before, the aim is to choose wisely the ultrafilter \(\mathcal {U}\) on which \(\textsf{Pr}_{\mathcal {U}}\) is based. We want \(\mathcal {U}\) to be such that for all \(k,l,m \in \mathbb {N}\):

  • \(\frac{\textsf{Pr}_{\mathcal {U}}(\theta \in A)}{\textsf{Pr}_{\mathcal {U}}(\theta \in B)} \le k^{-1}\) if \(\omega \le \left| A \right| < \left| B \right| ;\)

  • \(\left| \textsf{Pr}_{\mathcal {U}}(\theta \in \textrm{Even} \mid \theta \in \textrm{On}) - \textsf{Pr}_{\mathcal {U}}(\theta \in \textrm{Odd} \mid \theta \in \textrm{On}) \right| \le l^{-1}\) and \(\textsf{Pr}_{\mathcal {U}}(\theta \in \textrm{Lim} \mid \theta \in \textrm{On}) \le l^{-1} ;\)

  • \(\textsf{Pr}_{\mathcal {U}}(\theta \in On) \le m^{-1} .\)

Now we define:

  • \(A_x \equiv \{ D \in [V]^{< \omega } : x \in D \} ;\)

  • \(C^k_{AB} \equiv \{ D \in [V]^{< \omega } : \frac{\textsf{Pr}[A\mid D]}{\textsf{Pr}[B \mid D]} \le k^{-1} \} ;\)

  • \(I^l \equiv \{ D \in [V]^{< \omega } : \forall \alpha \in D\cap On \exists \beta ( \alpha \in [ \beta , \beta + l ] \subseteq D ) \} ;\)

  • \(W^m \equiv \{ D \in [V]^{< \omega } : \textsf{Pr}[On \mid D] \le m^{-1} \} .\)

And now we set:

$$\begin{aligned} \mathcal {F}_0 \equiv \{A_x, C^k_{AB}, I^l, W^m: x \in V, k,l,m \in \mathbb {N} \text { and } \omega \le \left| A \right| < \left| B \right| \} \end{aligned}$$

The inclusion of the sets \(A_x\) in \(\mathcal {F}_0\) will ensure that the resulting filter is fine, the inclusion of the sets \(C^k_{AB}\) ensures superregularity, and the inclusion of the sets \(W^m\) ensures that property 1 of the theorem holds. The inclusion of the sets \(I^l\) will give us property 3, as the ratio of limit ordinals to successor ordinals in any element of \(I^l\) is less than or equal to \(l^{-1}\); but these sets also give us property 2, as any \(D\in I^l\) can have at most one extra odd or even ordinal for every l elements it contains, so the ratio of odd to even ordinals must tend towards 1.
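To see what membership in \(I^l\) and \(W^m\) demands of a single finite snapshot, here is a small Python sketch (ordinals are crudely represented by natural numbers and all names are ours; this is an illustration of the definitions, not part of the construction):

    def in_I(D_ordinals, l):
        # Toy membership test for I^l: every ordinal alpha occurring in the
        # snapshot lies in an interval [beta, beta + l] wholly contained in it.
        S = set(D_ordinals)
        def covered(alpha):
            return any(all(beta + i in S for i in range(l + 1))
                       for beta in range(max(alpha - l, 0), alpha + 1))
        return all(covered(alpha) for alpha in S)

    def in_W(D_ordinals, D_size, m):
        # Toy membership test for W^m: the proportion of ordinals in the
        # snapshot (of total size D_size) is at most 1/m.
        return len(set(D_ordinals)) * m <= D_size

    # A snapshot whose ordinal part is the interval [10, 13] and which
    # contains eight further non-ordinal sets:
    print(in_I({10, 11, 12, 13}, l=3))              # True
    print(in_W({10, 11, 12, 13}, D_size=12, m=3))   # True, since 4 * 3 <= 12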

Claim: \(\mathcal {F}_0\) has the finite intersection property.

Take any finite \(F\subset \mathcal {F}_0\). Let \(x_1,\ldots , x_n\) be the indices corresponding to the \(A_x\)-type elements of F. Now \(\bigcap _{i\le n} I^{l_i}=I^{l_{max}}\) where \(l_{max}=\max \{l_i:i\le n\}\), and similarly for \(\bigcap _{i\le n} W^{m_i}\), so as before in Theorem 1, it suffices to consider the maximum values of k, l, m represented as indices of, respectively, the C, I, and W type sets in F.

  (1)

    \(A \in \bigcap _{i\le n}A_{x_i} \Leftrightarrow \{ x_1,\ldots , x_n\} \subseteq A .\) So we start with the finite set \(A_0 \equiv \{ x_1,\ldots , x_n \} ,\) and will extend it.

  (2)

    Again we concentrate on one pair \(\langle A, B\rangle\) such that \(\omega \le \left| A \right| < \left| B \right|\); we leave out further cases as they are similar. There are arbitrarily large finite subsets \(C \subseteq B \setminus A\) that are l-isolated from elements of A, meaning that each ordinal in C is more than l ordinals removed from any ordinal in A. We choose any such C that is of size at least \(k \cdot (l+1) \cdot n\) (the factor \(l+1\) leaves room for the successor ordinals added in the next step), and we set \(A_1 \equiv A_0 \cup C\).

  (3)

    Now we extend \(A_1\) to ensure that all ordinal intervals are of length \(\ge l\): for each ordinal \(\alpha \in A_1\), we add \(\alpha +1,\ldots , \alpha +l\). Call the resulting finite collection \(A_2\). Note that by our choice of l-isolated elements in (2), for any ordinal \(\alpha \in A_1\setminus A_0\), none of \(\alpha +1,\ldots , \alpha +l\) are elements of A, and thus the ratio of elements of A to elements of B remains at most 1 : k.

  (4)

    Let \(\left| A_2 \right| = j\). Then we add \(j \cdot m\) elements of \(V \setminus ( A \cup B \cup On )\) to \(A_2\) and call the resulting set \(A_3\).

It is now routine to verify that \(A_3 \in \bigcap _{i\le n}A_{x_i} \cap C^k_{AB} \cap I^l \cap W^m\). The case including further sets \(C^k_{A'B'}\) is similar, thus the claim is verified. So \(\mathcal {F}_0\) indeed has the finite intersection property, whereby it can be extended to a filter and then further to an ultrafilter \(\mathcal {U}\). By design, the resulting probability function \(\textsf{Pr}_{\mathcal {U}}\) has the required properties. \(\square\)

4 The bootstrapping approach

The probability \(\textsf{Pr}_{\mathcal {U}}(\theta \in A)\) is obtained by ‘summing up’ the probabilities \(\textsf{Pr}(\theta \in A \mid \iota \in S)\), where \(\iota\) is the identity random variable, for all ‘small’ parts S of V; such \(\textsf{Pr}(\theta \in A \mid \iota \in S)\) are seen as approximations of \(\textsf{Pr}_{\mathcal {U}}(\theta \in A)\).

In the finite snapshot approach, ‘small’ in this context means ‘finite’. But from a conceptual point of view, ‘finite’ might be taken to be too small as far as the test sets (or snapshots) are concerned. Compared to V, all sets —and not just the finite sets— are small. So to determine \(\textsf{Pr}_{\mathcal {U}}(\theta \in A)\), we should take the ‘limit’ of the values \(\textsf{Pr}(\theta \in A \mid \iota \in S)\), where S is a set of any size. Then if S is infinite, \(\textsf{Pr}(\theta \in A \mid \iota \in S)\) cannot just be taken to be given by the ratio formula but needs to be defined.

In the approach to which we now turn (the bootstrapping approach), a probability \(\textsf{Pr}_{\mathcal {U}}(\theta \in A)\) is determined by the probabilities \(\textsf{Pr}_{\mathcal {U}}(\theta \in A \mid \iota \in S)\), where \(\textsf{Pr}_{\mathcal {U}}(\theta \in A \mid \iota \in S)\), for S a large set, is then in turn determined by probabilities \(\textsf{Pr}_{\mathcal {U}}(\theta \in A \mid \iota \in S')\) for \(S'\) being smaller ‘snapshots’ than S, and so on, until we reach the finite snapshots and can appeal to the probability functions that were discussed in the previous sections. Thus the bootstrapping account can be seen as a generalisation of the finite snapshot approach.

4.1 The Rough Idea

In general terms, this is how we will proceed:

  (1)

    By the construction from the previous section, a fine ultrafilter on \([S]^{<\omega }\) yields a notion of probability on all sets \(S \in V\) with \(\left| S \right| < \omega _{1}\). In other words, this yields a suitable notion of probability, call it \(\textsf{Pr}^S\), for every countable set S.

  (2)

    The notion of \(\textsf{Pr}^S\) for all \(S \in V\) with \(\left| S \right| < \omega _2\) is determined using the notion of probability on countable sets: the probability of A on such an S is determined by the class of probabilities of A on the countable ‘snapshots’ of S. Using these countable probability functions, a fine ultrafilter on \([S]^{< \omega _1}\) gives us a notion of probability on sets S with \(\left| S \right| < \omega _2\).

Again the resulting functions \(\textsf{Pr}^S\) are essentially NAP-functions as defined in (Benci et al., 2013). They are total, regular, etc.

\(\vdots\)

(\(\alpha +1\)) A fine ultrafilter on \([S]^{< \omega _{\alpha }}\), together with the probability functions \(\textsf{Pr}^{S'}\) for all \(S'\) such that \(\left| S' \right| < \omega _\alpha\), yields a notion of probability on all sets S with \(\left| S \right| < \omega _{\alpha +1}\).

\(\vdots\)

Limit stages of course do not present a problem. So by transfinite recursion on cardinality this yields for every set S a notion \(\textsf{Pr}^S\) of probability on S.

Then a fine ultrafilter \(\mathcal {U}\) on \(V=[V]^{<{Card}}\) yields, using the general notion \(\textsf{Pr}^S\) for \(S\in V\), a notion \(\textsf{Pr}^V\) that is a total (class) function from properties A and random variables \(\theta\) to values \(\textsf{Pr}^V(\theta \in A)\) in a non-Archimedean class field. This probability function again satisfies the principles of the theory NAP in (Benci et al., 2013).

For this construction, we need suitable (fine) ultrafilters on increasingly larger and larger sets, and a fine ultrafilter \(\mathcal {U}\) on \([V]^{<Card}\). But we will see that all the set ultrafilters used in the construction can be uniformly obtained as restrictions to sets S of the given fine ultrafilter on \([V]^{<Card}\). So \(\textsf{Pr}^V\) is determined by one initial choice of \(\mathcal {U}\), whereby \(\textsf{Pr}^V\) can be seen as the ‘limit’ of its set-restrictions \(\textsf{Pr}^S\), where the functions \(\textsf{Pr}^S\) can in turn be seen as ‘limits’ of restrictions to their small subsets. This uniform construction has the advantage that the resulting probability functions are all coherent, in the sense that for a set T, \(\textsf{Pr}^S(\theta \in A|\iota \in T)\) is (up to a canonical embedding, as is explained in detail in the first part of the proof of Proposition 7 below) the same for all \(S\supseteq T\) and hence also for V.
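Schematically, and suppressing the canonical embeddings just mentioned, the recursive determination can be displayed as

$$\begin{aligned} \textsf{Pr}^S(\theta \in A) \equiv \big [\, T \mapsto \textsf{Pr}^T(\theta \in A \mid \iota \in T) \,\big ]_{\mathcal {U}_S} , \end{aligned}$$

where T ranges over the ‘small’ snapshots of S (the finite ones at the first stage, those of cardinality \(< \omega _{\alpha }\) at stage \(\alpha +1\)) and \(\mathcal {U}_S\) is the restriction to S of the chosen fine ultrafilter. This display is only our shorthand for the construction just sketched, not an additional definition.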

Now it is time to look at details of the construction.

4.2 Details 1: Restrictions of Fine Ultrafilters

Since our construction involves ultrafilters on sets \([S]^{< \kappa }\) with \(\kappa > \omega\), we make the following definition, which accords with the usual definition of fineness on \([S]^{< \omega }\).

Definition 19

For any infinite cardinal \(\kappa\), an ultrafilter \(\mathcal {U}\) on \([S]^{< \kappa }\) is fine iff for every \(x \in S:\)

$$\begin{aligned} \{T \in [S]^{< \kappa }: x \in T \} \in \mathcal {U}. \end{aligned}$$

Moreover, an ultrafilter \(\mathcal {U}\) on \([V]^{< Card}\) is fine iff for every \(x \in V:\)

$$\begin{aligned} \{T \in [V]^{< Card}: x \in T \} \in \mathcal {U}. \end{aligned}$$
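Since an ultrafilter is closed under finite intersections, fineness immediately guarantees that every finite subset of S is ‘eventually included’ in the snapshots: for any \(x_1, \ldots , x_n \in S\),

$$\begin{aligned} \{T \in [S]^{< \kappa }: \{x_1, \ldots , x_n\} \subseteq T \} = \bigcap _{i=1}^{n} \{T \in [S]^{< \kappa }: x_i \in T \} \in \mathcal {U}, \end{aligned}$$

and similarly for fine ultrafilters on \([V]^{< Card}\).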

We first show that appropriate restrictions of ultrafilters to smaller sets can be obtained in a uniform fashion.

Definition 20

Suppose \(S \in V\) with \(|S| = \kappa\), let \(\mathcal {U}\) be a fine ultrafilter on \([S]^{<\kappa }\), and let \(S' \subseteq S\) with \(|S'| = \alpha < \kappa\). Then we define the restriction \(\mathcal {U}_{S'}\) of \(\mathcal {U}\) to \(S'\) as follows.

For any \(X \in \mathcal {P}([S]^{<\kappa })\), let

$$\begin{aligned} X_{S'} \equiv \{ y \mid \exists z \in X: y = z \cap S' \text { and } |y |< \alpha \}. \end{aligned}$$

Then \(\mathcal {U}_{S'} \equiv \{ X_{S'} \mid X \in \mathcal {U} \}.\)
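For instance, take \(x \in S'\) and consider the set \(A_x \equiv \{T \in [S]^{< \kappa }: x \in T \}\) that witnesses fineness (and that will reappear in the proof below). Its restriction is

$$\begin{aligned} (A_x)_{S'} = \{ y \in [S']^{< \alpha }: x \in y \}: \end{aligned}$$

if \(z \in A_x\), then \(x \in z \cap S'\) since \(x \in S'\); conversely, every \(y \in [S']^{< \alpha }\) with \(x \in y\) arises as \(y = z \cap S'\) for \(z = y \in A_x\). So the restriction of the fineness witness for x in \([S]^{< \kappa }\) is exactly the fineness witness for x in \([S']^{< \alpha }\).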

Proposition 6

For any \(S \in V\) with \(\left| S \right| = \kappa\) infinite, there are fine ultrafilters \(\mathcal {U}\) on \([S]^{<\kappa }\) that restrict to a fine ultrafilter on \([S']^{<\alpha }\) for every \(S' \subseteq S\) with \(\left| S' \right| = \alpha\) and \(\omega \le \alpha < \kappa\).

Further, such ultrafilters are coherent in that if \(T\subset S'\) with \(\omega \le |T| <|S'|\), then \((\mathcal {U}_{S'})_T=\mathcal {U}_T\).

Proof

We build the ultrafilter from a pre-filter \(\mathcal {F}_0\), which can then be extended to a filter and then to an ultrafilter.

For each \(x \in S\), let

$$\begin{aligned} A_x \equiv \{ X \in [S]^{< \kappa }: x \in X \} . \end{aligned}$$

And, for each \(S' \subseteq S\) with \(\omega \le \left| S' \right| = \alpha < \kappa\), let:

$$\begin{aligned} R^{S'} \equiv \{ X \in [S]^{< \kappa }: X\cap S'\in [S']^{< \alpha } \} . \end{aligned}$$

Now set

$$\begin{aligned} \mathcal {F}_0 \equiv \{ A_x: x \in S \} \cup \{ R^{S'}: S' \subseteq S \text { and } \omega \le \left| S' \right| < \kappa \} . \end{aligned}$$

It is easy to see that \(\mathcal {F}_0\) has the finite intersection property: any finite subset of S that contains the finitely many elements x in question belongs to each of the corresponding sets \(A_x\) and, since the sets \(S'\) in question are infinite, also to each of the corresponding sets \(R^{S'}\). So \(\mathcal {F}_0\) can be extended to an ultrafilter \(\mathcal {U}\). And by design, \(\mathcal {U}\) is fine.

Clearly \(\mathcal {U}_{S'} \subseteq \mathcal {P}([S']^{<\alpha }).\) We must check the fine ultrafilter properties for \(\mathcal {U}_{S'}\):

  1. (1)

    Fine. This follows from the fact that \(\mathcal {U}\) is fine: for \(x\in S'\) this is witnessed by \((A_x)_{S'} = \{ y \in [S']^{< \alpha }: x \in y \}\), as computed after Definition 20.

  2. (2)

    Finite intersection. Let \(X,Y \in \mathcal {U}_{S'}\). Then there are \(\overline{X}, \overline{Y} \in \mathcal {U}\) such that \(X = \overline{X}_{ S'}\) and \(Y = \overline{Y}_{ S'}\). By the finite intersection property of \(\mathcal {U}\), we know that \(\overline{X} \cap \overline{Y} \in \mathcal {U}.\) But \(X \cap Y \supseteq (\overline{X} \cap \overline{Y} )_{ S'}.\) So \(X \cap Y \in \mathcal {U}_{S'}\).

  3. (3)

    Ultra. Take any \(X \subseteq [S']^{< \alpha }\), and let \(X^c \equiv [S']^{< \alpha } \backslash X.\) Let \(\overline{X} \equiv \{ x \in [S]^{< \kappa } \mid x \cap S' \in X \}\) and let \(\overline{X^c} \equiv \{ x \in [S]^{< \kappa } \mid x \cap S' \not \in X \}.\) Then \(\overline{X^c} = [S]^{< \kappa } \backslash \overline{X}.\) By the ultra property for \(\mathcal {U}\), we have \(\overline{X} \in \mathcal {U}\) or \(\overline{X^c} \in \mathcal {U}\). But \(X = \overline{X}_{S'}\) and \(X^c = \overline{X^c}_{S'}\). So \(X \in \mathcal {U}_{S'}\) or \(X^c \in \mathcal {U}_{S'}.\)

  4. (4)

    Non-principality. This is implied by fineness.

  5. (5)

    Empty set property: We have to show that \(\emptyset \not \in \mathcal {U}_{S'}\). It suffices to show that for each \(X \in \mathcal {U}\), \(X_{S'} \ne \emptyset\). Since \(R^{S'} \in \mathcal {U}\), \(X \cap R^{S'} \ne \emptyset\). But for any set x in this intersection, \(x\cap S'\in [S']^{< \alpha }\). So \(x\cap S'\in X_{S'}\ne \emptyset .\)

For coherence, take \(T\subset S'\subset S\) with \(|T|<|S'|<|S|\) and let \(X\in \mathcal {U}\). As \(R^{S'}\in \mathcal {U}\), and restrictions of subsets are subsets of restrictions, it is enough to show that \(((X\cap R^{S'})_{S'})_T=(X\cap R^{S'})_T\). Now \(((X\cap R^{S'})_{S'})_T=\{y \mid \exists z \in X\cap R^{S'}: y = z \cap T, |y|<|T| \text { and } |z\cap S' |< |S'|\}\), but by definition, for any \(z\in R^{S'}\) we have \(|z\cap S' |< |S'|\). Thus \(((X\cap R^{S'})_{S'})_T=\{y \mid \exists z \in X\cap R^{S'}: y = z \cap T \text { and } |y|<|T| \}=(X\cap R^{S'})_T\). \(\square\)

It can then be seen that this property must also hold for fine ultrafilters on \([V]^{< Card}:\)

Consequence 1

There are fine ultrafilters \(\mathcal {U}\) on \([V]^{< Card}\) such that for every infinite set S with \(\left| S \right| = \alpha\), \(\mathcal {U}_{S}\) is a fine ultrafilter on \([S]^{<\alpha }\) and the coherence property holds.

Proof

By the same reasoning as in the previous proposition. \(\square\)

4.3 Details 2: Defining Probability Functions

Now we show how a probability function can be defined on every set. The same procedure can then be used to define a probability function on V, and these probability functions are coherent.

The key is to spell out what is involved in the \(\beta\)-th step of the recursive procedure for defining probabilities on sets:

(\(\beta\)) A fine ultrafilter \(\mathcal {U}\) on \([S]^{< \omega _{\beta }}\) (with \(\omega _{\beta } = \left| S \right|\)), together with probability functions \(\textsf{Pr}^T\) for all T such that \(\left| T \right| < \omega _{\beta }\), yields a notion of probability \(\textsf{Pr}^S\) on S.

As in Sect. 2, we define a function \(f_{\theta \in A}\) such that for all \(T \in [S]^{< \omega _{\beta } }\):

$$\begin{aligned} f_{\theta \in A} (T) \equiv \textsf{Pr}^T (\theta \in A\mid \iota \in T) \equiv \textsf{Pr}^T (\{s\in T: \theta (s) \in A\}) . \end{aligned}$$

Similarly, we define a function \(f_{\theta \in A \wedge \nu \in B}\) such that for all \(T \in [S]^{< \omega _{\beta } }\):

$$\begin{aligned} f_{\theta \in A \wedge \nu \in B} (T) \equiv \textsf{Pr}^T (\theta \in A \wedge \nu \in B\mid \iota \in T) \equiv \textsf{Pr}^T (\{s\in T: \theta (s) \in A \wedge \nu (s) \in B\}) . \end{aligned}$$

Then \(\textsf{Pr}^S (\theta \in A)\) is defined as \([f_{\theta \in A}]_{\mathcal {U}}\), and \(\textsf{Pr}^S (\theta \in A \mid \nu \in B)\) is defined as

$$\begin{aligned} \frac{[f_{\theta \in A \wedge \nu \in B}]_{\mathcal {U}}}{[f_{\nu \in B}]_{\mathcal {U}}}. \end{aligned}$$

This function \(\textsf{Pr}^S\) will then be an NAP probability function in the sense of (Benci et al., 2013).
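The base case of this recursion, where the snapshots T are finite, is the only computationally concrete part of the construction. The following Python sketch is merely illustrative (the names f_event, f_conditional, theta, A, B, and T are ours, and the ultrafilter limit itself is of course not something one can compute); it shows how the values \(f_{\theta \in A}(T)\) and the conditional ratio behave on a finite snapshot:

```python
from fractions import Fraction

def f_event(theta, A, T):
    """Finite-snapshot value Pr^T(theta in A | iota in T), via the ratio formula."""
    T = list(T)
    hits = sum(1 for s in T if theta(s) in A)
    return Fraction(hits, len(T))

def f_conditional(theta, A, nu, B, T):
    """Finite-snapshot value Pr^T(theta in A | nu in B), as a ratio of counts."""
    T = list(T)
    joint = sum(1 for s in T if theta(s) in A and nu(s) in B)
    given = sum(1 for s in T if nu(s) in B)
    return Fraction(joint, given)

# Toy example: T = {1,...,10}, theta the identity random variable,
# A the even numbers in T, B the numbers greater than 5.
T = range(1, 11)
identity = lambda s: s
A = {s for s in T if s % 2 == 0}
B = {s for s in T if s > 5}
print(f_event(identity, A, T))                     # 1/2
print(f_conditional(identity, A, identity, B, T))  # 3/5
```

The full definition then takes the ultrafilter equivalence class of such snapshot values rather than any single one of them.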

Now in exactly the same way, we define a class probability function \(\textsf{Pr}^+_{\mathcal {U}}\) on V, using the probability functions on ‘small’ classes (i.e., sets) and the ultrafilters on ‘small’ classes, which (given Proposition 6) we may now assume to have been defined on the basis of the ultrafilter \(\mathcal {U}\) on \([V]^{<Card}\) with which we start. The function \(\textsf{Pr}^+_{\mathcal {U}}\) is total, regular, and uniform for the same reasons that its ‘smaller cousin’ \(\textsf{Pr}_{\mathcal {U}}\) has these properties.

We now check coherence. We will do this only for probabilities of the identity random variable rather than for random variables in general: although coherence holds for all random variables, the general case is more technical to prove. Below we use \(\textsf{Pr}(A)\) to denote \(\textsf{Pr}(\iota \in A)\), \(\textsf{Pr}(A|T)\) for \(\textsf{Pr}(\iota \in A|\iota \in T)\), and \(f_A(T)\) for \(f_{\iota \in A}(T)\), where \(\iota\) is the identity random variable.

Proposition 7

For any class A and sets \(T\subset S\) with \(|T|<|S|\) we haveFootnote 22

$$\begin{aligned} \textsf{Pr}^T(A)=\textsf{Pr}^S(A|T). \end{aligned}$$

Proof

We show by induction on |T| that the above holds for all \(S\supset T\) with \(|S|>|T|\). This is trivial for finite T, as on both sides we are just using the ratio formula. For infinite T, strictly speaking, the range of \(\textsf{Pr}^T\) may be a non-Archimedean field that is different from the range of \(\textsf{Pr}^S\), but there is a natural embedding of the former into the latter defined by \(i([f]_{\mathcal {U}_T})=[\bar{f}]_{\mathcal {U}_S}\), where for \(X\in [S]^{<|S|}\) we set \(\bar{f}(X)=f(X\cap T)\) whenever the latter is defined, and \(\bar{f}(X)=0\) otherwise. The embedding is well-defined because \(\{X\in [S]^{<|S|}:|X\cap T|<|T|\}=(R^T)_S\in \mathcal {U}_S\): if \(g\in [f]_{\mathcal {U}_T}\), i.e. g agrees with f on a set Y in \(\mathcal {U}_T\), then \(\bar{f}\) agrees with \(\bar{g}\) on \((R^T)_S\cap \bar{Y}\), where \(\bar{Y}\) is any set in \(\mathcal {U}_S\) with \(\bar{Y}_T=Y\). Such a \(\bar{Y}\) exists by coherence.

For a given infinite T we assume the property holds for any \(T'\) with \(|T'|<|T|\), and show it holds for T. Let S be arbitrary with \(|S|>|T|\).

Using the embedding we have \(i(\textsf{Pr}^T(A))=i([f_A]_{\mathcal {U}_T})=[\bar{f_{A}}]_{\mathcal {U}_S}\). Now for \(X\in (R^T)_S \, (\in {\mathcal {U}_S})\) we have:

$$\begin{aligned} \bar{f_{A}}(X)=f_A(X\cap T)=f_{A\cap T}(X\cap T)=\textsf{Pr}^{X\cap T}(A\cap T). \end{aligned}$$

As \(X\in (R^T)_S\) we have \(|X\cap T|<|T|\) so by our inductive hypothesis

$$\begin{aligned} \textsf{Pr}^{X\cap T}(A\cap T)=\textsf{Pr}^X(A\cap T|T)=\frac{f_{A \cap T}(X)}{f_{T}(X)}. \end{aligned}$$

But by definition, \(\big [\frac{f_{A \cap T}}{f_{T}}\big ]_{\mathcal {U}_S}=\textsf{Pr}^S(A|T)\), so \([\bar{f_{A}}]_{\mathcal {U}_S}=\textsf{Pr}^S(A|T)\) and we’re done. \(\square\)

4.4 Comparison of the Finite Snapshot Approach and the Bootstrapping Approach

In our definition of the probability of a set theoretic property, the probability \(\textsf{Pr}^+_{\mathcal {U}}(\theta \in A)\) of \(\theta\) having the property A is determined by the probabilities \(\textsf{Pr}^S(\theta \in A )\) of A on large ‘snapshots’ S, where a probability \(\textsf{Pr}^S(\theta \in A )\) (for S a large set) is in turn determined by the probabilities \(\textsf{Pr}^{S'}(\theta \in A)\) for ‘snapshots’ \(S'\) smaller than S, and so on. Conceptually, the definition in Sect. 4.3 is superior to the simpler definition suggested in Sect. 2: we want to take the behaviour of the property on as many and as large ‘snapshots’ as possible into account.

It is not straightforward to compare the simple and the more involved definition: the simple method is based on an ultrafilter on \([V]^{< \omega }\) whereas the more involved method is based on an ultrafilter on \(V=[V]^{< Card}\).

The obvious suggestion is to base the comparison on the relation between a probability function determined by an ultrafilter \(\mathcal {U}\) on \([V]^{< Card}\) and its restrictionFootnote 23 to \([V]^{< \omega }\) defined as \(\mathcal {U}\upharpoonright \omega =\{X\cap [V]^{< \omega }|X\in \mathcal {U}\}\). But:

Proposition 8

Not all ultrafilters on \([V]^{< Card}\) restrict to ultrafilters on \([V]^{< \omega }\).

Proof

Consider \(\mathcal {A} \cup \{\overline{[V]^{< \omega }}\}\), where \(\mathcal {A}\) is the collection of sets of the form \(A_x\) for various x as used throughout the paper (guaranteeing fineness) and \(\overline{[V]^{< \omega }}\) is the relative complement of \({[V]^{< \omega }}\) in \({[V]^{< Card}}\). Then \(\mathcal {A} \cup \{\overline{[V]^{< \omega }}\}\) has the finite intersection property and so can be extended to a fine ultrafilter \(\mathcal {U}\) on \({[V]^{< Card}}\). But then \(\overline{[V]^{< \omega }} \in \mathcal {U}\), and since \(\overline{[V]^{< \omega }} \cap [V]^{< \omega } = \emptyset\), we have \(\emptyset \in \mathcal {U}\upharpoonright \omega\). So \(\mathcal {U}\) does not restrict to an ultrafilter on \([V]^{< \omega }\). \(\square\)

On the other hand, every fine ultrafilter on \([V]^{< Card}\) restricting to an ultrafilter on \([V]^{< \omega }\) is essentially an ultrafilter on \([V]^{< \omega }\):

Proposition 9

Suppose \(\mathcal {U}\) is a fine ultrafilter on \([V]^{< Card}\) restricting to an ultrafilter \(\mathcal {U} \upharpoonright \omega\) on \([V]^{< \omega }\). Then \([V]^{< \omega } \in \mathcal {U}\).

Proof

Since \(\mathcal {U}\) is ultra, we have \([V]^{< \omega }\in \mathcal {U}\) or \(\overline{[V]^{< \omega }}\in \mathcal {U}\). But if \(\overline{[V]^{< \omega }}\in \mathcal {U}\), then \(\emptyset \in \mathcal {U}\upharpoonright \omega\), so that \(\mathcal {U}\) does not restrict, contradicting the assumption. So \([V]^{< \omega }\in \mathcal {U}\). \(\square\)

This means that the probability functions on V generated by the bootstrapping method cannot in general be reduced to the ‘simple’ probability functions on V that were discussed in the previous section.

In sum, in the preceding sections we have explored two methods for modelling, by means of non-Archimedean probability functions, the properties of random variables ranging over the set theoretic universe. Concerning the finite snapshot method, we found that many of the probabilistic properties that seem intuitively plausible can be satisfied. The bootstrapping method is more satisfying from a conceptual point of view. But we have only been able to show that the resulting probability functions satisfy minimal requirements. Much work on the bootstrapping method therefore remains to be done.

5 Concluding Remarks

The real numbers are in a sense close to our physical world: we routinely use them to model physical phenomena and processes. For that reason, throwing a dart randomly at the real unit interval appears to be probabilistically meaningful,Footnote 24 even though, due to our finite powers of discrimination, this scenario cannot be experimentally realised.Footnote 25 In contrast with this, throwing a dart randomly at V seems much further removed from what probability theory is about. Therefore one might wonder whether the notion of random variable on V is a probabilistically meaningful concept at all.Footnote 26

In response to this, one might ask: if the abstract notions of random graph and random space are probabilistically meaningful, then why is the notion of random set not equally meaningful? But even those who are, for the reasons given in the previous paragraph, sceptical about attempts to apply probabilistic notions to the mathematical universe as a whole might find the implications of our results for generalised probability functions on the real numbers meaningful, as we shall now argue.

Consider the scenario of throwing a dart randomly at the real unit interval from the perspective of Kolmogorov probability. From that perspective, this is a probabilistically coherent scenario. It is then easy to see not only that all point events must have probability 0, but also (because of \(\sigma\)-additivity) that for each countably infinite set \(A \subseteq [0,1]\), the probability that the random dart lands on a number in A must be 0.
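Explicitly, if \(A = \{a_1, a_2, a_3, \ldots \} \subseteq [0,1]\) is countably infinite, then \(\sigma\)-additivity forces

$$\begin{aligned} \textsf{Pr}(A) = \sum _{n=1}^{\infty } \textsf{Pr}(\{a_n\}) = \sum _{n=1}^{\infty } 0 = 0. \end{aligned}$$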

If we consider this scenario from the perspective of the framework of the present article, the picture changes. Already according to the notion of probability described in the framework of (Benci et al., 2013), the dart landing in any given countably infinite subset of [0, 1] is “more probable” than it landing in any finite subset of the unit interval.

This seems natural, but the intuition takes us further: shouldn’t it be the case that in general, for any sets \(A, B \subseteq [0,1]\) with \(\left| A \right| < \left| B \right|\), the probability that the dart lands in B is higher than the probability that it lands in A? Theorem 1 shows that there exist generalised probability functions that satisfy this condition, so that in the framework of (Benci et al., 2013) this intuition can be fulfilled. This goes beyond the results in (Benci et al., 2013), where this was only shown for A finite: Theorem 1 implies that any uncountable subset can be given higher probability than any countable subset. The phenomenon becomes even more dramatic when the Continuum Hypothesis fails:Footnote 27 we may require, for any two infinite sets \(A, B \subseteq [0,1]\) such that \(\left| A \right|< \left| B \right| < \left| \mathbb {R} \right|\) (and there will then be such pairs of sets), that the probability that the dart lands in B is higher than the probability that the dart lands in A.

This can then be seen as a natural strengthening of the phenomenon (i.e., the above for A finite) that was already known to hold in the framework of (Benci et al., 2013). Since we are still only talking about the real unit interval, this is a result that has concrete probabilistic content. The bootstrapping method can also be applied to generalised probability on the real interval [0, 1], so the conceptual refinement gained is equally relevant to this more concrete situation.