Sets and Probability

In this article the idea of random variables over the set theoretic universe is investigated. We explore what it can mean for a random set to have a specific probability of belonging to an antecedently given class of sets.


Introduction
Probabilistic notions have been applied to mathematical objects and notions. For instance, probabilistic concepts have been applied in the theory of random graphs [Alon et al 2000]. The aim of this article is to apply a notion of probability to the mathematical universe as a whole. More specifically, we wish to explicate what it could mean for a property A of sets to have a probability of being true of a set y in the set theoretic universe V. Properties are identified with their extensions, so that A ranges over all proper and improper classes in V.
The aim is to develop a theory of the probability of events of the form A(τ), where A is a class and the variable τ is a random variable. The outcome space of the random variables is of course V. The state space of the random variables has to be at least as large as V, because there must be enough states for a random variable to take each set as a possible value. On the other hand, there is no need for it to be larger than V. Therefore the state space is simply identified with V.
Without invoking a fixed set of postulates, intuitions about probability have occasionally been used in set theory, for instance to motivate new basic principles [Freiling 1986]. However, such attempts are mostly regarded as unsuccessful [Hamkins 2015]. In the light of this it is natural to wonder what we should require from probability functions associated with random variables on V.
Surely it would be unreasonable to insist on there being one unique correct probability function that yields the probability of a random variable taking a value in a given class of sets. On the other hand, for our functions to have any hope of meriting the label probability function, they have to satisfy Kolmogorov's conditions for being a finitely additive probability function.
From the outset we impose additional constraints on the class of probability functions that we are interested in:
1. Totality. The probability functions are defined on all classes.
2. Uniformity. All singleton events are given the same probability.
3. Regularity. All singleton events are given non-zero probability.
All this means, for familiar reasons, that the sought-for probability functions cannot be Kolmogorov probability functions. Given our insistence on finite additivity, this means that the probability functions will be non-Archimedean. They will not satisfy σ-additivity, but they will instead satisfy a generalised infinite additivity rule.
In mathematics today, the term 'probability' has become virtually synonymous with 'function that satisfies the Kolmogorov axioms (including σ-additivity)'. If you see matters this way, then you will be loath to dignify the functions constructed in this paper by the term 'probability function'. Nonetheless, you may ask whether a fine-grained quantitative theory of possibility, with which the degree of possibility of properties can quantitatively be compared, can be constructed. This is what is investigated in the present article. So, if you prefer, you can call the theory constructed in this paper a quantitative theory of possibility. You are then advised to replace all occurrences of '(non-standard) probability function' by 'quantitative possibility function'.
The project in which we are engaging in this article is related to the work in [Benci et al 2007]. The aim of the latter article is to construct a theory of sizes for mathematical universes inspired by the Euclidean principle that the size of the whole is larger than the sizes of its proper parts. Now there is of course a familiar theory of size, Cantor's theory of cardinality, which does not satisfy this Euclidean principle. So Benci and his co-authors propose their Euclidean theory of size as a rival to Cantor's theory.
We, on the other hand, fully accept Cantor's theory of cardinality. Nonetheless, the probability functions that will be constructed satisfy the Euclidean principle that the probability of an event is strictly greater than the probability of each of its proper sub-events. Moreover, the mathematical techniques for generating them are closely related to the techniques that are used in [Benci et al 2007].
What we shall mean by 'mathematical universe' is not the same as what is meant by the term in [Benci et al 2007]. The authors of [Benci et al 2007] impose mainly algebraic constraints on what counts as a mathematical universe [Benci et al 2007, Introduction]. We, in contrast, take the term 'mathematical universe' in the set theoretical sense. Naively, you may take there to be one preferred set theoretic universe: V. But if you are uncomfortable with taking V as given, then you might want to take a mathematical universe to be a rank V_α that constitutes a model of most or perhaps even all of the standard principles of set theory. Indeed, we will see that for random variables defined on any large set S, the general idea of equipping them with a probability function will be the same as that for random variables on V.
We will discuss two ways of generating non-Archimedean probability functions for random variables on V. In section 2 a simple way of generating such probability functions (the finite snapshot approach) will be described. In section 3 we go on to discuss how global properties of these probability functions can be made to hold by imposing constraints on the process of generating such functions. In section 4, a theoretically more satisfying but also more complicated way of generating non-Archimedean probability functions for random variables on V is discussed (the bootstrapping method).

The finite snapshot approach
A random variable τ on V is a function from states to the outcome space, i.e., an element of V^V. So there are many random variables on V. The aim is to associate a notion of probability with elements of V^V that meets the minimal constraints (totality, uniformity and regularity) that were described in section 1.
In fact, we want to give precise meaning to conditional probability statements of the form
$$\Pr(\sigma \in A \mid \tau \in B),$$
where σ, τ ∈ V^V and A, B ⊆ V. But we will see that it will be sufficient for our purposes to give meaning to unconditional probability statements of the form Pr(σ ∈ A). So our fundamental problem amounts to giving meaning to expressions of the form Pr(σ ∈ A). Such probability measures will be determined by a choice of a fine ultrafilter on the collection [V]^{<ω} of finite subsets of the state space.
The starting point is a fine ultrafilter U on [V]^{<ω}. This fine ultrafilter U defines a non-Archimedean field F_U in the following way.
For any two functions f, g : [V]^{<ω} → Q we define:
$$f \approx_U g \iff \{T \in [V]^{<\omega} : f(T) = g(T)\} \in U.$$
In words: two functions are identified if they coincide on ultrafilter-many finite sets of states.
The relation ≈_U is an equivalence relation, so we can take equivalence classes [f]_U, for which we then have
Moreover, it is again a routine exercise to verify that the [f]_U's form a hyper-rational field F_U.
Now suppose A ⊆ V and θ ∈ V^V. Then we define the function f_{θ∈A} : [V]^{<ω} → Q as follows:
Definition 2 For every T ∈ [V]^{<ω}:
$$f_{\theta \in A}(T) = \frac{|\{s \in T : \theta(s) \in A\}|}{|T|}.$$
In words: for every finite set of states T, f_{θ∈A}(T) is the ratio between the number of states s in T for which θ(s) ∈ A and the number of states in T.
In this sense, f_{θ∈A}(T) is the probability of θ ∈ A on a finite snapshot of states.
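As a concrete illustration, the snapshot ratio described in words above can be computed directly on a toy example. The following is a minimal sketch in Python; the name `snapshot_ratio`, the use of small natural numbers as stand-ins for states and sets, and the truncation of the even numbers at 100 are assumptions made for illustration only and are not part of the construction.

```python
from fractions import Fraction

def snapshot_ratio(theta, A, T):
    """Fraction of the states s in the finite snapshot T for which theta(s) is in A."""
    if not T:
        raise ValueError("the ratio is undefined on the empty snapshot")
    hits = sum(1 for s in T if theta(s) in A)
    return Fraction(hits, len(T))

# Toy stand-ins (assumptions): states and values are small natural numbers.
identity = lambda s: s            # a random variable that takes each value exactly once
evens = set(range(0, 100, 2))     # the property "is even", truncated at 100
T = frozenset({0, 1, 2, 3, 4})    # one finite snapshot of states

print(snapshot_ratio(identity, evens, T))  # prints 3/5
```

The value on a single snapshot carries no weight by itself; in the construction it is only the behaviour of these ratios on ultrafilter-many snapshots that matters.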
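Similarly, we define the function f_{θ∈A∧ν∈B} as follows:
Definition 3 For every T ∈ [V]^{<ω}:
$$f_{\theta \in A \wedge \nu \in B}(T) = \frac{|\{s \in T : \theta(s) \in A \wedge \nu(s) \in B\}|}{|T|}.$$
Now we are ready to define the probability of θ ∈ A, relative to a fine (and therefore free) ultrafilter U on [V]^{<ω}:
$$\Pr_U(\theta \in A) = [f_{\theta \in A}]_U.$$
Similarly, we define
$$\Pr_U(\theta \in A \wedge \nu \in B) = [f_{\theta \in A \wedge \nu \in B}]_U.$$
Thus we have constructed a probability function Pr_U that takes its values in the hyper-rational field F_U. Such probability functions are sometimes called NAP (non-Archimedean probability) functions.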
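Conditional probability can then be expressed in terms of unconditional probability by the ratio formula, whenever Pr_U(ν ∈ B) > 0:
$$\Pr_U(\theta \in A \mid \nu \in B) = \frac{\Pr_U(\theta \in A \wedge \nu \in B)}{\Pr_U(\nu \in B)}.$$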
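On a single finite snapshot, the ratio formula for conditional probability reduces to elementary counting. The sketch below illustrates this in Python; the helper names `joint_ratio` and `conditional_ratio` and the particular toy random variables are hypothetical and chosen only for the example.

```python
from fractions import Fraction

def joint_ratio(theta, A, nu, B, T):
    """Fraction of states s in the finite snapshot T with theta(s) in A and nu(s) in B."""
    hits = sum(1 for s in T if theta(s) in A and nu(s) in B)
    return Fraction(hits, len(T))

def conditional_ratio(theta, A, nu, B, T):
    """Ratio formula on one snapshot: joint ratio divided by the ratio of the condition."""
    condition = sum(1 for s in T if nu(s) in B)
    if condition == 0:
        raise ZeroDivisionError("no state in T satisfies the condition")
    return joint_ratio(theta, A, nu, B, T) / Fraction(condition, len(T))

# Toy stand-ins (assumptions): natural numbers as states and values.
theta = lambda s: s          # takes each value exactly once
nu = lambda s: s // 2        # takes each value twice
A = {0, 2, 4, 6, 8, 10}
B = {0, 1, 2}
T = frozenset(range(12))

print(conditional_ratio(theta, A, nu, B, T))  # prints 1/2
```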

Constraints
From section 1 we know that the aim is not to arrive at a unique (correct) probability function on V. But we did insist from the outset on our probability functions satisfying three global constraints: totality, uniformity, and regularity. It will be shown that these properties are always guaranteed to hold.
There are further global conditions on probability functions on V that seem reasonable to require, and that are not guaranteed to hold without further work. These global constraints will be explored. We will show that many of them can be forced to hold by imposing constraints on the ultrafilters from which the probability functions are generated.

Elementary properties
The definition of Pr_U is relative to an initial choice of the fine ultrafilter U. The properties of Pr_U depend on U. Nonetheless, certain basic properties of Pr_U can be easily seen to hold regardless of which fine ultrafilter U is chosen:
Proof. Easy.

Now we define the notion of a diagonal random variable:
Definition 6 A random variable θ is said to be a diagonal random variable if for any set x, there is exactly one element u of the state space such that θ(u) = x.
In words: a diagonal random variable is a random variable that takes every value exactly once.
Using this notion, we define the notions of regularity and uniformity:
Definition 7 (regularity) A probability function Pr_U is regular if for every diagonal random variable θ and for every x ∈ V, Pr_U(θ = x) > 0.
Definition 8 (uniformity) A probability function Pr_U is uniform if for every diagonal random variable θ and for all x, y ∈ V:
$$\Pr_U(\theta = x) = \Pr_U(\theta = y).$$
Proposition 2 For every fine ultrafilter U:
1. Pr_U is regular;
2. Pr_U is uniform.
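One way to see why regularity and uniformity do not depend on the choice of U is to look at single snapshots. A fine ultrafilter contains, for every state u, the collection of all finite snapshots that contain u. For a diagonal random variable, any snapshot T that contains the two states at which the values x and y are taken gives both singleton events the ratio 1/|T|, which is positive. The Python sketch below checks this on a toy example; the function `singleton_ratio` and the choice of the integers with s ↦ −s as a diagonal random variable are illustrative assumptions.

```python
from fractions import Fraction

def singleton_ratio(theta, x, T):
    """Fraction of states s in the finite snapshot T with theta(s) == x."""
    return Fraction(sum(1 for s in T if theta(s) == x), len(T))

# Hypothetical diagonal random variable on the integers:
# s -> -s takes every integer value at exactly one state.
theta = lambda s: -s

x, y = 3, -7
T = frozenset({-3, 7, 0, 1})   # contains the states at which theta takes x and y

print(singleton_ratio(theta, x, T), singleton_ratio(theta, y, T))  # prints 1/4 1/4
```

A snapshot that misses the relevant state gives ratio 0, but by fineness such snapshots are not ultrafilter-many.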
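The Euclidean property is formally defined as follows:
Definition 9 (Euclidean) A probability function Pr_U is Euclidean if for every diagonal random variable θ and all A, B ⊆ V:
$$A \subsetneq B \Rightarrow \Pr_U(\theta \in A) < \Pr_U(\theta \in B).$$
Then we have:
Proposition 3 For every fine ultrafilter U, the probability function Pr_U is Euclidean.
Proof. By finite additivity and regularity: if A ⊊ B, then Pr_U(θ ∈ B) = Pr_U(θ ∈ A) + Pr_U(θ ∈ B \ A), and the second summand is at least Pr_U(θ = x) > 0 for any x ∈ B \ A.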
Now we turn to infinite additivity. Countable additivity means that the probability of the union of a countable family of disjoint sets is the infinite sum of the probabilities of the elements of the family, where the notion of infinite sum is spelled out in terms of the classical notion of limit. In the present setting, the probability Pr_U of the union of any family of disjoint sets is also the infinite sum of the probabilities of the elements of the family [Benci et al 2013, section 3.4]. But now the notion of infinite sum is spelled out in terms of the generalised notion of limit based on the ultrafilter U.
More precisely, the new notion of infinite sum is defined as follows. Suppose we are given a family {q_i : i ∈ N} of rational numbers, and I ⊆ N. Then consider the function f : [V]^{<ω} → Q given by
$$f(T) = \sum_{i \in T \cap I} q_i.$$
This function can be seen as giving the value of the infinite sum on all finite parts ("snapshots") of the index set. So we identify the infinite sum of the family {q_i : i ∈ I} of rational numbers with the generalised limit of f according to the ultrafilter U:
$$\sum_{i \in I} q_i = [f]_U.$$
Using this notion of infinite sum, we can express the probability of the union of a disjoint family of sets as the sum of the probabilities of the members of that family: if A_i ∩ A_j = ∅ for all i, j ∈ I with i ≠ j, then for every random variable τ:
$$\Pr_U\Big(\tau \in \bigcup_{i \in I} A_i\Big) = \sum_{i \in I} \Pr_U(\tau \in A_i).$$
In sum, Pr_U has a natural infinite additivity property that is sometimes called perfect additivity.
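The idea that an infinite sum is evaluated on finite snapshots of the index set can be made concrete as follows. This Python sketch computes the partial sum over the indices that fall inside a snapshot, and checks that snapshot ratios of disjoint properties add up. The names `partial_sum` and `ratio`, the family q_i = 1/2^(i+1) and the representation of classes by membership predicates are assumptions made for this example only.

```python
from fractions import Fraction

def partial_sum(q, in_I, T):
    """Sum of q(i) over those indices i in the finite snapshot T that belong to I."""
    return sum((q(i) for i in T if in_I(i)), Fraction(0))

def ratio(in_A, T):
    """Fraction of elements of the finite snapshot T that satisfy the predicate in_A."""
    return Fraction(sum(1 for s in T if in_A(s)), len(T))

q = lambda i: Fraction(1, 2 ** (i + 1))   # hypothetical family of rationals
in_I = lambda i: i % 2 == 0               # index set I: the even numbers
T = frozenset({0, 1, 2, 3})

print(partial_sum(q, in_I, T))  # prints 5/8

# Finite additivity holds on every snapshot for disjoint properties.
in_A1 = lambda s: s % 3 == 0
in_A2 = lambda s: s % 3 == 1
in_union = lambda s: in_A1(s) or in_A2(s)
print(ratio(in_union, T) == ratio(in_A1, T) + ratio(in_A2, T))  # prints True
```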
Proposition 5 For every fine ultrafilter U, the probability function Pr_U is perfectly additive.
Proof. This proposition is proved as proposition 8 in [Benci et al 2013, p. 132-133].

Symmetry principles
From now on, the symbol θ will be used to refer to some arbitrary diagonal random variable.When it is not assumed that the random variable in question is diagonal, we will write τ.
The Euclidean-ness of Pr_U has implications for symmetry principles. As a rule of thumb, one can say that symmetry principles fail.
Proposition 6 For every fine ultrafilter U, the probability function Pr_U is not invariant under all permutations of V.
Proof. We concentrate on N as it is canonically represented in V (by means of the Zermelo ordinals, for instance). Define a permutation π of V as follows:
Let A ≡ {0, 2, 4, ...}, and let θ be a diagonal random variable. Then π(A) ⊊ A. Therefore, by the Euclidean principle,
$$\Pr_U(\theta \in \pi(A)) < \Pr_U(\theta \in A).$$
This of course entails that there are diagonal random variables θ, θ′ such that for some A ⊆ V,
$$\Pr_U(\theta \in A) \neq \Pr_U(\theta' \in A).$$
One popular global constraint on probability measures is translation-invariance. The Lebesgue measure has this property, and Banach limits seem to occupy a privileged position in the class of generalised limits at least in part because they are translation-invariant. In our context, translation-invariance does not make obvious sense. For an arbitrary class A, it is not clear what 'A + α' (where α is a number) means. But a clear interpretation of 'adding an ordinal number' can of course be given if A is a collection of ordinals:
Definition 11 For A any collection of ordinals:
$$A \oplus \alpha \equiv \{\alpha + \beta : \beta \in A\}.$$
Then translation-invariance for A means that for all ordinals α and for every θ,
$$\Pr_U(\theta \in A) = \Pr_U(\theta \in A \oplus \alpha).$$
However, even if we consider non-Archimedean measures (of the kind that we have been describing) on ordinals, translation-invariance conflicts with the Euclidean Property of our generalised probability functions. In particular, there is no NAP probability function Pr_U on any infinite cardinal κ such that there is even one ordinal α with 0 < α < κ and
$$\Pr_U(\theta \in \kappa) = \Pr_U(\theta \in \kappa \oplus \alpha).$$
The reason is simple. We have κ ⊕ α = κ \ α ⊊ κ, so if we had Pr_U(θ ∈ κ) = Pr_U(θ ∈ κ ⊕ α), then we would contradict the Euclidean principle.
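The conflict between translation-invariance and the Euclidean principle is already visible on finite snapshots of ω. In the Python sketch below, classes of natural numbers are represented by membership predicates, and translating ω by α = 3 yields ω \ 3, whose snapshot ratio falls strictly below that of ω as soon as the snapshot contains a number smaller than 3. The helper names `translate` and `ratio`, and the use of the identity as the random variable, are assumptions for illustration.

```python
from fractions import Fraction

def translate(in_A, alpha):
    """Membership test for {alpha + beta : beta in A}, restricted to natural numbers."""
    return lambda n: n >= alpha and in_A(n - alpha)

def ratio(in_A, T):
    """Fraction of elements of the finite snapshot T that satisfy the predicate in_A."""
    return Fraction(sum(1 for s in T if in_A(s)), len(T))

in_omega = lambda n: True              # every natural number belongs to omega
in_shifted = translate(in_omega, 3)    # omega translated by 3, i.e. omega without {0, 1, 2}

T = frozenset(range(10))
print(ratio(in_omega, T), ratio(in_shifted, T))  # prints 1 7/10
```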
As this example shows, such translations are not necessarily one-to-one correspondences, so we may not want full invariance in general. In [Benci et al 2007, section 1.3], Benci, Forti, and Di Nasso explore a restricted notion of translation-invariance of NAP-like measures on ordinals. We do not pursue this theme further here, but only pause to note that there are other reasonable-looking principles that are hard to satisfy. In the context of their theory of numerosities, Benci, Forti, and Di Nasso consider a principle that in the present context would take the following form:
On countable sample spaces, the difference principle can be made to hold by building Pr_U from a selective ultrafilter [Benci et al 2003]. But the existence of selective ultrafilters is independent of ZFC. As far as we know, it is an open question whether the difference principle can be consistently made to hold for NAP probability functions on uncountable sample spaces.

Probability and cardinality
In this (sub-)section we investigate the relation between our notion of generalised probability on the one hand, and the familiar notion of cardinality on the other hand.

Hume's principle for probability
One might naively wonder whether the following probabilistic analogue of Hume's Principle for cardinality can hold:
Definition 13 (Hume's principle for probability) For all A, B ∈ V:
$$|A| = |B| \Rightarrow \Pr_U(\theta \in A) = \Pr_U(\theta \in B).$$
But the probability functions Pr_U that we have been considering cannot satisfy Hume's principle for probability, as its failure is an immediate consequence of Proposition 6: invariance under permutations and Hume's principle for probability are mathematically equivalent. However, this was only to be expected. After all, we do not expect Kolmogorov probability (on infinite spaces) to satisfy any such principle.

Superregularity
The hyper-rational field F_U in which the probability functions Pr_U take their values contains infinitesimal numbers; this is what makes it non-Archimedean. We will write Pr_U(σ
We have seen that Pr_U cannot satisfy Hume's principle for probability. But, at least at first sight, it seems that it would be reasonable to demand:
Indeed, if in addition |B| ≥ ω, then we might even expect
Further, this may be expected to hold if B is a proper class but A is a set. The result is a size constraint which is a strengthening of the requirement of regularity:
Note that if A is finite and B is infinite then the consequent holds automatically.
By a suitable restriction on admissible ultrafilters U, superregularity can indeed be made to hold:
Theorem 1 There are fine ultrafilters U such that Pr_U is superregular.

Proof. If A, B ∈ V such that ω ≤ |A| < |B| are given, then we have
The aim is to build an ultrafilter U for which this holds.
For any n ∈ N, define
Define also
We want to prove that F has the finite intersection property. Therefore take any x_1, ..., x_k ∈ V, and any A_1, B_1, n_1, ..., A_l, B_l, n_l such that A_j < B_j and n_j ∈ N for j ≤ l. Assume for the construction that
So setting n = max{n_j : j < l} we will extend {x_1, ..., x_k} to a set in C^n_{A_j B_j}, and hence C
Continuing in this manner, set F = F_l. Then we have ensured that for all j ≤ l
and so we have F ∈ C^n_{A_j B_j}, and since D ⊆ F, we also have F ∈ ⋂_{i≤k} A_{x_i}. So F indeed has the finite intersection property, whereby it can be extended to a filter and then further to an ultrafilter U. By design, then, the resulting probability function Pr_U is superregular.
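The extension step of this proof can be illustrated by a small computation. The sketch below rests on an assumption that should be stated loudly: it reads the sets C^n_{AB} as collecting those finite snapshots in which the elements of B outnumber the elements of A by a factor of at least n. This reading is ours and is not taken from a definition given above. Under that assumption, any finite snapshot can be enlarged to one satisfying the condition, because B supplies infinitely many fresh elements outside A. The function `extend_snapshot` and the toy predicates are hypothetical.

```python
from itertools import count

def extend_snapshot(T, in_A, in_B, fresh_B, n):
    """Return a finite superset F of T with n * |F ∩ A| <= |F ∩ B|.

    Assumption: fresh_B yields an unlimited supply of elements of B that are not in A.
    """
    F = set(T)
    size_A = sum(1 for s in F if in_A(s))
    size_B = sum(1 for s in F if in_B(s))
    fresh = iter(fresh_B)
    while n * size_A > size_B:
        b = next(fresh)
        if b not in F:          # only genuinely new elements change the count
            F.add(b)
            size_B += 1
    return frozenset(F)

# Toy stand-ins (assumptions): A = multiples of 10, B = odd numbers.
in_A = lambda s: s % 10 == 0
in_B = lambda s: s % 2 == 1
T = {0, 10, 20, 1}

F = extend_snapshot(T, in_A, in_B, count(1, 2), n=4)
print(sum(1 for s in F if in_A(s)), sum(1 for s in F if in_B(s)))  # prints 3 12
```

Because the original snapshot is kept inside the extended one, membership conditions imposed for fineness are preserved by the extension.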
Once again, Hume's Principle for probability cannot hold for the notion of probability that we are investigating. But this leaves open the question whether the converse of Hume's Principle for probability can be made to hold. This is called Cantor's Principle in [Benci et al 2007], where the authors investigate it in the context of their Euclidean theory of size:
Definition 15 (Cantor's Principle) For all A, B ∈ V:
$$\Pr_U(\theta \in A) = \Pr_U(\theta \in B) \Rightarrow |A| = |B|.$$
Benci, Forti, and Di Nasso prove that 'Cantor's Principle' can be made to hold [Benci et al 2007, section 3.2]. It is also clear that Cantor's Principle follows from superregularity.

The power set principle
The question whether
$$|A| < |B| \Rightarrow |P(A)| < |P(B)|$$
is true is independent of the axioms of set theory. (Of course the principle is true if the Generalised Continuum Hypothesis holds.) Like the cardinality operator, our NAP probability functions are measures of some kind. One might wonder what should follow from Pr_U(θ ∈ A) < Pr_U(θ ∈ B). In particular, given that Pr_U is intended to be a fine-grained quantitative possibility measure, perhaps probability should be expected to co-vary with the power set operation in some fairly direct manner. In other words, it is natural to ask if the following principle (the power set condition) can be made to hold for all A, B ∈ V:
$$\Pr_U(\theta \in A) < \Pr_U(\theta \in B) \Rightarrow \Pr_U(\theta \in P(A)) < \Pr_U(\theta \in P(B)).$$
It turns out that the power set condition can indeed be satisfied:
Theorem 2 There are fine ultrafilters U such that Pr_U satisfies the power set condition.
The argument for this is somewhat more involved.
We aim to prove Theorem 2 by building the probability function up from an ultrafilter U which is based on a pre-filter C ⊆ P([V]^{<ω}) that has the finite intersection property.
The class C is built up in stages, and in such a way that it eventually witnesses the truth of the power set condition for all A, B ∈ V.

Stage 0
The class C_0 consists of all sets
$$\{T \in [V]^{<\omega} : x \in T\}$$
for x ∈ V. This is to ensure that the ultrafilter that will be built from C is fine. We know that C_0 has the finite intersection property.

Successor stages
Given fine-ness, we may, and will, ignore the elements of V_ω. At stage α > ω, where α is a successor ordinal, we consider the sets of V_α \ V_{α−1} and ensure that the power set condition eventually holds for all these sets and their power sets, by adding families of finite sets to C_{α−1} in such a way that the finite intersection property is preserved.
As an illustrative and indeed representative example we do the case where α = ω + 1.
For the induction, we assume that, by having added appropriate sets of finite sets to C_0, the power set condition holds for {A_1, B_1}, ..., {A_β, B_β} and their power sets, and that in the process the finite intersection property has been preserved. The aim is now to extend this so that it also holds for {A_{β+1}, B_{β+1}}. In other words, we have constructed C^β_1, and we want to obtain C^{β+1}_1, where C^0_1 ≡ C_0.
Definition 17
Definition 18
• every set of the form C_{P(A)≥P(B)} such that C_{A≥B} ∈ C^-_1.
Call the resulting set C_1. Our aim is to prove that C_1 has the finite intersection property.
Consider an arbitrary non-empty finite family ℱ ⊆ C_1. Without loss of generality we may assume that the 'judgements' in ℱ of the form C_{P(A)<P(B)} or C_{P(A)≥P(B)}, taken together, describe a finite total pre-ordering relation R on some set {P(A_1), ..., P(A_k)}. Further, we may also assume that for any sets A and B from V_{ω+1} \ V_ω, C_{P(A)<P(B)} ∈ ℱ if and only if C_{A<B} ∈ ℱ, and C_{P(A)≥P(B)} ∈ ℱ if and only if C_{A≥B} ∈ ℱ. Thus ℱ contains witnesses for all the relevant judgements we may be interested in.
Let ℱ^- = ℱ ∩ C^-_1, so ℱ^- consists only of judgements about sets in V_{ω+1} \ V_ω. Then we know from the foregoing that ⋂ℱ^- ≠ ∅. So take some F^- ∈ ⋂ℱ^-. Our plan is inductively to extend F^-, using the preorder R, to a finite set F ∈ ⋂ℱ.
We will add to F^- elements that ensure that the constraints of R are satisfied. Moreover, by choosing the elements to be added to F^- from V_{ω+1} \ V_ω, we ensure that the constraints imposed by ℱ^- remain satisfied. As a result, F will satisfy all constraints from ℱ, so ⋂ℱ ≠ ∅ and hence C_1 has the finite intersection property.
(1) We start by ensuring that P(A_1) < P(A_2) is satisfied. Suppose that F^- already contains n elements of P(A_1). Since C_{A_1<A_2} ∈ ℱ, there must be an element x^- ∈ A_2 \ A_1. This implies that there are infinitely many infinite sets x in P(A_2) \ P(A_1) such that x^- ∈ x: we add n + 1 such elements to F^-, and call the resulting finite set F^-_1.
(2) We proceed in similar fashion to ensure that P(A_2) < P(A_3) is satisfied. Suppose that F^-_1 already contains m elements from P(A_2), observing that it may be the case that m > n + 1, for there may already be a finite number of elements of P(A_2) in F^-. Since C_{A_2<A_3} ∈ ℱ, there must be an element y^-_1 ∈ A_3 \ A_2, and since C_{A_1<A_3} ∈ ℱ, there must be an element y^-_2 ∈ A_3 \ A_1. So there are infinitely many infinite sets y in P(A_3) such that y^-_1, y^-_2 ∈ y: add m + 1 such elements to F^-_1, and call the resulting set F^-_2.
(3) Now suppose that there are m_1 elements of P(A_3) in F^-_2, and m_2 elements of P(A_4) in F^-_2. Moreover, suppose that m_2 < m_1. (The case where m_1 < m_2 is similar.) Since C_{A_3≥A_4}, C_{A_4≥A_3} ∈ ℱ, but also A_3 ≠ A_4, there must be some x_1 ∈ A_3 \ A_4 and some x_2 ∈ A_4 \ A_3. Moreover, since
Similarly, P(A_3) contains infinitely many infinite sets x that are outside P(A_1), P(A_2), P(A_4). So we add a sufficient number of such elements to F^-_2 so that there are an equal number p of "witnesses" for P(A_3) as for P(A_4) but where p is larger than the number of witnesses for P(A_2). Call the resulting set F^-_3.
(4) To conclude, we set F ≡ F^-_3. It is clear that F ∈ ⋂ℱ.
This procedure of extending F^- easily generalises to any finite total pre-ordering on {P(A_1), ..., P(A_k)}. Thus we have shown that C_1 has the finite intersection property. This procedure for extending C_0 to C_1 while preserving the finite intersection property also works for larger successor ordinals: at level V_{α+1} (stage β + 1 with α = ω + β) we can extend the corresponding F^- using subsets of rank α. As we have said above, at limit stages we can simply take unions. Ultimately we set C ≡ ⋃_{α∈On} C_α.
The class C will then have the finite intersection property, so it can be extended to a filter and then to an ultrafilter U. The probability function based on U will make the power set condition true for all A, B ∈ V, and this concludes the proof of Theorem 2.
Our proof actually shows something slightly stronger: for all A, B with |A|, |B| ≥ ω, we have
The reason is that in enlarging the set F^- we always have infinitely many elements to choose from.
For any probability measure Pr_U that satisfies the power set condition we also have that ∀A, B ∈ V, ∀n ∈ ω:
where P^n(A) = P(P(...P(A)...)), with n applications of the power set operation. An easy argument shows this cannot extend to infinite applications of the power set operation.
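The notation P^n(A) can be unpacked on a small finite set. The following Python sketch computes iterated power sets with frozensets; the function names `powerset` and `iterated_powerset` are ours, and the example is only meant to make the n-fold iteration tangible, not to model the infinite sets that the power set condition is about.

```python
from itertools import combinations

def powerset(A):
    """Return the power set of the finite set A as a frozenset of frozensets."""
    items = list(A)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )

def iterated_powerset(A, n):
    """Apply the power set operation n times, starting from the finite set A."""
    result = frozenset(A)
    for _ in range(n):
        result = powerset(result)
    return result

print(len(iterated_powerset({0, 1}, 1)))  # prints 4
print(len(iterated_powerset({0, 1}, 2)))  # prints 16
```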
One might wonder whether the motivations behind the power set condition should not also support imposing the following restricted power set condition on Pr_U:
Question 1 Are there probability measures such that

The ordinals
For α ≥ ω, in each level V_{α+1} \ V_α of the iterative hierarchy one finds only one ordinal, but infinitely many sets that are not ordinals. This might lead one to believe that a probability function on V should satisfy
where 'On' is the class of ordinals.
Just as it seems reasonable to require that the probability of choosing an even natural number from the set of natural numbers must be equal to or infinitesimally close to 1/2 (see [Wenmackers et al 2013, section 6.2]), it seems reasonable to require that
where 'Even' is the class of even ordinals, which is defined in the obvious way.
Moreover, between any two limit ordinals there are infinitely many successor ordinals, so one might expect
where 'Lim' is the class of limit ordinals.
We will sketch how probability functions can be constructed that meet these expectations. Indeed, we will see that there are probability functions that meet these 'ordinal expectations' and in addition meet the size constraint of superregularity.
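The expectation concerning even numbers can be checked on the simplest kind of snapshot. The Python sketch below assumes, purely for illustration, that the snapshots are initial segments {0, ..., n−1} of the natural numbers; arbitrary finite snapshots need not behave this way, which is why the ultrafilter has to be chosen with care. The function name `even_ratio` is hypothetical.

```python
from fractions import Fraction

def even_ratio(n):
    """Fraction of even numbers in the snapshot {0, ..., n-1}, for n >= 1."""
    return Fraction(sum(1 for k in range(n) if k % 2 == 0), n)

print(even_ratio(10))   # prints 1/2
print(even_ratio(11))   # prints 6/11

# On initial segments the distance from 1/2 is at most 1/(2n).
print(all(abs(even_ratio(n) - Fraction(1, 2)) <= Fraction(1, 2 * n)
          for n in range(1, 200)))  # prints True
```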
Theorem 3 There are superregular probability functions Pr such that:
Proof. As before, the aim is to choose the ultrafilter U on which Pr_U is based wisely. We want U to be such that for all k, l, m ∈ N:
Now we define:
And now we set:
Claim: F_0 has the finite intersection property.
Let some x_1, ..., x_n be given. Now ⋂_{i≤n} I_{l_i} = I_l where l = max{l_i : i < n}, and similarly for ⋂_{i≤n} W_{m_i}, so as before in Theorem 1, it suffices to concentrate on the highest values of k, l, m.
(2) Again we concentrate on one pair A, B such that ω ≤ |A| < |B|; we leave out further cases as they are similar. There are arbitrarily large finite subsets C ⊆ B that are l-isolated from elements of A_0, meaning that each ordinal in C is more than l ordinals removed from any ordinal in A. We choose any such C ⊆ B that is of size at least k · n, and we set A_1 ≡ A_0 ∪ C.
(3) Now we extend A_1 to ensure that all ordinal intervals are of length ≥ l: for each α ∈ A_1, we add α + 1, ..., α + l. Call the resulting finite collection A_2. Note that by our choice of l-isolated elements in (2), none of α + 1, ..., α + l are elements of A.
(4) Let |A_2| = j. Then we add j · m elements of V \ (A ∪ B ∪ On) to A_2 and call the resulting set A_3.
It is now routine to verify that A_3
The case including further sets C^k_{A′B′} is similar, thus the claim is verified. So F_0 indeed has the finite intersection property, whereby it can be extended to a filter and then further to an ultrafilter U. By design, the resulting probability function Pr_U has the required properties.

The bootstrapping approach
The probability Pr_U(θ ∈ A) is obtained by 'summing up' the probabilities Pr(θ ∈ A | θ ∈ S) for all 'small' parts S of V; such Pr(θ ∈ A | θ ∈ S) are seen as approximations of Pr_U(θ ∈ A).
In the finite snapshot approach, 'small' in this context means 'finite'. But from a conceptual point of view, 'finite' might be taken to be too small as far as the test sets (or snapshots) are concerned. Compared to V, all sets (and not just the finite sets) are small. So to determine Pr_U(θ ∈ A), we should take the 'limit' of the values Pr(θ ∈ A | θ ∈ S), where S is a set of any size. Then if S is infinite, Pr(θ ∈ A | θ ∈ S) cannot just be taken to be given by the ratio formula but needs to be defined.
In the approach to which we now turn (the bootstrapping approach), a probability Pr_U(θ ∈ A) is determined by the probabilities Pr_U(θ ∈ A | θ ∈ S), where Pr_U(θ ∈ A | θ ∈ S), for S a large set, is then in turn determined by probabilities Pr_U(θ ∈ A | θ ∈ S′) for S′ being smaller 'snapshots' than S, and so on, until we reach the finite snapshots and can appeal to the probability functions that were discussed in the previous sections. Thus the bootstrapping account can be seen as a generalisation of the finite snapshot approach.

The rough idea
In general terms, this is how we will proceed:
(1) By the construction from the previous section, a fine ultrafilter on [S]^{<ω} yields a notion of probability on all sets S ∈ V with |S| < ω_1. In other words, this yields a suitable notion of probability, call it Pr_S, for every countable set S.
(2) The notion of Pr_S for all S ∈ V with |S| < ω_2 is determined using the notion of probability on countable sets: the probability of A on such an S is determined by the class of probabilities of A on the countable 'snapshots' of S. Using these countable probability functions, a fine ultrafilter on [S]^{<ω_1} gives us a notion of probability on sets S with |S| < ω_2. Again the resulting functions Pr_S are essentially NAP-functions as defined in [Benci et al 2013]. They are total, regular, etc.
...
(β) A fine ultrafilter on [S]^{<ω_α}, together with probability functions Pr_S for all S such that |S| < ω_α, yields a notion of probability on all sets S with |S| < ω_{α+1}.
...
Limit stages of course do not present a problem. So by transfinite recursion on cardinality this yields for every set S a notion Pr_S of probability on S.
Then a fine ultrafilter U on V = [V]^{<Card} yields, using the general notion Pr_S for S ∈ V, a notion Pr_V that is a total (class) function from properties A and random variables θ to values Pr_V(θ ∈ A) in a non-Archimedean class field. This probability function again satisfies the principles of the theory NAP in [Benci et al 2013].
For this construction, what we need is suitable (fine) ultrafilters on small, and somewhat larger, and large, ... sets, and a fine ultrafilter U on [V]^{<Card}. But we will see that all the set ultrafilters used in the construction can be uniformly obtained as restrictions to sets S of the given fine ultrafilter on [V]^{<Card}. So Pr_V is determined by one initial choice of U, whereby Pr_V can be seen as the 'limit' of its set-restrictions Pr_S, where the functions Pr_S can in turn be seen as 'limits' of restrictions to their small subsets. This uniform construction has the advantage that the resulting probability functions are all coherent, in the sense that for a set T, Pr_S(A | T) is the same for all S ⊇ T and hence also for V.
Now it is time to look at details of the construction.

Details 1: Restrictions of fine ultrafilters
Since our construction involves ultrafilters on sets [S]^{<κ} with κ > ω, we make the following definition, which accords with the usual definition of fineness on [S]^{<ω}.
The notion of 'set-fine' ultrafilter on V is defined in the obvious way.
We first show that appropriate restrictions of ultrafilters to smaller sets can be obtained in a uniform fashion.
Definition 20 Suppose S ∈ V, |S| = κ, U is a fine ultrafilter on [S]^{<κ}, and S′ ⊆ S with |S′| = α < κ. Then we define the restriction U_{S′} of U to S′ as follows.
For any X ∈ P([S]^{<κ}), let
But this means that this property must also hold for fine ultrafilters on [V]^{<Card}:
Consequence 1 There are fine ultrafilters U on [V]^{<Card} such that for every set S with |S| = α, U_S is a fine ultrafilter on [S]^{<α} and the coherence property holds.
Proof. By the same reasoning as in the previous proposition.

Details 2: defining probability functions
Now we show how for every set, a probability function on that set can be defined.The same procedure can then be used to define a probability function on V, and these probability functions are coherent.
The key is to spell out what is involved in the β-th step of the recursive procedure for defining probabilities on sets:
(β) A fine ultrafilter U on [S]^{<ω_β} (with ω_β = |S|), together with probability functions Pr_T for all T such that |T| < ω_β, yields a notion of probability Pr_S on S.
As in section 2, we define a function f_{θ∈A} such that for all T ∈ [S]^{<ω_β}:
Similarly, we define a function f_{θ∈A∧ν∈B} such that for all T ∈ [S]^{<ω_β}:
This function Pr_S will then be an NAP probability function in the sense of [Benci et al 2013].
Now in an exactly similar way, we define a class probability function Pr^+_U on V, using the probability functions on 'small' classes (i.e., sets) and ultrafilters on 'small' classes which (given proposition 7) we can now assume to have been defined on the basis of an ultrafilter U on [V]^{<Card} with which we start. The function Pr^+_U is total, regular, and uniform for the same reasons as why its 'smaller cousin' Pr_U has these properties.
We now check coherence. We will do this only for straight probabilities rather than for random variables in general: although coherence also holds for random variables, it is much more technical to state. Below we use Pr(A) to denote Pr(ι ∈ A), where ι is the identity random variable.

Proposition 8 For any class A and sets T ⊂ S with |T| < |S| we have
Pr_T(A) = Pr_S(A | T).
Proof. We show by induction on |T| that the above holds for all S ⊃ T with |S| > |T|. Strictly speaking, the range of Pr_T may be a different non-Archimedean field from the range of Pr_S, but there is a natural embedding i of the former into the latter defined by i([f]_{U_T}) = [f̃]_{U_S}, where for X ∈ [S]^{<|S|}, f̃(X) = f(X ∩ T). This is well-defined as {X ∈ [S]^{<|S|} : |X ∩ T| < |T|} = (R_T)_S ∈ U_S.
Using this embedding we have i(Pr_T(A)) = i([f_A]_{U_T}) = [f̃_A]_{U_S}. Now for X ∈ (R_T)_S (∈ U_S) we have:
As X ∈ (R_T)_S we have |X ∩ T| < |T|, so by our inductive hypothesis Pr_{X∩T}(A ∩ T) = Pr_X(A ∩ T | T) = f_{A∩T}(X) / f_T(X).
But by definition, [f_{A∩T} / f_T]_{U_S} = Pr_S(A | T), so [f̃_A]_{U_S} = Pr_S(A | T) and we are done.
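As a sanity check on what coherence asserts, the identity Pr_T(A) = Pr_S(A | T) can be verified directly in the purely finite case, where no ultrafilter is needed and probabilities are plain counting ratios. The sketch below is only a finite analogue under that assumption, not the construction of this section; the helper names `pr` and `pr_given` and the particular sets S, T and A are hypothetical and chosen for illustration.

```python
from fractions import Fraction


def pr(snapshot, prop):
    """Uniform counting probability of `prop` on the finite set `snapshot`.

    Assumption: this is the finite analogue only, Pr_S(A) = |A ∩ S| / |S|.
    """
    snapshot = frozenset(snapshot)
    if not snapshot:
        raise ValueError("snapshot must be non-empty")
    return Fraction(len(snapshot & frozenset(prop)), len(snapshot))


def pr_given(snapshot, prop, condition):
    """Conditional probability Pr_S(A | T) = Pr_S(A ∩ T) / Pr_S(T)."""
    a = frozenset(prop)
    t = frozenset(condition)
    return pr(snapshot, a & t) / pr(snapshot, t)


# Hypothetical finite data: T is a proper subset of S, A is a "property".
S = frozenset(range(12))
T = frozenset(range(5))
A = frozenset(n for n in range(100) if n % 2 == 0)  # the even numbers below 100

lhs = pr(T, A)           # probability of A on the small snapshot T
rhs = pr_given(S, A, T)  # probability of A on S, conditional on T

print(lhs, rhs)
```

In this finite setting both sides reduce to |A ∩ T| / |T|, so the equality is immediate; the content of Proposition 8 is that the same relation survives when the counting ratios are replaced by equivalence classes modulo the restricted ultrafilters.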

Comparison of the finite snapshot approach and the bootstrapping approach
In our definition of the probability of a set theoretic property, the probability Pr^+_U(θ ∈ A) of a property A is determined by the probabilities Pr_S(θ ∈ A) of A on large 'snapshots' S, where a probability Pr_S(θ ∈ A) (for S a large set) is in turn determined by the probabilities Pr_{S′}(θ ∈ A) for S′ smaller 'snapshots' than S, and so on. Conceptually, the definition in section 4.3 is superior to the simpler definition suggested in section 2: we want to take the behaviour of the property on as many and as large 'snapshots' as possible into account.
It is not straightforward to compare the simple and the more involved definition: the simple method is based on an ultrafilter on [V]^{<ω}, whereas the more involved method is based on an ultrafilter on V = [V]^{<Card}.
The obvious suggestion is to base the comparison on the relation between a probability function determined by an ultrafilter U on [V]^{<Card} and its restriction to [V]^{<ω}, defined as U ↾ ω = {X ∩ [V]^{<ω} | X ∈ U}. But:
Proposition 9 Not all ultrafilters on [V]^{<Card} restrict to ultrafilters on [V]^{<ω}.
Proof. Consider A ∪ {([V]^{<ω})^c}, where A is the set of atoms (guaranteeing fineness) and ([V]^{<ω})^c is the relative complement of [V]^{<ω} in [V]^{<Card}. Then A ∪ {([V]^{<ω})^c} has the finite intersection property and so can be extended to a fine ultrafilter U on [V]^{<Card}. But ([V]^{<ω})^c ∩ [V]^{<ω} = ∅, so ∅ ∈ U ↾ ω. So U does not restrict to an ultrafilter on [V]^{<ω}.
On the other hand, every fine ultrafilter on [V]^{<Card} restricting to an ultrafilter on [V]^{<ω} essentially is an ultrafilter on [V]^{<ω}:
Proposition 10 Suppose U is a fine ultrafilter on [V]^{<Card} restricting to an ultrafilter U ↾ ω on [V]^{<ω}. Then [V]^{<ω} ∈ U.
Proof. Since U is ultra, we have [V]^{<ω} ∈ U or ([V]^{<ω})^c ∈ U, where ([V]^{<ω})^c is the relative complement of [V]^{<ω} in [V]^{<Card}. But if ([V]^{<ω})^c ∈ U, then ∅ ∈ U ↾ ω, so that U does not restrict, contradicting the assumption. So [V]^{<ω} ∈ U.
This means that the essentially involved probability functions on V cannot be reduced to 'simple' probability functions on V.
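The dichotomy behind Propositions 9 and 10 can be seen in a small finite toy model. The sketch below is an illustration under strong simplifying assumptions: the eight subsets of {0, 1, 2} stand in for [V]^{<Card}, the subsets with fewer than two elements stand in for [V]^{<ω}, and only principal ultrafilters are used, since every ultrafilter on a finite set is principal. Fineness is not modelled, and all names (`points`, `small`, `principal_ultrafilter`, `restrict`) are hypothetical.

```python
from itertools import chain, combinations


def powerset(items):
    """All subsets of `items`, each returned as a frozenset."""
    items = list(items)
    return [
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    ]


# Toy stand-ins (assumptions): `points` plays the role of [V]^{<Card},
# `small` plays the role of [V]^{<omega}.
points = powerset({0, 1, 2})
small = frozenset(p for p in points if len(p) < 2)


def principal_ultrafilter(point):
    """The ultrafilter on `points` consisting of all sets containing `point`."""
    return {X for X in powerset(points) if point in X}


def restrict(ultrafilter, part):
    """The family {X ∩ part : X in the ultrafilter}, as in the text."""
    return {X & part for X in ultrafilter}


U_big = principal_ultrafilter(frozenset({0, 1}))  # concentrates outside `small`
U_small = principal_ultrafilter(frozenset({0}))   # concentrates inside `small`

# When the complement of `small` is in the ultrafilter, the restricted
# family contains the empty set and so is not a filter.
print(frozenset() in restrict(U_big, small))
# When the restricted family omits the empty set, `small` itself is in the
# ultrafilter.
print(frozenset() in restrict(U_small, small), small in U_small)
```

The toy model only mirrors the logical step used in both proofs: an ultrafilter contains either the 'small' part or its complement, and in the latter case intersecting with the small part produces the empty set.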

Conclusion
In this article we have explored two methods for modelling, by means of non-Archimedean probability functions, the properties of random variables ranging over the set theoretic universe: the finite snapshot method and the bootstrapping method. Concerning the finite snapshot method, we found that many of the probabilistic properties that seem intuitively plausible can be satisfied. The bootstrapping method is more satisfying from a conceptual point of view, but we have only been able to show that the resulting probability functions satisfy minimal requirements. So much work remains to be done.
A β <B β } if this has the finite intersection property, or C β 1 ∪ {C A β ≥B β } otherwise, and by the claim, intersection property. At this point we must extend C − 1 by adding to C − 1 :
• every set of the form C P (A)<P (B) such that C A<B ∈ C − 1 ;