Uniquely determined uniform probability on the natural numbers

In this paper, we address the problem of constructing a uniform probability measure on $\mathbb{N}$. Of course, this is not possible within the bounds of the Kolmogorov axioms, and we have to violate at least one axiom. We define a probability measure as a finitely additive measure that assigns probability $1$ to the whole space and is defined on a domain closed under complements and finite disjoint unions. We introduce and motivate a notion of uniformity which we call weak thinnability, which is strictly stronger than extension of natural density. We construct a weakly thinnable probability measure and we show that on its domain, which contains sets without natural density, probability is uniquely determined by weak thinnability. In this sense, we can assign uniform probabilities in a canonical way. We generalize this result to uniform probability measures on other metric spaces, including $\mathbb{R}^n$.


Introduction and main results
Within the bounds of the Kolmogorov axioms [7], a probability measure on N = {1, 2, 3, ...} cannot assign the same probability to every singleton and therefore a uniform probability measure on N does not exist. Despite this, we have some intuition about what a uniform probability measure on N should look like. According to this intuition, for example, we would assign probability 1/2 to the subset of all odd numbers. If we want to capture this intuition in a mathematical framework, we have to violate at least one of the axioms of Kolmogorov. A suggestion by De Finetti [5] is to relax countable additivity of the measure to finite additivity. To see why this suggestion is reasonable, we must first understand why it is possible, within the axioms of Kolmogorov, to set up a uniform probability measure on [0, 1], namely Lebesgue measure [8]. The type of additivity we demand plays a crucial role here. In the standard theory one always demands countable additivity. If every singleton has the same probability, then on a countable space such as N countable additivity forces the total probability to be either 0 or infinite; on the uncountable space [0, 1] this obstruction does not arise.

In what follows, M denotes the collection of Lebesgue measurable subsets of [0, ∞), and a probability pair (F, µ) consists of an f-system F, that is, a collection of sets closed under complements and finite disjoint unions, together with a finitely additive probability measure µ on F. From a probability pair (F, µ) on ([0, ∞), M) we can immediately derive a corresponding probability pair (F′, µ′) on (N, P(N)) given by (1.2). It turns out that by working on ([0, ∞), M) instead of (N, P(N)), we can formulate and prove our claims much more elegantly. We should emphasize, however, that conceptually there is no difference between ([0, ∞), M) and (N, P(N)) and that the work we do in Sections 2 and 3 can be done in the same way for (N, P(N)).
For A ∈ M, write ρ_A(x) := m(A ∩ [0, x))/x for x > 0, where m denotes Lebesgue measure, and let C be the collection of all A ∈ M for which λ(A) := lim_{x→∞} ρ_A(x) exists; this limit is also known as the Cesàro limit. Schurz and Leitgeb [10] propose the following uniform probability measure on ([0, ∞), M). With the Hahn-Banach Theorem [4, p. 78] we can extend the limit operator on the subspace of convergent functions in L∞([0, ∞)) to a linear operator HB-lim that is bounded by the limsup operator and is defined on all of L∞([0, ∞)). Then λ*(A) := HB-lim(ρ_A) (1.6) is a probability measure defined on M. A hyperreal variant of this probability measure, which is not only finitely additive but hypercountably additive, has also been proposed [13]. The probability measure λ* is defined for every element of M, but it is not unique: for every A ∈ M \ C there are infinitely many different Hahn-Banach extensions of the limit operator that give different values for λ*(A). This can be seen as follows. Let W be the linear subspace of L∞([0, ∞)) consisting of convergent functions and let A ∈ M. Choose any V ∈ R such that lim inf(ρ_A) ≤ V ≤ lim sup(ρ_A). Then consider the linear operator L : {f + cρ_A : f ∈ W, c ∈ R} → R given by L(f + cρ_A) := lim(f) + cV. Note that L is dominated by the lim sup operator. Apply the Hahn-Banach Theorem to extend L to a linear operator L′ on L∞([0, ∞)). Then L′ extends the limit operator and L′(ρ_A) = V.
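For a concrete illustration of this arbitrariness (the following set is merely an illustrative example, using ρ_A as defined above), consider a set whose average oscillates between two values:

\[
A=\bigcup_{n\ge 0}\bigl[4^{n},\,2\cdot 4^{n}\bigr),\qquad
\liminf_{x\to\infty}\rho_A(x)=\tfrac{1}{3},\qquad
\limsup_{x\to\infty}\rho_A(x)=\tfrac{2}{3}.
\]

By the argument above, λ*(A) can be made equal to any value in [1/3, 2/3] by a suitable choice of the Hahn-Banach extension.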
In this paper, we avoid arbitrariness like that in λ* by constructing a canonical uniform probability measure on ([0, ∞), M), which we then generalize to canonical uniform probability measures on other spaces (see Section 4). How do we make such a canonical choice? Let us again look at what happens with Lebesgue measure on [0, 1]. Lebesgue measure is unique in the sense that every member A of the Lebesgue σ-algebra on [0, 1] is assigned the same probability by every probability measure (in the Kolmogorov sense) on [0, 1] that is translation invariant and measures at least all the intervals in [0, 1] [2]. On ([0, ∞), M) we want to do something similar. We want to define 'uniformity' by some property P of probability pairs on ([0, ∞), M). Once we have this property, we want our probability pair not only to have property P, but also to be unique with respect to P in the sense of Definition 1.3.

Definition 1.3 Let P be some property of probability pairs on (X, H). The probability pair (F, µ) on (X, H) has unique values with respect to P if it has property P and for every A ∈ F and every probability pair (F′, µ′) on (X, H) with property P and A ∈ F′ it is true that µ(A) = µ′(A).
Notice that having unique values with respect to P is stronger than uniqueness in the sense that no other measure pair with the same f-system has property P. The latter only says something about other measure pairs with the same f-system, while having unique values says something about every measure pair with an f-system that intersects the f-system. We want to have unique values because we want to avoid the situation that there is some set A in our f-system for which another measure pair with property P assigns a different probability to A. In that scenario, we would not have any ground to decide what the unique probability of A is.
The structure of this paper is as follows. In Section 2, we discuss the property of being a 'weakly thinnable pair' (WTP) and motivate why this is a natural choice for P. In Section 3, we explicitly construct a measure pair (A_uni, α) such that (1.7) and (1.8) hold; the expression in (1.8) is called the logarithmic density of A [11, p. 272]. We give the precise definition of A_uni in Section 3. We also present the following theorem about (A_uni, α), which is the main result of our paper.

Theorem 1.4 The pair (A_uni, α) is a WTP that has unique values with respect to being a WTP. In addition, if a WTP (F, µ) has unique values with respect to being a WTP, then F ⊆ A_uni.

On the basis of Theorem 1.4, we propose α as our canonical choice for a uniform probability measure on ([0, ∞), M). In Section 4, we derive from α canonical uniform probability measures on certain metric spaces including Euclidean space (the collection of subsets we work on in these spaces is specified in Section 4). The proofs of the results in Sections 2-4 are given in Section 5.
We end this section with some notational remarks. We write N_0 := {0, 1, 2, ...}. For real-valued sequences x, y or real-valued functions x, y on [0, ∞) we write x ≤ y if x(t) ≤ y(t) for every t, and x ∼ y if x and y are asymptotically equivalent, that is, lim_{t→∞}(x(t) − y(t)) = 0. In Sections 2 and 3, every time we speak of an f-system, probability pair or probability measure, we omit specifying that this is on ([0, ∞), M).


Weak thinnability

From a purely mathematical standpoint, we can choose any P to define uniformity, but we want to make a natural choice that somehow corresponds to our intuition about uniformity. Furthermore, we want to choose P in such a way that we can find a measure pair that has unique values with respect to P and has a 'big' f-system.
Let us look at natural options for P. If we have A ∈ M, the average ρ_A(x) represents the uniform probability of A ∩ [0, x) on [0, x). We would like to view the uniform probability measure on ([0, ∞), M) as the 'limit' of these uniform measures as x → ∞. In other words, a uniform probability measure should assign to elements of M with a natural density their natural density. So, it seems reasonable to let P be the property that for a probability pair (F, µ) we have C ⊆ F and µ extends the Cesàro limit. In that case, (C, λ) is a probability pair that has unique values with respect to P. In fact, no measure pair with an f-system containing anything outside C has unique values with respect to P.
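As a simple worked example (using ρ_A(x) = m(A ∩ [0, x))/x as above), a continuous analogue of the set of odd numbers from the beginning of this introduction has natural density 1/2:

\[
A=\bigcup_{n\ge 0}[2n,\,2n+1),\qquad
\lim_{x\to\infty}\rho_A(x)=\lim_{x\to\infty}\frac{m(A\cap[0,x))}{x}=\tfrac{1}{2},
\]

so any probability pair with the property P just described must assign µ(A) = 1/2.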
Notice that in our approach, it suffices that the domain of a measure is an f-system. Consequently, there is no need to extend λ to M as Schurz and Leitgeb do, since C is already an f-system. The probability measure λ would be the canonical choice for a uniform probability measure on ([0, ∞), M) if we choose P as above. But we can do better. We introduce a new property P that allows for a probability pair that has unique values with respect to P and has an f-system which is bigger than C.
For Lebesgue measurable Y ⊆ R with 0 < m(Y) < ∞, the uniform probability measure on Y is given by µ_Y(A) := m(A ∩ Y)/m(Y). We want to generalize this property to a property of probability pairs on ([0, ∞), M). Write f_A(x) := m(A ∩ [0, x)) for A ∈ M and x ∈ [0, ∞), and let M* denote the collection of A ∈ M with m(A) = ∞. For A ∈ M*, the map f_A gives a one-to-one correspondence between A and [0, ∞). If A ∈ M* and B ∈ M, we want to introduce notation for the set A • B := {x ∈ A : f_A(x) ∈ B}. Note that if A, B ∈ M, then A • B ∈ M and that for A ∈ M* we have m((A • B) ∩ [0, x)) = m(B ∩ [0, f_A(x))) for every x ≥ 0. We can view this operation as thinning A by B, because we create a subset of A, where B is 'deciding' which parts of A are removed. We can also view the operation A • B as thinning out B over A, since we 'spread out' the set B over A.
Let (F, µ) be a probability pair and let A ∈ F ∩ M*. If B ∈ M, the set A • B is the subset of A corresponding to B. We can use this to transform µ into a measure on A as follows. We set
µ_A(S) := µ(S)/µ(A) for S ∈ F with S ⊆ A. (2.7)
Given B ∈ F such that A • B ∈ F, analogous to (2.2), we insist that µ_A(A • B) = µ(B) is a necessary condition for µ to be uniform. Using (2.7) this translates into
µ(A • B) = µ(A)µ(B). (2.9)
We now have the restriction that A ∈ F ∩ M*. However, if A ∈ F \ M*, then µ(A • B) ≤ µ(A). Clearly, we want a uniform probability measure to assign probability zero to a set of finite Lebesgue measure. Hence, µ(A) = 0 if µ is uniform, and (2.9) still holds. We call the property that (2.9) holds for every A, B ∈ F with A • B ∈ F the thinnability of µ.
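As a small worked illustration of (2.9) (only an example, using the description of f_A and A • B given above), thin the set of 'even' unit intervals by itself:

\[
A=\bigcup_{n\ge 0}[2n,2n+1)\quad\Longrightarrow\quad
A\bullet A=\bigcup_{k\ge 0}[4k,4k+1),\qquad
\lambda(A\bullet A)=\tfrac{1}{4}=\lambda(A)\,\lambda(A),
\]

so the Cesàro limit λ behaves on this pair exactly as (2.9) prescribes.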
In Appendix A, we show that thinnability can only be achieved on probability pairs with relatively small f-systems. Therefore, we choose to use a weakened version of thinnability in this paper.

Definition 2.2 Let (F, µ) be a probability pair. The probability measure µ is weakly thinnable if
µ(C • A) = µ(C)µ(A) (2.10)
for every A ∈ F and C ∈ C such that C • A ∈ F.
If A ∈ F and C ∈ C, we want to use (2.10) to determine the probability of C • A. In other words, we want C • A ∈ F whenever C ∈ C and A ∈ F. We say that F is closed under weak thinning if this holds. In particular, closedness of F under weak thinning implies that C ⊆ F, since [0, ∞) ∈ F and C • [0, ∞) = C for every C ∈ C.
Besides weak thinnability, there is another property that we want to include in the new approach to 'uniformity'. Let (F, µ) be a probability pair. Let A, B ∈ F and suppose that
ρ_B(x) ≤ ρ_A(x) (2.11)
for every x > 0. Since this inequality is true for every x, it is clear that B is 'sparser' than A. Therefore, we insist that µ(A) ≥ µ(B) is a necessary condition for µ to be uniform. We call this property 'preserving ordering by ρ', since (2.11) can trivially be rewritten as ρ_B ≤ ρ_A. In particular, if B ⊆ A then ρ_B ≤ ρ_A, so this property implies monotonicity of µ. Since we have C ⊆ F, it seems natural to also ask µ|_C = λ, but it turns out to be sufficient to ask the weaker property that µ([c, ∞)) = 1 for every c ∈ [0, ∞). So, to reduce redundancy, we require the latter and then prove that µ|_C = λ. Putting everything together, we obtain the following definition.

Definition 2.3 A probability pair (F, µ) is a weakly thinnable pair (WTP) if it satisfies
F. C • A ∈ F for every C ∈ C and A ∈ F (that is, F is closed under weak thinning),
M1. µ([c, ∞)) = 1 for every c ∈ [0, ∞),
M2. µ is weakly thinnable,
M3. µ preserves ordering by ρ.
We start by stating that the conditions of Definition 2.3 are sufficient to obtain extension of the Cesàro limit. In fact, we get the following slightly stronger result.

Proposition 2.4 Let (F, µ) be a WTP. Then for every A ∈ F we have
lim inf_{x→∞} ρ_A(x) ≤ µ(A) ≤ lim sup_{x→∞} ρ_A(x). (2.12)

In particular, since C ⊆ F for every WTP, we get µ(A) = λ(A) for every A ∈ C, so µ extends the Cesàro limit. As motivated, we choose being a WTP to be the property P that defines uniformity. In the next section, we introduce an important WTP.

The pair (A_uni, α)
It is easy to check that (W_uni, κ) is a probability pair. For any A ∈ M we set the corresponding quantities, which define the f-system A_uni and the map α : A_uni → [0, 1]. It is easy to check that C ⊆ A_uni; that (A_uni, α) is a probability pair follows directly from the fact that (W_uni, κ) is a probability pair. The pair (A_uni, α) is also a WTP.

Theorem 3.2 The pair (A_uni, α) is a WTP.
We do not only want a uniform probability pair to be a WTP, but also to have unique values with respect to being a WTP. This is indeed the case for (A_uni, α).

Theorem 3.3 The pair (A_uni, α) has unique values with respect to being a WTP.

Moreover, the f-system A_uni is maximal in the following sense.
Theorem 3.4 Let (F, µ) be a probability pair that has unique values with respect to being a WTP. Then F ⊆ A_uni.

Generalization to metric spaces
In this section we use (A_uni, α) to derive canonical probability measures on a class of metric spaces. Of course, one could also try to construct such a probability measure by working more directly on these metric spaces, instead of deriving it from (A_uni, α). Since probability pairs on ([0, ∞), M), motivated by the problem of a uniform probability measure on N, are the priority of this paper, we do not make such an effort here.
Let us first sketch the idea of the generalization. Let A ∈ M. Whether A is in A_uni depends completely on the asymptotic behavior of ρ_A (Lemma 5.2). If A ∈ A_uni, then α(A) also depends only on the asymptotic behavior of ρ_A (Lemma 5.2). Now suppose that on a space X we have a subset B ⊆ X and we can somehow define a function ρ̄_B : [0, ∞) → [0, 1] that represents the 'density' of B. Then, intuitively, if ρ̄_B is asymptotically equivalent to ρ_A, the probability of B should be α(A). The goal of this section is to make this idea precise.
Let (X, d) be a metric space. For x ∈ X and r ≥ 0, write B(x, r) := {y ∈ X : d(x, y) < r}.

Definition 4.1 We say that a Borel measure ν on X is uniform if for all r > 0 and x, y ∈ X we have
0 < ν(B(x, r)) = ν(B(y, r)) < ∞. (4.2)
On R^n with the Euclidean metric, the standard Borel measure, obtained by assigning to products of intervals the product of the lengths of those intervals, is a uniform measure. In general, on normed locally compact vector spaces, the measure that is invariant with respect to vector addition, as given by the Haar measure, is a uniform measure.
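For instance (a direct check of Definition 4.1 in the planar case, stated only as an illustration):

\[
X=\mathbb{R}^{2},\ \nu=\text{Lebesgue measure}:\qquad
\nu(B(x,r))=\pi r^{2}\ \text{ for every } x\in\mathbb{R}^{2},\ r>0,
\]

so ν is uniform, and the function h introduced below equals h(r) = πr² in this case.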
A result by Christensen [3] tells us that on locally compact metric spaces, uniform measures that are Radon measures are unique up to multiplicative constants. This, however, does not cover all cases. The set of irrational numbers, for example, is not locally compact, but Lebesgue measure restricted to the Borel sets of irrational numbers is a uniform measure and unique up to a multiplicative constant. We give a slightly more general version of the result of Christensen.
Proposition 4.2 If ν_1 and ν_2 are two uniform measures on X, then there exists some c > 0 such that ν_1 = cν_2.
Proposition 4.2 gives us uniqueness, but not existence. To see that there are metric spaces without a uniform measure, consider the following example. Let X be the set of vertices of a connected graph that is not regular, and let d be the graph distance on X. If we suppose that ν is a uniform measure on X, then from (4.2) with r < 1 it follows that for some C > 0 we have ν({x}) = C for every x ∈ X. But then ν(B(x, 2)) = C(1 + deg(x)) for every x ∈ X, which implies that (4.2) cannot hold for r = 2, since the graph is not regular. A characterization of the metric spaces on which a uniform measure exists does not seem to be present in the literature.
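A minimal concrete instance (our illustration, with B(x, r) as defined above) is the path graph on three vertices:

\[
X=\{a,b,c\},\quad d(a,b)=d(b,c)=1,\ d(a,c)=2,\qquad
\nu(B(a,2))=2C\neq 3C=\nu(B(b,2)),
\]

so (4.2) fails for r = 2 and no uniform measure exists on this X.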
We now assume that X has a uniform measure ν and that ν(X) = ∞. In addition, we write h(r) := ν(B(x, r)) for r ≥ 0, which by uniformity does not depend on x, and we assume that h satisfies the growth condition (4.3), which is equivalent to amenability in case (X, d) is a normed locally compact vector space [12]. For the importance of this assumption, see Remark 4.4 below. Based on ν and h we introduce the collection L(X) of subsets of X to which we assign probabilities and, for A ∈ L(X), a density function ρ̄_A on [0, ∞). Let (F, µ) be a probability pair on (X, L(X)). Suppose that A ∈ L(X) and that there is a B ∈ A_uni such that ρ_B(u) ∼ ρ̄_A(u). This means that the density of A in X asymptotically behaves the same as the density of B in [0, ∞). Since we already have a canonical uniform probability measure that assigns a probability to B, we insist that A ∈ F with µ(A) = α(B) is a necessary condition for µ to be uniform; a probability pair on (X, L(X)) satisfying this requirement for every such A and B we call an EP. As motivated, we choose being an EP as the property that defines 'uniformity' for probability pairs on (X, L(X)).
Define the pair (A_uni(X), α_X) by (4.9) and (4.10). We now give the analogue of Theorem 1.4.
Theorem 4.6 The pair (A_uni(X), α_X) is an EP that has unique values with respect to being an EP. In addition, if an EP (F, µ) has unique values with respect to being an EP, then F ⊆ A_uni(X).
So α_X gives us a canonical uniform probability measure on (X, L(X)). In the case of Euclidean space, we have the following expression for (A_uni(X), α_X).
Proposition 4.7 Suppose X = R^n and d is the Euclidean distance. Let σ be the surface measure on the unit sphere in R^n. Then for A ∈ L(R^n) we can replace ξ̄_A(D, x) in (4.9) and (4.10) by the expression in (4.11), where K_A : [0, ∞) → [0, 1] is given by (4.12).

Proofs
First we show that every f-system of a WTP is closed under translation and that every probability measure of a WTP is invariant under translation.

Lemma 5.1 Let (F, µ) be a WTP, let A ∈ F and let c ∈ [0, ∞). Then A + c ∈ F and µ(A + c) = µ(A).

Proof Let (F, µ) be a WTP. Let A ∈ F and c ∈ [0, ∞). Set B := [c, ∞). We have B ∈ C and by M1 we find that µ(B) = 1. Therefore, since B • A = A + c, property F gives A + c ∈ F and weak thinnability gives µ(A + c) = µ(B • A) = µ(B)µ(A) = µ(A). ⊓⊔

We give the proof of Proposition 2.4.
Proof of Proposition 2.4 Let (F, µ) be a WTP and A ∈ F. Set u := lim sup_{x→∞} ρ_A(x). If u = 1 there is nothing to prove, so assume u < 1. Let ǫ > 0 be given. Let u′ ∈ [0, 1] ∩ Q be such that u′ > u and u′ − u < ǫ. The idea is to construct a Y ∈ M such that we can easily see that µ(Y) = u′ and ρ_A(x) ≤ ρ_Y(x) for all x, so that with M3 we get µ(A) ≤ u′. First we observe that there is a K > 0 such that for all x ≥ K we have ρ_A(x) ≤ u′. We can write u′ as u′ = p/q for some p, q ∈ N_0 with p ≤ q. Now we introduce the set Y given by
Y := [0, K) ∪ ⋃_{j∈N_0} [K + jq, K + jq + p).
Note that Y ∈ C ⊆ F. Lemma 5.1 and the fact that µ is a measure give us that µ(Y) = p/q = u′. Since ρ_A(x) ≤ ρ_Y(x) for every x, M3 gives µ(A) ≤ µ(Y) = u′ < u + ǫ. Letting ǫ ↓ 0 we find µ(A) ≤ u = lim sup_{x→∞} ρ_A(x).
By applying this to A^c we find µ(A) = 1 − µ(A^c) ≥ 1 − lim sup_{x→∞} ρ_{A^c}(x) = lim inf_{x→∞} ρ_A(x).

⊓ ⊔
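To see the translation-invariance argument at work in its simplest instance (an illustration only, corresponding to p = 1, q = 2 and K = 0 above):

\[
Y=\bigcup_{j\ge 0}[2j,2j+1),\qquad
Y\cap(Y+1)=\emptyset,\qquad Y\cup(Y+1)=[0,\infty),
\]

so Y and Y + 1 both belong to F, have the same measure by Lemma 5.1, and finite additivity together with µ([0, ∞)) = 1 gives µ(Y) = 1/2.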
We are ready to give the proof of Theorem 3.2.

Proof of Theorem 3.2 Notice that any intersection of f-systems satisfying F is again an f-system satisfying F. Therefore, if we show that (A_{s,f}, α_{s,f}) is a WTP for every (s, f) ∈ P, it follows from Lemma 5.2 that (A_uni, α) is a WTP.
Let (s, f) ∈ P. It immediately follows that (A_{s,f}, α_{s,f}) is a probability pair and that M1 and M3 hold, so we have to verify F and M2. Note that for every A, B ∈ M and x > 0 we have an identity expressing the average of A • B on [0, x) in terms of the averages of A and B. Let A ∈ C and B ∈ A_{s,f}. Then, in the case λ(A) > 0, we see that F and M2 hold for A and B.

Proof Let A ∈ M and fix C > 1.
Step 1 We show (5.36).

Step 2 We give an upper and lower bound for the quantity in question. We now observe that the fact that log(1 + y) ≤ y for every y ≥ 0, combined with (5.39), (5.40) and (5.41), gives the required bounds.

Step 3 We combine Step 1 and Step 2 to finish the proof.
Analogously, we find the reverse bound, and combining (5.45) and (5.46) we obtain the claim. We also need the following lemma. The idea is to introduce a set B ∈ M with prescribed behavior of the limit superior. Note that Z_1, Z_2, Z_3 ∈ C are pairwise disjoint. Now, we set A′ accordingly and observe the corresponding identity for j ≥ 3. So we constructed a set A′ that on each interval [2^{j−1}, 2^j) with j ≥ 3 has an average that equals the average of the averages of A on two consecutive intervals. By weak thinnability we can relate µ(A′) to µ(A). If this sequence of averages is convergent or only oscillates a little, we can give a good upper bound for µ(A) using Lemma 5.4. Applying this strategy not only for C = 2 but for any C > 1, and with averages of not two but arbitrarily many averages on consecutive intervals, is what happens in the proof.
Step 1 We construct a set Â ∈ F. Fix C > 1 and n ∈ N. For every j we split up [C^{j−1}, C^j) into intervals of length 1 plus a remainder interval. Set for j ∈ N
N_j := C^{j−1}(C − 1) (5.57)
and for j ∈ N and l ∈ {1, ..., N_j} the intervals I(j, l), together with the remainder interval, so that for every j ∈ N the interval [C^{j−1}, C^j) is covered by the I(j, l) and the remainder interval. Condition (5.62) guarantees that we can choose sets Z(p, k) such that Z(p, k) ∈ C and
m(Z(p, k) ∩ I_{p,k}(u + j)) = C^{n−p+j−1}(C − 1) (5.68)
for every j ∈ N. From this it directly follows that λ(Z(p, k)) = C^n/C^{p+u}. Observe that all the Z(p, k) are disjoint. So the closedness of F under weak thinning and disjoint unions implies that Â ∈ F.
Step 2 We give an upper bound for µ(A) by first giving an upper bound for µ(Â) and then relating µ(A) and µ(Â).
A crucial property of Â is stated in (5.75).

Step 3 We take limits in (5.75). Unfix n and C. We first take the limit superior for n → ∞ in (5.75), giving (5.76). Then we take the limit superior for C ↓ 1 and find the desired upper bound by Lemma 5.3. The lower bound we can now easily obtain by applying our upper bound to the complement of A. Doing this, we see that the corresponding lower bound holds.

Clearly, we can find m, l ∈ N^∞ such that ξ_A(s_{m_n}, f_{m_n}) tends to I and ξ_A(s_{l_n}, f_{l_n}) tends to S. Now set s′_n := s_{m_n}, f′_n := f_{m_n}, s″_n := s_{l_n} and f″_n := f_{l_n}.
⊓⊔

Without any further notation or lemmas, the proofs of the results in Section 4 are given below.
Proof of Proposition 4.2 We give a proof along the lines of Mattila [9, p. 45], with small adaptations for completeness and more generality.
First let A be an open set of (X, d) with ν_1(A) < ∞ and ν_2(A) < ∞. Write h_i(r) := ν_i(B(x, r)) for i = 1, 2; by uniformity this does not depend on x. Suppose that r > 0 is such that h_2 is continuous at r. Then x ↦ ν_2(B(x, r)) is a continuous mapping from X to [0, ∞). Since h_2 is nondecreasing, it can have at most countably many discontinuities. So we can choose r_1, r_2, r_3, ... such that lim_{n→∞} r_n = 0 and h_2 is continuous at every r_n.
For n ∈ N let f_n : X → [0, 1] be given by f_n(x) := ν_2(A ∩ B(x, r_n))/h_2(r_n). Notice that by our previous observation f_n is continuous on A, hence f_n is measurable.
Because A is open, we have lim_{n→∞} f_n(x) = 1 for every x ∈ A.
With Fatou's Lemma we find that ν_1(A) ≤ lim inf_{n→∞} ∫_A f_n dν_1. Note that any uniform measure is σ-finite. Applying Fubini's theorem we obtain (5.82), and by interchanging ν_1 and ν_2 we get the corresponding inequality with the roles of ν_1 and ν_2 reversed; together these yield ν_1(A) = cν_2(A) for a constant c > 0 that does not depend on A. Now let A be any open set of (X, d). Let x ∈ X and set A_n := A ∩ B(x, n) for n ∈ N. Note that A_n is open with ν_1(A_n) ≤ ν_1(B(x, n)) < ∞ and ν_2(A_n) ≤ ν_2(B(x, n)) < ∞. Hence, by the first part of the proof, we find ν_1(A_n) = cν_2(A_n). But then, letting n → ∞, we obtain ν_1(A) = cν_2(A). (5.86)
This gives (5.89). By (4.3), it follows that the corresponding asymptotic relation holds. Let (F, µ) be an EP and let A ∈ A_uni(X). For n ∈ N set u_n := r^−(n) and v_n := ν(A ∩ B(o, u_n)). Then set w_1 := v_1 and w_n := v_n − v_{n−1} for n ≥ 2. Define B := ⋃_{n∈N} [n − w_n, n). Now observe that, by partial integration, the two integrals in question differ by the term ζ_A(D, x). Since |ζ_A(D, x)| ≤ 1/log(D), the desired result follows.

Discussion
The natural analogue of a σ-algebra in finitely additive probability theory is an algebra. Both Schurz and Leitgeb [10] and Wenmackers and Horsten [13] remark that the restriction of M to C is problematic since C is not an algebra. However, any collection extending C that is not M itself is not an algebra either, since the algebra generated by C is all of M, i.e. a(C) = M. This observation brings us to the conclusion that the requirement of an algebra, despite the fact that an algebra is the natural analogue of a σ-algebra, is too restrictive. Furthermore, finite additivity only dictates how a probability measure behaves when taking disjoint unions, and thus only suggests closedness under disjoint unions. Therefore, we think the requirement of an f-system rather than an algebra in Definition 1.1 is justified.
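To illustrate concretely why C is not an algebra (the construction below is only an illustration), one can exhibit two sets with natural density whose intersection has none:

\[
A=\bigcup_{n\ge 0}[2n,2n+1),\qquad
B=\Bigl(A\cap\bigcup_{k\ge 0}[4^{2k},4^{2k+1})\Bigr)\cup
\Bigl(A^{c}\cap\bigcup_{k\ge 0}[4^{2k+1},4^{2k+2})\Bigr).
\]

Both A and B have natural density 1/2, but ρ_{A∩B}(4^{2k+1}) → 2/5 while ρ_{A∩B}(4^{2k+2}) → 1/10 as k → ∞, so A ∩ B ∉ C.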
It should be noted that even if one prefers an algebra as the domain of a probability measure, hence M, our argument from Section 2 that being a WTP defines uniformity for a probability pair still has implications. The probability measure λ* from (1.6), for example, is in general not a WTP. This is because the probability of A ∈ A_uni \ C is uniquely determined by Theorem 3.3, whereas λ*(A) may be any value between lim inf(ρ_A) and lim sup(ρ_A). So not for every Hahn-Banach extension of the limit operator is λ* uniform. If HB-lim is a multiplicative Hahn-Banach extension (dominated by the limsup operator), then (M, λ*) is a WTP (this can be proven completely analogously to the proof of Theorem 3.2). It might seem a serious problem that in Section 2 we motivated thinnability as a natural property of a uniform probability measure, while the proposed probability measure α is only weakly thinnable, but not thinnable (Appendix A). A thinnable probability measure on A_uni, however, does not exist (Appendix A). Since we are not looking for a strong P, but for a P that leads to a canonical probability pair with a 'big' f-system, we prefer weak thinnability. It is for the same reason that we, for example, do not require a probability measure to be countably additive: although this is, in combination with M1, an a priori justifiable P, a measure pair with that property P does not even exist.
There may, of course, be another P for which there is a probability pair that has unique values with respect to P and has a bigger f-system than A_uni. Choosing such a P, however, requires a justification for using P as the definition of 'uniformity'. At this point, we cannot see any convincing motivation for such a property.
A typical example of a set in M that does not have natural density, but is assigned a probability by α, is
A := ⋃_{n∈N_0} [e^{2n}, e^{2n+1}), (6.3)
for which we have α(A) = 1/2. It is, however, unclear how 'many' such sets there are, i.e. how much 'bigger' the f-system A_uni is than C and how much 'smaller' it is than M. If we could construct a uniform probability measure on M by the method of Section 4, we could determine the probability of A_uni if A_uni ∈ A_uni(M). To construct such a probability measure, we need to equip M with a metric d such that (M, d) has a uniform measure. It is, however, not at all clear how we should choose d. So at this point, it is not clear whether there is a useful way of measuring the collections C and A_uni.
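As a check (a direct computation, with ρ_A(x) = m(A ∩ [0, x))/x as in Section 1 and with the logarithmic density written in its standard integral form), the set A in (6.3) indeed has no natural density, while its logarithmic density equals 1/2:

\[
\liminf_{x\to\infty}\rho_A(x)=\frac{1}{e+1},\qquad
\limsup_{x\to\infty}\rho_A(x)=\frac{e}{e+1},\qquad
\lim_{x\to\infty}\frac{1}{\log x}\int_{A\cap[1,x]}\frac{\mathrm{d}t}{t}=\frac{1}{2},
\]

which is consistent with α(A) = 1/2.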