On nested infinite occupancy scheme in random environment

We consider an infinite balls-in-boxes occupancy scheme with boxes organised in a nested hierarchy and random probabilities of boxes defined in terms of an iterated fragmentation of a unit mass. We obtain a multivariate functional limit theorem for the cumulative occupancy counts as the number of balls approaches infinity. In the case of fragmentation driven by a homogeneous residual allocation model, our result generalises the functional central limit theorem for the block counts in Ewens' and more general regenerative partitions.


Introduction
In the infinite multinomial occupancy scheme, balls are thrown independently into a series of boxes, so that each ball hits box k = 1, 2, … with probability p_k, where p_k > 0 and ∑_{k∈N} p_k = 1. This classical model is sometimes named after Karlin due to his seminal contribution [28]. Features of the occupancy pattern emerging after the first n balls are thrown have been intensely studied; see [5,18,24] for surveys and references and [6,11,12,13] for recent advances. The statistics at the focus of most previous work, which are also relevant to the subject of this paper, are not sensitive to the labelling of boxes but depend only on the integer partition of n formed by the nonzero occupancy counts.
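As a concrete illustration (not part of the original analysis), the following sketch simulates Karlin's occupancy scheme for the geometric probabilities p_k = 2^{-k}. The truncation level and the lumping of the residual mass into one extra box are simplifying assumptions; the function name `karlin_occupancy` is ours.

```python
import random
from collections import Counter

def karlin_occupancy(n, probs, rng=random.Random(0)):
    """Throw n balls independently; ball hits box k with probability probs[k].

    probs is a finite truncation of (p_k); the tiny residual mass is lumped
    into one extra box -- a deliberate simplification of the infinite scheme.
    """
    residual = 1.0 - sum(probs)
    weights = list(probs) + [max(residual, 0.0)]
    # Counter maps box index -> occupancy number (only occupied boxes appear)
    return Counter(rng.choices(range(len(weights)), weights=weights, k=n))

# Geometric probabilities p_k = (1/2)^k, k = 1, 2, ...
probs = [0.5 ** k for k in range(1, 40)]
hits = karlin_occupancy(10_000, probs)
occupied = len(hits)   # number of occupied boxes after 10 000 balls
```

With geometric probabilities the number of occupied boxes grows logarithmically in n, which is why only a handful of boxes are ever hit even for large n.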
In the infinite occupancy scheme in a random environment the (hitting) probabilities of boxes are positive random variables (P_k)_{k∈N} with an arbitrary joint distribution satisfying ∑_{k∈N} P_k = 1 almost surely. Conditionally on (P_k)_{k∈N}, balls are thrown independently, with probability P_k of hitting box k. Instances of this general setup have received considerable attention within the circle of questions around exchangeable partitions, discrete random measures and their applications to population genetics, Bayesian statistics and computer science. In the most studied and analytically best tractable case the probabilities of boxes are representable as the residual allocation (or stick-breaking) model

P_k = U_k ∏_{i=1}^{k−1} (1 − U_i), k ∈ N, (1)

where the U_i's are independent with the beta(θ, 1) distribution on (0, 1) and θ > 0. In this case the distribution of the sequence (P_k)_{k∈N} is known as the Griffiths-Engen-McCloskey (GEM) distribution with parameter θ. The sequence of the P_k's arranged in decreasing order has the Poisson-Dirichlet (PD) distribution with parameter θ, and the induced exchangeable partition of the set of n balls follows the celebrated Ewens sampling formula [3,29,31,32]. Generalisations have been proposed in various directions. The two-parameter extension due to Pitman and Yor [29] involves probabilities of the form (1) with independent but not identically distributed U_i's, where the distribution of U_i is beta(θ + αi, 1 − α) (with 0 < α < 1 and θ > −α). Residual allocation models with other choices of beta distributions for the U_i's are found in [26,33]. Much effort has been devoted to the occupancy scheme known as the Bernoulli sieve, which is based on a homogeneous residual allocation model (1), that is, one with independent and identically distributed (iid) factors U_i having an arbitrary distribution on (0, 1); see [2,14,20,24,25,30]. The homogeneous model has a multiplicative regenerative property, which is inherited by the partition of the set of balls.
In more sophisticated constructions of random environments the probabilities (P_k)_{k∈N} are identified with some arrangement in a sequence of the masses of a purely atomic random probability measure. A widely explored possibility is to define a random cumulative distribution function F by transforming the path of an increasing drift-free Lévy process (subordinator) (X(t))_{t≥0}. In particular, in the Poisson-Kingman model F(t) = X(t)/X(1) for a measure supported by [0, 1], see [16,29]. In the regenerative model F(t) = 1 − e^{−X(t)}, t ≥ 0, called in the statistical literature a neutral-to-the-right prior [16], see [4,19,21,22].
Following [7,10,27] we shall study a nested infinite occupancy scheme in a random environment. In this context we regard (P_k)_{k∈N} as a random fragmentation law (with P_k > 0 and ∑_{k∈N} P_k = 1 a.s.). To introduce a hierarchy of boxes, for each j ∈ N_0 let W_j be the set of words of length j over N, where W_0 := {∅}. The set W = ⋃_{j∈N_0} W_j of all finite words has the natural structure of an ∞-storey tree with root ∅ and ∞-ary branching at every node, where w1, w2, … ∈ W_{j+1} are the immediate followers of w ∈ W_j. Let {(P_k^{(w)})_{k∈N}, w ∈ W} be a family of independent copies of (P_k)_{k∈N}. With each w ∈ W we associate a box divided into sub-boxes w1, w2, … of the next level. The probabilities of boxes are defined recursively by P(∅) := 1 and

P(wk) := P(w) P_k^{(w)}, w ∈ W, k ∈ N, (2)

(note that the factors P(w) and P_k^{(w)} are independent). Given (P(w))_{w∈W}, balls are thrown independently, with probability P(w) of hitting box w. Since ∑_{w∈W_j} P(w) = 1, the allocation of balls in the boxes of level j occurs according to the ordinary Karlin occupancy scheme.
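To make the recursive construction concrete, here is a hypothetical simulation of the nested scheme with a GEM(θ) fragmentation law (the class name `NestedGEM` and all parameters are ours). The stick-breaking factors of each word are generated lazily and memoised, so that all balls share a single realisation of the random environment, as the construction requires.

```python
import random
from collections import Counter, defaultdict

class NestedGEM:
    """Nested occupancy scheme whose fragmentation law is GEM(theta)."""

    def __init__(self, theta, rng=None):
        self.theta = theta
        self.rng = rng or random.Random(2)
        self.sticks = defaultdict(list)   # word -> memoised U factors

    def _factor(self, word, k):
        """Return U_k of box `word`, generating new factors on demand."""
        us = self.sticks[word]
        while len(us) < k:
            us.append(self.rng.betavariate(self.theta, 1.0))
        return us[k - 1]

    def drop_ball(self, depth):
        """Route one ball down `depth` levels; conditionally on the sticks,
        sub-box k is chosen with probability U_k * prod_{i<k}(1 - U_i)."""
        word = ()
        for _ in range(depth):
            k = 1
            # accept sub-box k with conditional probability U_k
            while self.rng.random() >= self._factor(word, k):
                k += 1
            word = word + (k,)
        return word

scheme = NestedGEM(theta=1.0)
balls = [scheme.drop_ball(depth=3) for _ in range(2000)]
level3 = Counter(balls)   # occupancy numbers of level-3 boxes
```

The sequential acceptance loop uses that, given boxes 1, …, k−1 were rejected, box k is selected with conditional probability U_k, which reproduces the stick-breaking weights without materialising the infinite sequence.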
Recursion (2) defines a discrete-time mass-fragmentation process, in which a generic mass splits in proportions governed by the same fragmentation law, independently of the history and of the masses of co-existing fragments. The nested occupancy scheme can be seen as a combinatorial version of this fragmentation process. Initially all balls are placed in box ∅, and at each consecutive step j + 1 each ball in box w ∈ W_j is placed in sub-box wk with probability P_k^{(w)}. The inclusion relation on the hierarchy of boxes induces a combinatorial structure on the (labelled) set of balls called a total partition, that is, a sequence of refinements from the trivial one-block partition down to the partition in singletons. The paper [15] highlights the role of exchangeability and gives the general de Finetti-style connection between mass-fragmentations and total partitions. We consider the random probabilities of the hierarchy of boxes and the outcome of throwing infinitely many balls all defined on the same underlying probability space. For j, r ∈ N, denote by K_{n,j,r} the number of boxes w ∈ W_j of the jth level that contain exactly r out of the n first balls, and let

K_{n,j}(s) := ∑_{r=⌈n^{1−s}⌉}^{n} K_{n,j,r}, s ∈ [0, 1], (3)

be a cumulative count of occupied boxes, where ⌈·⌉ is the integer ceiling function. With probability one the random function s ↦ K_{n,j}(s) is nondecreasing and right-continuous, hence belongs to the Skorokhod space D[0, 1]. Also observe that K_{n,j}(0) = K_{n,j,n} is zero unless all balls fall in the same box, and that K_{n,j}(1) is the number of occupied boxes in the jth level. In [7] a central limit theorem with random centering was proved for K_{n,j}(1) with j growing with n at a certain rate. Our focus is different. We are interested in the joint weak convergence of ((K_{n,j_1}(s), …, K_{n,j_m}(s)))_{s∈[0,1]}, properly normalised and centered, for any finite collection of occupancy levels 1 ≤ j_1 < … < j_m, as the number of balls n tends to ∞. As far as we know, this question has not been addressed so far.
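The cumulative count is a simple functional of the occupancy numbers. The sketch below assumes K_{n,j}(s) tallies the level-j boxes holding at least ⌈n^{1−s}⌉ balls, which is consistent with the stated boundary values K_{n,j}(0) = K_{n,j,n} and K_{n,j}(1); the helper name `cumulative_count` and the toy data are ours.

```python
import math

def cumulative_count(occupancy, n, s):
    """Number of boxes containing at least ceil(n^(1-s)) of the n balls,
    computed from the list of nonzero occupancy numbers of one level."""
    threshold = math.ceil(n ** (1.0 - s))
    return sum(1 for c in occupancy if c >= threshold)

occ = [500, 300, 150, 40, 7, 2, 1]   # toy occupancy numbers, n = 1000
n = sum(occ)
# evaluating on a grid of s in [0, 1] gives a nondecreasing step path
k_path = [cumulative_count(occ, n, s / 10) for s in range(11)]
```

At s = 0 the threshold is n itself, so the count is zero unless one box holds every ball; at s = 1 the threshold is 1 and the count equals the number of occupied boxes, matching the boundary behaviour described above.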
We prove a multivariate functional limit theorem (Theorem 2.1) applicable to fragmentation laws representable by homogeneous residual allocation models (including the GEM/PD distribution) and some other models in which the sequence of the P_k's arranged in decreasing order approaches zero sufficiently fast. A univariate functional limit for (K_{n,1}(s))_{s∈[0,1]} in the case of the Bernoulli sieve was previously obtained in [2].

Main result
For a given fragmentation law (P_k)_{k∈N}, let ρ(s) := #{k ∈ N : P_k ≥ 1/s} for s > 0, and N(t) := ρ(e^t), V(t) := E N(t) for t ∈ R. The joint distribution of the K_{n,j,r}'s is completely determined by the probability law of the random function ρ(·), which captures the fragmentation law up to rearrangement of the P_k's. For our purposes, therefore, we need not distinguish between fragmentation laws with the same ρ(·).
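The counting functions ρ and N are straightforward to evaluate on a truncated probability sequence; this small sketch (our names `rho` and `N_of_t`) illustrates the identity N(t) = ρ(e^t) = #{k : −log P_k ≤ t} for the deterministic geometric example.

```python
import math

def rho(probs, s):
    """rho(s) = #{k : P_k >= 1/s}, for a (truncated) list of probabilities."""
    return sum(1 for p in probs if p >= 1.0 / s)

def N_of_t(probs, t):
    """N(t) = rho(e^t), i.e. the number of k with T_k = -log P_k <= t."""
    return rho(probs, math.exp(t))

probs = [0.5 ** k for k in range(1, 30)]   # p_k = 2^{-k}
```

For p_k = 2^{-k} one has T_k = k log 2, so N(t) is simply ⌊t / log 2⌋ in the range covered by the truncation.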
Let T_k := − log P_k. Here is a basic decomposition of principal importance for what follows:

N_j(t) = ∑_{k∈N} N_{j−1}^{(k)}(t − T_k) 1_{{T_k ≤ t}}, t ≥ 0, (4)

where (N_{j−1}^{(k)}(t))_{t≥0} for k ∈ N are independent copies of N_{j−1}(·) which are also independent of T_1, T_2, … An immediate consequence of (4) is a recursion for the expectations,

V_j(t) = ∫_{[0,t]} V_{j−1}(t − y) dV(y), t ≥ 0, (5)

which shows that V_j(·) is the jth convolution power of V(·). The assumptions on the fragmentation law and the functional limit will involve a centered Gaussian process W := (W(s))_{s≥0} which is a.s. locally Hölder continuous with exponent β > 0 and satisfies W(0) = 0. In particular,

|W(x) − W(y)| ≤ M (x − y)^β, 0 ≤ y < x ≤ 1, (6)

for some a.s. finite random variable M. We further set

R_j(s) := ∫_{(0,s]} (s − y)^{ω(j−1)} dW(y), s ≥ 0, j ∈ N.

Alternatively, the process R_j can be defined via repeated integration by parts, with s_1 = s. Throughout the paper D := D[0, ∞) denotes the standard Skorokhod space. Here is our main result.
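The convolution identity for the expectations can be spelled out by conditioning on the points T_k; the following routine verification (included here for completeness, under the stated independence assumptions) uses that V(y) is the mean number of points T_k in [0, y]:

```latex
\begin{aligned}
V_j(t) = \mathbb{E} N_j(t)
&= \mathbb{E}\Big[\sum_{k\in\mathbb{N}}
     \mathbb{E}\big[N_{j-1}^{(k)}(t-T_k)\,\big|\,(T_i)_{i\in\mathbb{N}}\big]
     \mathbf{1}_{\{T_k\le t\}}\Big] \\
&= \mathbb{E}\Big[\sum_{k\in\mathbb{N}} V_{j-1}(t-T_k)\,
     \mathbf{1}_{\{T_k\le t\}}\Big]
 = \int_{[0,t]} V_{j-1}(t-y)\,\mathrm{d}V(y),
\end{aligned}
```

so that V_j = V^{∗j}, the jth convolution power of V.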
Remark 2.2. The assumption 0 < ε_1, ε_2 ≤ ω ensures that γ > 0. Furthermore, in view of (7) and the choice of γ, relation (9) is equivalent to the corresponding limit relation in the J_1-topology on D. Similarly, in view of (13) given below, relation (10) admits an analogous equivalent form.


Proof of Theorem 2.1
(b) Since W is a.s. continuous, the claimed convergence holds for any h > 0. To prove (16) we again use induction. For j = 1 the result holds according to (17). Assume that (16) holds for j = ℓ. Then, using a decomposition analogous to (4), we obtain a representation in which the summands N_ℓ^{(k)}(y), k ∈ N, are independent and identically distributed. Then, for any ε > 0 and δ ∈ (0, c), the second probability on the right-hand side converges to zero by the induction hypothesis, while the first does so due to (8). A similar but much simpler argument yields the corresponding relation for the remaining term. The proof of Lemma 3.1 is complete.

Connecting two ways of box-counting
We digress from our main theme for a while to focus on Karlin's occupancy scheme with deterministic probabilities (p_k)_{k∈N}. By the law of large numbers, a box of probability p gets occupied by about np balls, provided np is large enough. This suggests relating the count of boxes occupied by at least n^{1−s} balls to the number of boxes with probability at least n^{−s}. Let ρ(t) := #{k ∈ N : p_k ≥ 1/t} for t > 0, and let K_{n,r} be the number of boxes containing exactly r out of n balls. We shall estimate uniformly the difference between the corresponding cumulative counts (K_n(s))_{s∈[0,1]} and (ρ(n^s))_{s∈[0,1]}. The following result is very close to Proposition 4.1 in [2]. However, we did not succeed in applying the cited proposition directly and will instead combine the estimates obtained in its proof.
where y_0 ∈ (0, 1) is a constant which depends neither on n nor on (p_k)_{k∈N}.
Proof. For k ∈ N, denote by Z_{n,k} the number of balls falling in the kth box. Then, for n ∈ N and s ∈ [0, 1], the difference in question decomposes into several terms. In [2] it was shown that, for n ∈ N, the term involving ρ(e n^s) − ρ(n^s) is under control (see [2], pp. 1004-1005), as is E sup_s (ρ(n^s) − ρ(e^{−1} n^s)) (see [2], p. 1006). Combining these with the estimate for the remaining term, valid for n ∈ N, we arrive at (18).

We next apply Proposition 3.2 in the setting of Theorem 2.1. This result shows that (10) is equivalent to the analogous limit relation with ρ_j(n^t) = N_j(t log n) replacing K_{n,j}(t).

Proposition 3.3. Suppose (7) and (9). Then (19) holds for each j ∈ N.

Proof. Fix any j ∈ N. By Proposition 3.2, inequality (20) holds for n ∈ N. Recall the notation

c_j = (c Γ(ω + 1))^j / Γ(ωj + 1), j ∈ N,

and our choice of γ > ω − min(1, ε_1, ε_2). In view of (14), relation (21) holds. The next step is to show (22). As a preparation for the proof of (22) we first note that, according to (15), the relevant bound holds for n ∈ N and x ≥ 1. Further, using the inequality (u + v)^α ≤ (2^{α−1} ∨ 1)(u^α + v^α), which holds for α > 0 and u, v ≥ 0, yields the required estimate, and (22) follows. An appeal to (13) enables us to conclude that (23) holds for large enough n, by the same reasoning as above. Finally, (24) holds by Lemma 3.1(b). Using (21), (22), (23) and (24) in combination with Markov's inequality (applied to the first three terms on the right-hand side of (20)) shows that the left-hand side of (20) converges to zero in probability as n → ∞. Now (19) follows by another application of Markov's inequality and the dominated convergence theorem.
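The heuristic behind Proposition 3.2 can be checked empirically: for deterministic geometric probabilities, the count of boxes holding at least ⌈n^{1−s}⌉ of n balls stays uniformly close to ρ(n^s). The sketch below (our function `compare_counts`, with an arbitrary grid of s values) measures the largest discrepancy.

```python
import math
import random
from collections import Counter

def compare_counts(n, probs, rng=random.Random(3)):
    """Empirically compare K_n(s) (boxes holding at least ceil(n^(1-s)) of
    n balls) with rho(n^s) = #{k : p_k >= n^(-s)} for deterministic probs.
    Returns the maximum absolute difference over a grid of s values."""
    residual = 1.0 - sum(probs)
    weights = list(probs) + [max(residual, 0.0)]
    hits = Counter(rng.choices(range(len(weights)), weights=weights, k=n))
    occupancy = [hits.get(k, 0) for k in range(len(probs))]
    diffs = []
    for i in range(11):
        s = i / 10
        kn = sum(1 for c in occupancy if c >= math.ceil(n ** (1 - s)))
        rh = sum(1 for p in probs if p >= n ** (-s))
        diffs.append(abs(kn - rh))
    return max(diffs)

probs = [0.5 ** k for k in range(1, 40)]   # p_k = 2^{-k}
gap = compare_counts(100_000, probs)
```

Only boxes whose expected occupancy n p_k sits near the moving threshold n^{1−s} can be misclassified, so the gap stays of order one even for large n.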
Our main result, Theorem 2.1, is an immediate consequence of Proposition 3.3 and Theorem 3.4 below, which in turn follows from Propositions 3.5 and 3.6.
Proposition 3.6. Suppose (7), (8) and (9). Then (27) holds for each j ≥ 2 and each T > 0.

Proof of Proposition 3.5

We shall use an integral representation similar to (5) for j ≥ 2 and t ≥ 0. In view of (12), Skorokhod's representation theorem ensures that there exist versions N̂_1 and Ŵ such that (30) holds for all T > 0. This implies that (27) is equivalent to (31), where, for j ≥ 2 and t, x ≥ 0, Ẑ_j(t, x) is defined as in (29) with N̂_1 in place of N_1. As far as the first coordinate is concerned, the equivalence is an immediate consequence of (30). As for the other coordinates, integration by parts yields, for fixed s > 0 and j ≥ 2, a representation whose first term is a counterpart of (29) in which N̂_1 replaces N_1. Denoting by L(t) the first term on the right-hand side, we infer (32) in view of (14), which implies the claim together with (30). For j ≥ 2 and t, x ≥ 0, set

Z_j(t, x) := ∫_{(0,x]} W(y) d_y(−V_{j−1}(t(x − y)))

and note that (31) is equivalent to (33), because the left-hand sides of (31) and (33) have the same distribution. It remains to check two properties: (a) weak convergence of finite-dimensional distributions, i.e. that (34) holds as t → ∞ for all n ∈ N, all 0 ≤ s_1 < s_2 < … < s_n < ∞ and all integers ℓ ≥ 2; (b) tightness of the distributions of the coordinates in (33), excluding the first one.

Proof of (34). If s_1 = 0, we have W(s_1) = Z_j(t, s_1) = R_k(s_1) = 0 a.s. for j ≥ 2 and k ∈ N.
Hence, in what follows we consider the case s_1 > 0. Both the limit and the converging vectors in (34) are Gaussian. In view of this it suffices to prove that (35) holds for k, j ∈ N with k + j ≥ 3 and s, u > 0, where we set Z_1(t, ·) := W(·) and r(x, y) := E[W(x)W(y)] for x, y ≥ 0. We only consider the case k, j ≥ 2, the complementary case being similar and simpler.
This together with (32) leads to formula (35).

Proof of tightness. Choose j ≥ 2. We intend to prove tightness of (t^{−ω(j−1)} Z_j(t, u))_{u≥0} on D[0, T] for all T > 0. Since the function t ↦ t^{−ω(j−1)} is regularly varying at ∞, it is enough to investigate the case T = 1 only. By Theorem 15.5 in [8] it suffices to show that for any κ_1 > 0 and κ_2 > 0 there exist t_0 > 0 and δ > 0 such that (36) holds for all t ≥ t_0. We only analyze the case 0 ≤ v < u ≤ 1, the complementary case being analogous.
Set W(x) := 0 for x < 0. The basic observation for the subsequent proof is that (6) extends to

|W(x) − W(y)| ≤ M (x − y)^β, −∞ < y < x ≤ 1, (37)

for the same positive random variable M as in (6). This is trivial when x ≤ 0 and a consequence of (6) when y ≥ 0. Assume that y ≤ 0 < x. Then

|W(x) − W(y)| = |W(x)| ≤ M x^β ≤ M (x − y)^β,

where the first inequality follows from (6) with y = 0. Let 0 ≤ v < u ≤ 1 and u − v ≤ δ for some δ ∈ (0, 1]. Using (37) and (14) we obtain the required bound for large enough t and a positive constant λ. This proves (36).
Relation (42) will be proved by induction in two steps.
Step 1. Assume that (42), hence (38) and (28), hold for j = 2, …, k. We claim that (43) holds in the J_1-topology on D^k. Indeed, in view of (26) and the induction hypothesis, relation (43) is equivalent to a limit relation which holds by Proposition 3.5.
Step 2. Using a convergence in the J_1-topology on D which is a consequence of (43), we shall prove that (42) holds with j = k + 1.
To prove this, write the corresponding decomposition. For all T > 0, in view of (8) and (46), we can argue as in the proof of Lemma 3.1(b) to conclude that the right-hand side of (48) converges to zero in probability as t → ∞, whence lim_{ℓ→∞} Z_1(ℓ) = 0 a.s. Pick now n ∈ N such that nωδ > 1. To avoid treating separately the simpler case n = 1, we assume in what follows that n ≥ 2. Since R_k is a centered Gaussian process which is self-similar with exponent γ + ω(k − 1), we infer, for j ∈ N,

E[R_k(y)^{2j}] = E[R_k(1)^{2j}] y^{2j(γ+ω(k−1))} =: m_{2j} y^{2j(γ+ω(k−1))}.

By Rosenthal's inequality (Theorem 3 in [34]) we obtain a moment bound with a positive constant C. By Markov's inequality, for all κ > 0, the corresponding tail probabilities are summable: in view of (8), the general term of the series is O(ℓ^{−nωδ}) a.s. Hence, our choice of n ensures that the series converges a.s., thereby proving (with the help of the Borel-Cantelli lemma) that lim_{ℓ→∞} Z_2(ℓ) = 0 a.s. conditionally on (T_k)_{k∈N}, hence also unconditionally. The proof of (47) is complete.

The process B_q := (B_q(s))_{s≥0} is a centered Gaussian process called the fractionally integrated BM or the Riemann-Liouville process. Clearly B = B_0, and for q ∈ N the process can be obtained as a repeated integral of the BM. It is known that B_q is locally Hölder continuous with any exponent β < q + 1/2 [23].
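A crude simulation sketch of the Riemann-Liouville process. We assume the representation B_q(s) = ∫_0^s (s − y)^q dB(y), which is consistent with B_0 = B and with the Hölder exponents quoted above; the discretisation by a left-point Riemann sum against Brownian increments is our simplification.

```python
import math
import random

def riemann_liouville(q, n_steps, rng=random.Random(4)):
    """Approximate B_q(s) = int_0^s (s - y)^q dB(y) on a grid of [0, 1]
    by a left-point Riemann sum against Brownian increments."""
    dt = 1.0 / n_steps
    incs = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
    path = [0.0]                      # B_q(0) = 0
    for i in range(1, n_steps + 1):
        s = i * dt
        # weight each Brownian increment dB(y_j) by (s - y_j)^q
        path.append(sum((s - j * dt) ** q * incs[j] for j in range(i)))
    return path

path = riemann_liouville(q=1.0, n_steps=200)
```

For q = 0 the weights are identically 1 and the sum telescopes to the Brownian path itself, mirroring the identity B = B_0.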

The case of the homogeneous residual allocation model
Theorem 4.1. Let (P_k)_{k∈N} be given by (1) with iid U_i's.

Proof. Let (ξ_k, η_k)_{k∈N} be independent copies of a random vector (ξ, η) with positive, arbitrarily dependent components. Denote by (S_k)_{k∈N_0} the zero-delayed ordinary random walk with increments ξ_k, that is, S_0 := 0 and S_k := ξ_1 + … + ξ_k for k ∈ N. Consider the perturbed random walk

T̂_k := S_{k−1} + η_k, k ∈ N,

and then define N̂(t) := #{k ∈ N : T̂_k ≤ t} and V̂(t) := E N̂(t) for t ≥ 0. It is clear that

V̂(t) = ∫_{[0,t]} G(t − y) dU(y), t ≥ 0,

where, for t ≥ 0, U(t) := ∑_{k≥0} P{S_k ≤ t} is the renewal function and G(t) := P{η ≤ t}.
Proof. (a) A standard result of renewal theory provides two-sided bounds on U(t), with a known positive constant a_0. The second of these bounds, in combination with V̂(t) ≤ U(t), proves the second inequality in (51). Using the first inequality in (52) yields the lower bound. To obtain the first inequality in (51) it remains to note that lim_{t→∞} t^{min(d,1)}(1 − G(t)) = 0 by assumption. For a proof of weak convergence see Theorem 3.2 in [2].
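The perturbed-walk representation is easy to simulate: under (1), T_k = −log P_k = S_{k−1} + η_k with ξ_i = −log(1 − U_i) and η_k = −log U_k. The sketch below (our function name; uniform U, i.e. θ = 1, is an illustrative choice) counts the points T̂_k ≤ t.

```python
import math
import random

def perturbed_walk_count(t, n_terms, rng=random.Random(5)):
    """Count the points T_k = S_{k-1} + eta_k <= t of the perturbed random
    walk arising from stick-breaking with uniform factors: here
    xi = -log(1 - U) and eta = -log U for U uniform on (0, 1)."""
    s, count = 0.0, 0          # s carries S_{k-1}
    for _ in range(n_terms):
        u = rng.random()
        eta = -math.log(u)
        if s + eta <= t:       # the point T_k falls below the level t
            count += 1
        s += -math.log(1.0 - u)    # advance the underlying random walk
        if s > t:              # all later points exceed t, since eta > 0
            break
    return count

count5 = perturbed_walk_count(5.0, 10_000)
```

Averaging this count over many independent runs approximates V̂(t), the quantity bounded via the renewal function in the proof above.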

Corollary 4.3.
For θ > 0, let (P_k)_{k∈N} be GEM-distributed with parameter θ, or any random sequence such that the sequence of the P_k's arranged in decreasing order follows the PD distribution with parameter θ. Then the conclusion of Theorem 2.1 holds in the product J_1-topology on D[0, 1]^N.

Some regenerative models
For (X(t))_{t≥0} a drift-free subordinator with X(0) = 0 and a nonzero Lévy measure ν supported by (0, ∞), let ∆X(t) := X(t) − X(t−), t ≥ 0, be the associated process of jumps. The process ∆X(·) assumes nonzero values on a countable set, which is dense in case ν(0, ∞) = ∞. The transformed process (multiplicative subordinator) F(t) = 1 − e^{−X(t)}, t ≥ 0, has the associated process of jumps

∆F(t) = e^{−X(t−)} (1 − e^{−∆X(t)}), t ≥ 0.

In this section we identify the fragmentation law (P_k)_{k∈N} with the nonzero jumps of ∆F(·) arranged in some order (for instance, by decrease). Note that multiplying the Lévy measure by a positive factor corresponds to a time-change for F, hence does not affect the derived fragmentation law. We shall assume that the Lévy measure ν is infinite, has the right tail ν([x, ∞)) satisfying the stated asymptotics for small enough x > 0 and some q, c_0, α_0, α_1 > 0, 1/2 < r_1, r_2 ≤ q + 1 and β_0, β_1 < 0, and satisfies s_2 := ∫_{(0,∞)} x^2 ν(dx) < ∞.
Theorem 5.1 applies to the gamma subordinator, whose Lévy measure is ν(dx) = θ x^{−1} e^{−λx} dx, and to another subordinator with an explicit Lévy measure, where θ, λ > 0. In both cases s_2 < ∞ and (55) holds with c_0 = q = r_1 = r_2 = 1. Theorem 5.1 is a consequence of Theorem 2.1, the easily checked formula expressing the relevant repeated integral as ∫_{(0,u]} (u − y)^{j+q−1} dB(y), u ≥ 0, j ∈ N, q > 0, and the next lemma.
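The fragmentation law derived from F(t) = 1 − e^{−X(t)} can be sketched numerically. The approximation below is ours and is deliberately crude: a gamma subordinator has infinitely many jumps, and we treat each gamma-distributed grid increment of X as a single jump, applying ∆F(t) = e^{−X(t−)}(1 − e^{−∆X(t)}) and lumping the residual mass e^{−X(1)} into one extra box.

```python
import math
import random

def approx_fragmentation_law(shape_total, n_grid, rng=random.Random(6)):
    """Grid approximation of the fragmentation law obtained from
    F(t) = 1 - exp(-X(t)) for a gamma subordinator X on [0, 1]."""
    dt_shape = shape_total / n_grid
    x, masses = 0.0, []
    for _ in range(n_grid):
        jump = rng.gammavariate(dt_shape, 1.0)   # increment, treated as one jump
        # Delta F = exp(-X(t-)) * (1 - exp(-Delta X))
        masses.append((1.0 - math.exp(-jump)) * math.exp(-x))
        x += jump
    masses.append(math.exp(-x))    # residual mass lumped into one box
    return sorted(masses, reverse=True)

masses = approx_fragmentation_law(shape_total=1.0, n_grid=500)
```

The masses telescope exactly to one, since each term equals e^{−X(t−)} − e^{−X(t)} and the final box carries e^{−X(1)}.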
This proves the second inequality in (56). Arguing analogously, we obtain the corresponding lower bound, thereby proving the first inequality in (56).