Functional central limit theorems for occupancies and missing mass process in infinite urn models

We study the infinite urn scheme when the balls are sequentially distributed over an infinite number of urns labelled 1,2,... so that the urn $j$ at every draw gets a ball with probability $p_j$, $\sum_j p_j=1$. We prove functional central limit theorems for discrete time and the poissonised version for the urn occupancies process, for the odd-occupancy and for the missing mass processes extending the known non-functional central limit theorems.


Introduction
In this paper we study the following classical urn model first considered by Karlin [12]: $n \ge 1$ balls are distributed one by one over an infinite number of urns enumerated from 1 to infinity. The ball distributed at step $j = 1, 2, \dots$, called the $j$th ball, falls into urn $i$ with probability $p_i$, $\sum_{i=1}^{\infty} p_i = 1$, independently of the other balls. Such multinomial occupancy schemes arise in many different applications, in biology [11], computer science [13], [14] and in many other areas, see, e.g., [10] and the references therein.
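Let $X_j$ be the urn the $j$th ball falls into and let $J_i(n)$ be the number of balls the $i$th urn contains after $n$ balls are distributed:
$$J_i(n) = \sum_{j=1}^{n} \mathbf{1}\{X_j = i\}.$$
We are particularly interested in the asymptotic distribution of the number of urns containing at least $k \ge 1$ balls and containing exactly $k$ balls:
$$R^*_{n,k} = \sum_{i \ge 1} \mathbf{1}\{J_i(n) \ge k\}, \qquad R_{n,k} = \sum_{i \ge 1} \mathbf{1}\{J_i(n) = k\},$$
of the number of urns with an odd number of balls and the asymptotic behaviour of the missing mass:
$$U_n = \sum_{i \ge 1} \mathbf{1}\{J_i(n) \text{ is odd}\}, \qquad M_n = \sum_{i \ge 1} p_i\, \mathbf{1}\{J_i(n) = 0\}.$$
We also use the notation $R_n \stackrel{\mathrm{def}}{=} R^*_{n,1} = \sum_{k \ge 1} R_{n,k}$ for the number of non-empty urns. Renumbering the urns if necessary, we further assume that the sequence $(p_i)_{i \ge 1}$ is monotonically decaying and regularly varying, namely,
$$\alpha(x) = \max\{j : p_j \ge 1/x\} = x^{\theta} L(x), \qquad 0 \le \theta \le 1, \tag{3}$$
where $L(x)$ is a slowly varying function as $x \to \infty$.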
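The statistics defined above can be illustrated numerically. The following is a minimal sketch, not part of the paper's argument: it assumes the specific choice $p_i \propto i^{-1/\theta}$ (which is regularly varying with index $\theta$), truncates the scheme to finitely many urns, and uses the hypothetical helper names `simulate_urns` and `occupancy_statistics`.

```python
import numpy as np

def simulate_urns(n, theta=0.5, n_urns=10_000, seed=0):
    """Throw n balls into a TRUNCATED urn scheme (assumption: finitely many urns).

    Assumed probabilities: p_i proportional to i**(-1/theta), renormalised
    so that they sum to 1 after truncation.
    """
    rng = np.random.default_rng(seed)
    i = np.arange(1, n_urns + 1)
    p = i ** (-1.0 / theta)
    p /= p.sum()
    balls = rng.choice(n_urns, size=n, p=p)          # X_1, ..., X_n (0-based urn labels)
    counts = np.bincount(balls, minlength=n_urns)    # J_i(n) for every urn i
    return p, counts

def occupancy_statistics(p, counts, k=1):
    """Return R*_{n,k}, R_{n,k}, U_n and M_n for the given occupancy counts."""
    return {
        "R_star_k": int(np.sum(counts >= k)),    # urns with at least k balls
        "R_k": int(np.sum(counts == k)),         # urns with exactly k balls
        "U": int(np.sum(counts % 2 == 1)),       # urns with an odd number of balls
        "M": float(np.sum(p[counts == 0])),      # missing mass: total probability of empty urns
    }

if __name__ == "__main__":
    p, counts = simulate_urns(n=5_000, theta=0.5)
    print(occupancy_statistics(p, counts, k=1))
```

The truncation level `n_urns` should be large compared with the number of distinct urns that are likely to be hit; otherwise the tail of the sequence $(p_i)$, which drives the asymptotics, is cut off.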
Alongside the discrete time model, we will also consider its continuous time analogue, in which the balls are put into urns at the jump times of a homogeneous Poisson point process $\Pi(t)$, $t \ge 0$, with intensity 1 on $\mathbb{R}_+$. According to the independent marking theorem for Poisson processes, the processes $\{J_i(\Pi(t)),\ t \ge 0\}$, $i \ge 1$, are independent homogeneous Poisson processes with intensities $p_i$. To ease the notation, we will write simply This paper extends the results of [7] and [6], where a functional central limit theorem (FCLT) was shown under condition (3) for the vector process 1] in the case $\theta \in (0, 1]$. Ordinary (not functional) central limit theorems for the above quantities were established under various conditions in [2], [3], [9], [10], [12], [13], [14]. In particular, under rather general conditions on the sequence $(p_i)$ involving an unbounded growth of the variances, the following results are available: a strong law of large numbers and asymptotic normality of $R_n$, asymptotic normality of the vector $(R_{n,1}, \dots, R_{n,\nu})$, local limit theorems, etc. We acknowledge a novel method of randomised decomposition for proving FCLTs for processes of our kind, developed in the recent paper [8], but we do not use it here.
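For the poissonised model described above, independence across urns makes the first moments explicit: at time $t$ the count in urn $i$ is Poisson with mean $t p_i$, so an urn is non-empty with probability $1 - e^{-t p_i}$ and has an odd count with probability $(1 - e^{-2 t p_i})/2$. The sketch below evaluates the resulting means and samples the counts directly. The function names are hypothetical, and the finite vector `p` is an assumption standing in for the infinite sequence $(p_i)$.

```python
import numpy as np

def poissonised_means(p, t):
    """Exact means at time t in the poissonised scheme for a finite vector p."""
    p = np.asarray(p, dtype=float)
    lam = t * p                                   # Poisson mean of the count in each urn
    return {
        "E_R": float(np.sum(1.0 - np.exp(-lam))),               # expected number of non-empty urns
        "E_U": float(np.sum(0.5 * (1.0 - np.exp(-2.0 * lam)))), # expected number of odd-occupancy urns
        "E_M": float(np.sum(p * np.exp(-lam))),                 # expected missing mass
    }

def sample_poissonised_counts(p, t, seed=0):
    """Sample the urn counts at time t as independent Poisson(t * p_i) variables."""
    rng = np.random.default_rng(seed)
    return rng.poisson(t * np.asarray(p, dtype=float))

if __name__ == "__main__":
    p = np.array([0.5, 0.3, 0.2])
    print(poissonised_means(p, t=10.0))
    print(sample_poissonised_counts(p, t=10.0))
```

Sampling the counts urn by urn, rather than simulating the jump times of $\Pi(t)$ and marking each ball, relies precisely on the independent marking property quoted in the text.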
We establish an FCLT for the odd-occupancy process and for the missing mass process when $\theta > 0$. Extending the FCLT to the case $\theta = 0$ would require conditions in addition to (3). As mentioned in [12] and in [2], $\theta = 0$ does not imply that the variances grow to infinity, and different asymptotic behaviour is possible for different statistics. We also argue that even an infinite growth of the variances does not by itself guarantee the required relative compactness.
When $\theta = 1$, we need a function It is known (see [12]) that $L^*(x)$ is slowly varying as $x \to \infty$. Finally, for $t \in [0, 1]$ introduce the following notation: We are now ready to formulate the main result of the paper.
Theorem 1. When $\theta \in (0, 1]$, the vector process converges weakly in the uniform metric on $D((0, 1)^3)$ to a 3-dimensional Gaussian process with zero mean and the covariance function $C(\tau, t)$ with the following components: when $\theta \in (0, 1)$, $\tau \le t$, When $\theta = 1$, $\tau \le t$, $C(\tau, t)$ is given by

Proof of Theorem 1
We start by formulating a couple of lemmas proved in [7]. We will generally use the letter $C$ and its variants to denote a constant whose value is of no importance to us, and note in parentheses the parameters it depends upon. The same notation may stand for different constants in different contexts, in the same way as the $O(1)$ notation is used; this should not lead to confusion.
Lemma 3. For any $\varepsilon, \delta \in (0, 1)$ there exists an $N = N(\varepsilon, \delta)$ such that for any $n \ge N$,

In preparation for the proof, let us introduce some further notation and establish a few inequalities we will be using.
In view of (5), let For any two positive $\tau_1 \le \tau_2$, define their expectations are denoted by Similarly for $M(t)$, write Clearly, for all natural $k$, Similarly, As a result, We are using the same notation u i , m i and u i , m i without explicitly specifying the corresponding values of $\tau_1 < \tau_2$; this should not cause confusion. The following lemma will be used in the proof of the relative compactness of the process $M^*_n(t)$.
Step 1: Covariance. The first, rather technical, step consists in establishing formulae for the covariances; this is deferred to the Appendix.
Step 2: Convergence of finite-dimensional distributions. Along the lines of the proof of [9, Th. 12], one can show that for any $m \ge 1$ and $0 < t_1 < t_2 < \dots < t_m \le 1$ the triangular array of $m$-dimensional vectors (independent in $k$ for every $n$) satisfies the Lindeberg condition (see, e.g., [5, Th. 6.2]). The convergence of the finite-dimensional distributions of the process $M^*_n(t)$ is shown similarly.
Step 3: Relative compactness. We proceed as follows: (a) prove the continuity of the limiting process; (b) prove that $U^*_n$ ($M^*_n$) and $U^{**}_n$ ($M^{**}_n$) are sufficiently close; (c) prove the relative compactness of $U^{**}_n$ ($M^{**}_n$).
Since the covariance function has a limit, [1, Th. 1.4] will imply that the limiting Gaussian process a.s. has a continuous modification on [0, 1].
Since the trajectories of the limiting Gaussian process belong a.s. to the class $C(0, 1)$, weak convergence in the Skorohod topology implies weak convergence in the uniform metric, see, e.g., [4]. Therefore, it is sufficient to prove the relative compactness of $\{U^*_n\}_{n \ge n_0}$ (with $n_0$ as in Lemma 2) in the Skorohod topology.
b(U) Since with probability one we have Hence, for all $\eta > 0$, Recall the Rosenthal inequality [15]: if $\phi_i$ are independent random variables with $\mathbf{E}\phi_i = 0$, then for all $k \ge 2$ there exists a constant $c(k)$ such that
$$\mathbf{E}\Big|\sum_i \phi_i\Big|^k \le c(k)\Big(\sum_i \mathbf{E}|\phi_i|^k + \Big(\sum_i \mathbf{E}\phi_i^2\Big)^{k/2}\Big).$$
For all $n \ge n_0$ (with $n_0$ as in Lemma 2) we then have where $c(k)$, $C(k)$ and $C(\theta)$ depend only on their arguments. Above, we have used (11) in the first inequality, (8) in the second and, finally, (10) and Lemma 2 alongside the bound If $t_2 - t_1 \ge 1/n$, then there are the following three cases:

then the Cauchy-Schwarz inequality implies
the same inequality yields

Now the relative compactness follows from, e.g., [4, Th. 13.5].
a(M) Because the covariance function has a limit, it is sufficient to appeal to Lemma 4 and [1, Th. 1.4] to establish the existence of an a.s. continuous modification on $[0, 1]$ of the limiting Gaussian process. Since the trajectories of this process are a.s. in $C(0, 1)$, weak convergence in the Skorohod topology implies weak convergence in the uniform metric, see [4]. Thus it is sufficient to prove the relative compactness of the family $\{M^*_n\}_{n \ge n_0}$ in the Skorohod topology (here $n_0$ is the same as in Lemma 2).
b(M) Set $\tau_2 = nt$ and $\tau_1 = [nt]$. Since $\tau_2 - \tau_1 \le 1$, then Let $m'''_i = m''_i(\tau_1, \tau_1 + 1)$ and $\overline{m}'''_i = \mathbf{E}\, m'''_i$. Then we have almost surely, We know that for any integer $k \ge 2$ Using the independence of the terms and the Rosenthal inequality, for any $k \ge 2$, Hence, for $k \ge [2/\theta] + 1$ and all $\eta > 0$ Therefore, it is sufficient to show the relative compactness of $\{M^{**}_n\}_{n \ge n_0}$ in the Skorohod topology. Again, by independence and the Rosenthal inequality, where $c(k)$, $C(k)$ and $C(\theta)$ depend only on their arguments.
Above, we have used inequalities (9), (10) and Lemmas 2 and 4 alongside the bound When $t_2 - t_1 \ge 1/n$, we have the following three cases: , then since for any $l \ge 2$, the Cauchy-Schwarz inequality yields the bound , is similar to the previous case. Thus the required compactness follows from [4, Th. 13.5].
Acknowledgements. MC's research is supported by RSF Grant 17-11-01173. He also acknowledges the hospitality of Chalmers University, where a part of this work was done. The authors are thankful to Sergey Foss for his interest in this research and valuable comments.

Because $\beta(nt)/\beta(n) \to t^{\theta}$ as $n \to \infty$, for $\theta \in (0, 1)$ we have that