Functional Central Limit Theorems for Occupancies and Missing Mass Process in Infinite Urn Models

Chebunin, Mikhail; Zuyev, Sergei

doi:10.1007/s10959-020-01053-6

Open access
Published: 23 November 2020

Volume 35, pages 1–19, (2022)

Journal of Theoretical Probability

Abstract

We study the infinite urn scheme when the balls are sequentially distributed over an infinite number of urns labeled 1,2,... so that the urn j at every draw gets a ball with probability $p_j$, where $\sum _j p_j=1$. We prove functional central limit theorems for discrete time and the Poissonized version for the urn occupancies process, for the odd occupancy and for the missing mass processes extending the known non-functional central limit theorems.

1 Introduction

In this paper, we study the following classical urn model first considered by Karlin [12]: $n\ge 1$ balls are distributed one by one over an infinite number of urns enumerated from 1 to infinity. The ball distributed at step $j=1,2,\dots $, call it the jth ball, gets into urn i with probability $p_i$, where $\sum _{i=1}^\infty p_i=1$, independently of the other balls. Such multinomial occupancy schemes arise in many applications, including biology [11], computer science [13, 14] and many other areas; see, e.g., [10] and the references therein.

Let $X_j$ be the urn the jth ball gets into and let $J_i(n)$ be the number of balls the ith urn contains after n balls are distributed:

$$\begin{aligned} J_i(n)=\sum _{j=1}^{n} \mathbb {1}_{X_j=i}. \end{aligned}$$

Of particular interest is the asymptotic behavior of the following quantities: the number of urns containing at least $k\ge 1$ balls and the number of urns containing exactly k balls:

$$\begin{aligned} R^{*}_{n,k}=\sum _{i=1}^{\infty } \mathbb {1}_{J_i(n)\ge k},\quad R_{n,k}=\sum _{i=1}^{\infty } \mathbb {1}_{J_i(n)= k}=R^{*}_{n,k}-R^{*}_{n,k+1}, \end{aligned}$$

(1)

the number of urns with an odd number of balls and the scaled missing mass introduced in [12]:

$$\begin{aligned} U_n=\sum _{i=1}^\infty \mathbb {1}_{J_i(n)\equiv 1\ (\mathrm {mod} \ 2)}, \quad M_n=n \sum _{i=1}^\infty p_i \mathbb {1}_{J_i(n)=0}, \end{aligned}$$

(2)
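
The statistics in (1) and (2) can be illustrated on a simulated sample. The following is a minimal sketch and not part of the paper: it assumes the particular weights $p_i=i^{-a}-(i+1)^{-a}$ with $a=1/\theta -1$, $\theta \in (0,1)$, chosen only because they can be sampled exactly by inversion over infinitely many urns; the function names `urn_prob`, `sample_urn` and `occupancy_statistics` are hypothetical.

```python
import math
import random
from collections import Counter


def urn_prob(i, theta):
    """Assumed weights p_i = i^(-a) - (i+1)^(-a), a = 1/theta - 1 (illustrative choice)."""
    a = 1.0 / theta - 1.0
    return float(i) ** (-a) - float(i + 1) ** (-a)


def sample_urn(theta, rng):
    """Draw an urn label X with P(X >= i) = i^(-a) by inverting a uniform variable."""
    a = 1.0 / theta - 1.0
    u = 1.0 - rng.random()  # uniform on (0, 1], so u is never zero
    return int(math.floor(u ** (-1.0 / a)))


def occupancy_statistics(n, theta, seed=0):
    """Return (R_{n,k} as a dict, R*_{n,k} as a dict, U_n, M_n) for one simulated sample."""
    rng = random.Random(seed)
    # J_i(n) for the non-empty urns only; all other urns hold zero balls.
    counts = Counter(sample_urn(theta, rng) for _ in range(n))
    # R_{n,k}: number of urns containing exactly k balls.
    r_exact = Counter(counts.values())
    max_k = max(r_exact)
    # R*_{n,k}: number of urns containing at least k balls.
    r_star = {k: sum(v for j, v in r_exact.items() if j >= k)
              for k in range(1, max_k + 2)}
    # U_n: number of urns with an odd number of balls.
    u_n = sum(v for k, v in r_exact.items() if k % 2 == 1)
    # M_n = n * (total weight of empty urns) = n * (1 - weight of occupied urns).
    m_n = n * (1.0 - sum(urn_prob(i, theta) for i in counts))
    return r_exact, r_star, u_n, m_n
```

Calling `occupancy_statistics(5000, 0.5)` returns one realization; the identity $R_{n,k}=R^{*}_{n,k}-R^{*}_{n,k+1}$ from (1) holds by construction of the two dictionaries.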

We also use the notation $R_n{\mathop {=}\limits ^{\mathrm{def}}}R^{*}_{n,1}=\sum _{k\ge 1} R_{n,k}$ for the number of non-empty urns. Renumbering the urns if necessary, we may assume that the sequence $(p_i)_{i\ge 1}$ is monotonically non-increasing. We further assume that it is regularly varying:

$$\begin{aligned} \alpha (x)=\max \{i:\ p_i\ge 1/x\}=x^{\theta } L(x) \ \text {with} \theta \in [0,1], \end{aligned}$$

(3)

where L(x) is a slowly varying function as $x\rightarrow \infty $.
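
To make the counting function in (3) concrete, here is a small sketch under the same illustrative assumption on the weights, $p_i=i^{-a}-(i+1)^{-a}$ with $a=1/\theta -1$ (this choice is not from the paper). It computes $\alpha (x)=\max \{i: p_i\ge 1/x\}$ by bisection, using the monotonicity of $(p_i)$; the convention $\alpha (x)=0$ for an empty set is also an assumption of the sketch. For $\theta =1/2$ the weights reduce to $p_i=1/(i(i+1))$, so $\alpha (x)$ is the largest i with $i(i+1)\le x$ and $\alpha (x)/x^{1/2}\rightarrow 1$.

```python
def urn_prob(i, theta):
    """Assumed weights p_i = i^(-a) - (i+1)^(-a), a = 1/theta - 1 (illustrative choice)."""
    a = 1.0 / theta - 1.0
    return float(i) ** (-a) - float(i + 1) ** (-a)


def alpha(x, theta):
    """alpha(x) = max{i : p_i >= 1/x}; returns 0 when no urn qualifies (sketch convention)."""
    threshold = 1.0 / x
    if urn_prob(1, theta) < threshold:
        return 0
    # Exponential search for an index whose weight falls below the threshold.
    lo, hi = 1, 2
    while urn_prob(hi, theta) >= threshold:
        lo, hi = hi, 2 * hi
    # Bisection with the invariant p_lo >= 1/x > p_hi.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if urn_prob(mid, theta) >= threshold:
            lo = mid
        else:
            hi = mid
    return lo
```

With these weights, quadrupling x roughly doubles `alpha(x, 0.5)`, in line with the index $\theta =1/2$.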

Following Karlin’s [12] original approach, we will consider a Poissonized version of the model when the balls are put into urns at the times of jumps of a homogeneous Poisson point process $\Pi (s), s\ge 0$ with intensity 1 on $\mathbb {R}_+$. According to the independent marking theorem for Poisson processes, $\{J_i(\Pi (s)){\mathop {=}\limits ^{\mathrm{def}}}\Pi _i(s), \ s\ge 0\}$ are independent homogeneous Poisson processes with intensities $p_i$. To ease the notation, we write simply

$$\begin{aligned} R(s){\mathop {=}\limits ^{\mathrm{def}}}R^{*}_{\Pi (s),1},\ U(s){\mathop {=}\limits ^{\mathrm{def}}}U_{\Pi (s)}, \end{aligned}$$

and we introduce the following Poissonized version of the scaled missing mass:

$$\begin{aligned} M(s){\mathop {=}\limits ^{\mathrm{def}}}s \sum _{i=1}^\infty p_i \mathbb {1}_{\Pi _i(s)=0}. \end{aligned}$$

It differs from $M_{\Pi (s)}$ by the scaling factor s vs. $\Pi (s)$, but, when properly scaled, it is asymptotically equivalent to it.
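
Because the processes $\Pi _i(s)$ are independent Poisson processes with intensities $p_i$, the means of $R(s)$, $U(s)$ and $M(s)$ are explicit sums: $\mathbf{P}(\Pi _i(s)>0)=1-e^{-sp_i}$, $\mathbf{P}(\Pi _i(s)\ \text{is odd})=(1-e^{-2sp_i})/2$ and $\mathbf{P}(\Pi _i(s)=0)=e^{-sp_i}$. The sketch below evaluates them numerically. It is only an illustration: the weights $p_i=i^{-a}-(i+1)^{-a}$, $a=1/\theta -1$, and the truncation of the infinite sums at `n_urns` terms are assumptions of the sketch, and the function names are hypothetical.

```python
import math


def urn_prob(i, theta):
    """Assumed weights p_i = i^(-a) - (i+1)^(-a), a = 1/theta - 1 (illustrative choice)."""
    a = 1.0 / theta - 1.0
    return float(i) ** (-a) - float(i + 1) ** (-a)


def poissonized_means(s, theta, n_urns=20000):
    """Approximate (E R(s), E U(s), E M(s)) by summing over the first n_urns urns."""
    mean_r = mean_u = mean_m = 0.0
    for i in range(1, n_urns + 1):
        lam = s * urn_prob(i, theta)             # mean of Pi_i(s)
        mean_r += -math.expm1(-lam)              # P(Pi_i(s) > 0)
        mean_u += -0.5 * math.expm1(-2.0 * lam)  # P(Pi_i(s) is odd)
        mean_m += lam * math.exp(-lam)           # s * p_i * P(Pi_i(s) = 0)
    return mean_r, mean_u, mean_m
```

Since $(1-e^{-2x})/2\le 1-e^{-x}\le x$ for $x\ge 0$, the returned values always satisfy $\mathbf{E}U(s)\le \mathbf{E}R(s)\le s$.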

Ordinary (not functional) central limit theorems for the above quantities were established under various conditions in [2, 3, 9, 10, 12, 13, 14]. In particular, under rather general conditions on the sequence $(p_i)$ involving an unbounded growth of the variances, the following results are available: a strong law of large numbers and asymptotic normality of $R_n$, asymptotic normality of the vector $(R_{n,1},\dotsc ,R_{n,\nu })$, local limit theorems, etc.

We acknowledge a novel method of randomized decomposition for proving functional central limit theorems (FCLTs) developed in a recent paper [8], but we do not use it here. As a particular case of their Theorem 2.3, an FCLT holds for the processes $R_n$ and $U_n$ when $\theta \in (0,1)$.

Our goal here is to establish an FCLT for the triplet of processes: the occupancy, the odd occupancy and the scaled missing mass when $\theta \in (0,1]$. In particular, we obtain a previously unknown FCLT for $U_n$ for $\theta =1$ and for $M_n$ when $\theta \in (0,1]$. Up to a normalizing constant, the FCLT stated in Theorem 1 also holds for the original (non-scaled) missing mass $\sum _{i=1}^\infty p_i \mathbb {1}_{J_i(n)=0}$ on any interval $t\in [\varepsilon ,1]$, $\varepsilon >0$, separated from 0. The paper extends the results of [6] and [7], where an FCLT was shown under condition (3) for the vector process

$(R^{*}_{[nt],1},R^{*}_{[nt],2},\dots , R^{*}_{[nt],\nu })_{t\in [0,1]}$ in the case $\theta \in (0,1]$.

Extending the FCLT to the case $\theta =0$ would require conditions additional to (3). As mentioned in [12] and in [2], $\theta =0$ does not imply that the variances grow to infinity, and various asymptotic behaviors are possible for different statistics. We also argue that even an infinite growth of the variances does not by itself guarantee the required relative compactness.

When $\theta =1$, we need a function

$$\begin{aligned} L^{*}(x)=\int _0^\infty L(xs) e^{-s}s^{-1} \hbox {d}s. \end{aligned}$$

It is known (see [12]) that $L^{*}(x)$ is slowly varying as $x \rightarrow \infty $.

Finally, for $t\in [0,1]$ introduce the following notation:

$$\begin{aligned} \beta (n)&=\left\{ \begin{array}{ll} \alpha (n), &{} \theta \in [0,1); \\ nL^{*}(n), &{} \theta =1, \end{array} \right. \quad&R_n(t) =\frac{R_{[nt]}-{{\,\mathrm{\mathbf{E}}\,}}R_{[nt]}}{(\beta (n))^{1/2}}, \end{aligned}$$

(4)

$$\begin{aligned} U_n(t)&=\frac{U_{[nt]}-{{\,\mathrm{\mathbf{E}}\,}}U_{[nt]}}{(\beta (n))^{1/2}},&M_n(t) =\frac{M_{[nt]}-{{\,\mathrm{\mathbf{E}}\,}}M_{[nt]}}{(\alpha (n))^{1/2}}. \end{aligned}$$

(5)

We are now ready to formulate the main result of the paper.

Theorem 1

When $\theta \in (0, 1]$, the vector process

$$\begin{aligned} (R_n(t),U_n(t),M_n(t)),\quad t\in [0,1], \end{aligned}$$

converges weakly in the uniform metric on $D([0,1]^3)$ to a three-dimensional Gaussian process $(\rho (t),\upsilon (t),\mu (t))$ with zero mean and the covariance function $c(\tau ,t)$ with the following components: when $\theta \in (0,1), \ \tau \le t$,

$$\begin{aligned} c_{\rho \rho }(\tau ,t)&=\Gamma (1-\theta )((\tau +t)^\theta -t^\theta ), \\ c_{\upsilon \upsilon }(\tau ,t)&= \Gamma (1-\theta )2^{\theta -2}((t+\tau )^\theta -(t-\tau )^\theta ),\\ c_{\mu \mu }(\tau ,t)&=\theta \Gamma (2-\theta ) \left( \frac{\tau }{t^{1-\theta }}-\frac{t\tau }{(t+\tau )^{2-\theta }} \right) , \\ c_{\rho \upsilon }(\tau ,t)&=\Gamma (1-\theta )((2t+\tau )^{\theta } - (2t-\tau )^{\theta })/2, \\ c_{\rho \upsilon }(t,\tau )&=\Gamma (1-\theta )((2t+\tau )^\theta - t^\theta )/2,\\ c_{\rho \mu }(\tau ,t)&=\theta \Gamma (1-\theta )\left( \frac{t}{(t+\tau )^{1-\theta }} - t^\theta \right) , \\ c_{\rho \mu }(t,\tau )&=\theta \Gamma (1-\theta )\left( \frac{\tau }{(t+\tau )^{1-\theta }} -\frac{\tau }{ t^{1-\theta }} \right) ,\\ c_{\mu \upsilon }(\tau ,t)&=\theta \Gamma (1-\theta )\left( \frac{\tau }{2(2t+\tau )^{1-\theta }} -\frac{\tau }{2(2t-\tau )^{1-\theta }} \right) ,\\ c_{\mu \upsilon }(t,\tau )&=\theta \Gamma (1-\theta )\left( \frac{t}{2(2\tau +t)^{1-\theta }} -\frac{t^\theta }{2} \right) . \end{aligned}$$

When $\theta =1$, $ \tau \le t$, $c(\tau ,t)$ is given by

$$\begin{aligned} c_{\rho \rho }(\tau ,t)&= \tau , \ c_{\upsilon \upsilon }(\tau ,t)=2\tau , \ c_{\mu \mu }(\tau ,t)=\tau ^2, \\ c_{\rho \upsilon }(\tau ,t)&=\tau , \ c_{\rho \upsilon }(t,\tau )=(t+\tau )/2,\\ c_{\rho \mu }(\tau ,t)&=c_{\rho \mu }(t,\tau )=c_{\upsilon \mu }(\tau ,t)=c_{\upsilon \mu }(t,\tau )=0. \end{aligned}$$

Thus, when $\theta =1$, $\rho (t)$ and $\upsilon (t)$ are Wiener processes. For a general $\theta \in (0,1]$, the process $(\rho (t),\upsilon (t),\mu (t))$ is self-similar with the Hurst parameter $H=\theta /2$ which includes, in particular, a fractional Brownian motion, a bi-fractional Brownian motion with parameter $H = 1/2, K = \theta $ (see, e.g., [8]) with a new self-similar process $\mu (t)$.
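
The self-similarity with $H=\theta /2$ can be read off the covariances of Theorem 1: each component satisfies $c(a\tau ,at)=a^{\theta }c(\tau ,t)$ for $a>0$. The sketch below transcribes three of the diagonal components for $\theta \in (0,1)$, $\tau \le t$, and allows this scaling to be checked numerically. The function names are hypothetical and the code is an illustration only.

```python
import math


def c_rho_rho(tau, t, theta):
    """Covariance of rho for theta in (0,1) and tau <= t, as stated in Theorem 1."""
    return math.gamma(1.0 - theta) * ((tau + t) ** theta - t ** theta)


def c_ups_ups(tau, t, theta):
    """Covariance of upsilon for theta in (0,1) and tau <= t, as stated in Theorem 1."""
    return (math.gamma(1.0 - theta) * 2.0 ** (theta - 2.0)
            * ((t + tau) ** theta - (t - tau) ** theta))


def c_mu_mu(tau, t, theta):
    """Covariance of mu for theta in (0,1) and tau <= t, as stated in Theorem 1."""
    return theta * math.gamma(2.0 - theta) * (
        tau / t ** (1.0 - theta) - t * tau / (t + tau) ** (2.0 - theta)
    )


def scaling_defect(cov, tau, t, theta, a):
    """Return cov(a*tau, a*t) - a^theta * cov(tau, t); zero up to rounding if self-similar."""
    return cov(a * tau, a * t, theta) - a ** theta * cov(tau, t, theta)
```

For instance, `scaling_defect(c_mu_mu, 0.3, 0.7, 0.4, 0.5)` should vanish up to floating-point rounding.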

2 Proof of Theorem 1

We start by formulating a couple of lemmas proved in [7]. We will generally use the letter C and its variants to denote a constant whose value is of no importance to us, and note in parentheses the parameters it depends upon. This should not lead to confusion when the same notation is used for different constants in different contexts, in the same way the O(1) notation is used.

Lemma 1

When $\theta >0$, there exist $n_0\ge 1$ and $C(\theta )<\infty $ such that

$$\begin{aligned} \frac{{{\,\mathrm{\mathbf{E}}\,}}R(n\delta )}{\beta (n)} \le C(\theta ) \delta ^{\theta /2} \end{aligned}$$

holds for any $\delta \in [0,1]$ and $n\ge n_0$.

Lemma 2

For any $\varepsilon , \delta \in (0,1)$ there exists an $N=N(\varepsilon ,\delta )$ such that for any $n\ge N$,

$$\begin{aligned} \mathbf {P}(\forall t\in [0,1] \ \ \exists \tau : |\tau -t|\le \delta , \ \Pi (n\tau )= [nt]) \ge 1-\varepsilon . \end{aligned}$$

In preparation for the proof, let us introduce some further notation and establish a few inequalities that we will use.

In view of (5), let

$$\begin{aligned} U_n^{*}(t)&=\frac{U (nt)-{{\,\mathrm{\mathbf{E}}\,}}U (nt)}{(\beta (n))^{1/2}},&U_n^{**}(t)=\frac{U ([nt])-{{\,\mathrm{\mathbf{E}}\,}}U ([nt])}{(\beta (n))^{1/2}} \end{aligned}$$

(6)

$$\begin{aligned} M_n^{*}(t)&=\frac{M (nt)-{{\,\mathrm{\mathbf{E}}\,}}M (nt)}{(\alpha (n))^{1/2}},&M_n^{**}(t)=\frac{M ([nt])-{{\,\mathrm{\mathbf{E}}\,}}M ([nt])}{(\alpha (n))^{1/2}}. \end{aligned}$$

(7)

For any two positive $\tau _1\le \tau _2$, define

$$\begin{aligned} U(\tau _2)-U(\tau _1)&=\sum _{i=1}^{\infty }\mathbb {1}\{\Pi _i(\tau _2)\ \text {is odd}\}-\mathbb {1}\{\Pi _i(\tau _1)\ \text {is odd}\}\\&=\sum _{i=1}^{\infty }\mathbb {1}\{\Pi _i(\tau _2) \ \text {is odd}, \Pi _i(\tau _1) \ \text {is even}\}\\&\quad - \mathbb {1}\{\Pi _i(\tau _2) \ \text {is even}, \Pi _i(\tau _1) \ \text {is odd}\}\\&{\mathop {=}\limits ^{\mathrm{def}}}\sum _{i=1}^{\infty } u_{i}(\tau _1,\tau _2) = \sum _{i=1}^{\infty } u_{i} =\sum _{i=1}^{\infty } u'_{i}-u''_i, \end{aligned}$$

and their expectations are denoted by

$$\begin{aligned} \overline{u}_i =\overline{u}_i'-\overline{u}_i''=\overline{u}_i(\tau _1,\tau _2){\mathop {=}\limits ^{\mathrm{def}}}{{\,\mathrm{\mathbf{E}}\,}}u'_i -{{\,\mathrm{\mathbf{E}}\,}}u''_i. \end{aligned}$$

Similarly for M,

$$\begin{aligned} M(\tau _2)-M(\tau _1)&=\sum _{i=1}^{\infty }(\tau _2-\tau _1)p_i\mathbb {1}\{\Pi _i(\tau _2)=0\}- \tau _1 p_i\mathbb {1}\{\Pi _i(\tau _1)=0,\Pi _i(\tau _2)>0\}\\&{\mathop {=}\limits ^{\mathrm{def}}}\sum _{i=1}^{\infty } m_{i}(\tau _1,\tau _2)=\sum _{i=1}^{\infty } m_{i}=\sum _{i=1}^{\infty } m'_{i}-m''_i,\\ \overline{m}_i&=\overline{m}_i'-\overline{m}_i''=\overline{m}_i(\tau _1,\tau _2){\mathop {=}\limits ^{\mathrm{def}}}{{\,\mathrm{\mathbf{E}}\,}}m'_i -{{\,\mathrm{\mathbf{E}}\,}}m''_i. \end{aligned}$$

Clearly, for all natural k,

$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}|u_i-\overline{u}_i|^k&= |1+\overline{u}_i|^k \overline{u}_i''+|\overline{u}_i|^k (1-\overline{u}_i'-\overline{u}_i'')+|1-\overline{u}_i|^k \overline{u}_i' \nonumber \\&\le 2^k (\overline{u}_i'+\overline{u}_i'')+|\overline{u}_i|^k\le (2^k+1) (\overline{u}_i'+\overline{u}_i'') \nonumber \\&=(2^k+1)\left[ \sum _{j=0}^\infty \mathbf {P}\{\Pi _i(\tau _1)=2j, \ \Pi _i(\tau _2)-\Pi _i(\tau _1)\ \text {is odd}\} \nonumber \right. \\&\quad \left. +\sum _{j=0}^\infty \mathbf {P}\{\Pi _i(\tau _1)=2j+1, \ \Pi _i(\tau _2)-\Pi _i(\tau _1)\ \text {is odd}\}\right] \nonumber \\&=(2^k+1)\mathbf {P}\{\Pi _i(\tau _2-\tau _1)\ \text {is odd}\} \nonumber \\&<(2^k+1) \mathbf {P}\{\Pi _i(\tau _2-\tau _1)>0\}. \end{aligned}$$

(8)

Similarly,

$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}|m'_i-\overline{m}'_i|^k&\le 2^{k-1}({{\,\mathrm{\mathbf{E}}\,}}|m'_i|^k+|\overline{m}'_i|^k)=2^{k-1}(\tau _2-\tau _1)^kp^k_i(e^{-\tau _2 p_i}+e^{-k\tau _2 p_i})\\&<2^k k! (1-e^{-(\tau _2-\tau _1)p_i})= 2^k k!\, \mathbf {P}\{\Pi _i(\tau _2-\tau _1)>0\},\\ {{\,\mathrm{\mathbf{E}}\,}}|m''_i-\overline{m}''_i|^k&\le 2^{k-1}({{\,\mathrm{\mathbf{E}}\,}}|m''_i|^k+|\overline{m}''_i|^k)<2^{k}\tau _1^kp^k_i e^{-\tau _1 p_i}(1-e^{-(\tau _2-\tau _1)p_i})\\&<2^k k! (1-e^{-(\tau _2-\tau _1)p_i})= 2^k k! \,\mathbf {P}\{\Pi _i(\tau _2-\tau _1)>0\}. \end{aligned}$$

As a result,

$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}|m_i-\overline{m}_i|^k<4^{k}k! \,\mathbf {P}\{\Pi _i(\tau _2-\tau _1)>0\}. \end{aligned}$$

(9)
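
Inequality (8) can be checked numerically for a single urn. The sketch below is an illustration only, with hypothetical function names. It uses the independence of Poisson increments to write $\overline{u}_i'=\mathbf{P}(\Pi _i(\tau _1)\ \text{even})\,\mathbf{P}(\Pi _i(\tau _2-\tau _1)\ \text{odd})$ and $\overline{u}_i''=\mathbf{P}(\Pi _i(\tau _1)\ \text{odd})\,\mathbf{P}(\Pi _i(\tau _2-\tau _1)\ \text{odd})$, then evaluates the exact centered moment from the first line of (8) and the final bound.

```python
import math


def parity_probs(lam):
    """Return (P(N even), P(N odd)) for a Poisson variable N with mean lam."""
    even = 0.5 * (1.0 + math.exp(-2.0 * lam))
    return even, 1.0 - even


def centered_moment_and_bound(p, tau1, tau2, k):
    """Exact E|u_i - E u_i|^k and the bound (2^k + 1) P(Pi_i(tau2 - tau1) > 0) from (8)."""
    even1, odd1 = parity_probs(p * tau1)
    _, odd_increment = parity_probs(p * (tau2 - tau1))
    u_up = even1 * odd_increment    # E u'_i: parity switches from even to odd
    u_down = odd1 * odd_increment   # E u''_i: parity switches from odd to even
    u_bar = u_up - u_down
    moment = (abs(1.0 + u_bar) ** k * u_down
              + abs(u_bar) ** k * (1.0 - u_up - u_down)
              + abs(1.0 - u_bar) ** k * u_up)
    bound = (2 ** k + 1) * -math.expm1(-p * (tau2 - tau1))
    return moment, bound
```

For example, `centered_moment_and_bound(0.3, 1.0, 1.5, 2)` returns a pair whose first entry is strictly smaller than the second.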

We use the same notation $u_i, \ m_i$ and $\overline{u}_i, \ \overline{m}_i$ without explicitly specifying the corresponding values of $\tau _1<\tau _2$; this should not cause confusion. The following lemma will be used in the proof of the relative compactness of the process $M^{*}_n(t)$.

Lemma 3

Let $\theta \in (0,1]$ and $\delta \in [0,1]$. Then, there exist $n_0\ge 1$ and $C(\theta )<\infty $ such that

$$\begin{aligned} \frac{{{\,\mathrm{\mathbf{var}}\,}}(M(nt_2)-M(nt_1))}{\alpha (n)} \le C(\theta )\delta ^{\theta /2} \end{aligned}$$

for all $t_2-t_1=\delta \ge 0$ and $n\ge n_0$.

Proof

Put $\tau _2=n t_2$ and $\tau _1=n t_1$. Since the variance of an indicator does not exceed its expectation, we have that

$$\begin{aligned}&{{\,\mathrm{\mathbf{var}}\,}}(M(\tau _2)-M(\tau _1))=\sum _{i=1}^{\infty }{{\,\mathrm{\mathbf{E}}\,}}(m_i-\overline{m}_i)^2 = \sum _{i=1}^{\infty } {{\,\mathrm{\mathbf{E}}\,}}(m'_i)^2-(\overline{m}'_i-\overline{m}_i'')^2+{{\,\mathrm{\mathbf{E}}\,}}(m''_i)^2 \\&\quad \le \sum _{i=1}^{\infty } (\tau _2{-}\tau _1)^2p_i^2 e^{-\tau _2 p_i} +\tau _1^2 p_i^2 e^{-\tau _1 p_i}(1-e^{{-}(\tau _2{-}\tau _1)p_i}\pm (\tau _2{-}\tau _1)p_i e^{-(\tau _2-\tau _1)p_i})\\&\quad \le 2\frac{(\tau _2-\tau _1)^2}{\tau _2^2}{{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (\tau _2),2}+{{\,\mathrm{\mathbf{E}}\,}}R^{*}_{\Pi (\tau _2-\tau _1),2} +6\frac{\tau _1^2(\tau _2-\tau _1)}{\tau _2^3}{{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (\tau _2),3}. \end{aligned}$$

By [12, Th. 2.1 and (23)],

$$\begin{aligned} \lim \limits _{x\rightarrow \infty }\frac{{{\,\mathrm{\mathbf{E}}\,}}R^{*}_{\Pi (x),2}}{\alpha (x)}=\Gamma (2-\theta )<2, \end{aligned}$$

and therefore, there exists an $x_1>1$ such that for all $x\ge x_1$,

$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (x),2}+{{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (x),3}<{{\,\mathrm{\mathbf{E}}\,}}R^{*}_{\Pi (x),2}< 2 \alpha (x). \end{aligned}$$

According to Karamata (see, e.g., [5, Th. 2,1, Eq. A6.2.10]), there exists an $x_2>0$ such that for all x and $\delta \in (0,1]$ satisfying $x\delta \ge x_2$, one has

$$\begin{aligned} \frac{L(x\delta )}{L(x)}\le 2 \delta ^{-1/2}. \end{aligned}$$

Let $n\delta >\max \{x_1,x_2\}=x_0$, then

$$\begin{aligned} \frac{{{\,\mathrm{\mathbf{E}}\,}}R^{*}_{\Pi (n \delta ),2}}{\alpha (n)} \le 2\frac{(n \delta )^\theta L(n \delta )}{n^\theta L(n)} \le 4 \delta ^{\theta /2}, \ \ \ \frac{\max ({{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (n t_2),2},{{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (n t_2),3})}{\alpha (n)} \le 4 t_2^{\theta /2}. \end{aligned}$$

Choose $n_0$ such that for all $n\ge n_0$ we have $n^\theta L(n)\ge n^{\theta /2}$. Then, provided $n t_2 \le x_0$,

$$\begin{aligned}&\frac{{{\,\mathrm{\mathbf{E}}\,}}R^{*}_{\Pi (n\delta ),2}}{\alpha (n)} \le \frac{{{\,\mathrm{\mathbf{E}}\,}}{\Pi (n\delta )}}{\alpha (n)} \le \frac{n\delta }{n^{\theta /2}}=(n\delta )^{1-\theta /2}\delta ^{\theta /2}\le x_0 \delta ^{\theta /2}, \\&\frac{\max ({{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (n t_2),2},{{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (n t_2),3})}{\alpha (n)} \le x_0 t_2^{\theta /2}. \end{aligned}$$

Now, take $c=\max \{4, x_0 \}$. Since $t_2-t_1=\delta \ge 0$, for all $n\ge n_0$ we obtain

$$\begin{aligned} \frac{{{\,\mathrm{\mathbf{var}}\,}}(M(nt_2)-M(nt_1))}{\alpha (n)}\le 2 c \frac{\delta ^2}{t_2^{2-\theta /2}}+\delta ^{\theta /2}+6 c \frac{t_1^2 \delta }{t_2^{3-\theta /2}}\le 9c \cdot \delta ^{\theta /2}. \end{aligned}$$

$\square $

We are ready to prove Theorem 1. The proof is broken into four steps.

Step 1: Covariance The first, rather technical, step consists in establishing a formula for the covariances; it is deferred to the Appendix.

Step 2: Convergence of finite-dimensional distributions Along the lines of the proof of [9, Th. 12], one can show that for

$$\begin{aligned} m\ge 1, \ \ \ 0<t_1<t_2<\ldots <t_m\le 1 \end{aligned}$$

the triangular array of m-dimensional vectors (i.e., independent in k for every n)

$$\begin{aligned} \left\{ \frac{\mathbb {1}(\Pi _k(nt_j)\ \text {is odd})-\mathbf {P}(\Pi _k(nt_j)\ \text {is odd})}{\sqrt{\beta (n)}}, \ j \le m, \ k\le n \right\} _{n\ge 1} \end{aligned}$$

satisfies the Lindeberg condition (see, e.g., [5, Th. 6.2]). Similarly, the convergence of the finite-dimensional distributions is shown for the process $M^{*}_n(t)$.

Step 3: Relative compactness

We proceed according to the following plan:

(a) prove the continuity of the limiting process;
(b) prove that $U_n^{*}$ and $U_n^{**}$ ($M_n^{*}$ and $M_n^{**}$) are sufficiently close;
(c) prove the relative compactness of $U_n^{**}$ ($M_n^{**}$).

a(U) Take $\tau _1=n t_1, \ \tau _2=n t_2$ for $0<t_1<t_2\le 1$. Then,
$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}(U^{*}_n(t_2)-U^{*}_n(t_1))^2= & {} {{\,\mathrm{\mathbf{E}}\,}}\Big (\sum \limits _{i=1}^{\infty } (u_i-\overline{u}_i) \Big )^2/\beta (n) = \sum \limits _{i=1}^{\infty } {{\,\mathrm{\mathbf{E}}\,}}(u_i-\overline{u}_i)^2/\beta (n)\\&\quad \le 5 \sum \limits _{i=1}^{\infty } \mathbf {P}(\Pi _i(\tau _2-\tau _1){>}0) /\beta (n){=} 5 {{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (\tau _2{-}\tau _1)}/\beta (n)\\&\quad \le 5 C(\theta ) (t_2-t_1)^{\theta /2}. \end{aligned}$$
We have used above the independence of the summands, inequality (8) and Lemma 1. Since the covariance function has a limit, [1, Th. 1.4] will imply that the limiting Gaussian process a.s. has a continuous modification on [0, 1]. Since the trajectories of the limiting Gaussian process belong a.s. to the class C(0, 1), the weak convergence in the Skorohod topology implies the weak convergence in the uniform metric, see, e.g., [4]. Therefore, it is sufficient to prove the relative compactness of $\{U^{*}_n\}_{n\ge n_0}$ (with $n_0$ as in Lemma 1) in the Skorohod topology.
b(U) Since with probability one we have
$$\begin{aligned} |U(nt)-U([nt])|\le \Pi (nt)-\Pi ([nt])\le \Pi ([nt]+1)-\Pi ([nt]), \end{aligned}$$
then
$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}|U(nt)-U([nt])|\le 1. \end{aligned}$$
Hence, for all $\eta >0$,
$$\begin{aligned}&\mathbf {P}(\sup \limits _{0\le t\le 1} |U^{*}_n(t)-U^{**}_n(t)|>\eta )\\&\quad \le \mathbf {P}(\sup \limits _{0\le t \le 1} (|U(nt)-U([nt])|+ {{\,\mathrm{\mathbf{E}}\,}}|U(nt)-U([nt])|)>\eta \sqrt{\beta (n)})\\&\quad \le \mathbf {P}(\sup \limits _{0\le t \le 1} (\Pi ([nt]+1)-\Pi ([nt])+ 1)>\eta \sqrt{\beta (n)})\\&\quad = \mathbf {P}(\sup \limits _{0\le m \le n} (\Pi (m+1)-\Pi (m)+1)>\eta \sqrt{\beta (n)})\\&\quad \le \sum \limits _{m=0}^n \mathbf {P}(\Pi (m+1)-\Pi (m)+1>\eta \sqrt{\beta (n)})\\&\quad \le \sum \limits _{m=0}^n \frac{{{\,\mathrm{\mathbf{E}}\,}}e^{\Pi (m+1)-\Pi (m)+1}}{ e^{\eta \sqrt{\beta (n)}}} =(n+1)\frac{{{\,\mathrm{\mathbf{E}}\,}}e^{\Pi (1)}}{e^{\eta \sqrt{\beta (n)}-1}} =(n+1)e^{e-\eta \sqrt{\beta (n)}} \rightarrow 0 \end{aligned}$$
when $ n\rightarrow \infty $. Therefore, it is sufficient to show the relative compactness of $\{U^{**}_n\}_{n\ge n_0}$ (with $n_0$ as in Lemma 1) in the Skorohod topology.
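
The final bound $(n+1)e^{e-\eta \sqrt{\beta (n)}}$ in the last display decays because $\sqrt{\beta (n)}$ grows like a power of n while the prefactor is only linear. The sketch below evaluates it under the simplifying assumption $L\equiv 1$ and $\theta \in (0,1)$, so that $\beta (n)=n^{\theta }$; this assumption and the function name are illustrative and not from the paper.

```python
import math


def tail_bound_u(n, eta, theta):
    """Evaluate (n + 1) * exp(e - eta * sqrt(beta(n))) with the assumed beta(n) = n^theta."""
    beta = float(n) ** theta  # assumption: slowly varying factor L is identically 1
    return (n + 1) * math.exp(math.e - eta * math.sqrt(beta))
```

The bound need not be small for moderate n (for $\eta =\theta =1/2$ it still exceeds 1 at $n=10^6$), but it eventually decreases faster than any power of n.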
c(U) For any $t_1, \ t_2 \in [0,1]$ satisfying $ \frac{1}{2n}\le t_2-t_1$ we have that
$$\begin{aligned}{}[nt_2]-[nt_1]\le n(t_2-t_1)+1 \le n(t_2-t_1)+2n(t_2-t_1)= 3n(t_2-t_1) \end{aligned}$$
$$\begin{aligned} \le 3n(t_2-t_1)\cdot (2n(t_2-t_1))^3=24n^4(t_2-t_1)^4. \end{aligned}$$
(10)
Put $k=[16/\theta ]+1$, $\tau _1=[n t_1], \ \tau _2=[n t_2]$.
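
The chain of inequalities (10) relies only on $n(t_2-t_1)\ge 1/2$, which gives $1\le 2n(t_2-t_1)$. The following sketch, with a hypothetical function name, returns the four quantities in the chain so that the ordering can be inspected for concrete values; it is an illustration and plays no role in the proof.

```python
import math


def chain_10(n, t1, t2):
    """Return ([n t2] - [n t1], n*gap + 1, 3*n*gap, 24*n^4*gap^4) for gap = t2 - t1."""
    gap = t2 - t1
    lhs = math.floor(n * t2) - math.floor(n * t1)
    return lhs, n * gap + 1.0, 3.0 * n * gap, 24.0 * n ** 4 * gap ** 4
```

For $t_2-t_1=1/(2n)$ the last three quantities all equal 3/2, so the constants in (10) cannot be improved by this argument.
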

Recall the Rosenthal inequality [15]: if $\varphi _i$ are independent random variables with ${{\,\mathrm{\mathbf{E}}\,}}\varphi _i=0$, then for all $k\ge 2$ there exists a constant c(k) such that

$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}\Big |\sum _i \varphi _i\Big |^k\le c(k)\max \bigg \{\sum _i{{\,\mathrm{\mathbf{E}}\,}}|\varphi _i|^k, \Big (\sum _i {{\,\mathrm{\mathbf{E}}\,}}\varphi _i^2\Big )^{k/2}\bigg \}. \end{aligned}$$

(11)

For all $n\ge n_0$ (with $n_0$ as in Lemma 1), we then have

$$\begin{aligned}&{{\,\mathrm{\mathbf{E}}\,}}|U^{**}_n(t_2)-U^{**}_n(t_1)|^k=\frac{{{\,\mathrm{\mathbf{E}}\,}}\Big |\sum \limits _{i=1}^{\infty } (u_i-\overline{u}_i)\Big |^k}{(\beta (n))^{k/2}} \\&\quad \le \frac{c(k)}{(\beta (n))^{k/2}} \bigg ( \sum \limits _{i=1}^{\infty } {{\,\mathrm{\mathbf{E}}\,}}| u_i-\overline{u}_i|^{k}+ \Big ( \sum \limits _{i=1}^{\infty } {{\,\mathrm{\mathbf{E}}\,}}( u_i-\overline{u}_i)^2\Big )^{k/2}\bigg )\\&\quad \le \frac{C(k)}{(\beta (n))^{k/2}} \bigg (\sum \limits _{i=1}^{\infty } \mathbf {P}(\Pi _i(\tau _2-\tau _1)>0) + \Big ( \sum \limits _{i=1}^{\infty } \mathbf {P}(\Pi _i(\tau _2-\tau _1)>0) \Big )^{k/2}\bigg )\\&\quad =\frac{C(k)}{(\beta (n))^{k/2}} \left( {{\,\mathrm{\mathbf{E}}\,}}R(\tau _2-\tau _1) + \left( {{\,\mathrm{\mathbf{E}}\,}}R(\tau _2-\tau _1) \right) ^{k/2}\right) \\&\quad \le \frac{C(k)}{(\beta (n))^{k/2}}\left( 24n^4(t_2-t_1)^4+ ({{\,\mathrm{\mathbf{E}}\,}}R(3n(t_2-t_1)))^{k/2}\right) \le \widetilde{C}(\theta )(t_2-t_1)^4, \end{aligned}$$

where c(k), C(k) and $\widetilde{C}(\theta )$ depend only on their arguments.

Above, we have used (11) in the first inequality, (8) in the second, and finally (10) and Lemma 1 along with the bound

$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}R(\tau _2-\tau _1) \le {{\,\mathrm{\mathbf{E}}\,}}(\Pi ([nt_2])-\Pi ([nt_1]))=[nt_2]-[nt_1]. \end{aligned}$$

(12)

If $0\le t_2-t_1<\frac{1}{n}$, then $[nt_1]=[nt]$ or $[nt_2]=[nt]$ for all $t\in [t_1, t_2]$; therefore,

$$\begin{aligned} D{\mathop {=}\limits ^{\mathrm{def}}}{{\,\mathrm{\mathbf{E}}\,}}(|{U}^{**}_n(t)-{U}^{**}_n(t_1)|^{k/2} |{U}^{**}_n(t_2)-{U}^{**}_n(t)|^{k/2})=0\le (t_2-t_1)^2. \end{aligned}$$

If $t_2-t_1\ge 1/n$, then there are the following three cases:

1. If $t_2-t\ge \frac{1}{2n}$, $t-t_1\ge \frac{1}{2n}$, then the Cauchy–Schwarz inequality implies
$$\begin{aligned} D\le \widetilde{C}(\theta ) (t_2-t)^2\cdot (t-t_1)^2 \le \widetilde{C}(\theta )(t_2-t_1)^2. \end{aligned}$$
2. If $t_2-t\ge \frac{1}{2n}$, $t-t_1< \frac{1}{2n}$, then since
$$\begin{aligned} |U([nt])-U([nt_1])|\le _{\text {a.s.}} \Pi ([nt])-\Pi ([nt_1])\le _{st}\Pi (1), \end{aligned}$$
the same inequality yields
$$\begin{aligned} D\le \left( \widetilde{C}(\theta ) (t_2-t)^4 \cdot {{\,\mathrm{\mathbf{E}}\,}}\left( \frac{\Pi (1)+1}{\sqrt{\beta (n)}}\right) ^{k}\right) ^{1/2}\le \widehat{C}(\theta )(t_2-t_1)^2. \end{aligned}$$
3. If $t_2-t< \frac{1}{2n}$, $t-t_1\ge \frac{1}{2n}$, then since
$$\begin{aligned} |U([nt_2])-U([nt])|\le _{\text {a.s.}} \Pi ([nt_2])-\Pi ([nt])\le _{st}\Pi (1), \end{aligned}$$
we have that
$$\begin{aligned} D\le \left( {{\,\mathrm{\mathbf{E}}\,}}\left( \frac{\Pi (1)+1}{\sqrt{\beta (n)}}\right) ^{k}\cdot \widetilde{C}(\theta ) (t-t_1)^4\right) ^{1/2}\le \widehat{C}(\theta )(t_2-t_1)^2. \end{aligned}$$
Now, the relative compactness follows from, for example, [4, Th. 13.5].

a(M) Because the covariance function has a limit, it is sufficient to appeal to Lemma 3 and [1, Th. 1.4] to establish the existence of an almost surely continuous modification of the limiting Gaussian process on [0, 1]. Since the trajectories of this process are a.s. in C(0, 1), the weak convergence in the Skorohod topology implies the weak convergence in the uniform metric, see [4]. Thus, it is sufficient to prove the relative compactness of the family $\{M^{*}_n\}_{n\ge n_0}$ in the Skorohod topology (here, $n_0$ is the same as in Lemma 1).
b(M) Set $\tau _2=nt$ and $\tau _1=[nt]$. Since $\tau _2-\tau _1\le 1$,
$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}|M(\tau _2)-M(\tau _1)|\le & {} \sum _{i=1}^{\infty }(\tau _2-\tau _1)p_i e^{-p_i \tau _2} \\&\quad + \tau _1 p_i e^{-p_i \tau _1}(1-e^{-p_i (\tau _2-\tau _1)})\\&\le \sum _{i=1}^{\infty } p_i e^{-p_i \tau _2} + e^{-1}p_i (\tau _2-\tau _1)<\sum _{i=1}^{\infty } 2p_i=2. \end{aligned}$$

Let $m'''_i = m''_i(\tau _1,\tau _1+1)$ and $\overline{m}'''_i={{\,\mathrm{\mathbf{E}}\,}}m'''_i$. Then, we have almost surely

$$\begin{aligned} |M(\tau _2)-M(\tau _1)|\le & {} \sum _{i=1}^{\infty } (m'_i+m''_i) \le \sum _{i=1}^{\infty } (p_i +m'''_i)\\&\quad =1+\sum _{i=1}^{\infty } (m'''_i + \overline{m}'''_i - \overline{m}'''_i) <2+\left| \sum _{i=1}^{\infty } (m'''_i - \overline{m}'''_i) \right| . \end{aligned}$$

We know that for any integer $k\ge 2$

$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}| m'''_i - {{\,\mathrm{\mathbf{E}}\,}}m'''_i |^k<2^k k! \mathbf {P}(\Pi _i(\tau _1+1-\tau _1)>0)=2^k k!(1-e^{-p_i})< 2^k k! p_i. \end{aligned}$$

Using the independence of the terms and Rosenthal inequality, for any $k\ge 2$,

$$\begin{aligned}&{{\,\mathrm{\mathbf{E}}\,}}\left| \sum _{i=1}^{\infty } (m'''_i - \overline{m}'''_i )\right| ^k\\&\quad \le c(k) \left( \sum \limits _{i=1}^{\infty } {{\,\mathrm{\mathbf{E}}\,}}| m'''_i - \overline{m}'''_i |^k+ \left( \sum \limits _{i=1}^{\infty } {{\,\mathrm{\mathbf{E}}\,}}( m'''_i - \overline{m}'''_i )^2\right) ^{k/2}\right) \\&\quad <c(k)(2^kk! + 4^k)=C(k). \end{aligned}$$

Hence, for $k\ge [2/\theta ]+1$ and all $\eta >0$

$$\begin{aligned}&\mathbf {P}\left( \sup \limits _{0\le t\le 1} |M^{*}_n(t)-M^{**}_n(t)|>\eta \right) \\&\quad \le \mathbf {P}(\sup \limits _{0\le t \le 1} (|M(nt)-M([nt])|+ {{\,\mathrm{\mathbf{E}}\,}}|M(nt)-M([nt])|)>\eta \sqrt{\alpha (n)})\\&\quad \le \mathbf {P}\left( \max \limits _{0\le [nt] \le n} \left( \left| \sum _{i=1}^{\infty } m'''_i - {{\,\mathrm{\mathbf{E}}\,}}m'''_i \right| + 4\right)>\eta \sqrt{\alpha (n)}\right) \\&\quad \le \sum \limits _{[nt]=m\in \{0,1,\ldots ,n\}} \mathbf {P}\left( \left| \sum _{i=1}^{\infty } m'''_i - {{\,\mathrm{\mathbf{E}}\,}}m'''_i \right| + 4>\eta \sqrt{\alpha (n)}\right) \\&\quad \le \sum \limits _{m=0}^n \frac{C(k)}{ (\eta \sqrt{\alpha (n)}-4)^k} =\frac{C(k)(n+1)}{ (\eta \sqrt{\alpha (n)}-4)^k} \rightarrow 0 \ \text {when} \ n\rightarrow \infty . \end{aligned}$$

Therefore, it is sufficient to show the relative compactness of $\{M^{**}_n\}_{n\ge n_0}$ in the Skorohod topology.
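
The choice $k\ge [2/\theta ]+1$ in step b(M) guarantees $k\theta /2>1$, so that $(\alpha (n))^{k/2}$ outgrows the factor $n+1$. The sketch below evaluates the last bound of that step under illustrative assumptions that are not from the paper: $L\equiv 1$, so $\alpha (n)=n^{\theta }$, and the unspecified constant $C(k)$ is set to a placeholder value `c_k=1.0`. The function names are hypothetical.

```python
import math


def moment_order(theta):
    """Smallest admissible moment order k = [2/theta] + 1 used in step b(M)."""
    return math.floor(2.0 / theta) + 1


def tail_bound_m(n, eta, theta, c_k=1.0):
    """Evaluate c_k * (n + 1) / (eta * sqrt(alpha(n)) - 4)^k with assumed alpha(n) = n^theta."""
    k = moment_order(theta)
    margin = eta * math.sqrt(float(n) ** theta) - 4.0
    if margin <= 0.0:
        raise ValueError("the bound applies only when eta * sqrt(alpha(n)) > 4")
    return c_k * (n + 1) / margin ** k
```

For $\theta =1/2$ this gives $k=5$ and a decay of order $n^{1-k\theta /2}=n^{-1/4}$, which is slow but sufficient.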

c(M) Let $t_1, \ t_2 \in [0,1]$ and $ \frac{1}{2n}\le t_2-t_1$, then (10) holds. Set $k=[16/\theta ]+1$, $\tau _1=[n t_1], \ \tau _2=[n t_2]$.

Again, by independence and the Rosenthal inequality,

$$\begin{aligned}&{{\,\mathrm{\mathbf{E}}\,}}|M^{**}_n(t_2)-M^{**}_n(t_1)|^{k}=\frac{{{\,\mathrm{\mathbf{E}}\,}}\left| \sum \nolimits _{i=1}^{\infty }(m_i-\overline{m}_i)\right| ^k}{(\alpha (n))^{k/2}} \\&\quad \le \frac{c(k)}{(\alpha (n))^{k/2}} \left( \sum \limits _{i=1}^{\infty } {{\,\mathrm{\mathbf{E}}\,}}| m_i-\overline{m}_i|^{k}+ \left( \sum \limits _{i=1}^{\infty } {{\,\mathrm{\mathbf{E}}\,}}( m_i-\overline{m}_i)^2\right) ^{k/2}\right) \\&\quad \le \frac{C(k)}{(\alpha (n))^{k/2}} \left( \sum \limits _{i=1}^{\infty } \mathbf {P}(\Pi _i(\tau _2-\tau _1)>0) + \left( {{\,\mathrm{\mathbf{var}}\,}}(M(\tau _2)-M(\tau _1)) \right) ^{k/2}\right) \\&\quad =\frac{C(k)}{(\alpha (n))^{k/2}} \left( {{\,\mathrm{\mathbf{E}}\,}}R(\tau _2-\tau _1) + \left( {{\,\mathrm{\mathbf{var}}\,}}(M(\tau _2)-M(\tau _1)) \right) ^{k/2}\right) \\&\quad \le \frac{C(k)}{(\alpha (n))^{k/2}}\left( 24n^4(t_2-t_1)^4+ (C(\theta ) \alpha (n)((\tau _2-\tau _1)/n)^{\theta /2})^{k/2}\right) \le \widetilde{C}(\theta )(t_2-t_1)^4, \end{aligned}$$

where c(k), C(k) and $\widetilde{C}(\theta )$ depend only on their arguments.

Above, we have used inequalities (9), (10) and Lemmas 1 and 3 along with the bound

$$\begin{aligned} {{\,\mathrm{\mathbf{E}}\,}}R(\tau _2-\tau _1) \le {{\,\mathrm{\mathbf{E}}\,}}(\Pi ([nt_2]-[nt_1]))=[nt_2]-[nt_1]. \end{aligned}$$

When $0\le t_2-t_1<\frac{1}{n}$, then $[nt_1]=[nt]$ or $[nt_2]=[nt]$ for any $t\in [t_1, t_2]$. Thus,

$$\begin{aligned} B{\mathop {=}\limits ^{\mathrm{def}}}{{\,\mathrm{\mathbf{E}}\,}}(|{M}^{**}_n(t)-{M}^{**}_n(t_1)|^{k/2} |{M}^{**}_n(t_2)-{M}^{**}_n(t)|^{k/2})=0\le (t_2-t_1)^2. \end{aligned}$$

When $t_2-t_1\ge 1/n$, we have the following three cases:

1. if $t_2-t\ge \frac{1}{2n}$, $t-t_1\ge \frac{1}{2n}$, then the Cauchy–Schwarz inequality gives
$$\begin{aligned} B\le \widetilde{C}(\theta ) (t_2-t)^2\cdot (t-t_1)^2 \le \widetilde{C}(\theta )(t_2-t_1)^2; \end{aligned}$$
2. if $t_2-t\ge \frac{1}{2n}$, $t-t_1< \frac{1}{2n}$, then since for any $l\ge 2$,
$$\begin{aligned}&{{\,\mathrm{\mathbf{E}}\,}}|M([nt])-M([nt_1])-{{\,\mathrm{\mathbf{E}}\,}}(M([nt])-M([nt_1]))|^l\\&\quad \le {{\,\mathrm{\mathbf{E}}\,}}\left( 4+\left| \sum _{i=1}^{\infty } m''_i([nt_1],[nt_1]+1) - {{\,\mathrm{\mathbf{E}}\,}}m''_i([nt_1],[nt_1]+1) \right| \right) ^l< C(l), \end{aligned}$$
the Cauchy–Schwarz inequality yields the bound
$$\begin{aligned} B\le \left( \widetilde{C}(\theta ) (t_2-t)^4 \cdot \frac{C(k)}{\alpha (n)^{k/2}}\right) ^{1/2}\le \widehat{C}(\theta )(t_2-t_1)^2; \end{aligned}$$
3. finally, the case $t_2-t< \frac{1}{2n}$, $t-t_1\ge \frac{1}{2n}$ is similar to the previous one.

Thus, the required compactness follows from [4, Th. 13.5].

Finally, for the next step we need to show that M(s), when time scaled, is close to its fully Poissonized version

$$\begin{aligned} \widetilde{M}(s){\mathop {=}\limits ^{\mathrm{def}}}M_{\Pi (s)}=\sum _{i=1}^\infty \Pi (s) p_i\mathbb {1}_{\Pi _i(s)=0}. \end{aligned}$$

Namely, we aim to show that

$$\begin{aligned} \sup \limits _{0\le t\le 1} |M^{*}_n(t)-\widetilde{M}_n(t)|\rightarrow 0\quad \text {in probability,} \end{aligned}$$

(13)

where

$$\begin{aligned} \widetilde{M}_n(t)=\frac{\widetilde{M}(nt) -{{\,\mathrm{\mathbf{E}}\,}}\widetilde{M}(nt)}{(\alpha (n))^{1/2}}. \end{aligned}$$

Introduce $\Pi '_i(s)=\Pi (s)-\Pi _i(s)$ and $\widetilde{\Pi }(s)=(\Pi (s)-s)/\sqrt{s}$. Since $\widetilde{M}(s)=\sum _{i=1}^\infty \Pi '_i(s) p_i \mathbb {1}_{\Pi _i(s)=0}$,

$$\begin{aligned}&|{{\,\mathrm{\mathbf{E}}\,}}\widetilde{M}(s)-{{\,\mathrm{\mathbf{E}}\,}}M(s)|=|{{\,\mathrm{\mathbf{E}}\,}}\sum _{i=1}^\infty (\Pi '_i(s)-s) p_i \mathbb {1}_{\Pi _i(s)=0}| \\&\quad = \Big |\sum _{i=1}^\infty (s(1-p_i)-s) p_i e^{-s p_i}\Big |=\frac{2 {{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (s),2}}{s}\rightarrow 0 \end{aligned}$$

as $s\rightarrow \infty $ and it is bounded by 1. Thus, there exists a sufficiently small $\varepsilon =\varepsilon (\theta )>0$ such that for $\delta _n=n^{\varepsilon -1}$

$$\begin{aligned} \sup \limits _{0\le t\le \delta _n} |M^{*}_n(t)-\widetilde{M}_n(t)|< \frac{\Pi (n\delta _n)+n\delta _n+1}{(\alpha (n))^{1/2}} \rightarrow 0\ \text {a.s.} \end{aligned}$$

when $n\rightarrow \infty $.
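
The identity $|\mathbf{E}\widetilde{M}(s)-\mathbf{E}M(s)|=\sum _i sp_i^2e^{-sp_i}=2\,\mathbf{E}R_{\Pi (s),2}/s$ used above follows from $\mathbf{P}(\Pi _i(s)=2)=e^{-sp_i}(sp_i)^2/2$, and each summand is at most $p_i/e$ because $xe^{-x}\le 1/e$. The sketch below evaluates both sides numerically. It is an illustration only: the weights $p_i=i^{-a}-(i+1)^{-a}$, $a=1/\theta -1$, the truncation at `n_urns` terms and the function names are assumptions of the sketch.

```python
import math


def urn_prob(i, theta):
    """Assumed weights p_i = i^(-a) - (i+1)^(-a), a = 1/theta - 1 (illustrative choice)."""
    a = 1.0 / theta - 1.0
    return float(i) ** (-a) - float(i + 1) ** (-a)


def mean_gap(s, theta, n_urns=20000):
    """Return (|sum (s(1-p_i) - s) p_i e^{-s p_i}|, 2 E R_{Pi(s),2} / s), both truncated."""
    direct = 0.0
    mean_r2 = 0.0
    for i in range(1, n_urns + 1):
        p = urn_prob(i, theta)
        direct += (s * (1.0 - p) - s) * p * math.exp(-s * p)
        mean_r2 += math.exp(-s * p) * (s * p) ** 2 / 2.0  # P(Pi_i(s) = 2)
    return abs(direct), 2.0 * mean_r2 / s
```

Both returned values agree up to rounding, stay below 1, and decrease as s grows for these weights.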

By the strong law of large numbers for M(s) and the well-known asymptotic behavior of ${{\,\mathrm{\mathbf{E}}\,}}M(s)$ (see, e.g., [12, Eq. (23)]), we conclude that for any $\theta \in (0,1]$, $M(s)/(s \alpha (s))^{1/2}\rightarrow 0$ a.s. as $s\rightarrow \infty $. Moreover, according to the central limit theorem, $\widetilde{\Pi }(s)$ is asymptotically standard normal for large s.

Finally, we have almost surely

$$\begin{aligned} |M^{*}_n(t)-\widetilde{M}_n(t)|\le \frac{|\widetilde{\Pi }(nt)| M(nt)}{(nt \alpha (n))^{1/2}}+\frac{1}{(\alpha (n))^{1/2}}. \end{aligned}$$

Using this inequality, the fact that $\sup \nolimits _{0\le t\le 1}(\cdot )\le \sup \nolimits _{0\le t\le \delta _n}(\cdot )+\sup \nolimits _{\delta _n\le t\le 1}(\cdot )$, and the continuity of the functional $\sup \limits _{0\le t\le 1}(\cdot )$, we readily obtain (13).

Step 4: Approximation of the initial process. Since $\Pi (t)$ is monotone, the strong law of large numbers implies that for any $\varepsilon , \delta \in (0,1)$ there is an integer $N=N(\varepsilon ,\delta )$ such that for all $n\ge N$ one has

$$\begin{aligned} \mathbf {P}(\forall t\in [0,1] \ \ \exists \tau : |\tau -t|\le \delta , \ \Pi (n\tau )= [nt]) {\mathop {=}\limits ^{\mathrm{def}}}\mathbf {P}(A(n))\ge 1-\varepsilon , \end{aligned}$$

see Lemma 2. Here and below, F stands for R, U or M. The relative compactness of the distributions $\{F_n^{*}\}_{n\ge n_0}$ implies that for any $\varepsilon \in (0,1)$ and $\eta >0$ there exist $\delta \in (0,1)$ and an integer $N_1=N_1(\varepsilon ,\eta )$ such that for all $n\ge N_1$,

$$\begin{aligned} \mathbf {P}\left( \sup \limits _{|t-\tau | \le \delta } \left| F^{*}_n(\tau )-F^{*}_n(t)\right| \ge \eta \right) \le \varepsilon . \end{aligned}$$

Hence, since

$$\begin{aligned} \mathbf {P}(F_n(t)=F^{*}_n(\tau )|\Pi (n\tau )=[nt])=1, \end{aligned}$$

for all $n\ge \max (N,N_1)$,

$$\begin{aligned}&\mathbf {P}\left( \sup \limits _{0 \le t \le 1} \left| F_n(t)-F^{*}_n(t)\right| \ge \eta \right) \\&\quad \le \mathbf {P}\left( \sup \limits _{0 \le t \le 1} \left| F_n(t)-F^{*}_n(t)\right| \ge \eta , A(n)\right) +\varepsilon \\&\quad \le \mathbf {P}\left( \sup \limits _{|t-\tau | \le \delta } \left| F^{*}_n(\tau )-F^{*}_n(t)\right| \ge \eta \right) +\varepsilon \le 2 \varepsilon . \end{aligned}$$

This proves Theorem 1.

References

[1] Adler, R.J.: An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes. Institute of Math. Stat, Hayward, California (1990)
[2] Barbour, A.D., Gnedin, A.V.: Small counts in the infinite occupancy scheme. Electron. J. Probab. 14(13), 365–384 (2009)
[3] Ben-Hamou, A., Boucheron, S., Ohannessian, M.I.: Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications. Bernoulli 23(1), 249–287 (2017)
[4] Billingsley, P.: Convergence of Probability Measures, 2nd edition. Wiley, London (1999)
[5] Borovkov, A.A.: Probability Theory. Universitext (2013)
[6] Chebunin, M.G.: Functional central limit theorem in an infinite urn scheme for distributions with superheavy tails. Siberian Electron. Math. Rep. 14, 1289–1298 (2017)
[7] Chebunin, M.G., Kovalevskii, A.: Functional central limit theorems for certain statistics in an infinite urn scheme. Stats. Prob. Lett. 119, 344–348 (2016)
[8] Durieu, O., Wang, Y.: From infinite urn schemes to decompositions of self-similar Gaussian processes. Electron. J. Prob. 21(43), 1–23 (2016)
[9] Dutko, M.: Central limit theorems for infinite urn models. Ann. Probab. 17, 1255–1263 (1989)
[10] Gnedin, A., Hansen, B., Pitman, J.: Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws. Probab. Surv. 4, 146–171 (2007)
[11] Good, I.J., Toulmin, G.H.: The number of new species, and the increase in population coverage, when a sample is increased. Biometrika 43(1/2), 45–63 (1956)
[12] Karlin, S.: Central limit theorems for certain infinite urn schemes. J. Math. Mech. 17(4), 373–401 (1967)
[13] Muratov, A., Zuyev, S.: Bit flipping and time to recover. J. Appl. Prob. 53(3), 1–17 (2016)
[14] Orlitsky, A., Santhanam, N., Zhang, J.: Universal compression of memoryless sources over unknown alphabets. IEEE Trans. Inf. Theory 50(7), 1469–1481 (2004)
[15] Rosenthal, H.P.: On the subspaces of $l_p (p > 2)$ spanned by sequences of independent random variables. Israel J. Math. 8(3), 273–303 (1970)

Acknowledgements

MC’s research was supported by RSF Grant 17-11-01173-Ext. He also acknowledges hospitality of Chalmers University where a part of this work has been done. The authors are thankful to Sergey Foss for his interest in this research and valuable comments and to the anonymous reviewer for thorough reading and spotting some inaccuracies in the previous version of the manuscript.

Funding

Open access funding provided by Chalmers University of Technology.

Author information

Authors and Affiliations

Sobolev Institute of Mathematics SB RAS, Novosibirsk State University, Novosibirsk, Russia
Mikhail Chebunin
Chalmers University of Technology, Gothenburg, Sweden
Sergei Zuyev

Corresponding author

Correspondence to Sergei Zuyev.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

An explicit expression for the covariance between $R(\tau )$ and $R(t)$ can be found in [7]. Take $\tau \le t$. Then

$$\begin{aligned} c^{*}_{UU} (\tau ,t)= & {} {{\,\mathrm{\mathbf{cov}}\,}}(U(\tau ), U(t)) \\= & {} \sum _{k=1}^{\infty } \Big ( \mathbf {P}(\Pi _k(\tau )\ \text {and}\ \Pi _k(t)\ \text {are odd} )- \mathbf {P}(\Pi _k(\tau )\ \text {is odd}) \mathbf {P}(\Pi _k(t)\ \text {is odd})\Big )\\= & {} \frac{1}{4} \sum _{k=1}^{\infty } \bigg ( (1-e^{-2p_k\tau })(1+e^{-2p_k(t-\tau )})-(1-e^{-2p_k\tau })(1-e^{-2p_kt}) \bigg )\\= & {} \frac{1}{4} \sum _{k=1}^{\infty } \Big ( e^{-2p_k(t-\tau )}-e^{-2p_k(t+\tau )}\Big ) =\frac{1}{2}{{\,\mathrm{\mathbf{E}}\,}}(U(t+\tau )-U(t-\tau )). \end{aligned}$$

Hence (since $\frac{\beta (nt)}{\beta (n)} \rightarrow t^{\theta }$ as $n \rightarrow \infty $)

$$\begin{aligned} c_{\upsilon \upsilon }(\tau ,t)&=\lim _{n\rightarrow \infty } \frac{c_{UU}^{*}(n\tau , nt)}{\alpha (n)} =\Gamma (1-\theta )2^{\theta -2}((t+\tau )^\theta -(t-\tau )^\theta ), \quad \theta \in (0,1), \\ c_{\upsilon \upsilon }(\tau ,t)&=\lim _{n\rightarrow \infty } \frac{c_{UU}^{*}(n\tau , nt)}{n L^{*}(n)} =\tau , \quad \theta =1, \end{aligned}$$

cf. [12, Eq. (21)].
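The per-urn parity computation behind $c^{*}_{UU}$ can be checked numerically. The sketch below compares the covariance of the two parity indicators, obtained by summing the Poisson probability mass function directly and using independence of increments, with the closed-form summand $\frac{1}{4}(e^{-2p_k(t-\tau )}-e^{-2p_k(t+\tau )})$. The function names and the parameter values are hypothetical choices made for illustration.

```python
import math

def prob_odd(lam, kmax=80):
    """P(Poisson(lam) is odd), by direct summation of the pmf over odd k < kmax."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(1, kmax, 2))

def cov_odd_direct(p, tau, t):
    """cov(1{Pi_k(tau) odd}, 1{Pi_k(t) odd}) for tau <= t and a Poisson process of rate p."""
    odd_tau = prob_odd(p * tau)
    # Both counts are odd iff the count at tau is odd and the increment on (tau, t] is even.
    even_increment = 1.0 - prob_odd(p * (t - tau))
    return odd_tau * even_increment - odd_tau * prob_odd(p * t)

def cov_odd_closed(p, tau, t):
    """Closed-form summand (exp(-2p(t - tau)) - exp(-2p(t + tau))) / 4."""
    return 0.25 * (math.exp(-2 * p * (t - tau)) - math.exp(-2 * p * (t + tau)))

direct = cov_odd_direct(0.3, 1.5, 4.0)
closed = cov_odd_closed(0.3, 1.5, 4.0)
print(direct, closed)
```

Summing the closed-form summand over all urns gives $c^{*}_{UU}(\tau ,t)$.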

Next,

$$\begin{aligned} c_{MM}^{*} (\tau ,t)= & {} {{\,\mathrm{\mathbf{cov}}\,}}(M(\tau ), M(t))\\= & {} \sum _{k=1}^{\infty } {{\,\mathrm{\mathbf{E}}\,}}\Big [ (tp_k \mathbb {1}(\Pi _k(t)=0)- tp_k e^{-tp_k}) (\tau p_k \mathbb {1}(\Pi _k(\tau )=0)- \tau p_k e^{-\tau p_k})\Big ]\\= & {} \sum _{k=1}^{\infty } t\tau p_k^2 e^{-tp_k}(1-e^{-\tau p_k}) =\frac{2\tau }{t}{{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (t),2}-\frac{2t \tau }{(t+\tau )^2}{{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (t+\tau ),2}. \end{aligned}$$

Since $\frac{\alpha (nt)}{\alpha (n)} \rightarrow t^{\theta }$ when $n \rightarrow \infty $,

$$\begin{aligned} c_{\mu \mu }(\tau ,t)= & {} \lim _{n\rightarrow \infty } \frac{c_{MM}^{*}(n\tau , nt)}{\alpha (n)}\\= & {} \theta \Gamma (2-\theta )\left( \frac{\tau }{t^{1-\theta }} - \frac{t\tau }{(t+\tau )^{2-\theta }} \right) , \end{aligned}$$

cf. [12, Eq. (23)].
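The rewriting of $c^{*}_{MM}$ in terms of ${{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (s),2}$ can be illustrated with a small numerical sketch. It assumes a truncated power-law urn distribution $p_k\propto k^{-2}$ (an illustrative choice, not from the paper) and takes ${{\,\mathrm{\mathbf{E}}\,}}R_{\Pi (s),2}=\sum _k (sp_k)^2e^{-sp_k}/2$, which is the form required by the identities used in the text. All function names are hypothetical.

```python
import math

def power_law_probs(num_urns=5000, exponent=2.0):
    """Hypothetical urn probabilities p_k proportional to k**(-exponent), normalised to sum to 1."""
    weights = [k ** (-exponent) for k in range(1, num_urns + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_r2(s, probs):
    """sum_k (s p_k)^2 exp(-s p_k) / 2: expected number of urns with exactly two balls at time s."""
    return sum((s * p) ** 2 / 2.0 * math.exp(-s * p) for p in probs)

def cov_mm_direct(tau, t, probs):
    """sum_k t * tau * p_k^2 * exp(-t p_k) * (1 - exp(-tau p_k)), for tau <= t."""
    return sum(t * tau * p * p * math.exp(-t * p) * (1.0 - math.exp(-tau * p)) for p in probs)

def cov_mm_closed(tau, t, probs):
    """(2 tau / t) E R_2(t) - (2 t tau / (t + tau)^2) E R_2(t + tau)."""
    return (2.0 * tau / t) * expected_r2(t, probs) \
        - (2.0 * t * tau / (t + tau) ** 2) * expected_r2(t + tau, probs)

probs = power_law_probs()
direct = cov_mm_direct(2.0, 5.0, probs)
closed = cov_mm_closed(2.0, 5.0, probs)
print(direct, closed)
```

The two values agree up to floating-point rounding, since the second expression is an exact algebraic rearrangement of the first.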

Continuing,

$$\begin{aligned} c^{*}_{RU} (\tau ,t)= & {} {{\,\mathrm{\mathbf{cov}}\,}}(R(\tau ), U(t)) = \sum _{k=1}^{\infty } {{\,\mathrm{\mathbf{cov}}\,}}(1-\mathbb {1}(\Pi _k(\tau )=0), \mathbb {1}(\Pi _k(t)\ \text {is odd}))\\= & {} -\sum _{k=1}^{\infty } {{\,\mathrm{\mathbf{cov}}\,}}(\mathbb {1}(\Pi _k(\tau )=0), \mathbb {1}(\Pi _k(t)\ \text {is odd}))\\= & {} -\sum _{k=1}^{\infty } \Big ( \mathbf {P}(\Pi _k(\tau )=0, \Pi _k(t)\ \text {is odd} )- \mathbf {P}(\Pi _k(\tau )=0) \mathbf {P}(\Pi _k(t)\ \text {is odd})\Big )\\= & {} -\frac{1}{2} \sum _{k=1}^{\infty } \bigg ( e^{-p_k\tau }(1-e^{-2p_k(t-\tau )})- e^{-p_k\tau }(1-e^{-2p_kt}) \bigg )\\= & {} \frac{1}{2}\sum _{k=1}^{\infty } \bigg ( e^{-p_k(2t-\tau )} - e^{-p_k(2t+\tau )}\pm 1\bigg ) =\frac{1}{2}{{\,\mathrm{\mathbf{E}}\,}}(R(2t+\tau )-R(2t-\tau )). \end{aligned}$$

Similarly,

$$\begin{aligned} c^{*}_{RU}(t,\tau )= & {} {{\,\mathrm{\mathbf{cov}}\,}}(R(t), U(\tau )) = -\sum _{k=1}^{\infty } {{\,\mathrm{\mathbf{cov}}\,}}(\mathbb {1}(\Pi _k(t)=0), \mathbb {1}(\Pi _k(\tau )\ \text {is odd}))\\= & {} \frac{1}{2} \sum _{k=1}^{\infty } e^{-p_kt}(1-e^{-2p_k\tau })\\= & {} \frac{1}{2} \sum _{k=1}^{\infty } \bigg ( e^{-p_k t} - e^{-p_k(2\tau +t)}\pm 1\bigg )\\= & {} \frac{1}{2}{{\,\mathrm{\mathbf{E}}\,}}(R(2\tau +t)-R(t)). \end{aligned}$$

Because $\frac{\beta (nt)}{\beta (n)} \rightarrow t^{\theta }$ when $n \rightarrow \infty $, for $\theta \in (0,1)$ we have that

$$\begin{aligned} c_{\rho \upsilon }(\tau ,t)&=\lim _{n\rightarrow \infty } \frac{c_{RU}^{*}(n\tau , nt)}{\alpha (n)} =\Gamma (1-\theta )((2t+\tau )^{\theta } - (2t-\tau )^{\theta })/2,\\ c_{\rho \upsilon }(t,\tau )&=\lim _{n\rightarrow \infty } \frac{c^{*}_{RU}(nt, n\tau )}{\alpha (n)}=\Gamma (1-\theta )((2\tau +t)^\theta - t^\theta )/2. \end{aligned}$$

For $\theta =1$, this reduces to

$$\begin{aligned} c_{\rho \upsilon }(\tau ,t)&=\lim _{n\rightarrow \infty } \frac{c_{RU}^{*}(n\tau , nt)}{n L^{*}(n)}=\tau ,\\ c_{\rho \upsilon }(t,\tau )&=\lim _{n\rightarrow \infty } \frac{c^{*}_{RU}(nt, n\tau )}{nL^{*}(n)}=\tau , \end{aligned}$$

cf. [12, Th. 1].
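The two orderings of the covariance between $R$ and $U$ differ because an urn that is empty at the later time is necessarily empty at the earlier one. A minimal sketch, assuming a single Poisson process of rate $p$ and $\tau \le t$, compares the directly computed per-urn covariances with the summands $\frac{1}{2}(e^{-p_k(2t-\tau )}-e^{-p_k(2t+\tau )})$ and $\frac{1}{2}(e^{-p_kt}-e^{-p_k(2\tau +t)})$ from the displays above. The function names and parameter values are hypothetical.

```python
import math

def prob_odd(lam, kmax=80):
    """P(Poisson(lam) is odd), by direct summation of the pmf over odd k < kmax."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(1, kmax, 2))

def cov_empty_odd(p, u, v):
    """cov(1{Pi_k(u) = 0}, 1{Pi_k(v) odd}) for a Poisson process of rate p."""
    p_empty = math.exp(-p * u)
    if u <= v:
        # Empty up to time u, then an odd number of arrivals on (u, v].
        joint = p_empty * prob_odd(p * (v - u))
    else:
        # Empty at the later time u forces a zero (even) count at time v.
        joint = 0.0
    return joint - p_empty * prob_odd(p * v)

p, tau, t = 0.4, 1.0, 3.0
lhs_tau_t = -cov_empty_odd(p, tau, t)   # per-urn contribution to cov(R(tau), U(t))
rhs_tau_t = 0.5 * (math.exp(-p * (2 * t - tau)) - math.exp(-p * (2 * t + tau)))
lhs_t_tau = -cov_empty_odd(p, t, tau)   # per-urn contribution to cov(R(t), U(tau))
rhs_t_tau = 0.5 * (math.exp(-p * t) - math.exp(-p * (2 * tau + t)))
print(lhs_tau_t, rhs_tau_t, lhs_t_tau, rhs_t_tau)
```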

Next,

$$\begin{aligned} c^{*}_{MU} (\tau ,t)= & {} {{\,\mathrm{\mathbf{cov}}\,}}(M(\tau ), U(t))\\= & {} \sum _{k=1}^{\infty } \tau p_k {{\,\mathrm{\mathbf{cov}}\,}}(\mathbb {1}(\Pi _k(\tau )=0), \mathbb {1}(\Pi _k(t)\ \text {is odd}))\\= & {} \frac{1}{2} \sum _{k=1}^{\infty }\tau p_k \bigg ( e^{-p_k(2t+\tau )}-e^{-p_k(2t-\tau )}\bigg )\\= & {} \frac{\tau }{2(2t+\tau )}{{\,\mathrm{\mathbf{E}}\,}}M(2t+\tau )-\frac{\tau }{2(2t-\tau )}{{\,\mathrm{\mathbf{E}}\,}}M(2t-\tau ), \end{aligned}$$

and

$$\begin{aligned}&c^{*}_{MU} (t,\tau )={{\,\mathrm{\mathbf{cov}}\,}}(M(t), U(\tau )) \\&\quad = \frac{1}{2} \sum _{k=1}^{\infty }t p_k \bigg (e^{-p_k(2\tau +t)}-e^{-p_k t}\bigg )\\&\quad = \frac{t}{2(2\tau +t)}{{\,\mathrm{\mathbf{E}}\,}}M(2\tau +t)-\frac{1}{2}{{\,\mathrm{\mathbf{E}}\,}}M(t). \end{aligned}$$

Finally,

$$\begin{aligned} c^{*}_{RM} (\tau ,t)= & {} {{\,\mathrm{\mathbf{cov}}\,}}(R(\tau ), M(t))\\= & {} \sum _{k=1}^{\infty } {{\,\mathrm{\mathbf{cov}}\,}}(1-\mathbb {1}(\Pi _k(\tau )=0), t p_k\mathbb {1}(\Pi _k(t)=0))\\= & {} -\sum _{k=1}^{\infty } t p_k {{\,\mathrm{\mathbf{cov}}\,}}(\mathbb {1}(\Pi _k(\tau )=0), \mathbb {1}(\Pi _k(t)=0))\\= & {} -\sum _{k=1}^{\infty }t p_k \bigg ( e^{-p_k t}- e^{-p_k(\tau +t)} \bigg ) =\frac{t}{\tau +t}{{\,\mathrm{\mathbf{E}}\,}}M(\tau +t)-{{\,\mathrm{\mathbf{E}}\,}}M(t), \end{aligned}$$

and

$$\begin{aligned} c^{*}_{RM} (t,\tau ) ={{\,\mathrm{\mathbf{cov}}\,}}(R(t), M(\tau )) =\frac{\tau }{\tau +t}{{\,\mathrm{\mathbf{E}}\,}}M(\tau +t)-\frac{\tau }{t}{{\,\mathrm{\mathbf{E}}\,}}M(t). \end{aligned}$$

Because $\frac{\alpha (nt)}{\alpha (n)} \rightarrow t^{\theta }$ when $n \rightarrow \infty $, for $\theta \in (0,1)$ we obtain

$$\begin{aligned} c_{\rho \mu }(\tau ,t)&=\lim _{n\rightarrow \infty } \frac{c^{*}_{RM} (n\tau , nt)}{\alpha (n)}=\theta \Gamma (1-\theta ) \left( \frac{t}{(t+\tau )^{1-\theta }} - t^\theta \right) , \\ c_{\rho \mu }(t,\tau )&=\lim _{n\rightarrow \infty } \frac{c^{*}_{RM} (nt, n\tau )}{\alpha (n)} =\theta \Gamma (1-\theta )\left( \frac{\tau }{(t+\tau )^{1-\theta }} -\frac{\tau }{ t^{1-\theta }} \right) , \\ c_{\mu \upsilon }(\tau ,t)&=\lim _{n\rightarrow \infty } \frac{c^{*}_{MU} (n\tau , nt)}{\alpha (n)} =\theta \Gamma (1-\theta )\left( \frac{\tau }{2(2t+\tau )^{1-\theta }} -\frac{\tau }{2(2t-\tau )^{1-\theta }} \right) , \\ c_{\mu \upsilon }(t,\tau )&=\lim _{n\rightarrow \infty } \frac{c^{*}_{MU} (nt, n\tau )}{\alpha (n)} =\theta \Gamma (1-\theta )\left( \frac{t}{2(2\tau +t)^{1-\theta }} -\frac{t^\theta }{2} \right) , \end{aligned}$$

cf. [12, Eq. (23)].

Clearly, $L(n)\rightarrow 0$ as $n\rightarrow \infty $. According to [12, Lem. 4], in the case $\theta =1$ the function $L^{*}(n)$ is slowly varying, $L^{*}(n)\rightarrow 0$ as $n\rightarrow \infty $, and

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{L(n)}{L^{*}(n)}{\mathop {=}\limits ^{\mathrm{def}}}\lim _{n\rightarrow \infty }\delta _n=0. \end{aligned}$$

(14)

Therefore, in the case $\theta =1$,

$$\begin{aligned} c_{\rho \mu }(\tau ,t)&=\lim _{n\rightarrow \infty } \frac{c^{*}_{RM} (n\tau , nt)}{\alpha (n)} \sqrt{\delta _n}=0,\\ \quad c_{\rho \mu }(t,\tau )&=\lim _{n\rightarrow \infty } \frac{c^{*}_{RM} (nt, n\tau )}{\alpha (n)}\sqrt{\delta _n}=0, \\ c_{\mu \upsilon }(\tau ,t)&=\lim _{n\rightarrow \infty } \frac{c^{*}_{MU} (n\tau , nt)}{\alpha (n)} \sqrt{\delta _n}=0, \\ \quad c_{\mu \upsilon }(t,\tau )&=\lim _{n\rightarrow \infty } \frac{c^{*}_{MU} (nt, n\tau )}{\alpha (n)} \sqrt{\delta _n}=0. \end{aligned}$$

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Chebunin, M., Zuyev, S. Functional Central Limit Theorems for Occupancies and Missing Mass Process in Infinite Urn Models. J Theor Probab 35, 1–19 (2022). https://doi.org/10.1007/s10959-020-01053-6

Received: 26 June 2019
Revised: 25 September 2020
Accepted: 25 October 2020
Published: 23 November 2020
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10959-020-01053-6
