Abstract
The convergence of a sequence of point processes with dependent points, defined by a symmetric function of iid high-dimensional random vectors, to a Poisson random measure is proved. This also implies the convergence of the joint distribution of a fixed number of upper order statistics. As applications of the result a generalization of maximum convergence to point process convergence is given for simple linear rank statistics, rank-type U-statistics and the entries of sample covariance matrices.
1 Introduction
In classical extreme value theory the asymptotic distribution of the maximum of random points plays a central role. Maximum-type statistics are the basis of popular tests for the dependency structure of high-dimensional data; such tests possess good power properties especially against sparse alternatives (see Han et al. 2017; Drton et al. 2020; Zhou et al. 2019). Closely related to the maxima of random points are point processes, which play an important role in stochastic geometry and data analysis, with applications in statistical ecology, astrostatistics and spatial epidemiology (Baddeley 2007). For a sequence \((Y_i)_i\) of real-valued random variables, we set
where \(\varepsilon _x\) is the Dirac measure in x. Let \(K:=(0,1)\times (u,\infty )\) with \(u\in \mathbb {R}\). Then, \(\widetilde{M}_p(K)\) counts the number of exceedances of the threshold u by the random variables \(Y_1,\ldots , Y_p\). If \(Y^{(k)}\) denotes the k-th upper order statistic of \(Y_1,\ldots ,Y_p\), it holds that \(\{\widetilde{M}_p(K)<k\}=\{Y^{(k)}\le u\}\), and in particular \(\{\widetilde{M}_p(K)=0\}=\{\max _{i=1,\ldots ,p} Y_i\le u\}\). Therefore, the weak convergence of a sequence of point processes gives information about the joint asymptotic distribution of a fixed number of upper order statistics. If the sequence \((Y_i)_i\) consists of independent and identically distributed (iid) random variables, maximum convergence and point process convergence are equivalent, but if the random variables exhibit dependence, this equivalence does not necessarily hold anymore. In this sense, point process convergence is a substantial generalization of maximum convergence. Additionally, the time components i/p deliver valuable information about the random times at which a record occurs, i.e., the times j with \(Y_j>\max _{i=1,\ldots ,j-1}Y_i\).
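The counting identity \(\{\widetilde{M}_p(K)<k\}=\{Y^{(k)}\le u\}\) can be illustrated with a minimal Python sketch; the choices of p, u and k and the standard normal distribution of the \(Y_i\) are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
p, u, k = 100, 1.0, 3  # illustrative choices
Y = rng.standard_normal(p)

# M_p(K) for K = (0,1) x (u, inf): the number of exceedances of the
# threshold u by Y_1, ..., Y_p
M_K = int(np.sum(Y > u))

# k-th upper order statistic Y^(k)
Y_k = np.sort(Y)[::-1][k - 1]

# the identity {M_p(K) < k} = {Y^(k) <= u}
assert (M_K < k) == (Y_k <= u)
```

The assertion holds for any threshold and any k, since fewer than k exceedances of u occur exactly when the k-th largest value lies at or below u.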
Our main motivation comes from statistical inference for high-dimensional data, where the asymptotic distribution of the maximum of dependent random variables has found several applications in recent years (see for example Han et al. 2017; Drton et al. 2020; Zhou et al. 2019; Cai and Jiang 2011; Cai et al. 2013; Cai 2017; Cai and Liu 2011; Gösmann et al. 2022). The objective of this paper is to provide the methodology to extend meaningful results on the convergence of the maximum of dependent random variables to point process convergence.
To this end, we consider dependent points \(T_{\textbf{i}}:=g_{n,p}(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots , \textbf{x}_{i_m})\), where the index \(\textbf{i}= (i_1, i_2,\ldots , i_m)\in \{1,\ldots ,p\}^m\). The random vectors \(\textbf{x}_1,\ldots ,\textbf{x}_p\) are iid on \(\mathbb {R}^n\) and \(g_{n,p}:\mathbb {R}^{mn}\rightarrow \mathbb {R}\) is a measurable, symmetric function. Important examples include U-statistics, simple linear rank statistics, rank-type U-statistics, the entries of sample covariance matrices or interpoint distances.
Additionally, we assume that the dimension n of the points grows with the number of points p. Over the last decades the environment, and therefore the requirements for statistical methods, have changed fundamentally. Due to huge improvements in computing power and data acquisition technologies one is confronted with large data sets, where the dimension of the observations is as large as or even larger than the sample size. Such high-dimensional data occur naturally in online networks, genomics, financial engineering, wireless communication and image analysis (see Johnstone and Titterington 2009; Clarke et al. 2008; Donoho 2000). Hence, the analysis of high-dimensional data has developed into an active research area.
We will show that the corresponding point process of the points \(T_\textbf{i}\) converges to a Poisson random measure (PRM) with a mean measure that involves the m-dimensional Lebesgue measure and an additional measure \(\mu\). If we replace the points \(T_\textbf{i}\) with iid random variables with the same distribution, the (non-degenerate) limiting distribution of the maximum will necessarily be an extreme value distribution of the form \(\exp (-\mu (x))\). Moreover, the convergence of the corresponding point process will be equivalent to the condition
However, since the random points \(T_\textbf{i}\) are not independent, we additionally need the following assumption on the dependence structure
where \(l=1,\ldots , m-1\).
In the finite-dimensional case where n is fixed, several results about point process convergence are available in similar settings. In Silverman and Brown (1978), Silverman and Brown showed point process convergence for \(m=2\), \(n=2\) and \(g_{2,p}(\textbf{x}_i,\textbf{x}_j)=a_p\Vert \textbf{x}_i-\textbf{x}_j\Vert _2^2\), where the \(\textbf{x}_i\) have a bounded and almost everywhere continuous density, \(a_p\) is a suitable scaling sequence and \(\Vert \cdot \Vert _2\) is the Euclidean norm on \(\mathbb {R}^2\). In the Weibull case \(\mu (x)= x^\alpha\) for \(x,\alpha >0\), Dabrowski et al. (2002) proved a generalization to points with a fixed dimension and \(g_{n,p}(\textbf{x}_i,\textbf{x}_j)=a_ph(\textbf{x}_i,\textbf{x}_j)\), where h is a measurable, symmetric function and \(a_p\) is a suitable scaling sequence.
Also in the finite-dimensional case, under similar assumptions as in (1.1) with \(\mu (x)=\beta x^\alpha\) for \(x,\alpha >0\), \(\beta \in \mathbb {R}\) and under condition (1.2), Schulte and Thäle (2012) showed convergence in distribution of point processes towards a Weibull process. The points of these point processes are obtained by applying a symmetric function \(g_{n,p}\) to all m-tuples of distinct points of a Poisson process on a standard Borel space. In Schulte and Thäle (2016), this result was extended to more general functions \(\mu\) and to binomial processes so that other PRMs were possible limit processes. In Decreusefond et al. (2016), Decreusefond, Schulte and Thäle provided an upper bound of the Kantorovich-Rubinstein distance between a PRM and the point process induced in the aforementioned way by a Poisson or a binomial process on an abstract state space. Notice that convergence in Kantorovich-Rubinstein distance implies convergence in distribution (see Panaretos and Zemel 2020, Theorem 2.2.1 or Decreusefond et al. 2016, p. 2149). In Chenavier et al. (2022) another point process result in a similar setting is given for the number of nearest neighbor balls in fixed dimension. Moreover, Basrak and Planinić (2021) presents a general framework for Poisson approximation of point processes on Polish spaces.
1.1 Structure of this paper
The remainder of this paper is structured as follows. In Section 2 we prove weak point process convergence for the dependent points \(T_{\textbf{i}}\) in the high-dimensional case as a tool for the generalization of the convergence of the maximum (Theorem 2.1). We provide popular representations of the limiting process in terms of the transformed points of a homogeneous Poisson process. Moreover, we derive point process convergence for the record times. In Section 3 these tools are applied to study statistics based on relative ranks like simple linear rank statistics or rank-type U-statistics. We also prove convergence of the point processes of the off-diagonal entries of large sample covariance matrices. The technical proofs are deferred to Section 4.
1.2 Notation
Convergence in distribution (resp. probability) is denoted by \({\mathop {\rightarrow }\limits ^{d}}\) (resp. \({\mathop {\rightarrow }\limits ^{{\mathbb P}}}\)) and unless explicitly stated otherwise all limits are for \(n\rightarrow \infty\). For sequences \((a_n)_n\) and \((b_n)_n\) we write \(a_n=O(b_n)\) if \(a_n/b_n\le C\) for some constant \(C>0\) and every \(n\in \mathbb {N}\), and \(a_n=o(b_n)\) if \(\lim _{n\rightarrow \infty } a_n/b_n=0\). Additionally, we use the notation \(a_n\sim b_n\) if \(\lim _{n\rightarrow \infty } a_n/b_n=1\) and \(a_n\lesssim b_n\) if \(a_n\) is smaller than or equal to \(b_n\) up to a positive universal constant. We further write \(a\wedge b:=\min \{a,b\}\) for \(a,b\in \mathbb {R}\), and for a set A we write |A| for the number of elements of A.
2 Point process convergence
We introduce the model that was briefly described in the introduction. Let \(\textbf{x}_1,\ldots ,\textbf{x}_p\) be iid \(\mathbb {R}^n\)-valued random vectors with \(\textbf{x}_i=(X_{i1}, \ldots , X_{in})^\top , i=1,\ldots ,p\), where \(p=p_n\) is some positive integer sequence tending to infinity as \(n\rightarrow \infty\).
We consider the random points
where \(\textbf{i}=(i_1, i_2,\ldots , i_m)\in \{1,\ldots ,p\}^m\) and \(g_n=g_{n,p}:\mathbb {R}^{mn}\rightarrow \mathbb {R}\) is a measurable and symmetric function, where symmetric means \(g_{n}(\textbf{y}_1,\textbf{y}_2,\ldots , \textbf{y}_m)=g_{n}(\textbf{y}_{\pi (1)},\textbf{y}_{\pi (2)},\ldots , \textbf{y}_{\pi (m)})\) for all \(\textbf{y}_1,\textbf{y}_2,\ldots , \textbf{y}_m \in \mathbb {R}^n\) and all permutations \(\pi\) on \(\{1,2,\ldots , m\}\). We are interested in the limit behavior of the point processes \(M_n\) towards a PRM M,
where \(\textbf{i}/p=(i_1/p,\ldots ,i_m/p)\). The limit M is a PRM with mean measure
where \(\lambda _m\) is the Lebesgue measure on \(\mathbb {R}^m\). For an interval (a, b) with \(a<b\in \mathbb {R}\) we write \(\mu (a,b):=\mu ((a,b)):=\mu (a)-\mu (b)\) and \(\mu :(v,w)\rightarrow \mathbb {R}^+=\{x\in \mathbb {R}:x\ge 0\}\) is a function satisfying \(\lim _{x \rightarrow v} \mu (x)=\infty\) and \(\lim _{x \rightarrow w} \mu (x)=0\) for \(v,w\in \bar{\mathbb {R}}=\mathbb {R}\cup \{\infty , -\infty \}\) and \(v<w\). Furthermore, we set \(\eta _n(\cdot ):={\mathbb E}[M_n(\cdot )]\). We consider the \(M_n\)’s and M as random measures on the state space
with values in \(\mathcal {M}(S)\) the space of point measures on S, endowed with the vague topology (see Resnick 2008). The following result studies the convergence \(M_n {\mathop {\rightarrow }\limits ^{d}}M\), which denotes the convergence in distribution in \(\mathcal {M}(S)\).
Theorem 2.1
Let \(\textbf{x}_1,\ldots , \textbf{x}_p\) be n-dimensional, independent and identically distributed random vectors, where \(p=p_n\) is some sequence of positive integers tending to infinity as \(n\rightarrow \infty\). Additionally, let \(g=g_n:\mathbb {R}^{mn}\rightarrow (v,w)\) be a measurable and symmetric function, where \(v,w\in \bar{\mathbb {R}}=\mathbb {R}\cup \{\infty , -\infty \}\) and \(v<w\). Assume that there exists a function \(\mu :(v,w)\rightarrow \mathbb {R}^+\) with \(\lim _{x \rightarrow v} \mu (x)=\infty\) and \(\lim _{x \rightarrow w} \mu (x)=0\) such that, for \(x\in (v,w)\) and \(n\rightarrow \infty\),
-
(A1) \(\left( {\begin{array}{c}p\\ m\end{array}}\right) {\mathbb P}(g_n(\textbf{x}_1,\textbf{x}_2,\ldots , \textbf{x}_m)>x) \rightarrow \mu (x)\) and
-
(A2) \({\mathbb P}(g_n\!(\textbf{x}_1,\textbf{x}_2,\ldots ,\textbf{x}_m)\!>\!x,g_n\!(\textbf{x}_{m-l+1},\ldots , \textbf{x}_{2m-l})\!>\!x)\!=\!o(p^{-(2m-l)})\) for \(l=\!1,\ldots , m\!-\!1\).
Then we have \(M_n{\mathop {\rightarrow }\limits ^{d}}M\).
Note that (A1) ensures the correct specification of the mean measure, while (A2) is an anti-clustering condition. Both conditions are standard in extreme value theory. It is worth mentioning that
where we use the conventions \(\mu (x)=0\) if \(x>w\), \(\mu (x)=\infty\) if \(x<v\), and \(\exp (-\infty )=0\). The typical distribution functions H are the Fréchet, Weibull and Gumbel distributions. In these cases, the limiting process M has a representation in terms of the transformed points of a homogeneous Poisson process. Let \((U_i)_i\) be an iid sequence of random vectors uniformly distributed on \(S_1\) and \(\Gamma _i=E_1+\ldots +E_i\), where \((E_i)_i\) is an iid sequence of standard exponentially distributed random variables, independent of \((U_i)_i\).
It is well–known that \(N_\Gamma :=\sum _{i=1}^\infty \varepsilon _{\Gamma _i}\) is a homogeneous Poisson process and hence it holds for every \(A\subset (0,\infty )\) that \(N_{\Gamma }(A)\) is Poisson distributed with parameter \(\lambda _1(A)\) (see for example Embrechts et al. 1997, Example 5.1.10). For the mean measure \(\eta\) of M we get for a product of intervals \(\mathbin {\mathop {\otimes }\limits _{l=1}^{m}}(r_l,s_l]\subset S_1\)
where we used in the second line that
as \(U_i\) is uniformly distributed on \(S_1\) for every i and
We get the following representations for the limiting processes M.
-
Fréchet case: For \(\alpha >0\) the Fréchet distribution is given by \(\Phi _\alpha (x)=\exp ({-x^{-\alpha }})\), \(x>0\). For \(0<r<s<\infty\) we have \(\mu _{\Phi _\alpha }(r,s]=r^{-\alpha }-s^{-\alpha }\) and therefore, we can write
$$\begin{aligned} M=M_{\Phi _\alpha }=\sum _{i=1}^\infty \varepsilon _{(U_i,\Gamma _i^{-1/\alpha })}. \end{aligned}$$ -
Weibull case: For \(\alpha >0\) the Weibull distribution is given by \(\Psi _\alpha (x)=\exp ({-|x|^\alpha })\), \(x<0\). For \(-\infty<r<s<0\) we have \(\mu _{\Psi _{\alpha }}(r,s]=|r|^\alpha -|s|^\alpha\) and
$$\begin{aligned} M=M_{\Psi _\alpha }=\sum _{i=1}^\infty \varepsilon _{(U_i,-\Gamma _i^{1/\alpha })}. \end{aligned}$$ -
Gumbel case: The Gumbel distribution is given by \(\Lambda (x)=\exp ({-{{\,\textrm{e}\,}}^{-x}})\) for all \(x\in \mathbb {R}\). For \(-\infty<r<s<\infty\) we have \(\mu _{\Lambda }(r,s]={{\,\textrm{e}\,}}^{-r}-{{\,\textrm{e}\,}}^{-s}\) and
$$\begin{aligned} M=M_\Lambda =\sum _{i=1}^\infty \varepsilon _{(U_i,-\log \Gamma _i)}. \end{aligned}$$
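The Gumbel representation can be checked by simulation: for a homogeneous Poisson process with arrival times \(\Gamma _i\), the number of transformed points \(-\log \Gamma _i\) above a level x is Poisson distributed with mean \({{\,\textrm{e}\,}}^{-x}\). A minimal sketch, with an illustrative truncation at 50 points per draw:

```python
import numpy as np

rng = np.random.default_rng(0)
trials, n_points, x = 20_000, 50, 0.0  # illustrative truncation and level

counts = np.empty(trials)
for t in range(trials):
    # Gamma_i = E_1 + ... + E_i: arrival times of a homogeneous Poisson process
    gamma = np.cumsum(rng.exponential(size=n_points))
    # Gumbel representation: points -log(Gamma_i); count those above x
    counts[t] = np.sum(-np.log(gamma) > x)

# N(x, infinity) is Poisson with mean mu(x) = e^{-x}; empirical mean and
# variance of the counts should both be close to exp(-x) = 1 here
print(counts.mean(), counts.var())
```

The truncation at 50 points is harmless here because \(-\log \Gamma _i > 0\) forces \(\Gamma _i < 1\), which happens for at most a few of the first arrivals.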
Besides the points \(T_\textbf{i}\), time components \(\textbf{i}/p=(i_1/p,\ldots ,i_m/p)\) with \(1\le i_1\le \ldots \le i_m\le p\) are considered in the definition of the point process \(M_n\). Whenever we do not need the time components in the following, we will use the shorthand notation
Under the conditions of Theorem 2.1, \(N_n\) converges in distribution to \(N(\cdot ):=M(S_1\times \cdot )\) which is a PRM with mean measure \(\mu\).
A direct consequence of the point process convergence is the convergence of the joint distribution of a fixed number of upper order statistics. In the Fréchet, Weibull and Gumbel cases the limit function can be described as the joint distribution function of transformations of the points \(\Gamma _i\).
Corollary 2.2
Let \(G_{n,(j)}\) be the j-th upper order statistic of the random variables \((g_n(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots ,\textbf{x}_{i_m}))\), where \(1\le i_1<i_2<\ldots <i_m\le p\). Under the conditions of Theorem 2.1 and for a fixed \(k\ge 1\) the distribution function
where \(x_k<\ldots <x_1\in (v,w)\), converges to
as \(n\rightarrow \infty\). In particular, in the Fréchet, Weibull and Gumbel cases, it holds that
Proof
Since \(N_n(x,w)\) is the number of vectors \(\textbf{i}=(i_1,\ldots ,i_m)\) with \(1\le i_1<i_2<\ldots <i_m\le p\), for which \(g_n(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots ,\textbf{x}_{i_m})\in (x,w)\), we get by Theorem 2.1 as \(n\rightarrow \infty\)
By the representation of the limiting point process in the Fréchet, Weibull and Gumbel cases, (2.4) is equal to one of the three distribution functions in the corollary.
One field, where point processes find many applications, is stochastic geometry. The paper Schulte and Thäle (2012), for example, considers order statistics for Poisson k-flats in \(\mathbb {R}^d\), Poisson polytopes on the unit sphere and random geometric graphs.
Setting \(k=1\) in Corollary 2.2 we obtain the convergence in distribution of the maximum of the points \(T_\textbf{i}\).
Corollary 2.3
Under the conditions of Theorem 2.1 we get
Example 2.4
(Interpoint distances) Let \(\textbf{x}_i=(X_{i1}, \ldots , X_{in})^\top , i=1,\ldots ,p\) be n-dimensional random vectors, whose components \((X_{it})_{i,t\ge 1}\) are independent and identically distributed random variables with zero mean and variance 1. We are interested in the asymptotic behavior of the largest interpoint distances
where \(\Vert \cdot \Vert _2\) is the Euclidean norm on \(\mathbb {R}^n\). Figure 1 shows the four largest interpoint distances of 500 points in \(\mathbb {R}^2\) with independent standard normally distributed components. Note that three of the largest four distances involve the same outlying vector \(\textbf{x}_i\).
We assume that there exists \(s>2\) such that \({\mathbb E}[|X_{11}|^{2s}(\log (|X_{11}|))^{s/2}]< \infty\) and \({\mathbb E}[X_{11}^4]\le 5\) and that \(p=p_n\rightarrow \infty\) satisfies \(p=O(n^{(s-2)/4})\). Additionally, we let \((b_n)_n\) and \((c_n)_n\) be sequences given by
where \(d_n=\sqrt{2\log \tilde{p}} - \tfrac{\log \log \tilde{p}+\log 4\pi }{2(2\log \tilde{p})^{1/2}}\) with \(\tilde{p}=p(p-1)/2\). For \(x\in \mathbb {R}\) one can check that
as \(n\rightarrow \infty\) (see Heiny and Kleemann (2023) for details). Therefore, the conditions (A1) and (A2) in Theorem 2.1 hold for \(m=2\), \(g_n(\textbf{x}_i,\textbf{x}_j)=c_n (D_{ij}-b_n)\) and \(\mu (x)={{\,\textrm{e}\,}}^{-x}\). By virtue of Theorem 2.1 we have
Finally Corollary 2.2 yields for a fixed \(k\ge 1\) that
where \(D_{n,(\ell )}\) is the \(\ell\)-th upper order statistic of the random variables \(c_n(D_{ij}-b_n)\) for \(1\le i<j\le p\).
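For illustration, the interpoint distances \(D_{ij}\) and their upper order statistics can be computed directly. A small sketch with four hypothetical points in \(\mathbb {R}^2\); the low dimension is purely for illustration, whereas the theory lets n grow with p.

```python
import numpy as np

# four hypothetical points in R^2
x = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [0.0, 1.0],
              [1.0, 1.0]])
p = x.shape[0]

# interpoint distances D_ij = ||x_i - x_j||_2 for 1 <= i < j <= p
D = {(i, j): float(np.linalg.norm(x[i] - x[j]))
     for i in range(p) for j in range(i + 1, p)}

# upper order statistics D^(1) >= D^(2) >= ... of the p(p-1)/2 distances
order_stats = sorted(D.values(), reverse=True)
print(order_stats[0])  # largest interpoint distance: 5.0 (between x_1 and x_2)
```

With real data one would then center and scale the order statistics by the sequences \((b_n)_n\) and \((c_n)_n\) before comparing with the Gumbel limit.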
2.1 Record times
In Theorem 2.1 we showed convergence of point processes including time components. Therefore, we can additionally derive results for the record times \(L(k), k\ge 1\) of the running maxima of the points \(T_\textbf{i}=g_n(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots ,\textbf{x}_{i_m})\) for \(\textbf{i}=(i_1,\ldots ,i_m)\), which are recursively defined as follows:
(cf. Sections 5.4.3 and 5.4.4 of Embrechts et al. 1997). To prove point process convergence for the record times we need the convergence in distribution of the sequence of processes \((Y_n(t), 0<t\le 1)\) in D(0, 1], the space of right-continuous functions on (0, 1] with finite limits existing from the left, defined by
where \(\lfloor x\rfloor =\max \{y\in \mathbb {Z}:y\le x\}\) for \(x\in \mathbb {R}\), towards an extremal process. We call \(Y=(Y(t))_{t>0}\) an extremal process generated by the distribution function H, if the finite-dimensional distributions are given by
where \(k\ge 1\), \(0<t_1<\ldots <t_k\), \(x_i\in \mathbb {R}\) and \(1\le i\le k\) (see Embrechts et al. 1997, Definition 5.4.3). To define convergence in distribution in D(0, 1] we first need to introduce a metric \(\mathcal {D}\) on D(0, 1]. To this end, let \(\Lambda _{[0,1]}\) be a set of homeomorphisms
Then for \(f,g\in D[0,1]\) the Skorohod metric \(\tilde{\mathcal {D}}\) is defined by (see Billingsley 1999, Section 12)
Now set
where \(\tilde{f}\) and \(\tilde{g}\) are the right-continuous extensions of f and g on [0, 1]. The space of functions D[0, 1] and therefore D(0, 1] is separable under the Skorohod metric but not complete. However, one can find an equivalent metric, i.e., a metric which generates the same Skorohod topology, under which D[0, 1] is complete (see Billingsley 1999, Theorem 12.2). In particular, the Skorohod metric and the equivalent metric generate the same open sets and thus the \(\sigma\)-algebras of the Borel sets, which are generated by these open sets, are the same. Therefore, a sequence of probability measures on D(0, 1] is relatively compact if and only if it is tight (Billingsley 1999, Section 13). Hence, for every tight sequence of probability measures on D(0, 1] the convergence of the finite-dimensional distributions on all continuity points of the limit distribution implies convergence in distribution (Billingsley 1999, Theorem 13.1).
For the PRM \(M=\sum _{i=1}^\infty \varepsilon _{(U_i,\Delta _i)}\), where \((U_i)_i\) is an iid sequence of random vectors uniformly distributed on \(S_1\) and
we set
where \(U_i^{(m)}\) is the m-th component of \(U_i\). Then the process Y has the finite dimensional distributions in (2.5) for \(k\ge 1\), \(0<t_i\le 1\), \(x_i\in \mathbb {R}\) and \(1\le i\le k\). Therefore, Y is an extremal process generated by H restricted to the interval (0, 1]. For these processes we can show the following invariance principle by application of the continuous mapping theorem (see Billingsley 1999, Theorem 2.7 or Resnick 2008, p. 152).
Proposition 2.5
Under the conditions of Theorem 2.1 and if \(H(\cdot )=\exp (-\mu (\cdot ))\) is an extreme value distribution it holds that
in D(0, 1] with respect to the metric \(\mathcal {D}\).
Since Y is a nondecreasing function, which is constant between isolated jumps, it has only countably many discontinuity points. Now let \((\tau _n)_n\) be the sequence of these discontinuity points of Y. Notice that by Embrechts et al. (1997), Theorem 5.4.7 the point process \(\sum _{k=1}^\infty \varepsilon _{\tau _k}\) is a PRM with mean measure \(\nu (a,b)=\log (b/a)\) for \(0<a<b\le 1\). We are ready to state our result for the point process of record times.
Theorem 2.6
Under the conditions of Theorem 2.1 and if \(H(\cdot )=\exp (-\mu (\cdot ))\) is an extreme value distribution it holds that
in \(\mathcal {M}(0,1]\), the space of point measures on (0, 1].
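The mean measure \(\nu (a,b)=\log (b/a)\) can be illustrated numerically: for iid points the expected number of record times with index in \((pa,pb]\) is approximately \(\log (b/a)\), since the i-th point is a record with probability 1/i. A Monte Carlo sketch with iid uniform points and illustrative choices of p and the interval:

```python
import numpy as np

rng = np.random.default_rng(1)
p, trials = 10_000, 400  # illustrative sizes
a, b = 0.1, 1.0

counts = np.empty(trials)
for t in range(trials):
    y = rng.random(p)
    # a record occurs at index i when y[i] attains a new running maximum
    is_record = y >= np.maximum.accumulate(y)
    idx = np.flatnonzero(is_record) + 1  # record times on the scale 1..p
    counts[t] = np.sum((idx > a * p) & (idx <= b * p))

# mean number of record times with index in (ap, bp] is close to log(b/a)
print(counts.mean(), np.log(b / a))
```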
Based on Theorem 2.6 we can make statements about the time points of the last and second last record at or before p.
Corollary 2.7
Assume the conditions of Theorem 2.6 and let \(\zeta (p)\) be the number of records among the random variables
Then the following statements hold for \(x,y\in (0,1]\) as \(n\rightarrow \infty\).
-
(1)
\({\mathbb P}(p^{-1}L(\zeta (p))\le x)={\mathbb P}(J_n(x,1]=0)\rightarrow {\mathbb P}(J(x,1]=0)=x\).
-
(2)
\({\mathbb P}(p^{-1}L(\zeta (p))\le x, p^{-1}L(\zeta (p)-1)\le y)\rightarrow y+y\log (x/y)\) for \(x>y\).
-
(3)
\({\mathbb P}(p^{-1}(L(\zeta (p))-L(\zeta (p)-1))\le x)\rightarrow x(1-\log (x))\).
Proof
Let \(0<y<x\le 1\). Part (1) is a direct consequence of the definitions of \(\zeta\) and L. Part (2) follows by
as \(n\rightarrow \infty\) and
To prove part (3) we assume that \(\tau ^{(1)}\) and \(\tau ^{(2)}\) are the first and the second upper order statistics of \((\tau _n)_n\). These upper order statistics exist since for every \(a>0\) there are only finitely many \(\tau _n\in [a,1]\). Then, we know by part (2) that
Since
we need to calculate \({\mathbb P}(\tau ^{(1)}-\tau ^{(2)}\le x)\). The joint density of \(\tau ^{(1)}\) and \(\tau ^{(2)}\) can be deduced from (2.6); it is
Hence, we get the following distribution function of \(\tau ^{(1)}-\tau ^{(2)}\)
which completes the proof.
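Part (1) can be illustrated numerically in the iid case, where the last record time \(L(\zeta (p))\) is simply the position of the maximum of the first p points and is therefore uniform on \(\{1,\ldots ,p\}\). A Monte Carlo sketch with illustrative choices of p and x:

```python
import numpy as np

rng = np.random.default_rng(2)
p, trials, x = 1000, 5000, 0.3  # illustrative choices

# for iid points the last record time L(zeta(p)) is the position of the
# maximum of Y_1, ..., Y_p, which is uniform on {1, ..., p}
last_record = np.array([np.argmax(rng.random(p)) + 1 for _ in range(trials)])
print(np.mean(last_record / p <= x))  # close to x = 0.3
```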
3 Applications
3.1 Relative ranks
In recent years, maximum-type tests based on the convergence in distribution of the maximum of rank statistics of a data set have gained significant interest for statistical testing (Han et al. 2017). Let \(\textbf{y}_1,\ldots ,\textbf{y}_n\) be p-dimensional iid random vectors with \(\textbf{y}_t=(X_{1t},\ldots ,X_{pt})\) following a continuous distribution to avoid ties. We write \(Q_{it}\) for the rank of \(X_{it}\) among \(X_{i1},\ldots , X_{in}\). Additionally, let \(R_{ij}^{(t)}\) be the relative rank of the j-th entry compared to the i-th entry; that is \(R_{ij}^{(t)} = Q_{ j t'}\) with \(t'\) such that \(Q_{i t'}=t\) for \(t=1,\ldots ,n\).
Put more simply, to obtain \(R_{ij}^{(t)}\) we look at the i-th and j-th rows of \((Q_{it})\), find the location of t in the i-th row, and take the value in the j-th row at this location.
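This construction can be sketched directly; the helper `relative_ranks` below is illustrative and not from the paper.

```python
import numpy as np

def relative_ranks(x_i, x_j):
    """Relative ranks R_ij^(t), t = 1, ..., n: the rank in sample j at the
    position where sample i has rank t."""
    # Q_it: rank of X_it among X_i1, ..., X_in (ranks starting at 1)
    q_i = np.argsort(np.argsort(x_i)) + 1
    q_j = np.argsort(np.argsort(x_j)) + 1
    r = np.empty_like(q_j)
    for t in range(1, len(x_i) + 1):
        t_prime = np.flatnonzero(q_i == t)[0]  # position t' with Q_{i t'} = t
        r[t - 1] = q_j[t_prime]
    return r

# worked example: Q_i = (1, 3, 2) and Q_j = (2, 1, 3) give R_ij = (2, 3, 1)
print(relative_ranks(np.array([10.0, 30.0, 20.0]),
                     np.array([5.0, 1.0, 9.0])))
```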
Many important statistics are based on (relative) ranks; we consider two classes of such statistics in this section. First, we introduce the so–called simple linear rank statistics, which are of the form
where g is a Lipschitz function (also called score function), and \((c_{nt})\) with \(c_{nt}=n^{-1} f(t/(n+1))\) for a Lipschitz function f and \(\sum _{t=1}^n c_{nt}^2 >0\) are called the regression constants. An example of such a simple linear rank statistic is Spearman’s \(\rho\), which will be discussed in detail in Section 3.1.2. For \(1\le i<j\le p\) the relative ranks \((R_{ij}^{(t)})_{t=1}^n\) depend on the vectors \(\textbf{x}_i\) and \(\textbf{x}_j\), where \(\textbf{x}_k=(X_{k1},\ldots , X_{kn})\) for \(1\le k\le p\). We assume that the vectors \(\textbf{x}_1,\ldots ,\textbf{x}_p\) are independent. It is worth mentioning that the ranks \((Q_{it})\) remain the same if we transform the marginal distributions to the (say) standard uniform distribution. Thus, the joint distribution of \((R_{ij}^{(t)})_{t=1}^n\), and thereby the distribution of \(V_{ij}\), does not depend on the distribution of \(\textbf{x}_i\) or \(\textbf{x}_j\). Therefore, we may assume without loss of generality that the random vectors \(\textbf{x}_1,\ldots , \textbf{x}_p\) are identically distributed. We can write \(V_{ij}=g_{n,V}(\textbf{x}_i,\textbf{x}_j)\) for a measurable function \(g_{n,V}:\mathbb {R}^{2n}\rightarrow \mathbb {R}\).
Next, we consider rank-type U-statistics of order \(m<n\) of the form
where the symmetric kernel h is such that \(U_{ij}\) depends only on \((R_{ij}^{(t)})_{t=1}^n\). An important example of a rank-type U-statistic is Kendall’s \(\tau\), which will be studied in Section 3.1.1. For more examples we refer to Han et al. (2017) and references therein. As for simple linear rank statistics, we are able to write \(U_{ij}=g_{n,U}(\textbf{x}_i,\textbf{x}_j)\), where \(g_{n,U}:\mathbb {R}^{2n}\rightarrow \mathbb {R}\) is a measurable function and \(\textbf{x}_1,\ldots ,\textbf{x}_p\) are iid random vectors.
An interesting property of rank-based statistics is the following pairwise independence; note, however, that such statistics are generally not mutually independent.
Lemma 3.1
(Lemma C4 in Han et al. 2017) For \(1\le i<j\le p\), let \(\Psi _{ij}\) be a function of the relative ranks \(\{R_{ij}^{(t)}, t=1,\ldots ,n\}\). Assume \(\textbf{x}_1,\ldots ,\textbf{x}_p\) are independent. Then for any \((i,j) \ne (k,l)\), \(i< j, k< l\), the random variables \(\Psi _{ij}\) and \(\Psi _{kl}\) are independent.
As an immediate consequence we obtain pairwise independence of \((U_{ij})\) and \((V_{ij})\), respectively.
Lemma 3.2
For any \((i,j) \ne (k,l)\), \(i< j, k< l\), the random variables \(V_{ij}\) and \(V_{kl}\) are independent and identically distributed. Moreover, \(U_{ij}\) and \(U_{kl}\) are independent and identically distributed.
We now want to standardize \(U_{ij}\) and \(V_{ij}\). By independence of \((X_{it})\), we have
where \(\overline{g}_n = n^{-1} \sum _{t=1}^ng(t/(n+1))\) is the sample mean of \(g(Q_{11}/(n+1)),\ldots , g(Q_{1n}/(n+1))\) and \(\overline{c}_n= \sum _{t=1}^nc_{nt}\). Expectation and variance of \(U_{ij}\) can also be calculated analytically. We set
and define the standardized versions of \(U_{ij}\) and \(V_{ij}\) by
It is well–known that \(\widetilde{V}_{ij}\) and \(\widetilde{U}_{ij}\) are asymptotically standard normal and the following lemma provides a complementary large deviation result.
Lemma 3.3
(Kallenberg 1982, p.404-405) Suppose that the kernel function h is bounded and non-degenerate. Then we have for \(x=o(n^{1/6})\) that
Assume that the score function g is differentiable with bounded Lipschitz constant and that the constants \((c_{nt})_t\) satisfy
where C is some constant. Then it holds for \(x=o(n^{1/6})\)
For a discussion of (3.1), see (Kallenberg 1982, p.405). To proceed we need to find suitable scaling and centering sequences for \(\widetilde{V}_{ij}\) and \(\widetilde{U}_{ij}\), respectively, such that the conditions of Theorem 2.1 are fulfilled. For an iid standard normal sequence \((X_i)\) it is known that
where \(\widetilde{d}_p=\sqrt{2\log p} - \tfrac{\log \log p+\log 4\pi }{2(2\log p)^{1/2}}\); see Embrechts et al. (1997, Example 3.3.29). Since we are dealing with \(p(p-1)/2\) random variables \((V_{ij})\) and \((U_{ij})\), respectively, which are asymptotically standard normal, \(d_p =\widetilde{d}_{p(p-1)/2}\) seems like a reasonable choice for scaling and centering sequences.
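The centering sequence \(\widetilde{d}_p\) is calibrated so that \(p\,(1-\Phi (\widetilde{d}_p))\rightarrow 1\), where \(\Phi\) is the standard normal distribution function; this is what drives condition (A1) at the level \(x=0\). A small sketch using only the standard library (the convergence is logarithmically slow):

```python
import math

def d_tilde(p):
    """sqrt(2 log p) - (log log p + log 4 pi) / (2 sqrt(2 log p))."""
    s = math.sqrt(2.0 * math.log(p))
    return s - (math.log(math.log(p)) + math.log(4.0 * math.pi)) / (2.0 * s)

def normal_tail(x):
    # 1 - Phi(x) via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# d_p = d_tilde(p(p-1)/2) is calibrated so that p_tilde * P(N(0,1) > d_p) -> 1
for p in (200, 2000, 20000):
    p_tilde = p * (p - 1) // 2
    print(p_tilde * normal_tail(d_tilde(p_tilde)))  # slowly approaches 1
```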
Our main result for rank statistics is the following.
Theorem 3.4
-
(a)
Suppose that the kernel function h is bounded and non-degenerate. If \(p = \exp (o(n^{1/3}))\), the following point process convergence holds
$$\begin{aligned} N_n^U := \sum _{1\le i<j\le p} \varepsilon _{ d_p( \widetilde{U}_{ij}- d_p)} {\mathop {\rightarrow }\limits ^{d}}N :=\sum _{i=1}^{\infty } \varepsilon _{-\log \Gamma _i}\,,\qquad n\rightarrow \infty \,, \end{aligned}$$(3.2)where \(\Gamma _i= E_1+\cdots +E_i\), \(i\ge 1\), and \((E_i)\) are iid standard exponential, i.e., N is a Poisson random measure with mean measure \(\mu (x,\infty )={{\,\textrm{e}\,}}^{-x}\), \(x\in \mathbb {R}\).
-
(b)
Assume that the score function g is differentiable with bounded Lipschitz constant and that the constants \((c_{nt})_t\) satisfy (3.1). Then if \(p=\exp (o(n^{1/3}))\), it holds that
$$\begin{aligned} N_n^V := \sum _{1\le i<j\le p} \varepsilon _{ d_p( \widetilde{V}_{ij}- d_p)} {\mathop {\rightarrow }\limits ^{d}} \, N\,,\qquad n\rightarrow \infty \,. \end{aligned}$$(3.3)
Proof
We start with the proof of (3.3), for which we will use Theorem 2.1; this is possible since \(\textbf{x}_1,\ldots ,\textbf{x}_p\) are iid and \(g_{n,V}\) is a measurable function. Therefore, we only have to show that for \(x\in \mathbb {R}\) it holds
-
(1)
\(\frac{p(p-1)}{2}{\mathbb P}(\widetilde{V}_{12}>x_p)\rightarrow {{\,\textrm{e}\,}}^{-x}\) as \(n\rightarrow \infty\),
-
(2)
\(p^3{\mathbb P}(\widetilde{V}_{12}>x_p, \widetilde{V}_{13}>x_p)\rightarrow 0\) as \(n\rightarrow \infty\),
where \(x_p=x/d_p+d_p\). We will begin with the proof of (1). Since \(x_p\sim d_p=o(n^{1/6})\) we get by Lemma 3.3
and by Mills’ ratio we have (writing \(\tilde{p}=\tfrac{p(p-1)}{2}\))
Regarding (2), we note that, by Lemma 3.2, \(\widetilde{V}_{12}\) and \(\widetilde{V}_{13}\) are independent. Thus, we get
where we used Lemma 3.3 and Mills’ ratio in the last two steps. That completes the proof of (3.3). The proof of (3.2) follows by analogous arguments.\(\square\)
Remark 3.5
Theorem 3.4 is a generalization of Theorems 1 and 2 in Han et al. (2017), who proved, under the conditions of Theorem 3.4 and \(p = \exp (o(n^{1/3}))\), that
and
As in Theorem 2.6, we additionally conclude point process convergence for the record times of the maxima of \(V_{ij}\) and \(U_{ij}\). To this end, we investigate the sequence \((\max _{1\le i<j\le k}U_{ij})_{k\ge 1}\). This sequence jumps at time k if one of the random variables \(U_{1k},\ldots , U_{k-1,k}\) is larger than every \(U_{ij}\) for \(1\le i<j\le k-1\). Between these jump (or record) times the sequence is constant.
Let \(L^U\) be this sequence of record times defined by
and let \(L^V\) be constructed analogously.
Theorem 3.6
Under the conditions of Theorem 3.4 it holds that
in \(\mathcal {M}(0,1]\), the space of point measures on (0, 1], where J is a Poisson random measure with mean measure \(\nu (a,b)=\log (b/a)\) for \(0<a<b\le 1\).
As in Corollary 2.7, we can draw conclusions on the index of the last and second last jump before or at p. Let \(\zeta ^U(p)\) be the number of records among \(\max _{1\le i<j\le 2}U_{ij},\ldots , \max _{1\le i<j\le p}U_{ij}\). Then, as \(n\rightarrow \infty\), we have for \(x,y \in (0,1]\)
-
(1)
\({\mathbb P}(p^{-1}L^U(\zeta ^U(p))\le x)\rightarrow {\mathbb P}(J(x,1]=0)=x\),
-
(2)
\({\mathbb P}(p^{-1}L^U(\zeta ^U(p))\le x, p^{-1}L^U(\zeta ^U(p)-1)\le y)\rightarrow y+y\log (x/y)\) for \(x>y\),
-
(3)
\({\mathbb P}(p^{-1}(L^U(\zeta ^U(p))-L^U(\zeta ^U(p)-1))\le x)\rightarrow x(1-\log (x))\),
where (3) gives information about how much time elapses between the second last and the last jump of \((\max _{1\le i<j\le k}U_{ij})_{k\ge 1}\) before or at p.
3.1.1 Kendall’s tau
Kendall’s tau is an example of a rank-type U-statistic with bounded kernel. For \(i\ne j\) Kendall’s tau \(\tau _{ij}\) measures the ordinal association between the two sequences \((X_{i1},\ldots ,X_{in})\) and \((X_{j1},\ldots ,X_{jn})\). It is defined by
where the function \({\text {sign}}:\mathbb {R}\rightarrow \{-1,0,1\}\) is given by \({\text {sign}}(x)=x/|x|\) for \(x\ne 0\) and \({\text {sign}}(0)=0\). An interesting property of Kendall’s tau is that it admits a representation as a sum of independent random variables. As we could not find this representation in the literature, we state it here; the proof can be found in Section 4.
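In code, the standard pairwise-sign form of Kendall’s tau, \(\tau _{ij}=\tfrac{2}{n(n-1)}\sum _{k<l}{\text {sign}}(X_{ik}-X_{il})\,{\text {sign}}(X_{jk}-X_{jl})\), reads as follows (an illustrative sketch; `x` and `y` play the roles of the two observation rows):

```python
def sign(x):
    return (x > 0) - (x < 0)

def kendall_tau(x, y):
    """Kendall's tau as the normalized sum of concordance signs over pairs."""
    n = len(x)
    s = sum(sign(x[k] - x[l]) * sign(y[k] - y[l])
            for k in range(n) for l in range(k))
    return 2.0 * s / (n * (n - 1))

a = [0.1, 0.5, 0.9, 1.3]
print(kendall_tau(a, a))        # 1.0 (perfect concordance)
print(kendall_tau(a, a[::-1]))  # -1.0 (perfect discordance)
```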
Proposition 3.7
We have
where \((D_i)_{i\ge 1}\) are independent random variables with \(D_i\) being uniformly distributed on the numbers \(-i/2, -i/2+1, \ldots , i/2\).
From Proposition 3.7 we deduce \({\mathbb E}[\tau _{ij}]=0\) and \({\text {Var}}(\tau _{ij})=\tfrac{2(2n+5)}{9n(n-1)}\). The next result is a corollary of Theorem 3.4.
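Both moments can be verified exactly from the representation: shifting \(U_i=D_i+i/2\) gives independent \(U_i\) uniform on \(\{0,\ldots ,i\}\) with \(I_n=U_1+\cdots +U_{n-1}\) and \(\tau _{ij}=1-4I_n/(n(n-1))\) (as derived in Section 4), so the variance is a finite sum. A sketch using exact rational arithmetic:

```python
from fractions import Fraction

def var_tau_via_representation(n):
    """Var(tau) from the sum-of-independent-uniforms representation:
    tau = 1 - 4*I_n/(n*(n-1)), I_n = U_1 + ... + U_{n-1},
    U_i uniform on {0, ..., i}, so Var(U_i) = ((i+1)**2 - 1)/12."""
    var_I = sum(Fraction((i + 1) ** 2 - 1, 12) for i in range(1, n))
    return Fraction(16, (n * (n - 1)) ** 2) * var_I

def var_tau_formula(n):
    """Stated closed form 2*(2n+5)/(9*n*(n-1))."""
    return Fraction(2 * (2 * n + 5), 9 * n * (n - 1))

for n in range(2, 30):
    assert var_tau_via_representation(n) == var_tau_formula(n)
print("variance formula verified for n = 2, ..., 29")
```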
Corollary 3.8
Under the conditions of Theorem 3.4 we have
3.1.2 Spearman’s rho
An example of a simple linear rank statistic is Spearman’s rho, which is a measure of rank correlation that assesses how well the relationship between two variables can be described using a monotonic function. Recall that \(Q_{ik}\) and \(Q_{jk}\) are the ranks of \(X_{ik}\) and \(X_{jk}\) among \(\{X_{i1},\ldots , X_{in}\}\) and \(\{X_{j1},\ldots , X_{jn}\}\), respectively, and write \(q_n=(n+1)/2\) for the average rank. Then for \(1\le i\ne j\le p\) Spearman’s rho is defined by
For mean and variance we get
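A sketch of the computation, using the standard rank-covariance formula \(\rho _{ij}=\tfrac{12}{n(n^2-1)}\sum _{k=1}^n (Q_{ik}-q_n)(Q_{jk}-q_n)\) (the classical form of Spearman’s rho, assuming tie-free data):

```python
def ranks(xs):
    """Rank of each entry among xs (1 = smallest; ties assumed absent)."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    r = [0] * len(xs)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho as the normalized covariance of the rank sequences."""
    n = len(x)
    q = (n + 1) / 2.0                      # average rank q_n
    rx, ry = ranks(x), ranks(y)
    s = sum((rx[k] - q) * (ry[k] - q) for k in range(n))
    return 12.0 * s / (n * (n * n - 1))

x = [2.0, 0.5, 1.1, 3.7, 0.9]
print(spearman_rho(x, x))                  # 1.0 for any tie-free sequence
print(spearman_rho(x, [-v for v in x]))    # -1.0
```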
Therefore, we obtain the following corollary of Theorem 3.4.
Corollary 3.9
Under the conditions of Theorem 3.4 it holds that
The next auxiliary result allows us to transfer the weak convergence of a sequence of point processes to another sequence of point processes, provided that the maximum distance between their points tends to zero in probability.
Proposition 3.10
For arrays \((X_{i,n})_{i,n\ge 1}\) and \((Y_{i,n})_{i,n\ge 1}\) of real-valued random variables, let \(N^X_n=\sum _{i=1}^p \varepsilon _{X_{i,n}}\) and assume that \(N^X_n{\mathop {\rightarrow }\limits ^{d}}N\). Consider a point process \(N^Y_n=\sum _{i=1}^p \varepsilon _{Y_{i,n}}\). If
then \(N^Y_n{\mathop {\rightarrow }\limits ^{d}}N\).
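The intuition behind Proposition 3.10 in a toy numerical form: if every \(Y\)-point is within \(\gamma\) of the corresponding \(X\)-point and no \(X\)-point lies within \(\gamma\) of the boundary of a test set, the two point processes assign identical counts. An illustrative sketch (the specific points and threshold are hypothetical):

```python
def count_exceedances(points, u):
    """Number of points falling in the test set (u, infinity)."""
    return sum(1 for x in points if x > u)

x_pts = [0.2, 1.7, 3.1, 4.6]
gamma = 0.05
# Perturb each point by at most gamma/2 in alternating directions.
y_pts = [x + (-1) ** k * gamma / 2 for k, x in enumerate(x_pts)]

u = 2.0  # threshold at distance > gamma from every X-point
assert count_exceedances(x_pts, u) == count_exceedances(y_pts, u)
print("counts agree:", count_exceedances(x_pts, u))
```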
Example 3.11
It turns out that there is an interesting connection between Spearman’s rho and Kendall’s tau. By Hoeffding (1948, p.318) we can write Spearman’s rho as
where
is the major part of Spearman’s rho. Therefore, \(r_{ij}\) is a U-statistic of degree three with an asymmetric bounded kernel and with
We now use Proposition 3.10 and Corollary 3.9 to show that
For this purpose we consider the following difference
By (3.4), (3.6) and (3.5) this expression is asymptotically equal to
Since \(|\tau _{ij}|\) and \(|r_{ij}|\) are bounded above by constants, we deduce that
which verifies the condition in Proposition 3.10. Since \(N_n^{\rho }{\mathop {\rightarrow }\limits ^{d}}N\) by Corollary 3.9, we conclude the desired (3.7).
3.2 Sample covariances
An important field of current research is the estimation and testing of high-dimensional covariance structures. It finds application in genomics, social science and financial economics; see Cai (2017) for a detailed review and more references. Under quite general assumptions, Xiao and Wu (2013) investigated the maximum off-diagonal entry of a high-dimensional sample covariance matrix. We impose the same model assumptions (compare Xiao and Wu 2013, p. 2901-2903), but instead of the maximum we study the point process of off-diagonal entries.
We start by describing the model and spelling out the required assumptions. Let \(\textbf{x}_1,\ldots ,\textbf{x}_n\) be p-dimensional iid random vectors with \(\textbf{x}_i=(X_{1i},\ldots ,X_{pi})\), where \({\mathbb E}[X_{ji}]=0\) for \(1\le j\le p\), and set \(\bar{X}_j:=\frac{1}{n}\sum _{k=1}^n X_{jk}\). Let \(\Sigma =(\sigma _{i,j})_{1\le i,j\le p}\) denote the covariance matrix of the vector \(\textbf{x}_1\), and assume \(\sigma _{i,i}=1\) for \(1\le i\le p\).
A fundamental problem in high-dimensional inference is to derive the asymptotic distribution of \(\max _{1\le i<j\le p} |\hat{\sigma }_{i,j}-\sigma _{i,j}|\). Since the \(\hat{\sigma }_{i,j}\)’s might have different variances we need to standardize \(\hat{\sigma }_{i,j}\) by \(\theta _{i,j}={\text {Var}}(X_{i1}X_{j1})\), which can be estimated by
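In code, plausible moment estimators for \(\hat{\sigma }_{i,j}\) and \(\hat{\theta }_{i,j}\) take the following form (a sketch only; the exact centering used by Xiao and Wu (2013) may differ in lower-order terms):

```python
def sample_cov_and_theta(xi, xj):
    """Plug-in estimators for sigma_{ij} and theta_{ij} = Var(X_i X_j).
    The precise centering in Xiao and Wu (2013) may differ slightly;
    this is an illustrative moment-estimator sketch."""
    n = len(xi)
    mi = sum(xi) / n
    mj = sum(xj) / n
    prods = [(a - mi) * (b - mj) for a, b in zip(xi, xj)]
    sigma_hat = sum(prods) / n
    theta_hat = sum((p - sigma_hat) ** 2 for p in prods) / n
    return sigma_hat, theta_hat

xi = [0.3, -1.2, 0.8, 0.1, -0.5]
xj = [1.1, -0.4, 0.2, -0.9, 0.6]
s, t = sample_cov_and_theta(xi, xj)
print(s, t)  # theta_hat is nonnegative; sigma_hat is symmetric in (i, j)
```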
We are interested in the points
Let \(\mathcal {I}_n=\{(i,j): 1\le i<j\le p\}\) be an index set. We use the following notations to formulate the required conditions:
Now, we can state the following conditions.
-
(B1) \(\liminf \limits _{n \rightarrow \infty }\theta _n>0\).
-
(B2) \(\limsup \limits _{n\rightarrow \infty }\gamma _n<1\).
-
(B3) \(\gamma _n(b_n)\log (b_n)=o(1)\) for any sequence \((b_n)\) such that \(b_n\rightarrow \infty\).
-
(B3’) \(\gamma _n(b_n)=o(1)\) for any sequence \((b_n)\) such that \(b_n\rightarrow \infty\) and for some \(\varepsilon >0\)
$$\begin{aligned} \sum _{\alpha ,\beta \in \mathcal {I}_n} ({\text {Cov}}(X_{i1}X_{j1},\,X_{k1}X_{l1}))^2=O(p^{4-\varepsilon })\quad \text {for}\, \alpha =(i,j),\,\beta =(k,l). \end{aligned}$$ -
(B4) For some constants \(t>0\) and \(0<r\le 2\), \(\limsup \limits _{n\rightarrow \infty } \mathcal {K}_n(t,r)<\infty\), and
$$\begin{aligned} \log p={\left\{ \begin{array}{ll}o(n^{r/(4+r)}),\quad &{}\text {if}\,\,0<r<2, \\ o(n^{1/3}(\log n)^{-2/3}),\quad &{}\text {if}\,\,r=2.\end{array}\right. } \end{aligned}$$ -
(B4’) \(\log p=o(n^{r/(4+3r)})\), \(\limsup \limits _{n\rightarrow \infty } \mathcal {K}_n(t,r)<\infty\) for some constants \(t>0\) and \(r>0\).
-
(B4”) \(p=O(n^q)\) and \(\limsup \limits _{n\rightarrow \infty }\mathcal {M}_n(4q+4+\delta )<\infty\) for some constants \(q>0\) and \(\delta >0\).
To be able to adopt parts of the proof of Theorem 2 in Xiao and Wu (2013) we consider (instead of \((M_{i,j})\)) the transformed points \((W_{i,j})\) given by
and we define the point processes
Theorem 3.12
Let \({\mathbb E}[\textbf{x}_1]=0\) and \(\sigma _{i,i}=1\) for all i, and assume (B1) and (B2). Then under any one of the following conditions:
-
(i)
(B3) and (B4),
-
(ii)
(B3’) and (B4’),
-
(iii)
(B3) and (B4”),
-
(iv)
(B3’) and (B4”),
it holds that
where \(\Gamma _i= E_1+\cdots +E_i\), \(i\ge 1\), and \((E_i)\) are iid standard exponential, i.e., N is a Poisson random measure with mean measure \(\mu (x,\infty )={{\,\textrm{e}\,}}^{-x}\), \(x\in \mathbb {R}\).
Proof
Under condition (i) set \(\mathcal {E}_n=n^{-(2-r)/(4(r+4))}\) if \(0<r<2\), and \(\mathcal {E}_n=n^{-1/6}(\log n)^{1/3}(\log p)^{1/2}\) if \(r=2\). Under condition (ii) let \(\mathcal {E}_n=(\log p)^{1/2}n^{-r/(6r+8)}\). Under (i) or (ii) we set
where \(T_n=\mathcal {E}_n(n/(\log p)^3)^{1/4}\). Under conditions (iii) and (iv) we set
Additionally, we define \(\tilde{\sigma }_{i,j}={\mathbb E}[\tilde{X}_{i1}\tilde{X}_{j1}]\) and \(\tilde{\theta }_{i,j}={\text {Var}}[\tilde{X}_{i1}\tilde{X}_{j1}]\). We consider
and the transformed points
We will show that \(N_n^{(W_1)}:=\sum _{1\le i<j\le p} \varepsilon _{W_{1;i,j}}{\mathop {\rightarrow }\limits ^{d}}N\) and thus by Proposition 3.10 \(N_n^{(W)}{\mathop {\rightarrow }\limits ^{d}}N\).
Therefore, we first apply Kallenberg’s Theorem as in the proof of Theorem 2.1. We set
with disjoint intervals \(B_k=(r_k,s_k]\) and show
-
(1)
\(\lim \limits _{n\rightarrow \infty } \mu _n^{(W_1)}(B)=\mu (B)\),
-
(2)
\(\lim \limits _{n\rightarrow \infty } {\mathbb P}(N_n^{(W_1)}(B)=0)={{\,\textrm{e}\,}}^{-\mu (B)}\),
where \(\mu _n^{(W_1)}(B)={\mathbb E}[N_n^{(W_1)}(B)]\) and \(\mu\) is defined by \(\mu (B_k)={{\,\textrm{e}\,}}^{-r_k}-{{\,\textrm{e}\,}}^{-s_k}\).
From the proof of Theorem 2 of Xiao and Wu (2013, p. 2910, 2913-2914) we know that the conditions of Xiao and Wu (2013, Lemma 6) are satisfied. Furthermore, from the proof of Lemma 6 (Xiao and Wu 2013, p. 2909-2910) we get that for \(z\in \mathbb {R}\) and
and \(d\in \mathbb {N}\)
which is equivalent to
where \(A=\{(i_1,j_1),\ldots ,(i_d,j_d)\}\). Therefore, we get for \(d=1\)
which proves (1). Regarding (2), we use that \(1-{\mathbb P}(N_n^{(W_1)}(B)=0)={\mathbb P}\Big (\bigcup _{1\le i<j\le p} A_{i,j}\Big )\), where \(A_{i,j}=\{W_{1;i,j}\in B\}\). By Bonferroni’s inequality we have for every \(k\ge 1\),
where \(A=\{(i_1,j_1),\ldots ,(i_d,j_d)\}\) and \(P_{A,B}={\mathbb P}(W_{1;i_1,j_1}\in B,\ldots ,W_{1;i_d,j_d}\in B)\). First letting \(n\rightarrow \infty\) and then \(k \rightarrow \infty\), we deduce from (3.8) and (3.9) that
This proves (2) and we get \(N_n^{(W_1)}{\mathop {\rightarrow }\limits ^{d}}N\). By Proposition 3.10 it remains to show
Fortunately, this is shown in the course of the proof of Theorem 2 of Xiao and Wu (2013, p. 2911-2916).\(\square\)
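The limit process \(N\) of Theorem 3.12 can be simulated directly from the stated representation: its points are \(-\log \Gamma _i\), and the expected number of points above x is \({{\,\textrm{e}\,}}^{-x}\). An illustrative sketch:

```python
import math
import random

def prm_points(num, rng):
    """Points -log(Gamma_i), Gamma_i = E_1 + ... + E_i with E_i iid
    standard exponential: a realization (truncated to `num` points) of
    the Poisson random measure with mean measure mu(x, inf) = exp(-x)."""
    gamma, pts = 0.0, []
    for _ in range(num):
        gamma += rng.expovariate(1.0)
        pts.append(-math.log(gamma))
    return pts

rng = random.Random(3)
reps, x = 20_000, 0.0
mean_count = sum(sum(1 for pt in prm_points(50, rng) if pt > x)
                 for _ in range(reps)) / reps
print(mean_count)  # close to exp(-0) = 1
```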
The following examples are motivated by Xiao and Wu (2013, p. 2903-2905).
Example 3.13
(Physical dependence). Assume that \(\textbf{x}_1=(X_{11},\ldots ,X_{p1})\) is distributed as a stationary process of the following form. For a measurable function g and a sequence of iid random variables \((\epsilon _i)_{i\in \mathbb {Z}}\) we set \(\textbf{x}_1=(X_{11},\ldots ,X_{p1})\) with
and let \(\textbf{x}_k\), \(2\le k\le n\), be iid copies of \(\textbf{x}_1\). Moreover, for an iid copy \((\epsilon '_i)_{i\in \mathbb {Z}}\) of \((\epsilon _i)_{i\in \mathbb {Z}}\) and
we define the physical dependence measure of order q (see Wu (2005))
Then, we conclude from Lemma 3 of Xiao and Wu (2013) and Theorem 3.12 the following statement.
Assume that \(0<\Psi _4(0)<\infty\), that \({\text {Var}}(X_{i1}X_{j1})>0\) for all \(i,j\in \mathbb {Z}\), and that \(|{\text {Cor}}(X_{i1}X_{j1},X_{k1}X_{l1})|<1\) for all i, j, k, l that are not all equal. Then, if either one of the conditions
-
(i)
\(\Psi _q(k)=o(1/\log k)\) as \(k\rightarrow \infty\) and one of the assumptions (B4) and (B4’) or
-
(ii)
\(\sum _{j=0}^p(\Psi _4(j))^2=O(p^{1-\delta })\) for some \(\delta >0\) and one of the assumptions (B4’) or (B4”)
is satisfied, we have
As a special case we consider the linear process \(X_{i1}=\sum _{j=0}^\infty a_j\epsilon _{i-j}\), where the \(\epsilon _j\) are iid with \({\mathbb E}[\epsilon _j]=0\) and \({\mathbb E}[|\epsilon _j|^q]<\infty\) for some \(q\ge 4\), and the coefficients \(a_j\in \mathbb {R}\) satisfy \(\sum _{j=0}^\infty a_j^2 \in (0,\infty )\). Then the physical dependence measure is given by \(\delta _q(j)=|a_j|\,{\mathbb E}\big [ |\epsilon _0-\epsilon '_0|^q\big ]^{1/q}\). Moreover, the conditions \(0<\Psi _4(0)<\infty\), \({\text {Var}}(X_{i1}X_{j1})>0\) for all \(i,j\in \mathbb {Z}\) and \(|{\text {Cor}}(X_{i1}X_{j1},X_{k1}X_{l1})|<1\) for all i, j, k, l that are not all equal are fulfilled. If \(a_j=j^{-\beta }\ell (j)\), where \(1/2<\beta <1\) and \(\ell\) is a slowly varying function, then \((X_{i1})\) is a long memory process; the smaller the value of \(\beta\), the stronger the dependence between the \(X_{i1}\). If one of the assumptions (B4) or (B4’) is satisfied, then condition (i) is fulfilled for every \(\beta \in (1/2,1)\).
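For the long memory coefficients \(a_j=j^{-\beta }\) one can track condition (i) numerically. The sketch below assumes the squared-tail form \(\Psi _q(k)=\big (\sum _{j\ge k}\delta _q(j)^2\big )^{1/2}\) and drops the constant moment factor \({\mathbb E}[|\epsilon _0-\epsilon '_0|^q]^{1/q}\) (both are assumptions made for this illustration):

```python
import math

def psi_tail(k, beta, n_terms=200_000):
    """Truncated tail (sum_{j >= k} a_j^2)^(1/2) for a_j = j^(-beta).
    The squared-tail form of Psi_q and the dropped moment constant
    are assumptions made for this illustration."""
    return math.sqrt(sum(j ** (-2.0 * beta) for j in range(k, k + n_terms)))

beta = 0.9  # long memory range is 1/2 < beta < 1
vals = [psi_tail(k, beta) * math.log(k) for k in (10, 100, 1000)]
print(vals)  # decreasing, consistent with Psi_q(k) = o(1/log k)
```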
Example 3.14
(Non-stationary linear processes). As in the previous example, \(\textbf{x}_1, \ldots , \textbf{x}_n\) are iid random vectors. Now \(\textbf{x}_1=(X_{11},\ldots ,X_{p1})\) is given by
where \((\epsilon _i)_{i\in \mathbb {Z}}\) is a sequence of iid random variables with mean zero, variance one and finite fourth moment and the sequences \((f_{i,t})_{t\in \mathbb {Z}}\) satisfy \(\sum _{t\in \mathbb {Z}}f_{i,t}^2=1\). Let \(\kappa _4\) be the fourth cumulant of \(\epsilon _0\) and
Assume that \(\kappa _4>-2\) and
By Section 3.2 of Xiao and Wu (2013, p. 2904-2905) and Theorem 3.12 we get the following result. If either
-
(i)
\(h_n(k_n)\log k_n=o(1)\) for any positive sequence \(k_n\) such that \(k_n\rightarrow \infty\) as \(n\rightarrow \infty\) and one of the assumptions (B4) and (B4’) or
-
(ii)
\(\sum _{k=1}^p(h_n(k))^2=O(p^{1-\delta })\) for some \(\delta >0\) and one of the assumptions (B4’) or (B4”)
holds, then we have \(N_n^{(W)}{\mathop {\rightarrow }\limits ^{d}}N\) as \(n\rightarrow \infty\).
To illustrate these assumptions we consider the special case \(\textbf{x}_1:=(\epsilon _1,\ldots ,\epsilon _p) A_n\), where \(A_n\in \mathbb {R}^{p\times p}\) is a deterministic, symmetric matrix with \((A_n)_{i,j}= a_{ij}\) for \(1\le i,j\le p\). We assume that \(\sum _{t=1}^p a_{it}^2=1\) for every \(1\le i\le p\).
The covariance matrix of \(\textbf{x}_1\) is given by \({\text {Cov}}(\textbf{x}_1)=A_nA_n^T\) with \((A_nA_n^T)_{ij}=\sum _{t=1}^p a_{it}a_{jt}\). Observe that the diagonal entries are equal to 1. To satisfy assumption (3.10) we have to assume that the entries except for the diagonal are asymptotically smaller than 1, i.e.
We set
as a measure of how close the matrices \(A_n\) are to diagonal matrices. For the point process convergence either (i) or (ii) has to be satisfied for \(h_n\).
4 Proofs of the results
4.1 Proofs of the results in Section 2
Proof of Theorem 2.1
We follow the lines of the proof of Theorem 2.1 in Dabrowski et al. (2002). Since the mean measure \(\eta\) has a density, the limit process M is simple and we can apply Kallenberg’s Theorem (see for instance Embrechts et al. 1997, p. 233, Theorem 5.2.2, or Kallenberg 1983, p. 35, Theorem 4.7). Therefore, it suffices to prove that for any finite union of bounded rectangles
it holds that
-
(1)
\(\lim \limits _{n\rightarrow \infty } \eta _n(R)=\eta (R)\),
-
(2)
\(\lim \limits _{n\rightarrow \infty } {\mathbb P}(M_n(R)=0)={{\,\textrm{e}\,}}^{-\eta (R)}\).
Without loss of generality we can assume that the \(A_k\)’s are chosen to be disjoint. First we will show (1). Set \(T:=T_{(1,2,\ldots , m)}=g_n(\textbf{x}_1,\textbf{x}_2,\ldots , \textbf{x}_m)\). If \(q=1\) we get
Since assumption (A1) implies \(p^m /(m!)\,{\mathbb P}(T\in B_1)\rightarrow \mu (B_1)\), we obtain the convergence \(\eta _n(R)\rightarrow \eta (R)\) as \(n\rightarrow \infty\). The case \(q> 1\) follows by
To show (2), we let \(P_n\) be the probability mass function of the Poisson distribution with mean \(\eta _n(R)\). Then we have
where the last equality holds by (1). Therefore, we only have to estimate \(|{\mathbb P}(M_n(R)=0)-P_n(0)|\). For this we employ the Stein-Chen method (see Barbour et al. (1992) for a discussion). The Stein equation for the Poisson distribution \(P_n\) with mean \(\eta _n(R)\) is given by
This equation is solved by the function
By (4.11) we see that
Therefore, we only have to estimate the right-hand side of (4.12), and to this end we set
For \(\textbf{k}\in D\) let
Then we have the disjoint union \(D=D_{1\textbf{k}}\overset{.}{\cup }\ D_{2\textbf{k}}\overset{.}{\cup }\ \{\textbf{k}\}\), and therefore,
Now, we bound (4.12) by
It suffices to show that both terms in (4.13) tend to zero as \(n\rightarrow \infty\). From Barbour and Eagleson (1984, p.400) we have the following bound for the increments of the solution of Stein’s equation
Using (4.14) the first term of (4.13) is bounded above by
Using the definitions of \(\eta _\textbf{k}\) and \(M_n^{(2)}(\textbf{k})\), we get
Since by assumption (A1) it holds that \(p^m{\mathbb P}(T\in B_i)\rightarrow m!(\mu (r_i^{(m+1)})-\mu (s_i^{(m+1)}))\) as \(n\rightarrow \infty\), and
(4.15) and thus the first term of (4.13) tend to zero as \(n\rightarrow \infty\). As every \(I_\textbf{k}\) only depends on \(T_\textbf{k}\) and because \(D_{1\ell }\) only contains elements which have no component in common with \(\ell\), \(M_n^{(1)}(\ell )\) and \(I_\ell\) are independent. Therefore, the second term of (4.13) equals
where the last inequality follows from (4.14). Since \(I_\textbf{k}\le 1\) because the \(A_i\) are disjoint, the right-hand side in (4.16) is bounded above by
We set \(D_{2\textbf{k},r}:=\{\ell \in D:|\{\ell _1,\ldots , \ell _m,k_1,\ldots ,k_m\}|=2m-r\}\). Notice that \(\dot{\bigcup }_{r=1}^{m-1} D_{2\textbf{k},r}=D_{2\textbf{k}}\). Therefore, (4.17) is equal to
By assumption (A2), we have \(p^{2m-r}{\mathbb P}(T_\textbf{k}\in B_i,T_\ell \in B_j)\rightarrow 0\) for \(r=1, \ldots, m-1\) as \(n\rightarrow \infty\). Additionally, it holds that
Consequently the second term of (4.13) tends to zero as \(n\rightarrow \infty\). This completes the proof.\(\square\)
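The Stein–Chen machinery used above also yields explicit, non-asymptotic bounds. For a sum W of independent indicators with success probabilities \(p_i\) and \(\lambda =\sum _i p_i\), the classical bound is \(d_{TV}(W,\mathrm{Poi}(\lambda ))\le \lambda ^{-1}(1-{{\,\textrm{e}\,}}^{-\lambda })\sum _i p_i^2\) (see Barbour et al. 1992). A numeric sanity check in the binomial case:

```python
import math

def tv_binomial_vs_poisson(n, p):
    """Total variation distance between Bin(n, p) and Poi(n * p)."""
    lam = n * p
    binom = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    pois, q = [], math.exp(-lam)
    for k in range(n + 1):          # Poisson pmf built iteratively to
        pois.append(q)              # avoid huge factorials
        q *= lam / (k + 1)
    tail = max(0.0, 1.0 - sum(pois))  # Poisson mass beyond n
    return 0.5 * (sum(abs(a - b) for a, b in zip(binom, pois)) + tail)

n, p = 200, 0.01
bound = (1 - math.exp(-n * p)) / (n * p) * n * p ** 2  # Barbour-Holst-Janson
print(tv_binomial_vs_poisson(n, p), "<=", bound)
```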
Proof of Proposition 2.5
We proceed similarly to the proof of Proposition 4.20 of Resnick (2008). We want to show that \(Y_n{\mathop {\rightarrow }\limits ^{d}}Y\). Therefore, we define a map from the space of point measures \(\mathcal {M}(S)\) to D(0, 1], the space of right continuous functions on (0, 1] with finite limits existing from the left, and show that this map is continuous. The proposition then follows by the continuous mapping theorem.
To this end, for a point measure \(\textbf{m}=\sum _{k=1}^\infty \varepsilon _{(t_k,y_k)} \in \mathcal {M}(S)\) we define \(V_1:\mathcal {M}(S)\rightarrow D(0,1]\) through
where \(t^*=\sup \{s>0:\textbf{m}(((0,1]^{m-1}\times (0,s]\times (v,w))\cap S)=0\}\). \(V_1\) is well-defined except at \(\textbf{m}\equiv 0\). Recalling the definition of \(N_n\) in (2.3), we note that \(V_1(N_n)(t)=Y_n(t)\) and \(V_1(N)(t)=Y(t)\) for \(0<t\le 1\).
We start by proving the continuity of \(V_1\) in the case where \(\mu (x)=-\log (H(x))\) and H is the Gumbel distribution. In this case, N almost surely has the following properties
for any \(0<s<t<1\) and \(x\in \mathbb {R}\). Therefore, we only have to show continuity at \(\textbf{m}\in \mathcal {M}(S)\) with these properties. Let \((\textbf{m}_n)_n\) be a sequence of point measures in \(\mathcal {M}(S)\), which converges vaguely to \(\textbf{m}\) (\(\textbf{m}_n{\mathop {\rightarrow }\limits ^{v}}\textbf{m}\)) as \(n\rightarrow \infty\) (see Resnick 2008, p. 140). Since \(V_1(\textbf{m})\) is right continuous there exists a right continuous extension on [0, 1], which we denote with \(\widetilde{V_1(\textbf{m})}\). Now choose \(\beta <\widetilde{V_1(\textbf{m})}(0)\) such that \(\textbf{m}(S_1\times \{\beta \})=0\). As \(\textbf{m}_n{\mathop {\rightarrow }\limits ^{v}}\textbf{m}\), we can conclude from Resnick (2008, Proposition 3.12) that there exists a \(1\le q<\infty\) such that for n large enough
We enumerate and designate the q points in the following way \(((t_i^{(n)},j_i^{(n)}),\, 1\le i\le q)\) with \(0<t_{1,m}^{(n)}<\ldots<t_{q,m}^{(n)}<1\), where \(t_{i,m}^{(n)}\) is the m-th component of \(t_i^{(n)}\), such that by Resnick (2008, Proposition 3.13)
where \(((t_i,j_i),\, 1\le i\le q)\) is the analogous enumeration of points of \(\textbf{m}\) in \(S_1\times (\beta ,\infty )\). Now choose
small enough so that the \(\delta\)-spheres of the distinct points of the set \(\{(t_i,j_i)\}\) are disjoint and in \(S_1\times [\beta ,\infty )\). Pick n so large that every \(\delta\)-sphere contains only one point of \(\textbf{m}_n\). Then set \(\lambda _n:[0,1]\rightarrow [0,1]\) with \(\lambda _n(0)=0\), \(\lambda _n(1)=1\), \(\lambda _n(t_{i,m})=t_{i,m}^{(n)}\) and \(\lambda _n\) is linearly interpolated elsewhere on [0, 1]. For this \(\lambda _n\) it holds that
Thereby, we get
which finishes the proof. The Fréchet and the Weibull case follow by similar arguments.\(\square\)
Proof of Theorem 2.6
We will proceed similarly as in (Resnick 2008, p. 217-218) using the continuous mapping theorem again. Since Y is the restriction to (0, 1] of an extremal process (see Resnick 2008, Section 4.3), it is a nondecreasing function, which is constant between isolated jumps. Let \(D^\uparrow (0,1]\) be the subset of D(0, 1] that contains all functions with this property. Set
where \(\{t_i\}\) are the discontinuity points of x. Then \(V_2(Y_n)=\sum _{k=1}^p\varepsilon _{p^{-1}L(k)}\) and \(V_2(Y)=\sum _{k=1}^\infty \varepsilon _{\tau _k}\), where \((\tau _k)_k\) is the sequence of discontinuity points of the extremal process generated by the Gumbel distribution \(H=\Lambda\); cf. the discussion above Theorem 2.6. By Embrechts et al. (1997, Theorem 5.4.7) the point process \(\sum _{k=1}^\infty \varepsilon _{\tau _k}\) is a PRM with mean measure \(\nu (a,b)=\log (b/a)\) for \(0<a<b\le 1\). According to Proposition 2.5, it suffices to show that \(V_2\) is continuous. Let \((x_n)_n\) be a sequence of functions in \(D^\uparrow (0,1]\) with \(\mathcal {D}(x_n,x)\rightarrow 0\) as \(n\rightarrow \infty\) for an \(x\in D^\uparrow (0,1]\). Then there exist \(\lambda _n\in \Lambda _{[0,1]}\) such that
where \(\tilde{x}_n\) and \(\tilde{x}\) are the right continuous extensions of \(x_n\) and x on [0, 1]. We want to prove the vague convergence
where \(\{t_i^{(n)}\}\) and \(\{t_i\}\) are the discontinuity points of \(x_n\) and x, respectively. Consider an arbitrary continuous function f on (0, 1] with compact support contained in an interval [a, b] with \(0<a<b\le 1\), and x is continuous at a and b. It suffices to show that
The functions \(x_n,x\in D^\uparrow (0,1]\) have only finitely many discontinuity points in [a, b], so only finitely many terms in the sums are nonzero. Because of (4.18) and (4.19) the jump times of \(x_n\) on [a, b] are close to those of x, which proves (4.20). Hence, \(V_2\) is continuous, which finishes the proof.\(\square\)
4.2 Proof of Proposition 3.7
Let \(\pi\) denote the permutation of \(\{1,\ldots ,n\}\) induced by the order statistics of \(X_{2 1}, \ldots , X_{2 n}\), i.e.,
where the continuity of the distribution of X was used to avoid ties. We can rewrite \(\tau _{12}\) as
Let \(\textbf{q}_n=(q_1,\ldots ,q_n)\) be a permutation of the set \(\{1, \ldots ,n\}\). If \(i<j\) and \(q_i>q_j\), we call the pair \((q_i, q_j)\) an inversion of the permutation \(\textbf{q}_n\).
Since the \(X_{11},\ldots , X_{1n}\) are iid, the permutation
consisting of the ranks is uniformly distributed on the set of the n! permutations of \(\{1,\ldots ,n\}\). By \(I_n\) we denote the number of inversions of \(\textbf{q}_n\). For \(s<t\), we have
In view of (4.21), this implies
By Kendall and Stuart (1973, p. 479) or Margolius (2001, p. 3) (see also Sachkov 1997) the moment generating function of \(I_n\) is
We recognize that \(\frac{1-{{\,\textrm{e}\,}}^{jt}}{j(1-{{\,\textrm{e}\,}}^t)}\) is the moment generating function of a uniform distribution on the integers \(0, 1, \ldots , j-1\). Let \((U_i)_{i\ge 1}\) be a sequence of independent random variables such that \(U_i\) is uniformly distributed on the integers \(0, 1, \ldots , i\). We get
establishing the desired result.\(\square\)
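The distributional identity underlying this proof can be checked exhaustively for small n: the inversion counts over all n! permutations follow exactly the law of \(U_1+\cdots +U_{n-1}\). A sketch:

```python
from itertools import permutations

def inversions(perm):
    """Number of pairs (j, i) with j < i and perm[j] > perm[i]."""
    return sum(1 for i in range(len(perm)) for j in range(i)
               if perm[j] > perm[i])

n = 5
# Exact distribution of the inversion count over all n! permutations.
counts = {}
for perm in permutations(range(n)):
    k = inversions(perm)
    counts[k] = counts.get(k, 0) + 1

# Distribution of U_1 + ... + U_{n-1}, U_i uniform on {0, ..., i},
# via convolution of the generating polynomials 1 + x + ... + x^i.
dist = [1]
for i in range(1, n):
    new = [0] * (len(dist) + i)
    for a, c in enumerate(dist):
        for b in range(i + 1):
            new[a + b] += c
    dist = new

assert dist == [counts.get(k, 0) for k in range(len(dist))]
print("inversion counts match the sum of independent uniforms for n =", n)
```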
4.3 Proof of Proposition 3.10
Our idea is to transfer the convergence of \(N_n^X\) onto \(N_n^Y\). To this end, it suffices to show (see Kallenberg 1983, Theorem 4.2) that for any continuous function f on \(\mathbb {R}\) with compact support,
Suppose the compact support of f is contained in \([K+\gamma _0, \infty )\) for some \(\gamma _0>0\) and \(K\in \mathbb {R}\). Since f is uniformly continuous, \(\omega (\gamma ):= \sup \{|f(x)-f(y)|: x,y \in \mathbb {R}, |x-y| \le \gamma \}\) tends to zero as \(\gamma \rightarrow 0\). We have to show that for any \(\varepsilon >0\),
On the sets
we have
Therefore, we see that, for \(\gamma \in (0, \gamma _0)\),
By assumption, it holds that \(\lim _{n\rightarrow \infty } {\mathbb P}(A_{n,\gamma }^c) =0\). Thus, letting \(\gamma \rightarrow 0\) establishes (4.22).\(\square\)
Data availability
Not applicable.
References
Baddeley, A.: Spatial point processes and their applications. In Stochastic geometry, vol. 1892 of Lecture Notes in Math. Springer, Berlin, pp. 1–75 (2007)
Barbour, A.D., Eagleson, G.K.: Poisson convergence for dissociated statistics. J. Roy. Statist. Soc. Ser. B 46(3), 397–402 (1984)
Barbour, A.D., Holst, L., Janson, S.: Poisson approximation, vol. 2 of Oxford Studies in Probability. The Clarendon Press, Oxford University Press, New York. Oxford Science Publications (1992)
Basrak, B., Planinić, H.: Compound Poisson approximation for regularly varying fields with application to sequence alignment. Bernoulli 27(2), 1371–1408 (2021)
Billingsley, P.: Convergence of probability measures, second ed. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons, Inc., New York. A Wiley-Interscience Publication (1999)
Cai, T., Liu, W.: Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106(494), 672–684 (2011)
Cai, T., Liu, W., Xia, Y.: Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J. Amer. Statist. Assoc. 108(501), 265–277 (2013)
Cai, T.T.: Global testing and large-scale multiple testing for high-dimensional covariance structures. Annu. Rev. Stat. Appl. 4, 423–446 (2017)
Cai, T.T., Jiang, T.: Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Statist. 39(3), 1496–1525 (2011)
Chenavier, N., Henze, N., Otto, M.: Limit laws for large kth-nearest neighbor balls. J. Appl. Probab. 59(3), 880–894 (2022)
Clarke, R., Ressom, H.W., Wang, A., Xuan, J., Liu, M.C., Gehan, E.A., Wang, Y.: The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nat. Rev. Cancer 8(1), 37–49 (2008)
Dabrowski, A.R., Dehling, H.G., Mikosch, T., Sharipov, O.: Poisson limits for \(U\)-statistics. Stochastic Process. Appl. 99(1), 137–157 (2002)
Decreusefond, L., Schulte, M., Thäle, C.: Functional Poisson approximation in Kantorovich-Rubinstein distance with applications to U-statistics and stochastic geometry. Ann. Probab. 44(3), 2147–2197 (2016)
Donoho, D.: High-dimensional data analysis: the curses and blessings of dimensionality. Technical Report, Stanford University (2000)
Drton, M., Han, F., Shi, H.: High-dimensional consistent independence testing with maxima of rank correlations. Ann. Statist. 48(6), 3206–3227 (2020)
Embrechts, P., Klüppelberg, C., Mikosch, T.: Modelling Extremal Events for Insurance and Finance. Applications of Mathematics (New York), vol. 33. Springer, Berlin (1997)
Gösmann, J., Stoehr, C., Heiny, J., Dette, H.: Sequential change point detection in high dimensional time series. Electron. J. Stat. 16(1), 3608–3671 (2022)
Han, F., Chen, S., Liu, H.: Distribution-free tests of independence in high dimensions. Biometrika 104(4), 813–828 (2017)
Heiny, J., Kleemann, C.: Maximum interpoint distance of high-dimensional random vectors. arXiv preprint arXiv:2302.06965 (2023)
Hoeffding, W.: A class of statistics with asymptotically normal distribution. Ann. Math. Statistics 19, 293–325 (1948)
Johnstone, I.M., Titterington, D.M.: Statistical challenges of high-dimensional data. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367(1906), 4237–4253 (2009)
Kallenberg, O.: Random measures, third ed. Akademie-Verlag, Berlin; Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], London (1983)
Kallenberg, W.C.M.: Cramér type large deviations for simple linear rank statistics. Z. Wahrsch. Verw. Gebiete 60(3), 403–409 (1982)
Kendall, M.G., Stuart, A.: The advanced theory of statistics. Vol. 2, third ed. Hafner Publishing Co., New York. Inference and relationship (1973)
Margolius, B.H.: Permutations with inversions. J. Integer Seq. 4(2), Article 01.2.4 (2001)
Panaretos, V.M., Zemel, Y.: An invitation to statistics in Wasserstein space. SpringerBriefs in Probability and Mathematical Statistics. Springer, Cham (2020)
Resnick, S.I.: Extreme Values, Regular Variation and Point Processes. Springer Series in Operations Research and Financial Engineering. Springer, New York. Reprint of the 1987 original (2008)
Sachkov, V.N.: Probabilistic methods in combinatorial analysis, vol. 56 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge. Translated from the Russian, Revised by the author (1997)
Schulte, M., Thäle, C.: The scaling limit of Poisson-driven order statistics with applications in geometric probability. Stochastic Process. Appl. 122(12), 4096–4120 (2012)
Schulte, M., Thäle, C.: Poisson point process convergence and extreme values in stochastic geometry. In Stochastic analysis for Poisson point processes, vol. 7 of Bocconi Springer Ser. Bocconi Univ. Press, pp. 255–294 (2016)
Silverman, B., Brown, T.: Short distances, flat triangles and Poisson limits. J. Appl. Probab. 15(4), 815–825 (1978)
Wu, W.B.: Nonlinear system theory: another look at dependence. Proc. Natl. Acad. Sci. USA 102(40), 14150–14154 (2005)
Xiao, H., Wu, W.B.: Asymptotic theory for maximum deviations of sample covariance matrix estimates. Stochastic Process. Appl. 123(7), 2899–2920 (2013)
Zhou, C., Han, F., Zhang, X.-S., Liu, H.: An extreme-value approach for testing the equality of large U-statistic based correlation matrices. Bernoulli 25(2), 1472–1503 (2019)
Acknowledgements
We thank Christoph Thäle for fruitful discussions. The work of two anonymous reviewers is gratefully acknowledged.
Funding
Open access funding provided by Stockholm University.
Author information
Contributions
The authors contributed equally.
Ethics declarations
Ethical approval
Not applicable.
Conflict of interest
None.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Heiny, J., Kleemann, C. Point process convergence for symmetric functions of high-dimensional random vectors. Extremes (2023). https://doi.org/10.1007/s10687-023-00482-w
Keywords
- Point process convergence
- Extreme value theory
- Poisson process
- Gumbel distribution
- High-dimensional data
- U-statistics
- Kendall’s tau
- Spearman’s rho