1 Introduction

In classical extreme value theory the asymptotic distribution of the maximum of random points plays a central role. Maximum-type statistics are the basis of popular tests for the dependence structure of high-dimensional data; such tests possess good power properties, especially against sparse alternatives (see Han et al. 2017; Drton et al. 2020; Zhou et al. 2019). Closely related to the maxima of random points are point processes, which play an important role in stochastic geometry and data analysis and have applications in statistical ecology, astrostatistics and spatial epidemiology (Baddeley 2007). For a sequence \((Y_i)_i\) of real-valued random variables, we set

$$\begin{aligned} \widetilde{M}_p:=\sum _{i=1}^p\varepsilon _{(i/p,Y_i)}, \end{aligned}$$

where \(\varepsilon _x\) is the Dirac measure at x. Let \(K:=(0,1)\times (u,\infty )\) with \(u\in \mathbb {R}\). Then \(\widetilde{M}_p(K)\) counts the number of exceedances of the threshold u by the random variables \(Y_1,\ldots , Y_p\). If \(Y^{(k)}\) denotes the k-th upper order statistic of \(Y_1,\ldots ,Y_p\), it holds that \(\{\widetilde{M}_p(K)<k\}=\{Y^{(k)}\le u\}\), and in particular \(\{\widetilde{M}_p(K)=0\}=\{\max _{i=1,\ldots ,p} Y_i\le u\}\). Therefore, the weak convergence of a sequence of point processes gives information about the joint asymptotic distribution of a fixed number of upper order statistics. If the sequence \((Y_i)_i\) consists of independent and identically distributed (iid) random variables, maximum convergence and point process convergence are equivalent, but if the random variables are dependent, this equivalence does not necessarily hold anymore. In this sense, point process convergence is a substantial generalization of maximum convergence. Additionally, the time components i/p carry valuable information about the random time points at which a record occurs, i.e., the times j with \(Y_j>\max _{i=1,\ldots ,j-1}Y_i\).
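
The relation between exceedance counts and order statistics is easy to check numerically. The following minimal numpy sketch is only an illustration; the sample size p, the threshold u and the level k are arbitrary choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, u, k = 1000, 2.0, 3          # illustrative choices
Y = rng.standard_normal(p)      # any continuous iid sample works here

# \tilde{M}_p(K) with K = (0,1) x (u, infty): number of exceedances of u
M_K = np.sum(Y > u)

# k-th upper order statistic Y^{(k)}
Y_k = np.sort(Y)[::-1][k - 1]

# {\tilde{M}_p(K) < k} = {Y^{(k)} <= u}, and for k = 1 this is the event {max <= u}
assert (M_K < k) == (Y_k <= u)
assert (M_K == 0) == (Y.max() <= u)
```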

Our main motivation comes from statistical inference for high-dimensional data, where the asymptotic distribution of the maximum of dependent random variables has found several applications in recent years (see for example Han et al. 2017; Drton et al. 2020; Zhou et al. 2019; Cai and Jiang 2011; Cai et al. 2013; Cai 2017; Cai and Liu 2011; Gösmann et al. 2022). The objective of this paper is to provide the methodology for extending meaningful results on the convergence of maxima of dependent random variables to point process convergence.

To this end, we consider dependent points \(T_{\textbf{i}}:=g_{n,p}(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots , \textbf{x}_{i_m})\), where the index \(\textbf{i}= (i_1, i_2,\ldots , i_m)\in \{1,\ldots ,p\}^m\). The random vectors \(\textbf{x}_1,\ldots ,\textbf{x}_p\) are iid on \(\mathbb {R}^n\) and \(g_{n,p}:\mathbb {R}^{mn}\rightarrow \mathbb {R}\) is a measurable, symmetric function. Important examples include U-statistics, simple linear rank statistics, rank-type U-statistics, the entries of sample covariance matrices or interpoint distances.

Additionally, we assume that the dimension n of the points grows with the number of points p. Over the last decades the environment, and therefore the requirements for statistical methods, have changed fundamentally. Due to huge improvements in computing power and data acquisition technologies, one is confronted with large data sets in which the dimension of the observations is as large as, or even larger than, the sample size. Such high-dimensional data occur naturally in online networks, genomics, financial engineering, wireless communication and image analysis (see Johnstone and Titterington 2009; Clarke et al. 2008; Donoho 2000). Hence, the analysis of high-dimensional data has developed into an important and active research area.

We will show that the corresponding point process of the points \(T_\textbf{i}\) converges to a Poisson random measure (PRM) with a mean measure that involves the m-dimensional Lebesgue measure and an additional measure \(\mu\). If we replace the points \(T_\textbf{i}\) with iid random variables with the same distribution, the (non-degenerate) limiting distribution of the maximum will necessarily be an extreme value distribution of the form \(\exp (-\mu (x))\). Moreover, the convergence of the corresponding point process will be equivalent to the condition

$$\begin{aligned} \left( {\begin{array}{c}p\\ m\end{array}}\right) {\mathbb P}(g_{n,p}(\textbf{x}_1,\textbf{x}_2,\ldots , \textbf{x}_m)>x) \rightarrow \mu (x),\qquad p\rightarrow \infty . \end{aligned}$$
(1.1)

However, since the random points \(T_\textbf{i}\) are not independent, we additionally need the following assumption on the dependence structure:

$$\begin{aligned} {\mathbb P}(g_{n,p}(\textbf{x}_1,\textbf{x}_2,\ldots ,\textbf{x}_m)>x, g_{n,p}(\textbf{x}_{m-l+1},\ldots , \textbf{x}_{2m-l})>x)=o(p^{-(2m-l)}), \qquad p\rightarrow \infty , \end{aligned}$$
(1.2)

where \(l=1,\ldots , m-1\).

In the finite-dimensional case where n is fixed, several results about point process convergence are available in similar settings. Silverman and Brown (1978) showed point process convergence for \(m=2\), \(n=2\) and \(g_{2,p}(\textbf{x}_i,\textbf{x}_j)=a_p\Vert \textbf{x}_i-\textbf{x}_j\Vert _2^2\), where the \(\textbf{x}_i\) have a bounded and almost everywhere continuous density, \(a_p\) is a suitable scaling sequence and \(\Vert \cdot \Vert _2\) is the Euclidean norm on \(\mathbb {R}^2\). In the Weibull case \(\mu (x)= x^\alpha\) for \(x,\alpha >0\), Dabrowski et al. (2002) proved a generalization to points with a fixed dimension and \(g_{n,p}(\textbf{x}_i,\textbf{x}_j)=a_ph(\textbf{x}_i,\textbf{x}_j)\), where h is a measurable, symmetric function and \(a_p\) is a suitable scaling sequence.

Also in the finite-dimensional case, under similar assumptions as in (1.1) with \(\mu (x)=\beta x^\alpha\) for \(x,\alpha >0\), \(\beta \in \mathbb {R}\) and under condition (1.2), Schulte and Thäle (2012) showed convergence in distribution of point processes towards a Weibull process. The points of these point processes are obtained by applying a symmetric function \(g_{n,p}\) to all m-tuples of distinct points of a Poisson process on a standard Borel space. In Schulte and Thäle (2016), this result was extended to more general functions \(\mu\) and to binomial processes so that other PRMs were possible limit processes. In Decreusefond et al. (2016), Decreusefond, Schulte and Thäle provided an upper bound of the Kantorovich-Rubinstein distance between a PRM and the point process induced in the aforementioned way by a Poisson or a binomial process on an abstract state space. Notice that convergence in Kantorovich-Rubinstein distance implies convergence in distribution (see Panaretos and Zemel 2020, Theorem 2.2.1 or Decreusefond et al. 2016, p. 2149). In Chenavier et al. (2022) another point process result in a similar setting is given for the number of nearest neighbor balls in fixed dimension. Moreover, Basrak and Planinić (2021) present a general framework for Poisson approximation of point processes on Polish spaces.

1.1 Structure of this paper

The remainder of this paper is structured as follows. In Section 2 we prove weak point process convergence for the dependent points \(T_{\textbf{i}}\) in the high-dimensional case as a tool for generalizing the convergence of the maximum (Theorem 2.1). We provide popular representations of the limiting process in terms of the transformed points of a homogeneous Poisson process. Moreover, we derive point process convergence for the record times. In Section 3 these tools are applied to study statistics based on relative ranks, such as simple linear rank statistics or rank-type U-statistics. We also prove convergence of the point processes of the off-diagonal entries of large sample covariance matrices. The technical proofs are deferred to Section 4.

1.2 Notation

Convergence in distribution (resp. probability) is denoted by \({\mathop {\rightarrow }\limits ^{d}}\) (resp. \({\mathop {\rightarrow }\limits ^{{\mathbb P}}}\)) and unless explicitly stated otherwise all limits are for \(n\rightarrow \infty\). For sequences \((a_n)_n\) and \((b_n)_n\) we write \(a_n=O(b_n)\) if \(a_n/b_n\le C\) for some constant \(C>0\) and every \(n\in \mathbb {N}\), and \(a_n=o(b_n)\) if \(\lim _{n\rightarrow \infty } a_n/b_n=0\). Additionally, we use the notation \(a_n\sim b_n\) if \(\lim _{n\rightarrow \infty } a_n/b_n=1\) and \(a_n\lesssim b_n\) if \(a_n\) is smaller than or equal to \(b_n\) up to a positive universal constant. We further write \(a\wedge b:=\min \{a,b\}\) for \(a,b\in \mathbb {R}\) and, for a set A, we write |A| for the number of elements of A.

2 Point process convergence

We introduce the model that was briefly described in the introduction. Let \(\textbf{x}_1,\ldots ,\textbf{x}_p\) be iid \(\mathbb {R}^n\)-valued random vectors with \(\textbf{x}_i=(X_{i1}, \ldots , X_{in})^\top , i=1,\ldots ,p\), where \(p=p_n\) is some positive integer sequence tending to infinity as \(n\rightarrow \infty\).

We consider the random points

$$\begin{aligned} T_{\textbf{i}}:=g_{n,p}(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots , \textbf{x}_{i_m}), \end{aligned}$$

where \(\textbf{i}=(i_1, i_2,\ldots , i_m)\in \{1,\ldots ,p\}^m\) and \(g_n=g_{n,p}:\mathbb {R}^{mn}\rightarrow \mathbb {R}\) is a measurable and symmetric function, where symmetric means \(g_{n}(\textbf{y}_1,\textbf{y}_2,\ldots , \textbf{y}_m)=g_{n}(\textbf{y}_{\pi (1)},\textbf{y}_{\pi (2)},\ldots , \textbf{y}_{\pi (m)})\) for all \(\textbf{y}_1,\textbf{y}_2,\ldots , \textbf{y}_m \in \mathbb {R}^n\) and all permutations \(\pi\) on \(\{1,2,\ldots , m\}\). We are interested in the limit behavior of the point processes \(M_n\) towards a PRM M,

$$\begin{aligned} M_n=\sum _{1\le i_1<i_2<\ldots <i_m\le p} \varepsilon _{(\textbf{i}/p,\,T_{\textbf{i}})} {\mathop {\rightarrow }\limits ^{d}}M\,,\qquad n\rightarrow \infty \,, \end{aligned}$$

where \(\textbf{i}/p=(i_1/p,\ldots ,i_m/p)\). The limit M is a PRM with mean measure

$$\begin{aligned} \eta \Big (\mathbin {\mathop {\otimes }\limits _{l=1}^{m+1}}(r_l,s_l)\Big )=m!{}\lambda _m\Big (\mathbin {\mathop {\otimes }\limits _{l=1}^{m}}(r_l,s_l)\Big )\mu (r_{m+1},s_{m+1}), \end{aligned}$$

where \(\lambda _m\) is the Lebesgue measure on \(\mathbb {R}^m\). For an interval (ab) with \(a<b\in \mathbb {R}\) we write \(\mu (a,b):=\mu ((a,b)):=\mu (a)-\mu (b)\) and \(\mu :(v,w)\rightarrow \mathbb {R}^+=\{x\in \mathbb {R}:x\ge 0\}\) is a function satisfying \(\lim _{x \rightarrow v} \mu (x)=\infty\) and \(\lim _{x \rightarrow w} \mu (x)=0\) for \(v,w\in \bar{\mathbb {R}}=\mathbb {R}\cup \{\infty , -\infty \}\) and \(v<w\). Furthermore, we set \(\eta _n(\cdot ):={\mathbb E}[M_n(\cdot )]\). We consider the \(M_n\)’s and M as random measures on the state space

$$\begin{aligned} S=S_1\times (v,w)=\{(z_1, z_2,\ldots , z_m): 0<z_1<z_2<\ldots <z_m\le 1\}\times (v,w) \end{aligned}$$

with values in \(\mathcal {M}(S)\), the space of point measures on S, endowed with the vague topology (see Resnick 2008). The following result establishes the convergence \(M_n {\mathop {\rightarrow }\limits ^{d}}M\), where \({\mathop {\rightarrow }\limits ^{d}}\) denotes convergence in distribution in \(\mathcal {M}(S)\).

Theorem 2.1

Let \(\textbf{x}_1,\ldots , \textbf{x}_p\) be n-dimensional, independent and identically distributed random vectors, where \(p=p_n\) is some sequence of positive integers tending to infinity as \(n\rightarrow \infty\). Additionally, let \(g=g_n:\mathbb {R}^{mn}\rightarrow (v,w)\) be a measurable and symmetric function, where \(v,w\in \bar{\mathbb {R}}=\mathbb {R}\cup \{\infty , -\infty \}\) and \(v<w\). Assume that there exists a function \(\mu :(v,w)\rightarrow \mathbb {R}^+\) with \(\lim _{x \rightarrow v} \mu (x)=\infty\) and \(\lim _{x \rightarrow w} \mu (x)=0\) such that, for \(x\in (v,w)\) and \(n\rightarrow \infty\),

  • (A1) \(\left( {\begin{array}{c}p\\ m\end{array}}\right) {\mathbb P}(g_n(\textbf{x}_1,\textbf{x}_2,\ldots , \textbf{x}_m)>x) \rightarrow \mu (x)\) and

  • (A2) \({\mathbb P}(g_n\!(\textbf{x}_1,\textbf{x}_2,\ldots ,\textbf{x}_m)\!>\!x,g_n\!(\textbf{x}_{m-l+1},\ldots , \textbf{x}_{2m-l})\!>\!x)\!=\!o(p^{-(2m-l)})\) for \(l=\!1,\ldots , m\!-\!1\).

Then we have \(M_n{\mathop {\rightarrow }\limits ^{d}}M\).

Note that (A1) ensures the correct specification of the mean measure, while (A2) is an anti-clustering condition. Both conditions are standard in extreme value theory. It is worth mentioning that

$$\lim _{n \rightarrow \infty }{\mathbb P}\Big (\max _{1\le i_1<i_2<\ldots <i_m\le p} T_{\textbf{i}}\le x\Big )=\exp (-\mu (x))=:H(x), \qquad x\in \mathbb {R},$$

where we use the conventions \(\mu (x)=0\) if \(x>w\), \(\mu (x)=\infty\) if \(x<v\), and \(\exp (-\infty )=0\). The typical distribution functions H are the Fréchet, Weibull and Gumbel distributions. In these cases, the limiting process M has a representation in terms of the transformed points of a homogeneous Poisson process. Let \((U_i)_i\) be an iid sequence of random vectors uniformly distributed on \(S_1\) and \(\Gamma _i=E_1+\ldots +E_i\), where \((E_i)_i\) is an iid sequence of standard exponentially distributed random variables, independent of \((U_i)_i\).

It is well-known that \(N_\Gamma :=\sum _{i=1}^\infty \varepsilon _{\Gamma _i}\) is a homogeneous Poisson process and hence, for every Borel set \(A\subset (0,\infty )\), \(N_{\Gamma }(A)\) is Poisson distributed with parameter \(\lambda _1(A)\) (see for example Embrechts et al. 1997, Example 5.1.10). For the mean measure \(\eta\) of M we get, for a product of intervals \(\mathbin {\mathop {\otimes }\limits _{l=1}^{m}}(r_l,s_l]\subset S_1\) and an interval \((r_{m+1},s_{m+1}]\subset (v,w)\),

$$\begin{aligned} \eta \Big (\mathbin {\mathop {\otimes }\limits _{l=1}^{m+1}}(r_l,s_l]\Big )&=m!\,\lambda _m\Big (\mathbin {\mathop {\otimes }\limits _{l=1}^{m}}(r_l,s_l]\Big )(\mu (r_{m+1})-\mu (s_{m+1}))\\&=\sum _{i=1}^\infty {\mathbb P}\Big (U_i\in \mathbin {\mathop {\otimes }\limits _{l=1}^{m}}(r_l,s_l]\Big ){\mathbb E}\big [\varepsilon _{\Gamma _i}\big ((\mu (s_{m+1}),\mu (r_{m+1})]\big )\big ]\\&={\mathbb E}\Big [\sum _{i=1}^\infty \varepsilon _{(U_i,\Gamma _i)}\Big (\mathbin {\mathop {\otimes }\limits _{l=1}^{m}}(r_l,s_l]\times (\mu (s_{m+1}),\mu (r_{m+1})]\Big )\Big ], \end{aligned}$$

where we used in the second line that

$$\begin{aligned} {\mathbb P}\Big (U_i\in \mathbin {\mathop {\otimes }\limits _{l=1}^{m}}(r_l,s_l]\Big )=m!\prod _{l=1}^m(s_l-r_l) \end{aligned}$$

as \(U_i\) is uniformly distributed on \(S_1\) for every i and

$$\begin{aligned} {\mathbb E}[N_\Gamma (\mu (s_{m+1}),\mu (r_{m+1}))]=\lambda _1(\mu (s_{m+1}),\mu (r_{m+1})). \end{aligned}$$

We get the following representations for the limiting processes M.

  • Fréchet case: For \(\alpha >0\) the Fréchet distribution is given by \(\Phi _\alpha (x)=\exp ({-x^{-\alpha }})\), \(x>0\). For \(0<r<s<\infty\) we have \(\mu _{\Phi _\alpha }(r,s]=r^{-\alpha }-s^{-\alpha }\) and therefore, we can write

    $$\begin{aligned} M=M_{\Phi _\alpha }=\sum _{i=1}^\infty \varepsilon _{(U_i,\Gamma _i^{-1/\alpha })}. \end{aligned}$$
  • Weibull case: For \(\alpha >0\) the Weibull distribution is given by \(\Psi _\alpha (x)=\exp ({-|x|^\alpha })\), \(x<0\). For \(-\infty<r<s<0\) we have \(\mu _{\Psi _{\alpha }}(r,s]=|r|^\alpha -|s|^\alpha\) and

    $$\begin{aligned} M=M_{\Psi _\alpha }=\sum _{i=1}^\infty \varepsilon _{(U_i,-\Gamma _i^{1/\alpha })}. \end{aligned}$$
  • Gumbel case: The Gumbel distribution is given by \(\Lambda (x)=\exp ({-{{\,\textrm{e}\,}}^{-x}})\) for all \(x\in \mathbb {R}\). For \(-\infty<r<s<\infty\) we have \(\mu _{\Lambda }(r,s]={{\,\textrm{e}\,}}^{-r}-{{\,\textrm{e}\,}}^{-s}\) and

    $$\begin{aligned} M=M_\Lambda =\sum _{i=1}^\infty \varepsilon _{(U_i,-\log \Gamma _i)}. \end{aligned}$$
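
The three representations above are straightforward to simulate. The following sketch generates the points \(\Gamma _i\) of a homogeneous Poisson process and transforms them; the truncation level, the shape parameter \(\alpha\) and the Monte Carlo check at the end are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
num_points, alpha = 10_000, 1.5          # illustrative truncation level and shape parameter

Gamma = np.cumsum(rng.standard_exponential(num_points))   # Gamma_i = E_1 + ... + E_i

frechet_pts = Gamma ** (-1.0 / alpha)    # points of M_{Phi_alpha} (without the time components U_i)
weibull_pts = -(Gamma ** (1.0 / alpha))  # points of M_{Psi_alpha}
gumbel_pts = -np.log(Gamma)              # points of M_{Lambda}

# Monte Carlo sanity check in the Gumbel case: the largest point is -log(Gamma_1) = -log(E_1),
# whose distribution function is exp(-e^{-x}).
x = 1.0
E1 = rng.standard_exponential(100_000)
print(np.mean(-np.log(E1) <= x), np.exp(-np.exp(-x)))
```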

Besides the points \(T_\textbf{i}\), time components \(\textbf{i}/p=(i_1/p,\ldots ,i_m/p)\) with \(1\le i_1<\ldots <i_m\le p\) are considered in the definition of the point process \(M_n\). Whenever we do not need the time components in the following, we will use the shorthand notation

$$\begin{aligned} N_n(\cdot ):=M_n(S_1\times \cdot )=\sum _{1\le i_1<\ldots <i_m\le p}\varepsilon _{T_\textbf{i}}(\cdot ). \end{aligned}$$
(2.3)

Under the conditions of Theorem 2.1, \(N_n\) converges in distribution to \(N(\cdot ):=M(S_1\times \cdot )\) which is a PRM with mean measure \(\mu\).

A direct consequence of the point process convergence is the convergence of the joint distribution of a fixed number of upper order statistics. In the Fréchet, Weibull and Gumbel cases the limit function can be described as the joint distribution function of transformations of the points \(\Gamma _i\).

Corollary 2.2

Let \(G_{n,(j)}\) be the j-th upper order statistic of the random variables \((g_n(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots ,\textbf{x}_{i_m}))\), where \(1\le i_1<i_2<\ldots <i_m\le p\). Under the conditions of Theorem 2.1 and for a fixed \(k\ge 1\) the distribution function

$$\begin{aligned} {\mathbb P}(G_{n,(1)}\le x_1,\ldots ,G_{n,(k)}\le x_k), \end{aligned}$$

where \(x_k<\ldots <x_1\in (v,w)\), converges to

$$\begin{aligned} {\mathbb P}\Big (N(x_1,w)= 0, N(x_2,w)\le 1,\ldots ,N(x_k,w)\le k-1\Big ), \end{aligned}$$

as \(n\rightarrow \infty\). In particular, in the Fréchet, Weibull and Gumbel cases, it holds that

$$\begin{aligned}&{\mathbb P}\Big (N(x_1,w)= 0, N(x_2,w)\le 1,\ldots ,N(x_k,w)\le k-1\Big )\\&= {\left\{ \begin{array}{ll} {\mathbb P}(\Gamma _1^{-1/\alpha }\le x_1,\ldots , \Gamma _k^{-1/\alpha }\le x_k),\quad &{}\text {if}\,\,\,\mu =\mu _{\Phi _\alpha }, \\ {\mathbb P}(-\Gamma _1^{1/\alpha }\le x_1,\ldots , -\Gamma _k^{1/\alpha }\le x_k),\quad &{}\text {if}\,\,\,\mu =\mu _{\Psi _\alpha },\\ {\mathbb P}(-\log \Gamma _1\le x_1,\ldots , -\log \Gamma _k\le x_k),\quad &{}\text {if}\,\,\,\mu =\mu _{\Lambda }. \end{array}\right. } \end{aligned}$$

Proof

Since \(N_n(x,w)\) is the number of vectors \(\textbf{i}=(i_1,\ldots ,i_m)\) with \(1\le i_1<i_2<\ldots <i_m\le p\), for which \(g_n(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots ,\textbf{x}_{i_m})\in (x,w)\), we get by Theorem 2.1 as \(n\rightarrow \infty\)

$$\begin{aligned} \begin{aligned} {\mathbb P}(G_{n,(1)}\le x_1,\ldots ,G_{n,(k)}\le x_k)&={\mathbb P}\Big (N_n(x_1,w)= 0,N_n(x_2,w)\le 1,\ldots ,N_n (x_k,w)\le k-1\Big ) \\&\rightarrow {\mathbb P}\Big (N(x_1,w)= 0,N(x_2,w)\le 1,\ldots ,N(x_k,w)\le k-1\Big ). \end{aligned} \end{aligned}$$
(2.4)

By the representation of the limiting point process in the Fréchet, Weibull and Gumbel cases, (2.4) is equal to one of the three distribution functions in the corollary.

One field, where point processes find many applications, is stochastic geometry. The paper Schulte and Thäle (2012), for example, considers order statistics for Poisson k-flats in \(\mathbb {R}^d\), Poisson polytopes on the unit sphere and random geometric graphs.

Setting \(k=1\) in Corollary 2.2 we obtain the convergence in distribution of the maximum of the points \(T_\textbf{i}\).

Corollary 2.3

Under the conditions of Theorem 2.1 we get

$$\begin{aligned} \lim _{n\rightarrow \infty } {\mathbb P}\Big (\max _{1\le i_1<i_2<\ldots <i_m\le p} g_n(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots ,\textbf{x}_{i_m})\le x\Big ) = \exp (-\mu (x)) \,, \qquad x\in \mathbb {R}\,. \end{aligned}$$
Fig. 1 Four largest distances between 500 normally distributed points

Example 2.4

(Interpoint distances) Let \(\textbf{x}_i=(X_{i1}, \ldots , X_{in})^\top , i=1,\ldots ,p\) be n-dimensional random vectors, whose components \((X_{it})_{i,t\ge 1}\) are independent and identically distributed random variables with zero mean and variance 1. We are interested in the asymptotic behavior of the largest interpoint distances

$$\begin{aligned} D_{ij}= \Vert \textbf{x}_i -\textbf{x}_j \Vert ^2_2= \sum _{t=1}^n(X_{it}-X_{jt})^2\,, \qquad 1\le i <j\le p\,, \end{aligned}$$

where \(\Vert \cdot \Vert _2\) is the Euclidean norm on \(\mathbb {R}^n\). Figure 1 shows the four largest interpoint distances of 500 points in \(\mathbb {R}^2\) with independent standard normally distributed components. Note that three of the four largest distances involve the same outlying vector \(\textbf{x}_i\).

We assume that there exists \(s>2\) such that \({\mathbb E}[|X_{11}|^{2s}(\log (|X_{11}|))^{s/2}]< \infty\) and \({\mathbb E}[X_{11}^4]\le 5\) and that \(p=p_n\rightarrow \infty\) satisfies \(p=O(n^{(s-2)/4})\). Additionally, we let \((b_n)_n\) and \((c_n)_n\) be sequences given by

$$\begin{aligned} b_n=2n+\sqrt{2n({\mathbb E}[X_{11}^4]+1)}d_n\quad \text { and } \quad c_n=\frac{d_n}{\sqrt{2n({\mathbb E}[X_{11}^4]+1)}}\,, \end{aligned}$$

where \(d_n=\sqrt{2\log \tilde{p}} - \tfrac{\log \log \tilde{p}+\log 4\pi }{2(2\log \tilde{p})^{1/2}}\) with \(\tilde{p}=p(p-1)/2\). For \(x\in \mathbb {R}\) one can check that

$$\tilde{p}\, {\mathbb P}\big ( c_n (D_{12}-b_n)>x\big ) \rightarrow {{\,\textrm{e}\,}}^{-x} \quad \text {and} \quad {\mathbb P}\big ( c_n (D_{12}-b_n)>x, c_n(D_{23}-b_n)>x\big )=o(p^{-3})$$

as \(n\rightarrow \infty\) (see Heiny and Kleemann (2023) for details). Therefore, the conditions (A1) and (A2) in Theorem 2.1 hold for \(m=2\), \(g_n(\textbf{x}_i,\textbf{x}_j)=c_n (D_{ij}-b_n)\) and \(\mu (x)={{\,\textrm{e}\,}}^{-x}\). By virtue of Theorem 2.1 we have

$$\begin{aligned} \sum _{1\le i<j\le p}\varepsilon _{c_n (D_{ij}-b_n)}{\mathop {\rightarrow }\limits ^{d}}N_\Lambda =\sum _{i=1}^\infty \varepsilon _{-\log \Gamma _i}. \end{aligned}$$

Finally Corollary 2.2 yields for a fixed \(k\ge 1\) that

$$\begin{aligned} (D_{n,(1)},\ldots ,D_{n,(k)}){\mathop {\rightarrow }\limits ^{d}}(-\log \Gamma _1,\ldots , -\log \Gamma _k), \end{aligned}$$

where \(D_{n,(\ell )}\) is the \(\ell\)-th upper order statistic of the random variables \(c_n(D_{ij}-b_n)\) for \(1\le i<j\le p\).
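
A small simulation sketch of this example: it computes the normalizing sequences \(b_n\), \(c_n\) and the rescaled interpoint distances for data with standard normal entries (so that \({\mathbb E}[X_{11}^4]=3\)); the dimensions n and p are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 2000, 50                       # illustrative dimensions (p grows much slower than n)
EX4 = 3.0                             # E[X_{11}^4] for standard normal entries

p_tilde = p * (p - 1) / 2
d_n = np.sqrt(2 * np.log(p_tilde)) - (np.log(np.log(p_tilde)) + np.log(4 * np.pi)) / (
    2 * np.sqrt(2 * np.log(p_tilde)))
b_n = 2 * n + np.sqrt(2 * n * (EX4 + 1)) * d_n
c_n = d_n / np.sqrt(2 * n * (EX4 + 1))

X = rng.standard_normal((p, n))
sq_norms = np.sum(X ** 2, axis=1)
D = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T   # squared distances D_ij
iu = np.triu_indices(p, k=1)
points = c_n * (D[iu] - b_n)          # points of the empirical point process

# the four largest rescaled distances (cf. Figure 1); the maximum is approximately Gumbel
print(np.sort(points)[-4:])
```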

2.1 Record times

In Theorem 2.1 we showed convergence of point processes including time components. Therefore, we can additionally derive results for the record times \(L(k), k\ge 1\) of the running maxima of the points \(T_\textbf{i}=g_n(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots ,\textbf{x}_{i_m})\) for \(\textbf{i}=(i_1,\ldots ,i_m)\), which are recursively defined as follows:

$$\begin{aligned} L(1)&=1\,,\\ L(k+1)&=\inf \{j>L(k):\max \limits _{1\le i_1<\ldots<i_m\le j} T_\textbf{i}>\max \limits _{1\le i_1<\ldots <i_m\le L(k)} T_\textbf{i}\},\qquad k\in \mathbb {N}, \end{aligned}$$

(cf. Sections 5.4.3 and 5.4.4 of Embrechts et al. 1997). To prove point process convergence for the record times we need the convergence in distribution of the sequence of processes \((Y_n(t), 0<t\le 1)\) in D(0, 1], the space of right continuous functions on (0, 1] with finite limits from the left, defined by

$$\begin{aligned} Y_n(t)={\left\{ \begin{array}{ll} \max \limits _{1\le i_1<i_2<\ldots <i_m\le \lfloor pt\rfloor } g_n(\textbf{x}_{i_1},\textbf{x}_{i_2},\ldots ,\textbf{x}_{i_m}),\quad &{} \frac{m}{p}\le t\le 1\\ g_n(\textbf{x}_{1},\textbf{x}_{2},\ldots ,\textbf{x}_{m}), &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

where \(\lfloor x\rfloor =\max \{y\in \mathbb {Z}:y\le x\}\) for \(x\in \mathbb {R}\), towards an extremal process. We call \(Y=(Y(t))_{t>0}\) an extremal process generated by the distribution function H, if the finite-dimensional distributions are given by

$$\begin{aligned} {\mathbb P}(Y(t_1)\le x_1,\ldots , Y(t_k)\le x_k)=H^{t_1}\Big (\bigwedge _{i=1}^k x_i\Big )H^{t_2-t_1}\Big (\bigwedge _{i=2}^k x_i\Big )\ldots H^{t_k-t_{k-1}}( x_k), \end{aligned}$$
(2.5)

where \(k\ge 1\), \(0<t_1<\ldots <t_k\) and \(x_i\in \mathbb {R}\) for \(1\le i\le k\) (see Embrechts et al. 1997, Definition 5.4.3). To define convergence in distribution in D(0, 1] we first need to introduce a metric \(\mathcal {D}\) on D(0, 1]. To this end, let \(\Lambda _{[0,1]}\) be the set of homeomorphisms

$$\begin{aligned} \Lambda _{[0,1]}=\{\lambda :[0,1]\rightarrow [0,1]: \lambda (0)=0, \lambda (1)=1, \lambda \text { is continuous and strictly increasing}\}. \end{aligned}$$

Then for \(f,g\in D[0,1]\) the Skorohod metric \(\tilde{\mathcal {D}}\) is defined by (see Billingsley 1999, Section 12)

$$\begin{aligned} \tilde{\mathcal {D}}(f,g):=\inf \{&\epsilon >0: \text {there exists a}\,\, \lambda \in \Lambda _{[0,1]}\,\, \text {such that}\\&\sup _{0\le t\le 1}|\lambda (t)-t|\le \epsilon ,\,\sup _{0\le t\le 1}|f(t)-g(\lambda (t))|\le \epsilon \}. \end{aligned}$$

Now set

$$\begin{aligned} \mathcal {D}(f,g):=\tilde{\mathcal {D}}(\tilde{f},\tilde{g}),\quad f,g\in D(0,1], \end{aligned}$$

where \(\tilde{f}\) and \(\tilde{g}\) are the right continuous extensions of f and g on [0, 1]. The space of functions D[0, 1], and therefore D(0, 1], is separable under the Skorohod metric but not complete. However, one can find an equivalent metric, i.e., a metric which generates the same Skorohod topology, under which D[0, 1] is complete (see Billingsley 1999, Theorem 12.2). In particular, the Skorohod metric and the equivalent metric generate the same open sets and thus the \(\sigma\)-algebras of the Borel sets, which are generated by these open sets, are the same. Therefore, a sequence of probability measures on D(0, 1] is relatively compact if and only if it is tight (Billingsley 1999, Section 13). Hence, for every tight sequence of probability measures on D(0, 1] the convergence of the finite dimensional distributions on all continuity points of the limit distribution implies convergence in distribution (Billingsley 1999, Theorem 13.1).

For the PRM \(M=\sum _{i=1}^\infty \varepsilon _{(U_i,\Delta _i)}\), where \((U_i)_i\) is an iid sequence of random vectors uniformly distributed on \(S_1\) and

$$\begin{aligned} \Delta _i={\left\{ \begin{array}{ll} -\log (\Gamma _i)\quad &{}\text {if}\,\,\, H=\Lambda ,\\ \Gamma _i^{- 1/\alpha }\quad &{}\text {if}\,\,\, H=\Phi _\alpha ,\\ -\Gamma _i^{1/\alpha }\quad &{}\text {if}\,\,\, H=\Psi _\alpha , \end{array}\right. } \end{aligned}$$

we set

$$\begin{aligned} Y(t)=\sup \{\Delta _i:U_i^{(m)}\le t, i\ge 1\}\,,\qquad t\in (0,1]\,, \end{aligned}$$

where \(U_i^{(m)}\) is the m-th component of \(U_i\). Then the process Y has the finite dimensional distributions in (2.5) for \(k\ge 1\), \(0<t_i\le 1\), \(x_i\in \mathbb {R}\) and \(1\le i\le k\). Therefore, Y is an extremal process generated by H restricted to the interval (0, 1]. For these processes we can show the following invariance principle by application of the continuous mapping theorem (see Billingsley 1999, Theorem 2.7 or Resnick 2008, p. 152).

Proposition 2.5

Under the conditions of Theorem 2.1 and if \(H(\cdot )=\exp (-\mu (\cdot ))\) is an extreme value distribution it holds that

$$\begin{aligned} Y_n{\mathop {\rightarrow }\limits ^{d}}Y\,, \qquad n\rightarrow \infty \,, \end{aligned}$$

in D(0, 1] with respect to the metric \(\mathcal {D}\).

Since Y is a nondecreasing function, which is constant between isolated jumps, it has only countably many discontinuity points. Now let \((\tau _n)_n\) be the sequence of these discontinuity points of Y. Notice that, by Theorem 5.4.7 of Embrechts et al. (1997), the point process \(\sum _{k=1}^\infty \varepsilon _{\tau _k}\) is a PRM with mean measure \(\nu (a,b)=\log (b/a)\) for \(0<a<b\le 1\). We are ready to state our result for the point process of record times.

Theorem 2.6

Under the conditions of Theorem 2.1 and if \(H(\cdot )=\exp (-\mu (\cdot ))\) is an extreme value distribution it holds that

$$\begin{aligned} J_n:=\sum _{k=1}^p\varepsilon _{p^{-1}L(k)}{\mathop {\rightarrow }\limits ^{d}}J:=\sum _{k=1}^\infty \varepsilon _{\tau _k}, \end{aligned}$$

in \(\mathcal {M}(0,1]\), the space of point measures on (0, 1].

Based on Theorem 2.6 we can make statements about the time points of the last and second last record at or before p.

Corollary 2.7

Assume the conditions of Theorem 2.6 and let \(\zeta (p)\) be the number of records among the random variables

$$\max _{1\le i_1<\ldots <i_m\le m}T_{\textbf{i}},\ldots , \max _{1\le i_1<\ldots <i_m\le p}T_{\textbf{i}}.$$

Then the following statements hold for \(x,y\in (0,1]\) as \(n\rightarrow \infty\).

  1. (1)

    \({\mathbb P}(p^{-1}L(\zeta (p))\le x)={\mathbb P}(J_n(x,1]=0)\rightarrow {\mathbb P}(J(x,1]=0)=x\).

  2. (2)

    \({\mathbb P}(p^{-1}L(\zeta (p))\le x, p^{-1}L(\zeta (p)-1)\le y)\rightarrow y+y\log (x/y)\) for \(x>y\).

  3. (3)

    \({\mathbb P}(p^{-1}(L(\zeta (p))-L(\zeta (p)-1))\le x)\rightarrow x(1-\log (x))\).

Proof

Let \(0<y<x\le 1\). Part (1) is a direct consequence of the definitions of \(\zeta\) and L. Part (2) follows by

$$\begin{aligned} {\mathbb P}(p^{-1}L(\zeta (p))\le x, p^{-1}L(\zeta (p)-1)\le y)&={\mathbb P}(J_n(x,1]=0, J_n(y,1]\le 1)\\ {}&\rightarrow {\mathbb P}(J(x,1]=0, J(y,1]\le 1) \end{aligned}$$

as \(n\rightarrow \infty\) and

$$\begin{aligned} {\mathbb P}(J(x,1]=0, J(y,1]\le 1)={\mathbb P}(J(x,1]=0){\mathbb P}(J(y,x]\le 1)=y+y\log (x/y). \end{aligned}$$

To prove part (3), let \(\tau ^{(1)}\) and \(\tau ^{(2)}\) denote the first and the second upper order statistics of \((\tau _n)_n\). These upper order statistics exist since for every \(a>0\) there are only finitely many \(\tau _n\in [a,1]\). Then, we know by part (2) that

$$\begin{aligned} {\mathbb P}(\tau ^{(1)}\le x, \tau ^{(2)}\le y)={\mathbb P}(J(x,1]=0, J(y,1]\le 1)={\left\{ \begin{array}{ll} y+y\log (x/y),\quad &{}x>y\\ x, &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
(2.6)

Since

$$\begin{aligned} \lim _{n\rightarrow \infty } {\mathbb P}(p^{-1}(L(\zeta (p))-L(\zeta (p)-1))\le x)= {\mathbb P}(\tau ^{(1)}-\tau ^{(2)}\le x) \end{aligned}$$

we need to calculate \({\mathbb P}(\tau ^{(1)}-\tau ^{(2)}\le x)\). The joint density of \(\tau ^{(1)}\) and \(\tau ^{(2)}\) can be deduced from (2.6); it is given by

$$\begin{aligned} f_{\tau ^{(1)}\tau ^{(2)}}(u,v)={\left\{ \begin{array}{ll} 1/u\quad &{}u>v\\ 0, &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

Hence, we get the following distribution function of \(\tau ^{(1)}-\tau ^{(2)}\)

$$\begin{aligned} {\mathbb P}(\tau ^{(1)}-\tau ^{(2)}\le x)&=\int _0^x\int _0^{1-w} f_{\tau ^{(1)}\tau ^{(2)}}(w+v,v)dv\, dw\\&=\int _0^x\int _0^{1-w} 1/(w+v) dv \,dw = \int _0^x \log (1/w) dw = x(1-\log (x)), \end{aligned}$$

which completes the proof.
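
A quick Monte Carlo sketch of statement (1) in the simplest setting \(m=1\) with iid points, where the records of the sequence \((Y_i)\) itself are considered; the sample size and the number of replications are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
p, reps = 2000, 2000                 # illustrative sample size and Monte Carlo replications

last_record = np.empty(reps)
for rep in range(reps):
    Y = rng.standard_exponential(p)  # any continuous iid sequence works
    running_max = np.maximum.accumulate(Y)
    # record times L(k): indices (1-based) where the running maximum strictly increases
    records = np.flatnonzero(np.diff(running_max, prepend=-np.inf) > 0) + 1
    last_record[rep] = records[-1] / p   # p^{-1} L(zeta(p))

x = 0.5
print(np.mean(last_record <= x), x)  # empirical value vs limit P(p^{-1}L(zeta(p)) <= x) = x
```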

3 Applications

3.1 Relative ranks

In recent years, maximum-type tests based on the convergence in distribution of the maximum of rank statistics of a data set have gained significant interest (Han et al. 2017). Let \(\textbf{y}_1,\ldots ,\textbf{y}_n\) be p-dimensional iid random vectors with \(\textbf{y}_t=(X_{1t},\ldots ,X_{pt})\) following a continuous distribution to avoid ties. We write \(Q_{it}\) for the rank of \(X_{it}\) among \(X_{i1},\ldots , X_{in}\). Additionally, let \(R_{ij}^{(t)}\) be the relative rank of the j-th entry compared to the i-th entry; that is, \(R_{ij}^{(t)} = Q_{ j t'}\) with \(t'\) such that \(Q_{i t'}=t\), for \(t=1,\ldots ,n\).

A simpler explanation of \(R_{ij}^{(t)}\) is that we look at the j-th and i-th rows of \((Q_{it})\) and find the location of t in the i-th row. Then we choose the value in the j-th row at this location.
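
The following numpy sketch computes the ranks \(Q_{it}\) and the relative ranks \(R_{ij}^{(t)}\) exactly as just described; the dimensions and the data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 4, 6                               # illustrative dimensions
X = rng.standard_normal((p, n))           # row i holds X_{i1}, ..., X_{in}

# Q[i, t-1] = rank of X_{it} among X_{i1},...,X_{in} (ranks 1,...,n; no ties a.s.)
Q = X.argsort(axis=1).argsort(axis=1) + 1

def relative_ranks(Q, i, j):
    """R_{ij}^{(t)}: locate t in row i of Q, read off row j at that location."""
    order = np.argsort(Q[i])              # order[t-1] = position t' with Q[i, t'] = t
    return Q[j][order]                    # R_{ij}^{(1)}, ..., R_{ij}^{(n)}

print(Q)
print(relative_ranks(Q, 0, 1))
```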

Many important statistics are based on (relative) ranks; we consider two classes of such statistics in this section. First, we introduce the so-called simple linear rank statistics, which are of the form

$$\begin{aligned} V_{ij}=\sum _{t=1}^n c_{nt} \, g(R_{ij}^{(t)}/(n+1))\,, \qquad 1\le i< j\le p\,, \end{aligned}$$

where g is a Lipschitz function (also called score function), and \((c_{nt})\) with \(c_{nt}=n^{-1} f(t/(n+1))\) for a Lipschitz function f and \(\sum _{t=1}^n c_{nt}^2 >0\) are called the regression constants. An example of such a simple linear rank statistic is Spearman’s \(\rho\), which will be discussed in detail in Section 3.1.2. For \(1\le i<j\le p\) the relative ranks \((R_{ij}^{(t)})_{t=1}^n\) depend on the vectors \(\textbf{x}_i\) and \(\textbf{x}_j\), where \(\textbf{x}_k=(X_{k1},\ldots , X_{kn})\) for \(1\le k\le p\). We assume that the vectors \(\textbf{x}_1,\ldots , \textbf{x}_p\) are independent. It is worth mentioning that the ranks \((Q_{it})\) remain the same if we transform the marginal distributions to the (say) standard uniform distribution. Thus, the joint distribution of \((R_{ij}^{(t)})_{t=1}^n\), and thereby the distribution of \(V_{ij}\), does not depend on the distribution of \(\textbf{x}_i\) or \(\textbf{x}_j\). Therefore, we may assume without loss of generality that the random vectors \(\textbf{x}_1,\ldots , \textbf{x}_p\) are identically distributed. We can write \(V_{ij}=g_{n,V}(\textbf{x}_i,\textbf{x}_j)\) for a measurable function \(g_{n,V}:\mathbb {R}^{2n}\rightarrow \mathbb {R}\).
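
A minimal sketch of a simple linear rank statistic of the above form; the particular (Lipschitz) choices of f and g below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 5, 50                                       # illustrative dimensions
X = rng.standard_normal((p, n))
Q = X.argsort(axis=1).argsort(axis=1) + 1          # ranks within each row

f = lambda u: u                                    # illustrative Lipschitz function defining c_{nt}
g = lambda u: u                                    # illustrative Lipschitz score function
t = np.arange(1, n + 1)
c = f(t / (n + 1)) / n                             # regression constants c_{nt} = n^{-1} f(t/(n+1))

def V(i, j):
    """Simple linear rank statistic V_{ij} = sum_t c_{nt} g(R_{ij}^{(t)} / (n+1))."""
    R = Q[j][np.argsort(Q[i])]                     # relative ranks R_{ij}^{(t)}
    return np.sum(c * g(R / (n + 1)))

print(V(0, 1))
```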

Next, we consider rank-type U-statistics of order \(m<n\) of the form

$$\begin{aligned} U_{ij}= \frac{1}{n(n-1)\cdots (n-m+1)}\sum _{1\le t_1\ne \cdots \ne t_m\le n} h( (X_{i t_1},X_{j t_1}), \ldots , (X_{i t_m},X_{j t_m}))\,, \end{aligned}$$

where the symmetric kernel h is such that \(U_{ij}\) depends only on \((R_{ij}^{(t)})_{t=1}^n\). An important example of a rank-type U-statistic is Kendall’s \(\tau\), which will be studied in Section 3.1.1. For more examples we refer to Han et al. (2017) and references therein. As for simple linear rank statistics, we are able to write \(U_{ij}=g_{n,U}(\textbf{x}_i,\textbf{x}_j)\), where \(g_{n,U}:\mathbb {R}^{2n}\rightarrow \mathbb {R}\) is a measurable function and \(\textbf{x}_1,\ldots , \textbf{x}_p\) are iid random vectors.

An interesting property of rank-based statistics is the following pairwise independence. We also note that they are generally not mutually independent.

Lemma 3.1

(Lemma C4 in Han et al. 2017) For \(1\le i<j\le p\), let \(\Psi _{ij}\) be a function of the relative ranks \(\{R_{ij}^{(t)}, t=1,\ldots ,n\}\). Assume \(\textbf{x}_1,\ldots ,\textbf{x}_p\) are independent. Then for any \((i,j) \ne (k,l)\), \(i< j, k< l\), the random variables \(\Psi _{ij}\) and \(\Psi _{kl}\) are independent.

As an immediate consequence we obtain pairwise independence of \((U_{ij})\) and \((V_{ij})\), respectively.

Lemma 3.2

For any \((i,j) \ne (k,l)\), \(i< j, k< l\), the random variables \(V_{ij}\) and \(V_{kl}\) are independent and identically distributed. Moreover, \(U_{ij}\) and \(U_{kl}\) are independent and identically distributed.

We now want to standardize \(U_{ij}\) and \(V_{ij}\). By independence of \((X_{it})\), we have

$$\begin{aligned} {\mathbb E}[V_{ij}] = \overline{g}_n \sum _{t=1}^n c_{nt}\,, \quad {\text {Var}}(V_{ij})= \frac{1}{n-1} \sum _{t=1}^n ( g(t/(n+1))- \overline{g}_n)^2 \sum _{s=1}^n (c_{ns}- \overline{c}_n)^2\,, \end{aligned}$$

where \(\overline{g}_n = n^{-1} \sum _{t=1}^ng(t/(n+1))\) is the sample mean of \(g(Q_{11}/(n+1)),\ldots , g(Q_{1n}/(n+1))\) and \(\overline{c}_n= n^{-1}\sum _{t=1}^nc_{nt}\). Expectation and variance of \(U_{ij}\) can also be calculated analytically. We set

$$\begin{aligned} \mu _V= {\mathbb E}[V_{12}]\,, \sigma _V^2= {\text {Var}}(V_{12}) \quad \text { and } \quad \mu _U= {\mathbb E}[U_{12}]\,, \sigma _U^2={\text {Var}}(U_{12})\,, \end{aligned}$$

and define the standardized versions of \(U_{ij}\) and \(V_{ij}\) by

$$\begin{aligned} \widetilde{V}_{ij} = (V_{ij}-\mu _V)/\sigma _V \quad \text { and } \quad \widetilde{U}_{ij}=(U_{ij}-\mu _U)/\sigma _U\,,\quad 1\le i<j\le p. \end{aligned}$$

It is well-known that \(\widetilde{V}_{ij}\) and \(\widetilde{U}_{ij}\) are asymptotically standard normal and the following lemma provides a complementary large deviation result.

Lemma 3.3

(Kallenberg 1982, p.404-405) Suppose that the kernel function h is bounded and non-degenerate. Then we have for \(x=o(n^{1/6})\) that

$$\begin{aligned} {\mathbb P}(\widetilde{U}_{12} >x) = \overline{\Phi }(x) (1+o(1)), \qquad n\rightarrow \infty \,. \end{aligned}$$

Assume that the score function g is differentiable with bounded Lipschitz constant and that the constants \((c_{nt})_t\) satisfy

$$\begin{aligned} \max _{1\le t\le n} |c_{nt}- \overline{c}_n|^2 \le \frac{C^2}{n^{ 2/3}} \sum _{t=1}^n(c_{nt}-\overline{c}_n)^2 \,, \quad \Big | \sum _{t=1}^n(c_{nt}-\overline{c}_n)^3 \Big |^2 \le \frac{C^2}{n} \Big | \sum _{t=1}^n(c_{nt}-\overline{c}_n)^2 \Big |^3\,, \end{aligned}$$
(3.1)

where C is some constant. Then it holds for \(x=o(n^{1/6})\)

$$\begin{aligned} {\mathbb P}(\widetilde{V}_{12} >x) = \overline{\Phi }(x) (1+o(1)),\qquad n\rightarrow \infty \,. \end{aligned}$$

For a discussion of (3.1), see (Kallenberg 1982, p.405). To proceed we need to find suitable scaling and centering sequences for \(\widetilde{V}_{ij}\) and \(\widetilde{U}_{ij}\), respectively, such that the conditions of Theorem 2.1 are fulfilled. For an iid standard normal sequence \((X_i)\) it is known that

$$\begin{aligned}\lim _{p \rightarrow \infty } {\mathbb P}\Big (\widetilde{d}_p \big (\max _{i=1,\ldots ,p} X_i-\widetilde{d}_p\big )\le x\Big )=\exp (-\textrm{e}\,^{-x})=\Lambda (x)\,, \qquad x\in \mathbb {R}\,, \end{aligned}$$

where \(\widetilde{d}_p=\sqrt{2\log p} - \tfrac{\log \log p+\log 4\pi }{2(2\log p)^{1/2}}\); see Embrechts et al. (1997, Example 3.3.29). Since we are dealing with \(p(p-1)/2\) random variables \((V_{ij})\) and \((U_{ij})\), respectively, which are asymptotically standard normal, \(d_p =\widetilde{d}_{p(p-1)/2}\) seems like a reasonable choice for scaling and centering sequences.
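
The following small simulation is a sanity check, not part of the paper's argument, that \(\widetilde{d}_p\) indeed centers and scales the maximum of iid standard normal random variables towards the Gumbel distribution; p, the number of replications and x are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
p, reps, x = 5000, 1000, 0.5        # illustrative choices

d = np.sqrt(2 * np.log(p)) - (np.log(np.log(p)) + np.log(4 * np.pi)) / (2 * np.sqrt(2 * np.log(p)))
M = rng.standard_normal((reps, p)).max(axis=1)

# empirical distribution function of the normalized maximum vs the Gumbel limit
print(np.mean(d * (M - d) <= x), np.exp(-np.exp(-x)))
```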

Our main result for rank statistics is the following.

Theorem 3.4

  1. (a)

    Suppose that the kernel function h is bounded and non-degenerate. If \(p = \exp (o(n^{1/3}))\), the following point process convergence holds

    $$\begin{aligned} N_n^U := \sum _{1\le i<j\le p} \varepsilon _{ d_p( \widetilde{U}_{ij}- d_p)} {\mathop {\rightarrow }\limits ^{d}}N :=\sum _{i=1}^{\infty } \varepsilon _{-\log \Gamma _i}\,,\qquad n\rightarrow \infty \,, \end{aligned}$$
    (3.2)

    where \(\Gamma _i= E_1+\cdots +E_i\), \(i\ge 1\), and \((E_i)\) are iid standard exponential, i.e., N is a Poisson random measure with mean measure \(\mu (x,\infty )={{\,\textrm{e}\,}}^{-x}\), \(x\in \mathbb {R}\).

  2. (b)

    Assume that the score function g is differentiable with bounded Lipschitz constant and that the constants \((c_{nt})_t\) satisfy (3.1). Then if \(p=\exp (o(n^{1/3}))\), it holds that

    $$\begin{aligned} N_n^V := \sum _{1\le i<j\le p} \varepsilon _{ d_p( \widetilde{V}_{ij}- d_p)} {\mathop {\rightarrow }\limits ^{d}} \, N\,,\qquad n\rightarrow \infty \,. \end{aligned}$$
    (3.3)

Proof

We start with the proof of (3.3), for which we will use Theorem 2.1, as \(\textbf{x}_1,\ldots , \textbf{x}_p\) are iid and \(g_{n,V}\) is a measurable function. Therefore, we only have to show that, for \(x\in \mathbb {R}\),

  1. (1)

    \(\frac{p(p-1)}{2}{\mathbb P}(\widetilde{V}_{12}>x_p)\rightarrow {{\,\textrm{e}\,}}^{-x}\) as \(n\rightarrow \infty\),

  2. (2)

    \(p^3{\mathbb P}(\widetilde{V}_{12}>x_p, \widetilde{V}_{13}>x_p)\rightarrow 0\) as \(n\rightarrow \infty\),

where \(x_p=x/d_p+d_p\). We will begin with the proof of (1). Since \(x_p\sim d_p=o(n^{1/6})\) we get by Lemma 3.3

$$\begin{aligned} \frac{p(p-1)}{2}{\mathbb P}(\widetilde{V}_{12}>x_p)=\frac{p(p-1)}{2}\bar{\Phi }(x_p)(1+o(1)) \end{aligned}$$

and by Mills' ratio we have (writing \(\tilde{p}=\tfrac{p(p-1)}{2}\))

$$\begin{aligned} \tilde{p}\,\bar{\Phi }(x_p)\sim \tilde{p}\frac{1}{\sqrt{2\pi }x_p}{{\,\textrm{e}\,}}^{-x_p^2/2}\sim \tilde{p}\frac{1}{\sqrt{2\pi }\sqrt{2\log \tilde{p}}}{{\,\textrm{e}\,}}^{-\log \tilde{p}+(\log \log \tilde{p})/2+(\log (4\pi ))/2}{{\,\textrm{e}\,}}^{-x}={{\,\textrm{e}\,}}^{-x}. \end{aligned}$$

Regarding (2), we note that, by Lemma 3.2, \(\widetilde{V}_{12}\) and \(\widetilde{V}_{13}\) are independent. Thus, we get

$$\begin{aligned} p^3{\mathbb P}(\widetilde{V}_{12}>x_p, \widetilde{V}_{13}>x_p)=p^3{\mathbb P}(\widetilde{V}_{12}>x_p)^2=p^3(\bar{\Phi }(x_p)(1+o(1)))^2\rightarrow 0,\qquad n\rightarrow \infty , \end{aligned}$$

where we used Lemma 3.3 and Mills' ratio in the last two steps. That completes the proof of (3.3). The proof of (3.2) follows by analogous arguments.\(\square\)

Remark 3.5

Theorem 3.4 is a generalization of Theorems 1 and 2 in Han et al. (2017) who proved under the conditions of Theorem 3.4 and if \(p = \exp (o(n^{1/3}))\) that

$$\begin{aligned} \lim _{n \rightarrow \infty }{\mathbb P}\Big ( \max _{1\le i<j\le p} \widetilde{V}_{ij}^2 -4 \log p + \log \log p \le x\Big ) = \exp \Big (-\tfrac{1}{\sqrt{8 \pi }} {{\,\textrm{e}\,}}^{-x/2} \Big )\,, \quad x\in \mathbb {R}\, \end{aligned}$$

and

$$\begin{aligned} \lim _{n \rightarrow \infty }{\mathbb P}\Big ( \max _{1\le i<j\le p} \widetilde{U}_{ij}^2 -4 \log p + \log \log p \le x\Big ) = \exp \Big (-\tfrac{1}{\sqrt{8 \pi }} {{\,\textrm{e}\,}}^{-x/2} \Big )\,, \quad x\in \mathbb {R}\,. \end{aligned}$$

As in Theorem 2.6, we additionally conclude point process convergence for the record times of the maxima of \(V_{ij}\) and \(U_{ij}\). To this end, we investigate the sequence \((\max _{1\le i<j\le k}U_{ij})_{k\ge 1}\). This sequence jumps at time k if one of the random variables \(U_{1k},\ldots , U_{k-1,k}\) is larger than every \(U_{ij}\) for \(1\le i<j\le k-1\). Between these jump (or record) times the sequence is constant.

Let \(L^U\) be this sequence of record times defined by

$$\begin{aligned} L^U(1)&=1,\\ L^U(k+1)&=\inf \{\ell>L^U(k):\max \limits _{1\le i< j\le \ell } U_{ij}>\max \limits _{1\le i<j\le L^U(k)} U_{ij}\},\qquad k\in \mathbb {N}, \end{aligned}$$

and let \(L^V\) be constructed analogously.

Theorem 3.6

Under the conditions of Theorem 3.4 it holds that

$$\begin{aligned} \sum _{k=1}^p\varepsilon _{p^{-1}L^V(k)}{\mathop {\rightarrow }\limits ^{d}}J\quad \text {and}\quad \sum _{k=1}^p\varepsilon _{p^{-1}L^U(k)}{\mathop {\rightarrow }\limits ^{d}}J, \end{aligned}$$

in \(\mathcal {M}(0,1]\), the space of point measures on (0, 1], where J is a Poisson random measure with mean measure \(\nu (a,b)=\log (b/a)\) for \(0<a<b\le 1\).

As in Corollary 2.7, we can draw conclusions on the index of the last and second last jump before or at p. Let \(\zeta ^U(p)\) be the number of records among \(\max _{1\le i<j\le 2}U_{ij},\ldots , \max _{1\le i<j\le p}U_{ij}\). Then, as \(n\rightarrow \infty\), we have for \(x,y \in (0,1]\)

  1. (1)

    \({\mathbb P}(p^{-1}L^U(\zeta ^U(p))\le x)\rightarrow {\mathbb P}(J(x,1]=0)=x\),

  2. (2)

    \({\mathbb P}(p^{-1}L^U(\zeta ^U(p))\le x, p^{-1}L^U(\zeta ^U(p)-1)\le y)\rightarrow y+y\log (x/y)\) for \(x>y\),

  3. (3)

    \({\mathbb P}(p^{-1}(L^U(\zeta ^U(p))-L^U(\zeta ^U(p)-1))\le x)\rightarrow x(1-\log (x))\),

where (3) gives information about how much time elapses between the second last and the last jump of \((\max _{1\le i<j\le k}U_{ij})_{k\ge 1}\) before or at p.

3.1.1 Kendall’s tau

Kendall’s tau is an example of a rank-type U-statistic with bounded kernel. For \(i\ne j\) Kendall’s tau \(\tau _{ij}\) measures the ordinal association between the two sequences \((X_{i1},\ldots ,X_{in})\) and \((X_{j1},\ldots ,X_{jn})\). It is defined by

$$\begin{aligned} \tau _{ij}&=\frac{2}{n(n-1)}\sum _{1\le t_1< t_2\le n}{\text {sign}}(X_{it_1}-X_{it_2}){\text {sign}}(X_{jt_1}-X_{jt_2})\\&=\frac{2}{n(n-1)}\sum _{1\le t_1< t_2\le n}{\text {sign}}(R_{ij}^{(t_2)}-R_{ij}^{(t_1)}), \end{aligned}$$

where the function \({\text {sign}}:\mathbb {R}\rightarrow \{1,0,-1\}\) is given by \({\text {sign}}(x)=x/|x|\) for \(x\ne 0\) and \({\text {sign}}(0)=0\). An interesting property of Kendall’s tau is that there exists a representation as a sum of independent random variables. We could not find this representation in the literature. Therefore, we state it here. The proof can be found in Section 4.

Proposition 3.7

We have

$$\begin{aligned} \tau _{12} {\mathop {=}\limits ^{d}}\frac{4}{n(n-1)} \sum _{i=1}^{n-1} D_i\,, \end{aligned}$$

where \((D_i)_{i\ge 1}\) are independent random variables with \(D_i\) being uniformly distributed on the numbers \(-i/2, -i/2+1, \ldots , i/2\).

From Proposition 3.7 we deduce \({\mathbb E}[\tau _{ij}]=0\) and \({\text {Var}}(\tau _{ij})=\tfrac{2(2n+5)}{9n(n-1)}\). The next result is a corollary of Theorem 3.4.

Corollary 3.8

Under the conditions of Theorem 3.4 we have

$$\begin{aligned} N_n^{\tau } := \sum _{1\le i<j\le p} \varepsilon _{ d_p( \tau _{ij}/\sqrt{{\text {Var}}(\tau _{ij})}- d_p)} {\mathop {\rightarrow }\limits ^{d}}N =\sum _{i=1}^{\infty } \varepsilon _{-\log \Gamma _i}\,,\qquad n\rightarrow \infty \,. \end{aligned}$$
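
A Monte Carlo sketch comparing Kendall's tau computed from its definition with the representation of Proposition 3.7, and checking the variance formula used above; the sample size and the number of replications are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 20, 5000                          # illustrative sample size and replications

def kendall_tau(x, y):
    """Kendall's tau of two sequences without ties, following the definition above."""
    n = len(x)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    iu = np.triu_indices(n, k=1)
    return 2.0 / (n * (n - 1)) * np.sum(sx[iu] * sy[iu])

tau_direct = np.array([kendall_tau(rng.standard_normal(n), rng.standard_normal(n))
                       for _ in range(reps)])

# Proposition 3.7: tau has the distribution of 4/(n(n-1)) * sum_{i=1}^{n-1} D_i,
# with D_i uniform on {-i/2, -i/2+1, ..., i/2}
D = np.stack([rng.integers(0, i + 1, size=reps) - i / 2 for i in range(1, n)])
tau_repr = 4.0 / (n * (n - 1)) * D.sum(axis=0)

# both empirical variances should be close to 2(2n+5)/(9n(n-1))
print(tau_direct.var(), tau_repr.var(), 2 * (2 * n + 5) / (9 * n * (n - 1)))
```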

3.1.2 Spearman’s rho

An example of a simple linear rank statistic is Spearman’s rho, which is a measure of rank correlation that assesses how well the relationship between two variables can be described using a monotonic function. Recall that \(Q_{ik}\) and \(Q_{jk}\) are the ranks of \(X_{ik}\) and \(X_{jk}\) among \(\{X_{i1},\ldots , X_{in}\}\) and \(\{X_{j1},\ldots , X_{jn}\}\), respectively, and write \(q_n=(n+1)/2\) for the average rank. Then for \(1\le i\ne j\le p\) Spearman’s rho is defined by

$$\begin{aligned} \rho _{ij}&=\frac{\sum _{k=1}^n(Q_{ik}-q_n)(Q_{jk}-q_n)}{\big (\sum _{k=1}^n(Q_{ik}-q_n)^2\sum _{k=1}^n(Q_{jk}-q_n)^2\big )^{1/2}}\\&=\frac{12}{n(n^2-1)}\sum _{k=1}^n\Big (k-\frac{n+1}{2}\Big )\Big (R_{ij}^{(k)}-\frac{n+1}{2}\Big ). \end{aligned}$$
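
The two expressions above coincide for data without ties, which is easy to verify numerically; the following sketch uses illustrative standard normal data.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 30
x, y = rng.standard_normal(n), rng.standard_normal(n)   # two rows of the data matrix

Qx = x.argsort().argsort() + 1           # ranks Q_{ik}
Qy = y.argsort().argsort() + 1           # ranks Q_{jk}
q = (n + 1) / 2                          # average rank q_n

# first expression: correlation of the rank vectors
rho1 = np.sum((Qx - q) * (Qy - q)) / np.sqrt(np.sum((Qx - q) ** 2) * np.sum((Qy - q) ** 2))

# second expression via the relative ranks R_{ij}^{(k)}
R = Qy[np.argsort(Qx)]                   # R_{ij}^{(1)}, ..., R_{ij}^{(n)}
k = np.arange(1, n + 1)
rho2 = 12 / (n * (n ** 2 - 1)) * np.sum((k - q) * (R - q))

print(rho1, rho2)                        # identical up to rounding
```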

For mean and variance we get

$$\begin{aligned} {\mathbb E}[\rho _{ij}]=0 \quad \text {and}\quad {\text {Var}}(\rho _{ij})=1/(n-1). \end{aligned}$$
(3.4)

Therefore, we obtain the following corollary of Theorem 3.4.

Corollary 3.9

Under the conditions of Theorem 3.4 it holds that

$$\begin{aligned} N_n^\rho := \sum _{1\le i<j\le p} \varepsilon _{ d_p( \rho _{ij}/\sqrt{{\text {Var}}(\rho _{ij})}- d_p)} {\mathop {\rightarrow }\limits ^{d}}N\,. \end{aligned}$$

The next auxiliary result allows us to transfer the weak convergence of a sequence of point processes to another sequence of point processes, provided that the maximum distance between their points tends to zero in probability.

Proposition 3.10

For arrays \((X_{i,n})_{i,n\ge 1}\) and \((Y_{i,n})_{i,n\ge 1}\) of real-valued random variables, let \(N^X_n=\sum _{i=1}^p \varepsilon _{X_{i,n}}\) and assume that \(N^X_n{\mathop {\rightarrow }\limits ^{d}}N\). Consider a point process \(N^Y_n=\sum _{i=1}^p \varepsilon _{Y_{i,n}}\). If

$$\max _{i=1,\ldots ,p}|X_{i,n}-Y_{i,n}| {\mathop {\rightarrow }\limits ^{{\mathbb P}}}0,$$

then \(N^Y_n{\mathop {\rightarrow }\limits ^{d}}N\).

Example 3.11

It turns out that there is an interesting connection between Spearman’s rho and Kendall’s tau. By Hoeffding (1948, p.318) we can write Spearman’s rho as

$$\begin{aligned} \rho _{ij}=\frac{n-2}{n+1}r_{ij}+\frac{3\tau _{ij}}{n+1},\quad \quad 1\le i\ne j\le p, \end{aligned}$$
(3.5)

where

$$\begin{aligned} r_{ij}=\frac{3}{n(n-1)(n-2)}\sum _{1\le t_1\ne t_2\ne t_3\le n}{\text {sign}}(X_{it_1}-X_{it_2}){\text {sign}}(X_{jt_1}-X_{jt_3}) \end{aligned}$$

is the major part of Spearman’s rho. Therefore, \(r_{ij}\) is a U-statistic of degree three with an asymmetric bounded kernel and with

$$\begin{aligned} {\mathbb E}[r_{ij}]=0\quad \text {and}\quad {\text {Var}}(r_{ij})=\frac{n^2-3}{n(n-1)(n-2)},\quad \quad 1\le i\ne j \le p. \end{aligned}$$
(3.6)
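
Identity (3.5) can be checked numerically on any data set without ties; the following brute-force sketch on illustrative data mirrors the definition of \(r_{ij}\) directly.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 15
x, y = rng.standard_normal(n), rng.standard_normal(n)

sx = np.sign(x[:, None] - x[None, :])
sy = np.sign(y[:, None] - y[None, :])
iu = np.triu_indices(n, k=1)
tau = 2 / (n * (n - 1)) * np.sum(sx[iu] * sy[iu])            # Kendall's tau

r = 0.0                                                       # major part r_{ij}
for t1 in range(n):
    for t2 in range(n):
        for t3 in range(n):
            if t1 != t2 and t1 != t3 and t2 != t3:            # pairwise distinct indices
                r += np.sign(x[t1] - x[t2]) * np.sign(y[t1] - y[t3])
r *= 3 / (n * (n - 1) * (n - 2))

Qx, Qy = x.argsort().argsort() + 1, y.argsort().argsort() + 1
q = (n + 1) / 2
rho = np.sum((Qx - q) * (Qy - q)) / np.sqrt(np.sum((Qx - q) ** 2) * np.sum((Qy - q) ** 2))

print(rho, (n - 2) / (n + 1) * r + 3 * tau / (n + 1))         # both sides of (3.5)
```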

We now use Proposition 3.10 and Corollary 3.9 to show that

$$\begin{aligned} N_n^{r} := \sum _{1\le i<j\le p} \varepsilon _{ d_p( r_{ij}/\sqrt{{\text {Var}}(r_{ij})}- d_p)} {\mathop {\rightarrow }\limits ^{d}}N:= \sum _{i=1}^{\infty } \varepsilon _{-\log \Gamma _i}\,,\qquad n\rightarrow \infty \,. \end{aligned}$$
(3.7)

For this purpose we consider the following difference

$$\begin{aligned} d_p\Big ( \frac{\rho _{ij}}{\sqrt{{\text {Var}}(\rho _{ij})}} - d_p\Big )- d_p\Big ( \frac{r_{ij}}{\sqrt{{\text {Var}}(r_{ij})}}- d_p\Big ) = d_p\Big (\frac{\rho _{ij}}{\sqrt{{\text {Var}}(\rho _{ij})}}-\frac{r_{ij}}{\sqrt{{\text {Var}}(r_{ij})}}\Big ). \end{aligned}$$

By (3.4), (3.6) and (3.5) this expression is asymptotically equal to

$$\begin{aligned} d_p\sqrt{n}\,(\rho _{ij}-r_{ij})=\frac{3d_p\sqrt{n}}{n+1}(\tau _{ij}-r_{ij}). \end{aligned}$$

Since \(|\tau _{ij}|\) and \(|r_{ij}|\) are bounded above by constants and \(d_p=o(n^{1/6})\), we deduce that

$$\begin{aligned} \max _{1\le i<j\le p}\Big |\frac{3d_p\sqrt{n}}{n+1}(\tau _{ij}-r_{ij})\Big |{\mathop {\rightarrow }\limits ^{{\mathbb P}}}0 \,, \qquad n\rightarrow \infty , \end{aligned}$$

which verifies the condition in Proposition 3.10. Since \(N_n^{\rho }{\mathop {\rightarrow }\limits ^{d}}N\) by Corollary 3.9, we conclude the desired (3.7).

3.2 Sample covariances

An important field of current research is the estimation and testing of high-dimensional covariance structures. It finds applications in genomics, social science and financial economics; see Cai (2017) for a detailed review and more references. Under quite general assumptions, Xiao and Wu (2013) investigated the maximum off-diagonal entry of a high-dimensional sample covariance matrix. We impose the same model assumptions (compare Xiao and Wu 2013, p. 2901-2903), but instead of the maximum we study the point process of off-diagonal entries.

We start by describing the model and spelling out the required assumptions. Let \(\textbf{x}_1,\ldots ,\textbf{x}_n\) be p-dimensional iid random vectors with \(\textbf{x}_i=(X_{1i},\ldots ,X_{pi})\), where \({\mathbb E}[X_{ji}]=0\) for \(1\le j\le p\), and set \(\bar{X}_j:=\frac{1}{n}\sum _{k=1}^n X_{jk}\). Denote by \(\Sigma =(\sigma _{i,j})_{1\le i,j\le p}\) the covariance matrix of the vector \(\textbf{x}_1\) and assume \(\sigma _{i,i}=1\) for \(1\le i\le p\). The empirical covariance matrix \((\hat{\sigma }_{i,j})_{1\le i,j\le p}\) is given by

$$\begin{aligned} \hat{\sigma }_{i,j}=\frac{1}{n}\sum _{k=1}^n (X_{ik}-\bar{X}_{i})(X_{jk}-\bar{X}_j),\quad \quad 1\le i,j\le p. \end{aligned}$$

A fundamental problem in high-dimensional inference is to derive the asymptotic distribution of \(\max _{1\le i<j\le p} |\hat{\sigma }_{i,j}-\sigma _{i,j}|\). Since the \(\hat{\sigma }_{i,j}\)’s might have different variances we need to standardize \(\hat{\sigma }_{i,j}\) by \(\theta _{i,j}={\text {Var}}(X_{i1}X_{j1})\), which can be estimated by

$$\begin{aligned} \hat{\theta }_{i,j}=\frac{1}{n}\sum _{k=1}^n\Big [(X_{ik}-\bar{X}_{i})(X_{jk}-\bar{X}_j)-\hat{\sigma }_{i,j}\Big ]^2. \end{aligned}$$

We are interested in the points

$$\begin{aligned} M_{i,j}:=\frac{|\hat{\sigma }_{i,j}-\sigma _{i,j}|}{\sqrt{\hat{\theta }_{i,j}}},\quad \quad 1\le i<j\le p. \end{aligned}$$

Let \(\mathcal {I}_n=\{(i,j): 1\le i<j\le p\}\) be an index set. We use the following notations to formulate the required conditions:

$$\begin{aligned} \mathcal {K}_n(t,r)&=\sup _{1\le i\le p}{\mathbb E}[\exp (t|X_{i1}|^r)],\\ \mathcal {M}_n(r)&=\sup _{1\le i\le p}{\mathbb E}[|X_{i1}|^r],\\ \theta _n&=\inf _{1\le i<j\le p}\theta _{i,j},\\ \gamma _n&=\sup _{\begin{array}{c} \alpha ,\beta \in \mathcal {I}_n\\ \alpha \ne \beta \end{array}} |{\text {Cor}}(X_{i1}X_{j1},\,X_{k1}X_{l1})|,\quad \text {for}\, \alpha =(i,j),\,\beta =(k,l),\\ \gamma _n(b)&=\sup _{\alpha \in \mathcal {I}_n}\sup _{\begin{array}{c} A\subset \mathcal {I}_n\\ |A|=b \end{array}}\inf _{\beta \in A}|{\text {Cor}}(X_{i1}X_{j1},\,X_{k1}X_{l1})|\quad \text {for}\, \alpha =(i,j),\,\beta =(k,l). \end{aligned}$$

We can now state the following conditions.

  • (B1) \(\liminf \limits _{n \rightarrow \infty }\theta _n>0\).

  • (B2) \(\limsup \limits _{n\rightarrow \infty }\gamma _n<1\).

  • (B3) \(\gamma _n(b_n)\log (b_n)=o(1)\) for any sequence \((b_n)\) such that \(b_n\rightarrow \infty\).

  • (B3’) \(\gamma _n(b_n)=o(1)\) for any sequence \((b_n)\) such that \(b_n\rightarrow \infty\) and for some \(\varepsilon >0\)

    $$\begin{aligned} \sum _{\alpha ,\beta \in \mathcal {I}_n} ({\text {Cov}}(X_{i1}X_{j1},\,X_{k1}X_{l1}))^2=O(p^{4-\varepsilon })\quad \text {for}\, \alpha =(i,j),\,\beta =(k,l). \end{aligned}$$
  • (B4) For some constants \(t>0\) and \(0<r\le 2\), \(\limsup \limits _{n\rightarrow \infty } \mathcal {K}_n(t,r)<\infty\), and

    $$\begin{aligned} \log p={\left\{ \begin{array}{ll}o(n^{r/(4+r)}),\quad &{}\text {if}\,\,0<r<2, \\ o(n^{1/3}(\log n)^{-2/3}),\quad &{}\text {if}\,\,r=2.\end{array}\right. } \end{aligned}$$
  • (B4’) \(\log p=o(n^{r/(4+3r)})\), \(\limsup \limits _{n\rightarrow \infty } \mathcal {K}_n(t,r)<\infty\) for some constants \(t>0\) and \(r>0\).

  • (B4”) \(p=O(n^q)\) and \(\limsup \limits _{n\rightarrow \infty }\mathcal {M}_n(4q+4+\delta )<\infty\) for some constants \(q>0\) and \(\delta >0\).

To be able to adopt parts of the proof of Theorem 2 in Xiao and Wu (2013) we consider (instead of \((M_{i,j})\)) the transformed points \((W_{i,j})\) given by

$$\begin{aligned} W_{i,j}:=\frac{1}{2}(n\,M_{i,j}^2-4\log p+\log \log p+\log 8\pi ),\quad \quad 1\le i<j\le p, \end{aligned}$$

and we define the point processes

$$\begin{aligned} N_n^{(W)}:=\sum _{1\le i<j\le p} \varepsilon _{W_{i,j}}\,. \end{aligned}$$
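
A sketch computing the points \(W_{i,j}\), and hence \(N_n^{(W)}\), for simulated data with iid standard normal entries (so that \(\Sigma\) is the identity); the dimensions are illustrative and are not meant to reflect the growth conditions in Theorem 3.12 below.

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 400, 40                                   # illustrative dimensions
X = rng.standard_normal((p, n))                  # rows are the p coordinates, columns the n observations
Sigma = np.eye(p)                                # true covariance matrix in this illustrative model

Xc = X - X.mean(axis=1, keepdims=True)
sigma_hat = Xc @ Xc.T / n                        # empirical covariances \hat{sigma}_{i,j}

# \hat{theta}_{i,j} = n^{-1} sum_k [ (X_ik - Xbar_i)(X_jk - Xbar_j) - \hat{sigma}_{i,j} ]^2
prod = Xc[:, None, :] * Xc[None, :, :]           # (p, p, n) array of centered products
theta_hat = np.mean((prod - sigma_hat[:, :, None]) ** 2, axis=2)

M = np.abs(sigma_hat - Sigma) / np.sqrt(theta_hat)
iu = np.triu_indices(p, k=1)
W = 0.5 * (n * M[iu] ** 2 - 4 * np.log(p) + np.log(np.log(p)) + np.log(8 * np.pi))

print(np.sort(W)[-5:])                           # a few of the largest points of N_n^{(W)}
```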

Theorem 3.12

Let \({\mathbb E}[\textbf{x}_1]=0\) and \(\sigma _{i,i}=1\) for all i, and assume (B1) and (B2). Then under any one of the following conditions:

  1. (i)

    (B3) and (B4),

  2. (ii)

    (B3’) and (B4’),

  3. (iii)

    (B3) and (B4”),

  4. (iv)

    (B3’) and (B4”),

it holds that

$$\begin{aligned} N_n^{(W)}{\mathop {\rightarrow }\limits ^{d}}N =\sum _{i=1}^{\infty } \varepsilon _{-\log \Gamma _i}\,,\qquad n\rightarrow \infty \,, \end{aligned}$$

where \(\Gamma _i= E_1+\cdots +E_i\), \(i\ge 1\), and \((E_i)\) are iid standard exponential, i.e., N is a Poisson random measure with mean measure \(\mu (x,\infty )={{\,\textrm{e}\,}}^{-x}\), \(x\in \mathbb {R}\).

Proof

Under condition (i) set \(\mathcal {E}_n=n^{-(2-r)/(4(r+4))}\) if \(0<r<2\), and \(\mathcal {E}_n=n^{-1/6}(\log n)^{1/3}(\log p)^{1/2}\) if \(r=2\). Under condition (ii) let \(\mathcal {E}_n=(\log p)^{1/2}n^{-r/(6r+8)}\). Under (i) or (ii) we set

$$\begin{aligned} \tilde{X}_{ik}=X_{ik}\mathbbm {1}_{\{|X_{ik}|\le T_n\}}-{\mathbb E}[X_{ik}\mathbbm {1}_{\{|X_{ik}|\le T_n\}}],\quad \quad 1\le i\le p,\,\,\, 1\le k\le n, \end{aligned}$$

where \(T_n=\mathcal {E}_n(n/(\log p)^3)^{1/4}\). Under conditions (iii) and (iv) we set

$$\begin{aligned} \tilde{X}_{ik}=X_{ik}\mathbbm {1}_{\{|X_{ik}|\le n^{1/4}/\log n\}},\quad \quad 1\le i\le p,\,\,\, 1\le k\le n. \end{aligned}$$

Additionally, we define \(\tilde{\sigma }_{i,j}={\mathbb E}[\tilde{X}_{i1}\tilde{X}_{j1}]\) and \(\tilde{\theta }_{i,j}={\text {Var}}[\tilde{X}_{i1}\tilde{X}_{j1}]\). We consider

$$\begin{aligned} M_{1;i,j}=\frac{1}{\sqrt{\tilde{\theta }_{i,j}}}\Big |\frac{1}{n}\sum _{k=1}^n\tilde{X}_{ik}\tilde{X}_{jk}-\tilde{\sigma }_{ij}\Big | \end{aligned}$$

and the transformed points

$$\begin{aligned} W_{1;i,j}=\frac{1}{2}(n\,M_{1;i,j}^2-4\log p+\log \log p+\log 8\pi ). \end{aligned}$$

We will show that \(N_n^{(W_1)}:=\sum _{1\le i<j\le p} \varepsilon _{W_{1;i,j}}{\mathop {\rightarrow }\limits ^{d}}N\) and thus by Proposition 3.10 \(N_n^{(W)}{\mathop {\rightarrow }\limits ^{d}}N\).

To this end, we first apply Kallenberg's Theorem as in the proof of Theorem 2.1. We set

$$\begin{aligned} B=\bigcup _{k=1}^q B_k\subset \mathbb {R}\end{aligned}$$

with disjoint intervals \(B_k=(r_k,s_k]\) and show

  1. (1)

    \(\lim \limits _{n\rightarrow \infty } \mu _n^{(W_1)}(B)=\mu (B)\),

  2. (2)

    \(\lim \limits _{n\rightarrow \infty } {\mathbb P}(N_n^{(W_1)}(B)=0)={{\,\textrm{e}\,}}^{-\mu (B)}\),

where \(\mu _n^{(W_1)}(B)={\mathbb E}[N_n^{(W_1)}(B)]\) and \(\mu\) is defined by \(\mu (B_k)={{\,\textrm{e}\,}}^{-r_k}-{{\,\textrm{e}\,}}^{-s_k}\).

From the proof of Theorem 2 of Xiao and Wu (2013, p. 2910, 2913-2914) we know that the conditions of Xiao and Wu (2013, Lemma 6) are satisfied. Furthermore, from the proof of Lemma 6 (Xiao and Wu 2013, p. 2909-2910) we get that for \(z\in \mathbb {R}\) and

$$\begin{aligned} z_n=(4\log p-\log \log p -\log 8\pi +2z)^{1/2} \end{aligned}$$

and \(d\in \mathbb {N}\)

$$\begin{aligned}\lim _{n\rightarrow \infty }\sum _{\begin{array}{c} A\subset \mathcal {I}_n\\ |A|=d \end{array}}{\mathbb P}(\sqrt{n}M_{1;i_1,j_1}>z_n,\ldots ,\sqrt{n}M_{1;i_d,j_d}>z_n)=\frac{{{\,\textrm{e}\,}}^{-dz}}{d!}, \end{aligned}$$

which is equivalent to

$$\begin{aligned} \lim _{n\rightarrow \infty }\sum _{\begin{array}{c} A\subset \mathcal {I}_n\\ |A|=d \end{array}}{\mathbb P}(W_{1;i_1,j_1}>z,\ldots ,W_{1;i_d,j_d}>z)=\frac{{{\,\textrm{e}\,}}^{-dz}}{d!}, \end{aligned}$$
(3.8)

where \(A=\{(i_1,j_1),\ldots ,(i_d,j_d)\}\). Therefore, we get for \(d=1\)

$$\begin{aligned} \lim _{n\rightarrow \infty }\mu _n^{(W_1)}(B)=\lim _{n\rightarrow \infty }\sum _{k=1}^q \sum _{(i,j)\in \mathcal {I}_n}{\mathbb P}(W_{1;i,j}\in B_k)=\sum _{k=1}^q ({{\,\textrm{e}\,}}^{-r_k}-{{\,\textrm{e}\,}}^{-s_k}) =\mu (B), \end{aligned}$$

which proves (1). Regarding (2), we use that \(1-{\mathbb P}(N_n^{(W_1)}(B)=0)={\mathbb P}\Big (\bigcup _{1\le i<j\le p} A_{i,j}\Big )\), where \(A_{i,j}=\{W_{1;i,j}\in B\}\). By Bonferroni’s inequality we have for every \(k\ge 1\),

$$\begin{aligned}&\sum _{d=1}^{2k}(-1)^{d-1}\sum _{\begin{array}{c} A\subset \mathcal {I}_n\\ |A|=d \end{array}}P_{A,B} \le {\mathbb P}\Big (\bigcup _{1\le i<j\le p} A_{i,j}\Big ) \le \sum _{d=1}^{2k-1}(-1)^{d-1}\sum _{\begin{array}{c} A\subset \mathcal {I}_n\\ |A|=d \end{array}}P_{A,B}, \end{aligned}$$
(3.9)

where \(A=\{(i_1,j_1),\ldots ,(i_d,j_d)\}\) and \(P_{A,B}={\mathbb P}(W_{1;i_1,j_1}\in B,\ldots ,W_{1;i_d,j_d}\in B)\). First letting \(n\rightarrow \infty\) and then \(k \rightarrow \infty\), we deduce from (3.8) and (3.9) that

$$\begin{aligned} \lim \limits _{n\rightarrow \infty } {\mathbb P}(N_n^{(W_1)}(B)=0)=1-\sum _{d=1}^\infty (-1)^{d-1}\frac{(\mu (B))^d}{d!}=\sum _{d=0}^\infty (-1)^{d}\frac{(\mu (B))^d}{d!}={{\,\textrm{e}\,}}^{-\mu (B)}. \end{aligned}$$

This proves (2) and we get \(N_n^{(W_1)}{\mathop {\rightarrow }\limits ^{d}}N\). By Proposition 3.10 it remains to show

$$\begin{aligned} \max _{1\le i<j\le p}|W_{1;i,j}-W_{i,j}|=\frac{n}{2} \max _{1\le i<j\le p}|M^2_{1;i,j}-M^2_{i,j}|{\mathop {\rightarrow }\limits ^{{\mathbb P}}}0. \end{aligned}$$

This convergence is established in the course of the proof of Theorem 2 of Xiao and Wu (2013, pp. 2911-2916).\(\square\)

The following examples are motivated by Xiao and Wu (2013, p. 2903-2905).

Example 3.13

(Physical dependence). Assume that \(\textbf{x}_1=(X_{11},\ldots ,X_{p1})\) is distributed as the initial segment of a stationary process of the following form: for a measurable function g and a sequence of iid random variables \((\epsilon _i)_{i\in \mathbb {Z}}\) we set

$$\begin{aligned} X_{i1}= g(\epsilon _i,\epsilon _{i-1},\ldots ),\quad \quad i\ge 1, \end{aligned}$$

and let \(\textbf{x}_k\), \(2\le k\le n\), be iid copies of \(\textbf{x}_1\). Moreover, for an iid copy \((\epsilon '_i)_{i\in \mathbb {Z}}\) of \((\epsilon _i)_{i\in \mathbb {Z}}\) and

$$\begin{aligned} X'_{i1}=g(\epsilon _i,\ldots ,\epsilon _1,\epsilon '_0,\epsilon _{-1},\ldots ) \end{aligned}$$

we define the physical dependence measure of order q (see Wu 2005) by

$$\begin{aligned} \delta _q(i)={\mathbb E}\big [ |X_{i1}-X'_{i1}|^q\big ]^{1/q} \quad \text { and } \quad \Psi _q(k)=\Big [\sum _{i=k}^\infty (\delta _q(i))^2\Big ]^{1/2}. \end{aligned}$$

Then we conclude the following statement from Lemma 3 of Xiao and Wu (2013) and Theorem 3.12.

Assume that \(0<\Psi _4(0)<\infty\), that \({\text {Var}}(X_{i1}X_{j1})>0\) for all \(i,j\in \mathbb {Z}\), and that \(|{\text {Cor}}(X_{i1}X_{j1},X_{k1}X_{l1})|<1\) for all \(i,j,k,l\) that are not all equal. Then, if one of the conditions

  1. (i)

    \(\Psi _q(k)=o(1/\log k)\) as \(k\rightarrow \infty\) and one of the assumptions (B4) or (B4’), or

  2. (ii)

    \(\sum _{j=0}^p(\Psi _4(j))^2=O(p^{1-\delta })\) for some \(\delta >0\) and one of the assumptions (B4’) or (B4”)

is satisfied, we have

$$\begin{aligned} N_n^{(W)}{\mathop {\rightarrow }\limits ^{d}}N,\qquad n\rightarrow \infty . \end{aligned}$$

As a special case we consider the linear process \(X_{i1}=\sum _{j=0}^\infty a_j\epsilon _{i-j}\), where the \(\epsilon _j\) are iid with \({\mathbb E}[\epsilon _j]=0\) and \({\mathbb E}[|\epsilon _j|^q]<\infty\) for some \(q\ge 4\), and the coefficients \(a_j\in \mathbb {R}\) satisfy \(\sum _{j=0}^\infty a_j^2 \in (0,\infty )\). Then the physical dependence measure is given by \(\delta _q(j)=|a_j|\,{\mathbb E}\big [ |\epsilon _0-\epsilon '_0|^q\big ]^{1/q}\). Moreover, the conditions \(0<\Psi _4(0)<\infty\), \({\text {Var}}(X_{i1}X_{j1})>0\) for all \(i,j\in \mathbb {Z}\) and \(|{\text {Cor}}(X_{i1}X_{j1},X_{k1}X_{l1})|<1\) for all \(i,j,k,l\) that are not all equal are fulfilled. If \(a_j=j^{-\beta }\ell (j)\), where \(1/2<\beta <1\) and \(\ell\) is a slowly varying function, then \((X_{i1})\) is a long memory process; the smaller the value of \(\beta\), the stronger the dependence within \((X_{i1})\). If one of the assumptions (B4) or (B4’) is satisfied, then condition (i) is fulfilled for every \(\beta \in (1/2,1)\).
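To make condition (i) concrete in this special case, the following Python sketch evaluates \(\delta _4(j)\) and \(\Psi _4(k)\) for \(a_j=j^{-\beta }\) and tracks the product \(\Psi _4(k)\log k\); the choices \(\ell \equiv 1\), \(\beta =0.8\), standard normal innovations and the finite truncation of the tail sums are our own illustrative assumptions.

```python
import numpy as np

beta = 0.8          # illustrative choice in (1/2, 1); slowly varying part ell == 1
J = 10**6           # numerical truncation of the infinite tail sums

j = np.arange(1, J + 1)
a = j ** (-beta)    # coefficients a_j = j^{-beta} (the j = 0 term is omitted here)

# For standard normal innovations (an assumption of this sketch),
# eps_0 - eps_0' ~ N(0, 2), hence E[|eps_0 - eps_0'|^4]^{1/4} = 12**(1/4).
delta4 = np.abs(a) * 12 ** 0.25              # physical dependence measure delta_4(j)

# Psi_4(k) = ( sum_{j >= k} delta_4(j)^2 )^{1/2}, via a reversed cumulative sum
Psi4 = np.sqrt(np.cumsum((delta4 ** 2)[::-1])[::-1])

# Condition (i) requires Psi_4(k) log k -> 0; here Psi_4(k) ~ C k^{-(beta-1/2)},
# so the product decays slowly. Keep k well below J so the truncation is negligible.
for k in [10, 100, 1000, 10000]:
    print(k, Psi4[k - 1] * np.log(k))
```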

Example 3.14

(Non-stationary linear processes). As in the previous example, \(\textbf{x}_1, \ldots , \textbf{x}_n\) are iid random vectors. Now \(\textbf{x}_1=(X_{11},\ldots ,X_{p1})\) is given by

$$\begin{aligned} X_{i1}=\sum _{t\in \mathbb {Z}}f_{i,t}\epsilon _{i-t}, \qquad i\ge 1, \end{aligned}$$

where \((\epsilon _i)_{i\in \mathbb {Z}}\) is a sequence of iid random variables with mean zero, variance one and finite fourth moment, and the sequences \((f_{i,t})_{t\in \mathbb {Z}}\) satisfy \(\sum _{t\in \mathbb {Z}}f_{i,t}^2=1\). Let \(\kappa _4\) be the fourth cumulant of \(\epsilon _0\) and

$$\begin{aligned} h_n(k)=\sup _{1\le i\le p}\Big (\sum _{|t|=\lfloor k/2\rfloor }^\infty f_{i,t}^2\Big )^{1/2}. \end{aligned}$$

Assume that \(\kappa _4>-2\) and

$$\begin{aligned} \limsup \limits _{n\rightarrow \infty }\sup _{1\le i<j\le p}\Big |\sum _{t\in \mathbb {Z}}f_{i,i-t}f_{j,j-t}\Big |<1. \end{aligned}$$
(3.10)

By Section 3.2 of Xiao and Wu (2013, p. 2904-2905) and Theorem 3.12 we get the following result. If either

  1. (i)

    \(h_n(k_n)\log k_n=o(1)\) for any positive sequence \(k_n\) with \(k_n\rightarrow \infty\) as \(n\rightarrow \infty\) and one of the assumptions (B4) or (B4’), or

  2. (ii)

    \(\sum _{k=1}^p(h_n(k))^2=O(p^{1-\delta })\) for some \(\delta >0\) and one of the assumptions (B4’) or (B4”)

holds, then we have \(N_n^{(W)}{\mathop {\rightarrow }\limits ^{d}}N\) as \(n\rightarrow \infty\).

To illustrate these assumptions we consider the special case \(\textbf{x}_1:=(\epsilon _1,\ldots ,\epsilon _p) A_n\), where \(A_n\in \mathbb {R}^{p\times p}\) is a deterministic, symmetric matrix with \((A_n)_{i,j}= a_{ij}\) for \(1\le i,j\le p\). We assume that \(\sum _{t=1}^p a_{it}^2=1\) for every \(1\le i\le p\).

The covariance matrix of \(\textbf{x}_1\) is given by \({\text {Cov}}(\textbf{x}_1)=A_nA_n^T\) with \((A_nA_n^T)_{ij}=\sum _{t=1}^p a_{it}a_{jt}\). Observe that the diagonal entries are equal to 1. To satisfy assumption (3.10) we have to assume that the off-diagonal entries are asymptotically bounded away from 1 in absolute value, i.e.

$$\begin{aligned} \limsup \limits _{n\rightarrow \infty }\sup _{1\le i<j\le p}\Big |\sum _{t=1}^p a_{it}a_{jt}\Big |<1. \end{aligned}$$

We set

$$\begin{aligned} h_n(k)=\sup _{1\le i\le p}\Big (\sum _{t=1}^{i-\lfloor k/2 \rfloor } a_{it}^2+\sum _{t=\lfloor k/2 \rfloor +i}^{p}a_{it}^2\Big )^{1/2} \end{aligned}$$

as a measure of how close the matrices \(A_n\) are to diagonal matrices. For the point process convergence either (i) or (ii) has to be satisfied for \(h_n\).
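The following Python sketch evaluates this \(h_n(k)\) for a concrete matrix; the banded, row-normalized construction of \(A_n\) and all parameter values are our own illustrative choices. Once \(\lfloor k/2\rfloor\) exceeds the bandwidth, \(h_n(k)\) vanishes.

```python
import numpy as np

p = 500
bandwidth = 5   # a_{it} != 0 only for |i - t| <= bandwidth (our construction)

# Row-normalized banded matrix: a_{it} proportional to 1/(1 + |i - t|) inside the band.
# (Edge rows break exact symmetry; this is immaterial for illustrating h_n.)
i_idx = np.arange(p)[:, None]
t_idx = np.arange(p)[None, :]
A = np.where(np.abs(i_idx - t_idx) <= bandwidth,
             1.0 / (1.0 + np.abs(i_idx - t_idx)), 0.0)
A /= np.sqrt((A ** 2).sum(axis=1, keepdims=True))   # enforce sum_t a_{it}^2 = 1

def h(k):
    """h_n(k) = sup_i ( sum_{|t - i| >= floor(k/2)} a_{it}^2 )^{1/2}."""
    mask = np.abs(i_idx - t_idx) >= k // 2
    return np.sqrt(((A ** 2) * mask).sum(axis=1)).max()

for k in [2, 4, 8, 12, 16]:
    print(k, h(k))   # equals 0 once floor(k/2) > bandwidth
```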

4 Proofs of the results

4.1 Proofs of the results in Section 2

Proof of Theorem 2.1

We will follow the lines of the proof of Theorem 2.1 in Dabrowski et al. (2002). Since the mean measure \(\eta\) has a density, the limit process M is simple and we can apply Kallenberg’s Theorem (see for instance Embrechts et al. 1997, p. 233, Theorem 5.2.2, or Kallenberg 1983, p. 35, Theorem 4.7). Therefore, it suffices to prove that for any finite union of bounded rectangles

$$\begin{aligned} R=\bigcup _{k=1}^q A_k\times B_k\subset S,\qquad \text {with}\qquad A_k=\mathbin {\mathop {\otimes }\limits _{l=1}^{m}} (r_k^{(l)},s_k^{(l)}],\quad B_k=(r_k^{(m+1)},s_k^{(m+1)}], \end{aligned}$$

it holds that

  1. (1)

    \(\lim \limits _{n\rightarrow \infty } \eta _n(R)=\eta (R)\),

  2. (2)

    \(\lim \limits _{n\rightarrow \infty } {\mathbb P}(M_n(R)=0)={{\,\textrm{e}\,}}^{-\eta (R)}\).

Without loss of generality we can assume that the \(A_k\)’s are chosen to be disjoint. First we will show (1). Set \(T:=T_{(1,2,\ldots , m)}=g_n(\textbf{x}_1,\textbf{x}_2,\ldots , \textbf{x}_m)\). If \(q=1\) we get

$$\begin{aligned} \eta _n(R)={\mathbb E}[M_n(A_1\times B_1)]&=\sum _{\textbf{i}:\textbf{i}/p \in A_1}{\mathbb P}(T_{\textbf{i}} \in B_1)\\&\sim p^{m}\, \prod _{l=1}^m (s_1^{(l)}-r_1^{(l)}){\mathbb P}(T\in B_1)\,. \end{aligned}$$

Since assumption (A1) implies \(p^m /(m!)\,{\mathbb P}(T\in B_1)\rightarrow \mu (B_1)\), we obtain the convergence \(\eta _n(R)\rightarrow \eta (R)\) as \(n\rightarrow \infty\). The general case \(q\ge 1\) follows from

$$\begin{aligned} \eta _n(R)=\sum _{k=1}^q\eta _n(A_k\times B_k)\rightarrow \sum _{k=1}^q\eta (A_k\times B_k)=\eta (R),\qquad n\rightarrow \infty . \end{aligned}$$

To show (2), we let \(P_n\) be the probability mass function of the Poisson distribution with mean \(\eta _n(R)\). Then we have

$$\begin{aligned} |{\mathbb P}(M_n(R)=0)-{\mathbb P}(M(R)=0)|&\le |{\mathbb P}(M_n(R)=0)-P_n(0)|+|P_n(0)-{\mathbb P}(M(R)=0)|\\&=|{\mathbb P}(M_n(R)=0)-P_n(0)|+o(1), \end{aligned}$$

where the last equality holds by (1). Therefore, we only have to estimate \(|{\mathbb P}(M_n(R)=0)-P_n(0)|\). For this we employ the Stein-Chen method (see Barbour et al. (1992) for a discussion). The Stein equation for the Poisson distribution \(P_n\) with mean \(\eta _n(R)\) is given by

$$\begin{aligned} \eta _n(R) x(j+1)-jx(j)=\mathbbm {1}_{\{j=0\}}-P_n(0),\qquad j\ge 0. \end{aligned}$$
(4.11)

This equation is solved by the function

$$\begin{aligned} x(0)&=0\\ x(j+1)&=\frac{j!}{\eta _n(R)^{j+1}}{{\,\textrm{e}\,}}^{\eta _n(R)}\big (P_n(\{0\})-P_n(\{0\})P_n(\{0,\ldots ,j\})\big )\,,\quad j=0,1,\ldots \end{aligned}$$

By (4.11) we see that

$$\begin{aligned} |{\mathbb P}(M_n(R)=0)-P_n(0)|=|{\mathbb E}[\eta _n(R) x(M_n(R)+1)-M_n(R)x(M_n(R))]|. \end{aligned}$$
(4.12)

Therefore, we only have to estimate the right hand side of (4.12) and to this end we set

$$\begin{aligned} D&:=\{\textbf{k}:\,\textbf{k}=(k_1, k_2,\ldots ,k_m), 1\le k_1<k_2<\ldots <k_m\le p\},\\ I_\textbf{k}&:=\sum _{i=1}^q\mathbbm {1}_{A_i}(\textbf{k}/p)\mathbbm {1}_{B_i}(T_\textbf{k}),\\ \eta _\textbf{k}&:={\mathbb E}[I_\textbf{k}]. \end{aligned}$$

For \(\textbf{k}\in D\) let

$$\begin{aligned} D_{1\textbf{k}}&:=\{\ell \in D:\, \ell _i\ne k_j,\, i,j=1,2,\ldots ,m\}\quad \text {and}\\ D_{2\textbf{k}}&:=\{\ell \in D:\, \ell \ne \textbf{k},\, \ell _i=k_j\,\,\text {for some}\,\, i,j=1,2,\ldots ,m\}. \end{aligned}$$

Then we have the disjoint union \(D=D_{1\textbf{k}}\overset{.}{\cup }\ D_{2\textbf{k}}\overset{.}{\cup }\ \{\textbf{k}\}\), and therefore,

$$\begin{aligned} M_n(R)=\sum _{\ell \in D}I_\ell =\sum _{\ell \in D_{1\textbf{k}}}I_\ell +\Big (I_\textbf{k}+\sum _{\ell \in D_{2\textbf{k}}}I_\ell \Big )=:M^{(1)}_n(\textbf{k})+M^{(2)}_n(\textbf{k}). \end{aligned}$$

Now, we bound (4.12) by

$$\begin{aligned} \begin{aligned}&\Big |\sum _{\textbf{k}\in D}{\mathbb E}[\eta _\textbf{k}x(M_n(R)+1)-I_\textbf{k}x(M_n(R))]\Big |\\&\le \Big |\sum _{\textbf{k}\in D}\eta _\textbf{k}{\mathbb E}[x(M_n(R)+1)-x(M_n^{(1)}(\textbf{k})+1)]\Big |+\Big |\sum _{\textbf{k}\in D}[{\mathbb E}[I_\textbf{k}x(M_n(R))]-\eta _\textbf{k}{\mathbb E}[x(M_n^{(1)}(\textbf{k})+1)]]\Big |. \end{aligned} \end{aligned}$$
(4.13)

It suffices to show that both terms in (4.13) tend to zero as \(n\rightarrow \infty\). From Barbour and Eagleson (1984, p.400) we have the following bound for the increments of the solution of Stein’s equation

$$\begin{aligned} \Delta x:=\sup _{j\in \mathbb {N}_0}|x(j+1)-x(j)|\le \min (1,1/\eta _n(R)). \end{aligned}$$
(4.14)

Using (4.14) the first term of (4.13) is bounded above by

$$\begin{aligned} \sum _{\textbf{k}\in D}\eta _\textbf{k}|{\mathbb E}[x(M_n^{(1)}(\textbf{k})+M_n^{(2)}(\textbf{k})+1)-x(M_n^{(1)}(\textbf{k})+1)]|\le \sum _{\textbf{k}\in D}\eta _\textbf{k}{\mathbb E}[M_n^{(2)}(\textbf{k})]. \end{aligned}$$

Using the definitions of \(\eta _\textbf{k}\) and \(M_n^{(2)}(\textbf{k})\), we get

$$\begin{aligned} \begin{aligned}&\sum _{\textbf{k}\in D}\eta _\textbf{k}{\mathbb E}[M_n^{(2)}(\textbf{k})]\\&=\sum _{\textbf{k}\in D}\Big (\sum _{i=1}^q\mathbbm {1}_{A_i}(\tfrac{\textbf{k}}{p}){\mathbb P}(T\in B_i)\Big )\Big (\sum _{\ell \in D}\sum _{j=1}^q\mathbbm {1}_{A_j}\big (\tfrac{\ell }{p}\big ){\mathbb P}(T\in B_j)-\sum _{\ell \in D_{1\textbf{k}}}\sum _{j=1}^q\mathbbm {1}_{A_j}(\tfrac{\ell }{p}){\mathbb P}(T\in B_j)\Big )\\&=\sum _{i=1}^q\sum _{j=1}^q{\mathbb P}(T\in B_i){\mathbb P}(T\in B_j)\sum _{\textbf{k}\in D}\mathbbm {1}_{A_i}(\tfrac{\textbf{k}}{p})\sum _{\ell \in D}\mathbbm {1}_{A_j}(\tfrac{\ell }{p})\\&-\sum _{i=1}^q\sum _{j=1}^q{\mathbb P}(T\in B_i){\mathbb P}(T\in B_j)\sum _{\textbf{k}\in D}\mathbbm {1}_{A_i}(\tfrac{\textbf{k}}{p})\sum _{\ell \in D_{1\textbf{k}}}\mathbbm {1}_{A_j}(\tfrac{\ell }{p}). \end{aligned} \end{aligned}$$
(4.15)

Since by assumption (A1) it holds that \(p^m{\mathbb P}(T\in B_i)\rightarrow m!(\mu (r_i^{(m+1)})-\mu (s_i^{(m+1)}))\) as \(n\rightarrow \infty\), and

$$\begin{aligned} \frac{1}{p^m}\sum _{\textbf{k}\in D}\mathbbm {1}_{A_i}(\tfrac{\textbf{k}}{p})&\rightarrow \lambda _m(A_i),\qquad n\rightarrow \infty \\ \frac{1}{p^{2m}}\sum _{\textbf{k}\in D}\sum _{\ell \in D_{1\textbf{k}}}\mathbbm {1}_{A_i}(\tfrac{\textbf{k}}{p})\mathbbm {1}_{A_j}(\tfrac{\ell }{p})&\rightarrow \lambda _m(A_i)\lambda _m(A_j)\qquad n\rightarrow \infty , \end{aligned}$$

(4.15) and thus the first term of (4.13) tend to zero as \(n\rightarrow \infty\). As every \(I_\textbf{k}\) only depends on \(T_\textbf{k}\) and because \(D_{1\ell }\) only contains elements which have no component in common with \(\ell\), \(M_n^{(1)}(\ell )\) and \(I_\ell\) are independent. Therefore, the second term of (4.13) equals

$$\begin{aligned} \Big |\sum _{\textbf{k}\in D}{\mathbb E}\big [I_\textbf{k}\big (x(M_n(R))-x(M_n^{(1)}(\textbf{k})+1)\big )\big ]\Big |\le \Delta x\sum _{\textbf{k}\in D}{\mathbb E}[I_\textbf{k}(M_n^{(2)}(\textbf{k})-1)], \end{aligned}$$
(4.16)

where the last inequality follows from (4.14). Since the \(A_i\) are disjoint, we have \(I_\textbf{k}\in \{0,1\}\), and hence the right-hand side in (4.16) is bounded above by

$$\begin{aligned} \begin{aligned}&\Delta x\sum _{\textbf{k}\in D}{\mathbb E}[I_\textbf{k}\sum _{\ell \in D_{2\textbf{k}}}I_\ell ]\\&=\Delta x \sum _{\textbf{k}\in D}\sum _{\ell \in D_{2\textbf{k}}}{\mathbb E}\Big [\Big (\sum _{i=1}^q\mathbbm {1}_{A_i}(\tfrac{\textbf{k}}{p})\mathbbm {1}_{B_i}(T_\textbf{k}) \Big )\Big (\sum _{j=1}^q\mathbbm {1}_{A_j}(\tfrac{\ell }{p})\mathbbm {1}_{B_j}(T_\ell )\Big )\Big ]\\&=\Delta x \sum _{i=1}^q\sum _{j=1}^q\sum _{\textbf{k}\in D}\sum _{\ell \in D_{2\textbf{k}}}\mathbbm {1}_{A_i}(\tfrac{\textbf{k}}{p})\mathbbm {1}_{A_j}(\tfrac{\ell }{p}){\mathbb P}(T_\textbf{k}\in B_i,T_\ell \in B_j) \end{aligned} \end{aligned}$$
(4.17)

We set \(D_{2\textbf{k},r}:=\{\ell \in D:|\{\ell _1,\ldots , \ell _m,k_1,\ldots ,k_m\}|=2m-r\}\). Notice that \(\dot{\bigcup }_{r=1}^{m-1} D_{2\textbf{k},r}=D_{2\textbf{k}}\). Therefore, (4.17) is equal to

$$\begin{aligned} \Delta x \sum _{i=1}^q\sum _{j=1}^q\sum _{r=1}^{m-1}\sum _{\textbf{k}\in D}\sum _{\ell \in D_{2\textbf{k},r}}\mathbbm {1}_{A_i}(\tfrac{\textbf{k}}{p})\mathbbm {1}_{A_j}(\tfrac{\ell }{p}){\mathbb P}(T_\textbf{k}\in B_i,T_\ell \in B_j). \end{aligned}$$

By assumption (A2), we have \(p^{2m-r}{\mathbb P}(T_\textbf{k}\in B_i,T_\ell \in B_j)\rightarrow 0\) for \(r=1,\ldots ,m-1\) as \(n\rightarrow \infty\). Additionally, it holds that

$$\begin{aligned} \frac{1}{p^{2m-r}}\sum _{\textbf{k}\in D}\sum _{\ell \in D_{2\textbf{k},r}}\mathbbm {1}_{A_i}(\tfrac{\textbf{k}}{p})\mathbbm {1}_{A_j}(\tfrac{\ell }{p})=O(1),\qquad n\rightarrow \infty . \end{aligned}$$

Consequently, the second term of (4.13) tends to zero as \(n\rightarrow \infty\). This completes the proof.\(\square\)
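As a numerical sanity check on the Stein–Chen ingredients used above, the following Python sketch evaluates the solution of the Stein equation (4.11) for the set \(\{0\}\), verifies the equation, and confirms the increment bound (4.14); here \(\lambda\) merely stands in for \(\eta _n(R)\), and all numerical values are our own illustrative choices.

```python
import math
import numpy as np
from scipy.stats import poisson

lam = 2.5            # plays the role of eta_n(R)
J = 30

# Explicit solution of the Stein equation (4.11) for the set {0}:
# x(0) = 0 and x(j+1) = j!/lam^{j+1} e^lam ( P(0) - P(0) P({0,...,j}) )
#                     = j!/lam^{j+1} * P(X > j)  for X ~ Poisson(lam).
x = np.zeros(J + 2)
for j in range(J + 1):
    x[j + 1] = math.factorial(j) / lam ** (j + 1) * poisson.sf(j, lam)

# Check (4.11): lam*x(j+1) - j*x(j) = 1{j=0} - P(X=0)
lhs = np.array([lam * x[j + 1] - j * x[j] for j in range(J + 1)])
rhs = np.array([(1.0 if j == 0 else 0.0) - poisson.pmf(0, lam) for j in range(J + 1)])
print("max Stein-equation error:", np.max(np.abs(lhs - rhs)))

# Increment bound (4.14): sup_j |x(j+1) - x(j)| <= min(1, 1/lam)
delta_x = np.max(np.abs(np.diff(x)))
print(delta_x, "<=", min(1.0, 1.0 / lam))
```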

Proof of Proposition 2.5

We proceed similarly to the proof of Proposition 4.20 of Resnick (2008). We want to show that \(Y_n{\mathop {\rightarrow }\limits ^{d}}Y\). To this end, we define a map from the space of point measures \(\mathcal {M}(S)\) to D(0, 1], the space of right-continuous functions on (0, 1] with finite limits existing from the left, and show that this map is continuous. The proposition then follows by the continuous mapping theorem.

To this end, for a point measure \(\textbf{m}=\sum _{k=1}^\infty \varepsilon _{(t_k,y_k)} \in \mathcal {M}(S)\) we define \(V_1:\mathcal {M}(S)\rightarrow D(0,1]\) through

$$\begin{aligned} V_1(\textbf{m})=V_1\Big (\sum _{k=1}^\infty \varepsilon _{(t_k,y_k)}\Big )={\left\{ \begin{array}{ll} &{}v_\textbf{m}:(0,1]\rightarrow (v,w)\\ &{}v_\textbf{m}(t)={\left\{ \begin{array}{ll} \bigwedge \limits _{k:t_k\le t} y_k,\,\, \textbf{m}(((0,1]^{m-1}\times (0,t]\times (v,w))\cap S)>0\\ \bigwedge \limits _{k:t_k=t^*}y_k,\,\, \text {otherwise}, \end{array}\right. } \end{array}\right. } \end{aligned}$$

where \(t^*=\sup \{s>0:\textbf{m}(((0,1]^{m-1}\times (0,s]\times (v,w))\cap S)=0\}\). \(V_1\) is well-defined except at \(\textbf{m}\equiv 0\). Recalling the definition of \(N_n\) in (2.3), we note that \(V_1(N_n)(t)=Y_n(t)\) and \(V_1(N)(t)=Y(t)\) for \(0<t\le 1\).

We will start by proving the continuity of \(V_1\) in the case where \(\mu (x)=-\log (H(x))\) and H is the Gumbel distribution. In this case, N has a.s. the following properties

$$\begin{aligned} N(((0,1]^{m-1}\times \{1\}\times (-\infty ,\infty ))\cap S)&=0,\\ N(((0,1]^{m-1}\times (0,t] \times (x,\infty ))\cap S)&<\infty ,\\ N(((0,1]^{m-1}\times [s,t] \times (-\infty ,x))\cap S)&=\infty , \end{aligned}$$

for any \(0<s<t<1\) and \(x\in \mathbb {R}\). Therefore, we only have to show continuity at \(\textbf{m}\in \mathcal {M}(S)\) with these properties. Let \((\textbf{m}_n)_n\) be a sequence of point measures in \(\mathcal {M}(S)\) that converges vaguely to \(\textbf{m}\) (\(\textbf{m}_n{\mathop {\rightarrow }\limits ^{v}}\textbf{m}\)) as \(n\rightarrow \infty\) (see Resnick 2008, p. 140). Since \(V_1(\textbf{m})\) is right-continuous, there exists a right-continuous extension on [0, 1], which we denote by \(\widetilde{V_1(\textbf{m})}\). Now choose \(\beta <\widetilde{V_1(\textbf{m})}(0)\) such that \(\textbf{m}(S_1\times \{\beta \})=0\). As \(\textbf{m}_n{\mathop {\rightarrow }\limits ^{v}}\textbf{m}\), we can conclude from Resnick (2008, Proposition 3.12) that there exists \(q\) with \(1\le q<\infty\) such that for n large enough

$$\begin{aligned} \textbf{m}_n(S_1\times (\beta ,\infty ))=\textbf{m}(S_1\times (\beta ,\infty ))=q. \end{aligned}$$

We enumerate the q points as \(((t_i^{(n)},j_i^{(n)}),\, 1\le i\le q)\) with \(0<t_{1,m}^{(n)}<\ldots<t_{q,m}^{(n)}<1\), where \(t_{i,m}^{(n)}\) denotes the m-th component of \(t_i^{(n)}\), such that, by Resnick (2008, Proposition 3.13),

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }((t_i^{(n)},j_i^{(n)}),\, 1\le i\le q)=((t_i,j_i),\, 1\le i\le q), \end{aligned}$$

where \(((t_i,j_i),\, 1\le i\le q)\) is the analogous enumeration of points of \(\textbf{m}\) in \(S_1\times (\beta ,\infty )\). Now choose

$$\begin{aligned} \delta < \frac{1}{2} \min \{\Vert t_i-t_j\Vert _2:\, 1\le i<j\le q,\ t_i\ne t_j\} \end{aligned}$$

small enough so that the \(\delta\)-spheres of the distinct points of the set \(\{(t_i,j_i)\}\) are disjoint and contained in \(S_1\times [\beta ,\infty )\). Pick n so large that every \(\delta\)-sphere contains exactly one point of \(\textbf{m}_n\). Then set \(\lambda _n:[0,1]\rightarrow [0,1]\) with \(\lambda _n(0)=0\), \(\lambda _n(1)=1\), \(\lambda _n(t_{i,m})=t_{i,m}^{(n)}\), and let \(\lambda _n\) be linearly interpolated elsewhere on [0, 1]. For this \(\lambda _n\) it holds that

$$\begin{aligned} \sup \limits _{0\le t\le 1}|\widetilde{V_1(\textbf{m}_n)}(t)-\widetilde{V_1(\textbf{m})}(\lambda _n(t))|&<\delta \quad \text {and} \quad \sup \limits _{0\le t\le 1}|\lambda _n(t)-t|<\delta . \end{aligned}$$

Thereby, we get

$$\begin{aligned} \tilde{\mathcal {D}}(\widetilde{V_1(\textbf{m}_n)},\widetilde{V_1(\textbf{m})})=\mathcal {D}(V_1(\textbf{m}_n),V_1(\textbf{m}))<\delta , \end{aligned}$$

which finishes the proof. The Fréchet and the Weibull cases follow by similar arguments.\(\square\)

Proof of Theorem 2.6

We will proceed similarly to Resnick (2008, pp. 217-218), using the continuous mapping theorem again. Since Y is the restriction to (0, 1] of an extremal process (see Resnick 2008, Section 4.3), it is a nondecreasing function, which is constant between isolated jumps. Let \(D^\uparrow (0,1]\) be the subset of D(0, 1] that contains all functions with this property. Set

$$\begin{aligned} V_2:&D^\uparrow (0,1]\rightarrow \mathcal {M}(0,1]\\&x\mapsto \sum _{i=1}^\infty \varepsilon _{t_i}, \end{aligned}$$

where \(\{t_i\}\) are the discontinuity points of x. Then \(V_2(Y_n)=\sum _{k=1}^p\varepsilon _{p^{-1}L(k)}\) and \(V_2(Y)=\sum _{k=1}^\infty \varepsilon _{\tau _k}\), where \((\tau _k)_k\) is the sequence of discontinuity points of the extremal process generated by the Gumbel distribution \(H=\Lambda\); cf. the discussion above Theorem 2.6. By Embrechts et al. (1997, Theorem 5.4.7) the point process \(\sum _{k=1}^\infty \varepsilon _{\tau _k}\) is a PRM with mean measure \(\nu (a,b)=\log (b/a)\) for \(0<a<b\le 1\). According to Proposition 2.5, it suffices to show that \(V_2\) is continuous. Let \((x_n)_n\) be a sequence of functions in \(D^\uparrow (0,1]\) with \(\mathcal {D}(x_n,x)\rightarrow 0\) as \(n\rightarrow \infty\) for an \(x\in D^\uparrow (0,1]\). Then there exist \(\lambda _n\in \Lambda _{[0,1]}\) such that

$$\begin{aligned} \sup \limits _{0\le t\le 1}|\tilde{x}_n(\lambda _n(t))-\tilde{x}(t)|&\rightarrow 0\quad \text {and}\end{aligned}$$
(4.18)
$$\begin{aligned} \sup \limits _{0\le t\le 1}|\lambda _n(t)-t|&\rightarrow 0,\qquad n\rightarrow \infty \,, \end{aligned}$$
(4.19)

where \(\tilde{x}_n\) and \(\tilde{x}\) are the right continuous extensions of \(x_n\) and x on [0, 1]. We want to prove the vague convergence

$$\begin{aligned} V_2(x_n)=\sum _{i=1}^\infty \varepsilon _{t_i^{(n)}}{\mathop {\rightarrow }\limits ^{v}} V_2(x)=\sum _{i=1}^\infty \varepsilon _{t_i}, \end{aligned}$$

where \(\{t_i^{(n)}\}\) and \(\{t_i\}\) are the discontinuity points of \(x_n\) and x, respectively. Consider an arbitrary continuous function f on (0, 1] with compact support contained in an interval [a, b] with \(0<a<b\le 1\) such that x is continuous at a and b. It suffices to show that

$$\begin{aligned} \lim _{n\rightarrow \infty }\sum _{i=1}^\infty f(t_i^{(n)})\mathbbm {1}_{[a,b]}(t_i^{(n)})=\sum _{i=1}^\infty f(t_i)\mathbbm {1}_{[a,b]}(t_i). \end{aligned}$$
(4.20)

The functions \(x_n,x\in D^\uparrow (0,1]\) have only finitely many discontinuity points in [a, b]. Therefore, only a finite number of terms in the sums are not equal to zero. Because of (4.18) and (4.19) the jump times of \(x_n\) on [a, b] are close to those of x, which proves (4.20). Hence, \(V_2\) is continuous, which finishes the proof.\(\square\)
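To illustrate the action of \(V_2\), the following Python sketch extracts the jump times of a nondecreasing step path; using the running-maximum path of an iid sample as the element of \(D^\uparrow (0,1]\) is our own illustrative choice, under which the jump times are exactly the record times of the underlying sequence.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 50
y = rng.standard_normal(p)          # an iid sample; the distribution is an illustrative choice

# A nondecreasing step path as in D^up(0,1]: the running maximum on the grid (k+1)/p.
path = np.maximum.accumulate(y)

# V_2 maps the path to the point measure of its jump (discontinuity) points.
# A jump occurs at the k-th observation iff y[k] exceeds all previous values,
# i.e. at a record; the initial value at k = 0 is not counted as a jump here.
jump_idx = np.flatnonzero(np.diff(path) > 0) + 1
record_idx = np.array([k for k in range(1, p) if y[k] > y[:k].max()])

print(np.array_equal(jump_idx, record_idx))   # True: jump times = record times
print((jump_idx + 1) / p)                     # the corresponding time points on the grid
```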

4.2 Proof of Proposition 3.7

Let \(\pi\) denote the permutation of \(\{1,\ldots ,n\}\) induced by the order statistics of \(X_{21}, \ldots , X_{2n}\), i.e.,

$$\begin{aligned} X_{2 \pi (1)}> X_{2 \pi (2)}> \cdots > X_{2 \pi (n)} \quad \mathrm{a.s.}\,, \end{aligned}$$

where the continuity of the distribution of the \(X_{2t}\) ensures that there are no ties almost surely. We can rewrite \(\tau _{12}\) as

$$\begin{aligned} \begin{aligned} \tau _{12}&= \tfrac{2}{n(n-1)} \sum _{1\le s<t\le n} \textrm{sign}(X_{1s}-X_{1t}) \textrm{sign}(X_{2s}-X_{2t})\\&= \tfrac{2}{n(n-1)} \sum _{1\le s<t\le n} \textrm{sign}(X_{1\pi (s)}-X_{1\pi (t)}) \underbrace{\textrm{sign}(X_{2\pi (s)}-X_{2\pi (t)})}_{=1 \quad \mathrm{a.s.}}\\&{\mathop {=}\limits ^{d}}\tfrac{2}{n(n-1)} \sum _{1\le s<t\le n} \textrm{sign}(X_{1s}-X_{1t}). \end{aligned} \end{aligned}$$
(4.21)

Let \(\textbf{q}_n=(q_1,\ldots ,q_n)\) be a permutation of the set \(\{1, \ldots ,n\}\). If \(i<j\) and \(q_i>q_j\), we call the pair \((q_i, q_j)\) an inversion of the permutation \(\textbf{q}_n\).

Since the \(X_{11},\ldots , X_{1n}\) are iid, the permutation

$$\begin{aligned} \textbf{q}_n= (Q_{11}, Q_{12}, \ldots ,Q_{1n}) \end{aligned}$$

consisting of the ranks is uniformly distributed on the set of the n! permutations of \(\{1,\ldots ,n\}\). By \(I_n\) we denote the number of inversions of \(\textbf{q}_n\). For \(s<t\), we have

$$\begin{aligned} \textrm{sign}(X_{1s}-X_{1t}) = \left\{ \begin{array}{cl} 1, &{} \quad \text { if } Q_{1s}<Q_{1t}\,, \\ -1, &{} \quad \text { if } Q_{1s}>Q_{1t} \, \Leftrightarrow \text { inversion at } (s,t)\,. \end{array} \right. \end{aligned}$$

In view of (4.21), this implies

$$\begin{aligned} {\begin{matrix} \tau _{12} {\mathop {=}\limits ^{d}}\left( {\begin{array}{c}n\\ 2\end{array}}\right) ^{-1} \sum _{1\le s<t\le n} \textrm{sign}(X_{1s}-X_{1t}) = \left( {\begin{array}{c}n\\ 2\end{array}}\right) ^{-1} \Big [ \left( {\begin{array}{c}n\\ 2\end{array}}\right) - 2 \, I_n \Big ] = 1-\tfrac{4}{n(n-1)} I_n\,. \end{matrix}} \end{aligned}$$

By Kendall and Stuart (1973, p. 479) or Margolius (2001, p. 3) (see also Sachkov 1997) the moment generating function of \(I_n\) is

$$\begin{aligned} {\mathbb E}\Big [ {{\,\textrm{e}\,}}^{t I_n} \Big ] = \prod _{j=1}^n \frac{1-{{\,\textrm{e}\,}}^{jt}}{j(1-{{\,\textrm{e}\,}}^t)}\,, \quad t\in \mathbb {R}\,. \end{aligned}$$

We recognize that \(\frac{1-{{\,\textrm{e}\,}}^{jt}}{j(1-{{\,\textrm{e}\,}}^t)}\) is the moment generating function of a uniform distribution on the integers \(0, 1, \ldots , j-1\). Let \((U_i)_{i\ge 1}\) be a sequence of independent random variables such that \(U_i\) is uniformly distributed on the integers \(0, 1, \ldots , i\). We get

$$\begin{aligned} {\begin{matrix} 1-\tfrac{4}{n(n-1)} I_n&{\mathop {=}\limits ^{d}}1-\tfrac{4}{n(n-1)} \sum _{i=1}^{n-1} U_i {\mathop {=}\limits ^{d}}\tfrac{4}{n(n-1)} \sum _{i=1}^{n-1} (U_i-i/2)\,, \end{matrix}} \end{aligned}$$

where the last identity uses \(\sum _{i=1}^{n-1} i/2=n(n-1)/4\) and the symmetry of each \(U_i\) about \(i/2\); this establishes the desired result.\(\square\)
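This distributional identity can be checked by a quick simulation; the sample size, the number of replications and the normal marginals in the following Python sketch are our own choices. Both samples should share mean 0 and variance \(2(2n+5)/(9n(n-1))\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 20, 10000

def kendall_tau(x, y):
    """Sample tau_{12} via the sign-count definition."""
    s = 0
    for t in range(1, len(x)):
        s += np.sum(np.sign(x[:t] - x[t]) * np.sign(y[:t] - y[t]))
    return 2.0 * s / (len(x) * (len(x) - 1))

# Left-hand side: Kendall's tau of two independent continuous samples
taus = np.array([kendall_tau(rng.standard_normal(n), rng.standard_normal(n))
                 for _ in range(reps)])

# Right-hand side: 1 - 4/(n(n-1)) I_n with I_n = sum of independent U_i uniform on {0,...,i}
I_n = np.array([sum(rng.integers(0, i + 1) for i in range(1, n)) for _ in range(reps)])
rhs = 1.0 - 4.0 * I_n / (n * (n - 1))

# Compare the first two moments; the theoretical variance is 2(2n+5)/(9n(n-1)).
print(taus.mean(), rhs.mean())
print(taus.var(), rhs.var(), 2 * (2 * n + 5) / (9 * n * (n - 1)))
```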

4.3 Proof of Proposition 3.10

Our idea is to transfer the convergence of \(N_n^X\) onto \(N_n^Y\). To this end, it suffices to show (see Kallenberg 1983, Theorem 4.2) that for any continuous function f on \(\mathbb {R}\) with compact support,

$$\begin{aligned} \int f \,\textrm{d}N_n^Y - \int f \,\textrm{d}N_n^X {\mathop {\rightarrow }\limits ^{{\mathbb P}}}0\,, \quad n\rightarrow \infty \,. \end{aligned}$$

Suppose the compact support of f is contained in \([K+\gamma _0, \infty )\) for some \(\gamma _0>0\) and \(K\in \mathbb {R}\). Since f is uniformly continuous, \(\omega (\gamma ):= \sup \{|f(x)-f(y)|: x,y \in \mathbb {R}, |x-y| \le \gamma \}\) tends to zero as \(\gamma \rightarrow 0\). We have to show that for any \(\varepsilon >0\),

$$\begin{aligned} \lim _{n\rightarrow \infty } {\mathbb P}\Big ( \Big | \sum _{i=1}^p \Big ( f(Y_{i,n})-f(X_{i,n}) \Big )\Big | >\varepsilon \Big ) =0\,. \end{aligned}$$
(4.22)

On the sets

$$\begin{aligned} A_{n,\gamma }= \Big \{ \max _{i=1,\ldots ,p} \big | Y_{i,n}-X_{i,n} \big | \le \gamma \Big \}\, ,\quad \gamma \in (0, \gamma _0) \,, \end{aligned}$$

we have

$$\begin{aligned} \big |f(Y_{i,n})-f(X_{i,n}) \big | \le \omega (\gamma ) \,\mathbbm {1}_{\{ X_{i,n} >K\}}\,. \end{aligned}$$

Therefore, we see that, for \(\gamma \in (0, \gamma _0)\),

$$\begin{aligned} \begin{aligned} {\mathbb P}&\Big ( \Big | \sum _{i=1}^p \Big ( f(Y_{i,n})-f(X_{i,n}) \Big )\Big |>\varepsilon , A_{n,\gamma } \Big )\\&\le {\mathbb P}\Big ( \omega (\gamma )\, \#\{1\le i \le p: X_{i,n}>K \}> \varepsilon \Big )\\&\le \frac{\omega (\gamma )}{\varepsilon } {\mathbb E}\big [ \#\{1\le i \le p: X_{i,n} >K \}\big ]\\&= \frac{\omega (\gamma )}{\varepsilon } {\mathbb E}N_n^X((K,\infty ])\\&\rightarrow \frac{\omega (\gamma )}{\varepsilon } {\mathbb E}N((K,\infty ]), \qquad n\rightarrow \infty . \end{aligned} \end{aligned}$$

By assumption, \(\lim _{n\rightarrow \infty } {\mathbb P}(A_{n,\gamma }^c) =0\). Since \(\omega (\gamma )\rightarrow 0\) as \(\gamma \rightarrow 0\), first letting \(n\rightarrow \infty\) and then \(\gamma \rightarrow 0\) establishes (4.22).\(\square\)