Point process convergence for symmetric functions of high-dimensional random vectors

The convergence of a sequence of point processes with dependent points, defined by a symmetric function of iid high-dimensional random vectors, to a Poisson random measure is proved. This also implies the convergence of the joint distribution of a fixed number of upper order statistics. As applications of the result, a generalization of maximum convergence to point process convergence is given for simple linear rank statistics, rank-type U-statistics and the entries of sample covariance matrices.


Introduction
In classical extreme value theory the asymptotic distribution of the maximum of random points plays a central role. Maximum-type statistics are popular for testing the dependency structure of high-dimensional data; in particular, such tests possess good power properties against sparse alternatives (see [15,18,34]). Closely related to the maxima of random points are point processes, which play an important role in stochastic geometry and data analysis. They have applications in statistical ecology, astrostatistics and spatial epidemiology [1]. For a sequence (Y_i)_i of real-valued random variables, we set

M_p := \sum_{i=1}^p \varepsilon_{(i/p, Y_i)},

where ε_x is the Dirac measure in x. Let K := (0, 1) × (u, ∞) with u ∈ R. Then M_p(K) counts the number of exceedances of the threshold u by the random variables Y_1, ..., Y_p. If Y_{(k)} denotes the k-th upper order statistic of Y_1, ..., Y_p, it holds that {M_p(K) < k} = {Y_{(k)} ≤ u}, and in particular {M_p(K) = 0} = {max_{i=1,...,p} Y_i ≤ u}. Therefore, the weak convergence of a sequence of point processes gives information about the joint asymptotic distribution of a fixed number of upper order statistics. If the sequence (Y_i)_i consists of independent and identically distributed (iid) random variables, maximum convergence and point process convergence are equivalent, but if the random variables exhibit dependence, this equivalence does not necessarily hold anymore. In this sense, point process convergence is a substantial generalization of maximum convergence. Additionally, the time components i/p deliver valuable information about the random times at which a record occurs, i.e., the time points when Y_j > max_{i=1,...,j−1} Y_i. Our main motivation comes from statistical inference for high-dimensional data, where the asymptotic distribution of the maximum of dependent random variables has found several applications in recent years (see for example [6,7,8,9,15,17,18,34]). The objective of this paper is to provide the methodology to extend meaningful results on the convergence of the maximum of dependent random variables to point process convergence.
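The identity {M_p(K) < k} = {Y_{(k)} ≤ u} linking exceedance counts and order statistics is easy to check numerically. A minimal illustrative sketch (helper names are ours, not from the paper):

```python
# Empirical check of the identity {M_p(K) < k} = {Y_(k) <= u}:
# the number of exceedances of u is < k iff the k-th upper order
# statistic is <= u.  Data and names are purely illustrative.
def exceedance_count(ys, u):
    """M_p((0,1) x (u, inf)): number of points Y_i exceeding u."""
    return sum(1 for y in ys if y > u)

def upper_order_statistic(ys, k):
    """k-th largest value among Y_1, ..., Y_p."""
    return sorted(ys, reverse=True)[k - 1]

ys = [0.3, 2.1, -0.5, 1.7, 0.9, 3.2, -1.1]
for u in (-2.0, 0.0, 1.0, 2.5):
    for k in (1, 2, 3):
        assert (exceedance_count(ys, u) < k) == (upper_order_statistic(ys, k) <= u)
# special case k = 1: no exceedance iff the maximum is <= u
assert (exceedance_count(ys, 1.0) == 0) == (max(ys) <= 1.0)
```

The same identity underlies the passage from point process convergence to the joint limit of a fixed number of upper order statistics in Corollary 2.2 below.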
To this end, we consider dependent points T_i := g_{n,p}(x_{i_1}, x_{i_2}, ..., x_{i_m}), where the index i = (i_1, i_2, ..., i_m) ∈ {1, ..., p}^m. The random vectors x_1, ..., x_p are iid on R^n and g_{n,p} : R^{mn} → R is a measurable, symmetric function. Important examples include U-statistics, simple linear rank statistics, rank-type U-statistics, the entries of sample covariance matrices and interpoint distances.
Additionally, we assume that the dimension n of the points grows with the number of points p. Over the last decades the environment, and therefore the requirements for statistical methods, have changed fundamentally. Due to the huge improvements in computing power and data acquisition technologies, one is confronted with large data sets in which the dimension of the observations is as large as or even larger than the sample size. Such high-dimensional data occur naturally in online networks, genomics, financial engineering, wireless communication and image analysis (see [11,14,21]). Hence, the analysis of high-dimensional data has developed into a meaningful and active research area.
We will show that the corresponding point process of the points T_i converges to a Poisson random measure (PRM) with a mean measure that involves the m-dimensional Lebesgue measure and an additional measure µ. If we replace the points T_i with iid random variables with the same distribution, the (non-degenerate) limiting distribution of the maximum will necessarily be an extreme value distribution of the form exp(−µ(x)). Moreover, the convergence of the corresponding point process will be equivalent to the condition

p^m P(g_{n,p}(x_1, x_2, ..., x_m) > x) → µ(x).    (1.1)

However, since the random points T_i are not independent, we additionally need the following assumption on the dependence structure:

p^{2m−l} P(g_{n,p}(x_1, x_2, ..., x_m) > x, g_{n,p}(x_{m−l+1}, ..., x_{2m−l}) > x) → 0,    (1.2)

where l = 1, ..., m − 1. In the finite-dimensional case where n is fixed, several results about point process convergence are available in similar settings. In [31], Silverman and Brown showed point process convergence for m = 2, n = 2 and g_{2,p}(x_i, x_j) = a_p ∥x_i − x_j∥_2^2, where the x_i have a bounded and almost everywhere continuous density, a_p is a suitable scaling sequence and ∥·∥_2 is the Euclidean norm on R^2. In the Weibull case µ(x) = x^α for x, α > 0, Dehling et al. [12] proved a generalization to points with a fixed dimension and g_{n,p}(x_i, x_j) = a_p h(x_i, x_j), where h is a measurable, symmetric function and a_p is a suitable scaling sequence.
Also in the finite-dimensional case, under similar assumptions as in (1.1) with µ(x) = βx^α for x, α > 0, β ∈ R, and under condition (1.2), Schulte and Thäle [29] showed convergence in distribution of point processes towards a Weibull process. The points of these point processes are obtained by applying a symmetric function g_{n,p} to all m-tuples of distinct points of a Poisson process on a standard Borel space. In [30], this result was extended to more general functions µ and to binomial processes, so that other PRMs were possible limit processes. In [13], Decreusefond, Schulte and Thäle provided an upper bound on the Kantorovich-Rubinstein distance between a PRM and the point process induced in the aforementioned way by a Poisson or a binomial process on an abstract state space. Notice that convergence in Kantorovich-Rubinstein distance implies convergence in distribution (see [26, Theorem 2.2.1] or [13, p. 2149]). In [10] another point process result in a similar setting is given for the number of nearest neighbor balls in fixed dimension. Moreover, [4] presents a general framework for Poisson approximation of point processes on Polish spaces.
1.1. Structure of this paper. The remainder of this paper is structured as follows. In Section 2 we prove weak point process convergence for the dependent points T_i in the high-dimensional case as a tool for the generalization of the convergence of the maximum (Theorem 2.1). We provide popular representations of the limiting process in terms of the transformed points of a homogeneous Poisson process. Moreover, we derive point process convergence for the record times. In Section 3 these tools are applied to study statistics based on relative ranks, such as simple linear rank statistics and rank-type U-statistics. We also prove convergence of the point processes of the off-diagonal entries of large sample covariance matrices. The technical proofs are deferred to Section 4.

Point process convergence
We introduce the model that was briefly described in the introduction. Let x_1, ..., x_p be iid R^n-valued random vectors with x_i = (X_{i1}, ..., X_{in})^⊤, i = 1, ..., p, where p = p_n is some positive integer sequence tending to infinity as n → ∞. We consider the random points

T_i := g_{n,p}(x_{i_1}, x_{i_2}, ..., x_{i_m}),

where i = (i_1, i_2, ..., i_m) ∈ {1, ..., p}^m and g_n = g_{n,p} : R^{mn} → R is a measurable and symmetric function, where symmetric means g_n(y_1, y_2, ..., y_m) = g_n(y_{π(1)}, y_{π(2)}, ..., y_{π(m)}) for all y_1, y_2, ..., y_m ∈ R^n and all permutations π on {1, 2, ..., m}. We are interested in the limit behavior of the point processes M_n with points (i/p, T_i), where i/p = (i_1/p, ..., i_m/p), towards a PRM M. The limit M is a PRM whose mean measure involves the m-dimensional Lebesgue measure and the measure induced by µ. We consider the M_n's and M as random measures on the state space S with values in M(S), the space of point measures on S, endowed with the vague topology (see [27]). The following result studies the convergence M_n d→ M, which denotes convergence in distribution in M(S).
Theorem 2.1. Let x_1, ..., x_p be n-dimensional, independent and identically distributed random vectors, where p = p_n is some sequence of positive integers tending to infinity as n → ∞. Additionally, let g = g_n : R^{mn} → (v, w) be a measurable and symmetric function, where v, w ∈ R̄ = R ∪ {∞, −∞} and v < w. Assume that there exists a function µ : (v, w) → R_+ with lim_{x→v} µ(x) = ∞ and lim_{x→w} µ(x) = 0 such that, for x ∈ (v, w) and n → ∞,

(A1) p^m P(g_n(x_1, x_2, ..., x_m) > x) → µ(x), and
(A2) p^{2m−l} P(g_n(x_1, ..., x_m) > x, g_n(x_{m−l+1}, ..., x_{2m−l}) > x) → 0 for l = 1, ..., m − 1.

Note that (A1) ensures the correct specification of the mean measure, while (A2) is an anti-clustering condition. Both conditions are standard in extreme value theory. It is worth mentioning that H(x) := exp(−µ(x)) defines a distribution function, where we use the conventions µ(x) = 0 if x > w, µ(x) = ∞ if x < v, and exp(−∞) = 0. The typical distribution functions H are the Fréchet, Weibull and Gumbel distributions. In these cases, the limiting process M has a representation in terms of the transformed points of a homogeneous Poisson process. Let (U_i)_i be an iid sequence of random vectors uniformly distributed on S_1, and let Γ_i = E_1 + ... + E_i, where (E_i)_i is an iid sequence of standard exponentially distributed random variables, independent of (U_i)_i.
It is well known that N_Γ := Σ_{i=1}^∞ ε_{Γ_i} is a homogeneous Poisson process, and hence for every A ⊂ (0, ∞) the count N_Γ(A) is Poisson distributed with parameter λ_1(A) (see for example [16, Example 5.1.10]). For the mean measure η of M we obtain the corresponding expression for a product of intervals, where we used in the second line that U_i is uniformly distributed on S_1 for every i. We get the following representations for the limiting processes M.
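The representation N_Γ = Σ_i ε_{Γ_i} with Γ_i = E_1 + ... + E_i can be illustrated by simulation. The following sketch (parameters and helper names are ours) checks empirically that N_Γ((a, b]) has mean b − a, consistent with N_Γ(A) being Poisson with parameter λ_1(A):

```python
import random

# Monte-Carlo illustration: with Gamma_i = E_1 + ... + E_i for iid
# standard exponentials E_i, the count N_Gamma((a, b]) is Poisson
# with mean b - a.  Purely illustrative, not from the paper.
random.seed(0)

def poisson_points(horizon):
    """Return the points Gamma_1 < Gamma_2 < ... up to `horizon`."""
    points, g = [], 0.0
    while True:
        g += random.expovariate(1.0)  # E_i ~ Exp(1)
        if g > horizon:
            return points
        points.append(g)

a, b, reps = 2.0, 5.0, 20000
counts = [sum(1 for g in poisson_points(b) if a < g <= b) for _ in range(reps)]
mean_count = sum(counts) / reps
# E[N_Gamma((a, b])] = b - a = 3; the empirical mean should be close
assert abs(mean_count - (b - a)) < 0.1
```

For a Poisson count the empirical variance should also be close to b − a, which is a quick additional sanity check on the simulation.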
A direct consequence of the point process convergence is the convergence of the joint distribution of a fixed number of upper order statistics. In the Fréchet, Weibull and Gumbel cases the limit function can be described as the joint distribution function of transformations of the points Γ_i.
Corollary 2.2. Let G_{n,(j)} be the j-th upper order statistic of the random variables (g_n(x_{i_1}, x_{i_2}, ..., x_{i_m})), where 1 ≤ i_1 < i_2 < ... < i_m ≤ p. Under the conditions of Theorem 2.1 and for a fixed k ≥ 1, the joint distribution function of (G_{n,(1)}, ..., G_{n,(k)}) converges as n → ∞. In particular, in the Fréchet, Weibull and Gumbel cases, the limit can be written explicitly. By the representation of the limiting point process in the Fréchet, Weibull and Gumbel cases, (2.4) is equal to one of the three distribution functions in the corollary. □ One field where point processes find many applications is stochastic geometry. The paper [29], for example, considers order statistics for Poisson k-flats in R^d, Poisson polytopes on the unit sphere and random geometric graphs.
Setting k = 1 in Corollary 2.2, we obtain the convergence in distribution of the maximum of the points T_i.
Corollary 2.3. Under the conditions of Theorem 2.1 we obtain the convergence in distribution of the normalized maximum. Example 2.4 (Interpoint distances). Let x_i = (X_{i1}, ..., X_{in})^⊤, i = 1, ..., p, be n-dimensional random vectors whose components (X_{it})_{i,t≥1} are independent and identically distributed random variables with zero mean and variance 1. We are interested in the asymptotic behavior of the largest interpoint distances ∥x_i − x_j∥_2, 1 ≤ i < j ≤ p, where ∥·∥_2 is the Euclidean norm on R^n. Figure 1 shows the four largest interpoint distances of 500 points on R^2 with independent standard normally distributed components. Note that three of the largest four distances involve the same outlying vector x_i.
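The experiment behind Figure 1 can be reproduced in a few lines. The following illustrative sketch (helper names are ours, not from the paper) extracts the k largest interpoint distances of simulated points:

```python
import math
import random

# Sketch of the Figure-1 experiment: the four largest interpoint
# distances among p points with iid standard normal components.
# O(p^2) pairwise distances; fine for p = 500.
def largest_interpoint_distances(points, k):
    """Return the k largest Euclidean distances ||x_i - x_j||_2, i < j."""
    dists = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            dists.append(d)
    return sorted(dists, reverse=True)[:k]

random.seed(1)
pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]
top4 = largest_interpoint_distances(pts, 4)
assert top4[0] >= top4[1] >= top4[2] >= top4[3]
```

Inspecting which index pairs realize the top distances typically shows the phenomenon noted above: a single outlying point tends to appear in several of the largest distances.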
We assume that there exists s > 0 such that a suitable moment condition holds. Additionally, we let (b_n)_n and (c_n)_n be suitable centering and scaling sequences, as n → ∞ (see [19] for details). Therefore, the conditions (A1) and (A2) in Theorem 2.1 hold for m = 2.

Record times. In Theorem 2.1 we showed convergence of point processes including time components. Therefore, we can additionally derive results for the record times L(k), k ≥ 1, of the running maxima of the points T_i = g_n(x_{i_1}, x_{i_2}, ..., x_{i_m}) for i = (i_1, ..., i_m), which are defined recursively (cf. Sections 5.4.3 and 5.4.4 of [16]). To prove point process convergence for the record times we need the convergence in distribution of the sequence of processes (Y_n(t), 0 < t ≤ 1) in D(0, 1], the space of right-continuous functions on (0, 1] with finite limits existing from the left, where ⌊x⌋ = max{y ∈ Z : y ≤ x} for x ∈ R, towards an extremal process. We call Y = (Y(t))_{t>0} an extremal process generated by the distribution function H if the finite-dimensional distributions are given by (2.5) (see [16, Definition 5.4.3]). To define convergence in distribution in D(0, 1] we first need to introduce a metric on D(0, 1]. To this end, let Λ_{[0,1]} be the set of homeomorphisms λ : [0, 1] → [0, 1] such that λ is continuous and strictly increasing.
Then for f, g ∈ D[0, 1] the Skorohod metric D is defined as in [5, Section 12], where f̄ and ḡ are the right-continuous extensions of f and g to [0, 1]. The space D[0, 1], and therefore D(0, 1], is separable under the Skorohod metric but not complete. However, one can find an equivalent metric, i.e., a metric which generates the same Skorohod topology, under which D[0, 1] is complete (see [5, Theorem 12.2]). Let Y be defined in terms of an iid sequence (U_i)_i of random vectors uniformly distributed on S_1 and the points Γ_i. Then the process Y has the finite-dimensional distributions in (2.5) for k ≥ 1, 0 < t_i ≤ 1, x_i ∈ R and 1 ≤ i ≤ k. Therefore, Y is an extremal process generated by H restricted to the interval (0, 1]. For these processes we can show the following invariance principle by application of the continuous mapping theorem (see [5, Theorem 2.7] or [27, p. 152]). Proposition 2.5. Under the conditions of Theorem 2.1 and if H(·) = exp(−µ(·)) is an extreme value distribution, then Y_n d→ Y in D(0, 1]. Since Y is a nondecreasing function which is constant between isolated jumps, it has only countably many discontinuity points. Now let (τ_n)_n be the sequence of these discontinuity points of Y. Notice that by [16, Theorem 5.4.7] the point process Σ_{k=1}^∞ ε_{τ_k} is a PRM with mean measure ν(a, b) = log(b/a) for 0 < a < b ≤ 1. We are ready to state our result for the point process of record times.
Theorem 2.6. Under the conditions of Theorem 2.1 and if H(·) = exp(−µ(·)) is an extreme value distribution, the point process of record times converges. Based on Theorem 2.6 we can make statements about the time points of the last and second last record at or before p. The following statements hold for x, y ∈ (0, 1] as n → ∞. Part (1) is a direct consequence of the definitions of ζ and L. Part (2) follows, as n → ∞, from P(J(x, 1] = 0, J(y, 1] ≤ 1) = P(J(x, 1] = 0) P(J(y, x] ≤ 1) = y + y log(x/y).
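The recursive definition of the record times of a running maximum translates directly into code. A minimal sketch with our own helper name (for a one-dimensional sequence of points):

```python
# Record times of the running maxima of a real sequence: L(1) = 1 and
# L(k+1) is the first index j > L(k) with Y_j > max_{i < j} Y_i.
def record_times(ys):
    records, running_max = [], float("-inf")
    for j, y in enumerate(ys, start=1):
        if y > running_max:  # a new record occurs at time j
            records.append(j)
            running_max = y
    return records

ys = [1.0, 0.4, 2.5, 2.5, 3.1, 0.2, 4.0]
assert record_times(ys) == [1, 3, 5, 7]
```

Note that ties do not produce a record (the inequality is strict), matching the definition of a record as Y_j > max_{i=1,...,j−1} Y_i.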

Applications
3.1. Relative ranks. In recent years, maximum-type tests based on the convergence in distribution of the maximum of rank statistics of a data set have gained significant interest for statistical testing [18]. Let y_1, ..., y_n be p-dimensional iid random vectors with y_t = (X_{1t}, ..., X_{pt}) following a continuous distribution to avoid ties. We write Q_{it} for the rank of X_{it} among X_{i1}, ..., X_{in}. Additionally, let R^{(t)}_{ij} be the relative rank of the j-th entry compared to the i-th entry; that is, to obtain R^{(t)}_{ij} we look at the j-th and i-th rows of (Q_{it}), find the location of t in the i-th row, and take the value in the j-th row at this location.
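This verbal definition of the relative ranks can be made concrete in code. The following sketch reflects our reading of it (helper names are ours; ties are assumed absent):

```python
# Ranks Q_{it} of X_{it} among X_{i1},...,X_{in}, and the relative ranks
# R^{(t)}_{ij} = Q_{j,s}, where s is the position with Q_{i,s} = t.
def ranks(row):
    """Q_{i1},...,Q_{in}: rank of each entry within the row (1 = smallest)."""
    order = sorted(range(len(row)), key=lambda s: row[s])
    q = [0] * len(row)
    for r, s in enumerate(order, start=1):
        q[s] = r
    return q

def relative_ranks(row_i, row_j):
    """R^{(1)}_{ij}, ..., R^{(n)}_{ij}."""
    qi, qj = ranks(row_i), ranks(row_j)
    pos_of_rank = {rank: s for s, rank in enumerate(qi)}  # inverse of Q_i
    return [qj[pos_of_rank[t]] for t in range(1, len(row_i) + 1)]

xi = [0.2, 1.5, -0.7, 0.9]
xj = [2.0, -1.0, 0.5, 3.0]
assert ranks(xi) == [2, 4, 1, 3]
assert relative_ranks(xi, xj) == [2, 3, 4, 1]
```

As a sanity check, the relative ranks of a row with itself are the identity permutation, and the relative ranks are always a permutation of {1, ..., n}.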
Many important statistics are based on (relative) ranks; we consider two classes of such statistics in this section. First, we introduce the so-called simple linear rank statistics, which are of the form

V_{ij} = Σ_{t=1}^n c_{nt} g(R^{(t)}_{ij}/(n + 1)),

where g is a Lipschitz function (also called the score function), and the (c_{nt}) with c_{nt} = n^{−1} f(t/(n + 1)) for a Lipschitz function f and Σ_{t=1}^n c_{nt}^2 > 0 are called the regression constants. An example of such a simple linear rank statistic is Spearman's ρ, which will be discussed in detail in Section 3.1.2. For 1 ≤ i < j ≤ p the relative ranks (R^{(t)}_{ij})_{t=1}^n depend on the vectors x_i and x_j, where x_k = (X_{k1}, ..., X_{kn}) for 1 ≤ k ≤ p. We assume that the vectors x_1, ..., x_p are independent. It is worth mentioning that the ranks (Q_{it}) remain the same if we transform the marginal distributions to the (say) standard uniform distribution. Thus, the joint distribution of (R^{(t)}_{ij})_{t=1}^n, and thereby the distribution of V_{ij}, does not depend on the distribution of x_i or x_j. Therefore, we may assume without loss of generality that the random vectors x_1, ..., x_p are identically distributed. We can write V_{ij} = g_{n,V}(x_i, x_j) for a measurable function g_{n,V} : R^{2n} → R.
Next, we consider rank-type U-statistics of order m < n, where the symmetric kernel h is such that U_{ij} depends only on the relative ranks (R^{(t)}_{ij})_{t=1}^n. An important example of a rank-type U-statistic is Kendall's τ, which will be studied in Section 3.1.1. For more examples we refer to [18] and references therein. As for simple linear rank statistics, we are able to write U_{ij} = g_{n,U}(x_i, x_j), where g_{n,U} : R^{2n} → R is a measurable function and x_1, ..., x_p are iid random vectors.
An interesting property of rank-based statistics is the following pairwise independence. We also note that they are generally not mutually independent. Lemma 3.1 (Lemma C4 in [18]). For 1 ≤ i < j ≤ p, let Ψ_{ij} be a function of the relative ranks {R^{(t)}_{ij}, t = 1, ..., n}. Assume x_1, ..., x_p are independent. Then for any (i, j) ≠ (k, l), i < j, k < l, the random variables Ψ_{ij} and Ψ_{kl} are independent.
As an immediate consequence we obtain pairwise independence of (U ij ) and (V ij ), respectively.
Lemma 3.2. For any (i, j) ≠ (k, l), i < j, k < l, the random variables V_{ij} and V_{kl} are independent and identically distributed. Moreover, U_{ij} and U_{kl} are independent and identically distributed.
We now want to standardize U_{ij} and V_{ij}. By independence of (X_{it}), expectation and variance of V_{ij} can be computed, where ḡ_n = n^{−1} Σ_{t=1}^n g(t/(n + 1)) is the sample mean of g(Q_{11}/(n + 1)), ..., g(Q_{1n}/(n + 1)) and c̄_n = Σ_{t=1}^n c_{nt}. Expectation and variance of U_{ij} can also be calculated analytically, and we define the standardized versions of U_{ij} and V_{ij} accordingly. It is well known that V_{ij} and U_{ij} are asymptotically standard normal, and the following lemma provides a complementary large deviation result.
Lemma 3.3 ([23, pp. 404-405]). Suppose that the kernel function h is bounded and non-degenerate. Then we have for x = o(n^{1/6}) that P(U_{12} > x)/(1 − Φ(x)) → 1. Assume that the score function g is differentiable with bounded Lipschitz constant and that the constants (c_{nt})_t satisfy condition (3.1), where C is some constant. Then the analogous relation holds for V_{12} for x = o(n^{1/6}). For a discussion of (3.1), see [23, p. 405]. To proceed, we need to find suitable scaling and centering sequences for V_{ij} and U_{ij}, respectively, such that the conditions of Theorem 2.1 are fulfilled. For an iid standard normal sequence (X_i) it is known that the maximum, centered and scaled by

d_p = (2 log p)^{1/2} − (log log p + log 4π) / (2 (2 log p)^{1/2}),

converges in distribution to a Gumbel limit; see Embrechts et al. [16, Example 3.3.29]. Since we are dealing with p(p − 1)/2 random variables (V_{ij}) and (U_{ij}), respectively, which are asymptotically standard normal, d̃_p = d_{p(p−1)/2} seems like a reasonable choice for the scaling and centering sequences.
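The defining property of the sequence d_p is that p P(X > d_p) → 1 for a standard normal X, which yields the Gumbel limit. This can be checked numerically; a small sketch (function names are ours):

```python
import math

# Centering/scaling sequence for the maximum of p iid standard normals
# (Embrechts et al. [16], Example 3.3.29):
#   d_p = (2 log p)^{1/2} - (log log p + log 4*pi) / (2 (2 log p)^{1/2}).
def d(p):
    s = math.sqrt(2.0 * math.log(p))
    return s - (math.log(math.log(p)) + math.log(4.0 * math.pi)) / (2.0 * s)

def std_normal_tail(x):
    """1 - Phi(x), computed via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# d_p is chosen so that p * P(X > d_p) -> 1 (slowly, at logarithmic rate)
for p in (10**4, 10**6, 10**8):
    assert abs(p * std_normal_tail(d(p)) - 1.0) < 0.2
```

For the p(p − 1)/2 pairwise statistics one would use d evaluated at p(p − 1)/2 in the same way.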
Our main result for rank-statistics is the following.
Theorem 3.4. (a) Suppose that the kernel function h is bounded and non-degenerate. If p = exp(o(n^{1/3})), the point process convergence (3.2) holds, where (E_i) are iid standard exponential; i.e., N is a Poisson random measure with mean measure µ(x, ∞) = e^{−x}, x ∈ R.
(b) Assume that the score function g is differentiable with bounded Lipschitz constant and that the constants (c_{nt})_t satisfy (3.1). Then if p = exp(o(n^{1/3})), the analogous convergence (3.3) holds. Proof. We start with the proof of (3.3), for which we will use Theorem 2.1, as x_1, ..., x_p are iid and g_{n,V} is a measurable function. Therefore, we only have to show that for x ∈ R it holds that

(1) (p(p − 1)/2) P(V_{12} > x_p) → e^{−x} and (2) p^3 P(V_{12} > x_p, V_{13} > x_p) → 0,

where x_p = x/d̃_p + d̃_p. We begin with the proof of (1). Since x_p ∼ d̃_p = o(n^{1/6}), we get by Lemma 3.3 and Mill's ratio that (p(p − 1)/2) P(V_{12} > x_p) ∼ (p(p − 1)/2)(1 − Φ(x_p)) → e^{−x}.
Regarding (2), we note that, by Lemma 3.2, V_{12} and V_{13} are independent. Thus, p^3 P(V_{12} > x_p, V_{13} > x_p) = p^3 P(V_{12} > x_p)^2 → 0, where we used Lemma 3.3 and Mill's ratio in the last two steps. That completes the proof of (3.3). The proof of (3.2) follows by analogous arguments. □ Remark 3.5. Theorem 3.4 is a generalization of Theorems 1 and 2 in [18], where the corresponding maximum convergence is proved under the conditions of Theorem 3.4 and if p = exp(o(n^{1/3})). As in Theorem 2.6, we additionally conclude point process convergence for the record times of the maxima of V_{ij} and U_{ij}. To this end, we investigate the sequence (max_{1≤i<j≤k} U_{ij})_{k≥1}. This sequence jumps at time k if one of the random variables U_{1k}, ..., U_{k−1,k} is larger than every U_{ij} for 1 ≤ i < j ≤ k − 1. Between these jump (or record) times the sequence is constant.
Let L_U be this sequence of record times, and let L_V be constructed analogously. As in Corollary 2.7, we can draw conclusions on the index of the last and second last jump before or at p. Let ζ_U(p) be the number of records among max_{1≤i<j≤2} U_{ij}, ..., max_{1≤i<j≤p} U_{ij}. Then, as n → ∞, the corresponding statements hold for x, y ∈ (0, 1], where part (3) gives information about how much time elapses between the second last and the last jump of (max_{1≤i<j≤k} U_{ij})_{k≥1} before or at p.

3.1.1. Kendall's tau. Kendall's tau is an example of a rank-type U-statistic with bounded kernel. For i ≠ j, Kendall's tau τ_{ij} measures the ordinal association between the two sequences (X_{i1}, ..., X_{in}) and (X_{j1}, ..., X_{jn}). It is defined by

τ_{ij} = (2/(n(n − 1))) Σ_{1≤s<t≤n} sign(X_{is} − X_{it}) sign(X_{js} − X_{jt}),

where the function sign : R → {1, 0, −1} is given by sign(x) = x/|x| for x ≠ 0 and sign(0) = 0. An interesting property of Kendall's tau is that it admits a representation as a sum of independent random variables. We could not find this representation in the literature, therefore we state it here; the proof can be found in Section 4. Proposition 3.7. Kendall's tau can be represented in terms of (D_i)_{i≥1}, independent random variables with D_i uniformly distributed on the numbers −i/2, −i/2 + 1, ..., i/2.
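The defining double sum for Kendall's tau is straightforward to evaluate directly (in O(n²) time). A minimal sketch with our own helper names:

```python
# Kendall's tau between two samples, computed from the definition:
#   tau = 2 / (n(n-1)) * sum_{s<t} sign(X_is - X_it) * sign(X_js - X_jt)
def sign(x):
    return (x > 0) - (x < 0)

def kendall_tau(xi, xj):
    n = len(xi)
    s_sum = sum(
        sign(xi[s] - xi[t]) * sign(xj[s] - xj[t])
        for s in range(n) for t in range(s + 1, n)
    )
    return 2.0 * s_sum / (n * (n - 1))

assert kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]) == 1.0   # all pairs concordant
assert kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]) == -1.0  # all pairs discordant
```

Since τ depends on the data only through the pairwise orderings, it is indeed a function of the relative ranks, as required for a rank-type U-statistic.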
Corollary 3.8. Under the conditions of Theorem 3.4 the corresponding point process convergence holds for Kendall's tau. 3.1.2. Spearman's rho. An example of a simple linear rank statistic is Spearman's rho, a measure of rank correlation that assesses how well the relationship between two variables can be described by a monotonic function. Recall that Q_{ik} and Q_{jk} are the ranks of X_{ik} and X_{jk} among {X_{i1}, ..., X_{in}} and {X_{j1}, ..., X_{jn}}, respectively, and write q_n = (n + 1)/2 for the average rank. Then for 1 ≤ i ≠ j ≤ p Spearman's rho is defined by

ρ_{ij} = (12/(n(n² − 1))) Σ_{k=1}^n (Q_{ik} − q_n)(Q_{jk} − q_n).

Mean and variance can be computed explicitly. Therefore, we obtain the following corollary of Theorem 3.4.
Corollary 3.9. Under the conditions of Theorem 3.4 the corresponding point process convergence holds for Spearman's rho. The next auxiliary result allows us to transfer the weak convergence of a sequence of point processes to another sequence of point processes, provided that the maximum distance between their points tends to zero in probability. Proposition 3.10. For arrays (X_{i,n})_{i,n≥1} and (Y_{i,n})_{i,n≥1} of real-valued random variables, let N^X_n and N^Y_n be the associated point processes; if N^X_n d→ N and max_i |Y_{i,n} − X_{i,n}| P→ 0, then N^Y_n d→ N. Example 3.11. It turns out that there is an interesting connection between Spearman's rho and Kendall's tau. By [20, p. 318] we can write Spearman's rho as a combination of Kendall's tau and a statistic r_{ij}, which is the major part of Spearman's rho. Therefore, r_{ij} is a U-statistic of degree three with an asymmetric bounded kernel. We now use Proposition 3.10 and Corollary 3.9 to show the corresponding point process convergence (3.7) for r_{ij}. For this purpose we consider the difference of the respective points.
By (3.4), (3.6) and (3.5) this expression is asymptotically negligible. Since |τ_{ij}| and |r_{ij}| are bounded above by constants, we deduce the bound which verifies the condition in Proposition 3.10. Since N^ρ_n d→ N by Corollary 3.9, we conclude the desired (3.7).
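Spearman's rho as defined above can be computed from the ranks. The following sketch reflects our reading of the definition (helper names are ours; no ties assumed, so Σ_k (Q_{ik} − q_n)² = n(n² − 1)/12):

```python
# Spearman's rho between two rows, computed from the ranks
# (assumes continuous data, hence no ties).
def ranks(row):
    order = sorted(range(len(row)), key=lambda s: row[s])
    q = [0] * len(row)
    for r, s in enumerate(order, start=1):
        q[s] = r
    return q

def spearman_rho(xi, xj):
    n = len(xi)
    qn = (n + 1) / 2.0  # average rank q_n
    qi, qj = ranks(xi), ranks(xj)
    # normalization n(n^2 - 1)/12 = sum_k (Q_ik - q_n)^2 for tie-free ranks
    return sum((qi[k] - qn) * (qj[k] - qn) for k in range(n)) / (n * (n * n - 1) / 12.0)

assert spearman_rho([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]) == 1.0   # monotone
assert spearman_rho([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]) == -1.0     # reversed
```

Since ρ depends only on the ranks, it is invariant under strictly monotone transformations of either coordinate, which is the sense in which it measures monotonic association.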

3.2. Sample covariances. An important field of current research is the estimation and testing of high-dimensional covariance structures. It finds applications in genomics, social science and financial economics; see [8] for a detailed review and further references. Under quite general assumptions, Xiao et al. [33] investigated the maximum off-diagonal entry of a high-dimensional sample covariance matrix. We impose the same model assumptions (compare [33, pp. 2901-2903]), but instead of the maximum we study the point process of the off-diagonal entries.
We start by describing the model and spelling out the required assumptions. Let x_1, ..., x_n be p-dimensional iid random vectors with x_i = (X_{1i}, ..., X_{pi}), where E[X_{ji}] = 0 for 1 ≤ j ≤ p, and write X̄_j := (1/n) Σ_{k=1}^n X_{jk}. Denote by Σ = (σ_{i,j})_{1≤i,j≤p} the covariance matrix of the vector x_1 and assume σ_{i,i} = 1 for 1 ≤ i ≤ p. The empirical covariance matrix (σ̂_{i,j})_{1≤i,j≤p} is given by

σ̂_{i,j} = (1/n) Σ_{k=1}^n (X_{ik} − X̄_i)(X_{jk} − X̄_j).

A fundamental problem in high-dimensional inference is to derive the asymptotic distribution of max_{1≤i<j≤p} |σ̂_{i,j} − σ_{i,j}|. Since the σ̂_{i,j}'s might have different variances, we need to standardize σ̂_{i,j} by θ_{i,j} = Var(X_{i1}X_{j1}), which can be estimated empirically. We are interested in the resulting standardized points. Let I_n = {(i, j) : 1 ≤ i < j ≤ p} be an index set. With this notation we can formulate the required conditions: (B1) lim inf (B4) For some constants t > 0 and 0 < r ≤ 2, lim sup_{n→∞} K_n(t, r) < ∞, and (B4') log p = o(n^{r/(4+3r)}), lim sup_{n→∞} K_n(t, r) < ∞ for some constants t > 0 and r > 0.
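The empirical covariance entries and a natural plug-in estimate of θ_{i,j} = Var(X_{i1}X_{j1}) can be sketched as follows (our reading of the model description; helper names are ours and the variance estimator is one natural choice, not necessarily the paper's exact one):

```python
# Sketch of the quantities entering the standardized off-diagonal entries:
#   sigma_hat_ij = (1/n) sum_k (X_ik - Xbar_i)(X_jk - Xbar_j),
#   theta_hat_ij = empirical variance of the centered products.
def sample_cov_entry(xi, xj):
    n = len(xi)
    mi, mj = sum(xi) / n, sum(xj) / n
    prods = [(a - mi) * (b - mj) for a, b in zip(xi, xj)]
    sigma_hat = sum(prods) / n
    theta_hat = sum((prod - sigma_hat) ** 2 for prod in prods) / n
    return sigma_hat, theta_hat

xi = [0.5, -1.2, 0.3, 2.0, -0.6]
xj = [1.1, -0.4, 0.0, 1.5, -1.0]
sigma_hat, theta_hat = sample_cov_entry(xi, xj)
assert theta_hat >= 0.0
```

The standardized point for the pair (i, j) is then formed from (σ̂_{i,j} − σ_{i,j}) divided by the estimated standard deviation, in analogy with the normalization of the rank statistics above.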
Example 3.14 (Non-stationary linear processes). As in the previous example, x_1, ..., x_n are iid random vectors. Now x_1 = (X_{11}, ..., X_{p1}) is given by a linear process in a sequence (ϵ_i)_{i∈Z} of iid random variables with mean zero, variance one and finite fourth moment, where the coefficient sequences (f_{i,t})_{t∈Z} satisfy Σ_{t∈Z} f²_{i,t} = 1. Let κ_4 be the fourth cumulant of ϵ_0. If, for any positive sequence k_n with k_n → ∞ as n → ∞, either (i) the decay condition on h_n together with one of the assumptions (B4) and (B4') holds, or (ii) Σ_{k=1}^p (h_n(k))² = O(p^{1−δ}) for some δ > 0 together with one of the assumptions (B4') or (B4'') holds, then the point process convergence follows. To illustrate these assumptions we consider the special case x_1 := (ϵ_1, ..., ϵ_p)A_n, where A_n ∈ R^{p×p} is a deterministic, symmetric matrix with (A_n)_{i,j} = a_{ij} for 1 ≤ i, j ≤ p. We assume that Σ_{t=1}^p a²_{it} = 1 for every 1 ≤ i ≤ p. The covariance matrix of x_1 is given by Cov(x_1) = A_nA_n^T with (A_nA_n^T)_{ij} = Σ_{t=1}^p a_{it}a_{jt}. Observe that the diagonal entries are equal to 1. To satisfy assumption (3.10) we have to assume that the entries except for the diagonal are asymptotically smaller than 1. We set h_n as a measure of how close the matrices A_n are to diagonal matrices. For the point process convergence, either (i) or (ii) has to be satisfied for h_n.

4.1. Proofs of the results in Section 2.
Proof of Theorem 2.1. We follow the lines of the proof of Theorem 2.1 in [12]. Since the mean measure η has a density, the limit process M is simple and we can apply Kallenberg's Theorem (see for instance [16, p. 233, Theorem 5.2.2] or [22, p. 35, Theorem 4.7]). Therefore, it suffices to prove that for any finite union R of bounded rectangles it holds that (1) lim_{n→∞} η_n(R) = η(R) and (2) lim_{n→∞} P(M_n(R) = 0) = P(M(R) = 0). Without loss of generality we can assume that the A_k's are chosen to be disjoint. First we show (1). Set T := T_{(1,2,...,m)} = g_n(x_1, x_2, ..., x_m). If q = 1, assumption (A1) implies p^m/(m!) P(T ∈ B_1) → µ(B_1), and we obtain the convergence η_n(R) → η(R) as n → ∞. The case q ≥ 1 follows analogously. To show (2), we let P_n be the probability mass function of the Poisson distribution with mean η_n(R). Then |P(M_n(R) = 0) − P(M(R) = 0)| is bounded by |P(M_n(R) = 0) − P_n(0)| plus a term that vanishes by (1). Therefore, we only have to estimate |P(M_n(R) = 0) − P_n(0)|. For this we employ the Stein-Chen method (see [3] for a discussion). The Stein equation for the Poisson distribution P_n with mean η_n(R) is solved by the function

x(0) = 0, x(j + 1) = (j!/η_n(R)^{j+1}) e^{η_n(R)} (P_n({0}) − P_n({0}) P_n({0, ..., j})), j = 0, 1, ....

By (4.11) we obtain (4.12), so we only have to estimate the right-hand side of (4.12). We start by proving the continuity of V_1 in the case where µ(x) = −log(H(x)) and H is the Gumbel distribution. In this case, N a.s. has the required properties for any 0 < s < t < 1 and x ∈ R. Therefore, we only have to show continuity at m ∈ M(S) with these properties. Let (m_n)_n be a sequence of point measures in M(S) which converges vaguely to m (m_n v→ m) as n → ∞ (see [27, p. 140]). Since V_1(m) is right continuous, there exists a right continuous extension on [0, 1], which we denote by V̄_1(m). From m_n v→ m we can conclude, by [27, Proposition 3.12], that there exists a 1 ≤ q < ∞ such that, for n large enough, m_n and m have the same number q of points in the relevant region. We enumerate the q points {(t_i, j_i)} and choose δ small enough so that the δ-spheres of the distinct points are disjoint and contained in S_1 × [β, ∞). Pick n so large that every δ-sphere contains exactly one point of m_n; define λ_n at the points t_{i,m} and interpolate linearly elsewhere on [0, 1]. For this λ_n the required bound holds, which finishes the proof. The Fréchet and the Weibull cases follow by similar arguments. □ Proof of Theorem 2.6. We proceed similarly as in [27, pp. 217-218], using the continuous mapping theorem again. Since Y is the restriction to (0, 1] of an extremal process (see [27, Section 4.3]), it is a nondecreasing function which is constant between isolated jumps. Let D↑(0, 1] be the subset of D(0, 1] that contains all functions with this property. We consider the map sending such a function to the point process of its discontinuity points, where x̄_n and x̄ are the right continuous extensions of x_n and x on [0, 1], and we want to prove the corresponding vague convergence. Let q_n = (q_1, ..., q_n) be a permutation of the set {1, ..., n}. If i < j and q_i > q_j, we call the pair (q_i, q_j) an inversion of the permutation q_n. Since X_{11}, ..., X_{1n} are iid, the permutation consisting of the ranks is uniformly distributed on the set of the n! permutations of {1, ..., n}. By I_n we denote the number of inversions of q_n. For s < t, the value of sign(X_{1s} − X_{1t}) is determined by whether the corresponding pair of ranks is an inversion. In view of (4.21), this yields the claimed representation. By [24, p. 479] or [25, p. 3] (see also [28]) the moment generating function of I_n is

E[e^{tI_n}] = ∏_{j=1}^n (1 − e^{jt}) / (j(1 − e^t)), t ∈ R.

1.2. Notation. Convergence in distribution (resp. probability) is denoted by d→ (resp. P→), and unless explicitly stated otherwise all limits are for n → ∞. For sequences (a_n)_n and (b_n)_n we write a_n = O(b_n) if a_n/b_n ≤ C for some constant C > 0 and every n ∈ N, and a_n = o(b_n) if lim_{n→∞} a_n/b_n = 0. Additionally, we use the notation a_n ∼ b_n if lim_{n→∞} a_n/b_n = 1, and a_n ≲ b_n if a_n is smaller than or equal to b_n up to a positive universal constant. We further write a ∧ b := min{a, b} for a, b ∈ R, and for a set A we denote by |A| the number of elements of A.

Figure 1. Four largest distances between 500 normally distributed points.

Theorem 3.6. Under the conditions of Theorem 3.4, the record-time point process converges in M(0, 1], the space of point measures on (0, 1], where the limit J is a Poisson random measure with mean measure ν(a, b) = log(b/a) for 0 < a < b ≤ 1.

and µ is defined by µ(B_k) = e^{−r_k} − e^{−s_k}. From the proof of Theorem 2 of [33, pp. 2910, 2913-2914] we know that the conditions of [33, Lemma 6] are satisfied. Furthermore, from the proof of Lemma 6 [33, pp. 2909-2910] we get that for z ∈ R and z_n = (4 log p − log log p − log 8π + 2z)^{1/2},

(W^1_n) d→ N. By Proposition 3.10 it remains to show that the maximum over 1 ≤ i < j ≤ p of the distance between the corresponding points tends to zero in probability.

Σ_{t=1}^p a_{it} a_{jt} < 1 for i ≠ j.

{t^n_i} and {t_i} are the discontinuity points of x_n and x, respectively. Consider an arbitrary continuous function f on (0, 1] with compact support contained in an interval [a, b] with 0 < a < b ≤ 1, such that x is continuous at a and b. It suffices to show the corresponding limit relation for the integrals of f.

Proof of Proposition 3.10. Our idea is to transfer the convergence of N^X_n onto N^Y_n. To this end, it suffices to show (see [22, Theorem 4.2]) that for any continuous function f on R with compact support, ∫ f dN^Y_n − ∫ f dN^X_n P→ 0 as n → ∞. Suppose the compact support of f is contained in [K + γ_0, ∞) for some γ_0 > 0 and K ∈ R. Since f is uniformly continuous, ω(γ) := sup{|f(x) − f(y)| : x, y ∈ R, |x − y| ≤ γ} tends to zero as γ → 0. We have to show that for any ε > 0,

lim_{n→∞} P(|Σ_i (f(Y_{i,n}) − f(X_{i,n}))| > ε) = 0.    (4.22)

On the sets A_{n,γ} = {max_{i=1,...,p} |Y_{i,n} − X_{i,n}| ≤ γ}, γ ∈ (0, γ_0), we have |f(Y_{i,n}) − f(X_{i,n})| ≤ ω(γ) for every i ([5, Theorem 12.2]). In particular, the Skorohod metric and the equivalent metric generate the same open sets, and thus the σ-algebras of Borel sets generated by these open sets are the same. Therefore, a sequence of probability measures on D(0, 1] is relatively compact if and only if it is tight [5, Section 13]. Hence, for every tight sequence of probability measures on D(0, 1], the convergence of the finite-dimensional distributions at all continuity points of the limit distribution implies convergence in distribution [5, Theorem 13.1].