Abstract
We give two asymptotic results for the empirical distance covariance on separable metric spaces without any iid assumption on the samples. In particular, we show the almost sure convergence of the empirical distance covariance for any measure with finite first moments, provided that the samples form a strictly stationary and ergodic process. We further give a result concerning the asymptotic distribution of the empirical distance covariance under the assumption of absolute regularity of the samples and extend these results to certain types of pseudometric spaces. In the process, we derive a general theorem concerning the asymptotic distribution of degenerate V-statistics of order 2 under a strong mixing condition.
1 Introduction
In [12], Lyons introduced the concept of distance covariance for separable metric spaces, generalising the work done by Székely et al. [17]. In this very general case, the distance covariance of a measure \(\theta \) (on the product space \({\mathcal {X}} \times {\mathcal {Y}}\) of separable metric spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\)) with marginal distributions \(\mu \) on \({\mathcal {X}}\) and \(\nu \) on \({\mathcal {Y}}\) is defined as

$$ \mathrm {dcov}(\theta ) := \int d_\mu (x,x')\, d_\nu (y,y') ~\mathrm {d}\theta (z) ~\mathrm {d}\theta (z') $$

for \(z = (x,y), z' = (x', y')\), where

$$ d_\mu (x,x') := d_{\mathcal {X}}(x,x') - a_\mu (x) - a_\mu (x') + D(\mu ), \qquad a_\mu (x) := \int d_{\mathcal {X}}(x,x') ~\mathrm {d}\mu (x'), \qquad D(\mu ) := \int a_\mu ~\mathrm {d}\mu , $$

and \(d_\nu \), \(a_\nu \) and \(D(\nu )\) are defined analogously in terms of \(d_{\mathcal {Y}}\) and \(\nu \).
To examine the properties of this object, Lyons made use of the concept of (strong) negative type. A metric space \({\mathcal {X}}\) is said to be of negative type, if there exists a mapping \(\phi : {\mathcal {X}} \rightarrow H\) to a Hilbert space H, such that \(d_{\mathcal {X}}(x,x') = \Vert \phi (x) - \phi (x')\Vert _H^2\) for all \(x,x' \in {\mathcal {X}}\). It is of strong negative type if it is of negative type and \(D(\mu _1 - \mu _2) = 0\) if and only if \(\mu _1 = \mu _2\) for all probability measures \(\mu _1, \mu _2\) with finite first moments. Lyons showed that the distance covariance is nonnegative if \({\mathcal {X}}\) and \({\mathcal {Y}}\) are of negative type, and that the property \(\mathrm {dcov}(\theta ) = 0 \Leftrightarrow \theta = \mu \otimes \nu \) holds if \({\mathcal {X}}\) and \({\mathcal {Y}}\) are of strong negative type.
This means that the distance covariance completely characterises independence of random variables in metric spaces of strong negative type. Estimators for the distance covariance and their asymptotic behaviour are therefore of great interest for tests of independence.
A special case for real-valued random variables follows from choosing the embedding

$$ \phi : {\mathbb {R}}^d \rightarrow L^2\bigl ({\mathbb {R}}^d, w_d(s)\,\mathrm {d}s\bigr ), \qquad \phi (x) := \Bigl (s \mapsto \tfrac{1}{\sqrt{2}}\, e^{i\langle s,x\rangle }\Bigr ), $$
with \(w_d(s) = \Gamma ((d+1)/2)\pi ^{-(d+1)/2}\Vert s\Vert _2^{-(d+1)}\), which Lyons in [12] refers to as the Fourier embedding. This results in the square of the distance covariance as introduced in [17], i.e.

$$ \mathrm {dcov}(\theta ) = \int _{{\mathbb {R}}^{p+q}} \bigl |\varphi _{(X,Y)}(s,t) - \varphi _X(s)\varphi _Y(t)\bigr |^2\, w_p(s)\, w_q(t) ~\mathrm {d}(s,t), $$

where \(\varphi _Z\) denotes the characteristic function of a random variable Z, and the vector \((X,Y) \in {\mathbb {R}}^{p+q}\) has distribution \(\theta \).
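In the Euclidean case the empirical distance covariance has a well-known closed form in terms of double-centred distance matrices: one computes the pairwise distance matrices of the X- and Y-samples, subtracts row and column means and adds the grand mean, and averages the entrywise product. A minimal sketch (our own illustration; the function names are not from the paper):

```python
import numpy as np

def pairwise_distances(A):
    # Euclidean distance matrix of the rows of A, shape (n, p) -> (n, n)
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def double_center(D):
    # empirical analogue of d_mu: D_kl - row mean - column mean + grand mean
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()

def empirical_dcov(X, Y):
    # V-statistic estimator: dcov(theta_n) = n^{-2} sum_{k,l} A_kl * B_kl
    A = double_center(pairwise_distances(X))
    B = double_center(pairwise_distances(Y))
    return (A * B).mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y = -X + 0.1 * rng.standard_normal((200, 2))             # strongly dependent on X
print(empirical_dcov(X, Y))                              # positive: Y is a function of X plus noise
print(empirical_dcov(X, rng.standard_normal((200, 2))))  # small (order 1/n) for independent samples
```

Nothing in this computation uses independence of the sample points; the same formula defines the estimator when the sequence \((Z_k)_{k \in {\mathbb {N}}}\) is merely strictly stationary and ergodic, which is the setting of Theorem 1 below.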
Two of the main results of [12] are Proposition 2.6 and Theorem 2.7, which describe the asymptotic behaviour of \(\mathrm {dcov}(\theta _n)\), where \(\theta _n\) is the empirical measure from n iid-samples of \(\theta \). Theorem 2.7, under sufficient moment assumptions, describes the asymptotic distribution of the sequence \(n\mathrm {dcov}(\theta _n)\), if \(\theta = \mu \otimes \nu \). Proposition 2.6 gives the almost sure convergence \(\mathrm {dcov}(\theta _n) \xrightarrow []{a.s.} \mathrm {dcov}(\theta )\) for any measure \(\theta \) with finite first moments. However, as noted by Jakobsen in [8], Lyons’ proof of Proposition 2.6 was incorrect and actually required \(\theta \) to have finite 5/3-moments. Lyons later acknowledged this in [13] (iii), showing that Proposition 2.6 as written in [12] is still correct in the case of spaces of negative type, but leaving the question of whether finite first moments are sufficient in the general case of separable metric spaces unanswered. This problem was solved in [9], where the almost sure convergence is shown in the case of iid samples.
In Sect. 2, we show that one can obtain the almost sure convergence of the estimator \(\mathrm {dcov}(\theta _n)\) under finite first moment assumption while dropping the iid assumption regarding the samples which constitute the empirical measure \(\theta _n\). In Theorem 1, we show the almost sure convergence of \(\mathrm {dcov}(\theta _n)\) under assumption of ergodicity and finite first moments. In Theorem 3, we give an asymptotic result similar to Theorem 2.7 in [12], assuming absolute regularity. For this we make use of Theorem 2, which is a general result concerning the asymptotic distribution of degenerate V-statistics under the assumption of \(\alpha \)-mixing data. The definitions of \(\alpha \)-mixing and absolute regularity are recalled at the end of this section.
A further generalisation can be achieved by raising the metrics of the underlying metric spaces to the \(\beta \)-th power. We will denote this with \(\mathrm {dcov}_\beta \). Typically, \(\beta \) is chosen between 0 and 2, where the choice \(\beta = 1\) results in the regular distance covariance. An equivalent way of describing this is to use the regular definitions of distance covariance, but to consider pseudometric spaces of a particular kind instead of metric spaces, namely those which result from raising some metric to the \(\beta \)-th power. (Here, by a pseudometric, we refer to a metric for which the triangle inequality need not hold.) In Sect. 3, we generalise the results for metric spaces deduced in Sect. 2 to pseudometric spaces of this kind.
We now summarise some of the notation used in [12], as well as some basic properties of the distance covariance that will prove useful for our purposes.
Let X and Y be random variables with values in separable metric spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\), respectively. We define \(Z := (X,Y)\) and write \(\theta := {\mathcal {L}}(Z)\), \(\mu := {\mathcal {L}}(X)\) and \(\nu := {\mathcal {L}}(Y)\), and denote by \(\theta _n\) the empirical measure of \(Z_1, \ldots , Z_n\), where \((Z_k)_{k \in {\mathbb {N}}}\) is a strictly stationary and ergodic sequence with \({\mathcal {L}}(Z_1) = \theta \).
If we consider \({\mathcal {X}}\) to be of negative type via an embedding \(\phi \), we denote the Bochner integral \(\int \phi ~\mathrm {d}\mu \) with \(\beta _\phi (\mu )\), and we write \({\hat{\phi }}\) for the centred embedding \(\phi - \beta _\phi (\mu )\). If \({\mathcal {Y}}\) is of negative type via \(\psi \), we define \(\beta _\psi (\nu )\) and \({\hat{\psi }}\) analogously. If both \({\mathcal {X}}\) and \({\mathcal {Y}}\) are of negative type via embeddings \(\phi : {\mathcal {X}} \rightarrow H_1\) and \(\psi : {\mathcal {Y}} \rightarrow H_2\), we can consider the embedding

$$ \phi \otimes \psi : {\mathcal {X}} \times {\mathcal {Y}} \rightarrow H_1 \otimes H_2, \qquad (\phi \otimes \psi )(z) := \phi (x) \otimes \psi (y) \quad \text {for } z = (x,y), $$
where \(H_1 \otimes H_2\) is the tensor product of the Hilbert spaces \(H_1\) and \(H_2\), equipped with the inner product \(\langle u_1 \otimes v_1, u_2 \otimes v_2\rangle _{H_1 \otimes H_2} := \langle u_1, u_2\rangle _{H_1}\langle v_1, v_2\rangle _{H_2}\).
By Proposition 3.5 in [12], we have that

$$ d_\mu (x,x')\, d_\nu (y,y') = 4\bigl \langle ({\hat{\phi }} \otimes {\hat{\psi }})(z), ({\hat{\phi }} \otimes {\hat{\psi }})(z')\bigr \rangle \qquad (1) $$

for all \(z, z' \in {\mathcal {X}} \times {\mathcal {Y}}\), whenever \({\mathcal {X}}\) and \({\mathcal {Y}}\) are of negative type via embeddings \(\phi \) and \(\psi \), respectively.
For the remainder of this paper, we will drop the indices of the metrics on \({\mathcal {X}}\) and \({\mathcal {Y}}\) and of the inner products on \(H_1\), \(H_2\) or \(H_1 \otimes H_2\), as it is clear from their arguments which metric or inner product we consider. More precisely, d will denote both a metric on \({\mathcal {X}}\) and a (possibly different) metric on \({\mathcal {Y}}\), and \(\langle ., .\rangle \) can denote one of three (possibly different) inner products on Hilbert spaces \(H_1\), \(H_2\) or \(H_1 \otimes H_2\).
Recall that for two \(\sigma \)-algebras \({\mathcal {A}}\) and \({\mathcal {B}}\) we define the \(\alpha \)- and \(\beta \)-coefficients of \({\mathcal {A}}\) and \({\mathcal {B}}\) as

$$ \alpha ({\mathcal {A}}, {\mathcal {B}}) := \sup \bigl \{|P(A \cap B) - P(A)P(B)| ~:~ A \in {\mathcal {A}},~ B \in {\mathcal {B}}\bigr \} $$

and

$$ \beta ({\mathcal {A}}, {\mathcal {B}}) := \frac{1}{2} \sup \sum _{i=1}^I \sum _{j=1}^J |P(A_i \cap B_j) - P(A_i)P(B_j)|, $$

respectively, where the second supremum is taken over all finite partitions \(A_1, \ldots , A_I\) and \(B_1, \ldots , B_J\) such that \(A_i \in {\mathcal {A}}\) and \(B_j \in {\mathcal {B}}\) for all i and j. For a process \((Z_k)_{k \in {\mathbb {N}}}\), we define

$$ \alpha (n) := \sup _{k \in {\mathbb {N}}} \alpha \bigl (\sigma (Z_1, \ldots , Z_k), \sigma (Z_{k+n}, Z_{k+n+1}, \ldots )\bigr ) $$

and

$$ \beta (n) := \sup _{k \in {\mathbb {N}}} \beta \bigl (\sigma (Z_1, \ldots , Z_k), \sigma (Z_{k+n}, Z_{k+n+1}, \ldots )\bigr ), $$

and we say that the process \((Z_k)_{k \in {\mathbb {N}}}\) is \(\alpha \)-mixing or \(\beta \)-mixing if \(\alpha (n) \xrightarrow [n \rightarrow \infty ]{} 0\) or \(\beta (n) \xrightarrow [n \rightarrow \infty ]{} 0\), respectively. \(\beta \)-mixing is also known as absolute regularity. These definitions are taken from [4], where many properties of \(\alpha \)-mixing and absolutely regular processes are established.
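When \({\mathcal {A}}\) and \({\mathcal {B}}\) are generated by discrete random variables, both suprema range over finitely many events and can be evaluated exactly. The following sketch (our own illustration; the joint probability table is hypothetical) computes the two coefficients and exhibits the inequality \(2\alpha \le \beta \) used later in the proof of Theorem 3:

```python
import numpy as np
from itertools import chain, combinations

def subsets(s):
    # all subsets of s, i.e. all events of the generated (finite) sigma-algebra
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def alpha_beta(p):
    """alpha- and beta-coefficients of the sigma-algebras generated by two
    discrete random variables with joint probability table p."""
    p = np.asarray(p, dtype=float)
    px, py = p.sum(axis=1), p.sum(axis=0)
    # beta: half the total variation between joint law and product of marginals,
    # attained at the finest partitions (the atoms)
    beta = 0.5 * np.abs(p - np.outer(px, py)).sum()
    # alpha: supremum of |P(A & B) - P(A)P(B)| over all events A, B
    alpha = max(
        abs(p[np.ix_(list(A), list(B))].sum() - px[list(A)].sum() * py[list(B)].sum())
        for A in subsets(range(p.shape[0])) for B in subsets(range(p.shape[1]))
    )
    return alpha, beta

a, b = alpha_beta([[0.4, 0.1], [0.1, 0.4]])
print(a, b)  # alpha = 0.15, beta = 0.3 for this table, so 2*alpha <= beta holds
```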
2 Results for Metric Spaces
We now present our results in the case of separable metric spaces. It should be kept in mind that while we consider the usual distance covariance, Theorems 1 and 3 also hold for \(\mathrm {dcov}_\beta \) (under appropriate moment conditions). However, we postpone discussion of this until Sect. 3, so as to avoid unnecessary abstraction.
The following lemma is a variant of Theorem 3.5 in [3], where it is formulated for random variables.
Lemma 1
Let \({\mathcal {X}}\) be a metrisable topological space, \((\mu _n)_{n \in {\mathbb {N}}}\) a sequence of measures on \({\mathcal {X}}\) with weak limit \(\mu \) and \(h : {\mathcal {X}} \rightarrow {\mathbb {R}}\) a \(\mu \)-a.s. continuous function which fulfills the following uniform integrability condition:

$$ \lim _{M \rightarrow \infty } \limsup _{n \rightarrow \infty } \int _{\{|h| > M\}} |h| ~\mathrm {d}\mu _n = 0. \qquad (2) $$
Furthermore, we require h to be dominated by some \(\mu \)-integrable function g, i.e. \(|h| \le g\) \(\mu \)-a.s. Then \(\int h ~\mathrm {d}\mu _n \rightarrow \int h ~\mathrm {d}\mu \).
Proof
Without loss of generality, suppose that \({\mathcal {X}}\) is a metric space. We can decompose the integral with respect to \(\mu _n\) into a truncated part and a tail part:

$$ \int h ~\mathrm {d}\mu _n = \int _{\{|h| \le M\}} h ~\mathrm {d}\mu _n + \int _{\{|h| > M\}} h ~\mathrm {d}\mu _n. $$
The truncated integral converges, because it is the integral of an almost surely continuous and bounded function and \(\mu _n \Rightarrow \mu \), while the uniform integrability condition (2) implies that the tail integral vanishes in the limit \(M, n \rightarrow \infty \). More precisely, we have the inequality
The second summand vanishes by assumption due to (2). For the first summand, note that for any fixed M, the limit superior in n of the integral converges to \(\int _{\{|h| \le M\}} h ~\mathrm {d}\mu \), since \(h\mathbf{1 }_{\{|h| \le M\}}\) is bounded and almost surely continuous. Furthermore, since \(|h\mathbf{1 }_{\{|h| \le M\}}| \le |h| \le g\), we can employ the dominated convergence theorem to obtain
Therefore, the summands in (3) are indeed well-defined. This gives us
Since \(0 \le \liminf _{n \rightarrow \infty } \int _{\{|h|> M\}} |h|~\mathrm {d}\mu _n \le \limsup _{n \rightarrow \infty } \int _{\{|h| > M\}} |h|~\mathrm {d}\mu _n\) for any M, we can use an almost identical argument to obtain
and thus \(\lim _{n\rightarrow \infty } \int h~\mathrm {d}\mu _n\) exists and is equal to \(\int h~\mathrm {d}\mu \).\(\square \)
In proving Theorem 1, we will make use of the following general result, which is a generalisation of Theorem U (ii) from [1].
Lemma 2
Let \((Z_k)_{k \in {\mathbb {N}}}\) be a strictly stationary and ergodic process with values in a separable metrisable topological space \({\mathcal {Z}}\) and marginal distribution \({\mathcal {L}}(Z_1) = \theta \). Let \(h : {\mathcal {Z}}^d \rightarrow {\mathbb {R}}\) be a measurable function, and let \(f : {\mathcal {Z}} \rightarrow {\mathbb {R}}\) be integrable with respect to \(\theta \), so that \(|h| \le f \otimes \cdots \otimes f\), where the product denoted by \(\otimes \) is taken d times and \((f \otimes \cdots \otimes f)(z_1, \ldots , z_d) := \prod _{k = 1}^d f(z_k)\). If h is \(\theta ^d\)-a.e. continuous, then \(V_h(Z_1, \ldots , Z_n) \rightarrow \int h ~\mathrm {d}\theta ^d\) a.s., where \(V_h(Z_1, \ldots , Z_n)\) denotes the V-statistics with kernel h.
Proof
Without loss of generality, suppose that \({\mathcal {Z}}\) is a metric space. Let \(\theta _n := n^{-1} \sum _{k=1}^n \delta _{Z_k}\) denote the empirical measure of \(Z_1, \ldots , Z_n\). We have the representation \(V_h(Z_1, \ldots , Z_n) = \int h ~\mathrm {d}\theta _n^d\). Furthermore, \(\theta _n \Rightarrow \theta \) a.s., since \({\mathcal {Z}}\) is separable, and therefore \(\theta _n^d \Rightarrow \theta ^d\) a.s. by Theorem 2.8 (ii) in [3].
We now wish to employ Lemma 1. Hence, we need to show that the sequence of integrals fulfills the following uniform integrability condition:

$$ \lim _{M \rightarrow \infty } \limsup _{n \rightarrow \infty } \int _{\{|h| > M\}} |h| ~\mathrm {d}\theta _n^d = 0. $$
We have
and since \(\{f \otimes \cdots \otimes f > M\} \subseteq \bigcup _{i=1}^d M_i\) with \(M_i := \{z \in {\mathcal {Z}}^d ~|~f(z_i) > M^{1/d}\}\), the right-hand side is dominated by
which, due to Birkhoff’s pointwise ergodic theorem, almost surely converges to \(d\left( {\mathbb {E}}_\theta f\right) ^{d-1}{\mathbb {E}}_\theta [\mathbf{1 }_{\{f > M^{1/d}\}} f]\), where \(\mathbf{1 }_A\) denotes the indicator function of a set A. Thus, almost surely,
since f is assumed to be integrable.
Lemma 1 therefore gives us
\(\square \)
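For \(d = 2\), the representation \(V_h(Z_1, \ldots , Z_n) = \int h ~\mathrm {d}\theta _n^2\) used in the proof is simply the average of h over all \(n^2\) ordered pairs of sample points. A small numerical sketch (our own illustration), using the kernel \(h(z,z') = (z - z')^2\), for which \(\int h ~\mathrm {d}\theta ^2 = 2\,\mathrm {Var}(Z_1)\):

```python
import numpy as np

def v_statistic(h, Z):
    # V_h(Z_1,...,Z_n) = n^{-2} sum_{i,j} h(Z_i, Z_j) = integral of h w.r.t. theta_n^2
    return h(Z[:, None], Z[None, :]).mean()

rng = np.random.default_rng(1)
Z = rng.uniform(size=1000)
h = lambda z, zp: (z - zp) ** 2

# exact identity: n^{-2} sum_{i,j} (Z_i - Z_j)^2 = 2 * (biased) sample variance,
# which converges a.s. to 2 * Var(Z_1) = 1/6 for uniform samples
print(v_statistic(h, Z), 2 * np.var(Z))
```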
Note that the following result does not require any assumptions beyond the separability of the metric spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\) and the ergodicity of the samples generating the empirical measure \(\theta _n\). Thus, Proposition 2.6 in [12] and Theorem 4.4 in [9], both of which require iid samples, are consequences of our result.
Theorem 1
Let X and Y be random variables with values in separable metric spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\), respectively, and \(Z := (X,Y)\). Write \(\theta := {\mathcal {L}}(Z)\), \(\mu := {\mathcal {L}}(X)\) and \(\nu := {\mathcal {L}}(Y)\), and denote by \(\theta _n\) the empirical measure of \(Z_1, \ldots , Z_n\), where \((Z_k)_{k \in {\mathbb {N}}}\) is a strictly stationary and ergodic sequence with \({\mathcal {L}}(Z_1) = \theta \).
If X and Y have finite first moments, i.e. \({\mathbb {E}}d(X,x_0), {\mathbb {E}}d(Y,y_0) < \infty \) for some fixed (but arbitrary) \(z_0 = (x_0, y_0) \in {\mathcal {X}} \times {\mathcal {Y}}\), then

$$ \mathrm {dcov}(\theta _n) \xrightarrow [n \rightarrow \infty ]{a.s.} \mathrm {dcov}(\theta ). $$
Proof
We follow the idea of the proof of Proposition 2.6 in [12]. Consider the symmetric kernel \({\bar{h}}\), defined as the symmetrisation of h, where
and
As shown in the proof of Proposition 2.6 in [12], we have
Let \(z_0 = (x_0, y_0)\) be an arbitrary but fixed point in \({\mathcal {X}} \times {\mathcal {Y}}\). Since \(a+b \le ab\) for all real \(a, b \ge 2\), we have
for all \(x, x' \in {\mathcal {X}}\). Now, for \(z = (x,y) \in {\mathcal {X}} \times {\mathcal {Y}}\), let \(\varphi _i(z)\) be defined as \(2 \vee d(x,x_0)\) if \(i = 2,3\) and as \(2 \vee d(y,y_0)\) if \(i = 1,6\), and write \(\varphi \) for the maximum over all these \(\varphi _i\). Using (4), this gives us
The functions \(\varphi _i\) are continuous and measurable, since the underlying metric spaces are separable. They are also integrable because X and Y are assumed to have finite first moments. Using Lemma 2 therefore gives us \(V_{{\bar{h}}}(Z_1, \ldots , Z_n) \rightarrow \int {\bar{h}} ~\mathrm {d}\theta ^6\) almost surely, where \(V_{{\bar{h}}}(Z_1, \ldots , Z_n)\) denotes the V-statistics with kernel \({\bar{h}}\). Since the V-statistics with kernel \({\bar{h}}\) are equal to \(\mathrm {dcov}(\theta _n)\), and \(\int {\bar{h}} ~\mathrm {d}\theta ^6 = \mathrm {dcov}(\theta )\) (cf. [12]), this is what we wanted to show.\(\square \)
Theorem 2
Let \({\mathcal {Z}}\) be a \(\sigma \)-compact metrisable topological space, \((Z_k)_{k \in {\mathbb {N}}}\) a strictly stationary sequence of \({\mathcal {Z}}\)-valued random variables with marginal distribution \({\mathcal {L}}(Z_1) = \theta \). Consider a continuous, symmetric, degenerate and positive semidefinite kernel \(h : {\mathcal {Z}}^2 \rightarrow {\mathbb {R}}\) with finite \((2+\varepsilon )\)-moments with respect to \(\theta ^2\) for some \(\varepsilon > 0\) and finite \((1+\frac{\varepsilon }{2})\)-moments on the diagonal, i.e. \({\mathbb {E}}|h(Z_1,Z_1)|^{1+\varepsilon /2} < \infty \). Furthermore, let the sequence \((Z_k)_{k \in {\mathbb {N}}}\) satisfy an \(\alpha \)-mixing condition such that \(\alpha (n) = O(n^{-r})\) for some \(r > 1+2\varepsilon ^{-1}\). Then, with \(V = V_h(Z_1, \ldots , Z_n)\) denoting the V-statistics with kernel h,

$$ nV \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} \zeta := \sum _{k=1}^\infty \lambda _k \zeta _k^2, $$

where \((\lambda _k, \varphi _k)\) are pairs of the nonnegative eigenvalues and matching eigenfunctions of the integral operator

$$ (T_h f)(z) := \int h(z, z') f(z') ~\mathrm {d}\theta (z'), \qquad f \in L^2(\theta ), $$

and \((\zeta _k)_{k \in {\mathbb {N}}}\) is a sequence of centred Gaussian random variables whose covariance structure is given by

$$ \mathrm {Cov}(\zeta _k, \zeta _l) = \mathrm {Cov}(\varphi _k(Z_1), \varphi _l(Z_1)) + \sum _{d=1}^\infty \bigl [\mathrm {Cov}(\varphi _k(Z_1), \varphi _l(Z_{1+d})) + \mathrm {Cov}(\varphi _l(Z_1), \varphi _k(Z_{1+d}))\bigr ]. \qquad (5) $$
Proof
We note that the conditions of Theorem 2 in [16] are satisfied by Propositions 1–3 and Assumption 1 ibid., the latter of which is a consequence of \({\mathbb {E}}|h(Z_1,Z_1)|^{1+\varepsilon /2} < \infty \). Hence, we get

$$ h(z,z') = \sum _{k=1}^\infty \lambda _k \varphi _k(z)\varphi _k(z') $$

for all \(z,z' \in \mathrm {supp}(\theta )\). The \(\varphi _k\) are centred and form an orthonormal basis of \(L^2(\theta )\). Adopting the notation \(V^{(K)}\) for the V-statistics for the truncated kernel \(\sum _{k=1}^K \lambda _k \varphi _k(z)\varphi _k(z')\), we note that \(nV^{(K)} = \sum _{k=1}^K \lambda _k \zeta _{n,k}^2\), where \(\zeta _{n,k} := n^{-1/2} \sum _{t=1}^n \varphi _k(Z_t)\). Using the Cramér-Wold theorem, we will now show that, for any \(K \in {\mathbb {N}}\), \((\zeta _{n,k})_{1 \le k \le K}\) weakly converges to \((\zeta _k)_{1 \le k \le K}\), where the \(\zeta _k\) are centred Gaussian variables with their covariances given in (5).
Let \(c_1, \ldots , c_K\) be real constants and set \(\xi _t := \sum _{k=1}^K c_k\varphi _k(Z_t)\). Then the \(\xi _t\) are centred random variables with \({\mathbb {E}}\xi _t^2 = \sum _{k=1}^K c_k^2\).
Note that, by definition, \(\varphi _k(z) = \lambda _k^{-1}{\mathbb {E}}[h(z,Z_1)\varphi _k(Z_1)]\) and thus
Here, we have used the Cauchy–Schwarz inequality and the fact that the eigenfunctions \(\varphi _k\) form an orthonormal basis of \(L^2(\theta )\). This gives us
by Jensen’s inequality, which implies \(\Vert \varphi _k\Vert _{2+\varepsilon } \le \lambda _k^{-1}\Vert h\Vert _{2+\varepsilon }\). Since our kernel h has finite \((2+\varepsilon )\)-moments by assumption, this property translates to the eigenfunctions \(\varphi _k\). Using Theorem 3.7 and Remark 1.8 in [4] therefore gives us
for all \(1 \le k,l \le K\), where C is a positive constant depending on the corresponding eigenfunctions and -values. From this and the fact that \(\alpha (n) = O(n^{-r})\) with \(r > 1+2\varepsilon ^{-1}\) it follows that, for any k, l, the infinite series \(\sum _{d=1}^\infty \mathrm {Cov}(\varphi _k(Z_1), \varphi _l(Z_{1+d}))\) and \(\lim _n n^{-1}\sum _{d=1}^{n-1} d\mathrm {Cov}(\varphi _k(Z_1), \varphi _l(Z_{1+d}))\) converge, since \(d/n < 1\) for all \(1 \le d < n\). Thus, with \(S_n\) denoting the sum over \(\xi _1, \ldots , \xi _n\), we have that
where we have made use of the stationarity of the process \((Z_k)_{k \in {\mathbb {N}}}\) and the fact that the eigenfunctions \(\varphi _k\) form an orthonormal basis of \(L^2\). If \(\zeta _1, \ldots , \zeta _K\) are Gaussian random variables with their covariance function given by (5), the limit \(\sigma ^2\) is the variance of the linear combination \(\sum _{k=1}^K c_k\zeta _k\).
We now show the uniform integrability of the sequence \((S_n^2\sigma _n^{-2})_{n\in {\mathbb {N}}}\). It suffices to show that \({\mathbb {E}}|S_n\sigma _n^{-1}|^{2+\delta }\) is uniformly bounded in n for some \(\delta > 0\). Since h has finite \((2+\varepsilon )\)-moments, we get
Here, we have made use of (6) and the stationarity of the sequence \((Z_n)\), which ensures that the upper bound \(M(\varepsilon )\) is indeed uniform in n. Since \(\alpha (n) = O(n^{-r})\) with \(r > 1 + 2\varepsilon ^{-1} \) and \(\sigma _n\) has rate of growth \(\Theta (\sqrt{n})\), Theorem 2.1 in [15] gives us \({\mathbb {E}}|S_n\sigma _n^{-1}|^{2+\delta } = O(1)\) for some \(\delta > 0\). This implies uniform integrability of \((S_n^2\sigma _n^{-2})_{n \in {\mathbb {N}}}\).
Using Theorem 10.2 from [4] therefore gives us
and so, by the Cramér-Wold theorem, the vectors \((\zeta _{n,k})_{1 \le k\le K}\) converge to Gaussian vectors \((\zeta _k)_{1 \le k \le K}\) with the covariance structure described in (5) for any \(K \in {\mathbb {N}}\).
Now, applying the continuous mapping theorem gives us
and the summability of the eigenvalues \(\lambda _k\), which is due to the identity \(\sum _{k=1}^\infty \lambda _k = {\mathbb {E}}h(Z_1, Z_1) < \infty \), implies that
We will now show that
We consider the Hilbert space H of all real-valued sequences \((a_k)_{k \in {\mathbb {N}}}\) for which the series \(\sum _k \lambda _k a_k^2\) converges, equipped with the inner product given by \(\langle (a_k), (b_k)\rangle _H := \sum _k \lambda _k a_k b_k\). Then, writing \(T_K(Z_t)\) for the H-valued random variable \((0^K, (\varphi _k(Z_t))_{k > K})\), where \(0^K\) denotes the K-dimensional zero vector, we get
Here, we define the covariance of two H-valued random variables X and Y as the real number \(\mathrm {Cov}(X,Y) := {\mathbb {E}}\langle X,Y\rangle _H - \langle {\mathbb {E}}X, {\mathbb {E}}Y\rangle _H\). We aim to employ a covariance inequality for Hilbert-space valued random variables.
For this, let us first consider the \((2+\varepsilon )\)-moments of \(T_K(Z_1)\). For any \(p >0\), we get
Since h has finite \((1+\frac{\varepsilon }{2})\)-moments on the diagonal by assumption, this implies the \((2+\varepsilon )\)-integrability of \(T_K(Z_1)\).
Lemma 2.2 in [7] and the stationarity of the process \((Z_t)_{t \in {\mathbb {N}}}\) therefore give us
and we have shown before that \(n^{-1}\sum _{s,t=1}^n \alpha (|s-t|)^{\varepsilon /(2+\varepsilon )}\) converges to a finite limit c. Furthermore, from \(\Vert T_K(Z_1)\Vert _2^2 = \sum _{k>K} \lambda _k \xrightarrow [K \rightarrow \infty ]{} 0\) and \(\Vert T_K(Z_1)\Vert _{2+\varepsilon } \le \Vert T_1(Z_1)\Vert _{2+\varepsilon }\) (i.e. the sequence \((T_K(Z_1))_{K \in {\mathbb {N}}}\) is uniformly \((2+\varepsilon )\)-integrable) it follows by Vitali’s Theorem that \(T_K(Z_1) \xrightarrow [K \rightarrow \infty ]{(2+\varepsilon )} 0\). Putting all of the above together, we get
By Theorem 3.2 in [3], conditions (7), (8) and (9), the last of which we have just shown, imply \(nV \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} \zeta \).\(\square \)
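To see the structure of the limit in a case where everything is explicit, consider the kernel \(h(z,z') = (z - {\mathbb {E}}Z_1)(z' - {\mathbb {E}}Z_1)\) for real-valued iid samples (iid sequences are trivially \(\alpha \)-mixing). This kernel is degenerate and positive semidefinite with the single eigenpair \(\lambda _1 = \mathrm {Var}(Z_1)\), \(\varphi _1(z) = (z - {\mathbb {E}}Z_1)/\sqrt{\mathrm {Var}(Z_1)}\), so \(nV = \bigl (\sqrt{n}\,({\bar{Z}} - {\mathbb {E}}Z_1)\bigr )^2\) and the limit is \(\lambda _1 \zeta _1^2\) with \(\zeta _1\) standard Gaussian. A simulation sketch (our own illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = 0.0  # we take E[Z_1] = 0 and Var(Z_1) = 1, so lambda_1 = 1
h = lambda z, zp: (z - mu) * (zp - mu)  # degenerate, psd, a single eigenpair

def n_times_v(Z):
    # n * V_h(Z_1,...,Z_n); for this kernel it collapses to n * mean(Z - mu)^2
    return len(Z) * h(Z[:, None], Z[None, :]).mean()

# the law of nV approaches lambda_1 * chi^2_1, whose expectation is lambda_1 = 1
samples = np.array([n_times_v(rng.standard_normal(500)) for _ in range(2000)])
print(samples.mean())  # close to 1
```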
Lemma 3
If \((X_k)_{k \in {\mathbb {N}}}\) is a strictly stationary sequence of random variables whose marginal distribution \(\mu \) has finite q-moments, then there exists an upper bound \(M \in {\mathbb {R}}\) such that, for any collection of indices \(i_1, \ldots , i_4\),
for any \(p < q\), where f is the function from the proof of Theorem 1.
Proof
First, consider any two indices \(i_1, i_2\). Then, due to (17), we have
where \(x_0\) is some arbitrary point in \({\mathcal {X}}\).
Now, let \(i_1, \ldots , i_4\) be fixed but arbitrary indices. Then, with a similar bound to the one used in Lemma 5,
We use Lemma 1 from [18] for the function \(h(x_1, \ldots , x_4) := d(x_1, x_2)^p d(x_3, x_4)^p\) and the reordered collection \((i_2, i_3, i_1, i_4)\). Their assumptions are satisfied with \(\delta := \frac{q}{p} - 1\), because
due to (10). Thus, Lemma 1 in [18] gives us
where \(\beta (n)\) is the \(\beta \)-mixing coefficient of the sequence \((Z_k)_{k \in {\mathbb {N}}}\). Because \(\beta (n) \le 1\) for all \(n \in {\mathbb {N}}\), (10), (11) and (12) give us
\(\square \)
The following lemma is an adaptation of Lemma 2 in [18] in the sense that our result is implicitly contained in their proof. Another variant of this lemma (for U-statistics) can be found in [2]. Since both of these lemmas are slightly different from our version, we include a proof for the sake of completeness. However, it should be noted that all three proofs apply the same technique.
Lemma 4
Let h be a symmetric and degenerate kernel of order \(c \ge 2\). Here, we understand degeneracy as \({\mathbb {E}}h(z_1, \ldots , z_{c-1}, Z_c) = 0\) almost surely. If, for some \(p > 2\), the p-th moments of \(h(Z_{i_1}, \ldots , Z_{i_c})\) are uniformly bounded and \((Z_n)_{n \in {\mathbb {N}}}\) is strictly stationary and absolutely regular with mixing coefficients \(\beta (n) = O(n^{-r})\), where \(r > cp/(p-2)\), then \({\mathbb {E}}[V^2] = O(n^{-c})\), where \(V = V_h(Z_1, \ldots , Z_n)\) is the V-statistic with kernel h.
Proof
We will follow the basic idea of the proof of Lemma 2 in [18]. First, consider the special case of \(c = 2\). We have
Now due to the degeneracy of our kernel h, we can employ Lemma 1 in [18] to obtain
whenever \((i_1, i_2) \ne (i_3, i_4)\). Here, M is some constant uniform in \(i_1, \ldots , i_4\) and n.
Let us first assume that \(k := |i_2 - i_1| \ge |i_4 - i_3| =: l\). For any fixed value of k, we have at most \(2(n-k)\) possible values for \(i_1\). Furthermore, since \(k \ge l \ge 0\), we have \(k+1\) possible values for l and, for any fixed l, at most \(2(n-l)\) possible values for \(i_3\). Writing
this gives us
The sum converges due to our assumptions on \(\beta (n)\). The same bound can be established for the cases where \(|i_4 - i_3| \ge |i_2 - i_1|\). The only combinations missing are those where \((i_1, i_2) = (i_3, i_4)\), of which there are \(n^2\). We can combine these results to get
which proves the lemma in the case \(c = 2\).
The proof for arbitrary c follows the same idea. We then obtain an upper bound of
which again is \(O(n^c)\) due to our bounds on \(\beta (n)\).\(\square \)
Theorem 3
Let X and Y be random variables with values in separable metric spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\), respectively, and \(Z := (X,Y)\). Write \(\theta := {\mathcal {L}}(Z)\), \(\mu := {\mathcal {L}}(X)\) and \(\nu := {\mathcal {L}}(Y)\), and denote by \(\theta _n\) the empirical measure of \(Z_1, \ldots , Z_n\), where \((Z_k)_{k \in {\mathbb {N}}}\) is a strictly stationary and ergodic sequence with \({\mathcal {L}}(Z_1) = \theta \).
Suppose that \({\mathcal {X}}\) and \({\mathcal {Y}}\) are of negative type via mappings \(\phi \) and \(\psi \), respectively, and that \({\mathcal {X}} \times {\mathcal {Y}}\) is \(\sigma \)-compact. If X and Y are independent, have finite \((1+\varepsilon )\)-moments for some \(\varepsilon > 0\), and the sequence \((Z_k)_{k \in {\mathbb {N}}}\) is absolutely regular with mixing coefficients \(\beta (n) = O(n^{-r})\) for some \(r > 6(1 + 2\varepsilon ^{-1})\), then

$$ n\,\mathrm {dcov}(\theta _n) \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} \sum _{k=1}^\infty \lambda _k \zeta _k^2, $$
where the \(\zeta _k\) are centred Gaussian random variables whose covariance function given in (5) is determined by the dependence structure of the sequence \((Z_k)_{k \in {\mathbb {N}}}\), and the parameters \(\lambda _k > 0\) are determined by the underlying distribution \(\theta \).
Proof
Consider the identity \(\mathrm {dcov}(\theta _n) = V_{{\bar{h}}}(Z_1, \ldots , Z_n) =: V\) as given in Theorem 1. We will employ Hoeffding decomposition, i.e.
where
for \(0 \le c \le 6\). It can be readily seen that under the assumption of independence of X and Y, \({\bar{h}}_1 = 0\) almost surely, and so the Hoeffding decomposition reduces to
We will show that the kernel \({\bar{h}}_2\) satisfies the conditions of Theorem 2 and that, under our assumptions,
Some algebra shows that \({\bar{h}}_2 = \delta _\theta /15\). We proceed in the following way:
It can be easily checked that under independence of X and Y, \({\bar{h}}\) is a degenerate kernel, since integrating over all but one argument of f (with respect to either of the marginal distributions of \(\theta \)) yields a function which is 0 almost surely. Therefore,
where \({\mathfrak {S}}_6\) is the symmetric group of all permutations operating on \(\{1, \ldots , 6\}\). Notice that the summands are equal to \(\delta _\theta (z_{\sigma (1)}, z_{\sigma (2)})\) if \(\sigma (1), \sigma (2) \in \{1,2\}\). This follows directly from the definitions of \(d_\mu \) and \(d_\nu \). Moreover, 1 and 2 are the only indices appearing in both \(f(X_1, \ldots , X_4)\) and \(f(Y_1, Y_2, Y_5, Y_6)\), so any permutation \(\sigma \) with \(\sigma (1), \sigma (2) \notin \{1,2\}\) results in taking the integral of f over all or all but one argument, either with respect to \(\mu \) or with respect to \(\nu \). But we have seen before that these integrals are 0 almost surely, and so, due to the independence of X and Y, the same is true for the integral of h with respect to \(\theta \).
There are \(2\cdot 4!\) permutations of this kind, and so
We can therefore consider the object \(\delta _\theta \) instead of \({\bar{h}}_2\).
By identity (1) we have, for any real constants \(c_1, \ldots , c_m\) and \(z_1, \ldots , z_m \in {\mathcal {X}} \times {\mathcal {Y}}\),

$$ \sum _{i,j=1}^m c_i c_j\, \delta _\theta (z_i, z_j) = 4\,\Bigl \Vert \sum _{i=1}^m c_i\, ({\hat{\phi }} \otimes {\hat{\psi }})(z_i)\Bigr \Vert ^2 \ge 0, $$

so our kernel is positive semidefinite. It is furthermore continuous. By Lemma 5, \(\delta _\theta \) has finite \((2+\varepsilon )\)-moments with respect to \(\theta ^2\) and finite \((1+\frac{\varepsilon }{2})\)-moments on the diagonal. Since \(2\alpha (n) \le \beta (n)\) (cf. [4]), we have

$$ n V_{\delta _\theta }(Z_1, \ldots , Z_n) \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} \sum _{k=1}^\infty \lambda _k \zeta _k^2 \qquad (15) $$

by Theorem 2.
We will now prove (14). For this, we will first note that under our assumptions, the kernel \({\bar{h}}\) has finite \((2+\varepsilon )\)-moments with respect to \(\theta ^6\). This can be seen with a similar approach as in the proof of Lemma 5. Furthermore, Lemma 3 together with the independence of X and Y gives us the existence of an upper bound \(M \in {\mathbb {R}}\) such that
for any collection of indices \(1 \le i_1, \ldots , i_6 \le n\).
Employing Lemma 4 therefore gives us
for all \(c \ge 2\). Now, together with (13), we have
This implies (14), which together with (15) proves the Theorem.\(\square \)
Using these two results, we can generalise Corollary 2.8 from [12].
Corollary 1
Under the assumptions of Theorem 3, we have

$$ \frac{n\,\mathrm {dcov}(\theta _n)}{D(\mu _n)D(\nu _n)} \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} Q := \frac{\sum _{k=1}^\infty \lambda _k \zeta _k^2}{\sum _{k=1}^\infty \lambda _k} $$

with \({\mathbb {E}}Q = 1\). If \(\mathrm {dcov}(\theta ) > 0\), i.e. \(\theta \) is not the product measure of its marginal distributions \(\mu \) and \(\nu \), the left-hand side converges to \(\infty \) almost surely.
Proof
We have the identity \(D(\mu _n) = n^{-2} \sum _{k,l=1}^n d(X_k, X_l)\), and thus, by Lemma 2, \(D(\mu _n) \xrightarrow {a.s.} D(\mu )\). The same holds for \(D(\nu _n)\), and thus the convergence in distribution follows with Slutsky's theorem. Since \(D(\mu )D(\nu ) = {\mathbb {E}}\delta _\theta (Z_1, Z_1) = \sum _{k=1}^\infty \lambda _k\), the expected value of the limiting distribution is equal to 1.
If \(\mathrm {dcov}(\theta ) > 0\), the almost sure convergence follows by Theorem 1.\(\square \)
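In practice, Corollary 1 suggests rejecting independence for large values of the normalised statistic \(n\,\mathrm {dcov}(\theta _n)/(D(\mu _n)D(\nu _n))\): under the null its limit has expectation 1, while under dependence it diverges. A sketch of the statistic (our own illustration; a real test would still calibrate a critical value, e.g. from the limit distribution or by resampling):

```python
import numpy as np

def dist_matrix(A):
    # Euclidean distance matrix of the sample rows
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def normalised_dcov_statistic(X, Y):
    """n * dcov(theta_n) / (D(mu_n) * D(nu_n)), cf. Corollary 1."""
    n = len(X)
    DX, DY = dist_matrix(X), dist_matrix(Y)
    center = lambda D: D - D.mean(axis=0) - D.mean(axis=1, keepdims=True) + D.mean()
    return n * (center(DX) * center(DY)).mean() / (DX.mean() * DY.mean())

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 1))
Y_dep = X + 0.1 * rng.standard_normal((300, 1))
Y_ind = rng.standard_normal((300, 1))
print(normalised_dcov_statistic(X, Y_dep))  # grows linearly in n under dependence
print(normalised_dcov_statistic(X, Y_ind))  # stays of order 1 under independence
```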
Remark 1
It would be desirable to achieve a result similar to Theorem 3 under the assumption of just \(\alpha \)-mixing. For example, Theorem 3.2 in [5] gives such a result under the supposition that X and Y are real-valued random vectors.
For our more general setting of (pseudo-)metric spaces, one only needs to show that (14) still holds in the case of \(\alpha \)-mixing, since Theorem 2 does not require absolute regularity. We consider it likely that this can indeed be derived from the favourable structural properties of the distance covariance.
3 Generalisation to Pseudometric Spaces
Let \(({\mathcal {X}}, d)\) be a metric space and consider \(d^\beta \) for \(\beta \in (0,2]\). Then \(d^\beta \) is a pseudometric, i.e. the triangle inequality does not necessarily hold for \(d^\beta \). We will develop parts of the theory of [12] for pseudometric spaces of this particular kind, which we will refer to as \(\beta \)-pseudometric spaces. This is of interest if one considers \(\mathrm {dcov}_\beta \), a generalisation of the usual distance covariance, which results from using the \(\beta \)-th power of the metrics on \({\mathcal {X}}\) and \({\mathcal {Y}}\) for the definition of \(d_\mu \) and \(d_\nu \). That is, \(\mathrm {dcov}_\beta \) with respect to \(({\mathcal {X}}, d)\) and \(({\mathcal {Y}}, d)\) is equivalent to the regular distance covariance with respect to the \(\beta \)-pseudometric spaces \(({\mathcal {X}}, d^\beta )\) and \(({\mathcal {Y}}, d^\beta )\). Obviously, for any constant \(\beta > 0\), \(d^\beta \) induces the same topology (and thus, the same Borel \(\sigma \)-algebra) as the original metric d. This means that any \(\beta \)-pseudometric space is a metrisable topological space.
This approach of viewing \(\mathrm {dcov}_\beta \) not as a different object on the same space, but as the same object on a different space might not be very intuitive at first. However, since the concept of (strong) negative type does not require a metric space, this characterisation allows us to still use the relation between (strong) negative type of the underlying space and the distance covariance. This leads to the question of whether \(({\mathcal {X}}, d^\beta )\) is of (strong) negative type, given the original metric space \(({\mathcal {X}}, d)\), for which some criteria are known—see for example Corollary 3 or, more generally, [11] and [14].
Note that if \(\beta \in (0, 1]\), \(d^\beta \) is indeed still a metric, and we can rely on the already developed theory for separable metric spaces. Thus, we get the following result.
Corollary 2
Let \(\beta \in (0,1]\). Theorems 1 and 3 still hold for \(\mathrm {dcov}_\beta \) if we replace the finite first moment condition of Theorem 1 and the finite \((1+\varepsilon )\)-moment condition of Theorem 3 by finite \(\beta \)- and \((1+\varepsilon )\beta \)-moment assumptions, respectively.
Proof
Theorem 1 follows immediately. For Theorem 3, we note that \(d^\beta \) induces the same Borel \(\sigma \)-algebra as d. Furthermore, by Remark 3.19 in [12], the resulting metric spaces are still of negative type.\(\square \)
For \(\beta \in (1,2)\), we cannot rely on the triangle inequality, but the Jensen inequality gives us a result which we will call the weak triangle inequality. Specifically, for any \(\beta \in [1,2]\),
\(d^\beta (x,x') \le 2^{\beta -1}\left( d^\beta (x,x_0) + d^\beta (x_0,x')\right) \)
for all \(x, x', x_0 \in {\mathcal {X}}\). This can be further bounded by replacing the factor \(2^{\beta -1}\) by 2.
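The weak triangle inequality follows from the ordinary triangle inequality together with the convexity bound \((a+b)^\beta \le 2^{\beta -1}(a^\beta + b^\beta )\). A quick numerical spot check (in Euclidean space, purely as an illustration, not a proof):

```python
import numpy as np

# Spot check of the weak triangle inequality
#   d^beta(x, x') <= 2^(beta-1) * (d^beta(x, x0) + d^beta(x0, x'))
# for the Euclidean metric on R^3 and several beta in [1, 2].
rng = np.random.default_rng(2)
for beta in (1.0, 1.5, 2.0):
    for _ in range(1000):
        x, xp, x0 = rng.standard_normal((3, 3))
        lhs = np.linalg.norm(x - xp) ** beta
        rhs = 2 ** (beta - 1) * (np.linalg.norm(x - x0) ** beta
                                 + np.linalg.norm(x0 - xp) ** beta)
        assert lhs <= rhs + 1e-12
print("weak triangle inequality held in all sampled cases")
```

The factor \(2^{\beta -1}\) is sharp: on the real line with \(x_0\) the midpoint of x and \(x'\), both sides coincide.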
Like in the metric case, we say that a probability measure \(\mu \) has finite first moment if there exists an element \(x_0 \in {\mathcal {X}}\) such that \(\int d(x,x_0) ~\mathrm {d}\mu (x) < \infty \). Again, the choice of \(x_0\) is arbitrary due to the weak triangle inequality. Thus, we can define the objects \(a_\mu \), \(D(\mu )\) and \(d_\mu \) as in the metric case.
Lemma 5
If \(\mu \) has finite \(\beta p\)-moment, then \(d_\mu ^{(\beta )}\) has finite 2p-moment with respect to \(\mu ^2\) and finite p-moment on the diagonal for any \(p \ge 1\).
Proof
We take inspiration from the proof of Proposition 2.6 in [12]. Define the functions
and
We have
and, using the weak triangle inequality, \(|f_+| \le 4d^\beta (x_2,x_3)\). Similarly, we have
Again, \(|f_-| \le 4d^\beta (x_2, x_3)\), and thus \(|f(x_1, \ldots , x_4)| \le 4d^\beta (x_2, x_3)\). In the same way, one shows that the absolute value of \(f(x_1, \ldots , x_4)\) can also be bounded by \(4d^\beta (x_1, x_4)\). Therefore \(|h(x_1, \ldots , x_6)| \le 16 d^\beta (x_2, x_3)d^\beta (x_1, x_4)\), and so
Furthermore, we have
i.e. \(d_\mu ^{(\beta )}\) has finite p-moment on the diagonal.\(\square \)
We can now define \(\delta _\theta \) and \(\mathrm {dcov}(\theta )\) analogously to the metric case. Since the relevant proofs do not make use of the triangle inequality, it follows from [12] that for pseudometric spaces of strong negative type \(\theta = \mu \otimes \nu \) if and only if \(\mathrm {dcov}(\theta ) = 0\). This, together with the next lemma, gives a very easy proof of Theorem 4.2 in [6].
Lemma 6
If \((H, \Vert .\Vert )\) is a separable Hilbert space, then \((H, \Vert .\Vert ^\beta )\) is of negative type for all \(\beta \in (0,2]\), and of strong negative type for all \(\beta \in (0,2)\).
Proof
Without loss of generality, assume H to be equal to \(L^2[0,1]\). By Theorem 5 in [14], for any \(\beta \in (0,2]\), there exists an embedding \(\Phi : H \rightarrow L^2[0,1]\) with \(\Vert x-x'\Vert _2^{\beta /2} = \Vert \Phi (x) - \Phi (x')\Vert _2\) for all \(x, x' \in H\), which implies that \((H, \Vert .\Vert ^\beta )\) is of negative type. By Remark 3.19 in [12] (which, along with all its auxiliary results, also holds for pseudometric spaces), the space \((H, \Vert .\Vert ^\beta )\) therefore has strong negative type for all \(\beta \in (0,2)\).\(\square \)
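Negative type of \((H, \Vert .\Vert ^\beta )\) can be probed numerically: a pseudometric d is of negative type precisely when \(\sum _{i,j} c_i c_j \, d(x_i, x_j) \le 0\) for all finite point sets and all weights with \(\sum _i c_i = 0\). The following is a random spot check of this quadratic-form condition for \(\Vert .\Vert ^\beta \) on \({\mathbb {R}}^4\) (an illustration only, of course not a proof):

```python
import numpy as np

# Spot check: for d(x, x') = ||x - x'||^beta with beta in (0, 2],
# the quadratic form sum_{i,j} c_i c_j d(x_i, x_j) should be <= 0
# whenever sum(c) = 0 (the defining condition of negative type).
rng = np.random.default_rng(3)
for beta in (0.5, 1.0, 1.5, 2.0):
    pts = rng.standard_normal((8, 4))            # 8 points in R^4
    diff = pts[:, None, :] - pts[None, :, :]
    D = np.linalg.norm(diff, axis=-1) ** beta    # matrix of d^beta values
    for _ in range(200):
        c = rng.standard_normal(8)
        c -= c.mean()                            # enforce sum(c) = 0
        assert c @ D @ c <= 1e-9
print("quadratic form was nonpositive in all sampled cases")
```

For \(\beta > 2\) this fails already for simple configurations on the real line, which is why the range \((0,2]\) in Lemma 6 cannot be extended.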
We can use this lemma to adapt Corollary 5.9 from [11].
Corollary 3
Let \(({\mathcal {X}}, d)\) be a metric space. If there exists an isometric embedding from \({\mathcal {X}}\) into a separable Hilbert space H, then \(({\mathcal {X}}, d^\beta )\) is of negative type for all \(\beta \in (0,2]\) and of strong negative type for all \(\beta \in (0,2)\).
Proof
Fix \(\beta \in (0,2]\), and let \(\varphi : {\mathcal {X}} \rightarrow L^2[0,1]\) be an isometric embedding. By Lemma 6, \((H, \Vert .\Vert _H^\beta )\) is of negative type via some embedding \(\Phi \), which implies that \(({\mathcal {X}}, d^\beta )\) is of negative type via \((\Phi \circ \varphi )\). If \(\beta < 2\), then \((H, \Vert .\Vert _H^\beta )\) is of strong negative type, and so, for any two probability measures \(\mu _1, \mu _2\) on \({\mathcal {X}}\), we have that
\(D(\mu _1 - \mu _2) = \iint d^\beta (x,x') \,\mathrm {d}(\mu _1 - \mu _2)(x) \,\mathrm {d}(\mu _1 - \mu _2)(x') = \iint _{\varphi ({\mathcal {X}})^2} \Vert h - h'\Vert _H^\beta \,\mathrm {d}(\mu _1^\varphi - \mu _2^\varphi )(h) \,\mathrm {d}(\mu _1^\varphi - \mu _2^\varphi )(h'),\)
where \(\mu _i^\varphi \) denotes the pushforward of \(\mu _i\) via \(\varphi \). We can extend the last integral to the entire space H, because the pushforward measures vanish on \(\varphi ({\mathcal {X}})^C\). Using the strong negative type of \((H, \Vert .\Vert _H^\beta )\), this gives us \(\mu _1^\varphi = \mu _2^\varphi \), which implies \(\mu _1 = \mu _2\), since \(\varphi \) is injective.\(\square \)
Corollary 4
Let \(\beta \in (1,2)\). Then, if we replace the finite first moment condition of Theorem 1 by a finite \(\beta \)-moment assumption, Theorem 1 still holds for \(\mathrm {dcov}_\beta \). If we furthermore assume \({\mathcal {X}}\) and \({\mathcal {Y}}\) to be isometrically embeddable into separable Hilbert spaces, and replace the finite \((1+\varepsilon )\)-condition with a finite \((1+\varepsilon )\beta \)-moment assumption, then Theorem 3 still holds for \(\mathrm {dcov}_\beta \).
Proof
We first consider Theorem 1. We can replace (4) by
as we have done in the proof of Lemma 5. This changes the original bound only by a constant factor, which does not affect the remainder of the proof.
If \({\mathcal {X}}\) and \({\mathcal {Y}}\) are isometrically embeddable into separable Hilbert spaces, then by Corollary 3 the spaces resulting from raising their metrics to the power \(\beta \) are of negative type. By Lemma 5, the proof of Theorem 3 still holds for \(\beta \)-pseudometric spaces. We can therefore apply Theorem 3 to the spaces \(({\mathcal {X}}, d^\beta )\) and \(({\mathcal {Y}}, d^\beta )\).\(\square \)
4 Further Work
The limiting distribution established in Theorem 3 depends both on the distribution \(\theta \) (through the eigenvalues \(\lambda _k\)) and on the dependence structure of the process \((Z_k)_{k \in {\mathbb {N}}}\) (through the Gaussian process \((\zeta _k)_{k \in {\mathbb {N}}}\)). Thus, one cannot directly use this result to construct a test of independence, since the critical values of this test would in general be unknown.
Such a dependence of the limiting distribution on unknown parameters is not unusual—indeed, in the iid case, there are many well-established ways to approximate the asymptotic distribution of a random variable, even if it may depend on unknown parameters. The authors of [17], for instance, propose a permutation test to approximate the asymptotic distribution of the distance covariance for real-valued iid data.
In the case of dependent data, such as we have examined in this paper, one cannot employ methods that would alter the dependence structure of the original sequence \((Z_k)_{k \in {\mathbb {N}}}\), since this in turn would result in a different Gaussian process \((\zeta _k)_{k \in {\mathbb {N}}}\) and thus a different limiting distribution. A feasible approach might be a type of block bootstrap (cf. [10], sections 2.5–2.7), where the resampling occurs from a collection of blocks, each consisting of a certain number of consecutive observations, thus leaving the dependence structure of the original process unchanged. We are currently working on proving the consistency of such a block bootstrap for the distance covariance.
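The resampling step of such a moving block bootstrap can be sketched as follows. This is only an illustration of the blocking idea described above (block length and the AR(1) example are our own choices), not the consistency result under development.

```python
import numpy as np

def moving_block_bootstrap(z, block_len, rng):
    """One moving-block-bootstrap resample of the sequence z (cf. [10]):
    blocks of consecutive observations are drawn with replacement and
    concatenated, so the dependence structure within each block is kept."""
    n = len(z)
    n_blocks = int(np.ceil(n / block_len))
    # each block starts at a uniformly chosen admissible index
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    blocks = [z[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

rng = np.random.default_rng(4)
# an AR(1) sequence as a simple stationary, weakly dependent example
z = np.zeros(500)
for t in range(1, 500):
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()
resample = moving_block_bootstrap(z, block_len=25, rng=rng)
print(resample.shape)  # (500,)
```

In contrast to an iid or permutation resample, consecutive observations within each block of the output retain the serial dependence of the original process.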
References
Aaronson, J., Burton, R., Dehling, H., Gilat, D., Hill, T., Weiss, B.: Strong laws for L- and U-statistics. Trans. Am. Math. Soc. 348(7), 2845–2866 (1996)
Arcones, M.A.: The law of large numbers for U-statistics under absolute regularity. Electron. Commun. Probab. 3, 13–19 (1998)
Billingsley, P.: Convergence of Probability Measures, 2nd edn. Wiley, New York (1999)
Bradley, R.C.: Introduction to Strong Mixing Conditions, vol. 1. Kendrick Press, Heber City (2007)
Davis, R.A., Matsui, M., Mikosch, T., Wan, P.: Applications of distance correlation to time series. Bernoulli 24(4A), 3087–3116 (2018)
Dehling, H., Matsui, M., Mikosch, T., Samorodnitsky, G., Tafakori, L.: Distance covariance for discretized stochastic processes. Bernoulli 26(4), 2758–2789 (2020)
Dehling, H., Philipp, W.: Almost sure invariance principles for weakly dependent vector-valued random variables. Ann. Probab. 10(3), 689–701 (1982)
Jakobsen, M.E.: Distance covariance in metric spaces: non-parametric independence testing in metric spaces. Master’s thesis, University of Copenhagen, 2017. arXiv:1706.03490
Janson, S.: On distance covariance in metric and Hilbert spaces. arXiv:1910.13358
Lahiri, S.N.: Resampling Methods for Dependent Data, 1st edn. Springer, New York (2003)
Li, H., Weston, A.: Strict p-negative type of a metric space. Positivity 14(3), 529–545 (2010)
Lyons, R.: Distance covariance in metric spaces. Ann. Probab. 41(5), 3284–3305 (2013)
Lyons, R.: Errata to distance covariance in metric spaces. Ann. Probab. 46(4), 2400–2405 (2018)
Schoenberg, I.J.: Metric spaces and positive definite functions. Trans. Am. Math. Soc. 44, 522–536 (1938)
Sotres, D.A., Ghosh, M.: Strong convergence of linear rank statistics for mixing processes. Indian J. Stat. 39, 1–11 (1977)
Sun, H.: Mercer theorem for RKHS on noncompact sets. J. Complex. 21, 337–349 (2005)
Székely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. Ann. Stat. 35(6), 2769–2794 (2007)
Yoshihara, K.: Limiting behavior of U-statistics for stationary, absolutely regular processes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 35, 237–252 (1976)
Acknowledgements
The author was supported by the German Research Council (DFG) via Research Training Group RTG 2131 (High dimensional phenomena in probability—fluctuations and discontinuity). The author would like to express his gratitude to the editor and the referee for their insightful comments which proved most helpful in improving this article.
Funding
Open Access funding enabled and organized by Projekt DEAL. The author was supported by the German Research Council (DFG) via Research Training Group RTG 2131 (High dimensional phenomena in probability—fluctuations and discontinuity).
Contributions
No additional authors were involved in the creation of this article.
Ethics declarations
Conflict of interest
The author has no conflicts of interest to declare that are relevant to the content of this article.
Cite this article
Kroll, M. Asymptotic Behaviour of the Empirical Distance Covariance for Dependent Data. J Theor Probab 35, 1226–1246 (2022). https://doi.org/10.1007/s10959-021-01073-w