1 Introduction

In [12], Lyons introduced the concept of distance covariance for separable metric spaces, generalising the work done by Székely et al. [17]. In this very general case, the distance covariance of a measure \(\theta \) (on the product space \({\mathcal {X}} \times {\mathcal {Y}}\) of separable metric spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\)) with marginal distributions \(\mu \) on \({\mathcal {X}}\) and \(\nu \) on \({\mathcal {Y}}\) is defined as

$$\begin{aligned} \mathrm {dcov}(\theta ) := \int \delta _\theta (z,z') ~\mathrm {d}\theta ^2(z,z') \end{aligned}$$

for \(z = (x,y), z' = (x', y')\), where

$$\begin{aligned}&\delta _\theta (z,z') := d_\mu (x,x')d_\nu (y,y'), \\&\quad d_\mu (x,x') := d_{\mathcal {X}}(x,x') - a_\mu (x) - a_\mu (x') + D(\mu ), \\&\quad a_\mu (x) := \int d_{\mathcal {X}}(x,x') ~\mathrm {d}\mu (x'), \\&\quad D(\mu ) := \int d_{\mathcal {X}}(x,x') ~\mathrm {d}\mu ^2(x,x'). \end{aligned}$$

To examine the properties of this object, Lyons made use of the concept of (strong) negative type. A metric space \({\mathcal {X}}\) is said to be of negative type if there exists a mapping \(\phi : {\mathcal {X}} \rightarrow H\) into a Hilbert space H such that \(d_{\mathcal {X}}(x,x') = \Vert \phi (x) - \phi (x')\Vert _H^2\) for all \(x,x' \in {\mathcal {X}}\). It is of strong negative type if it is of negative type and if, for all probability measures \(\mu _1, \mu _2\) with finite first moments, \(D(\mu _1 - \mu _2) = 0\) holds if and only if \(\mu _1 = \mu _2\). Lyons showed that the distance covariance is nonnegative if \({\mathcal {X}}\) and \({\mathcal {Y}}\) are of negative type, and that the equivalence \(\mathrm {dcov}(\theta ) = 0 \Leftrightarrow \theta = \mu \otimes \nu \) holds if \({\mathcal {X}}\) and \({\mathcal {Y}}\) are of strong negative type.

This means that the distance covariance completely characterises independence of random variables in metric spaces of strong negative type. Estimators for the distance covariance and their asymptotic behaviour are therefore of great interest for tests of independence.
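
To fix ideas, \(\mathrm {dcov}(\theta _n)\) admits a simple closed form in terms of doubly centred pairwise distance matrices, since \(a_{\mu _n}\) and \(D(\mu _n)\) are just the row means and the grand mean of the distance matrix. The following Python sketch (purely illustrative; the function names and the default choice of the Euclidean metric are ours and not part of the cited works) computes this plug-in estimator:

```python
import numpy as np

def double_centre(D):
    # d_mu(x_i, x_j) = d(x_i, x_j) - a_mu(x_i) - a_mu(x_j) + D(mu),
    # with mu replaced by the empirical measure of the sample.
    return D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()

def dcov_empirical(X, Y, dist_x=None, dist_y=None):
    """Plug-in estimator dcov(theta_n) = (1/n^2) * sum_{i,j} A_ij * B_ij."""
    n = len(X)

    def euclidean(Z):
        # default metric: pairwise Euclidean distances of the sample points
        Z = np.asarray(Z, dtype=float).reshape(n, -1)
        return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

    A = double_centre((dist_x or euclidean)(X))
    B = double_centre((dist_y or euclidean)(Y))
    return float((A * B).sum()) / n**2
```

Any other metric (or, as in Sect. 3, the \(\beta \)-th power of a metric) can be supplied through the \(dist\_x\) and \(dist\_y\) arguments; Theorem 1 below gives conditions under which this estimator converges almost surely.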

A special case for real-valued random variables follows from choosing the embedding

$$\begin{aligned}&\phi : {\mathbb {R}}^d \rightarrow L^2(w_d) := \left\{ f : {\mathbb {R}}^d \rightarrow {\mathbb {C}} ~\Big |~ \int |f|^2 w_d ~\mathrm {d}\lambda ^d < \infty \right\} \\&\quad x \mapsto \frac{1}{\sqrt{2}} (1 - \exp (i\langle .,x\rangle )) \end{aligned}$$

with \(w_d(s) = \Gamma ((d+1)/2)\pi ^{-(d+1)/2}\Vert s\Vert _2^{-(d+1)}\), which Lyons in [12] refers to as the Fourier embedding. This results in the square of the distance covariance as introduced in [17], i.e.

$$\begin{aligned} \mathrm {dcov}(\theta ) = \int |\varphi _{X,Y}(s,t) - \varphi _X(s)\varphi _Y(t)|^2 w_p(s)w_q(t) ~\mathrm {d}(s,t), \end{aligned}$$

where \(\varphi _Z\) denotes the characteristic function of a random variable Z, and the vector \((X,Y) \in {\mathbb {R}}^{p+q}\) has distribution \(\theta \).
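
That this embedding indeed realises the Euclidean distance as a squared Hilbert norm rests on the integral identity \(\int (1 - \cos \langle s, u\rangle ) w_d(s) ~\mathrm {d}s = \Vert u\Vert _2\) (cf. [17]); for the reader's convenience, the short computation is

$$\begin{aligned} \Vert \phi (x) - \phi (x')\Vert _{L^2(w_d)}^2 = \frac{1}{2}\int \left| \exp (i\langle s,x'\rangle ) - \exp (i\langle s,x\rangle )\right| ^2 w_d(s) ~\mathrm {d}s = \int \left( 1 - \cos \langle s, x - x'\rangle \right) w_d(s) ~\mathrm {d}s = \Vert x - x'\Vert _2, \end{aligned}$$

so that \(({\mathbb {R}}^d, \Vert .\Vert _2)\) is of negative type via \(\phi \).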

Two of the main results of [12] are Proposition 2.6 and Theorem 2.7, which describe the asymptotic behaviour of \(\mathrm {dcov}(\theta _n)\), where \(\theta _n\) is the empirical measure from n iid-samples of \(\theta \). Theorem 2.7, under sufficient moment assumptions, describes the asymptotic distribution of the sequence \(n\mathrm {dcov}(\theta _n)\), if \(\theta = \mu \otimes \nu \). Proposition 2.6 gives the almost sure convergence \(\mathrm {dcov}(\theta _n) \xrightarrow []{a.s.} \mathrm {dcov}(\theta )\) for any measure \(\theta \) with finite first moments. However, as noted by Jakobsen in [8], Lyons’ proof of Proposition 2.6 was incorrect and actually required \(\theta \) to have finite 5/3-moments. Lyons later acknowledged this in [13] (iii), showing that Proposition 2.6 as written in [12] is still correct in the case of spaces of negative type, but leaving the question of whether finite first moments are sufficient in the general case of separable metric spaces unanswered. This problem was solved in [9], where the almost sure convergence is shown in the case of iid samples.

In Sect. 2, we show that one can obtain the almost sure convergence of the estimator \(\mathrm {dcov}(\theta _n)\) under a finite first moment assumption while dropping the iid assumption on the samples which constitute the empirical measure \(\theta _n\). In Theorem 1, we show the almost sure convergence of \(\mathrm {dcov}(\theta _n)\) under the assumptions of ergodicity and finite first moments. In Theorem 3, we give an asymptotic result similar to Theorem 2.7 in [12], assuming absolute regularity. For this we make use of Theorem 2, which is a general result concerning the asymptotic distribution of degenerate V-statistics under the assumption of \(\alpha \)-mixing data. The definitions of \(\alpha \)-mixing and absolute regularity are recalled at the end of this section.

A further generalisation can be achieved by raising the metrics of the underlying metric spaces to the \(\beta \)-th power. We will denote this with \(\mathrm {dcov}_\beta \). Typically, \(\beta \) is chosen between 0 and 2, where the choice \(\beta = 1\) results in the regular distance covariance. An equivalent way of describing this is to use the regular definitions of distance covariance, but to consider pseudometric spaces of a particular kind instead of metric spaces, namely those which result from raising some metric to the \(\beta \)-th power. (Here, by a pseudometric, we refer to a metric for which the triangle inequality need not hold.) In Sect. 3, we generalise the results for metric spaces deduced in Sect. 2 to pseudometric spaces of this kind.

We now summarise some of the notation used in [12], as well as some basic properties of the distance covariance that will prove useful for our purposes.

Let X and Y be random variables with values in separable metric spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\), respectively. We define \(Z := (X,Y)\) and write \(\theta := {\mathcal {L}}(Z)\), \(\mu := {\mathcal {L}}(X)\) and \(\nu := {\mathcal {L}}(Y)\), and denote by \(\theta _n\) the empirical measure of \(Z_1, \ldots , Z_n\), where \((Z_k)_{k \in {\mathbb {N}}}\) is a strictly stationary and ergodic sequence with \({\mathcal {L}}(Z_1) = \theta \).

If we consider \({\mathcal {X}}\) to be of negative type via an embedding \(\phi \), we denote the Bochner integral \(\int \phi ~\mathrm {d}\mu \) with \(\beta _\phi (\mu )\), and we write \({\hat{\phi }}\) for the centred embedding \(\phi - \beta _\phi (\mu )\). If \({\mathcal {Y}}\) is of negative type via \(\psi \), we define \(\beta _\psi (\nu )\) and \({\hat{\psi }}\) analogously. If both \({\mathcal {X}}\) and \({\mathcal {Y}}\) are of negative type via embeddings \(\phi : {\mathcal {X}} \rightarrow H_1\) and \(\psi : {\mathcal {Y}} \rightarrow H_2\), we can consider the embedding

$$\begin{aligned}&\phi \otimes \psi : {\mathcal {X}} \times {\mathcal {Y}} \rightarrow H_1 \otimes H_2 \\&\quad (x,y) \mapsto \phi (x) \otimes \psi (y), \end{aligned}$$

where \(H_1 \otimes H_2\) is the tensor product of the Hilbert spaces \(H_1\) and \(H_2\), equipped with the inner product \(\langle u_1 \otimes v_1, u_2 \otimes v_2\rangle _{H_1 \otimes H_2} := \langle u_1, u_2\rangle _{H_1}\langle v_1, v_2\rangle _{H_2}\).

By Proposition 3.5 in [12], we have that

$$\begin{aligned} \delta _\theta (z,z') = 4\langle ({\hat{\phi }} \otimes {\hat{\psi }})(z),({\hat{\phi }} \otimes {\hat{\psi }})(z')\rangle _{H_1 \otimes H_2} \end{aligned}$$
(1)

for all \(z, z' \in {\mathcal {X}} \times {\mathcal {Y}}\), whenever \({\mathcal {X}}\) and \({\mathcal {Y}}\) are of negative type via embeddings \(\phi \) and \(\psi \), respectively.

For the remainder of this paper, we will drop the indices of the metrics on \({\mathcal {X}}\) and \({\mathcal {Y}}\) and of the inner products on \(H_1\), \(H_2\) or \(H_1 \otimes H_2\), as it is clear from their arguments which metric or inner product we consider. More precisely, d will denote both a metric on \({\mathcal {X}}\) and a (possibly different) metric on \({\mathcal {Y}}\), and \(\langle ., .\rangle \) can denote one of three (possibly different) inner products on Hilbert spaces \(H_1\), \(H_2\) or \(H_1 \otimes H_2\).

Recall that for two \(\sigma \)-algebras \({\mathcal {A}}\) and \({\mathcal {B}}\) we define the \(\alpha \)- and \(\beta \)-coefficients of \({\mathcal {A}}\) and \({\mathcal {B}}\) as

$$\begin{aligned} \alpha ({\mathcal {A}}, {\mathcal {B}}) := \sup _{A \in {\mathcal {A}}, B \in {\mathcal {B}}} \left| {\mathbb {P}}(A \cap B) - {\mathbb {P}}(A){\mathbb {P}}(B)\right| \end{aligned}$$

and

$$\begin{aligned} \beta ({\mathcal {A}}, {\mathcal {B}}) := \sup \frac{1}{2} \sum _{i=1}^I\sum _{j=1}^J |{\mathbb {P}}(A_i \cap B_j) - {\mathbb {P}}(A_i){\mathbb {P}}(B_j)|, \end{aligned}$$

respectively, where the second supremum is taken over all finite partitions \(A_1, \ldots , A_I\) and \(B_1, \ldots , B_J\) such that \(A_i \in {\mathcal {A}}\) and \(B_j \in {\mathcal {B}}\) for all i and j. For a process \((Z_k)_{k \in {\mathbb {N}}}\), we define

$$\begin{aligned} \alpha (n) := \sup _{l \in {\mathbb {N}}} \alpha (\sigma (Z_1, \ldots , Z_l), \sigma (Z_{l+n}, Z_{l+n+1}, \ldots )) \end{aligned}$$

and

$$\begin{aligned} \beta (n) := \sup _{l \in {\mathbb {N}}} \beta (\sigma (Z_1, \ldots , Z_l), \sigma (Z_{l+n}, Z_{l+n+1}, \ldots )), \end{aligned}$$

and we say that the process \((Z_k)_{k \in {\mathbb {N}}}\) is \(\alpha \)-mixing or \(\beta \)-mixing if \(\alpha (n) \xrightarrow [n \rightarrow \infty ]{} 0\) or \(\beta (n) \xrightarrow [n \rightarrow \infty ]{} 0\), respectively. \(\beta \)-mixing is also known as absolute regularity. These definitions are taken from [4], where many properties of \(\alpha \)-mixing and absolutely regular processes are established.
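
As a simple illustration (not needed in what follows), an iid sequence satisfies \(\alpha (n) = \beta (n) = 0\) for all \(n \ge 1\), and it is well known that, for instance, a strictly stationary Gaussian AR(1) process \(Z_k = \rho Z_{k-1} + \varepsilon _k\) with \(|\rho | < 1\) is absolutely regular with geometrically decaying coefficients, so that it satisfies every polynomial mixing rate assumed below.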

2 Results for Metric Spaces

We now present our results in the case of separable metric spaces. It should be kept in mind that while we consider the usual distance covariance, Theorems 1 and 3 also hold for \(\mathrm {dcov}_\beta \) (under appropriate moment conditions). However, we postpone the discussion of this until Sect. 3, so as to avoid confusion by abstraction.

The following lemma is a variant of Theorem 3.5 in [3], where it is formulated for random variables.

Lemma 1

Let \({\mathcal {X}}\) be a metrisable topological space, \((\mu _n)_{n \in {\mathbb {N}}}\) a sequence of measures on \({\mathcal {X}}\) with weak limit \(\mu \) and \(h : {\mathcal {X}} \rightarrow {\mathbb {R}}\) a \(\mu \)-a.s. continuous function which fulfills the following uniform integrability condition:

$$\begin{aligned} \lim _{M \rightarrow \infty } \limsup _{n \rightarrow \infty } \int _{\{|h| > M\}} |h| ~\mathrm {d}\mu _n = 0. \end{aligned}$$
(2)

Furthermore, we require h to be dominated by some \(\mu \)-integrable function g, i.e. \(|h| \le g\) \(\mu \)-a.s. Then \(\int h ~\mathrm {d}\mu _n \rightarrow \int h ~\mathrm {d}\mu \).

Proof

Without loss of generality, suppose that \({\mathcal {X}}\) is a metric space. We can decompose the integral with respect to \(\mu _n\) into a truncated part and a tail part:

$$\begin{aligned} \int h ~\mathrm {d}\mu _n&= \int _{\{|h| \le M\}} h ~\mathrm {d}\mu _n + \int _{\{|h| > M\}} h ~\mathrm {d}\mu _n. \end{aligned}$$

The truncated integral converges, because it is the integral of an almost surely continuous and bounded function and \(\mu _n \Rightarrow \mu \), while the uniform integrability condition (2) implies that the tail integral vanishes in the limit \(M, n \rightarrow \infty \). More precisely, we have the inequality

$$\begin{aligned} \begin{aligned} \limsup _{n \rightarrow \infty } \int h ~\mathrm {d}\mu _n&\le \lim _{M \rightarrow \infty } \limsup _{n \rightarrow \infty } \int _{\{|h| \le M\}} h ~\mathrm {d}\mu _n \\&\quad + \lim _{M \rightarrow \infty } \limsup _{n \rightarrow \infty } \int _{\{|h| > M\}} h ~\mathrm {d}\mu _n. \end{aligned} \end{aligned}$$
(3)

The second summand vanishes by assumption due to (2). For the first summand, note that for any fixed M, the limit superior in n of the integral equals \(\int _{\{|h| \le M\}} h ~\mathrm {d}\mu \), since \(h\mathbf{1 }_{\{|h| \le M\}}\) is bounded and almost surely continuous. Furthermore, since \(|h\mathbf{1 }_{\{|h| \le M\}}| \le |h| \le g\), we can employ the dominated convergence theorem to obtain

$$\begin{aligned} \lim _{M \rightarrow \infty } \limsup _{n \rightarrow \infty } \int _{\{|h| \le M\}} h ~\mathrm {d}\mu _n = \int h ~\mathrm {d}\mu . \end{aligned}$$

Therefore, the summands in (3) are indeed well-defined. This gives us

$$\begin{aligned} \limsup _{n \rightarrow \infty } \int h ~\mathrm {d}\mu _n \le \lim _{M \rightarrow \infty } \limsup _{n \rightarrow \infty } \int _{\{|h| \le M\}} h ~\mathrm {d}\mu _n + 0 = \int h ~\mathrm {d}\mu . \end{aligned}$$

Since \(0 \le \liminf _{n \rightarrow \infty } \int _{\{|h|> M\}} |h|~\mathrm {d}\mu _n \le \limsup _{n \rightarrow \infty } \int _{\{|h| > M\}} |h|~\mathrm {d}\mu _n\) for any M, we can use an almost identical argument to obtain

$$\begin{aligned} \liminf _{n \rightarrow \infty } \int h ~\mathrm {d}\mu _n \ge \lim _{M \rightarrow \infty } \liminf _{n \rightarrow \infty } \int _{\{|h| \le M\}} h ~\mathrm {d}\mu _n + 0 = \int h ~\mathrm {d}\mu , \end{aligned}$$

and thus \(\lim _{n\rightarrow \infty } \int h~\mathrm {d}\mu _n\) exists and is equal to \(\int h~\mathrm {d}\mu \).\(\square \)

In proving Theorem 1, we will make use of the following general result, which is a generalisation of Theorem U (ii) from [1].

Lemma 2

Let \((Z_k)_{k \in {\mathbb {N}}}\) be a strictly stationary and ergodic process with values in a separable metrisable topological space \({\mathcal {Z}}\) and marginal distribution \({\mathcal {L}}(Z_1) = \theta \). Let \(h : {\mathcal {Z}}^d \rightarrow {\mathbb {R}}\) be a measurable function, and let \(f : {\mathcal {Z}} \rightarrow {\mathbb {R}}\) be integrable with respect to \(\theta \), so that \(|h| \le f \otimes \cdots \otimes f\), where the product denoted by \(\otimes \) is taken d times and \((f \otimes \cdots \otimes f)(z_1, \ldots , z_d) := \prod _{k = 1}^d f(z_k)\). If h is \(\theta ^d\)-a.e. continuous, then \(V_h(Z_1, \ldots , Z_n) \rightarrow \int h ~\mathrm {d}\theta ^d\) a.s., where \(V_h(Z_1, \ldots , Z_n)\) denotes the V-statistics with kernel h.
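
In the simplest case \(d = 1\), the V-statistic reduces to the sample mean \(n^{-1}\sum _{k=1}^n h(Z_k)\), and the conclusion is just Birkhoff's pointwise ergodic theorem applied to h.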

Proof

Without loss of generality, suppose that \({\mathcal {Z}}\) is a metric space. Let \(\theta _n := n^{-1} \sum _{k=1}^n \delta _{Z_k}\) denote the empirical measure of \(Z_1, \ldots , Z_n\). We have the representation \(V_h(Z_1, \ldots , Z_n) = \int h ~\mathrm {d}\theta _n^d\). Furthermore, \(\theta _n \Rightarrow \theta \) a.s., since \({\mathcal {Z}}\) is separable, and therefore \(\theta _n^d \Rightarrow \theta ^d\) a.s. by Theorem 2.8 (ii) in [3].

We now wish to employ Lemma 1. Hence, we need to show that the sequence of integrals fulfills the following uniform integrability condition:

$$\begin{aligned} \lim _{M \rightarrow \infty } \limsup _{n \rightarrow \infty } \int _{\{|h| > M\}} |h| ~\mathrm {d}\theta _n^d = 0. \end{aligned}$$

We have

$$\begin{aligned} \int _{\{|h|> M\}} |h| ~\mathrm {d}\theta _n^d \le \int _{\{f \otimes \cdots \otimes f > M\}} f \otimes \cdots \otimes f ~\mathrm {d}\theta _n^d, \end{aligned}$$

and since \(\{f \otimes \cdots \otimes f > M\} \subseteq \bigcup _{i=1}^d M_i\) with \(M_i := \{z \in {\mathcal {Z}}^d ~|~f(z_i) > M^{1/d}\}\), the right-hand side is dominated by

$$\begin{aligned} \sum _{i=1}^d \int _{M_i} f \otimes \cdots \otimes f ~\mathrm {d}\theta _n^d = d\left( \int f ~\mathrm {d}\theta _n\right) ^{d-1}\int _{\{f > M^{1/d}\}} f ~\mathrm {d}\theta _n, \end{aligned}$$

which, due to Birkhoff’s pointwise ergodic theorem, almost surely converges to \(d\left( {\mathbb {E}}_\theta f\right) ^{d-1}{\mathbb {E}}_\theta [\mathbf{1 }_{\{f > M^{1/d}\}} f]\), where \(\mathbf{1 }_A\) denotes the indicator function of a set A. Thus, almost surely,

$$\begin{aligned} \lim _{M \rightarrow \infty } \limsup _{n \rightarrow \infty } \int _{\{|h|> M\}} |h| ~\mathrm {d}\theta _n^d \le \lim _{M \rightarrow \infty } d\left( {\mathbb {E}}_\theta f\right) ^{d-1}{\mathbb {E}}_\theta [\mathbf{1 }_{\{f > M^{1/d}\}} f] = 0 \end{aligned}$$

since f is assumed to be integrable.

Lemma 1 therefore gives us

$$\begin{aligned} V_h(Z_1, \ldots , Z_n) = \int h ~\mathrm {d}\theta _n^d \xrightarrow [n \rightarrow \infty ]{a.s.} \int h~\mathrm {d}\theta ^d. \end{aligned}$$

\(\square \)

Note that the following result does not require any assumptions beyond the separability of the metric spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\) and the ergodicity of the samples generating the empirical measure \(\theta _n\). Thus, Proposition 2.6 in [12] and Theorem 4.4 in [9], both of which require iid samples, are consequences of our result.

Theorem 1

Let X and Y be random variables with values in separable metric spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\), respectively, and \(Z := (X,Y)\). Write \(\theta := {\mathcal {L}}(Z)\), \(\mu := {\mathcal {L}}(X)\) and \(\nu := {\mathcal {L}}(Y)\), and denote by \(\theta _n\) the empirical measure of \(Z_1, \ldots , Z_n\), where \((Z_k)_{k \in {\mathbb {N}}}\) is a strictly stationary and ergodic sequence with \({\mathcal {L}}(Z_1) = \theta \).

If X and Y have finite first moments, i.e. \({\mathbb {E}}d(X,x_0), {\mathbb {E}}d(Y,y_0) < \infty \) for some fixed (but arbitrary) \(z_0 = (x_0, y_0) \in {\mathcal {X}} \times {\mathcal {Y}}\), then

$$\begin{aligned} \mathrm {dcov}(\theta _n) \xrightarrow [n \rightarrow \infty ]{a.s.} \mathrm {dcov}(\theta ). \end{aligned}$$

Proof

We follow the idea of the proof of Proposition 2.6 in [12]. Consider the symmetric kernel \({\bar{h}}\), defined as the symmetrisation of h, where

$$\begin{aligned} h(z_1, \ldots , z_6) := f(x_1, \ldots , x_4)f(y_1, y_2, y_5, y_6) \end{aligned}$$

and

$$\begin{aligned} f(x_1, \ldots , x_4) := d(x_1, x_2) - d(x_1, x_3) - d(x_2, x_4) + d(x_3, x_4). \end{aligned}$$

As shown in the proof of Proposition 2.6 in [12], we have

$$\begin{aligned} |h(z_1, \ldots , z_6)| \le 4 d(x_2, x_3) d(y_1, y_6). \end{aligned}$$
(4)
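
For the reader's convenience, (4) follows from two applications of the (reverse) triangle inequality in each coordinate:

$$\begin{aligned} |f(x_1, \ldots , x_4)| \le |d(x_1, x_2) - d(x_1, x_3)| + |d(x_3, x_4) - d(x_2, x_4)| \le 2d(x_2, x_3), \end{aligned}$$

and analogously \(|f(y_1, y_2, y_5, y_6)| \le 2d(y_1, y_6)\), so that \(|h(z_1, \ldots , z_6)| = |f(x_1, \ldots , x_4)|\,|f(y_1, y_2, y_5, y_6)| \le 4 d(x_2, x_3) d(y_1, y_6)\).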

Let \(z_0 = (x_0, y_0)\) be an arbitrary but fixed point in \({\mathcal {X}} \times {\mathcal {Y}}\). Since \(a+b \le ab\) for all real \(a, b \ge 2\), we have

$$\begin{aligned} d(x,x') \le d(x,x_0) + d(x',x_0) \le (2 \vee d(x,x_0))(2 \vee d(x',x_0)) \end{aligned}$$

for all \(x, x' \in {\mathcal {X}}\). Now, for \(z = (x,y) \in {\mathcal {X}} \times {\mathcal {Y}}\), let \(\varphi _i(z)\) be defined as \(2 \vee d(x,x_0)\) if \(i = 2,3\) and as \(2 \vee d(y,y_0)\) if \(i = 1,6\), and write \(\varphi \) for the maximum over all these \(\varphi _i\). Using (4), this gives us

$$\begin{aligned} |h(z_1, \ldots , z_6)| \le 4 \varphi (z_1)\varphi (z_2)\varphi (z_3)\varphi (z_6). \end{aligned}$$

The functions \(\varphi _i\) are continuous and measurable, since the underlying metric spaces are separable. They are also integrable because X and Y are assumed to have finite first moments. Using Lemma 2 therefore gives us \(V_{{\bar{h}}}(Z_1, \ldots , Z_n) \rightarrow \int {\bar{h}} ~\mathrm {d}\theta ^6\) almost surely, where \(V_{{\bar{h}}}(Z_1, \ldots , Z_n)\) denotes the V-statistics with kernel \({\bar{h}}\). Since the V-statistics with kernel \({\bar{h}}\) are equal to \(\mathrm {dcov}(\theta _n)\), and \(\int {\bar{h}} ~\mathrm {d}\theta ^6 = \mathrm {dcov}(\theta )\) (cf. [12]), this is what we wanted to show.\(\square \)

Theorem 2

Let \({\mathcal {Z}}\) be a \(\sigma \)-compact metrisable topological space, \((Z_k)_{k \in {\mathbb {N}}}\) a strictly stationary sequence of \({\mathcal {Z}}\)-valued random variables with marginal distribution \({\mathcal {L}}(Z_1) = \theta \). Consider a continuous, symmetric, degenerate and positive semidefinite kernel \(h : {\mathcal {Z}}^2 \rightarrow {\mathbb {R}}\) with finite \((2+\varepsilon )\)-moments with respect to \(\theta ^2\) and finite \((1+\frac{\varepsilon }{2})\)-moments on the diagonal, i.e. \({\mathbb {E}}|h(Z_1,Z_1)|^{1+\varepsilon /2} < \infty \). Furthermore, let the sequence \((Z_k)_{k \in {\mathbb {N}}}\) satisfy an \(\alpha \)-mixing condition such that \(\alpha (n) = O(n^{-r})\) for some \(r > 1+2\varepsilon ^{-1}\). Then, with \(V = V_h(Z_1, \ldots , Z_n)\) denoting the V-statistics with kernel h,

$$\begin{aligned} nV \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} \sum _{k=1}^\infty \lambda _k \zeta _k^2, \end{aligned}$$

where \((\lambda _k, \varphi _k)\) are pairs of the nonnegative eigenvalues and matching eigenfunctions of the integral operator

$$\begin{aligned} f \mapsto \int h(.,z)f(z)~\mathrm {d}\theta (z) \end{aligned}$$

and \((\zeta _k)_{k \in {\mathbb {N}}}\) is a sequence of centred Gaussian random variables whose covariance structure is given by

$$\begin{aligned} \mathrm {Cov}(\zeta _i, \zeta _j) = \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{t,u=1}^n \mathrm {Cov}(\varphi _i(Z_t), \varphi _j(Z_u)). \end{aligned}$$
(5)
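
In the special case of iid observations, the covariances in (5) reduce to \(\mathrm {Cov}(\zeta _i, \zeta _j) = \mathrm {Cov}(\varphi _i(Z_1), \varphi _j(Z_1)) = \delta _{ij}\), since the eigenfunctions are centred and orthonormal in \(L^2(\theta )\) (see the proof below); the \(\zeta _k\) are then iid standard Gaussian and the limit is the classical weighted sum of independent \(\chi ^2(1)\)-distributed random variables.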

Proof

We note that the conditions of Theorem 2 in [16] are satisfied by Propositions 1–3 and Assumption 1 ibid., the latter of which is a consequence of \({\mathbb {E}}|h(Z_1,Z_1)|^{1+\varepsilon /2} < \infty \). Hence, we get

$$\begin{aligned} h(z,z') = \sum _{k=1}^\infty \lambda _k \varphi _k(z)\varphi _k(z') \end{aligned}$$

for all \(z,z' \in \mathrm {supp}(\theta )\). The \(\varphi _k\) are centred and form an orthonormal basis of \(L^2(\theta )\). Adopting the notation \(V^{(K)}\) for the V-statistics for the truncated kernel \(\sum _{k=1}^K \lambda _k \varphi _k(z)\varphi _k(z')\), we note that \(nV^{(K)} = \sum _{k=1}^K \lambda _k \zeta _{n,k}^2\), where \(\zeta _{n,k} := n^{-1/2} \sum _{t=1}^n \varphi _k(Z_t)\). Using the Cramér-Wold theorem, we will now show that, for any \(K \in {\mathbb {N}}\), \((\zeta _{n,k})_{1 \le k \le K}\) weakly converges to \((\zeta _k)_{1 \le k \le K}\), where the \(\zeta _k\) are centred Gaussian variables with their covariances given in (5).

Let \(c_1, \ldots , c_K\) be real constants and set \(\xi _t := \sum _{k=1}^K c_k\varphi _k(Z_t)\). Then the \(\xi _t\) are centred random variables with \({\mathbb {E}}\xi _t^2 = \sum _{k=1}^K c_k^2\).

Note that, by definition, \(\varphi _k(z) = \lambda _k^{-1}{\mathbb {E}}[h(z,Z_1)\varphi _k(Z_1)]\) and thus

$$\begin{aligned} |\varphi _k(z)| \le |\lambda _k|^{-1}\Vert h(z,.)\Vert _2. \end{aligned}$$
(6)

Here, we have used the Cauchy–Schwarz inequality and the fact that the eigenfunctions \(\varphi _k\) form an orthonormal basis of \(L^2(\theta )\). This gives us

$$\begin{aligned} \int |\varphi _k(z)|^{2+\varepsilon } ~\mathrm {d}\theta (z)&\le \lambda _k^{-(2+\varepsilon )} \int \Vert h(z,.)\Vert _2^{2+\varepsilon } ~\mathrm {d}\theta (z) \\&= \lambda _k^{-(2+\varepsilon )} \int \left( \int |h(z,z')|^2~\mathrm {d}\theta (z')\right) ^\frac{2+\varepsilon }{2}~\mathrm {d}\theta (z) \\&\le \lambda _k^{-(2+\varepsilon )} \int |h(z,z')|^{2+\varepsilon } ~\mathrm {d}\theta ^2(z,z') \end{aligned}$$

by Jensen’s inequality, which implies \(\Vert \varphi _k\Vert _{2+\varepsilon } \le \lambda _k^{-1}\Vert h\Vert _{2+\varepsilon }\). Since our kernel h has finite \((2+\varepsilon )\)-moments by assumption, this property translates to the eigenfunctions \(\varphi _k\). Using Theorem 3.7 and Remark 1.8 in [4] therefore gives us

$$\begin{aligned} |\mathrm {Cov}(\varphi _k(Z_t), \varphi _l(Z_u))| \le C \alpha (\sigma (Z_t), \sigma (Z_u))^{\varepsilon /(2+\varepsilon )} \le C \alpha (|t-u|)^{\varepsilon /(2+\varepsilon )} \end{aligned}$$

for all \(1 \le k,l \le K\), where C is a positive constant depending on the corresponding eigenfunctions and eigenvalues. From this and the fact that \(\alpha (n) = O(n^{-r})\) with \(r > 1+2\varepsilon ^{-1}\) it follows that, for any k, l, the infinite series \(\sum _{d=1}^\infty \mathrm {Cov}(\varphi _k(Z_1), \varphi _l(Z_{1+d}))\) converges absolutely and the limit \(\lim _n n^{-1}\sum _{d=1}^{n-1} d\,\mathrm {Cov}(\varphi _k(Z_1), \varphi _l(Z_{1+d}))\) exists, since \(d/n < 1\) for all \(1 \le d < n\). Thus, with \(S_n\) denoting the sum of \(\xi _1, \ldots , \xi _n\), we have that

$$\begin{aligned} n^{-1}\sigma _n^2 := n^{-1}{\mathbb {E}}S_n^2&= n^{-1} \sum _{t,u=1}^n \sum _{k,l=1}^K c_k c_l \mathrm {Cov}(\varphi _k(Z_t), \varphi _l(Z_u)) \\&= \sum _{k=1}^K c_k^2 + n^{-1}\sum _{t \ne u}^n \sum _{k,l=1}^K c_k c_l \mathrm {Cov}(\varphi _k(Z_t), \varphi _l(Z_u)) \\&= \sum _{k=1}^K c_k^2 + n^{-1} 2\sum _{d=1}^{n-1} (n-d) \sum _{k,l=1}^K c_k c_l \mathrm {Cov}(\varphi _k(Z_1), \varphi _l(Z_{1+d})) \\&\quad \xrightarrow [n \rightarrow \infty ]{} \sigma ^2 < \infty , \end{aligned}$$

where we have made use of the stationarity of the process \((Z_k)_{k \in {\mathbb {N}}}\) and the fact that the eigenfunctions \(\varphi _k\) form an orthonormal basis of \(L^2\). If \(\zeta _1, \ldots , \zeta _K\) are Gaussian random variables with their covariance function given by (5), the limit \(\sigma ^2\) is the variance of the linear combination \(\sum _{k=1}^K c_k\zeta _k\).

We now show the uniform integrability of the sequence \((S_n^2\sigma _n^{-2})_{n\in {\mathbb {N}}}\). It suffices to show that \({\mathbb {E}}|S_n\sigma _n^{-1}|^{2+\delta }\) is uniformly bounded in n for some \(\delta > 0\). Since h has finite \((2+\varepsilon )\)-moments, we get

$$\begin{aligned} \sup _{n \in {\mathbb {N}}}{\mathbb {E}}\left| \sum _{k=1}^K c_k\varphi _k(Z_n)\right| ^{2+\varepsilon } \le \sup _{n \in {\mathbb {N}}}\left\{ K^{1+\varepsilon } \sum _{k=1}^K {\mathbb {E}}\left[ |c_k\varphi _k(Z_n)|^{2+\varepsilon }\right] \right\}< M(\varepsilon ) < \infty . \end{aligned}$$

Here, we have made use of (6) and the stationarity of the sequence \((Z_n)\), which ensures that the upper bound \(M(\varepsilon )\) is indeed uniform in n. Since \(\alpha (n) = O(n^{-r})\) with \(r > 1 + 2\varepsilon ^{-1} \) and \(\sigma _n\) grows at rate \(\Theta (\sqrt{n})\), Theorem 2.1 in [15] gives us \({\mathbb {E}}|S_n\sigma _n^{-1}|^{2+\delta } = O(1)\) for some \(\delta > 0\). This implies uniform integrability of \((S_n^2\sigma _n^{-2})_{n \in {\mathbb {N}}}\).

Using Theorem 10.2 from [4] therefore gives us

$$\begin{aligned} \sum _{k=1}^K c_k \zeta _{n,k} = \frac{S_n}{\sqrt{n}} = \frac{S_n}{\sigma _n} \cdot \frac{\sigma _n}{\sqrt{n}} \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} {\mathcal {N}}(0, \sigma ^2) = {\mathcal {L}}\left( \sum _{k=1}^K c_k \zeta _k\right) , \end{aligned}$$

and so, by the Cramér-Wold theorem, the vectors \((\zeta _{n,k})_{1 \le k\le K}\) converge to Gaussian vectors \((\zeta _k)_{1 \le k \le K}\) with the covariance structure described in (5) for any \(K \in {\mathbb {N}}\).

Now, applying the continuous mapping theorem gives us

$$\begin{aligned} nV^{(K)} = \sum _{k=1}^K \lambda _k \zeta _{n,k}^{2} \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} \sum _{k=1}^K \lambda _k \zeta _k^2 =: \zeta ^{(K)} \end{aligned}$$
(7)

and the summability of the eigenvalues \(\lambda _k\), which is due to the identity \(\sum _{k=1}^\infty \lambda _k = {\mathbb {E}}h(Z_1, Z_1) < \infty \), implies that, with \(\zeta := \sum _{k=1}^\infty \lambda _k \zeta _k^2\),

$$\begin{aligned} {\mathbb {E}}\left| \zeta - \zeta ^{(K)}\right| = \sum _{k>K}\lambda _k \xrightarrow [K \rightarrow \infty ]{} 0. \end{aligned}$$
(8)

We will now show that

$$\begin{aligned} \lim _{K \rightarrow \infty }\limsup _{n \rightarrow \infty } {\mathbb {E}}|nV - nV^{(K)}| = 0. \end{aligned}$$
(9)

We consider the Hilbert space H of all real-valued sequences \((a_k)_{k \in {\mathbb {N}}}\) for which the series \(\sum _k \lambda _k a_k^2\) converges, equipped with the inner product given by \(\langle (a_k), (b_k)\rangle _H := \sum _k \lambda _k a_k b_k\). Then, writing \(T_K(Z_t)\) for the H-valued random variable \((0^K, (\varphi _k(Z_t))_{k > K})\), where \(0^K\) denotes the K-dimensional zero vector, we get

$$\begin{aligned} {\mathbb {E}}|nV - nV^{(K)}|&= {\mathbb {E}}\left[ \sum _{k>K} \lambda _k \left( \frac{1}{\sqrt{n}} \sum _{t=1}^n \varphi _k(Z_t)\right) ^2\right] \\&= {\mathbb {E}}\left\| \frac{1}{\sqrt{n}} \sum _{t=1}^n T_K(Z_t)\right\| _H^2 = \mathrm {Var}\left( \frac{1}{\sqrt{n}} \sum _{t=1}^n T_K(Z_t)\right) \\&= \frac{1}{n} \sum _{s,t=1}^n \mathrm {Cov}(T_K(Z_s), T_K(Z_t)). \end{aligned}$$

Here, we define the covariance of two H-valued random variables X and Y as the real number \(\mathrm {Cov}(X,Y) := {\mathbb {E}}\langle X,Y\rangle _H - \langle {\mathbb {E}}X, {\mathbb {E}}Y\rangle _H\). We aim to employ a covariance inequality for Hilbert-space valued random variables.

For this, let us first consider the \((2+\varepsilon )\)-moments of \(T_K(Z_1)\). For any \(p >0\), we get

$$\begin{aligned} \Vert T_K(Z_1)\Vert _p^p&= \int \Vert T_K(z)\Vert _H^p ~\mathrm {d}\theta (z) = \int \left( \sum _{k>K} \lambda _k \varphi _k(z)^2\right) ^{p/2}~\mathrm {d}\theta (z) \\&\le \int \left( \sum _{k=1}^\infty \lambda _k \varphi _k(z)^2\right) ^{p/2}~\mathrm {d}\theta (z) = \int h(z,z)^{p/2} ~\mathrm {d}\theta (z) \\&= \Vert h(Z_1, Z_1)\Vert _{p/2}^{p/2}. \end{aligned}$$

Since h has finite \((1+\frac{\varepsilon }{2})\)-moments on the diagonal by assumption, this implies the \((2+\varepsilon )\)-integrability of \(T_K(Z_1)\).

Lemma 2.2 in [7] and the stationarity of the process \((Z_t)_{t \in {\mathbb {N}}}\) therefore give us

$$\begin{aligned} |\mathrm {Cov}(T_K(Z_s), T_K(Z_t))| \le 15 \Vert T_K(Z_1)\Vert _{2+\varepsilon }^2 \alpha (|s-t|)^{\varepsilon /(2+\varepsilon )} \end{aligned}$$

and we have shown before that \(n^{-1}\sum _{s,t=1}^n \alpha (|s-t|)^{\varepsilon /(2+\varepsilon )}\) converges to a finite limit c. Furthermore, from \(\Vert T_K(Z_1)\Vert _2^2 = \sum _{k>K} \lambda _k \xrightarrow [K \rightarrow \infty ]{} 0\) and \(\Vert T_K(Z_1)\Vert _{2+\varepsilon } \le \Vert T_1(Z_1)\Vert _{2+\varepsilon }\) (i.e. the sequence \((T_K(Z_1))_{K \in {\mathbb {N}}}\) is uniformly \((2+\varepsilon )\)-integrable) it follows by Vitali’s Theorem that \(T_K(Z_1) \xrightarrow [K \rightarrow \infty ]{(2+\varepsilon )} 0\). Putting all of the above together, we get

$$\begin{aligned} \lim _{K \rightarrow \infty }\limsup _{n \rightarrow \infty }{\mathbb {E}}|nV - nV^{(K)}|&\le 15c\lim _{K \rightarrow \infty }\Vert T_K(Z_1)\Vert _{2+\varepsilon }^2 = 0. \end{aligned}$$

By Theorem 3.2 in [3], (7), (8) and (9), the latter of which we have just shown, imply \(nV \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} \zeta \).\(\square \)

Lemma 3

If \((X_k)_{k \in {\mathbb {N}}}\) is a strictly stationary sequence of random variables whose marginal distribution \(\mu \) has finite q-moments, then there exists an upper bound \(M \in {\mathbb {R}}\) such that, for any collection of indices \(i_1, \ldots , i_4\),

$$\begin{aligned} {\mathbb {E}}\left[ f(X_{i_1}, \ldots , X_{i_4})^{2p}\right] \le M(p) < \infty \end{aligned}$$

for any \(p < q\), where f is the function from the proof of Theorem 1.

Proof

First, consider any two indices \(i_1, i_2\). Then, due to (17), we have

$$\begin{aligned} \begin{aligned} {\mathbb {E}}[d(X_{i_1}, X_{i_2})^q]&\le 2^{q-1} {\mathbb {E}}[d(X_{i_1}, x_0)^q + d(x_0, X_{i_2})^q ]\\&= 2^q \int d(x,x_0)^q ~\mathrm {d}\mu (x) =: M_0 < \infty , \end{aligned} \end{aligned}$$
(10)

where \(x_0\) is some arbitrary point in \({\mathcal {X}}\).

Now, let \(i_1, \ldots , i_4\) be fixed but arbitrary indices. Then, with a similar bound to the one used in Lemma 5,

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}[f(X_{i_1}, \ldots , X_{i_4})^{2p}] \le 4^p {\mathbb {E}}[d(X_{i_2}, X_{i_3})^{p} d(X_{i_1}, X_{i_4})^{p}] \\&\quad \le 4^{p} \left| {\mathbb {E}}[d(X_{i_2}, X_{i_3})^{p} d(X_{i_1}, X_{i_4})^{p}] - {\mathbb {E}}[d(X_{i_2}, X_{i_3})^{p}]{\mathbb {E}}[d(X_{i_1}, X_{i_4})^{p}]\right| \\&\qquad + 4^{p} {\mathbb {E}}[d(X_{i_2}, X_{i_3})^p]{\mathbb {E}}[d(X_{i_1}, X_{i_4})^p]. \end{aligned} \end{aligned}$$
(11)

We use Lemma 1 from [18] for the function \(h(x_1, \ldots , x_4) := d(x_1, x_2)^p d(x_3, x_4)^p\) and the reordered collection \((i_2, i_3, i_1, i_4)\). Their assumptions are satisfied with \(\delta := \frac{q}{p} - 1\), because

$$\begin{aligned} \int h^{1+\delta } ~\mathrm {d}\left( {\mathcal {L}}(X_{i_2}, X_{i_3}) \otimes {\mathcal {L}}(X_{i_1}, X_{i_4})\right) = {\mathbb {E}}[d(X_{i_2}, X_{i_3})^q]{\mathbb {E}}[d(X_{i_1}, X_{i_4})^q] \le M_0^2 \end{aligned}$$

due to (10). Thus, Lemma 1 in [18] gives us

$$\begin{aligned} \begin{aligned}&|{\mathbb {E}}[d(X_{i_2}, X_{i_3})^{p} d(X_{i_1}, X_{i_4})^{p}] - {\mathbb {E}}[d(X_{i_2}, X_{i_3})^{p}]{\mathbb {E}}[d(X_{i_1}, X_{i_4})^{p}]| \\&\quad \le 4 M_0^{\frac{2}{1+\delta }} \beta (|i_1 - i_3|)^\frac{\delta }{1 + \delta }, \end{aligned} \end{aligned}$$
(12)

where \(\beta (n)\) is the \(\beta \)-mixing coefficient of the sequence \((Z_k)_{k \in {\mathbb {N}}}\). Because \(\beta (n) \le 1\) for all \(n \in {\mathbb {N}}\), (10), (11) and (12) give us

$$\begin{aligned} {\mathbb {E}}[f(X_{i_1}, \ldots , X_{i_4})^{2p}] \le 4^{p+1}M_0^\frac{2}{1+\delta } + 4^p M_0^2 =: M(p) < \infty . \end{aligned}$$

\(\square \)

The following lemma is an adaptation of Lemma 2 in [18] in the sense that our result is implicitly contained in their proof. Another variant of this lemma (for U-statistics) can be found in [2]. Since both of these lemmas are slightly different from our version, we include a proof for the sake of completeness. However, it should be noted that all three proofs apply the same technique.

Lemma 4

Let h be a symmetric and degenerate kernel of order \(c \ge 2\). Here, we understand degeneracy as \({\mathbb {E}}h(z_1, \ldots , z_{c-1}, Z_c) = 0\) almost surely. If, for some \(p > 2\), the p-th moments of \(h(Z_{i_1}, \ldots , Z_{i_c})\) are uniformly bounded and \((Z_n)_{n \in {\mathbb {N}}}\) is strictly stationary and absolutely regular with mixing coefficients \(\beta (n) = O(n^{-r})\), where \(r > cp/(p-2)\), then \({\mathbb {E}}[V^2] = O(n^{-c})\), where \(V = V_h(Z_1, \ldots , Z_n)\) is the V-statistic with kernel h.

Proof

We will follow the basic idea of the proof of Lemma 2 in [18]. First, consider the special case of \(c = 2\). We have

$$\begin{aligned} {\mathbb {E}}\left[ \left( \sum _{1 \le i_1, i_2 \le n} h(Z_{i_1}, Z_{i_2})\right) ^2\right] = \sum _{1 \le i_1, \ldots , i_4 \le n} {\mathbb {E}}[h(Z_{i_1}, Z_{i_2})h(Z_{i_3}, Z_{i_4})]. \end{aligned}$$

Now due to the degeneracy of our kernel h, we can employ Lemma 1 in [18] to obtain

$$\begin{aligned} {\mathbb {E}}[h(Z_{i_1}, Z_{i_2})h(Z_{i_3}, Z_{i_4})] \le M \cdot \beta \left( \max \{|i_2 - i_1|, |i_4 - i_3|\}\right) ^{(p-2)/p} \end{aligned}$$

whenever \((i_1, i_2) \ne (i_3, i_4)\). Here, M is some constant uniform in \(i_1, \ldots , i_4\) and n.

Let us first assume that \(k := |i_2 - i_1| \ge |i_4 - i_3| =: l\). For any fixed value of k, we have at most \(2(n-k)\) possible values for \(i_1\). Furthermore, since \(k \ge l \ge 0\), we have \(k+1\) possible values for l and, for any fixed l, at most \(2(n-l)\) possible values for \(i_3\). Writing

$$\begin{aligned} {\mathcal {I}} := \{(i_1, \ldots , i_4) ~|~ 1 \le i_1, \ldots , i_4 \le n, |i_2 - i_1| \ge |i_4 - i_3|, (i_1, i_2) \ne (i_3, i_4)\} \end{aligned}$$

this gives us

$$\begin{aligned} \sum _{i_1, \ldots , i_4 \in {\mathcal {I}}} {\mathbb {E}}[h(Z_{i_1},Z_{i_2})h(Z_{i_3}, Z_{i_4})]&\le \sum _{k=0}^{n-1} \sum _{i_1 = 1}^{n-k} \sum _{l=0}^{k} \sum _{i_3 = 1}^{n-l} M \beta (k)^{(p-2)/p} \\&\le 4M n^2 \sum _{k=0}^{n-1} (k+1)\beta (k)^{(p-2)/p} \\&= O(n^2). \end{aligned}$$

The sum converges due to our assumptions on \(\beta (n)\). The same bound can be established for the cases where \(|i_4 - i_3| \ge |i_2 - i_1|\). The only combinations missing are those where \((i_1, i_2) = (i_3, i_4)\), of which there are \(n^2\). We can combine these results to get

$$\begin{aligned} \sum _{1 \le i_1, \ldots , i_4 \le n} {\mathbb {E}}[h(Z_{i_1}, Z_{i_2})h(Z_{i_3}, Z_{i_4})] = O(n^2), \end{aligned}$$

which proves the lemma in the case \(c = 2\).

The proof for arbitrary c follows the same idea. We then obtain an upper bound of

$$\begin{aligned} 2^c M n^c \sum _{k=0}^{n-1} (k+1)^{c-1} \beta (k)^{(p-2)/p} \le 2^{2c-1} M n^c \sum _{k=0}^{n-1} (k^{c-1} + 1) \beta (k)^{(p-2)/p} \end{aligned}$$

which again is \(O(n^c)\) due to our bounds on \(\beta (n)\).\(\square \)

Theorem 3

Let X and Y be random variables with values in separable metric spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\), respectively, and \(Z := (X,Y)\). Write \(\theta := {\mathcal {L}}(Z)\), \(\mu := {\mathcal {L}}(X)\) and \(\nu := {\mathcal {L}}(Y)\), and denote by \(\theta _n\) the empirical measure of \(Z_1, \ldots , Z_n\), where \((Z_k)_{k \in {\mathbb {N}}}\) is a strictly stationary and ergodic sequence with \({\mathcal {L}}(Z_1) = \theta \).

Suppose that \({\mathcal {X}}\) and \({\mathcal {Y}}\) are of negative type via mappings \(\phi \) and \(\psi \), respectively, and that \({\mathcal {X}} \times {\mathcal {Y}}\) is \(\sigma \)-compact. If X and Y are independent, have finite \((1+\varepsilon )\)-moments for some \(\varepsilon > 0\), and the sequence \((Z_k)_{k \in {\mathbb {N}}}\) is absolutely regular with mixing coefficients \(\beta (n) = O(n^{-r})\) for some \(r > 6(1 + 2\varepsilon ^{-1})\), then

$$\begin{aligned} n\cdot \mathrm {dcov}(\theta _n) \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} \zeta := \sum _{k=1}^\infty \lambda _k \zeta _k^2, \end{aligned}$$

where the \(\zeta _k\) are centred Gaussian random variables whose covariance function given in (5) is determined by the dependence structure of the sequence \((Z_k)_{k \in {\mathbb {N}}}\), and the parameters \(\lambda _k > 0\) are determined by the underlying distribution \(\theta \).

Proof

Consider the identity \(\mathrm {dcov}(\theta _n) = V_{{\bar{h}}}(Z_1, \ldots , Z_n) =: V\) as given in Theorem 1. We will employ the Hoeffding decomposition, i.e.

$$\begin{aligned} V = \sum _{c=0}^6 {6 \atopwithdelims ()c} V_{{\bar{h}}_c}(Z_1, \ldots , Z_n), \end{aligned}$$

where

$$\begin{aligned} {\bar{h}}_c(z_1, \ldots , z_c) = \sum _{A \subseteq \{1, \ldots , c\}} (-1)^{c - \#A} \int {\bar{h}}\bigl ((z_i)_{i \in A}, z'_{\#A+1}, \ldots , z'_6\bigr ) ~\mathrm {d}\theta ^{6-\#A}(z'_{\#A+1}, \ldots , z'_6) \end{aligned}$$

for \(0 \le c \le 6\). It can be readily seen that under the assumption of independence of X and Y, \({\bar{h}}_0 = \mathrm {dcov}(\theta ) = 0\) and \({\bar{h}}_1 = 0\) almost surely, and so the Hoeffding decomposition reduces to

$$\begin{aligned} V = \sum _{c=2}^6 {6 \atopwithdelims ()c} V_{{\bar{h}}_c}(Z_1, \ldots , Z_n). \end{aligned}$$
(13)

We will show that the kernel \({\bar{h}}_2\) satisfies the conditions of Theorem 2 and that, under our assumptions,

$$\begin{aligned} nV - nV_{{\bar{h}}_2}(Z_1, \ldots , Z_n) \xrightarrow [n \rightarrow \infty ]{{\mathbb {P}}} 0. \end{aligned}$$
(14)

A short calculation, which we carry out next, shows that \({\bar{h}}_2 = \delta _\theta /15\).

It can be easily checked that under independence of X and Y, \({\bar{h}}\) is a degenerate kernel, since integrating over all but one argument of f (with respect to either of the marginal distributions of \(\theta \)) yields a function which is 0 almost surely. Therefore,

$$\begin{aligned} {\bar{h}}_2(z_1, z_2) = \frac{1}{6!}\sum _{\sigma \in {\mathfrak {S}}_6} \int h(z_{\sigma (1)}, \ldots , z_{\sigma (6)}) ~\mathrm {d}\theta ^4(z_3, \ldots , z_6), \end{aligned}$$

where \({\mathfrak {S}}_6\) is the symmetric group of all permutations operating on \(\{1, \ldots , 6\}\). Notice that the summands are equal to \(\delta _\theta (z_{\sigma (1)}, z_{\sigma (2)})\) if \(\sigma (1), \sigma (2) \in \{1,2\}\). This follows directly from the definitions of \(d_\mu \) and \(d_\nu \). Moreover, 1 and 2 are the only indices appearing in both \(f(X_1, \ldots , X_4)\) and \(f(Y_1, Y_2, Y_5, Y_6)\), so any permutation \(\sigma \) with \(\sigma (1), \sigma (2) \notin \{1,2\}\) results in taking the integral of f over all or all but one argument, either with respect to \(\mu \) or with respect to \(\nu \). But we have seen before that these integrals are 0 almost surely, and so, due to the independence of X and Y, the same is true for the integral of h with respect to \(\theta \).

Permutations for which exactly one of \(\sigma (1), \sigma (2)\) lies in \(\{1,2\}\) also yield vanishing summands: after integrating out the variables that occur in only one of the two factors of h, the independence of X and Y lets the remaining integral factorise into a product one of whose factors is \(\int d_\mu (x,x')~\mathrm {d}\mu (x') = 0\) or \(\int d_\nu (y,y')~\mathrm {d}\nu (y') = 0\). Finally, there are \(2\cdot 4!\) permutations with \(\sigma (1), \sigma (2) \in \{1,2\}\), each of which contributes \(\delta _\theta (z_1, z_2)\) by the symmetry of \(\delta _\theta \), and so

$$\begin{aligned} {\bar{h}}_2(z_1, z_2) = \frac{2\cdot 4!}{6!}\, \delta _\theta (z_1, z_2) = \frac{1}{15} \delta _\theta (z_1, z_2). \end{aligned}$$

We can therefore consider the object \(\delta _\theta \) instead of \({\bar{h}}_2\).

By identity (1) we have, for any real constants \(c_1, \ldots , c_m\) and \(z_1, \ldots , z_m \in {\mathcal {X}} \times {\mathcal {Y}}\),

$$\begin{aligned} \sum _{i,j=1}^m c_i c_j \delta _\theta (z_i, z_j)&= 4\sum _{i,j=1}^m c_i c_j \langle ({\hat{\phi }} \otimes {\hat{\psi }})(z_i), ({\hat{\phi }} \otimes {\hat{\psi }})(z_j)\rangle \\&= 4 \left\langle \sum _{i=1}^m c_i ({\hat{\phi }} \otimes {\hat{\psi }})(z_i), \sum _{i=1}^m c_i ({\hat{\phi }} \otimes {\hat{\psi }})(z_i)\right\rangle \\&= \left\| 2\sum _{i=1}^m c_i ({\hat{\phi }} \otimes {\hat{\psi }})(z_i)\right\| ^2 \ge 0, \end{aligned}$$

so our kernel is positive semidefinite. It is furthermore continuous. By Lemma 5, \(\delta _\theta \) has finite \((2+\varepsilon )\)-moments with respect to \(\theta ^2\) and finite \((1+\frac{\varepsilon }{2})\)-moments on the diagonal. Since \(2\alpha (n) \le \beta (n)\) (cf. [4]), we have

$$\begin{aligned} nV_{{\bar{h}}_2}(Z_1, \ldots , Z_n) \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} \sum _{k=1}^\infty \lambda _k \zeta _k^2 \end{aligned}$$
(15)

by Theorem 2.

We will now prove (14). For this, we will first note that under our assumptions, the kernel \({\bar{h}}\) has finite \((2+\varepsilon )\)-moments with respect to \(\theta ^6\). This can be seen with a similar approach as in the proof of Lemma 5. Furthermore, Lemma 3 together with the independence of X and Y gives us the existence of an upper bound \(M \in {\mathbb {R}}\) such that

$$\begin{aligned} {\mathbb {E}}\left[ {\bar{h}}(Z_{i_1}, \ldots , Z_{i_6})^{2+\varepsilon }\right] \le M < \infty \end{aligned}$$

for any collection of indices \(1 \le i_1, \ldots , i_6 \le n\).

Employing Lemma 4 therefore gives us

$$\begin{aligned} {\mathbb {E}}\left[ V_{{\bar{h}}_c}(Z_1, \ldots , Z_n)^2\right] = O(n^{-c}) \end{aligned}$$

for all \(2 \le c \le 6\). (Note that for \(p = 2+\varepsilon \) the mixing condition of Lemma 4 reads \(r > c(2+\varepsilon )/\varepsilon \), which for \(c \le 6\) is implied by our assumption \(r > 6(1+2\varepsilon ^{-1}) = 6(2+\varepsilon )/\varepsilon \).) Now, together with (13), we have

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ (nV - nV_{{\bar{h}}_2}(Z_1, \ldots , Z_n))^2\right]&= {\mathbb {E}}\left[ \left( n\sum _{c=3}^6 {6 \atopwithdelims ()c} V_{{\bar{h}}_c}(Z_1, \ldots , Z_n)\right) ^2\right] \\&\quad \le 4n^2 \sum _{c=3}^6 {\mathbb {E}}\left[ V_{{\bar{h}}_c}(Z_1, \ldots , Z_n)^2\right] \\&= \sum _{c=3}^6 O(n^{2-c}) = O(n^{-1}). \end{aligned} \end{aligned}$$
(16)

This implies (14), which together with (15) proves the Theorem.\(\square \)

Using these two results, we can generalise Corollary 2.8 from [12].

Corollary 1

Under the assumptions of Theorem 3, we have

$$\begin{aligned} n\frac{\mathrm {dcov}(\theta _n)}{D(\mu _n)D(\nu _n)} \xrightarrow [n \rightarrow \infty ]{{\mathcal {D}}} \frac{\sum _{k=1}^\infty \lambda _k \zeta _k^2}{D(\mu )D(\nu )} =:Q \end{aligned}$$

with \({\mathbb {E}}Q = 1\). If \(\mathrm {dcov}(\theta ) > 0\), i.e. \(\theta \) is not the product measure of its marginal distributions \(\mu \) and \(\nu \), the left-hand side converges to \(\infty \) almost surely.

Proof

We have the identity \(D(\mu _n) = n^{-2} \sum _{k,l=1}^n d(X_k, X_l)\), and thus, by Lemma 2, \(D(\mu _n) \xrightarrow {a.s.} D(\mu )\). The same holds for \(D(\nu _n)\), and thus the convergence in distribution follows from Slutsky's theorem. Since \(D(\mu )D(\nu ) = {\mathbb {E}}\delta _\theta (Z_1, Z_1) = \sum _{k=1}^\infty \lambda _k\), the expected value of the limiting distribution is equal to 1.

If \(\mathrm {dcov}(\theta ) > 0\), the almost sure convergence follows by Theorem 1.\(\square \)
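
In the notation of the numerical sketch from Sect. 1 (and reusing the illustrative functions defined there), the normalised statistic of Corollary 1 could be computed as follows; note that \(D(\mu _n)\) and \(D(\nu _n)\) are simply the grand means of the two pairwise distance matrices:

```python
def normalised_dcov(X, Y, dist_x, dist_y):
    # n * dcov(theta_n) / (D(mu_n) * D(nu_n)), cf. Corollary 1 (illustrative sketch)
    n = len(X)
    Dx, Dy = dist_x(X), dist_y(Y)
    dcov_n = (double_centre(Dx) * double_centre(Dy)).sum() / n**2
    return n * dcov_n / (Dx.mean() * Dy.mean())  # D(mu_n) = Dx.mean(), D(nu_n) = Dy.mean()
```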

Remark 1

It would be desirable to achieve a result similar to Theorem 3 under the assumption of just \(\alpha \)-mixing. For example, Theorem 3.2 in [5] gives such a result under the supposition that X and Y are real-valued random vectors.

For our more general setting of (pseudo-)metric spaces, one only needs to show that (14) still holds in the case of \(\alpha \)-mixing, since Theorem 2 does not require absolute regularity. We consider it likely that this can indeed be derived from the favourable properties of the distance covariance.

3 Generalisation to Pseudometric Spaces

Let \(({\mathcal {X}}, d)\) be a metric space and consider \(d^\beta \) for \(\beta \in (0,2]\). Then \(d^\beta \) is a pseudometric, i.e. the triangle inequality does not necessarily hold for \(d^\beta \). We will develop parts of the theory of [12] for pseudometric spaces of this particular kind, which we will refer to as \(\beta \)-pseudometric spaces. This is of interest if one considers \(\mathrm {dcov}_\beta \), a generalisation of the usual distance covariance, which results from using the \(\beta \)-th power of the metrics on \({\mathcal {X}}\) and \({\mathcal {Y}}\) for the definition of \(d_\mu \) and \(d_\nu \). That is, \(\mathrm {dcov}_\beta \) with respect to \(({\mathcal {X}}, d)\) and \(({\mathcal {Y}}, d)\) is equivalent to the regular distance covariance with respect to the \(\beta \)-pseudometric spaces \(({\mathcal {X}}, d^\beta )\) and \(({\mathcal {Y}}, d^\beta )\). Obviously, for any constant \(\beta > 0\), \(d^\beta \) induces the same topology (and thus, the same Borel \(\sigma \)-algebra) as the original metric d. This means that any \(\beta \)-pseudometric space is a metrisable topological space.

This approach of viewing \(\mathrm {dcov}_\beta \) not as a different object on the same space, but as the same object on a different space might not be very intuitive at first. However, since the concept of (strong) negative type does not require a metric space, this characterisation allows us to still use the relation between (strong) negative type of the underlying space and the distance covariance. This leads to the question of whether \(({\mathcal {X}}, d^\beta )\) is of (strong) negative type, given the original metric space \(({\mathcal {X}}, d)\), for which some criteria are known—see for example Corollary 3 or, more generally, [11] and [14].

Note that if \(\beta \in (0, 1]\), \(d^\beta \) is indeed still a metric, and we can rely on the already developed theory for separable metric spaces. Thus, we get the following result.

Corollary 2

Let \(\beta \in (0,1]\). Theorems 1 and 3 still hold for \(\mathrm {dcov}_\beta \) if we replace the finite first moment condition of Theorem 1 and the finite \((1+\varepsilon )\)-moment condition of Theorem 3 by finite \(\beta \)- and \((1+\varepsilon )\beta \)-moment assumptions, respectively.

Proof

Theorem 1 follows immediately. For Theorem 2, we note that \(d^\beta \) induces the same Borel \(\sigma \)-algebra as d. Furthermore, by Remark 3.19 in [12], the resulting metric spaces are still of negative type.\(\square \)

For \(\beta \in (1,2)\), we can no longer rely on the triangle inequality, but Jensen's inequality (more precisely, the convexity of \(t \mapsto t^\beta \)) provides a substitute, which we will call the weak triangle inequality. Specifically, for any \(\beta \in [1,2]\):

$$\begin{aligned} d^\beta (x, x') \le 2^{\beta -1} \{d^\beta (x,x_0) + d^\beta (x_0, x')\} \end{aligned}$$
(17)

for all \(x, x', x_0 \in {\mathcal {X}}\). This can be further bounded by replacing the factor \(2^{\beta -1}\) by 2.
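
Indeed, (17) is a direct consequence of the triangle inequality for d and the convexity of \(t \mapsto t^\beta \) on \([0,\infty )\) for \(\beta \ge 1\):

$$\begin{aligned} d^\beta (x, x') \le \bigl (d(x,x_0) + d(x_0,x')\bigr )^\beta = 2^\beta \left( \frac{d(x,x_0) + d(x_0,x')}{2}\right) ^\beta \le 2^{\beta -1}\bigl (d^\beta (x,x_0) + d^\beta (x_0,x')\bigr ). \end{aligned}$$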

As in the metric case, we say that a probability measure \(\mu \) has finite first moment (with respect to \(d^\beta \)) if there exists an element \(x_0 \in {\mathcal {X}}\) such that \(\int d^\beta (x,x_0) ~\mathrm {d}\mu (x) < \infty \). Again, the choice of \(x_0\) is arbitrary due to the weak triangle inequality. Thus, we can define the objects \(a_\mu \), \(D(\mu )\) and \(d_\mu \) as in the metric case.

Lemma 5

If \(\mu \) has finite \(\beta p\)-moment, then \(d_\mu ^{(\beta )}\), i.e. the function \(d_\mu \) computed with respect to the pseudometric \(d^\beta \), has finite 2p-moment with respect to \(\mu ^2\) and finite p-moment on the diagonal for any \(p \ge 1\).

Proof

We take inspiration from the proof of Proposition 2.6 in [12]. Define the functions

$$\begin{aligned} f(x_1, \ldots , x_4) := d^\beta (x_1, x_2) - d^\beta (x_1, x_3) - d^\beta (x_2, x_4) + d^\beta (x_3, x_4) \end{aligned}$$

and

$$\begin{aligned} h(x_1, \ldots , x_6) := f(x_1, \ldots , x_4)f(x_1, x_2, x_5, x_6) \end{aligned}$$

We have

$$\begin{aligned} f(x_1, \ldots x_4) \le 2d^\beta (x_1, x_2) - d^\beta (x_1, x_3) - d^\beta (x_2, x_4) + 2d^\beta (x_3, x_4) =: f_+ \end{aligned}$$

and, using the weak triangle inequality, \(|f_+| \le 4d^\beta (x_2,x_3)\). Similarly, we have

$$\begin{aligned} f(x_1, \ldots , x_4) \ge d^\beta (x_1, x_2) - 2d^\beta (x_1, x_3) - 2d^\beta (x_2, x_4) + d^\beta (x_3, x_4) =: f_-. \end{aligned}$$

Again, \(|f_-| \le 4d^\beta (x_2, x_3)\), and thus \(|f(x_1, \ldots , x_4)| \le 4d^\beta (x_2, x_3)\). In the same way, one shows that the absolute value of the second factor \(f(x_1, x_2, x_5, x_6)\) can be bounded by \(4d^\beta (x_1, x_6)\). Therefore \(|h(x_1, \ldots , x_6)| \le 16 d^\beta (x_2, x_3)d^\beta (x_1, x_6)\), and so

$$\begin{aligned} \int |d_\mu ^{(\beta )}(x_1, x_2)|^{2p} ~\mathrm {d}\mu ^2(x_1, x_2)&= \int \left| \int h(x_1, \ldots , x_6) ~\mathrm {d}\mu ^4(x_3, \ldots , x_6)\right| ^p~\mathrm {d}\mu ^2(x_1, x_2) \\&\le 16^p\int d^{\beta p}(x_2, x_3)d^{\beta p}(x_1, x_6) ~\mathrm {d}\mu ^4(x_1, x_2, x_3, x_6) \\&= \left( 4^{p}\int d^{\beta p}(x,x') ~\mathrm {d}\mu ^2(x,x')\right) ^2 < \infty . \end{aligned}$$

Furthermore, we have

$$\begin{aligned} \int |d_\mu ^{(\beta )}(x,x)|^p ~\mathrm {d}\mu (x)&= \int \left| \int f(x, x, x_3, x_4) ~\mathrm {d}\mu ^2(x_3, x_4)\right| ^p~\mathrm {d}\mu (x) \\&\quad \le 4^p \int d^{\beta p}(x, x_3) ~\mathrm {d}\mu ^2(x,x_3) < \infty , \end{aligned}$$

i.e. \(d_\mu ^{(\beta )}\) has finite p-moment on the diagonal.\(\square \)

We can now define \(\delta _\theta \) and \(\mathrm {dcov}(\theta )\) analogously to the metric case. Since the relevant proofs do not make use of the triangle inequality, it follows from [12] that, for pseudometric spaces of strong negative type, \(\theta = \mu \otimes \nu \) holds if and only if \(\mathrm {dcov}(\theta ) = 0\). This, together with the next lemma, gives a very easy proof of Theorem 4.2 in [6].

Lemma 6

If \((H, \Vert .\Vert )\) is a separable Hilbert space, then \((H, \Vert .\Vert ^\beta )\) is of negative type for all \(\beta \in (0,2]\), and of strong negative type for all \(\beta \in (0,2)\).

Proof

Without loss of generality, assume H to be equal to \(L^2[0,1]\). By Theorem 5 in [14], for any \(\beta \in (0,2]\), there exists an embedding \(\Phi : H \rightarrow L^2[0,1]\) with \(\Vert x-x'\Vert _2^{\beta /2} = \Vert \Phi (x) - \Phi (x')\Vert _2\) for all \(x, x' \in H\), which implies that \((H, \Vert .\Vert ^\beta )\) is of negative type. By Remark 3.19 in [12] (which, along with all its auxiliary results, also holds for pseudometric spaces), the space \((H, \Vert .\Vert ^\beta )\) therefore has strong negative type for all \(\beta \in (0,2)\).\(\square \)

We can use this Lemma to adapt Corollary 5.9 from [11].

Corollary 3

Let \(({\mathcal {X}}, d)\) be a metric space. If there exists an isometric embedding from \({\mathcal {X}}\) into a separable Hilbert space H, then \(({\mathcal {X}}, d^\beta )\) is of negative type for all \(\beta \in (0,2]\) and of strong negative type for all \(\beta \in (0,2)\).

Proof

Fix \(\beta \in (0,2]\), and let \(\varphi : {\mathcal {X}} \rightarrow L^2[0,1]\) be an isometric embedding. By Lemma 6, \((H, \Vert .\Vert _H^\beta )\) is of negative type via some embedding \(\Phi \), which implies that \(({\mathcal {X}}, d^\beta )\) is of negative type via \((\Phi \circ \varphi )\). If \(\beta < 2\), then \((H, \Vert .\Vert _H^\beta )\) is of strong negative type, and so, for any two probability measures \(\mu _1, \mu _2\) on \({\mathcal {X}}\), we have that

$$\begin{aligned} D(\mu _1 - \mu _2)&= \int \Vert \varphi (x) - \varphi (x')\Vert _H^\beta ~\mathrm {d}(\mu _1^2 - \mu _2^2)(x,x') \\&=\int _{\varphi ({\mathcal {X}})^2} \Vert x-x'\Vert _H^\beta ~\mathrm {d}\left( (\mu _1^\varphi )^2 - (\mu _2^\varphi )^2\right) (x,x') =D(\mu _1^\varphi - \mu _2^\varphi ), \end{aligned}$$

where \(\mu _i^\varphi \) denotes the pushforward of \(\mu _i\) via \(\varphi \). We can extend the last integral to the entire space H, because the pushforward measures vanish on \(\varphi ({\mathcal {X}})^C\). Using the strong negative type of \((H, \Vert .\Vert _H^\beta )\), this gives us \(\mu _1^\varphi = \mu _2^\varphi \), which implies \(\mu _1 = \mu _2\), since \(\varphi \) is injective.\(\square \)
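
As a concrete example, \({\mathbb {R}}^d\) equipped with the Euclidean metric is itself a separable Hilbert space, so Corollary 3 shows that \(({\mathbb {R}}^d, \Vert .\Vert _2^\beta )\) is of negative type for all \(\beta \in (0,2]\) and of strong negative type for all \(\beta \in (0,2)\); this covers the classical distance covariance with exponent \(\beta \) for Euclidean data.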

Corollary 4

Let \(\beta \in (1,2)\). Then, if we replace the finite first moment condition of Theorem 1 by a finite \(\beta \)-moment assumption, Theorem 1 still holds for \(\mathrm {dcov}_\beta \). If we furthermore assume \({\mathcal {X}}\) and \({\mathcal {Y}}\) to be isometrically embeddable into separable Hilbert spaces, and replace the finite \((1+\varepsilon )\)-moment condition with a finite \((1+\varepsilon )\beta \)-moment assumption, then Theorem 3 still holds for \(\mathrm {dcov}_\beta \).

Proof

We first consider Theorem 1. We can replace (4) by

$$\begin{aligned} |h(z_1, \ldots , z_6)| \le 16 d^\beta (x_2, x_3)d^\beta (y_1, y_6) \end{aligned}$$

as we have done in the proof of Lemma 5. This changes the original bound only by a constant factor, which does not affect the remainder of the proof.

If \({\mathcal {X}}\) and \({\mathcal {Y}}\) are isometrically embeddable into separable Hilbert spaces, then by Corollary 3 the spaces resulting from raising their metrics to the power \(\beta \) are of negative type. By Lemma 5, the proof of Theorem 3 still holds for \(\beta \)-pseudometric spaces. We can therefore apply Theorem 3 to the spaces \(({\mathcal {X}}, d^\beta )\) and \(({\mathcal {Y}}, d^\beta )\).\(\square \)

4 Further Work

The limiting distribution established in Theorem 3 depends both on the marginal distribution \(\theta \) (through the eigenvalues \(\lambda _k\)) and on the dependence structure of the process \((Z_k)_{k \in {\mathbb {N}}}\) (through the Gaussian process \((\zeta _k)_{k \in {\mathbb {N}}}\)). Thus, one cannot directly use this result to construct a test of independence, since the critical values of such a test would in general be unknown.

Such a dependence of the limiting distribution on unknown parameters is not unusual—indeed, in the iid case, there are many well-established ways to approximate the asymptotic distribution of a random variable, even if it may depend on unknown parameters. The authors of [17], for instance, propose a permutation test to approximate the asymptotic distribution of the distance covariance for real-valued iid data.

In the case of dependent data, such as that examined in this paper, one cannot employ resampling methods that alter the dependence structure of the original sequence \((Z_k)_{k \in {\mathbb {N}}}\), since this would in turn result in a different Gaussian process \((\zeta _k)_{k \in {\mathbb {N}}}\) and thus a different limiting distribution. A feasible approach might be a type of block bootstrap (cf. [10], sections 2.5–2.7), where the resampling occurs from a collection of blocks, each consisting of a certain number of consecutive observations, thus approximately preserving the dependence structure of the original process. We are currently working on proving the consistency of such a block bootstrap for the distance covariance.
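
As a rough illustration of the idea (this is a generic moving-block resampling scheme, not the specific procedure whose consistency we are investigating; all names are chosen for exposition only), such a resampling step could look as follows:

```python
import numpy as np

def moving_block_bootstrap(Z, block_length, rng=None):
    """Resample Z_1, ..., Z_n by concatenating uniformly chosen blocks of
    consecutive observations, preserving the within-block dependence."""
    rng = np.random.default_rng() if rng is None else rng
    Z = np.asarray(Z)
    n = len(Z)                         # requires block_length <= n
    n_blocks = -(-n // block_length)   # ceil(n / block_length)
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    blocks = [Z[s:s + block_length] for s in starts]
    return np.concatenate(blocks)[:n]  # trim to the original sample size
```

Applying the estimator of the distance covariance to a large number of such resampled sequences would yield an empirical approximation of the distribution of \(n\,\mathrm {dcov}(\theta _n)\); whether this approximation is consistent for the limit in Theorem 3 is precisely the open question mentioned above.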