1 Introduction

In recent years, the use of multi-dimensional Lévy processes for modeling purposes has become very popular in many areas, especially in the field of finance (e.g. Cont and Tankov 2004; see also Sato 1999 for a comprehensive study). The distribution of a Lévy process is usually specified by its characteristic triplet (drift, Gaussian component, and Lévy measure) rather than by the distribution of its independent increments. Indeed, the exact distribution of these increments is most often intractable or lacks a closed-form expression. For this reason, an important task is to provide estimation methods for the characteristic triplet.

Such estimation methods depend on the way observations are performed. In our model, a two-dimensional Lévy process \( \mathbf{X }_{t} \) is observed at high frequency, i.e., the time between two consecutive observations is \(\frac{1}{n}\). The characteristic function of such a two-dimensional Lévy process is given by

$$\begin{aligned} \phi _{n}(\mathbf{u }_{n}) := {\mathbb {E}}[\exp (i\langle \mathbf{u }_{n}, X_{1/n}\rangle )] = \exp \bigg \{\frac{1}{n}\Psi (\mathbf{u }_{n};\mathbf{b }, C, F)\bigg \}, \quad { \mathbf{u }_{n} \in {\mathbb {R}}^{2}}, \end{aligned}$$
(1)

where

$$\begin{aligned} \begin{aligned} \Psi (\mathbf{u }_{n})&= \Psi (\mathbf{u }_{n};\mathbf{b }, C, F) = i\left\langle \mathbf{u }_{n},\mathbf{b }\right\rangle -\frac{\left\langle C \mathbf{u }_{n},\mathbf{u }_{n}\right\rangle }{2}+\int _{{\mathbb {R}}^{2}}\big (\exp (i\left\langle \mathbf{u }_{n}, \mathbf{x }\right\rangle )\\&\quad -1-i\left\langle \mathbf{u }_{n}, \mathbf{x }\right\rangle \mathbb {1}_{\left\{ ||\mathbf{x }||_{{\mathbb {R}}^{2}}\le 1\right\} }\big )F(d\mathbf{x }), \end{aligned} \end{aligned}$$
(2)

\(\mathbf{b } \in {\mathbb {R}}^{2}\) is the drift, \( C \in {\mathbb {R}}^{2\times 2} \) is the covariance matrix, and \(F \in {\mathcal {P}}({\mathbb {R}}^{2}) \) is the jump measure. The triplet \( (\mathbf{b }, C, F) \) is called the Lévy–Khinchine characteristic. For the sake of simplicity, we consider the characteristic function on the diagonal and concentrate primarily on the two-dimensional setting, but extensions to the general multi-dimensional case are straightforward to obtain as well.
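To make (1)–(2) concrete, the following minimal Python sketch evaluates \( \Psi \) and \( \phi _{n} \) for an illustrative triplet in which F is a finite (compound-Poisson-type) jump measure, so that the integral in (2) reduces to a finite sum over jump atoms; all numerical values (drift, covariance matrix, jump atoms and intensities) are hypothetical.

```python
import numpy as np

def psi(u, b, C, jump_atoms, jump_weights):
    """Characteristic exponent (2) for a finite (compound-Poisson) Levy measure.

    jump_atoms  : array of shape (m, 2), the atoms x_k of F
    jump_weights: array of shape (m,),   the masses F({x_k})
    """
    u = np.asarray(u, dtype=float)
    drift_part = 1j * np.dot(u, b)
    gauss_part = -0.5 * np.dot(u, C @ u)
    inner = jump_atoms @ u                          # <u, x_k> for every atom
    small = np.linalg.norm(jump_atoms, axis=1) <= 1.0
    integrand = np.exp(1j * inner) - 1.0 - 1j * inner * small
    jump_part = np.sum(jump_weights * integrand)
    return drift_part + gauss_part + jump_part

def phi_n(u, n, b, C, jump_atoms, jump_weights):
    """Characteristic function (1) of the increment X_{1/n}."""
    return np.exp(psi(u, b, C, jump_atoms, jump_weights) / n)

# Hypothetical triplet: drift, covariance matrix, and two jump atoms.
b = np.array([0.1, -0.2])
C = np.array([[2.0, 1.0], [1.0, 2.0]])
atoms = np.array([[0.5, 0.5], [-2.0, 1.0]])
weights = np.array([3.0, 0.5])                      # jump intensities

U = 4.0
print(phi_n(np.array([U, U]), n=1000, b=b, C=C,
            jump_atoms=atoms, jump_weights=weights))
```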

Nonparametric inference from high-frequency data on the triplet of a Lévy process has been considered by Barndorff-Nielsen and Shephard (2002), Aït-Sahalia et al (2010), Jacod and Reiß (2014), Bibinger et al (2014), Mancini (2017), Belomestny and Trabs (2018), and the references therein. In addition, minimax estimation of the covariance has been the subject of Papagiannouli (2020). In that work, the author develops a family of covariance estimators \( \widehat{C}_{n}^{12}(U_{n}) \) to infer \( C^{12} \). Although this contribution proves that \( \widehat{C}^{12}_{n}(U_{n}) \) achieves minimax rates for the estimation of \( C^{12} \), the approach nevertheless has a drawback insofar as \( U_{n} \) depends on a number of unknown parameters, such as the co-jump activity index \( r \in (0, 2]\). Co-jumps refer to the case when the underlying processes jump at the same time and in the same direction; r is the Blumenthal–Getoor index of the co-jumps. To overcome this shortcoming, a data-driven choice \( \widehat{U} \) is needed which ensures near-minimax rates for the estimation error.

A natural way to extend minimax theory to an adaptation theory is to construct estimators which simultaneously achieve near-minimax rates over many subsets of parameter space. Starting with the work of Lepskiĭ (1991), the design of minimax-adaptive estimators for linear functionals has been widely covered in the literature, e.g. Efromovich and Low (1994) and Birgé (2001). Lepskiĭ designed a strategy for choosing a data-dependent parameter which uses only differences between estimators. His stopping rule considered only the monotonicity of the deterministic and stochastic errors. This method is widely applied in learning theory, where supervised learning algorithms depend on some tuning parameter, correct choice of which is crucial to ensure optimal performance.

Although it is no easy task, the implementation of a Lepskiĭ-type stopping rule has been used in the literature as a recipe for adaptive procedures, e.g. De Vito et al (2010) and the references therein. What interests us particularly in the present context is that we have to deal with adaptation to the unknown characteristic function appearing in the denominator of the stochastic error, a behavior which also occurs in the deconvolution problem, e.g. Neumann and Hössjer (1997), Comte and Lacour (2011), Dattner et al (2016). In our case, the unknown characteristic function in the denominator causes the stochastic error to behave irregularly. In order to apply Lepskiĭ’s rule, it is crucial to overcome this irregular behavior.

The main contribution of the present work is to construct adaptive estimators and extend the minimax result obtained in Papagiannouli (2020). We provide a remedy for the irregular behavior of the stochastic error: the unknown characteristic function in the denominator leads to a U-shaped stochastic error, which prevents us from applying Lepskiĭ’s rule directly, so it is crucial to find an index for the oracle start of our parameter. As a result, a monotonically increasing bound for the stochastic error is constructed. Finally, the convergence rate of the adaptive estimator is proven to be near-minimax.

The remainder of the paper is organized as follows. Section 2 provides general results for the uniform control of the deviation of the empirical characteristic function on \( {\mathbb {R}}^{2}\), so that it can also be read as an independent contribution. Section 3 introduces Lepskiĭ’s strategy for devising a stopping rule algorithm for the parameter U. In Sect. 4, we present theoretical guarantees for the adaptive estimation; in particular, we construct a monotonically increasing upper bound for the stochastic error. In Sect. 5, we devise a balancing principle for the optimal choice of U and present the convergence rates of the adaptive estimator. Section 6 illustrates the behavior of the estimator and the stopping rule by means of simulations from synthetic data and summarizes the results. Finally, proofs for Sect. 2 are given in Sect. 7.

2 Estimating the characteristic function

Here, we discuss technical tools which provide a uniform control of the deviations of the empirical characteristic function on \( {\mathbb {R}}^{2}\). The interesting point here is that the decay of the characteristic function is not assumed to be explicitly known but comes in by implication. To keep the exposition intuitive and free from technicalities, the proofs of lemmas have been postponed to Sect. 7. Throughout this section, we use the letter C to denote a constant that may change from line to line.

For the sake of keeping the calculations simple, we will restrict ourselves to estimating the characteristic function on the diagonal. For this purpose, let us introduce the following definition.

Definition 2.1

We define the subsets of the diagonal as

$$\begin{aligned} \begin{aligned}&{\mathcal {A}} := \{\mathbf{u } \in {\mathbb {R}}^{2}: \mathbf{u } = (U, U), U \in {\mathbb {R}}\}\\&\tilde{{\mathcal {A}}} := \{\tilde{\mathbf{u }} \in {\mathbb {R}}^{2}: \tilde{\mathbf{u }} = (U, -U), U \in {\mathbb {R}} \}. \end{aligned} \end{aligned}$$

Let a probability space \( (\Omega , {\mathcal {F}},\left( {\mathcal {F}}_{t}\right) _{t\ge 0}, {\mathbb {P}})\) be given. We assume that \(\mathbf{X }_{t} = (X^{(1)}_{t}, X^{(2)}_{t}) \) is a bivariate Lévy process observed at the n equidistant time points \( \Delta , 2\Delta , \ldots , n\Delta = T \), where \( \Delta = \frac{1}{n} \) and \( T = 1 \). We denote by

$$\begin{aligned} {\widehat{\phi }}_{n}(\mathbf{u }) := \frac{1}{n}\sum _{j= 1}^{n} e^{i\langle \mathbf{u }, \Delta _{j}^{n}\mathbf{X }\rangle } \end{aligned}$$

the empirical characteristic function, and by

$$\begin{aligned} \sqrt{n}({\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u }) ):= \frac{1}{\sqrt{n}}\sum _{j= 1}^{n}\big ( e^{i\langle \mathbf{u }, \Delta _{j}^{n}\mathbf{X }\rangle } - {\mathbb {E}}[e^{i\langle \mathbf{u }, \mathbf{X }_{1/n}\rangle }]\big ) \end{aligned}$$
(3)

the normalized empirical characteristic function process, where \( \mathbf{u } \in {\mathcal {A}}\). For an appropriate weight function \( w:{\mathbb {R}} \rightarrow (0, 1]\), we consider

$$\begin{aligned} {\mathbb {E}}\Vert \sqrt{n}({\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u }))\Vert _{L_{\infty }(w)}:={\mathbb {E}}\sup _{\mathbf{u }\in {\mathcal {A}}}\big \{\sqrt{n}\,|{\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u })|\,w(U)\big \}. \end{aligned}$$
(4)

Recall that \(\sqrt{n}({\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u }) ) \) converges weakly to a Gaussian process if and only if \( \big \{\mathbf{x }\rightarrow e^{i\langle \mathbf{u }, \mathbf{x }\rangle }, \mathbf{u } \in {\mathcal {A}} \big \} \) is a functional Donsker class for \( {\mathbb {P}}\).
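As an illustration, the following Python sketch evaluates the empirical characteristic function on the diagonal and a finite-grid proxy for the weighted sup-distance in (4); the true characteristic function on the diagonal and the weight function (for instance the logarithmic weight introduced in Definition 2.2 below) are passed as callables, and `increments` is assumed to be an \( n\times 2 \) array of observed increments.

```python
import numpy as np

def ecf_diagonal(increments, U):
    """Empirical characteristic function hat{phi}_n(u) at u = (U, U)."""
    inner = increments @ np.array([U, U])      # <u, Delta_j X> for every increment
    return np.mean(np.exp(1j * inner))

def weighted_deviation(increments, phi_diag, U_grid, w):
    """Finite-grid proxy for the weighted sup-distance in (4).

    phi_diag : callable, U -> phi_n((U, U)), the true characteristic function
    w        : callable, U -> weight, e.g. the logarithmic weight of Definition 2.2
    """
    n = len(increments)
    return max(np.sqrt(n) * abs(ecf_diagonal(increments, U) - phi_diag(U)) * w(U)
               for U in U_grid)
```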

We start by defining a weight function that was introduced in Neumann and Reiß (2009) and is the key for the uniform convergence of the empirical characteristic function.

Definition 2.2

For some \( \delta >0 \), let the weight function w be defined as

$$\begin{aligned} w(U) := \big (\log (e + |U|)\big )^{-\frac{1}{2}-\delta }. \end{aligned}$$

The above definition is meaningful under the following, rather general assumption concerning the characteristic function.

Assumption 1

There is a function g which is non-decreasing on \( {\mathbb {R}}^{-} \) and non-increasing on \( {\mathbb {R}}^{+} \). There exist positive constants C and \( C'\), such that

$$\begin{aligned} \begin{aligned}&\forall \mathbf{u } \in {\mathcal {A}}: C g (U)\le |\phi _{n}(\mathbf{u })|\le C'g(U)\\&\forall \tilde{\mathbf{u }} \in \mathcal {\tilde{A}}: C g (U)\le |\phi _{n}(\tilde{\mathbf{u }})|\le C'g(U). \end{aligned} \end{aligned}$$

Some remarks are in order here: The following cases may be considered for the characteristic function.

(a):

Gaussian decay. Under some boundedness condition for the covariance matrix and the activity of jumps, we can prove that

$$\begin{aligned} |\phi _{n}(\mathbf{u })|\ge e^{-\frac{CU^{2}}{2n}}, \qquad {{ \forall \mathbf{u }\in {\mathcal {A}} }}. \end{aligned}$$
(b):

Exponential decay. Here, the characteristic function \( \phi _{n} \) decays at most exponentially, that is, for some \( a>0 \), \( C>0 \),

$$\begin{aligned} |\phi _{n}(\mathbf{u })|\ge C e^{-a|U|/n}, \qquad {{ \forall \mathbf{u }\in {\mathcal {A}} }}. \end{aligned}$$

Examples of distributions with this property include normal inverse Gaussian and generalized tempered stable distributions.

(c):

Polynomial decay. In this case the characteristic function satisfies, for some \( \beta \ge 0 \), \( C> 0\),

$$\begin{aligned} |\phi _{n}(\mathbf{u })|\ge C(1+|U|)^{-\beta /n}, \qquad {\forall \mathbf{u }\in {\mathcal {A}}}. \end{aligned}$$

Typical examples for this property are the compound Poisson distribution, gamma distribution, and variance gamma distribution. Contrary to the properties formulated above, our reasoning does not rely on any semiparametric assumption about the shape of the characteristic function. The only thing needed is the quasi-monotonicity of Assumption 1 which is fairly general.

We obtain the following result, which extends Theorem 4.1 of Neumann and Reiß (2009) to two dimensions.

Theorem 2.3

Suppose that \( (X_{t})_{t\in {\mathbb {N}}} \) are i.i.d. random vectors in \( {\mathbb {R}}^{2} \) with \( {\mathbb {E}}|X_{1}|^{2+\gamma }<\infty \) for some \( \gamma >0 \), and let the weight function w be defined as in Definition 2.2. Then

$$\begin{aligned} \sup _{n\ge 1}{\mathbb {E}}\Vert \sqrt{n}({\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u }) )\Vert _{L_{\infty }(w)} <\infty . \end{aligned}$$

Let us mention that the logarithmic decay of the weight function w is in accordance with the well-known results of Csörgő and Totik (1983), where

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }\sqrt{n}\big ({\widehat{\phi }}_{n}((T_{n}, T_{n})) - \phi _{n}((T_{n}, T_{n}))\big ) = 0 \end{aligned}$$

almost surely on intervals \( [-T_{n}, T_{n}] \) whenever \( \log T_{n}/n \rightarrow 0 \). We are now ready to prove a uniform bound for the deviation of the empirical characteristic function from the true one. First, we establish a Talagrand inequality using Lemma A.2 from Appendix A.

Lemma 2.4

Let \( {\mathcal {U}}\) be some countable index set. Then for arbitrary \( \epsilon >0 \), there are positive constants \( c_{1}, c_{2} = c_{2}(\epsilon ) \), such that for every \( \kappa >0 \) we obtain

$$\begin{aligned} \begin{aligned} {\mathbb {P}}\bigg [\sup _{\mathbf{u }\in {\mathcal {U}}}|{\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u })|&\ge (1+\epsilon ){\mathbb {E}}\big [\sup _{\mathbf{u }\in {\mathcal {U}}} |{\widehat{\phi }}_{n}(\mathbf{u })- \phi _{n}(\mathbf{u })|\big ] +\kappa \bigg ]\\&\le 2 \exp \bigg (-n\bigg (\frac{\kappa ^{2}}{c_{1}}\wedge \frac{\kappa }{c_{2}}\bigg )\bigg ). \end{aligned} \end{aligned}$$

Now we introduce a logarithmic factor, which is essential for proving uniformity on the diagonal; this comes at the cost of losing a logarithmic factor in the rate.

Lemma 2.5

Let \( t>0 \) be given, and \( {\mathcal {A}} \) defined as in Definition 2.1. Then, for arbitrary \( \beta >0 \), there exists a constant C, such that we have

$$\begin{aligned} {\mathbb {P}}\bigg [\exists \mathbf{u }\in {\mathcal {A}}: |{\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u }) |\ge t \bigg (\frac{\log {n}}{n}\bigg )^{1/2}( w(U))^{-1}\bigg ]\le Cn^{-\frac{(t-\beta )^{2}}{c_{1}}}, \end{aligned}$$

where the constant C depends on \( \delta \) appearing in Definition 2.2 and \( c_{1} \) is the constant in Talagrand’s inequality from Lemma 2.4.

The statement of Lemma 2.5 also holds for \(\tilde{\mathbf{u }} \in \tilde{{\mathcal {A}}}\). A direct consequence of Lemma 2.5 is that we can consider a favorable set for the deviation of the empirical characteristic function from the true one.

Lemma 2.6

For some \( p\ge 1/2 \) and \( \kappa \ge 4(\sqrt{pc_{1}} +\beta ) \), let us consider the event

$$\begin{aligned}&{\mathcal {E}} := \bigg \{\forall \mathbf{u } \in {\mathcal {A}}: |{\widehat{\phi }}_{n}(\mathbf{u }) -\phi _{n}(\mathbf{u })|\le \frac{\kappa }{4} \bigg (\frac{\log n}{n}\bigg )^{1/2}(w(U))^{-1}\bigg \}.\\&\tilde{{\mathcal {E}}} : = \bigg \{\forall \tilde{\mathbf{u }} \in \tilde{{\mathcal {A}}}: |{\widehat{\phi }}_{n}(\tilde{\mathbf{u }}) -\phi _{n}(\tilde{\mathbf{u }})|\le \frac{\kappa }{4} \bigg (\frac{\log n}{n}\bigg )^{1/2}(w(U))^{-1}\bigg \}. \end{aligned}$$

Thus, we have

$$\begin{aligned} {\mathbb {P}}\bigg [{\mathcal {E}}^{\complement }\bigg ]\le Cn^{-p} \quad \text{ and } \quad {\mathbb {P}}\bigg [\tilde{{\mathcal {E}}}^{\complement }\bigg ]\le Cn^{-p}. \end{aligned}$$

Lemmas 2.5 and 2.6 hold for \( \tilde{\mathbf{u }} \in \mathcal {\tilde{A}} \) as well.

2.1 Truncated characteristic function

Here we present an extension of Lemma 2.1 in Neumann and Hössjer (1997), which renders the point-wise control of the characteristic function in the denominator uniform on the set \( {\mathcal {A}} \). We briefly discuss the idea of a truncated characteristic function, presented in detail in Neumann and Hössjer (1997). The characteristic function \( \phi _{n}(\mathbf{u }) \) can be estimated at each point \( \mathbf{u }= (U, U) \) with the rate \( n^{-1/2} \). Hence, \( {\widehat{\phi }}_{n}(\mathbf{u }) \) is a reasonable estimator of \( \phi _{n}(\mathbf{u }) \) only if \( |\phi _{n}(\mathbf{u })|\gg n^{-1/2}\). The idea is to cut off the frequencies \( \mathbf{u }\) for which \( |\phi _{n}(\mathbf{u }) |\le n^{-1/2}\).

First, we recall the key Lemma 2.1 from Neumann and Hössjer (1997):

Lemma 2.7

It holds that, for any \( p\ge 1\),

$$\begin{aligned} {\mathbb {E}}\bigg ( |\frac{1}{{\tilde{\phi }}_{n}(u)}-\frac{1}{\phi _{n}(u)}|^{2p}\bigg )\le C\bigg (\frac{1}{|\phi _{n}(u)|^{2p}}\wedge \frac{n^{-p}}{|\phi _{n}(u)|^{4p}}\bigg ), \end{aligned}$$

where \( \frac{1}{{\tilde{\phi }}_{n}(u)}:= \frac{\mathbb {1}\big (|{\widehat{\phi }}_{n}(u)|\ge n^{-1/2}\big )}{{\widehat{\phi }}_{n}(u)}\).

Neumann’s result is for \( p=1 \), but the extension to any p is straightforward. See also Neumann and Reiß (2009). The global threshold must be formulated in terms of \( {\widehat{\phi }}_{n}(\mathbf{u }) \), so that the compact set is in fact random. The main difference with Neumann’s truncated estimator lies in the fact that we introduce an additional logarithmic factor in the thresholding scheme. This logarithmic factor allows us to derive exponential inequalities, as we saw in Lemma 2.5.

Definition 2.8

Let the weight function w be given as in Definition 2.2. For some positive constant \( \kappa \), set

$$\begin{aligned} \frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })}:= {\left\{ \begin{array}{ll} \frac{1}{{\widehat{\phi }}_{n}(\mathbf{u })}, &{} \text {if } |{\widehat{\phi }}_{n}(\mathbf{u })|\ge \kappa _{n}(U)n^{-1/2}, \\ \frac{1}{\kappa _{n}(U)n^{-1/2}}, &{} \textit{otherwise} \end{array}\right. } \end{aligned}$$
(5)

where \( \kappa _{n}(U) := \frac{\kappa }{2} (\log n)^{1/2}(w(U))^{-1} \).
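A minimal sketch of the truncation rule (5), assuming that the empirical characteristic function has already been evaluated at \( \mathbf{u } = (U, U) \); the constants \( \kappa \) and \( \delta \) used as defaults are illustrative.

```python
import numpy as np

def truncated_reciprocal(phi_hat, U, n, kappa=1.0, delta=0.1):
    """Truncated reciprocal 1/tilde{phi}_n(u) from Definition 2.8 (Eq. (5)).

    phi_hat : complex value of the empirical characteristic function at u = (U, U)
    """
    w = np.log(np.e + abs(U)) ** (-0.5 - delta)        # weight from Definition 2.2
    kappa_n = 0.5 * kappa * np.sqrt(np.log(n)) / w     # kappa_n(U)
    threshold = kappa_n / np.sqrt(n)
    if abs(phi_hat) >= threshold:
        return 1.0 / phi_hat
    return 1.0 / threshold
```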

We can now use Lemma 2.5 to assess the deviation of \( \frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })} \) from \( \frac{1}{\phi _{n}(\mathbf{u })} \).

Lemma 2.9

Suppose that for some \( p\ge 1/2 \) and \( \beta >0 \), we have \( \kappa \ge 2(\sqrt{pc_{1}} + \beta ) \), where \( c_{1} \) is the constant in Talagrand’s inequality. Then, for \( n>0 \) and a positive constant C, we have for every \(\mathbf{u }\in {\mathcal {A}} \)

$$\begin{aligned} {\mathbb {P}}\bigg [\left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })} - \frac{1}{\phi _{n}(\mathbf{u })}\right|^{2} > \bigg (\frac{9\kappa ^{2}}{16}\frac{\log n(w(U))^{-2}n^{-1}}{|\phi _{n}(\mathbf{u })|^{4}}\wedge \frac{1}{4}\frac{1}{|\phi _{n}(\mathbf{u })|^{2}}\bigg )\bigg ]\le C n^{-p}. \end{aligned}$$

We are now in a position to formulate a uniform bound on the diagonal, which is an immediate consequence of Lemma 2.9.

Lemma 2.10

If the assumptions of Lemma 2.9 hold, then there is a constant \( C>0 \) depending on \( \kappa \), such that for \( n\ge 1 \)

$$\begin{aligned} {\mathbb {E}}\bigg [\sup _{\mathbf{u }\in {\mathcal {A}}}\left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })}-\frac{1}{\phi _{n}(\mathbf{u })}\right|^{2} \bigg (\frac{\log n (w(U))^{-2}n^{-1}}{|\phi _{n}(\mathbf{u })|^{4}}\wedge \frac{1}{|\phi _{n}(\mathbf{u })|^{2}}\bigg )^{-1}\bigg ]\le C. \end{aligned}$$

Lemma 2.10 can also be extended to powers different from 2; we simply replace the exponent 2 by 2q.

Note that an immediate consequence of the preceding Lemma 2.9 is the following important corollary, which allows us to interchange between the empirical characteristic function and the true one with high probability.

Corollary 2.11

In the situation of the preceding statement, we have

$$\begin{aligned} {\mathbb {P}}\bigg [\exists \mathbf{u } \in {\mathcal {A}}: \left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })}-\frac{1}{\phi _{n}(\mathbf{u })} \right|>\frac{1}{2}\left|\frac{1}{\phi _{n}(\mathbf{u })}\right|\bigg ]\le C n^{-p}. \end{aligned}$$

It is in fact this version of the statement which will play an important role below. On the complement of the preceding event, we have with high probability

$$\begin{aligned} \begin{aligned} -\frac{1}{2} \left|\frac{1}{\phi _{n}(\mathbf{u })}\right|&\le -\left|\frac{1}{\phi _{n}(\mathbf{u })}\right|+\left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })}\right|\le \frac{1}{2}\left|\frac{1}{\phi _{n}(\mathbf{u })}\right|\\ \frac{1}{2} \left|\frac{1}{\phi _{n}(\mathbf{u })}\right|&\le \left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })}\right|\le \frac{3}{2}\left|\frac{1}{\phi _{n}(\mathbf{u })}\right|. \end{aligned} \end{aligned}$$
(6)

The statement of Corollary 2.11 and the above inequality hold for \( \tilde{\mathbf{u }} \in \tilde{{\mathcal {A}}}\) as well.

3 Adaptive parameter estimation

After recalling the statistical model, we discuss in this section the goal of this study: we aim to extend the minimax theory from Papagiannouli (2020) to an adaptation theory for the covariance estimator.

3.1 Statistical model

We observe a two-dimensional Lévy process \((\mathbf{X }_{t_{i}})_{t_{i}\ge 0}\) for \( i = 0,1,\ldots ,n \) at equidistant time points \(0 = t_{0}<t_{1}<\ldots <t_{n}, \) where \( t_{i} = \frac{i}{n} \). We consider the characteristic function (1) on the diagonal, i.e., \( \mathbf{u }_{n}= (U_{n}, U_{n}) \), with characteristic triplet \( (\mathbf{b }, C, F) \), consisting of the drift \(\mathbf{b } \in {\mathbb {R}}^{2}\), the covariance matrix \( C \), and the jump measure \(F \in {\mathcal {P}}({\mathbb {R}}^{2}) \).

In what follows, we are in a nonparametric setting in which the process \( \mathbf{X }_{t_{i}} \) belongs to the class \( {\mathcal {L}}^{r}_{M}\). Let us now recall this class.

Definition 3.1

For \(M>0\) and \( r\in [0,2) \), we define the class \({\mathcal {L}}^{r}_{M}\) as the set of all Lévy processes satisfying

$$\begin{aligned} \Vert C\Vert _{\infty }+\int _{{\mathbb {R}}^{2}}\left( 1\wedge |x_{1}x_{2}|^{r/2}\right) F(dx_{1},dx_{2}) < M, \end{aligned}$$
(7)

where \( \Vert C\Vert _{\infty } = \max (C^{11}+C^{12}, C^{21}+C^{22}) \) is the maximum of the row sums. In the second term, r is the co-jump activity index of the jump components.

For details and examples concerning this class we refer to Section 3 in Papagiannouli (2020), where a minimax estimator for the covariance \( C^{12} \) is available. In addition, Jacod and Reiß (2014) provide a minimax estimator for the marginals, i.e. \( C^{11}, C^{22} \). Given the empirical characteristic function of the increments \( \Delta \mathbf{X }_{j} =\mathbf{X }_{j/n}- \mathbf{X }_{(j-1)/n} \)

$$\begin{aligned} {\widehat{\phi }}_{n}(\mathbf{u }_{n}) := \frac{1}{n}\sum _{j = 1}^{n} e ^{i\langle \mathbf{u }_{n}, \Delta _{j}^{n}\mathbf{X }\rangle }, \quad { \mathbf{u }_{n} \in {\mathbb {R}}^{2}} \end{aligned}$$

a spectral estimator is used:

$$\begin{aligned} \widehat{C}^{12}_{n}(U_{n})=\frac{n}{2U_{n}^{2}}\left( \log \vert {\hat{\phi }}_{n}(\tilde{\mathbf{u }}_{n})\vert \mathbb {1}({\hat{\phi }}_{n}(\tilde{\mathbf{u }}_{n})\ne 0)- \log \vert {\hat{\phi }}_{n}(\mathbf{u }_{n})\vert \mathbb {1}({\hat{\phi }}_{n}(\mathbf{u }_{n})\ne 0)\right) , \end{aligned}$$

where \(\mathbf{u }_{n}= (U_{n}, U_{n}) \), \( \tilde{\mathbf{u }}_{n} = (U_{n}, -U_{n})\).
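The spectral estimator can be computed directly from the observed increments; the following Python sketch mirrors the displayed formula, with the indicators \( \mathbb {1}({\hat{\phi }}_{n}\ne 0) \) handled termwise. The array `increments` of shape \( n\times 2 \) is an assumption on how the data are stored.

```python
import numpy as np

def spectral_covariance(increments, U):
    """Spectral estimator hat{C}^{12}_n(U) built from the empirical
    characteristic function at u = (U, U) and tilde{u} = (U, -U)."""
    n = len(increments)
    phi_diag = np.mean(np.exp(1j * (increments @ np.array([U, U]))))
    phi_anti = np.mean(np.exp(1j * (increments @ np.array([U, -U]))))
    # indicators 1(hat{phi}_n != 0) in the definition, handled termwise
    term_anti = np.log(abs(phi_anti)) if abs(phi_anti) > 0 else 0.0
    term_diag = np.log(abs(phi_diag)) if abs(phi_diag) > 0 else 0.0
    return n / (2 * U ** 2) * (term_anti - term_diag)
```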

A bias-variance type decomposition for the estimation error is available from Lemma 6.1 in Papagiannouli (2020). We recall the lemma without proof.

Lemma 3.2

The error bound for the estimation satisfies

$$\begin{aligned} |\widehat{C}^{12}_{n}(U_{n}) - C^{12}|\le |H_{n}(U_{n})|+ |D(U_{n})|, \end{aligned}$$
(8)

where

$$\begin{aligned} D(U_{n})=\frac{n}{2U^{2}_{n}}\bigg (\log \vert \phi _{n}(\tilde{\mathbf{u }}_{n})\vert -\log \vert \phi _{n}(\mathbf{u }_{n})\vert \bigg )-C^{12}, \end{aligned}$$
(9)

and on the set \(\big \{{\widehat{\phi }}_{n}(\tilde{\mathbf{u }}_{n})\ne 0 \quad \text{ and } \quad {\widehat{\phi }}_{n}(\mathbf{u }_{n})\ne 0\big \}\)

$$\begin{aligned} H_{n}(U_{n})=-\frac{n}{2U^{2}_{n}}\bigg (\log \left|\frac{\phi _{n}(\tilde{\mathbf{u }}_{n})}{\phi _{n}(\mathbf{u }_{n})}\right|-\bigg (\log \left|\frac{{\widehat{\phi }}_{n}(\tilde{\mathbf{u }}_{n})}{{\widehat{\phi }}_{n}(\mathbf{u }_{n})}\right|\bigg )\bigg ). \end{aligned}$$
(10)

\( H_{n}(\cdot ), D(\cdot ) \) are the corresponding stochastic and deterministic errors.

The spectral estimator \( \widehat{C}^{12}_{n}(U_{n}) \) achieves minimax rates for the optimal parameter \( U_{n} \). By Theorem 4.2 in Papagiannouli (2020), for \( r\in [0,2) \), M defined as in Definition 3.1 and for every \( 0 < \eta \le 1 \), there is a constant \( A_{\eta }>0 \), and \( N_{\eta } \) such that for every \( n\ge N_{\eta } \)

$$\begin{aligned} {\mathbb {P}}\Big [ |\widehat{C}^{12}_{n}(U_{n}) - C^{12}|\le w_{n}A_{\eta }\Big ]\ge 1-\eta , \end{aligned}$$
(11)

where

$$\begin{aligned} w_{n} = {\left\{ \begin{array}{ll} n^{-1/2} &{} \text {if }r\le 1 \\ (n\log n)^{\frac{r-2}{2}} &{} \text {if }r>1 \end{array}\right. } \end{aligned}$$
(12)

are the minimax rates for the optimal parameter

$$\begin{aligned} U_{n} = {\left\{ \begin{array}{ll} \sqrt{n} &{} r\le 1 \\ \frac{\sqrt{(r-1)n\log n}}{\sqrt{M}} &{} r>1.\end{array}\right. } \end{aligned}$$
(13)
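For given values of the (in practice unknown) parameters r and M, the optimal parameter (13) and the corresponding minimax rate (12) can be written down as a short helper; this is only an illustration of the formulas, not a usable selection rule, precisely because r and M are unknown.

```python
import numpy as np

def optimal_parameter_and_rate(n, r, M):
    """Oracle parameter U_n from (13) and minimax rate w_n from (12),
    assuming the (in practice unknown) pair (r, M) were known."""
    if r <= 1:
        return np.sqrt(n), n ** (-0.5)
    U_n = np.sqrt((r - 1) * n * np.log(n) / M)
    w_n = (n * np.log(n)) ** ((r - 2) / 2)
    return U_n, w_n
```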

The error bound incurred by the spectral estimator in Lemma 3.2 is the sum of two terms, the deterministic and the stochastic error, both depending on the tuning parameter \( U_{n}\). The stochastic error displays behavior opposite to that of the deterministic error: the stochastic error tends to explode, whereas the deterministic error tends to zero as \( U_{n} \) grows. This observation, and the fact that \( U_{n} \) depends on the unknown parameters (r, M), impose the need for a posteriori choices of the parameter \(U_{n} \), which ideally are optimal in a well-defined sense. The goal is to derive a theoretical error bound for the adaptive estimator achieving almost the optimal rates.

3.2 Lepskiĭ’s stopping rule

In this section, we establish an adaptive choice for the parameter \( U_{n}\), as achieved by Lepskiĭ’s principle. Following Lepskiĭ’s principle, a “stopping” rule is designed to achieve adaptation for a class of minimax estimators. We use the following notational conventions. We denote by \( {\mathcal {U}}\) the parameter space and consider a suitable finite discretization \(U_{0}<\ldots < U_{K} \) of our parameter. We set \( \widehat{C}^{12}_{n, j}:= \widehat{C}^{12}_{n}(U_{j}) \), i.e., we assign an estimator \( \widehat{C}^{12}_{n,j} \) to each \( U_{j} \). For each estimator \( \widehat{C}^{12}_{n,j} \), we let \( s_{n}(U_{j})\) be the upper bound of the stochastic error \({\mathbb {E}}|H_{n}(U_{j})|\) for \( j= 0,1,\ldots , K \). Starting from a family of asymptotically rate-minimax estimators \( \big \{\widehat{C}^{12}_{n}(U_{n})\big \} \), how can one achieve adaptation over the parameter space \( {\mathcal {U}}\), i.e., find a tuning parameter \( U_{j} \) which simultaneously provides minimax rates for the covariance over the sets \( [U_{0}, U_{K}]\subset {\mathcal {U}} \)?

Remark 3.3

In this paper we refer to the value \( U_{n} \) as the best choice and to the corresponding rate as the best possible rate. The rate is optimal in a minimax sense, since the bound we start from, (11), is tight. We refer to the value \( U_{bal} \) as the choice which balances the stochastic and the deterministic error.

Let us first give a brief and simplified account of the classical Lepskiĭ method adjusted to our problem; we use the results in Section 5.4 of Reiß (2012). The key idea is to test the real-valued estimators \(\widehat{C}^{12}_{n, 1}, \widehat{C}^{12}_{n, 2}, \dots , \widehat{C}^{12}_{n, j} \), whose stochastic errors increase and whose biases decrease as the index increases, for the hypotheses \( H_{j}: \widehat{C}^{12}_{n, 1} = \widehat{C}^{12}_{n, 2}= \cdots = \widehat{C}^{12}_{n, j}\). If we accept \( H_{1}, H_{2}, \dots ,H_{j} \) but \( \widehat{C}^{12}_{n, j+1}\) differs significantly from \( \widehat{C}^{12}_{n, 1} , \widehat{C}^{12}_{n, 2}, \dots ,\widehat{C}^{12}_{n, j}\), we reject \( H_{j+1} \) and set \( \widehat{j} = j\). We summarize the above discussion in the following definition.

Definition 3.4

We choose a suitable finite discretization \( U_{0}<\ldots <U_{K} \) and take \( \infty>s_{n}(U_{K})>s_{n}(U_{K-1})>\ldots > s_{n}(U_{0})\), given some large enough constant K. We define the Lepskiĭ principle as

$$\begin{aligned} {\hat{j}} = \inf \Big \{j = 0, 1, \dots , K-1: \quad \exists k\le j: d\left( \widehat{C}^{12}_{n, j+1}, \widehat{C}^{12}_{n, k}\right) > 4s_{n}(U_{j+1})\Big \}, \end{aligned}$$
(14)

where d is the Euclidean distance.

Heuristically, we want a rule such that, at the selected index, the stochastic error dominates the bias. We iterate the above stopping rule using the following algorithm.

Algorithm 1 (figure a)
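Since the algorithm figure is not reproduced here, the following hypothetical Python sketch illustrates one way the pairwise-comparison rule (14) can be iterated; `estimators[j]` stands for \( \widehat{C}^{12}_{n,j} \) and `s_bound[j]` for the stochastic-error bound \( s_{n}(U_{j}) \) (or its data-driven surrogate), both assumed to be precomputed on the grid \( U_{0}<\dots <U_{K} \).

```python
def lepskii_stop(estimators, s_bound, factor=4.0):
    """Lepskii-type stopping rule in the spirit of (14): stop at the first index j
    at which the next estimator differs significantly from some earlier one.

    estimators : list of hat{C}^{12}_{n,j}, ordered along U_0 < ... < U_K
    s_bound    : list of stochastic-error bounds s_n(U_j) on the same grid
    """
    K = len(estimators) - 1
    for j in range(K):
        for k in range(j + 1):
            if abs(estimators[j + 1] - estimators[k]) > factor * s_bound[j + 1]:
                return j            # accept H_1,...,H_j and reject H_{j+1}
    return K                        # no significant difference was detected
```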

\( \widehat{j} \) is the smallest index for which the stochastic error dominates the deterministic error. We observe that Lepskiĭ’s strategy for the parameter choice uses pairwise comparisons of the estimators. By the triangle inequality, Lemma 3.2, and the monotonicity of the deterministic and stochastic errors, we get for \( i, j \in \{0,\ldots ,K\}\) and \( i\le j \)

$$\begin{aligned} \begin{aligned} d(\widehat{C}_{n,i}^{12}, \widehat{C}_{n,j}^{12})&\le d(\widehat{C}_{n,i}^{12}, C^{12}) + d(\widehat{C}_{n,j}^{12}, C^{12})\\&\le d(U_{i}) +s_{n}(U_{i}) +d(U_{j}) + s_{n}(U_{j})\\&\le 4 s_{n}(U_{j}), \end{aligned} \end{aligned}$$
(15)

where \( d(\cdot ) \) is the upper bound for the deterministic error and \( s_{n}(\cdot ) \) is the upper bound for the stochastic error. The bound for the deterministic error has the form

$$\begin{aligned} d(U_{i}) = \frac{M2^{r/2}}{U_{i}^{2-r}} \end{aligned}$$

and depends on the co-jump activity index \( r\in (0,2] \) and the constant M from Definition 3.1. Clearly, the deterministic error is monotonically non-increasing as the index i increases. We also need to ensure that the bound for the stochastic error is monotonically non-decreasing in order to be able to use Lepskiĭ’s principle (14).

We aim to use the stochastic error for Lepskiĭ’s principle instead of the deterministic error, because the latter depends on the co-jump activity index r, which is unknown to us. The stochastic error, on the other hand, depends on the characteristic function of the two-dimensional Lévy process, which might also be unknown. Yet, we can overcome this obstacle by exploiting the results of Sect. 2. As a result, we are able to interchange with high probability between the theoretical bound \( s_{n}(U_{j}) \) and the empirical bound \( \tilde{s}_{n}(U_{j}) \) for the stochastic error.

This method enables us to construct an adaptive estimator using a Lepskii-type principle based on a data-dependent bound, i.e. \( \tilde{s}_{n}(\cdot ) \), on the interval \( {\mathcal {U}} = [U_{start}^{oracle}, U_{\max }] \). This is achieved via

$$\begin{aligned} U_{\widehat{j}} = \min \{U_{j}: \forall U_{j}\le U_{k}, U_{k}\in {\mathcal {U}}, |\widehat{C}^{12}_{n,j} - \widehat{C}^{12}_{n,k}|\le 6\tilde{s}_{n}(U_{k}) \}. \end{aligned}$$
(16)

Let us finally state that the adaptive estimator based on (16) achieves almost the minimax convergence rates.

Theorem 3.5

For a sequence of parameters \( U_{j} \) which satisfies \( U_{j} \in [U_{start}^{oracle}, U_{\max }] \), there is a constant \( c \in (1/2, 1] \) such that the adaptive estimator satisfies

$$\begin{aligned} {\mathbb {P}}\bigg [ |\widehat{C}^{12}_{n, \widehat{j}} - C^{12}|\le 9 \frac{CU_{bal}^{-2}(n\log n)^{1/2}(w(U_{bal}))^{-1}}{|{\tilde{\phi }}_{n}(\mathbf{u }_{bal})|}\bigg ]\ge 1- a(n), \end{aligned}$$

where \( a(n) = \exp \big (-\frac{1}{8}\big (c-\frac{1}{2}\big )^{2}n\big ) \), and \( \mathbf{u }_{bal} = (U_{bal}, U_{bal}) \).

The proof of Theorem 3.5 is deferred to Sect. 5, where we discuss the selection rule (16) in detail.

4 Analysis of the stochastic error

The main objective of the present section is to prove a high-probability bound for the stochastic error. Observing the form of the stochastic error \( H_{n}\) in (10), it becomes clear that we need to control the empirical characteristic function in the denominator, which may lead to unfavorable behavior of the stochastic error. To overcome this problem we use the results obtained in Sect. 2.

In comparison with other adaptive results, obtained in Comte and Genon-Catalot (2010) and Comte and Lacour (2011), whose procedures depend on a semiparametric assumption concerning the decay of the characteristic function, our approach introduces a threshold which ensures that the characteristic function in the denominator remains large enough for the estimator to make sense.

Lemma 4.1

Under the conditions of Lemma 2.5, the stochastic error satisfies, up to an absolute constant C,

$$\begin{aligned} {\mathbb {E}}[\mathbb {1}_{{\mathcal {E}}\cup \tilde{{\mathcal {E}}}}\cdot |H_{n}(U)|]\lesssim U^{-2}(n\log n)^{1/2}(w(U))^{-1}\bigg (\frac{1}{|\phi _{n}(\mathbf{u })|} \vee \frac{1}{|\phi _{n}(\tilde{\mathbf{u }})|}\bigg ). \end{aligned}$$
(17)

Proof

From Lemma 3.2 the stochastic error satisfies

$$\begin{aligned} \begin{aligned} |H_{n}(U)|&\le \frac{n}{2U^{2}}\left|\log |\frac{{\widehat{\phi }}_{n}(\tilde{\mathbf{u }})}{\ {\widehat{\phi }}_{n}(\mathbf{u })}|-\log |\frac{\phi _{n}(\tilde{\mathbf{u }})}{\phi _{n}(\mathbf{u })}|\right|\\&= \frac{n}{2U^{2}}\left|\log |1+\frac{{\widehat{\phi }}_{n}(\tilde{\mathbf{u }})- \phi _{n}(\tilde{\mathbf{u }})}{\phi _{n}(\tilde{\mathbf{u }})}|-\log |1+\frac{{\widehat{\phi }}_{n}(\mathbf{u })- \phi _{n}(\mathbf{u })}{\phi _{n}(\mathbf{u })}|\right|. \end{aligned} \end{aligned}$$
(18)

On the events \( {\mathcal {E}} \) and \( \tilde{{\mathcal {E}}} \) from Lemma 2.6, in the case that \( |\phi _{n}(\mathbf{u })|\ge \kappa _{n}n^{-1/2} \) and \( |\phi _{n}(\tilde{\mathbf{u }})|\ge \kappa _{n}n^{-1/2} \), it follows that \( \left|\frac{{\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u })}{\phi _{n}(\mathbf{u })}\right|\le \frac{1}{2} \) and \(\left|\frac{{\widehat{\phi }}_{n}(\tilde{\mathbf{u }}) - \phi _{n}(\tilde{\mathbf{u }})}{\phi _{n}(\tilde{\mathbf{u }})}\right|\le \frac{1}{2} \). Combined with the elementary bound \( |\log (1+x)|\le 2|x| \) for \( |x|\le \frac{1}{2} \), the above observations lead, up to constants, to

$$\begin{aligned} \begin{aligned} {\mathbb {E}}[\mathbb {1}_{{\mathcal {E}}\cup \tilde{{\mathcal {E}}}}\cdot |H_{n}(U)|]&\le \frac{n}{2U^{2}}\bigg (\left|\frac{{\widehat{\phi }}_{n}(\tilde{\mathbf{u }})- \phi _{n}(\tilde{\mathbf{u }})}{\phi _{n}(\tilde{\mathbf{u }})} \right|+ \left|\frac{{\widehat{\phi }}_{n}(\mathbf{u })- \phi _{n}(\mathbf{u })}{\phi _{n}(\mathbf{u })}\right|\bigg )\\&\le \frac{Cn}{U^{2}} \bigg (\frac{\log {n}}{n}\bigg )^{1/2}(w(U))^{-1}\bigg (\frac{1}{|\phi _{n}(\mathbf{u })|}\vee \frac{1}{|\phi _{n}(\tilde{\mathbf{u }})|}\bigg ), \end{aligned} \end{aligned}$$
(19)

which concludes the proof. \(\square \)

Hence, everything boils down to controlling the unknown characteristic function in the denominator in a way that keeps it large enough and yields a reasonable estimator. Using Corollary 2.11 and the inequality (6), we can substitute the unknown \( \frac{1}{|\phi _{n}(\mathbf{u })|} \) by \( \frac{1}{|{\tilde{\phi }}_{n}(\mathbf{u })|} \), which is data-dependent. Inserting inequality (6) into (17), we get the following high-probability upper bound for the stochastic error

$$\begin{aligned} {\mathbb {E}}[\mathbb {1}_{{\mathcal {E}}\cup \tilde{{\mathcal {E}}}}\cdot |H_{n}(U)|]\le \frac{2C}{U^{2}}\left( n \log n\right) ^{1/2}(w(U))^{-1}\bigg (\frac{1}{|{\tilde{\phi }}_{n}(\mathbf{u })|} \vee \frac{1}{|{\tilde{\phi }}_{n}(\tilde{\mathbf{u }})|}\bigg ). \end{aligned}$$
(20)

Corollary 4.2

Under the conditions of Lemma 2.6, for any \( p> 1/2 \) there exists a positive constant C, such that, for all U, we have

$$\begin{aligned} {\mathbb {P}}\bigg [|H_{n}(U)|>\frac{C}{U^{2}}(n\log n)^{1/2}(w(U))^{-1}\bigg ]\lesssim n^{-p}. \end{aligned}$$
(21)

Proof

The proof is a consequence of Lemmas 2.6 and 4.1, applying Markov’s inequality. \(\square \)

4.1 Oracle start for the parameter U

In order to apply a Lepskiĭ-type stopping rule, we need to ensure that the bound for the stochastic error is monotonically increasing. First we introduce some further notation.

4.1.1 Further notation

We write \( U_{start}^{oracle} \) for the starting point of the Lepskiĭ principle. By (13), the optimal choice of the parameter is \( U_{n} = \sqrt{\frac{r-1}{M}n\log n}\). We also denote by \(C_{sum} = \sum _{i,j} C^{ij}\) the sum of all entries of the covariance matrix.

We allow the bound for the stochastic error to depend either on the (possibly) unknown characteristic function or on the truncated empirical characteristic function. Since we can interchange w.h.p. between the true and empirical characteristic function, we use two different notations for the corresponding bounds of the stochastic error:

$$\begin{aligned}&s_{n}(U) := CU^{-2}(n\log n)^{1/2}(w(U))^{-1} \frac{1}{|\phi _{n}(\mathbf{u })|} \end{aligned}$$
(22)
$$\begin{aligned}&\tilde{s}_{n}(U):= CU^{-2}(n\log n)^{1/2}(w(U))^{-1}\frac{1}{|{\tilde{\phi }}_{n}(\mathbf{u })|}. \end{aligned}$$
(23)

We use these bounds for the stochastic error because it is easy to check that \(\frac{1}{|{\tilde{\phi }}_{n}(\mathbf{u })|} \vee \frac{1}{|{\tilde{\phi }}_{n}(\tilde{\mathbf{u }})|} = \frac{1}{|{\tilde{\phi }}_{n}(\mathbf{u })|} \). In what follows, we occasionally write \( {\tilde{\phi }}_{n}(U) \) instead of \( {\tilde{\phi }}_{n}(\mathbf{u }) \), because we are estimating the characteristic function on the diagonal. The same rule applies to the function \( h(\mathbf{u }):=h(U,U) = 2\int _{{\mathbb {R}}^{2}}\big (1-\cos (\langle \mathbf{u }, \mathbf{x }\rangle )\big )F(d\mathbf{x })\).
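Under the assumption that the increments are available as an \( n\times 2 \) array, the data-driven bound \( \tilde{s}_{n}(U) \) in (23) can be sketched as follows; note that, by Definition 2.8, \( |{\tilde{\phi }}_{n}(\mathbf{u })| \) coincides with the maximum of \( |{\widehat{\phi }}_{n}(\mathbf{u })| \) and the truncation level \( \kappa _{n}(U)n^{-1/2} \). The constants `kappa`, `delta` and `const` are illustrative placeholders.

```python
import numpy as np

def s_tilde(increments, U, kappa=1.0, delta=0.1, const=1.0):
    """Data-driven bound tilde{s}_n(U) from (23), built from the truncated
    empirical characteristic function on the diagonal."""
    n = len(increments)
    phi_hat = np.mean(np.exp(1j * (increments @ np.array([U, U]))))
    w = np.log(np.e + abs(U)) ** (-0.5 - delta)
    threshold = 0.5 * kappa * np.sqrt(np.log(n)) / (w * np.sqrt(n))
    phi_trunc = max(abs(phi_hat), threshold)   # |tilde{phi}_n(u)| never below the cutoff
    return const * U ** (-2) * np.sqrt(n * np.log(n)) / (w * phi_trunc)
```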

Figure 1 illustrates the behavior of the data-driven bound \( \tilde{s}_{n}(U) \) for the stochastic error together with the stochastic error \( H_{n}(U) \) itself, defined as in (10). We observe that the stochastic error is decreasing in the beginning and then explodes. The occurrence of \( |{\widehat{\phi }}_{n}(\mathbf{u })|\) in the denominator might have unfavorable effects.

Fig. 1
figure 1

Vertical lines: (violet-dashed) \( U_{start}^{oracle} \); (orange-dashed) \( U_{n} \). Curves: (green) \( \tilde{s}_{n}(U) \)- values in the left-side \( y- \)axis; (purple) \( H_{n}(U) \)- values in the right-side \( y- \) axis. (Color figure online)

To obtain a possible remedy, we consider starting the Lepskiĭ procedure for a larger U and constructing a monotonically increasing bound for the stochastic error. Figure 1 depicts the above behavior.

We define the oracle start of U as follows:

$$\begin{aligned} U^{oracle}_{start} = \inf \bigg \{U>0 : |\phi _{n}(U)|\le \frac{1}{2}\bigg \}. \end{aligned}$$
(24)

Let us highlight the strategy for constructing a monotonically increasing bound for the stochastic error. Having found the oracle start of U, we show that \( U_{start}^{oracle}<U_{n} \). Then, we prove that \( s_{n}(U_{start}^{oracle}) < s_{n}(U_{n})\), ensuring that an increasing bound for the stochastic error is available within the interval \( [U_{start}^{oracle}, U_{\max }] \) for \( U_{\max }>U_{n} \). The above discussion is depicted in Fig. 1. It is worth emphasizing that the calculation of \( U_{start}^{oracle} \) requires the evaluation of the possibly unknown \( \phi _{n}(\mathbf{u }) \). Thus, we only use a general assumption on the characteristic function, namely the quasi-monotonicity of Assumption 1, for co-jumps of infinite variation, i.e., \( r\in (1,2] \), together with a boundedness condition for the covariance matrix.

Lemma 4.3

For n large enough and \( K>0 \), the interval for \( U_{start}^{oracle} \) is

$$\begin{aligned} \bigg [ \frac{\sqrt{2\log 2}}{\sqrt{C_{sum} +K}}\cdot \sqrt{n}, \quad \frac{\sqrt{2\log 2}}{\sqrt{C_{sum}}}\cdot \sqrt{n}\bigg ] \end{aligned}$$

and for \( r\in (1,2] \), we get that

$$\begin{aligned} U_{start}^{oracle}< U_{n}, \end{aligned}$$

where \( U_{n} = \sqrt{\frac{r-1}{M}n\log n} \).

Proof

The absolute value of the characteristic function is given by

$$\begin{aligned} |\phi _{n}(\mathbf{u })|= \exp \bigg \{-\frac{1}{2n}\bigg (\langle C\mathbf{u }, \mathbf{u }\rangle + h(\mathbf{u })\bigg )\bigg \} , \end{aligned}$$
(25)

where \( \mathbf{u } = (U, U) \). We define

$$\begin{aligned} h(\mathbf{u }) = 2\int _{{\mathbb {R}}^{2}}\big (1 -\cos (\langle \mathbf{u }, \mathbf{x }\rangle )\big )F(d\mathbf{x }), \end{aligned}$$

where F is the Lévy measure on \( {\mathbb {R}}^{2}\). Using the Cauchy-Schwarz inequality \( |\langle \mathbf{u }, \mathbf{x }\rangle |^{2}\le \Vert \mathbf{u }\Vert ^{2}\Vert \mathbf{x }\Vert ^{2}\), a positive constant K, and the set \( v_{0} = (0,1)^{2} \subset {\mathbb {R}}^{2}\), we obtain

$$\begin{aligned} \begin{aligned} h(\mathbf{u })&= 2 \int _{{\mathbb {R}}^{2}}\Big (1 - \cos ( \langle \mathbf{u }, \mathbf{x }\rangle ) \Big )F(d\mathbf{x }) = 2 \int _{v_{0}} \Big (1 - \cos (\langle \mathbf{u }, \mathbf{x }\rangle )\Big )F(d\mathbf{x })\\&\quad + 2\int _{{\mathbb {R}}^{2}\setminus v_{0}} \Big (1 -\cos (\langle \mathbf{u }, \mathbf{x }\rangle )\Big ) F(d\mathbf{x })\\&\le 2\int _{v_{0}}|\langle \mathbf{u }, \mathbf{x } \rangle |^{2}F(d\mathbf{x }) +4\int _{{\mathbb {R}}^{2}\setminus v_{0}}F(d\mathbf{x })\\&\le 4U^{2}\int _{v_{0}}\Vert \mathbf{x }\Vert ^{2} F(d\mathbf{x }) +4 F({\mathbb {R}}^{2}\setminus v_{0})\\&\le KU^{2}. \end{aligned} \end{aligned}$$
(26)

The last inequality follows from the fact that we always have \( \int _{{\mathbb {R}}^{2}} (1 \wedge \Vert \mathbf{x } \Vert ^{2} ) F(d\mathbf{x }) < \infty \). So we obtain the following inequality

$$\begin{aligned} 0\le h(\mathbf{u })\le K U^{2}. \end{aligned}$$
(27)

It is easy to check that \( \langle C\mathbf{u }, \mathbf{u }\rangle = C_{sum}U^{2} \). Inserting this fact and (27) into (25) we get the following inequality for the absolute value of the characteristic function

$$\begin{aligned} \exp \bigg \{-\frac{(C_{sum}+K)U^{2}}{2n}\bigg \} \le |\phi _{n}(\mathbf{u })|\le \exp \bigg \{-\frac{C_{sum}U^{2}}{2n}\bigg \} \end{aligned}$$

Inserting the above inequality into (24), we get the required interval for \(U_{start}^{oracle}\), which ensures that \( U_{start}^{oracle}\sim \sqrt{n} \). This implies that \( U_{start}^{oracle} <U_{n}\) for n large enough. This concludes the proof. \(\square \)

Lemma 4.4

For \( U_{start}^{oracle}< U_{n} \) and n large enough, the stochastic error satisfies

$$\begin{aligned} s_{n}(U_{start}^{oracle})\le s_{n}(U_{n}) . \end{aligned}$$

Proof

It suffices to show that

$$\begin{aligned} \frac{s_{n}(U_{start}^{oracle})}{s_{n}(U_{n})}\le 1. \end{aligned}$$
(28)

By the form of \( s_{n}(U) \) in (22), it is easy to check that

$$\begin{aligned} \frac{s_{n}(U_{start}^{oracle})}{s_{n}(U_{n})} = \bigg (\frac{U_{n}}{U_{start}^{oracle}}\bigg )^{2}\frac{w(U_{n})}{w(U_{start}^{oracle})}\left|\frac{\phi _{n}(U_{n})}{\phi _{n}(U_{start}^{oracle})}\right|. \end{aligned}$$
(29)

By Lemma 4.3, we have \( \bigg (\frac{U_{n}}{U_{start}^{oracle}} \bigg )^{2}\le C_{0}\log n \) for some constant \( C_{0}>0 \). By Definition 2.2, we also know that w(U) is a decreasing function, which means that \( \frac{w(U_{n})}{w(U_{start}^{oracle})}\le 1 \). For the third term of (29) we have

$$\begin{aligned} \begin{aligned}&\left|\frac{\phi _{n}(U_{n})}{\phi _{n}(U_{start}^{oracle})}\right|= \exp \bigg \{\frac{1}{2n}\big (C_{sum}\left( (U_{start}^{oracle})^{2}- U_{n}^{2}\right) \big )\\&\quad +\frac{1}{2n}\big (h(U_{start}^{oracle})- h(U_{n})\big )\bigg \}.\\ \end{aligned} \end{aligned}$$
(30)

By (27) we have that \( h(U_{start}^{oracle}) -h(U_{n})\le h(U_{start}^{oracle}) \). We also get

$$\begin{aligned} (U_{start}^{oracle})^{2} - U_{n}^{2} \le n\bigg (\frac{2\log 2}{C_{sum}+K} - \frac{r-1}{M}\log n\bigg ). \end{aligned}$$

Substituting the above inequalities into (30) we obtain

$$\begin{aligned} \begin{aligned} \left|\frac{\phi _{n}(U_{n})}{\phi _{n}(U_{start}^{oracle})}\right|&\le \exp \bigg \{\frac{C_{sum} \log 2}{C_{sum}+ K}-C_{sum}\frac{r-1}{2M}\log n + \frac{K\log 2}{C_{sum} +K}\bigg \}\\&= \exp \bigg \{\log 2 - C_{sum}\frac{r-1}{2M}\log n\bigg \} \\&= \frac{2}{n^{\frac{C_{sum}(r-1)}{2M}}}. \end{aligned} \end{aligned}$$
(31)

Taking everything into consideration we get

$$\begin{aligned} \frac{s_{n}(U_{start}^{oracle})}{s_{n}(U_{n})} \le C_{0} \frac{\log n}{n^{\frac{C_{sum}(r-1)}{2M}}} \end{aligned}$$

which is smaller than one as \( n\rightarrow \infty \). The statement is proved. \(\square \)

A side product of the above analysis is the following corollary, which ensures that the upper bound of the stochastic error is always monotonically increasing over the desired interval.

Corollary 4.5

If we set

$$\begin{aligned} s_{n}^{*}(u) := \sup _{U_{start}^{oracle}\le v \le u}s_{n}(v), \end{aligned}$$
(32)

then \( s_{n}^{*}\) satisfies

$$\begin{aligned} s_{n}(U_{n}) = s_{n}^{*}(U_{n}). \end{aligned}$$

Proof

We define the sets \( {\mathcal {U}} = [U_{start}^{oracle}, U_{n}] \) and \( {\mathcal {S}} = \{s_{n}(U) : U \in [U_{start}^{oracle}, U_{n}]\} \). \( {\mathcal {U}} \) is a non-empty subset of \( {\mathbb {R}}\), and \( U_{n} \) is its least upper bound. By the continuity of the stochastic error bound on the interval \( [U_{start}^{oracle}, U_{n}] \) and the extreme value theorem, we get that

$$\begin{aligned} \sup {\mathcal {S}} = \sup _{U_{start}^{oracle}\le U \le U_{n}}s_{n}(U) = s_{n}(U_{n}), \end{aligned}$$

which concludes the proof. \(\square \)

Despite the fact that we used the (possibly) unknown theoretical characteristic function as a criterion for the oracle start of the Lepskiĭ procedure and constructed a monotonically increasing bound as desired, it is useful to secure a data-driven criterion as well. For this reason we propose the following definition.

Definition 4.6

For \( c \in (0,1] \), we define the criterion for the oracle start of U as follows

$$\begin{aligned} \widehat{U}_{start}^{oracle} := \inf \big \{U >0: |{\widehat{\phi }}_{n}(U)|\le c \big \}. \end{aligned}$$
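In practice, \( \widehat{U}_{start}^{oracle} \) can be approximated on a finite grid of frequencies; a minimal sketch, assuming the increments are stored as an \( n\times 2 \) array and using a hypothetical default \( c = 0.75 \in (1/2, 1] \), is given below.

```python
import numpy as np

def oracle_start(increments, U_grid, c=0.75):
    """Data-driven oracle start (Definition 4.6): the smallest U on the grid at
    which the modulus of the empirical characteristic function drops to c."""
    for U in sorted(U_grid):
        phi_hat = np.mean(np.exp(1j * (increments @ np.array([U, U]))))
        if abs(phi_hat) <= c:
            return U
    return max(U_grid)              # fallback if the threshold is never reached
```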

The last ingredient which remains to be proven is the following high probability bound, which will allow us to connect a data-driven choice for the oracle start of the Lepskiĭ procedure with the theoretical characteristic function.

Lemma 4.7

For \( c \in (1/2,1] \), choosing \( \widehat{U}_{start}^{oracle} \) as in Definition 4.6, the event \( \{|{\widehat{\phi }}_{n}(\widehat{U}_{start}^{oracle})|\le c\} \) holds with probability at least \( 1-\exp \big (-\frac{1}{8}\big (c-\frac{1}{2}\big )^{2}n\big ) \). In particular,

$$\begin{aligned} \lim _{n\rightarrow \infty } {\mathbb {P}}\big [|{\widehat{\phi }}_{n}\big (\widehat{U}_{start}^{oracle}\big )|\le c\big ] = 1. \end{aligned}$$

Proof

The boundedness condition of Hoeffding’s inequality is trivially satisfied, since \( |e^{i\langle \mathbf{u }, Y_{j} \rangle }|\le 1 \), where \( Y_{j} \) denote the Lévy increments. Moreover, at \( U = U_{start}^{oracle} \) we have \( |\phi _{n}(U_{start}^{oracle})|\le \frac{1}{2} \) by (24), so that \( |{\widehat{\phi }}_{n}(U)|>c \) implies \( |{\widehat{\phi }}_{n}(U)-\phi _{n}(U)|>c-\frac{1}{2} \). Applying Hoeffding’s inequality, we obtain

$$\begin{aligned} \begin{aligned} {\mathbb {P}}[|{\widehat{\phi }}_{n}(U)|>c]&\le {\mathbb {P}}\bigg [\frac{1}{n}\left|\sum _{j=1}^{n} \big (e^{i \langle \mathbf{u }, Y_{j}\rangle } - {\mathbb {E}}e^{i\langle \mathbf{u }, Y_{j}\rangle }\big )\right|>c-\frac{1}{2}\bigg ]\\&\le \exp \big (-\frac{1}{8}n\big (c-\frac{1}{2}\big )^{2}\big ). \end{aligned} \end{aligned}$$

Combining this with Definition 4.6 proves the statement. \(\square \)

5 Balancing principle when the stochastic error is data-dependent

In this section, we prove an upper bound for the best possible adaptive parameter, using a balancing principle inspired by the work of De Vito et al (2010) on adaptive kernel methods. The optimal choice \( U_{n} \) crucially depends on the unknown parameters (r, M). Using a Lepskiĭ-type rule as in Sect. 3.2, we construct a completely data-driven estimation procedure adapted to \( U \in {\mathcal {U}}\), where \( {\mathcal {U}} = [U_{start}^{oracle}, U_{max}] \). Our main result for the adaptive estimation shows that the Lepskiĭ estimator achieves almost the optimal rates.

In the following we denote by \( a(n) :=\exp \big (-\frac{1}{8}\big (c-\frac{1}{2}\big )^{2}n\big ) \). By (23), with high probability, at least \( 1-a(n) \), the upper bound for the stochastic error will be of the form

$$\begin{aligned} \tilde{s}_{n}(U) = \frac{2C\gamma (n)}{\theta (U)}\frac{(w(U))^{-1}}{|{\tilde{\phi }}_{n}(U)|}, \end{aligned}$$
(33)

where \( \theta (U) = U^{2} \), \( \gamma (n) = (n\log n)^{1/2} \) and \( 0 <w(U) \le 1 \). Further, the term d(U) is the deterministic error bound, which does not depend on data and is of the form

$$\begin{aligned} d(U) = \frac{M2^{r/2}}{U^{2-r}}, \end{aligned}$$
(34)

where \( r \in (1,2] \) is the co-jump activity index and M is from Definition 3.1. Recall that inequality (6) allows us to interchange with high probability between the (perhaps) unknown characteristic function and the empirical characteristic function. A direct consequence of the above observation is that we can interchange with high probability between the empirical bound \( \tilde{s}_{n}(U) \) and the theoretical bound \( s_{n}(U) \) for the stochastic error

$$\begin{aligned} \frac{1}{2} s_{n}(U)\le \tilde{s}_{n}(U)\le 3 s_{n}(U). \end{aligned}$$
(35)

This leads to \( s_{n}(U) + d(U) \le 2 \tilde{s}_{n}(U) + d(U)\).

Consequently, the estimation error bound is given by the sum of two competing terms with probability at least \( 1- \exp (-2n(c+1)^{2}) \), i.e.,

$$\begin{aligned} |\widehat{C}^{12}_{n}(U) - C^{12}|\le s_{n}(U) + d(U) \le 2\tilde{s}_{n}(U)+ d(U). \end{aligned}$$
(36)

The upper bound of (36) is the sum of a bias term which decreases in U and a stochastic term which increases in U, for \( U \in {\mathcal {U}} \). According to the balancing principle, the best possible adaptive parameter choice is found by solving the bias-variance-type decomposition (36), which implies that we have to balance the deterministic and the stochastic error. We let \( U_{bal} \) be the value which makes the contributions of the two terms equal, i.e. \( d(U_{bal}) = 2\tilde{s}_{n}(U_{bal}) \). We observe that the corresponding error estimate is, with probability at least \( 1 -a(n) \),

$$\begin{aligned} |\widehat{C}^{12}_{n,j}- C^{12}|\le 2d(U_{bal}) = 4\tilde{s}_{n}(U_{bal}), \end{aligned}$$
(37)

where \( 0< a(n) < 1 \) and \( U_{bal} \) is the best possible parameter. Let us now highlight the idea behind the balancing principle. It is clear, by the monotonicity of the stochastic and deterministic error, that

$$\begin{aligned} 2\tilde{s}_{n}(U_{bal})+ d(U_{bal})\le 2\min _{U}\{2\tilde{s}_{n}(U)+d(U)\}. \end{aligned}$$

If we choose \( U_{*}\le U_{bal} \):

$$\begin{aligned} 2\tilde{s}_{n}(U_{bal}) + d(U_{bal})\le 2 d(U_{bal})\le 2d(U_{*})\le 2\min _{U}\{2\tilde{s}_{n}(U)+d(U)\}. \end{aligned}$$

On the other hand, if we choose \( U_{*}\ge U_{bal} \):

$$\begin{aligned} \tilde{s}_{n}(U_{bal}) + d(U_{bal})\le 4 \tilde{s}_{n}(U_{bal})\le 4\tilde{s}_{n}(U_{*})\le 2\min _{U}\{2\tilde{s}_{n}(U)+d(U)\}. \end{aligned}$$

Driven by inequality (35), the strategy for the balancing principle will give us with high probability

$$\begin{aligned} \begin{aligned} \tilde{s}_{n}(U_{bal})&\le 2 \min _{U}\{2\tilde{s}_{n}(U)+d(U)\}\\&\le 2 \min _{U}\{6 s_{n}(U) +d(U)\}\\&\le 12 \min _{U}\{s_{n}(U)+d(U)\}\\&\le 12 s_{n}(U_{bal}). \end{aligned} \end{aligned}$$
(38)

The corresponding best parameter choice \( U_{bal} \) gives, with probability \( 1 - a(n) \), the rate

$$\begin{aligned} |\widehat{C}^{12}_{n, bal} - C^{12}|\le \frac{8C\gamma (n)}{\theta (U_{bal})}\frac{(w(U_{bal}))^{-1}}{{|{\tilde{\phi }}_{n}(U_{bal})|}}= 2d\left( U_{bal}\right) . \end{aligned}$$
(39)

The aim is to choose \( U_{bal} \) from the set:

$$\begin{aligned} {\mathcal {U}} := [U_{start}^{oracle}, U_{max}] \quad \text {with} \quad U_{start}^{oracle} \sim \sqrt{n}. \end{aligned}$$
(40)

To define a parameter strategy, we first consider a discretization for the possible values of \(U_{j} \), that is, an ordered sequence \( U_{j}\) such that the best value \( U_{bal} \) falls within the considered grid \( {\mathcal {U}} \). The balancing principle estimate for \( U_{bal}\) is defined via

$$\begin{aligned} U_{\widehat{j}} = \min \Bigg \{U_{j}: \forall U_{j}\le U_{k}, U_{k}\in (U_{start}^{oracle}, U_{\max }), |\widehat{C}^{12}_{n,j} - \widehat{C}^{12}_{n,k}|\le 6 \tilde{s}_{n}(U_{k}) \Bigg \}. \end{aligned}$$
(41)
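A minimal sketch of the selection rule (41), assuming that the estimators \( \widehat{C}^{12}_{n,j} \) and the data-driven bounds \( \tilde{s}_{n}(U_{j}) \) have been precomputed on the grid; it returns the smallest index j for which the pairwise comparisons with all larger indices succeed.

```python
def balancing_principle(estimators, s_tilde_vals):
    """Data-driven selection rule (41): the smallest grid index j such that
    |C_j - C_k| <= 6 * tilde{s}_n(U_k) for every k >= j.

    estimators   : list of hat{C}^{12}_{n,j} on the grid U_0 < ... < U_K
    s_tilde_vals : list of data-driven bounds tilde{s}_n(U_j) on the same grid
    """
    K = len(estimators) - 1
    for j in range(K + 1):
        if all(abs(estimators[j] - estimators[k]) <= 6 * s_tilde_vals[k]
               for k in range(j, K + 1)):
            return j
    return K
```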

The reasons why we expect this estimate to be sufficiently close to \( U_{bal} \) and why this estimate does not depend on the deterministic error, d, are better explained with the following argument. Observe that if we choose two indices \( \alpha , \beta \) such that \( U_{\alpha }\ge U_{\beta } \ge U_{bal} \), then with probability at least \( 1- a(n)\),

$$\begin{aligned} \begin{aligned} |\widehat{C}^{12}_{n,\alpha } - \widehat{C}^{12}_{n,\beta }|&\le |\widehat{C}^{12}_{n,\alpha } - C^{12}|+ |\widehat{C}^{12}_{n,\beta }- C^{12}|\\&\le 2\tilde{s}_{n}(U_{\alpha }) +d(U_{\alpha }) + 2\tilde{s}_{n}(U_{\beta }) + d(U_{\beta })\\&\le 3\tilde{s}_{n}(U_{\alpha }) + 3\tilde{s}_{n}(U_{\beta })\\&\le 6\tilde{s}_{n}(U_{\alpha }). \end{aligned} \end{aligned}$$
(42)

The intuition is that when such a condition is violated, we are close to the value at which the deterministic and stochastic errors contribute equally, which is \( U_{bal} \).

Theorem 3.5 shows that the value \( U_{\widehat{j}} \), given by the balancing principle (41), provides the same estimation error as \( U_{bal} \), up to a constant. Note that all the inequalities in the following proof are to be understood as holding with high probability. We can now prove the convergence rate for the adaptive estimator \( \widehat{C}^{12}_{n, \widehat{j}} \).

5.1 End of the proof of Theorem 3.5

Let us introduce the parameter choice \( U_{*} \)

$$\begin{aligned} U_{*} = \min \{U_{j}\in {\mathcal {U}}: d(U_{j}) < 2\tilde{s}_{n}(U_{j})\}. \end{aligned}$$

By the definition of \( U_{\widehat{j}} \), we conclude that \( U_{\widehat{j}} \le U_{*} \) and thus, by the triangle inequality,

$$\begin{aligned} \begin{aligned} |\widehat{C}_{n,\widehat{j}}^{12} -C^{12}|&\le |\widehat{C}^{12}_{n, \widehat{j}} - \widehat{C}^{12}_{n, *}|+ |\widehat{C}^{12}_{n,*} - C^{12}|\\&\le 6\tilde{s}_{n}(U_{*}) + d(U_{*}) + 2\tilde{s}_{n}(U_{*})\\&\le 9 \tilde{s}_{n}(U_{*}). \end{aligned} \end{aligned}$$

The first inequality holds due to (41). Finally, by the monotonicity of \( \tilde{s}_{n}(\cdot ) \) and the fact that \( U_{*} \le U_{bal} \), we get that \( \tilde{s}_{n}(U_{*}) \le \tilde{s}_{n}(U_{bal})\). Note that the above inequality is uniform with respect to \( U_{j} \), due to Lemma 2.10. The proof is now complete.

6 Numerical experiments

In this section we test the behavior of the covariance estimator in order to adapt the parameter U, i.e., the frequency used for estimating the covariance. This means that we first have to simulate a bivariate Lévy process on [0, 1]. We draw our observations from a process \( X_t = B_t + J_t\), where \( X_t \) is a superposition of a two-dimensional Brownian motion \( B_{t} \) and a two-dimensional jump process \( J_{t} \), whose components are driven by \( r_{i} \)-stable processes for \( i = 1,2 \) with \( r_{i}\in (0,2] \). \( X_t \) thus models a process with both diffusion and jump components. We assume a fixed covariance matrix C with true covariance \( C^{12} = 1 \). In each run of our simulation, we generate \( n = 1000 \) observations, corresponding to observations taken every 1/1000 over the time interval [0, 1], and \( U_{i} \in [0.1, 50]\) for \( i \in \{1, 2, \dots , 500\} \).
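A possible way to generate such synthetic data is sketched below: the Brownian part is drawn with a hypothetical covariance matrix whose off-diagonal entry is \( C^{12} = 1 \), and the jump parts are drawn componentwise from \( r_{i} \)-stable laws scaled by \( \Delta ^{1/r_{i}} \); the concrete matrix, the seed, and the use of scipy's levy_stable sampler are illustrative choices, not the exact simulation design of the experiments.

```python
import numpy as np
from scipy.stats import levy_stable

def simulate_increments(n=1000, C=((2.0, 1.0), (1.0, 2.0)), r=(0.5, 1.5), seed=0):
    """Simulate the n increments on [0, 1] of X_t = B_t + J_t: a bivariate Brownian
    part with (hypothetical) covariance C, plus componentwise r_i-stable jump parts."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n
    C = np.asarray(C)
    brownian = rng.multivariate_normal(mean=[0.0, 0.0], cov=C * dt, size=n)
    jumps = np.column_stack([
        dt ** (1.0 / r[i]) * levy_stable.rvs(alpha=r[i], beta=0.0, size=n,
                                             random_state=rng)
        for i in range(2)
    ])
    return brownian + jumps

if __name__ == "__main__":
    incs = simulate_increments()
    n, U = len(incs), 10.0
    # spectral estimate of C^{12} at a single frequency U on the diagonal
    phi_diag = np.mean(np.exp(1j * (incs @ np.array([U, U]))))
    phi_anti = np.mean(np.exp(1j * (incs @ np.array([U, -U]))))
    print(n / (2 * U ** 2) * (np.log(abs(phi_anti)) - np.log(abs(phi_diag))))
```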

We conduct several experiments for U, using different choices of the jump activity index. We start with jumps of finite variation, i.e., \( r_{i}\in [0.1, 0.9] \), and then continue with jumps of infinite variation, i.e., \( r_{i} \in [1.1, 1.8] \). In the following Figs. 2a, 3a, 4a, 5a, 6a, 7a, 8a, 9a, 10a, 11a, 12a, and 13a, we plot the empirical characteristic function \({\widehat{\phi }}_{n}(U_{i}) \), the real and positive part of \( {\widehat{\phi }}_{n}(\tilde{U}_{i}) \), \( \log |\phi _{n}(U_{i})|\), \( \log |\phi _{n}(\tilde{U}_{i})|\), and \( \log |\phi _{n}(\tilde{U}_{i})|- \log |\phi _{n}(U_{i})|\) against the adaptation parameter \( U_{i}\).

Figures 2b, 3b, 4b and 5b show that the estimator is consistent with the true value when \( U_{i} \) ranges from around 5 to 30. Recall that the true value is \( C^{12} = 1 \). As expected, the behavior of the estimator is quite erratic at the beginning and at the end of the interval, because the bias of the estimator is quite high there. In principle, we have found that the “optimal” stopping index is at \( U_{i} = 30\).

Fig. 2 a Indices of jump activity: \( r_{1} = 0.2, r_{2} = 0.1\), b Covariance estimator

Fig. 3 Left plots: empirical characteristic function with FV of jumps; Right plots: covariance estimator

Fig. 4 a Indices of jump activity: \( r_{1} = 0.5, r_{2} = 0.4 \), b Covariance estimator

Fig. 5 a Indices of jump activity: \( r_{1} = 0.6, r_{2} = 0.5 \), b Covariance estimator

Fig. 6 Left plots: empirical characteristic function with FV of jumps; Right plots: covariance estimator

Fig. 7 a Indices of jump activity: \( r_{1} = 0.8, r_{2} = 0.7 \), b Covariance estimator

Fig. 8 a Indices of jump activity: \( r_{1} = 0.9, r_{2} = 0.8 \), b Covariance estimator

Fig. 9 a Indices of jump activity: \( r_{1} = 1.0, r_{2} = 0.9 \), b Covariance estimator

Next, we simulate bivariate Lévy processes with at least one jump component of infinite variation. Figures 9b, 10b, 11b, 12b, and 13b show that the estimator is no longer consistent with the true value. Consequently, Lepskiĭ’s method cannot be applied, especially in the cases of Figs. 9a and 13a.

Fig. 10 a Indices of jump activity: \( r_{1} = 1.1, r_{2} = 1.0 \), b Covariance estimator

Fig. 11 a Indices of jump activity: \( r_{1} = 1.5, r_{2} = 0.5 \), b Covariance estimator

Fig. 12 Left plots: empirical characteristic function with one jump of IV; Right plot: covariance estimator

Fig. 13 Left plots: empirical characteristic function with IV jumps; Right plot: covariance estimator

Next, we present numerical experiments illustrating how Algorithm 1 can be approximately implemented for the stopping rule. To illustrate the performance of the method for \( \widehat{U}_{start}^{oracle} \) in (4.6), we proceed as follows. We fix \( r_{1} = 0.5, r_{2} = 1.5 \), so that at least one jump component is of infinite variation; hence the co-jump activity index is \( r = 1.5 \). In Fig. 14a, b, we observe that the estimator agrees with the true value \( C^{12} = 1 \) when \( \widehat{U}_{start}^{oracle} \) is chosen as in (4.6). In contrast to the previous figures, where the behavior of the estimator is quite erratic for small values of U, the estimator is close to the true value already at the beginning of the procedure.

Fig. 14 Vertical lines: purple-dashed is \( \widehat{U}_{start}^{oracle} \), orange-dashed is \( U_{bal} \). Blue curves: adaptive estimator \( \widehat{C}^{12}_{n, j} \). (Color figure online)

7 Discussion

In this paper, we address the adaptive estimation of the covariance of a two-dimensional Lévy process. We extend the minimax results obtained in Papagiannouli (2020), where the class of estimators requires prior knowledge of the process parameters controlling the tuning parameter \( U_{n} = c(r,M)\sqrt{n\log n} \). We devise a fully data-dependent method based on a variant of Lepskiĭ’s principle, which balances the bias and the stochastic error of the estimator. We show in Theorem 3.5 that the adaptive estimator achieves the minimax rates of convergence up to a logarithmic factor. Such a logarithmic gap between the minimax and adaptive rates is well known in the literature.

Comments on the stochastic error. The construction of an adaptive estimator is complicated in the current context by the irregular behavior of the stochastic error. The bound (20) ensures that the stochastic error of the estimator is bounded from above by the truncated empirical characteristic function, up to a logarithmic factor and a multiplicative constant C. This means that the bound depends on the observable quantity \( |{\tilde{\phi }}_{n}(\mathbf{u })|\) rather than on the unknown characteristic function \( \phi _{n}(\mathbf{u }) \). With high probability, inequality (6) allows us to interchange the unknown characteristic function and the truncated empirical characteristic function, which is data-dependent. A direct consequence of this observation is that we can interchange, with high probability, the empirical bound \( \tilde{s}_{n}(U) \) and the theoretical bound \( s_{n}(U) \) for the stochastic error. Such a procedure was, to the best of our knowledge, previously unavailable in the literature on nonparametric estimation for multi-dimensional Lévy processes.

In Fig. 1, an irregular behavior of the bound for the stochastic error can be observed for small values of U, caused by the empirical characteristic function appearing in the denominator. To overcome this obstacle, we determine an oracle starting point for U, which ensures a monotonically increasing bound for the stochastic error.

Comments on the balancing principle. The construction of a monotonically increasing bound for the stochastic error allows us to apply Lepskiĭ’s principle to the adaptive estimator \( \widehat{C}^{12}_{n,{\hat{j}}} \). As a rule, we use the empirical stochastic error, so that our procedure is completely data-dependent. In this way, we avoid using the deterministic upper bound, which depends on the unknown co-jump activity index r. Theorem 3.5 shows that the balancing principle adaptively achieves the best possible rate, which is near-optimal in the minimax sense.
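A schematic implementation of such a balancing step is sketched below. The inputs C12_hat (the estimator evaluated along the grid) and s_tilde (the monotonically increasing empirical bound for the stochastic error) are assumed to be precomputed, and the pairwise comparison rule with constant kappa is a generic Lepskiĭ-type variant, not a verbatim transcription of Algorithm 1.

```python
import numpy as np

def lepskii_choice(U_grid, C12_hat, s_tilde, kappa=3.0):
    """Generic Lepskii-type balancing rule (illustrative sketch).

    U_grid  : increasing grid of candidate values U_j
    C12_hat : covariance estimates C12_hat(U_j) along the grid
    s_tilde : monotonically increasing empirical stochastic-error bounds
    kappa   : comparison constant (illustrative choice)

    Returns the smallest index j whose estimate is compatible with all
    estimates at larger U, i.e. |C12_hat[j] - C12_hat[k]| <= kappa *
    (s_tilde[j] + s_tilde[k]) for every k > j.
    """
    J = len(U_grid)
    for j in range(J):
        compatible = all(
            abs(C12_hat[j] - C12_hat[k]) <= kappa * (s_tilde[j] + s_tilde[k])
            for k in range(j + 1, J)
        )
        if compatible:
            return j
    return J - 1

# Example usage (with arrays computed elsewhere):
# j_hat = lepskii_choice(U_grid, C12_curve, s_tilde)
# U_hat = U_grid[j_hat]
```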

Comments on the proofs. In Sect. 8 we prove the results of Sect. 2 concerning the uniform control of the deviation of the empirical characteristic function on the diagonal of \( {\mathbb {R}}^{2}\). Employing chaining arguments, we prove uniform convergence of the normalized empirical characteristic function. We use concentration inequalities, such as Talagrand’s inequality, to obtain uniform control of the empirical characteristic function on a countable set. Finally, we derive a uniform upper bound for the truncated empirical characteristic function after introducing favorable sets for the truncated estimator. The main difficulty here is to define the events on which we can interchange, with high probability, the truncated and the empirical characteristic functions.

8 Proofs for Section 2

In this section, we provide the proofs of the results presented in Sect. 2. The proof of Theorem 2.3 follows a chaining argument for empirical processes. We therefore recall the following definitions from empirical process theory.

Definition 7.1

We consider measurable functions \( f,g:{\mathcal {F}}\rightarrow {\mathbb {R}} \). For two such functions f, g we introduce the “bracket” notation:

$$\begin{aligned}{}[f, g]:= \big \{ h:{\mathcal {F}}\rightarrow {\mathbb {R}} \quad \text{ such } \text{ that }\quad f\le h\le g \big \}. \end{aligned}$$
(43)

Definition 7.2

By the bracketing entropy number \( N_{[ \cdot ]}(\epsilon , {\mathbb {G}})\) of a class \( {\mathbb {G}}\) we mean the minimal number N for which there exist functions \( f_{1},\dots , f_{N} \) and \( g_{1}, \dots , g_{N}\) such that

$$\begin{aligned} {\mathbb {G}}\subset \bigcup _{i= 1}^{N}[f_{i}, g_{i}] \qquad \text{ and } \quad \int |f_{i}-g_{i}|^{2}d{\mathbb {P}}\le \epsilon ^{2}, \quad i = 1,2,\dots , N. \end{aligned}$$

\( N_{[ \cdot ]}(\epsilon , {\mathbb {G}})\) is the minimal number of \( L^{2}({\mathbb {P}}) \)-balls of radius \( \epsilon \) which are needed to cover \( {\mathbb {G}}\). The class \( {\mathbb {G}}\) is called bracketing compact if \(N_{[ \cdot ]}(\epsilon , {\mathbb {G}})<\infty \) for any \( \epsilon >0 \). The entropy integral is defined by

$$\begin{aligned} J_{[ \cdot ]}(\delta , {\mathbb {G}}):= \int _{0}^{\delta }\big ( \log (N_{[ \cdot ]}(\epsilon , {\mathbb {G}}))\big )^{1/2} d\epsilon . \end{aligned}$$

The convergence of the integral depends on the size of the bracketing numbers for \( \epsilon \rightarrow 0 \). Finally, a function \( F\ge 0\) is called an envelope function for \( {\mathbb {G}}\), if

$$\begin{aligned} \forall f \in {\mathbb {G}}: |f|\le F. \end{aligned}$$
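As a toy illustration of these notions (an aside with a deliberately simple class, not used in the sequel), consider \( {\mathbb {G}}_{0}=\{x\mapsto \sin (ux): u\in [0,1]\} \) on \( [0,1] \) equipped with the Lebesgue measure. Since

$$\begin{aligned} |\sin (ux)-\sin (u_{j}x)|\le |u-u_{j}|, \qquad x\in [0,1], \end{aligned}$$

the brackets \( [\sin (u_{j}x)-\epsilon ,\, \sin (u_{j}x)+\epsilon ] \) with grid points \( u_{j}=j\epsilon \), \( j=0,\dots ,\lceil 1/\epsilon \rceil \), cover \( {\mathbb {G}}_{0} \) and have \( L^{2} \)-size \( 2\epsilon \). Hence \( N_{[\cdot ]}(2\epsilon ,{\mathbb {G}}_{0})\le \lceil 1/\epsilon \rceil +1 \), so \( \log N_{[\cdot ]}(\epsilon ,{\mathbb {G}}_{0})=O(\log (1/\epsilon )) \) and the entropy integral is finite, while \( F\equiv 1 \) is an envelope function. The proof of Theorem 2.3 below follows the same pattern for the weighted trigonometric class \( {\mathbb {G}}\), with an additional truncation to \( [-M,M]^{2} \) to control the tails.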

8.1 Proof of Theorem 2.3

We decompose \( \sqrt{n}({\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u }) )\) into its real and imaginary parts,

$$\begin{aligned} \begin{aligned}&\textsf {Re}(\sqrt{n}({\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u }) )) := n^{-1/2}\sum _{t=1}^{n}\big (\cos (\langle \mathbf{u }, X_{t}\rangle )- {\mathbb {E}}\cos (\langle \mathbf{u }, X_{1}\rangle )\big ),\\&\textsf {Im}(\sqrt{n}({\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u }))) := n^{-1/2}\sum _{t=1}^{n}\big (\sin (\langle \mathbf{u }, X_{t}\rangle )- {\mathbb {E}}\sin (\langle \mathbf{u }, X_{1}\rangle )\big ). \end{aligned} \end{aligned}$$

We consider the class \( {\mathbb {G}}\), which collects the cosine and sine parts of the weighted complex exponentials,

$$\begin{aligned} {\mathbb {G}}= \Big \{\mathbf{x }\rightarrow w(U)\cos (\langle \mathbf{u }, \mathbf{x }\rangle )|\mathbf{u }\in {\mathbb {R}}^{2}\Big \} \cup \Big \{\mathbf{x }\rightarrow w(U)\sin (\langle \mathbf{u }, \mathbf{x }\rangle )|\mathbf{u }\in {\mathbb {R}}^{2}\Big \}. \end{aligned}$$

An application of Corollary 19.35 in Van der Vaart (2000) gives

$$\begin{aligned} {\mathbb {E}}\Vert \sqrt{n}({\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u }) )\Vert _{L_{\infty }(w)} \le C J_{[\cdot ]}\big ({\mathbb {E}}[F^{2}(X_{1})], {\mathbb {G}}, L^{2}({\mathbb {P}})\big ), \end{aligned}$$
(44)

where \( F = 1\) is an envelope function for \( {\mathbb {G}}\). It remains to prove that the bracketing integral on the right-hand side of (44) is bounded. We need to cover \( {\mathbb {G}}\) with brackets such that

$$\begin{aligned} {\mathbb {G}}\subset \bigcup _{i= 1}^{N}\Big \{[g_{i}^{-}, g_{i}^{+}]: \int |g_{i}^{+}-g_{i}^{-}|^{2}d{\mathbb {P}}\le \epsilon ^{2}\Big \} \end{aligned}$$

and determine N, the minimal number of such brackets needed to cover \( {\mathbb {G}}\). Inspired by Yukich (1985), we characterize the convergence of \( C_{n}(\mathbf{u }) \) in terms of the tail behavior of \( {\mathbb {P}}\). For every \( \epsilon >0 \), we set

$$\begin{aligned} M := M(\epsilon ):=\inf \{m>0: {\mathbb {P}}(|X_{1}|>m)\le \epsilon ^{2}\}. \end{aligned}$$
(45)

Furthermore, for all j, define the bracket functions for \( \mathbf{x } = (x_{1}, x_{2}) \)

$$\begin{aligned} \begin{aligned}&g_{j}^{\pm }(\mathbf{x }) = \big ( w(U_{j})\cos (\langle \mathbf{u }_{j}, \mathbf{x } \rangle ) \pm \epsilon \big )\mathbb {1} _{[-M, M]}(\mathbf{x }) \pm \Vert w\Vert _{\infty } \mathbb {1} _{[-M, M]^{\complement }}(\mathbf{x }),\\&h_{j}^{\pm }(\mathbf{x }) = \big ( w(U_{j})\sin (\langle \mathbf{u }_{j}, \mathbf{x } \rangle ) \pm \epsilon \big )\mathbb {1} _{[-M, M]}(\mathbf{x }) \pm \Vert w\Vert _{\infty } \mathbb {1} _{[-M, M]^{\complement }}(\mathbf{x }), \end{aligned} \end{aligned}$$

where \( \mathbf{u }_{j} = (U_{j}, U_{j}) \) and \( \mathbf{x } = (x_{1}, x_{2}) \). We obtain for the size of the brackets that

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\big [ |g_{i}^{+}(X_{1})-g_{i}^{-}(X_{1})|^{2}\big ]&\le {\mathbb {E}}\big [ |2 \epsilon \mathbb {1}_{[-M, M]^{2}}(X_{1}) +2\Vert w\Vert _{\infty }\mathbb {1}_{([-M, M]^{2})^{\complement }}(X_{1})|^{2} \big ]\\&\le 4 \epsilon ^{2}(1 +\Vert w\Vert _{\infty }^{2}). \end{aligned} \end{aligned}$$

An analogous argument gives

$$\begin{aligned} {\mathbb {E}}\big [ |h_{i}^{+}(X_{1})-h_{i}^{-}(X_{1})|^{2}\big ] \le 4 \epsilon ^{2}(1 +\Vert w\Vert _{\infty }^{2}). \end{aligned}$$

It remains to choose the grid points \(U_{j} \) in such a way that the brackets cover \( {\mathbb {G}}\). We consider an arbitrary \(U \in {\mathbb {R}}\) and a suitable grid point \( U_{j} \). For the function \( g_{U}(\cdot ):= w(U) \cos (U(x_{1}+x_{2})) \in {\mathbb {G}}\) to be contained in the bracket \([g^{-}_{j}, g_{j}^{+}] \), we have to ensure

$$\begin{aligned} |w(U)\cos (\langle \mathbf{u }, \mathbf{x } \rangle ) - w(U_{j})\cos (\langle \mathbf{u }_{j}, \mathbf{x } \rangle )|\le \epsilon , \quad {\forall \mathbf{x } \in [-M, M]^{2}}. \end{aligned}$$
(46)

With the estimate

$$\begin{aligned} \begin{aligned}&|w(U)\cos (\langle \mathbf{u }, \mathbf{x } \rangle ) - w(U_{j})\cos (\langle \mathbf{u }_{j}, \mathbf{x } \rangle )|\\&\quad \le \big (w(U) +w(U_{j})\big ) \wedge \\&\big (|w(U)\cos (\langle \mathbf{u }, \mathbf{x }\rangle ) - w(U) \cos (\langle \mathbf{u }_{j}, \mathbf{x }\rangle )|\mathbb {1}_{[-M, M]^{2}}(\mathbf{x }) \\&\qquad +|w(U)\cos (\langle \mathbf{u }_{j}, \mathbf{x }\rangle ) - w(U_{j}) \cos (\langle \mathbf{u }_{j}, \mathbf{x }\rangle )|\mathbb {1}_{[-M, M]^{2}}(\mathbf{x }) \big )\\&\quad \le \big (w(U) +w(U_{j})\big ) \wedge \big ( 2M \Vert w\Vert _{\infty }|U - U_{j}|+ Lip(w)|U- U_{j}|\big ), \end{aligned} \end{aligned}$$

where Lip(w) is the Lipschitz constant of the weight function w. In the last inequality, we used the estimate \( |\cos (\langle \mathbf{u }, \mathbf{x }\rangle ) - \cos (\langle \mathbf{u }_{j}, \mathbf{x } \rangle ) |\le 2|\sin ((x_{1}+x_{2})(U - U_{j})/2)|\le 2M |U- U_{j} |\). (46) is seen to hold for any \( \mathbf{u }\in {\mathbb {R}}^{2}\), when

$$\begin{aligned} \min \big \{ w(U) +w(U_{j}), |U - U_{j}|\big (Lip(w)+ 2M\Vert w\Vert _{\infty }\big )\big \} \le \epsilon . \end{aligned}$$

Consequently, we choose the grid points \( U_{j} \) such that

$$\begin{aligned} \sup _{1\le j\le N} |U_{j} - U_{j-1}|\le \frac{2 \epsilon }{ Lip(w)+ 2M\Vert w\Vert _{\infty } } \end{aligned}$$
(47)

and

$$\begin{aligned} U_{j} = \frac{j\epsilon }{Lip(w)+2M\Vert w\Vert _{\infty }} \end{aligned}$$
(48)

for \( |j| \le J(\epsilon ) \), where \( J(\epsilon ) \) is the smallest integer such that \( w(U_{J(\epsilon )})\le \frac{\epsilon }{2} \); for U beyond the last grid point the term \( w(U) +w(U_{j})\le \epsilon \) in the minimum above guarantees coverage. In this way \( {\mathbb {G}}\) is covered by \( L^{2}({\mathbb {P}}) \)-balls of radius \( \epsilon \). This yields

$$\begin{aligned} J(\epsilon ) \le \frac{2U(\epsilon )(Lip(w)+2M\Vert w\Vert _{\infty })}{\epsilon }, \end{aligned}$$

with \( U(\epsilon )\le U_{J(\epsilon )} \), where

$$\begin{aligned} U(\epsilon ) := \inf \Big \{ U>0: w(U)\le \frac{\epsilon }{2} \Big \} = O(\exp (\epsilon ^{-(1+1/2)^{-1}})). \end{aligned}$$

Therefore, the minimal number of \( L^{2}({\mathbb {P}}) \)-balls of radius \( \epsilon \) satisfies \( N_{[ \cdot ]}(\epsilon , {\mathbb {G}})\le 2(2J(\epsilon )+1)\). The generalized Markov inequality yields that

$$\begin{aligned} M(\epsilon ) \le \bigg (\frac{{\mathbb {E}}|X_{1}|^{2+\gamma }}{\epsilon ^{2}}\bigg ) ^{1/\gamma } = O(\epsilon ^{-2/\gamma }). \end{aligned}$$

The bracketing entropy number satisfies

$$\begin{aligned} \begin{aligned} \log (N_{[\cdot ]}(\epsilon , {\mathbb {G}}))&\le \log (U(\epsilon ) ) + \log \bigg ( \frac{Lip(w) + 2M\Vert w\Vert _{\infty }}{\epsilon } \bigg )\\&= O(\epsilon ^{-(1+1/2)^{-1}} + \log (\epsilon ^{-1-2/\gamma }))= O(\epsilon ^{-(1+1/2)^{-1}}). \end{aligned} \end{aligned}$$
(49)

Thus, we have shown that

$$\begin{aligned} J_{[\cdot ]}({\mathbb {E}}[F^{2}(X_{1})], {\mathbb {G}}, L^{2}({\mathbb {P}}))= \int _{0}^{1} \sqrt{\log (N_{[\cdot ]}(\epsilon , {\mathbb {G}}))} d\epsilon <\infty . \end{aligned}$$

This completes the proof.\(\square \)

8.2 Proof of Lemma 2.4.

The proof consists in checking the assumptions of Lemma A.2. We write \( X_{j}^{(\mathbf{u })} := e^{i\langle \mathbf{u }, Y_{j}\rangle }\), where the \( Y_{j} \) are the Lévy increments. We trivially have

$$\begin{aligned} \sup _{\mathbf{u }\in {\mathcal {A}}} \textsf {Var}(X_{1}^{(\mathbf{u })}):=\sup _{\mathbf{u }\in {\mathcal {A}}}\textsf {Var}(e^{i\langle \mathbf{u }, Y_{1}\rangle }) \le 1 \quad \text{ and } \quad \sup _{\mathbf{u }\in {\mathcal {A}}}|X_{1}^{(\mathbf{u })}|\le 1. \end{aligned}$$

We set \( S_{n}^{(\mathbf{u })} := n^{-1}({\widehat{\phi }}_{n}(\mathbf{u })- \phi _{n}(\mathbf{u })) \). By Theorem 2.3, there exists a positive constant \( C>0 \) such that

$$\begin{aligned} {\mathbb {E}}\bigg [\sup _{\mathbf{u }\in {\mathcal {A}}}|{\widehat{\phi }}_{n}(\mathbf{u })-\phi _{n}(\mathbf{u })|\bigg ]\le Cn^{-1/2}(w(U))^{-1}. \end{aligned}$$

Therefore, we can apply Talagrand’s inequality with \( R = 1 \) and \( v^{2}=1\). The claim now follows from Lemma A.2. \(\square \)

8.3 Proof of Lemma 2.5.

We first prove the statement on the countable set of rational numbers; by continuity of the characteristic function and of w, the result carries over to the whole real line.

By Lemma 2.4 and setting

$$\begin{aligned} \kappa := t(\log n)^{1/2}n^{-1/2} - (1+\epsilon )Cn^{-1/2}, \end{aligned}$$
(50)

for some \( \epsilon >0 \), we have

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\bigg [\exists q \in {\mathbb {Q}}: |{\widehat{\phi }}_{n}(q)-\phi _{n}(q)|\ge t(\log n)^{1/2}(w(q))^{-1}n^{-1/2}\bigg ]\\&\quad \le {\mathbb {P}}\bigg [\sup _{q\in {\mathbb {Q}}} |{\widehat{\phi }}_{n}(q)-\phi _{n}(q)|\ge t(\log n)^{1/2}n^{-1/2}\bigg ]\\&\quad \le {\mathbb {P}}\bigg [\sup _{q\in {\mathbb {Q}}} |{\widehat{\phi }}_{n}(q)-\phi _{n}(q)|\ge (1+\epsilon ){\mathbb {E}}\big [\sup _{q\in {\mathbb {Q}}}|{\widehat{\phi }}_{n}(q)-\phi _{n}(q)|\big ] +\kappa \bigg ]\\&\quad \le 2\exp \bigg (-n\bigg (\frac{\kappa ^{2}}{c_{1}}\wedge \frac{\kappa }{c_{2}}\bigg )\bigg ). \end{aligned} \end{aligned}$$
(51)

By definition of \( \kappa \) and for some constant C large enough we get

$$\begin{aligned} \begin{aligned}&2\exp \bigg (-n\bigg (\frac{\kappa ^{2}}{c_{1}}\wedge \frac{\kappa }{c_{2}}\bigg )\bigg )\\&\quad =2\exp \bigg (-\frac{\big (t(\log n)^{1/2}-(1+\epsilon )C\big )^{2}}{c_{1}}\bigg )\\&\quad \vee 2 \exp \bigg (-\frac{n^{1/2}\big (t(\log n)^{1/2}-(1+\epsilon )C\big )}{c_{2}}\bigg )\\&\quad \le C\exp \bigg (-\frac{(t-\beta )^{2}}{c_{1}}\log n\bigg )\\&\quad = C n^{-\frac{(t-\beta )^{2}}{c_{1}}}. \end{aligned} \end{aligned}$$
(52)

By the continuity of the characteristic function, we extend the above result from the rational numbers to the real line. This completes the proof. \(\square \)

8.4 Proof of Lemma 2.6

The claim follows using Lemma 2.5 and the choice of \( \kappa \ge 4(\sqrt{p c_{1}}+\beta )\), where \( \beta \) and \( c_{1} \) are the constants from Lemma 2.5. In particular, we have

$$\begin{aligned} {\mathbb {P}}\bigg [{\mathcal {E}}^{\complement }\bigg ]\le Cn^{-\frac{(\kappa /4- \beta )^{2}}{c_{1}}}\le C n^{-p}. \end{aligned}$$
(53)

Using the same argument we get \( {\mathbb {P}}\bigg [\tilde{{\mathcal {E}}}^{\complement }\bigg ]\le Cn^{-\frac{(\kappa /4- \beta )^{2}}{c_{1}}}\le C n^{-p} \). This proves the claim. \(\square \)

8.5 Proof of Lemma 2.9

We consider the following partition of the diagonal set \( {\mathcal {A}} \), \( {\mathcal {A}} = {\mathcal {A}}_{1}\cup {\mathcal {A}}_{2}\), with

$$\begin{aligned}&{\mathcal {A}}_{1}= \bigg \{\mathbf{u } \in {\mathcal {A}}: |\phi _{n}(\mathbf{u })|\le \frac{3\kappa }{4}\bigg (\frac{\log n}{n}\bigg )^{1/2}(w(U))^{-1}\bigg \}, \end{aligned}$$
(54)
$$\begin{aligned}&{\mathcal {A}}_{2}= \bigg \{\mathbf{u } \in {\mathcal {A}}: |\phi _{n}(\mathbf{u })|> \frac{3\kappa }{4}\bigg (\frac{\log n}{n}\bigg )^{1/2}(w(U))^{-1}\bigg \}, \end{aligned}$$
(55)

We analyze the deviation of the truncated estimator from the true one on each of the aforementioned sets, intersected with the event \( {\mathcal {E}} \). The event \( {\mathcal {E}}^{\complement } \) is negligible by Lemma 2.6, so it is enough to consider the event \( {\mathcal {E}} \). First, we consider \( {\mathcal {A}}_{1}\). For arbitrary \( \mathbf{u } \in {\mathcal {A}}_{1} \), we get

$$\begin{aligned} \begin{aligned}&\left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })} - \frac{1}{\phi _{n}(\mathbf{u })}\right|^{2} \\&=\frac{|{\tilde{\phi }}_{n}(\mathbf{u })-\phi _{n}(\mathbf{u })|^{2}}{|{\tilde{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}}\mathbb {1}\bigg (\bigg \{|{\widehat{\phi }}_{n}(\mathbf{u })|>\frac{\kappa }{2}\bigg (\frac{\log n}{n}\bigg )^{1/2}(w(U))^{-1} \bigg \}\bigg )\\&\qquad +\frac{|{\tilde{\phi }}_{n}(\mathbf{u })-\phi _{n}(\mathbf{u })|^{2}}{|{\tilde{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}}\mathbb {1}\bigg (\bigg \{|{\widehat{\phi }}_{n}(\mathbf{u })|\le \frac{\kappa }{2}\bigg (\frac{\log n}{n}\bigg )^{1/2}(w(U))^{-1} \bigg \}\bigg )\\&\quad := A_{1} +A_{2}. \end{aligned} \end{aligned}$$
(56)

We bound the quantities \( A_{1} \) and \(A_{2} \) separately. We start with the first term of the sum (56), namely \( A_{1} \). On the set \(\bigg \{|{\widehat{\phi }}_{n}(\mathbf{u })|>\frac{\kappa }{2}\bigg (\frac{\log n}{n}\bigg )^{1/2}(w(U))^{-1} \bigg \}\), Definition 2.8 gives \( |{\tilde{\phi }}_{n}(\mathbf{u })|= |{\widehat{\phi }}_{n}(\mathbf{u })|\). On the one hand, by Lemma 2.6, we have

$$\begin{aligned}&\frac{|{\tilde{\phi }}_{n}(\mathbf{u })-\phi _{n}(\mathbf{u })|^{2}}{|{\tilde{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}}\nonumber \\&\quad \le \frac{\frac{\kappa ^{2}}{4^{2}}\frac{\log n}{n}(w(U))^{-2}}{|{\widehat{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}}\le \frac{1}{4}\frac{1}{|\phi _{n}(\mathbf{u })|^{2}}. \end{aligned}$$
(57)

We observe that using (54), we get

$$\begin{aligned}&\frac{1}{|{\widehat{\phi }}_{n}(\mathbf{u })|}\le \frac{2}{\kappa \big (\frac{\log n}{n}\big )^{1/2} (w(U))^{-1}}\le \frac{3}{2}\frac{1}{|\phi _{n}(\mathbf{u })|}. \end{aligned}$$
(58)

On the other hand, inserting (58) into \( A_{1} \) yields

$$\begin{aligned}&\frac{|{\tilde{\phi }}_{n}(\mathbf{u })-\phi _{n}(\mathbf{u })|^{2}}{|{\tilde{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}} = \frac{|{\widehat{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u })|^{2}}{|{\widehat{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}}\nonumber \\&\quad \le \frac{9 \kappa ^{2}}{16}\frac{\log n (w(U))^{-2}n^{-1}}{|\phi _{n}(\mathbf{u })|^{4}}. \end{aligned}$$
(59)

Combining (57) with (59), we get

$$\begin{aligned} A_{1}\le \frac{9\kappa ^{2}}{16}\frac{\log n (w(U))^{-2}n^{-1}}{|\phi _{n}(\mathbf{u })|^{4}}\wedge \frac{1}{4}\frac{1}{|\phi _{n}(\mathbf{u })|^{2}}. \end{aligned}$$
(60)

Next, we study the second term of the sum (56), namely \( A_{2} \). On the set \( \bigg \{|{\widehat{\phi }}_{n}(\mathbf{u })|\le \frac{\kappa }{2}\bigg (\frac{\log n}{n}\bigg )^{1/2}(w(U))^{-1} \bigg \} \), Definition 2.8 gives \( |{\tilde{\phi }}_{n}(\mathbf{u })|= \frac{\kappa }{2}(\frac{\log n}{n})^{1/2}w(U)^{-1} \). This yields

$$\begin{aligned} \begin{aligned} A_{2}&\le \frac{|{\tilde{\phi }}_{n}(\mathbf{u }) - \phi _{n}(\mathbf{u })|^{2}}{|{\tilde{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}}\le \frac{\left|\frac{3\kappa }{4}\big (\frac{\log n }{n}\big )^{1/2}(w(U))^{-1}-\frac{\kappa }{2}\big (\frac{\log n }{n}\big )^{1/2}(w(U))^{-1}\right|^{2}}{|{\tilde{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}}\\&\le \frac{1}{4}\frac{1}{|\phi _{n}(\mathbf{u })|^{2}}. \end{aligned} \end{aligned}$$
(61)

Then, (60) and (61) imply that

$$\begin{aligned} A_{1} + A_{2}\le \frac{9\kappa ^{2}}{16}\frac{\log n (w(U))^{-2}n^{-1}}{|\phi _{n}(\mathbf{u })|^{4}}\wedge \frac{1}{4}\frac{1}{|\phi _{n}(\mathbf{u })|^{2}}. \end{aligned}$$
(62)

The last ingredient is to consider the set \( {\mathcal {A}}_{2} \). Using (55), it holds that

$$\begin{aligned} |{\widehat{\phi }}_{n}(\mathbf{u })|\ge \left||\phi _{n}(\mathbf{u })|- |{\widehat{\phi }}_{n}(\mathbf{u }) -\phi _{n}(\mathbf{u })|\right|\ge \frac{\kappa }{2} \bigg (\frac{\log n}{n}\bigg )^{1/2}(w(U))^{-1}. \end{aligned}$$
(63)

By Definition 2.8, the above inequality implies that \( |{\tilde{\phi }}_{n}(\mathbf{u })|= |{\widehat{\phi }}_{n}(\mathbf{u })|\). On the one hand we have

$$\begin{aligned} \begin{aligned} \left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })} - \frac{1}{\phi _{n}(\mathbf{u })} \right|^{2}&= \frac{|{\tilde{\phi }}_{n}(\mathbf{u })-\phi _{n}(\mathbf{u })|^{2}}{|{\tilde{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}}\\&= \frac{|{\widehat{\phi }}_{n}(\mathbf{u })-\phi _{n}(\mathbf{u })|^{2}}{|{\widehat{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}}\\&\le \frac{1}{4}\frac{1}{|\phi _{n}(\mathbf{u })|^{2}}. \end{aligned} \end{aligned}$$
(64)

On the other hand, Lemma 2.6 and (55) give

$$\begin{aligned} \begin{aligned} |{\widehat{\phi }}_{n}(\mathbf{u })|&\ge \left||\phi _{n}(\mathbf{u })|- |{\widehat{\phi }}_{n}(\mathbf{u }) -\phi _{n}(\mathbf{u })|\right|\\&\ge |\phi _{n}(\mathbf{u })|- \frac{\kappa }{4}\bigg (\frac{\log n}{n}\bigg )^{1/2}(w(U))^{-1}\\&\ge \frac{2}{3}|\phi _{n}(\mathbf{u })|. \end{aligned} \end{aligned}$$
(65)

Consequently, by (65)

$$\begin{aligned} \begin{aligned} \frac{|{\widehat{\phi }}_{n}(\mathbf{u })-\phi _{n}(\mathbf{u })|^{2}}{|{\widehat{\phi }}_{n}(\mathbf{u })|^{2}|\phi _{n}(\mathbf{u })|^{2}}&\le \frac{\frac{\kappa ^{2}}{4^{2}}\big (\frac{\log n }{n}\big )(w(U))^{-2}}{\frac{4}{9}|\phi _{n}(\mathbf{u })|^{4}}\\&\le \frac{9\kappa ^{2}}{4^{3}}\frac{\big (\frac{\log n }{n}\big )(w(U))^{-2}}{|\phi _{n}(\mathbf{u })|^{4}}, \end{aligned} \end{aligned}$$
(66)

which concludes the proof. \(\square \)

8.6 Proof of Lemma 2.10

To derive the desired upper bound, we distinguish between the two events E and \( E^{\complement } \), defined as in Lemma 2.6. We write

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\bigg [\sup _{\mathbf{u }\in {\mathcal {A}}}\left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })}- \frac{1}{\phi _{n}( \mathbf{u })}\right|^{2}\bigg ]&= {\mathbb {E}}\bigg [\sup _{\mathbf{u }\in {\mathcal {A}}}\left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })}- \frac{1}{\phi _{n}( \mathbf{u })}\right|^{2}\mathbb {1}(E)\bigg ]\\&\quad + {\mathbb {E}}\bigg [\sup _{\mathbf{u }\in {\mathcal {A}}}\left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })}- \frac{1}{\phi _{n}( \mathbf{u })}\right|^{2}\mathbb {1}(E^{\complement })\bigg ]. \end{aligned} \end{aligned}$$
(67)

First, we establish an upper bound for the first term on the right-hand side of (67). On the event E, Lemma 2.9 yields

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}\bigg [\sup _{\mathbf{u }\in {\mathcal {A}}}\left|\frac{1}{{\tilde{\phi }}_{n}(\mathbf{u })}- \frac{1}{\phi _{n}(\mathbf{u })}\right|^{2}\bigg (\frac{\log n (w(U))^{-2}n^{-1}}{|\phi _{n}(U)|^{4}}\wedge \frac{1}{|\phi _{n}(\mathbf{u })|^{2}}\bigg )^{-1}\bigg ]\\&\quad \le {\mathbb {E}}\bigg [\sup _{\mathbf{u }\in {\mathcal {A}}}\bigg (\frac{9\kappa ^{2}}{16}\frac{\log n (w(U))^{-2}n^{-1}}{|\phi _{n}(\mathbf{u })|^{4}}\wedge \frac{1}{4}\frac{1}{|\phi _{n}(\mathbf{u })|^{2}}\bigg )\\&\bigg (\frac{\log n (w(U))^{-2}n^{-1}}{|\phi _{n}(U)|^{4}}\wedge \frac{1}{|\phi _{n}(\mathbf{u })|^{2}}\bigg )^{-1}\bigg ]\\&\le \frac{9\kappa ^{2}}{16}. \end{aligned} \end{aligned}$$

On the other hand, the event \( E^{\complement } \) is negligible by Lemma 2.6, which concludes the proof. \(\square \)

8.7 Proof of Lemma 2.11.

This is a direct consequence of the proof of Lemma 2.10: the statement follows from formulas (57), (61), and (64). \(\square \)