A Lepskiĭ-type stopping rule for the covariance estimation of multi-dimensional Lévy processes

We suppose that a Lévy process is observed at discrete time points. Starting from an asymptotically minimax family of estimators for the continuous part of the Lévy–Khinchine characteristics, i.e., the covariance, we derive a data-driven choice of the frequency parameter used in estimating the covariance. We investigate a Lepskiĭ-type stopping rule for the adaptive procedure and then use a balancing principle to select the best possible data-driven parameter. The adaptive estimator achieves almost the optimal rate. Numerical experiments with the proposed selection rule are also presented.


Introduction
In recent years, the use of multi-dimensional Lévy processes for modeling purposes has become very popular in many areas, especially in the field of finance (e.g. Cont and Tankov 2004; see also Sato 1999 for a comprehensive study). The distribution of a Lévy process is usually specified by its characteristic triplet (drift, Gaussian component, and Lévy measure) rather than by the distribution of its independent increments. Indeed, the exact distribution of these increments is most often intractable or lacks a closed-form expression. For this reason, an important task is to provide estimation methods for the characteristic triplet.
Such estimation methods depend on the way observations are performed. In our model, a two-dimensional Lévy process X_t is observed at high frequency, i.e., the time between two consecutive observations is 1/n. The characteristic function of such a two-dimensional Lévy process is given by the Lévy–Khinchine representation (1).

The remainder of the paper is organized as follows. Section 2 provides general results for the uniform control of the deviation of the empirical characteristic function on R², so that it can also be read as an independent contribution. Section 3 introduces Lepskiĭ's strategy for devising a stopping rule algorithm for the parameter U. In Sect. 4, we present theoretical guarantees for the adaptive estimation; in particular, we construct a monotonically increasing upper bound for the stochastic error. In Sect. 5, we devise a balancing principle for the optimal choice of U and present the convergence rates of the adaptive estimator. A short illustration of the behavior of the estimator and the stopping rules is provided in Sect. 6 by means of simulations from synthetic data. Finally, proofs for Sect. 2 are given in Sect. 7.

Estimating the characteristic function
Here, we discuss technical tools which provide uniform control of the deviations of the empirical characteristic function on R². The interesting point is that the decay of the characteristic function is not assumed to be explicitly known but enters only implicitly. To keep the exposition intuitive and free from technicalities, the proofs of the lemmas have been postponed to Sect. 7. Throughout this section, we use the letter C to denote a constant that may change from line to line.
For the sake of keeping the calculations simple, we restrict ourselves to estimating the characteristic function on the diagonal. For this purpose, let us introduce the following setting. Let a probability space (Ω, F, (F_t)_{t≥0}, P) be given. We assume that X_t = (X_t^{(1)}, X_t^{(2)}) is a bivariate Lévy process observed at n equidistant time points t_1, …, t_n = T, where t_i = i/n for i = 1, …, n and T = 1. We denote by φ̂_n the normalized empirical characteristic function process, where u ∈ A. For an appropriate weight function w : R → (0, 1], we consider the corresponding weighted process. Recall that √n(φ̂_n(u) − φ_n(u)) converges weakly to a Gaussian process if and only if x → e^{i⟨u,x⟩}, u ∈ A, is a functional Donsker class for P.
We start by defining a weight function that was introduced in Neumann and Reiß (2009) and is key to the uniform convergence of the empirical characteristic function. This definition is meaningful under the following, rather general assumption on the characteristic function.
Assumption 1 There is a function g which is non-decreasing on R₋ and non-increasing on R₊, and there exist positive constants C and C′ such that |φ_n| is bounded above and below by constant multiples of g. Some remarks are in order here: the following cases may be considered for the characteristic function.
(a) Gaussian decay. Under a boundedness condition on the covariance matrix and the activity of jumps, we can prove that the characteristic function decays like a Gaussian.
(b) Exponential decay. Here, the characteristic function φ_n decays at most exponentially, that is, the corresponding bound holds for some a > 0 and C > 0. Examples of distributions with this property include the normal inverse Gaussian and generalized tempered stable distributions.
(c) Polynomial decay. In this case the characteristic function satisfies a polynomial bound for some β ≥ 0. Typical examples are the compound Poisson, gamma, and variance gamma distributions.

In contrast to the properties formulated above, our reasoning does not rely on any semiparametric assumption about the shape of the characteristic function. The only thing needed is the quasi-monotonicity of Assumption 1, which is fairly general. We obtain the following result, extending Theorem 4.1 of Neumann and Reiß (2009). Let us mention that the logarithmic decay of the weight function w is in accordance with the well-known results of Csörgő and Totik (1983): lim_{n→∞} √n(φ̂_n((T_n, T_n)) − φ_n((T_n, T_n))) = 0 almost surely on intervals [−T_n, T_n] whenever log T_n/n → 0. We are now ready to prove a uniform bound for the deviation of the empirical characteristic function from the true one. First, we establish a Talagrand inequality using Lemma A.2 from "Appendix A".
Lemma 2.4 Let U be some countable index set. Then for arbitrary ε > 0, there are positive constants c₁, c₂ = c₂(ε), such that the stated deviation bound holds for every κ > 0. Next, we make this bound uniform on the diagonal; this comes at the cost of an additional logarithmic factor. Lemma 2.5 Let t > 0 be given, and A defined as in Definition 2.1. Then, for arbitrary β > 0, there exists a constant C such that the stated inequality holds, where C depends on the δ appearing in Definition 2.2 and c₁ is the constant in Talagrand's inequality from Lemma 2.4.
The statement of Lemma 2.5 holds for ũ ∈ Ã. A direct consequence of Lemma 2.5 is that we can consider a favorable set for the deviation of the empirical characteristic function from the true one.

Truncated characteristic function
Here we present an extension of Lemma 2.1 in Neumann and Hössjer (1997), which renders the point-wise control of the characteristic function in the denominator uniform on the sets A. Let us briefly discuss the idea of the truncated characteristic function presented in detail in Neumann and Hössjer (1997). The characteristic function φ_n(u) can be estimated at each point u = (U, U) with the rate n^{−1/2}. Hence, φ̂_n(u) is a reasonable estimator of φ_n(u) whenever |φ_n(u)| ≫ n^{−1/2}. The idea is to cut off the frequencies u for which |φ̂_n(u)| ≤ n^{−1/2}. First, we recall the key Lemma 2.1 from Neumann and Hössjer (1997): Lemma 2.7 It holds that, for any p ≥ 1, the stated moment bound is valid. Neumann's result is for p = 1, but the extension to any p is straightforward; see also Neumann and Reiß (2009). The global threshold must be formulated in terms of φ̂_n(u), so that the compact set is in fact random. The main difference with Neumann's truncated estimator lies in the fact that we introduce an additional logarithmic factor in the thresholding scheme. This logarithmic factor allows us to derive exponential inequalities, as we saw in Lemma 2.5.
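To fix ideas, the truncation scheme can be sketched numerically as follows. This is our own illustration, not the paper's exact estimator: the paper's threshold carries an additional logarithmic factor, and the tuning constant `kappa` is a hypothetical choice.

```python
import numpy as np

def empirical_cf(increments, u):
    """Empirical characteristic function of the observed increments at frequency u."""
    return np.mean(np.exp(1j * increments @ u))

def truncated_cf(increments, u, kappa=1.0):
    """Truncate the empirical characteristic function away from zero: whenever
    |phi_hat_n(u)| falls below kappa * n**(-1/2), replace it by a value of
    modulus equal to the threshold (same phase), so 1/|phi| stays bounded."""
    n = increments.shape[0]
    phi = empirical_cf(increments, u)
    threshold = kappa / np.sqrt(n)
    if np.abs(phi) >= threshold:
        return phi
    return threshold * phi / np.abs(phi) if np.abs(phi) > 0 else complex(threshold)
```

The point of the truncation is visible in the last line: the reciprocal of the returned value is never larger than √n/κ, which is exactly the control needed in the denominators below.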
We can now use Lemma 2.5 to assess the deviation of the reciprocal of the truncated empirical characteristic function from that of the true one.

Lemma 2.9
Suppose that for some p ≥ 1/2 and β > 0, we have κ ≥ 2(√p c₁ + β), where c₁ is the constant in Talagrand's inequality. Then, for n > 0 and a positive constant C, the stated bound holds for u ∈ A. We are now in a position to formulate a uniform bound on the diagonal, which is an immediate consequence of Lemma 2.9. Lemma 2.10 If the assumptions of Lemma 2.9 hold, then there is a constant C > 0, depending on κ, such that the stated bound holds for n ≥ 1. Lemma 2.10 can also be extended to powers different from 2; we just need to substitute 2 with 2q.
Note that an immediate consequence of the preceding Lemma 2.9 is the following important corollary, which allows us to interchange, with high probability, between the empirical characteristic function and the true one.

Corollary 2.11
In the situation of the preceding statement, we have the stated bound. It is in fact this version of the statement which will play an important role below. On the complement of the preceding event, we have, with high probability, the corresponding inequality. The statement of Corollary 2.11 and the above inequality hold for ũ ∈ Ã.

Adaptive parameter estimation
After recalling the statistical model, in this section we discuss the goal of this study. We aim to extend the minimax theory, from Papagiannouli (2020), to an adaptation theory for the covariance estimator.

Statistical model
We observe a two-dimensional Lévy process (X_{t_i})_{t_i ≥ 0} for i = 0, 1, …, n at equidistant time points 0 = t_0 < t_1 < … < t_n, where t_i = i/n. We consider the characteristic function (1) on the diagonal, i.e., u_n = (U_n, U_n), with characteristic triplet (b, C, F): drift part b ∈ R², covariance matrix C = (C_11, C_12; C_21, C_22), and jump measure F ∈ P(R²). In what follows, we work in a nonparametric setting in which the process X_{t_i} belongs to the class L^r_M. Let us now recall this class. Definition 3.1 For M > 0 and r ∈ [0, 2), we define the class L^r_M as the set of all Lévy processes satisfying the stated conditions, where ‖C‖_∞ = max(C_11 + C_12, C_21 + C_22) is the maximum of the row sums. In the second term, r refers to the co-jump activity index of the jump components.
For details and examples concerning this class, we refer to Section 3 in Papagiannouli (2020), where a minimax estimator for the covariance C_12 is available. In addition, Jacod and Reiß (2014) provide a minimax estimator for the marginals, i.e., C_11, C_22. Given the empirical characteristic function of the increments Δ^n_j X = X_{j/n} − X_{(j−1)/n},

φ̂_n(u_n) := (1/n) Σ_{j=1}^{n} e^{i⟨u_n, Δ^n_j X⟩},  u_n ∈ R²,

a spectral estimator is used. A bias-variance type decomposition for the estimation error is available by Lemma 6.1 in Papagiannouli (2020). We recall the lemma without proof.
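The exact spectral estimator (with its weighting and truncation) is given in Papagiannouli (2020); purely as an illustration, a simplified polarization-type variant can be sketched from the two diagonal frequencies u = (U, U) and ũ = (U, −U). For a small mesh 1/n, −(2n/U²) log|φ_n((U, ±U))| is close to C_11 ± 2C_12 + C_22 plus a jump term, so the difference of the two log-moduli isolates C_12. The function name and the test setup below are ours.

```python
import numpy as np

def spectral_cov12(increments, U):
    """Polarization-type spectral estimator of C_12 at frequency U (a sketch).
    Uses the empirical characteristic function at u = (U, U) and u~ = (U, -U);
    the difference of the log-moduli cancels the marginal terms C_11, C_22."""
    n = increments.shape[0]
    phi_plus = np.mean(np.exp(1j * increments @ np.array([U, U])))
    phi_minus = np.mean(np.exp(1j * increments @ np.array([U, -U])))
    return (n / (2.0 * U ** 2)) * (np.log(np.abs(phi_minus)) - np.log(np.abs(phi_plus)))
```

On pure Gaussian data this sketch is unbiased up to sampling error; for processes with jumps the jump terms of the two frequencies only partially cancel, which is precisely why the choice of U matters in the sequel.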

Lemma 3.2 The error bound for the estimation satisfies
where, on the set {φ̂_n(ũ_n) ≠ 0} ∩ {φ̂_n(u_n) ≠ 0}, H_n(·) and D(·) are the corresponding stochastic and deterministic errors.
The spectral estimator Ĉ^{12}_n(U_n) achieves minimax rates for the optimal parameter U_n. By Theorem 4.2 in Papagiannouli (2020), for r ∈ [0, 2), M as in Definition 3.1, and every 0 < η ≤ 1, there are a constant A_η > 0 and N_η such that the stated risk bound holds for every n ≥ N_η, where the displayed rates are the minimax rates for the optimal parameter. The error bound incurred by the spectral estimator in Lemma 3.2 is the sum of two terms, the deterministic and the stochastic error, with respect to the tuning parameter U_n. The stochastic error displays behavior opposite to that of the deterministic error: as U_n grows, the stochastic error tends to explode while the deterministic error tends to zero. This observation, and the fact that U_n depends on the unknown parameters (r, M), impose the need for a posteriori choices of the parameter U_n, which ideally are optimal in a well-defined sense. The goal is to derive a theoretical error bound for an adaptive estimator achieving almost the optimal rates.

Lepskiĭ's stopping rule
In this section, we establish an adaptive choice of the parameter U_n via Lepskiĭ's principle. Following Lepskiĭ's principle, a "stopping" rule is designed to achieve adaptation for a class of minimax estimators. We use the following conventions for the notation. We denote by U the parameter space and consider a suitable finite discretization U_0 < … < U_K of the parameter. We set Ĉ^{12}_{n,j} := Ĉ^{12}(U_j), i.e., we assign an estimator Ĉ^{12}_{n,j} to each U_j. For each estimator Ĉ^{12}_{n,j}, we let s_n(U_j) be the upper bound of the stochastic error E|H_n(U_j)| for j = 0, 1, …, K. Starting from a family of asymptotically rate-minimax estimators Ĉ^{12}_n(U_n), how can one achieve adaptation over the parameter space U, i.e., find a tuning parameter U_ĵ which simultaneously provides minimax rates for the covariance over the sets [U_0, U_K] ⊂ U? Remark 3.3 In this paper we refer to the value U_n as the best choice and to the corresponding rate as the best possible rate. The rate will be optimal in a minimax sense since the bound we started from is tight (11). We refer to the value U_bal as the choice which balances the stochastic and deterministic errors.
Let us first give a brief and simplified account of the classical Lepskiĭ method adjusted to our problem; we use the results in Section 5.4 of Reiß (2012). The key idea is to test the real-valued estimators Ĉ^{12}_{n,1}, Ĉ^{12}_{n,2}, …, Ĉ^{12}_{n,j}, whose stochastic errors increase and whose biases decrease as the index increases, for the hypotheses H_j : Ĉ^{12}_{n,1} = Ĉ^{12}_{n,2} = … = Ĉ^{12}_{n,j}. If we accept H_1, H_2, …, H_j but Ĉ^{12}_{n,j+1} differs significantly from Ĉ^{12}_{n,1}, Ĉ^{12}_{n,2}, …, Ĉ^{12}_{n,j}, we reject H_{j+1} and set ĵ = j. We summarize the above discussion in the following definition.

Definition 3.4
We choose a suitable finite discretization U_0 < … < U_K with ∞ > s_n(U_K) > s_n(U_{K−1}) > … > s_n(U_0), for some large enough constant K. We define the Lepskiĭ principle as ĵ given by the stated rule, where d is the Euclidean distance.
Heuristically, we want a rule under which the stochastic error dominates the bias. We iterate the above stopping rule using the following algorithm.

Algorithm 1 StoppingRule
ĵ is the smallest index for which the stochastic error dominates the deterministic error. Observe that Lepskiĭ's strategy for the parameter choice uses pairwise comparisons of the estimators. By the triangle inequality, Lemma 3.2, and the monotonicity of the deterministic and stochastic errors, we get, for i, j ∈ {0, …, K} with i ≤ j, the stated bound, where d(·) is the upper bound for the deterministic error and s_n(·) is the upper bound for the stochastic error. The bound for the deterministic error has the stated form and depends on the co-jump activity index r ∈ (0, 2] and the constant M from Definition 3.1. Clearly, the deterministic error is monotonically non-increasing as the index i increases. We also need to ensure that the bound for the stochastic error is monotonically non-decreasing in order to be able to use Lepskiĭ's principle (14). We aim to use the stochastic error, rather than the deterministic error, in Lepskiĭ's principle because the latter depends on the co-jump activity index r, which is unknown to us. The stochastic error, on the other hand, depends on the characteristic function of the two-dimensional Lévy process, which might also be unknown. Yet we can overcome this obstacle by exploiting the results of Sect. 2. As a result, we are able to interchange, with high probability, between the theoretical bound s_n(U_j) and the empirical bound ŝ_n(U_j) for the stochastic error.
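The pairwise-comparison idea can be sketched in a few lines. This is a hedged simplification of Definition 3.4 and Algorithm 1, not the paper's exact rule: the tuning constant `kappa` and the exact tolerance `s_n[j] + s_n[k]` are our illustrative choices.

```python
def lepskii_j_hat(estimates, s_n, kappa=1.0):
    """Sketch of a Lepskii-type stopping rule on a grid U_0 < ... < U_K.
    estimates[j] plays the role of C_hat_{n,j}; s_n[j] is a monotonically
    increasing bound on its stochastic error.  Return the smallest index j
    whose estimator agrees with every later estimator up to the stochastic
    tolerance, i.e. the first index at which the stochastic error dominates
    the (decreasing) deterministic error."""
    K = len(estimates)
    for j in range(K):
        if all(abs(estimates[j] - estimates[k]) <= kappa * (s_n[j] + s_n[k])
               for k in range(j + 1, K)):
            return j
    return K - 1
```

On a toy grid where the bias offsets shrink and the stochastic bounds grow, the rule returns the first index at which the remaining fluctuations are explained by the stochastic tolerance alone.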
This method enables us to construct an adaptive estimator using a Lepskiĭ-type principle based on a data-dependent bound, i.e. ŝ_n(·), on the interval U = [U_start^oracle, U_max]. This is achieved via the rule (16). Let us finally show that the adaptive estimator using (16) achieves almost the minimax convergence rates.
Theorem 3.5 For a sequence of parameters U_j which satisfies U_j ∈ [U_start^oracle, U_max], there is a constant c ∈ (1/2, 1] such that the adaptive estimator satisfies the stated bound. The proof of Theorem 3.5 is postponed to Sect. 5, where we discuss the selection rule (16) in detail.

Analysis of the stochastic error
The main objective of the present section is to prove a high-probability bound for the stochastic error. Observing the form of the stochastic error H_n in (10), it becomes clear that we need to control the empirical characteristic function in the denominator, which may lead to unfavorable behavior of the stochastic error. To overcome this problem we employ the results obtained in Sect. 2.
In comparison with other adaptive results, such as those obtained in Comte and Genon-Catalot (2010) and Comte and Lacour (2011), whose procedures depend on a semiparametric assumption concerning the decay of the characteristic function, our approach only introduces a threshold which ensures that the characteristic function stays away from small values, so that the estimator makes sense.
Proof From Lemma 3.2, the stochastic error satisfies the stated bound. On the events E and Ẽ from Lemma 2.6, in the case that |φ_n(u)| ≥ κ_n n^{−1/2} and |φ_n(ũ)| ≥ κ_n n^{−1/2}, the claimed inequality follows, which concludes the proof.
Hence, everything boils down to controlling the unknown characteristic function in the denominator in a way that keeps it large enough to enable a reasonable estimator. Using Corollary 2.11 and the inequality (6), we can substitute the unknown 1/|φ_n(u)| with the data-dependent 1/|φ̂_n(u)|. Inserting inequality (6) into (18), we get the stated high-probability upper bound for the stochastic error. Proof The proof is a consequence of Lemmas 2.6 and 4.1, applying Markov's inequality.

Oracle start for the parameter U
In order to apply a Lepskiȋ-type stopping rule, we need to ensure that the bound for the stochastic error is monotonically increasing. First we introduce some further notation.

Further notation
We write U_start^oracle for the starting point of the Lepskiĭ principle. By (13), the optimal choice of the parameter is U_n = r^{−1} M √(n log n). We also write C_sum = Σ_{i,j} C_{ij} for the sum of all elements of the covariance matrix.
We allow the bound for the stochastic error to depend either on the (possibly) unknown characteristic function or on the truncated empirical characteristic function. Since we can interchange w.h.p. between the true and the empirical characteristic function, we use two different notations, s_n(U) and ŝ_n(U), for the corresponding bounds of the stochastic error. In what follows, we occasionally write φ_n(U) instead of φ_n(u), because we are estimating the characteristic function on the diagonal. The same rule applies to the function h(u) := h(U, U) = 2∫_{R²}(1 − cos⟨u, x⟩) F(dx). Figure 1 illustrates the behavior of the bound s_n(U) and of the stochastic error H_n(U), defined as in (10). We observe that the stochastic error decreases in the beginning and then explodes; the occurrence of |φ̂_n(u)| in the denominator may have unfavorable effects.
As a possible remedy, we consider starting the Lepskiĭ procedure at a larger U and constructing a monotonically increasing bound for the stochastic error. Figure 1 depicts this behavior.
We define the oracle start of U as follows. Let us highlight the strategy for constructing a monotonically increasing bound for the stochastic error. Having found the oracle start of U, we show that U_start^oracle < U_n. Then we prove that s_n(U_start^oracle) < s_n(U_n), ensuring that an increasing bound for the stochastic error is available within the interval [U_start^oracle, U_max] for U_max > U_n. This strategy is depicted in Figure 1. It is worth emphasizing that the calculation of U_start^oracle requires the evaluation of the perhaps unknown φ_n(u). Thus, we rely only on a general assumption on the characteristic function, namely the quasi-monotonicity of Assumption 1, for infinite-variation co-jumps, i.e., r ∈ (1, 2], together with a boundedness condition for the covariance matrix.

Lemma 4.3 For large n and K > 0, the interval for U_start^oracle is as stated, and for r ∈ (1, 2] we get that U_start^oracle < U_n, where U_n = r^{−1} M √(n log n).
Proof The absolute value of the characteristic function is given by the stated formula, where u = (U, U). We define h as above, where F is the Lévy measure on R². Using the Cauchy–Schwarz inequality |⟨u, x⟩|² ≤ ‖u‖²‖x‖², a positive constant K, and v_0 = (0, 1)² ⊂ R², we obtain the stated estimate; the last inequality derives from the fact that we always have ∫_{R²}(1 ∧ ‖x‖²) F(dx) < ∞. So we obtain the inequality (27). It is easy to check that ⟨Cu, u⟩ = C_sum U². Inserting this fact and (27) into (25), we get the stated inequality for the absolute value of the characteristic function. Inserting this inequality into (24), we get the required interval for U_start^oracle, which ensures that U_start^oracle ∼ √n. This implies that U_start^oracle < U_n for large n, and concludes the proof. Lemma 4.4 For U_start^oracle < U_n and n large enough, the stochastic error satisfies s_n(U_start^oracle) ≤ s_n(U_n). Proof It suffices to show that s_n(U_start^oracle)/s_n(U_n) ≤ 1.
By the form of s_n(U) in (22), it is easy to check the stated identity. By (27) we have h(U_start^oracle) − h(U_n) ≤ h(U_start^oracle), and we also get the stated estimate. Substituting these inequalities into (30) and taking everything into consideration, we obtain a ratio which is smaller than one as n → ∞. The statement is proved.
A side product of the above analysis is the following corollary, which ensures that the upper bound of the stochastic error is always monotonically increasing over the desired interval.

Corollary 4.5 If we set
then s_n^* satisfies s_n(U_n) = s_n^*(U_n).
Proof We define the sets U = [U_start^oracle, U_n] and S = {s_n(U) : U ∈ [U_start^oracle, U_end]}. U is a non-empty subset of R, and U_n is its least upper bound. By the continuity of the stochastic error on the interval [U_start^oracle, U_n] and the extreme value theorem, we get the stated identity, which concludes the proof.
Although we used the (possibly) unknown theoretical characteristic function as a criterion for the oracle start of the Lepskiĭ procedure and constructed a monotonically increasing bound as desired, it is useful to have a data-driven criterion as well. For this reason, we propose the following definition.

Definition 4.6
For c ∈ (0, 1], we define the criterion for the oracle start of U as follows: Û_start^oracle := inf{U > 0 : |φ̂_n(U)| ≤ c}.
The last ingredient which remains to be proven is the following high probability bound, which will allow us to connect a data-driven choice for the oracle start of the Lepskiȋ procedure with the theoretical characteristic function.
Applying Hoeffding's inequality, we obtain the stated bound. Inserting Definition 4.6 into the empirical characteristic function, the statement is proven.

Balancing principle when the stochastic error is data-dependent
In this section, we prove an upper bound for the best possible adaptive parameter using a balancing principle inspired by the work of De Vito et al. (2010) on adaptive kernel methods. The optimal choice U_n crucially depends on the unknown parameters r, M. Using a Lepskiĭ rule as in Sect. 3.2, we construct a completely data-driven estimation procedure adapted to U ∈ U, where U = [U_start^oracle, U_max]. Our main result for the adaptive estimation shows that the Lepskiĭ estimator achieves almost the optimal rates.
In the following we denote a(n) := 1 − exp(−(1/8)(c − 1/2)² n). By (23), with high probability, at least 1 − a(n), the upper bound for the stochastic error is of the stated form, where θ(U) = U², γ(n) = (n log n)^{1/2}, and 0 < w(U) ≤ 1. Further, the term d(U) is the deterministic error bound, which does not depend on the data and takes the stated form, where r ∈ (1, 2] is the co-jump activity index and M is from Definition 3.1. Recall that inequality (6) allows us to interchange, with high probability, between the (perhaps) unknown characteristic function and the empirical characteristic function. A direct consequence is that we can interchange, with high probability, between the empirical bound ŝ_n(U) and the theoretical bound s_n(U) for the stochastic error. This leads to s_n(U) + d(U) ≤ 2ŝ_n(U) + d(U). Consequently, with probability at least 1 − exp(−2n(c + 1)²), the estimation error bound is given by the sum of two competing terms, i.e., (36). The upper bound in (36) is the sum of a bias term which decreases in U and a stochastic error which increases in U, for U ∈ U. According to the balancing principle, the best possible adaptive parameter choice is found from the bias-variance-type decomposition (36), which means that we have to balance the deterministic and the stochastic error. We let U_bal be the value which makes the contributions of the two terms equal, i.e., d(U_bal) = 2s_n(U_bal). The corresponding error estimate then holds with probability at least 1 − a(n), where 0 < a(n) < 1 and U_bal is the best possible parameter. Let us now highlight the idea behind the balancing principle.
It is clear, by the monotonicity of the stochastic and deterministic errors, what happens on each side of the balance point: if we choose U* ≤ U_bal, the deterministic error dominates, while if we choose U* ≥ U_bal, the stochastic error dominates. Driven by inequality (35), the strategy of the balancing principle yields, with high probability, the stated bound, and the corresponding best parameter choice U_bal gives, with probability 1 − a(n), the stated rate. The aim is to choose U_bal from the stated set. To define a parameter strategy, we first consider a discretization of the possible values of U_j, that is, an ordered sequence U_j such that the best value U_bal falls within the considered grid U. The balancing principle estimate for U_bal is defined via (41). The reasons why we expect this estimate to be sufficiently close to U_bal, and why it does not depend on the deterministic error d, are best explained by the following argument. Observe that if we choose two indices α, β such that U_α ≥ U_β ≥ U_bal, then, with probability at least 1 − a(n), the stated comparison holds. The intuition is that when this condition is violated, we are close to the value at which the deterministic and stochastic errors contribute equally, which is U_bal. Theorem 3.5 shows that the value U_ĵ given by the balancing principle (41) provides the same estimation error as U_bal up to a constant. Note that all the inequalities in the following proofs are to be interpreted as holding with high probability. We can now prove the convergence rate for the adaptive estimator Ĉ^{12}_{n,ĵ}.
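Numerically, once monotone proxies for the deterministic bound d(·) and the stochastic bound s_n(·) are available, the balance point d(U_bal) = 2s_n(U_bal) can be located by bisection, since d − 2s_n changes sign exactly once. The following sketch assumes such proxies are handed in as functions; it is an illustration of the balancing equation, not of the grid-based rule (41) itself.

```python
def balance(d, s, lo, hi, tol=1e-8):
    """Locate U_bal with d(U_bal) = 2*s(U_bal) by bisection, assuming d is
    decreasing and s is increasing on [lo, hi]."""
    f = lambda U: d(U) - 2.0 * s(U)
    # the deterministic error must dominate at lo and the stochastic at hi
    assert f(lo) > 0.0 > f(hi), "the balance point must be bracketed"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, with the toy bounds d(U) = 1/U and s(U) = U/8 the balance equation 1/U = U/4 gives U_bal = 2.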

End of proof of Theorem 3.5.
Let us introduce the parameter choice U* as stated. By the definition of U_ĵ, we conclude that U_ĵ ≤ U* and thus, by the triangle inequality, the stated chain of inequalities holds; the first inequality holds due to (41). Finally, by the monotonicity of ŝ_n(·) and the fact that U* ≤ U_bal, we get that ŝ_n(U*) ≤ ŝ_n(U_bal). Note that the above inequality is uniform with respect to U_ĵ due to Lemma 2.10. The proof is now complete.

Numerical experiments
In this section we test the behavior of the covariance estimator when adapting the parameter U, i.e., the frequency used for estimating the covariance. We first simulate a bivariate Lévy process on [0, 1]. We draw our observations from a process X_t = B_t + J_t, the superposition of a two-dimensional Brownian motion B_t and a two-dimensional jump process J_t whose jumps are driven by a two-dimensional r_i-stable process for i = 1, 2, where r_i ∈ (0, 2]. X_t thus models a process with both diffusion and jump components. We assume the covariance matrix has the form C = (2, 1; 1, 1). In each run of our simulation, we generate n = 1,000 observations, corresponding to observations taken every 1/1,000 over the time interval [0, 1], and U_i ∈ [0.1, 50] for i ∈ {1, 2, …, 500}. We conduct several experiments for U, using different choices of the jump activity index. We start with jumps of finite variation, i.e., r_i ∈ [0.1, 0.9], and then continue with jumps of infinite variation, i.e., r_i ∈ [1.1, 1.8]. In Figs. 2a, 3a, 4a, 5a, 6a, 7a, 8a, 9a, 10a, 11a, 12a, and 13a, we plot the empirical characteristic function φ̂_n(U_i), the real and positive part of φ̂_n(Ũ_i), log|φ̂_n(U_i)|, log|φ̂_n(Ũ_i)|, and log|φ̂_n(Ũ_i)| − log|φ̂_n(U_i)| against the parameter for adaptation U_i. The plots show that the estimator is consistent with the true value when U_i ranges from around 5 to 30; recall that the true value is C_12 = 1. As expected, the behavior of the estimator is quite erratic at the beginning and at the end of the interval, because the bias of the estimator is quite high there. In principle, we have found that the "optimal" stopping index is at U_i = 30.
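The simulation setup described above can be sketched as follows, using the Chambers–Mallows–Stuck method for the symmetric stable jump components. The function names, the jump-scale parameter, and the exact normalization of the jumps are our own choices for illustration, not the paper's implementation.

```python
import numpy as np

def stable_increments(alpha, n, scale, rng):
    """Increments over a mesh 1/n of a symmetric alpha-stable process, drawn
    with the Chambers-Mallows-Stuck method (valid for alpha in (0, 2])."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size=n)
    W = rng.exponential(1.0, size=n)
    S = (np.sin(alpha * V) / np.cos(V) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * V) / W) ** ((1.0 - alpha) / alpha))
    # self-similarity: an increment over time 1/n scales like n**(-1/alpha)
    return scale * n ** (-1.0 / alpha) * S

def simulate_levy(n, C, alphas, jump_scale, rng):
    """Increments of X_t = B_t + J_t on [0, 1]: a correlated Brownian part
    with covariance matrix C plus independent symmetric stable jump
    components with indices alphas = (r_1, r_2)."""
    brownian = rng.multivariate_normal([0.0, 0.0], C / n, size=n)
    jumps = np.column_stack([stable_increments(a, n, jump_scale, rng)
                             for a in alphas])
    return brownian + jumps
```

With `jump_scale = 0` the output reduces to pure Brownian increments, which is a convenient sanity check that the rescaled sample covariance recovers C.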
Next, we plot bivariate Lévy processes with at least one jump component of infinite variation. Figures 9b, 10b, 11b, 12b, and 13b show that the behavior of the estimator is not consistent with the theoretical one. Hence, Lepskiĭ's method cannot be applied, especially in the cases of Figs. 9a and 13a.
Next, we consider some numerical experiments discussing how Algorithm 1 can be approximately implemented for the stopping rule. To illustrate the performance of the method with U_start^oracle as in Definition 4.6, we proceed as follows. We fix r_1 = 0.5, r_2 = 1.5, so that at least one jump component is of infinite variation; the co-jump activity index is then r = 1.5. In Fig. 14a, b, we observe that the estimator is consistent with the true value C_12 = 1 when choosing U_start^oracle as in Definition 4.6. This implies that the estimator is consistent with the true value even at the beginning of the procedure, in contrast with the previous figures, where the behavior of the estimator at the beginning is quite erratic.

Discussion
In this paper, we address the adaptive estimation of the covariance of a two-dimensional Lévy process. We extend the minimax results obtained in Papagiannouli (2020), where the class of estimators requires prior knowledge of the process parameters which control the tuning parameter U_n = c(r, M)√(n log n). We devise a fully data-dependent method based on a variant of Lepskiĭ's principle, in which a balance between the bias and the stochastic part of the estimator is obtained. We show in Theorem 3.5 that the adaptive estimator achieves the minimax rates of convergence up to a logarithmic factor; such a logarithmic gap between the minimax and adaptive rates is well known in the literature.

Comments on the stochastic error. The construction of an adaptive estimator is complicated in the current context by the irregular behavior of the stochastic error. The bound (20) ensures that the stochastic error of the estimator is upper-bounded by the truncated characteristic function up to a logarithmic factor and a multiplicative constant C. This means that the bound depends on the random quantity |φ̂_n(u)| but not on the unknown characteristic function φ_n(u). With high probability, the inequality (6) allows us to interchange between the unknown characteristic function and the truncated empirical characteristic function, which is data-dependent. A direct consequence is that we can interchange, with high probability, between the empirical bound ŝ_n(U) and the theoretical bound s_n(U) for the stochastic error. This procedure was not known up to now in the literature on nonparametric estimation for multi-dimensional Lévy processes.
In Fig. 1, an irregular behavior of the bound for the stochastic error is observed for small values of U , because of the empirical characteristic function in the denominator. To overcome this obstacle, we find an oracle start for U , so as to ensure a monotonically increasing bound for the stochastic error.
Comments on the balancing principle. The construction of a monotonically increasing bound for the stochastic error allows us to apply Lepskiĭ's principle to the adaptive estimator Ĉ^{12}_{n,ĵ}. As a rule, we use the empirical stochastic error so that our procedure is completely data-dependent. In this way, we avoid using the deterministic upper bound, which depends on the unknown co-jump activity index r. Theorem 3.5 shows that the balancing principle can adaptively achieve the best possible rate, which is near-optimal in a minimax sense.
Comments on the proofs. In Sect. 7 we prove the results of Sect. 2 concerning the uniform control of the deviation of the empirical characteristic function on the diagonal of R². Employing chaining arguments, we prove uniform convergence of the normalized empirical characteristic function. We use concentration inequalities, such as Talagrand's inequality, to prove uniform control of the empirical characteristic function on a countable set. Finally, we derive a uniform upper bound for the truncated empirical characteristic function after introducing favorable sets for the truncated estimator. The main difficulty here is to define the desired events on which we can interchange, with high probability, between the truncated and the empirical characteristic functions.

Proofs for Section 2
In this section, we provide proofs of the results which are presented in Sect. 2. The proof of Theorem 2.3 follows a chaining argument for the empirical processes. Thus, we recall the following definitions from empirical process theory.

Definition 7.1
We consider measurable real-valued functions f, g. For two such functions f, g we introduce the "bracket" notation [f, g]. Definition 7.2 By the bracketing entropy number N_{[·]}(ε, G) of a class G we mean the minimal number N for which there exist functions f_1, …, f_N and g_1, …, g_N whose brackets cover G; N_{[·]}(ε, G) is thus the minimal number of L²(P)-balls of radius ε needed to cover G. The class G is called bracketing compact if N_{[·]}(ε, G) < ∞ for every ε > 0. The entropy integral is defined as stated; its convergence depends on the size of the bracketing numbers as ε → 0. Finally, a function F ≥ 0 is called an envelope function for G if |g| ≤ F for all g ∈ G.

Proof of Theorem 2.3
We decompose √n(φ̂_n(u) − φ_n(u)) into its real and imaginary parts. We consider the class G, which consists of complex-valued functions. An application of Corollary 19.35 in Van der Vaart (2000) gives the stated bound, where F = 1 is an envelope function for G. It remains to prove that the bracketing integral on the right-hand side of (44) is bounded. We need to cover G with brackets of the stated form and find N, the minimal number needed to cover G. Inspired by Yukich (1985), we characterize the convergence of C_n(u) in terms of the tail behavior of P. For every ε > 0, we set up a grid as stated. Furthermore, for all j, we define the bracket functions g_j^± for x = (x_1, x_2), where u_j = (U_j, U_j). For the size of the brackets we obtain the stated estimate, and an analogous argument applies to the remaining terms. It remains to choose the U_j in such a way that the brackets cover G. We consider an arbitrary U ∈ R and any grid point U_j. For a function g_U(·) := w(U) cos(U(x_1 + x_2)) ∈ G to be contained in the bracket [g_j^−, g_j^+], we have to ensure the stated inequalities, using the estimate |w(U_j) cos(⟨u_j, x⟩) − w(U) cos(⟨u, x⟩)| ≤ (w(U) + w(U_j)) ∧ |w(U) cos(⟨u, x⟩) − w(U) cos(⟨u_j, x⟩)| 1_{[−M,M]²}(x), where Lip(w) is the Lipschitz constant of the weight function w. In the last inequality, we used the estimate |cos(⟨u, x⟩) − cos(⟨u_j, x⟩)| ≤ 2|sin((x_1 + x_2)(U − U_j)/2)| ≤ 2M|U − U_j|.
This completes the proof.

Proof of Lemma 2.10
To derive the desired upper bound we distinguish between two events, E and its complement E^c, defined as in Lemma 2.6. First, we establish an upper bound for the first term of the resulting sum: Lemma 2.9 on the event E yields the stated estimate. The event E^c, on the other hand, is negligible by Lemma 2.6, and this concludes the proof.

Proof of Corollary 2.11.
This is a direct consequence of the proof of Lemma 2.10. The statement of the corollary can be found in formulas (57), (61), and (64).