On the estimation of the jump activity index in the case of random observation times

We propose a nonparametric estimator of the jump activity index β of a pure-jump semimartingale X driven by a β-stable process when the underlying observations come from a high-frequency setting at irregular times. The proposed estimator is based on an empirical characteristic function using rescaled increments of X, with a limit that depends in a complicated way on β and on the distribution of the sampling scheme. Utilising an asymptotic expansion, we derive a consistent estimator for β and prove an associated central limit theorem.


Introduction
Recent years have seen a notable development in the statistical analysis of time-continuous stochastic processes beyond the somewhat classical case of an Itô semimartingale driven by a Brownian motion. Generalisations of that class of processes are in fact manifold: one can mention, for example, the analysis of integrals with respect to fractional Brownian motion (e.g. Brouste and Fukasawa (2018), Bibinger (2020)), the discussion of Lévy-driven moving averages (e.g. Basse-O'Connor et al. (2017), Basse-O'Connor et al. (2018)), inference on the solution of stochastic PDEs (e.g. Bibinger and Trabs (2020), Chong (2020), Kaino and Uchida (2021)) and the behaviour of integrals with respect to stable pure-jump processes (e.g. Heiny and Podolskij (2021), Todorov (2015)). All of the above results are concerned with high-frequency observations of the respective processes, always in the case of regularly spaced observations in time.
On the other hand, it is well understood that the underlying assumption of a regular spacing constitutes an ideal setting that simplifies the theoretical statistical analysis but is typically not met in practical applications. For this reason there has always been a lot of interest in understanding the impact of irregular sampling schemes on the proposed statistical methods. For semimartingales driven by Brownian motion one can mention Hayashi et al. (2011) and Mykland and Zhang (2012) among others, with an almost complete treatment in Chapter 14 of Jacod and Protter (2012). The case of Brownian semimartingales with jumps is treated for example in Bibinger and Vetter (2015) and Martin and Vetter (2019). All of the aforementioned papers deal with exogenous sampling schemes, i.e. when the observation times are essentially independent of the underlying processes. There is also limited research on endogenous observation times, mostly modelled via hitting times. See for example Fukasawa and Rosenbaum (2012) in the continuous case or Vetter and Zwingmann (2017) when additional jumps are present.
In this paper we discuss the case of a nearly stable pure-jump semimartingale observed at irregular times, i.e. the underlying process is given by (1.1) and the observation times follow a version of the restricted discretisation scheme from Jacod and Protter (2012), essentially providing observation times independent of X. Loosely speaking, and made precise below, L is driven by a β-stable process and Y comprises the residual jumps, while α and σ are appropriately chosen adapted processes. Our goal in this work is to provide a consistent estimator for β and to establish an associated central limit theorem. Statistical inference on β has already been conducted in Todorov (2015) and Todorov (2017) for regular observations, while Jacod and Todorov (2018) provides the theory in a general model where microstructure noise is present and dominates the statistical analysis.
At first glance, our strategy to estimate β somewhat resembles the procedure from Todorov (2015), but there are some notable challenges that might occur in other situations as well. First, we compute an empirical characteristic function L_n(p, u) which is constructed from local increments of X (and with an auxiliary parameter p), but it is important here to rescale each of these increments relative to the length of the underlying time period. Secondly, one can show convergence of this empirical characteristic function to a function L(p, u, β) which not only is a function of u, p and the unknown β but also depends specifically on the distribution of the discretisation scheme. Unlike in Todorov (2015), where a consistent estimator for β is obtained via a suitable functional of empirical characteristic functions computed at arbitrary values u and v, we have to use sequences u_n and v_n converging to zero plus an asymptotic expansion to obtain a consistent estimator. This procedure also leads to a drop in the rate of convergence in the associated central limit theorem.
The remainder of this work is organised as follows: Section 2 deals with the assumptions on X as well as on the discretisation scheme. In Section 3 we establish our statistical method, and we also present the main results on the asymptotic properties both of L_n(p, u) and of β(p, u_n, v_n). A thorough simulation study is provided in Section 4, where we also discuss issues connected with the estimation of the asymptotic variance in the normal approximation. All proofs are gathered in Section 5.

Setting
Throughout this work we adopt the setting from Todorov (2015) and assume that we are given a univariate pure-jump semimartingale as defined in (1.1), i.e. that we observe where L and Y are pure-jump Itô semimartingales and α and σ are càdlàg. All processes are defined on some filtered probability space (Ω, F, (F_t)_{t≥0}, P).
Specific assumptions on these processes will be given below, and we start with conditions on the jump processes L and Y. Below, κ(x) denotes a truncation function, i.e. it is the identity in a neighbourhood of zero, odd, bounded and equal to zero for large values of |x|. We also set κ′(x) = x − κ(x), and whenever we discuss the characteristic triplet of a Lévy process it is to be understood with respect to this choice of the truncation function.
Condition 2.1. We impose the following conditions on the processes L and Y: (a) L is a Lévy process with characteristic triplet (0, 0, F), where the Lebesgue density of the Lévy measure F(dx) is given by for some β ∈ (1, 2) and some A > 0. The function h(x) satisfies for some β′ < 1 and all |x| ≤ x_0, for some x_0 > 0.
(b) Y is a finite variation jump process of the form where µ^Y(ds, dx) denotes the jump measure of Y and its compensator is given by ds ⊗ ν^Y_s(dx). The process is locally bounded for the parameter β from (a).
Condition 2.1 should be read in such a way that the pure-jump Lévy process L is essentially β-stable, while all other jumps (both in L and in Y) are of much smaller activity and are dominated by the β-stable part at high frequency. Note that dependence between L and Y is possible, and this will hold for the jump parts of α and σ as well.
These assumptions on α and σ are extremely mild and covered by most processes used in the literature.
Our goal in the following is to estimate β based on irregular observations over the finite time interval [0, 1], say, and we will work in a setting where the observation times are typically random. In order to incorporate this additional randomness into the model we assume that the probability space contains a larger σ-field G, and we keep using F to denote the σ-field with respect to which X is measurable. The following condition is loosely connected with the restricted discretisation schemes introduced in Chapter 14.1 of Jacod and Protter (2012), but with a slightly different predictability assumption and additional moment conditions.
Condition 2.3. For each n ∈ N we observe the process X at stopping times 0 where ∆_n → 0 and (a) λ_t is a strictly positive Itô semimartingale w.r.t. the filtration (F_t)_{t≥0} and fulfills the same conditions as σ_t stated in Assumption 2.2; (b) (φ^n_i)_{i≥1} is a family of random variables measurable with respect to the σ-field G and independent of F; (c) φ^n_i ∼ φ for a strictly positive random variable φ with E[φ] = 1, such that for all p > −2 the moments E[φ^p] exist. For all t > 0 we define (F^n_t)_{t≥0} to be the smallest filtration containing (F_t)_{t≥0} with respect to which all τ^n_i are stopping times. We also let N_n(t) denote the number of observation times until t, i.e.
and of particular importance for us is the case t = 1, because N_n(1) is the (random) number of observations over the trading day [0, 1] from which we construct the relevant statistics later on. Note that due to ∆_n → 0 we are in a high-frequency situation where the time between two observations converges to zero while N_n(1) diverges to infinity (both in a probabilistic sense).
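As an illustration of Condition 2.3, the following sketch simulates such a sampling scheme. The precise recursion for the τ^n_i is not displayed above, so the sketch assumes the natural form τ^n_i = τ^n_{i−1} + ∆_n λ(τ^n_{i−1}) φ^n_i, with λ ≡ 1 and φ^n_i i.i.d. with unit mean, truncated away from zero; all of these choices are illustrative and not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampling_times(delta_n, lam=lambda t: 1.0, eps=0.05):
    """Hypothetical restricted discretisation scheme on [0, 1]:
    tau_i = tau_{i-1} + delta_n * lam(tau_{i-1}) * phi_i,
    where phi_i ~ Exp(1) truncated below at eps so that negative
    moments of phi exist (cf. Condition 2.3 (c))."""
    taus = [0.0]
    while True:
        phi = max(rng.exponential(1.0), eps)
        t_next = taus[-1] + delta_n * lam(taus[-1]) * phi
        if t_next > 1.0:
            break
        taus.append(t_next)
    return np.array(taus)

taus = sampling_times(1e-3)
N_n = len(taus) - 1    # N_n(1): number of observed increments
gaps = np.diff(taus)   # irregular interarrival times, all of order delta_n
```

Since E[φ] ≈ 1 here, N_n(1) concentrates around ∆_n^{−1}, matching the heuristic that N_n(1) grows like ∆_n^{−1}.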

Results
The essential idea from Todorov (2015) is to base the estimation of the unknown activity index on the estimation of the characteristic function of a certain stable distribution. We essentially proceed in a similar way, but with some subtle changes because the underlying sampling scheme is no longer regular. On the one hand, we have to account for the fact that the time between successive observations is not constant, while on the other hand the characteristic function not only involves this particular stable distribution but also the unknown distribution of φ from Condition 2.3.
Let us become more specific. We assume that the probability space is large enough to allow for a representation as in Todorov and Tauchen (2012), namely that the pure-jump Lévy process L can be decomposed as where all processes on the right-hand side are (possibly dependent) Lévy processes with a characteristic triplet of the form (0, 0, F) for a Lévy measure of the form F(dx) = F(x)dx. For S the Lévy density satisfies F(x) = A|x|^{−(1+β)}, while the Lévy densities of Ś and S̀ are Then S is strictly β-stable, and its characteristic function satisfies for some constant A_β > 0 and any u, t > 0.
As a result of the previous decomposition (3.1), and since holds for some β′ < β and all |x| ≤ x_0 due to Condition 2.1, it is clear that the jump behaviour of L, and thus of X, is governed by the β-stable process S over small time intervals. This observation is the key to the following estimation procedure: based on the high-frequency observations of X we will first estimate a function L(p, u, β) which, as noted before, is related to the characteristic function of S but involves the unknown distribution of φ as well. Here, p and u are additional parameters that can be chosen by the statistician. In a second step we will essentially use a Taylor expansion of L (as a function of u) around zero to finally come up with an estimator for β.
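For later reference, the scaling property and the characteristic function of the strictly β-stable part S referred to above take the standard form (cf. Section 1.2 in Samorodnitsky and Taqqu (1994); A_β is the constant appearing above):

```latex
S_{t+r}-S_{t}\;\overset{d}{=}\;r^{1/\beta}S_{1},\qquad
\mathbb{E}\bigl[e^{\mathrm{i}uS_{t}}\bigr]=e^{-tA_{\beta}|u|^{\beta}},
\qquad u\in\mathbb{R},\; t,r\ge 0.
```

In particular, for small u one has E[e^{iuS_t}] = 1 − tA_β|u|^β + o(|u|^β), which is the type of expansion exploited by the estimation procedure in this section.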
In the following, we denote with the ith increment of the process X, where we have included an additional rescaling in order to account for the different lengths of the intervals in an irregular sampling scheme, and we occasionally also use ∆^n_i S for the rescaled increment of the β-stable S. For any p > 0 and u > 0 we then set where the auxiliary sequence k_n satisfies k_n → ∞ and k_n ∆_n → 0, and where is used to estimate the unknown local volatility σ. At first it seems somewhat odd to include ∆_n in the definition of the rescaled increment because this quantity cannot be observed in practice. We will base our statistical procedure on L_n(p, u), however, and it is obvious from its definition that it is in fact independent of ∆_n, as the latter appears as a factor both in the numerator and in the denominator. Thus we are safe to work with the rescaled increment, and its definition makes it easy to compare results with the standard increment X_{τ^n_i} − X_{τ^n_{i−1}}. The two obviously coincide in the case of a regular sampling scheme. Note also that, even though its asymptotic condition is stated in terms of ∆_n, the choice of k_n can in practice be based on the size of N_n(1), which essentially grows as ∆_n^{−1}. The main part of the upcoming analysis is devoted to the study of the asymptotic behaviour of L_n(p, u). Its definition together with the previous discussion suggests that its limit should involve the characteristic function of S, but it also becomes apparent that the limit cannot be independent of φ. We will prove in the following that the first order limit is with the constant C_{p,β} being defined via and where φ^(1) and φ^(2) denote two independent copies with the same distribution as φ, defined on an appropriate probability space. For simplicity, we still use E[•] to denote the expectation on this generic space. The first main theorem then reads as follows:

Theorem 3.1. Suppose that Conditions 2.1-2.3 are in place and let k_n ∼ C_1 ∆_n^{−ϖ} for some C_1 > 0 and some ϖ ∈ (0, 1). Then we have for any fixed u > 0 and any choice of 0 < p < β/2.
While this result is interesting in itself, at first glance it does not help much with the estimation of β because L(p, u, β) depends in a complicated way on the unknown distribution of φ. If we utilise the familiar approximation exp(y) = 1 + y + o(y) for y → 0, however, it seems reasonable to hope that the approximation holds for any choice of a small u > 0, which is much easier to handle. Namely, an estimator for β is then based on an appropriate combination of the two estimators L_n(p, u_n) and L_n(p, v_n) with u_n → 0 and v_n = ρu_n for some ρ > 0. Precisely, we set which obviously is symmetric upon exchanging u_n and v_n.
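The displayed formula for β(p, u_n, v_n) does not appear above, but the construction just described suggests the following reading: by the first-order expansion, 1 − L(p, u, β) behaves like a constant times u^β for small u, so the ratio (1 − L_n(p, v_n))/(1 − L_n(p, u_n)) is close to ρ^β and can be inverted for β. A minimal sketch under this assumption (not the paper's exact formula):

```python
import numpy as np

def beta_hat(L_u, L_v, rho):
    """Two-scale activity estimator: with v_n = rho * u_n and the
    expansion L(p, u, beta) ~ 1 - C * u**beta for small u, the ratio of
    the two deviations from one identifies beta via
    (1 - L(v_n)) / (1 - L(u_n)) ~ rho**beta."""
    return np.log((1.0 - L_v) / (1.0 - L_u)) / np.log(rho)

# sanity check on a synthetic limit of the form L(u) = exp(-C * u**beta)
beta_true, C, u_n, rho = 1.5, 0.8, 0.05, 0.5
L_u = np.exp(-C * u_n ** beta_true)
L_v = np.exp(-C * (rho * u_n) ** beta_true)
est = beta_hat(L_u, L_v, rho)   # close to beta_true for small u_n
```

Because the identity only holds to first order, the estimator is consistent along u_n → 0 but picks up a bias when u_n is too large, which is exactly the rate trade-off discussed above.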
Remark 3.2. We will often choose ρ = 1/2, in which case β(p, u_n, v_n) ≤ 2 can be shown. This is of course a desirable property, as it resembles the bound for the stability index β itself, but it also bears some restrictions regarding the quality of a limiting normal approximation for values of β close to 2. See Figures 1 and 2 below.
Before we discuss the asymptotic behaviour of the estimator β(p, u_n, v_n) we state a bivariate central limit theorem for L_n(p, u_n) − L(p, u_n, β) and L_n(p, v_n) − L(p, v_n, β) with u_n and v_n chosen as above.
Theorem 3.3. Suppose that Conditions 2.1-2.3 are in place and let k_n ∼ C_1 ∆_n^{−ϖ} for some C_1 > 0 and some ϖ ∈ (0, 1), as well as u_n ∼ C_2 ∆_n^{η} for some C_2 > 0 and η ∈ (0, 1). Suppose further that converges F-stably in law to a limit (X, Y) which is jointly normally distributed (independent of F) with mean 0 and covariance matrix C given by Remark 3.4. The above choice of the parameters ϖ, η and p is feasible even if we do not know β. It can easily be seen that e.g. ϖ = 1/3, η = 2/3 and any p ∈ (3/8, 1/2) satisfies the conditions in Theorem 3.3.
Table 1: Results for ρ = 1/2 and ∆_n^{−1} = 1000 are shown. The second column shows the empirical mean of the N = 1000 samples of β(p, u_n, v_n), the third one the corresponding empirical variance, whereas the final column shows the true asymptotic variance from Theorem 3.5. In brackets the same results are given for ρ = 2.
The following result is the main theorem of this work; it provides the central limit theorem for the estimator β(p, u_n, v_n), and its proof builds heavily on Theorem 3.3. Theorem 3.5. Under the conditions of Theorem 3.3 we have the F-stable convergence in law where X is a normally distributed random variable (independent of F) with mean 0 and variance A simple corollary is the consistency of β(p, u_n, v_n) as an estimator for β.
Corollary 3.6. Under the conditions of Theorem 3.3 we have For a feasible application of Theorem 3.5 we need a consistent estimator for the variance of the limiting normal distribution, which essentially boils down to the estimation of κ_{β,β}. This problem will be discussed in the next section, alongside a thorough analysis of the finite sample properties of β(p, u_n, v_n).

Simulation study
This chapter deals with the numerical assessment of the finite sample properties of β(p, u_n, v_n), and we also include a discussion regarding the estimation of the variance in the central limit theorem in order to obtain a feasible result. In the following, let W be a standard Brownian motion and L a symmetric stable process with a Lévy density Table 2: Results for ρ = 1/2 and ∆_n^{−1} = 10,000 are shown. The second column shows the empirical mean of the N = 1000 samples of β(p, u_n, v_n), the third one the corresponding empirical variance, whereas the final column shows the true asymptotic variance from Theorem 3.5. In brackets the same results are given for ρ = 2.
for some β ∈ (1, 2).We then set and assume that we observe which obviously fulfills Conditions 2.1 and 2.2.For the observation scheme we choose with φ ∼ Exp(1) and with the starting values of the processes being α 0 = σ 0 = X 0 = λ 0 = 1.We assume that W is a standard Brownian motion as well, independent of W .The purpose of the minimum in the definition of φ in (4.1) is to ensure the (negative) moment condition from Assumption 2.3 (c) to hold, which (as can be seen from additional simulations) not only seems to be relevant in theory but in practice as well.Note also that the choice of λ 0 = 1 combined with the mean reversion of λ to 5 leads to pronounced changes in the distribution of the τ n i over time.For the simulation of X we use a standard Euler scheme, utilising Proposition 1.7.1 in Samorodnitsky and Taqqu (1994) to obtain symmetric stable random variables.We also set essentially in accordance with Remark 3.4.Below, we present the results for β ∈ {1.1, 1.3, 1.5, 1.7, 1.9} and ρ ∈ {1/2, 2}, generating N = 1000 samples, and we discuss both ∆ −1 n = 1000 and ∆ −1 n = 10, 000.Note that our choice of λ in (4.1) yields about N n (1) ≈ 520 observations in the first case, whereas N n (1) ≈ 5200 in the second one.
Table 1 shows mixed results but nevertheless allows us to draw some conclusions. First, we see for the choice ρ = 1/2 that the estimator for β behaves correctly on average, whereas the (relative) difference between the true and the estimated variances grows with β. Note that the larger choice of ρ effectively corresponds to choosing u_n twice as large. Hence Table 1 confirms empirically what is already known from the construction of the estimator: it relies on u_n → 0, so a larger choice of ρ induces an additional bias. On the other hand, the rate of convergence to the normal distribution improves as u_n becomes larger, and it also seems as if the quality of the variance estimation is better in this case. We follow this discussion with Table 2, which is constructed in the same manner as above for ∆_n^{−1} = 10,000, and we basically see improvement across the board. Now the bias for both ρ = 1/2 and ρ = 2 is very small, even for large values of β, and we can also observe that the approximated variance, specifically for β ∈ {1.5, 1.7, 1.9}, is much closer to the theoretical one than previously.
In a second step we present some QQ-plots to visualise the quality of the approximating normal distribution from Theorem 3.5. In this case we need to estimate the limiting variance in Theorem 3.5, and we note from eq. (3.3) in Todorov (2015) that holds. Hence, the only unknown quantities in the variance are β and κ_{β,β}, and by Corollary 3.6 we only need a consistent estimator for κ_{β,β} to obtain a consistent plug-in estimator for the limiting variance. By definition and the càdlàg property of λ, a natural way to construct an estimator for κ_{β,β} is to build it from sums of adjacent increments of the τ^n_i, rescaled by the length of the total time interval relative to the number of increments. Whenever β needs to be included, it is replaced by its consistent estimator β̂_n = β(p, u_n, v_n). The consistency of such an estimator for κ_{β,β} is formally given in the following lemma. Its proof is rather straightforward but lengthy and therefore omitted.
for some ψ ∈ (0, 1). Then where We use the same configuration of parameters as discussed earlier and further set r_n = N_n(1)^{4/5}. Due to the choices ρ = 1/2 and ρ = 2, Remark 3.2 applies. As noted before, this condition prevents the normal approximation from working well in the right tails, and as expected Figure 1 shows that this effect becomes more pronounced as β gets closer to the upper bound. For ∆_n^{−1} = 1000, this dubious tail behaviour already starts to appear for β = 1.5. Nevertheless, it should be noted that the quality of the distributional approximation increases visibly with the larger sample size ∆_n^{−1} = 10,000 for both choices of ρ. Figure 2 shows, for instance, that β = 1.5 is not really critical anymore, i.e. an increasing sample size allows for an accurate approximation for larger values of β. Also, a slight improvement from ρ = 1/2 to ρ = 2 can be noted, in line with the previous discussion regarding the rate of convergence.
A natural way to further improve the finite sample properties is to conduct a bias correction for the higher order terms. Our current approach to estimate β relies on the approximation (3.3), while a more precise one would e.g. be a third order expansion of the form An estimator for β is then given by where deb(β, r, u_n, C_{p,β}) estimates and can be constructed analogously to Lemma 4.1. Ideally, such a correction would allow for a bigger choice of u_n in finite samples, thus leading to a better rate of convergence.
As an example, we discuss the case ∆_n^{−1} = 1000 with β = 1.7 and ρ = 0.5, but we now choose u_n = N_n(1)^{−0.28} and keep all other variables unchanged. In this case, mean and empirical variance become 1.7042 and 1.7970, both improving the corresponding values from Table 1. Also, the corresponding QQ-plot in Figure 3 clearly shows a better approximation of the limiting normal distribution, still with the original problems in the right tail.

Prerequisites on localisation
As usual, one starts with localisation results, i.e. with results that allow us to prove the main theorems under conditions which are slightly stronger than Conditions 2.1 and 2.2 for the processes involved and also stronger than Condition 2.3 on the sampling scheme. We begin with the additional assumptions on the processes.
Condition 5.1. In addition to Conditions 2.1 and 2.2 we assume that (a) |σ_t| and |σ_t|^{−1} are uniformly bounded; is bounded and the jumps of Y are bounded; (e) the jumps of Ś and S̀ are bounded.
The same properties hold for the processes that govern λ.
The following lemma gives the formal reason why we may assume in the following that the strengthened Condition 5.1 holds, namely because we are interested in X on the bounded interval [0, 1] only and eventually E_p > 1 for a localising sequence, at least with a probability converging to 1. Its proof closely resembles the one of Lemma 4.4.9 in Jacod and Protter (2012), which is why we refer the reader to part 3) of their proof.
Lemma 5.2. Let X be a process fulfilling Conditions 2.1 and 2.2. Then, for each p > 0 there exist a stopping time E_p and a process X(p) such that X(p) and its components α(p), σ(p) and Y(p) fulfill Condition 5.1, and it also holds that X(p)_t = X_t for all t < E_p. The sequence of stopping times can be chosen such that E_p ↑ ∞ almost surely as p → ∞.
For all proofs concerning the asymptotics of L_n(p, u) and β(p, u_n, v_n) it is important that the process λ_t driving the observation times τ^n_i is bounded from above and below. This means that we need a stronger assumption than just Condition 2.3 as well, and we also need to assume that for a given n the number of observations until any fixed T is bounded by a constant times ∆_n^{−1} T.
Condition 5.3. In addition to Condition 2.3 there exists some C > 1 such that (a) the process λ fulfills the same assumptions as σ in Condition 5.1, and in particular we have for all t > 0 (b) for any given n and any T > 0 we have Strengthening Condition 2.3 ultimately results in changing the entire observation scheme, which makes it somewhat harder to formally prove that such an assumption is indeed adequate. We begin with a result on the boundedness of λ as in part (a) above, and for every n let F_n be a random variable which not only depends on n but also on the process X and on the discretisation scheme via λ and the variables φ^n_i. Likewise, a possible stable limit F of F_n is assumed to depend on the same factors and is realised on an extension (Ω̃, G̃, P̃) of the original probability space (Ω, G, P). Furthermore, for each C > 1 we define λ We then set E_C to be the stopping time from Lemma 5.2 with C replacing p, and this lemma can be applied because, by Condition 2.3, λ is assumed to satisfy the same structural properties as σ.
and if furthermore Proof. Let Ẽ be the expectation w.r.t. P̃. We need to prove where Y is any bounded random variable on (Ω, G) and f is any bounded continuous function, and using it is sufficient to prove that each of the three summands vanishes. For the first one, by boundedness of Y and f and using (5.2), it is obvious that Here and below, K always denotes a generic positive constant. Thus, lim sup and the same proof applies for the third term. Finally, note that for each fixed C is an immediate consequence of (5.1).
By construction, λ_t and λ^{(C)}_t coincide for all 0 ≤ t ≤ T on the set {E_C > T}. As our estimators only deal with observations up to a fixed time horizon T (in our specific case the convenient but arbitrary T = 1), it is clear that condition (5.2) is indeed met. Therefore we may assume for the following proofs that part (a) of Condition 5.3 is in force and only prove (5.1) under this strengthened assumption.
Finally, we need to explain why we can assume that part (b) of Condition 5.3 holds as well. Here we refer to part 2) of the proof of Lemma 9 in Jacod and Todorov (2018), where a family of discretisation schemes with the desired properties is constructed and where each member of the family coincides with the original sampling scheme up to some random time S_n. As it is shown that these times converge to infinity almost surely, the same argument as before allows us to assume part (b) of Condition 5.3 without loss of generality.
For further information on random discretisation schemes one can consult Section 14.1 in Jacod and Protter (2012), where a slightly different version of Lemma 5.4 and other important properties of objects connected to these schemes are proven. We want to name one of those properties in particular because we will use it repeatedly in the following chapters: (14.1.10) in Jacod and Protter (2012) proves that for all t ≥ 0 we have which basically allows us to treat the random N_n(1) like the deterministic ∆_n^{−1} in all asymptotic considerations.

A crucial decomposition
The proofs of Theorems 3.1 and 3.3 rely on a simple decomposition which allows us to identify the terms that play a dominant role in the asymptotic treatment. Precisely, we have where drives the asymptotics while the residual terms are given by Here we have set and we use the shorthand notation ]. We also introduce the notation and set We will start with a discussion of the asymptotic orders of the residuals, for which we always assume that Conditions 5.1 and 5.3 as well as k_n ∼ C_1 ∆_n^{−ϖ} for some C_1 > 0 and some ϖ ∈ (0, 1) are in place. Naturally, we need some preparation to obtain asymptotic negligibility, and we start with a lemma containing a series of bounds for moments of certain increments of (often integrated and rescaled) processes. We will not give a proof of this result but refer to Todorov (2015) and Todorov (2017). In fact, the techniques used for the proof for the most part resemble the ones given therein. The main difference is that our arguments often involve the additional process λ, which sometimes complicates matters considerably.
(f) For every q > 0 and every l ∈ {0, 1} we have (g) For every q ∈ (0, 2] we have A second lemma, again without proof, discusses bounds for moments of increments of semimartingales. Again, it bears some resemblance to results in Todorov (2015) and Todorov (2017), but its proof is slightly more involved due to the random observation scheme.
Lemma 5.6. Let A be a semimartingale satisfying the same properties as σ in Condition 5.1. Then, for any −1 < p < 1 and any y > 0, and with K possibly depending on y, we have After the presentation of these auxiliary claims, we focus on results which directly simplify the discussion of the asymptotic negligibility of the residual terms. We begin with a result that helps in the treatment of R^n_1.
Lemma 5.7. Let ι > 0 and 0 < p < β/2 be arbitrary. Then, for any i ≥ 2, we have Proof. The proof relies on bounds for moments of several stochastic integrals, mostly connected with the jump process L_t and its parts. Using (3.1) and (5.5) we may write We obviously have and since p < β/2 < 1 holds, the inequality which is an easy consequence of (d), (e) and (f) of Lemma 5.5, is enough to fully treat the first term on the right-hand side. For the second term we will use parts (a), (c) and (g) of Lemma 5.5 plus the algebraic inequality which holds for any ε > 0 and p ∈ (0, 1] and a constant K that does not depend on ε, and which we apply with a = ∆. We start with the latter two terms and let 0 < ε < 1 be arbitrary. Then the Markov inequality in combination with the Hölder inequality first gives (with a slight abuse of notation, but remember that ι > 0 can be chosen arbitrarily) and then as the upper bound in both terms. Finally, we have to distinguish between p > 1/β and p ≤ 1/β. In the first case a simple application of the Hölder inequality gives for our specific choice of ι > 0. Note that (a) in Lemma 5.5 was indeed applicable, as p > 1/β ensures (p − 1)β/(β − 1) > −1. In the second case we set ε as above and use the Markov inequality with r = (1 + ι)/β − p. Then, as now p + r > 1/β by construction, the same proof as in the first case shows this term to be of the order ∆ (with the same abuse of notation), which ends the proof.

We also need to control the denominators in R^n_1 and R^n_2 to make sure that they are bounded away from zero with high probability. To this end, we need two auxiliary results, and in both cases we let i ≥ k_n + 3 and 0 < p < β/2 be arbitrary. The first result deals with the variables V^n_i(p) which we introduced before, and it will be used for the treatment of R^n_3 later on as well. Its proof is omitted as it is essentially the same as the one for equation (9.4) in Todorov (2015) and exploits standard inequalities for discrete martingales.
Lemma 5.8. Let k_n ∼ C_1 ∆_n^{−ϖ} for some C_1 > 0 and ϖ ∈ (0, 1). Then for all 1 ≤ x < β/p we have where the constant K_x might depend on x.
The second result deals with the set and bounds its probability.
Lemma 5.9. For every fixed ι > 0 we have where the constant K_ι might depend on ι.
Proof. Lemmas 5.5 and 5.7 give with |σ|^p_i being defined as |σλ|^p_i but with λ ≡ 1. Now it is a simple consequence of Conditions 5.1 and 5.3 that both |σλ|^p_i and |σ|^p_i are uniformly bounded from above and below. Using α_n → 0 and ∆_n^{1/2} → 0, there then exists some n_0 ∈ N such that for all n ≥ n_0. For these n, from Lemma 5.8, with a choice of x arbitrarily close to β/p, the claim follows.
Finally, we provide two lemmas for the discussion of R^n_4. The first one gives an alternative representation for the limiting variable L(p, u, β).
Lemma 5.10. It holds that Proof. For the sake of simplicity, let us assume that the probability space can be enlarged even further to accommodate three independent random variables S^(1), S^(2) and S^(3), independent of G, all with the same distribution as S_1, i.e. distributed as a Lévy process with characteristic triplet (0, 0, F) at time 1, F(dx) = F(x)dx with F(x) = A|x|^{−(1+β)}. Using standard properties of stable processes (see e.g. Section 1.2 in Samorodnitsky and Taqqu (1994)), for constants σ_1, σ_2 ∈ R we have and for our original process S_t the stability relation (S_{t+r} − S_t) ∼ r^{1/β} S_1 for all r, t ≥ 0 holds as well. Because the increments of the process (S_t)_{t≥τ^n_{i−2}} are independent of F_{τ^n_{i−2}}, for all Borel sets M we obtain e.g.
using that all moments of (φ_i^n)^q for q ∈ (−2, 0) are assumed to exist, as well as

where S^(1), S^(2) are all independent of F_{τ^n_{i−2}} and of each other. Thus (5.9) follows after successive conditioning.
From Lemma 5.10 it is clear that the treatment of R_4^n hinges on the question of how well |σλ|_i^p can be approximated.

Lemma 5.11. Let k_n ∼ C_1 ∆_n^{−ϖ} for some C_1 > 0 and ϖ ∈ (0, 1). Then for all y > 1 we have

Proof. We start with a proof of (5.10). For the first claim, note that the decomposition

holds. Now, we can apply Lemma 5.6 to each of the three terms, for the third one together with the Cauchy-Schwarz inequality, using boundedness of σ and λ from below and above, plus the fact that 1 < β < 2 and p < β/2 < 1, to guarantee that the exponents lie between −1 and 1. A similar reasoning works for the second claim.
We then obtain

easily, and convexity then gives the claim.

Bounding the residual terms
In what follows, let u > 0, 0 < p < β/2 and ι > 0 be arbitrary but fixed, and we always assume that k_n ∼ C_1 ∆_n^{−ϖ} for some C_1 > 0 and ϖ ∈ (0, 1). We also use the notation n = ∆_n^{−1} for convenience.
Lemma 5.12. We have

where the constant K might depend on p, β and ι but not on u.
Proof. We decompose

and as cos(x) is bounded, we have for any i ≥ k_n + 3, by Lemma 5.9,

Thus, using Assumption 5.3 we obtain

(5.11)

On the other hand, on (C_i^n)^C, V_i^n(p) is now likewise bounded, with a constant possibly depending on p and β. Let us use the notation from the proof of Lemma 5.7 and write

Using the boundedness of ∆_n and the inequality |cos(x) − cos(y)| ≤ 2|x − y|^p for all x, y ∈ R and p ∈ (0, 1], we have

We then get, using parts (c) and (g) of Lemma 5.5 as well as (5.6), which holds for any 0 < p < 2,

The claim now follows from the same reasoning as in (5.11), with an additional step of successive conditioning.
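The elementary bound |cos(x) − cos(y)| ≤ 2|x − y|^p for p ∈ (0, 1] follows from |cos(x) − cos(y)| ≤ min(2, |x − y|) by distinguishing |x − y| ≤ 1 from |x − y| > 1. As an illustrative numerical sanity check (not part of the proof):

```python
import math
import random

def cos_holder_bound_holds(x: float, y: float, p: float) -> bool:
    """Check the elementary bound |cos(x) - cos(y)| <= 2|x - y|**p for p in (0, 1]."""
    return abs(math.cos(x) - math.cos(y)) <= 2 * abs(x - y) ** p

# Spot-check the bound over a wide range of arguments and exponents.
random.seed(0)
all_hold = all(
    cos_holder_bound_holds(random.uniform(-50, 50),
                           random.uniform(-50, 50),
                           random.uniform(0.01, 1.0))
    for _ in range(100_000)
)
print(all_hold)
```

The factor 2 leaves enough slack that floating-point rounding never flips the comparison.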
Lemma 5.13. We have

where the constant K might depend on p, β and ι but not on u.
Proof. We get

with the same arguments that led to (5.11). Similar arguments as in the previous proof, plus Assumption 5.1, boundedness of ∆_n and β > 1 to ensure the existence of moments, give

The expectation of the right-hand side is bounded by Ku∆_n^{1/2}, using (5.10) and successive conditioning. We then obtain

as in the previous proof.
Lemma 5.14. We have

where the constant K might depend on p, β and ι but not on u.
Proof. Again we obtain

as in (5.11), this time using boundedness of x ↦ exp(−x) on the positive half-line.
We then use a first-order Taylor expansion of the (random) function

(5.12)

(defined for x > 0) and get

(5.13)

for any positive F_{τ^n_{i−2}}-measurable random variable X, where the constant K does not depend on u. Using the boundedness of ∆_n we obtain

where the last line holds by Lemma 5.8 and (5.8) plus the definition of α_n.
Lemma 5.15. We have

where the constant K might depend on p and β but not on u.
Proof. Recall the function f^n_{i,u} from (5.12) and set

(5.14)

(5.15)

(5.16)

In the sequel we prove the same rate of convergence for all three terms on the right-hand side. Starting with (5.14), from the definition of r^4_i(u) we have

using the independence of φ^n_{i−1} and φ^n_i from F_{τ^n_{i−2}}. A second-order Taylor expansion, possible by the usual boundedness assumptions, now gives

It is then an easy consequence of (5.13) and Lemma 5.11, together with the reasoning from (5.11), that the expectation in (5.14) is bounded by Ku^β k_n with K as in the statement of the lemma.
For (5.16), boundedness of all processes involved gives

by (5.13), whereas Lemma 5.11 proves

(5.17)

Finally, for the treatment of (5.15) we have to be a little more specific. A simple computation proves

for some K as above. Thus, setting Ξ_i = r_{i,n} − E_{i−k_n−3}[r_{i,n}], we can bound (5.15) by

(5.18)

(5.19)

An application of the Cauchy-Schwarz inequality bounds (5.18) by the product of

Lemma 5.11 together with (5.17) proves

and from Lemma 5.6 we have

with the same reasoning as when establishing (5.10). Boundedness of L(p, u, β) and N_n(1) ≤ C∆_n^{−1} now prove that (5.18) is bounded by Ku^β k_n. For (5.19) we use an argument involving discrete martingales, and we first change the upper summation bound from N_n(1) to N_n(1) + 2k_n + 5, as the corresponding error term is of the order Ku^β k_n by boundedness of σ and λ, so similar to the one from (5.18). The martingale argument is explained most easily if we first pretend that the factors

for every 1 ≤ ℓ < i, using the fact that by construction one knows at time τ^n_{(i−1)(k_n+3)+(j−1)} whether the event {τ^n_{(i−1)(k_n+3)} ≤ 1} has happened or not. The latter event is equivalent to {i − 1 ≤ ⌊N_n(1)/(k_n + 3)⌋}, so after conditioning on F^n_{(i−1)(k_n+3)+(j−1)} the claim follows from E^n_{r−k_n−3}[Ξ_r] = 0 for every r. Thus, by (5.20), the Cauchy-Schwarz inequality and ⌊N_n(1)/(k_n + 3)⌋ + 1 ≤ Kn/k_n we obtain

As the sum over the residual terms in (5.21) has at most k_n + 2 elements we obtain

and we just get an additional factor u^β as usual. This is due to boundedness of σ and λ again (and of the function L), plus the fact that measurability w.r.t. F_{τ^n_{i−k_n−3}} keeps the martingale property from above intact.
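The discrete-martingale step rests on the orthogonality of martingale differences along each of the k_n + 3 subsequences of indices spaced k_n + 3 apart; schematically, and in our own abbreviated notation (a sketch, not the paper's exact display):

```latex
% Orthogonality of martingale differences: if the indices r < s in the sum
% satisfy s - r >= k_n + 3, then \Xi_r is measurable w.r.t. the conditioning
% sigma-field and E_{s-k_n-3}[\Xi_s] = 0 kills every cross term.
\begin{align*}
\mathbb{E}\Bigl[\Bigl(\sum_{r} \Xi_r\Bigr)^{2}\Bigr]
  &= \sum_{r} \mathbb{E}\bigl[\Xi_r^{2}\bigr]
   + 2 \sum_{r < s} \mathbb{E}\bigl[\Xi_r\, \mathbb{E}_{s-k_n-3}[\Xi_s]\bigr]
   = \sum_{r} \mathbb{E}\bigl[\Xi_r^{2}\bigr].
\end{align*}
```

This is why the sum is first split into k_n + 3 subsequences: within each subsequence the Ξ's form genuine martingale differences, and the variance of the block sum reduces to a sum of second moments.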
Lemma 5.16. We have

where the constant K might depend on p, β and ι but not on u.
Proof. As usual we have

and for the analogous sum involving 1_{(C_i^n)^C}, as in the previous proof, we may change the upper summation index to N_n(1) + 2 without loss of generality. Now, note that by Lemma 5.10

and using the same arguments for z_i(u_n) we have

Thus for all i, j ≥ k_n + 3 with j − i ≥ 2 we have

Now, with (5.9) plus the standard inequalities |cos(x) − cos(y)|² ≤ 4|x − y|^p and |exp(−x) − exp(−y)|² ≤ |x − y|^p, which hold for all p ∈ (0, 2], we obtain

and part (a) of Lemma 5.5 together with E[(φ^n_i)^{1−β} + (φ^n_{i−1})^{1−β}] < ∞ and the F^n_{i−2}-measurability of the other terms proves that the term above is bounded by

Let us for a moment only discuss the first term. On (C_i^n)^C and using Condition 5.1, ∆_n^{−p/β} V_i^n(p) as well as all quantities involving σ and λ are bounded from above and below by K. Thus, together with |x^q − y^q| ≤ q max(x, y)^{q−1}|x − y| for q ≥ 1, we get

With a similar argument for the second term we then obtain

using Lemma 5.8, (5.7) and Lemma 5.11. A similar result holds with the exponent being replaced by β − ι > 1. The claim now follows easily.
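The three elementary inequalities invoked in this proof (the squared cosine and exponential bounds for p ∈ (0, 2] with nonnegative arguments, and the mean-value bound for powers q ≥ 1 with positive arguments) can be checked numerically; an illustrative sanity check, not part of the proof:

```python
import math
import random

def lemma_516_inequalities_hold(x: float, y: float, p: float, q: float) -> bool:
    """Check the three elementary inequalities used in the proof (x, y > 0):
    |cos(x)-cos(y)|^2 <= 4|x-y|^p,  |e^{-x}-e^{-y}|^2 <= |x-y|^p  (p in (0, 2]),
    and |x^q - y^q| <= q * max(x, y)^(q-1) * |x-y|  (q >= 1, mean value theorem)."""
    d = abs(x - y)
    ok_cos = (math.cos(x) - math.cos(y)) ** 2 <= 4 * d ** p
    ok_exp = (math.exp(-x) - math.exp(-y)) ** 2 <= d ** p
    ok_pow = abs(x ** q - y ** q) <= q * max(x, y) ** (q - 1) * d
    return ok_cos and ok_exp and ok_pow

# Sample p up to 1.9 only, to stay clear of floating-point edge cases at p = 2.
random.seed(1)
all_hold = all(
    lemma_516_inequalities_hold(random.uniform(0.1, 5.0),
                                random.uniform(0.1, 5.0),
                                random.uniform(0.01, 1.9),
                                random.uniform(1.0, 3.0))
    for _ in range(100_000)
)
print(all_hold)
```

The power bound is exactly the mean value theorem applied to t ↦ t^q, whose derivative on (0, max(x, y)] is at most q max(x, y)^{q−1}.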

Proof of Theorem 3.1
As discussed before, we may assume Conditions 5.1 and 5.3 to hold. First, we have

and it is a simple consequence of (5.3), the decomposition in (5.4) and Lemmas 5.12 to 5.16 that

holds. Note that we indeed have convergence to zero of all bounds in Lemmas 5.12 to 5.16, as ∆_n → 0, k_n → ∞ and k_n ∆_n → 0 by assumption.
The proof of

follows along the lines of Lemma 5.16. We may first change the upper summation index to N_n(1) + 2, which does not change anything asymptotically because of (5.3), and we then have, for all i, j ≥ k_n + 3 with j − i ≥ 2,

using Lemma 5.10. Thus, we obtain

and the claim follows from (5.3) again.

Proof of Remark 3.2
From the definition of β(p, u_n, v_n) it follows easily that the claim reduces to

(5.23)

where we have used the shorthand notation a_i. Using properties of the cosine and inserting ρ = 1/2, we note g_{1/2}(x) = g_{1/2}(−x) and g_{1/2}(x) = g_{1/2}(x + 4π). For (5.23) to hold it then suffices to show

by properties of the trigonometric functions. The claim follows from g_{1/2}(0) = 0.

Proof of Theorem 3.3
We will assume throughout that Assumptions 5.1 and 5.3 are in place, and we set k_n ∼ C_1 ∆_n^{−ϖ} for some C_1 > 0 and some ϖ ∈ (0, 1), as well as u_n ∼ C_2 ∆_n^{η} for some C_2 > 0 and η ∈ (0, 1).
Lemma 5.17. Under the conditions above we have

Proof. We see in particular that exchanging the roles of u_n and v_n is irrelevant to the distributions. Then, with Lemma 5.10 and its proof,

We now have for example

The same argument for the other terms, including the usage of u_n → 0 when dealing with the product of the two expectations, gives

Similar arguments lead to

and we see that both quantities are asymptotically equivalent. We note that N_n(1) + 1 is a (F_{τ^n_i})_{i≥1}-stopping time, and therefore in order to apply Theorem 2.2.15 in Jacod and Protter (2012) it is sufficient to show that

(5.24)

(5.25)

(5.26)

(5.27)

hold for some q > 2, where M is either one of the three Brownian motions involved or a bounded martingale orthogonal to any of them. First note that Lemma 5.10 gives E^n_{i−1}[ζ^n_{i+1}] = (0, 0) and therefore E^n_{i−1}[η^n_i] = (0, 0) by definition as well. (5.24) then holds. Also, η < 1/β gives

Thus, the uniform boundedness of z^(•) and Assumption 5.3 give

which proves (5.26). To show (5.25) we first recall E^n_{i−1}[η^n_i] = (0, 0), and then a simple calculation yields

using iterated expectations, E^n_{i−1}[ζ^n_{i+1}] = (0, 0) and the fact that the distribution of E^n_i[ζ^{n,j}_{i+1}] E^n_i[ζ^{n,k}_{i+1}] is independent of F_{τ^n_{i−1}}. We then prove

(5.28)

all of this for an arbitrary m.
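Schematically, and in our own abbreviated notation (a sketch, not the paper's exact displays), the conditions (5.24)-(5.27) required for the stable central limit theorem for triangular arrays in Theorem 2.2.15 of Jacod and Protter (2012) are of the form

```latex
% Generic conditions for a stable CLT for the triangular array (\eta^n_i):
% q > 2, C the limiting (conditional) covariance, and M running through
% the reference Brownian motions and orthogonal bounded martingales.
\begin{align*}
\sum_i \mathbb{E}\bigl[\eta^n_i \mid \mathcal{F}^n_{i-1}\bigr]
  &\xrightarrow{\;\mathbb{P}\;} 0, &
\sum_i \mathbb{E}\bigl[\eta^n_i (\eta^n_i)^{\top} \mid \mathcal{F}^n_{i-1}\bigr]
  &\xrightarrow{\;\mathbb{P}\;} C,\\
\sum_i \mathbb{E}\bigl[\|\eta^n_i\|^{q} \mid \mathcal{F}^n_{i-1}\bigr]
  &\xrightarrow{\;\mathbb{P}\;} 0, &
\sum_i \mathbb{E}\bigl[\eta^n_i\, \Delta^n_i M \mid \mathcal{F}^n_{i-1}\bigr]
  &\xrightarrow{\;\mathbb{P}\;} 0.
\end{align*}
```

The first line identifies the drift and the limiting covariance, the third condition is a conditional Lyapunov bound, and the fourth guarantees that the limit is defined on an extension of the space, independent of the driving martingales.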
We give the arguments for the first convergence result above in detail; the other ones can be treated in exactly the same way. We set X^{n,1}_i accordingly and prove

(5.29)

E^n_{i−1}[((X^{n,1}_i)_{jk})²] →^P 0,  j, k = 1, 2.
(5.30)

Theorem 2.2.12 in Jacod and Protter (2012) (plus the usual asymptotic negligibility when adding finitely many summands) finally gives the claim. Now, note first that the distribution of X^{n,1}_i is independent of F_{τ^n_{i−1}}. Therefore E^n_{i−1}[X^{n,1}_i] = E[X^{n,1}_i] and E^n_{i−1}[(X^{n,1}_i)²] = E[(X^{n,1}_i)²].
(5.29) is then an easy consequence of

where we used (5.3) and Lemma 5.18 to prove that the limit on the right-hand side exists. To show (5.30) we use the Jensen inequality to obtain

Finally, to prove (5.27) we use Theorem 4.34 in Chapter III of Jacod and Shiryaev (2003). We set for k_n + 3 ≤ i ≤ N_n(1) and t ≥ τ^n_{i−2}:

where (H_t)_{t≥τ^n_{i−2}} is a predictable process. Then

where we used that the martingale (S_t)_{t≥0} is orthogonal to M in all cases.
Here, H̃_t := H ∨ σ(S_r : t ≥ r ≥ τ^n_{i−2}), i.e. (H̃_t)_{t≥τ^n_{i−2}} is the filtration generated by H and σ(S_r : t ≥ r ≥ τ^n_{i−2}). Now (S_t)_{t≥τ^n_{i−2}} is a process with independent increments w.r.t. σ(S_r : r ≥ τ^n_{i−2}). For all t ≥ τ^n_{i−2} we set K_t := E[ζ_i | H̃_t] and note that K_{τ^n_i} = ζ_i, due to ζ_i being H̃_{τ^n_i}-measurable. Then, with the aforementioned Theorem 4.34, the claim follows.