Asymptotically efficient estimation for diffusion processes with nonsynchronous observations

We study maximum-likelihood-type estimation for diffusion processes when the coefficients are nonrandom and observations occur in a nonsynchronous manner. The problem of nonsynchronous observations is important when we consider the analysis of high-frequency data in a financial market. Constructing a quasi-likelihood function to define the estimator, we adaptively estimate the parameter for the diffusion part and the drift part. We consider the asymptotic theory in which the terminal time point T_n and the observation frequency go to infinity, and show the consistency and the asymptotic normality of the estimator. Moreover, we show local asymptotic normality for the statistical model, and asymptotic efficiency of the estimator as a consequence. To show the asymptotic properties of the maximum-likelihood-type estimator, we need to control the asymptotic behavior of certain functionals of the sampling scheme. Though it is difficult to control these directly in general, we study tractable sufficient conditions for the case where the sampling scheme is generated by mixing processes.

The problem of nonsynchronous observations appears in the analysis of high-frequency financial data. When we analyze intra-day stock price data, we observe the stock price whenever a new transaction or a new order arrives. The observation times are therefore different for different stocks, and hence we cannot avoid the problem of nonsynchronous observations. Statistical analysis with such data is much more complicated than with synchronous data. Parametric estimation for diffusion processes with synchronous and equidistant observations has been analyzed through quasi-maximum-likelihood methods in Florens-Zmirou [4], Yoshida [18,19], Kessler [11], and Uchida and Yoshida [17]. Related to the estimation problem for nonsynchronously observed diffusion processes, estimators for the quadratic covariation have been actively studied. Hayashi and Yoshida [6,7,8] and Malliavin and Mancino [12,13] independently constructed consistent estimators under nonsynchronous observations. There are also studies of covariation estimation under the simultaneous presence of microstructure noise and nonsynchronous observations (Barndorff-Nielsen et al. [1], Christensen, Kinnebrock, and Podolskij [3], Bibinger et al. [2], and so on). For parametric estimation with nonsynchronous observations, Ogihara and Yoshida [16] constructed maximum-likelihood-type and Bayes-type estimators and showed the consistency and the asymptotic mixed normality of the estimators when the terminal time point T_n is fixed and the observation frequency goes to infinity. Ogihara [14] showed local asymptotic mixed normality for the model in [16], and the maximum-likelihood-type and Bayes-type estimators were shown to be asymptotically efficient. On the other hand, we need to consider the asymptotic theory in which the terminal time point T_n goes to infinity in order to consistently estimate the parameter θ in the drift term. To the best of the author's knowledge, there is no study of the asymptotic theory of parametric estimation for nonsynchronously observed diffusion processes when T_n → ∞.
In this work, we consider the asymptotic theory for nonsynchronously observed diffusion processes when T_n → ∞, and construct maximum-likelihood-type estimators for the parameter σ in the diffusion part and the parameter θ in the drift part. We show the consistency and the asymptotic normality of the estimators. Moreover, we show local asymptotic normality of the statistical model, and we obtain asymptotic efficiency of our estimators as a consequence. Our estimator is based on a quasi-likelihood function defined similarly to the one in [16], though we need some modification to deal with the drift part. To investigate the asymptotic theory for the maximum-likelihood-type estimator, we need to specify the limit of the quasi-likelihood function, and for this we must assume some conditions on the asymptotic behavior of the sampling scheme. In [16], for a matrix G generated by the sampling scheme, the existence of the probability limit of n^{-1} tr((GG^⊤)^p) (p ∈ Z_+) is required, where (S^{n,l}_i)_i are the observation times of X^l and ⊤ denotes the transpose of a matrix. Since we consider different asymptotics, the asymptotic behavior of the quasi-likelihood function is different from that in [16]. We also need to consider estimation of the drift parameter θ, which requires further assumptions on the asymptotic behavior of the sampling scheme (Assumption (A5)). Though these conditions on the sampling scheme are difficult to check directly, we study tractable sufficient conditions in Section 2.4.
As seen in [16], the quasi-likelihood analysis for nonsynchronously observed diffusion processes becomes much more complicated than for synchronous observations. In this work, estimation of the drift parameter θ is added, and hence we consider nonrandom drift and diffusion coefficients to avoid overcomplication. For general diffusion processes with random drift and diffusion coefficients, we need predictable coefficients in order to use the martingale theory. However, the quasi-likelihood function loses the Markov property under nonsynchronous observations, and the coefficients in the quasi-likelihood function contain randomness from future times. We then need to approximate the coefficients by predictable functions, and this operation is particularly complicated. Moreover, approximating the true likelihood function by the quasi-likelihood function is a much more difficult problem when we show local asymptotic normality and asymptotic efficiency of the estimators. Therefore, we leave the asymptotic theory under general random drift and diffusion coefficients as future work.
The rest of this paper is organized as follows. In Section 2, we introduce our model settings and the assumptions for the main results. Our estimator is constructed in Section 2.1, and the asymptotic normality of the estimator is given in Section 2.2. Section 2.3 deals with local asymptotic normality of our model and asymptotic efficiency of the estimator. Tractable sufficient conditions for the assumptions on the sampling scheme are given in Section 2.4. Section 3 contains the proofs of the main results: Section 3.2 covers the consistency of the estimator for σ, Section 3.3 the asymptotic normality of the estimator for σ, Section 3.4 the consistency of the estimator for θ, and Section 3.5 the asymptotic normality of the estimator for θ. The remaining proofs are collected in Section 3.6.

Settings
For l ∈ {1, 2}, let the observation times {S^{n,l}_i}_{i=0}^{M_l} be random times, strictly increasing in i, satisfying S^{n,l}_0 = 0 and S^{n,l}_{M_l} = nh_n, where M_l is a random positive integer depending on n. We assume that {S^{n,l}_i}_{0≤i≤M_l, l=1,2} is independent of F_T and α. We consider nonsynchronous observations of X; that is, we observe {S^{n,l}_i}_{0≤i≤M_l, l=1,2} and {X^l_{S^{n,l}_i}}_{0≤i≤M_l, l=1,2}.
We denote by ‖·‖ the operator norm of a matrix, and by ⊤ the transpose operator for a matrix or a vector. We often regard a p-dimensional vector v as a p × 1 matrix. For a set A in a topological space, clos(A) denotes the closure of A. For a matrix A, [A]_{ij} denotes its (i, j) element.

and let
we can calculate the covariance matrix of ∆X. As we will see later, we can ignore the drift term when we consider the estimation of σ, because the drift term converges to zero very fast. Therefore, we first construct an estimator for σ and then construct an estimator for θ. Such adaptive estimation can speed up the calculation.
We define the quasi-likelihood function H^1_n(σ) for σ as follows.
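The displayed definition did not survive extraction. As a hedged sketch only: since ∆X is Gaussian with covariance S_n(σ) once the drift is ignored, the quasi-log-likelihood of [16] has the Gaussian form below (the paper's exact normalization and domain may differ):

```latex
H^1_n(\sigma) = -\frac{1}{2}\,\Delta X^{\top} S_n(\sigma)^{-1}\,\Delta X
                - \frac{1}{2}\log\det S_n(\sigma),
\qquad
\hat{\sigma}_n \in \operatorname*{argmax}_{\sigma \in \mathrm{clos}(\Theta_1)} H^1_n(\sigma).
```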
We next consider estimation of θ. Let V(θ) = (V_t(θ))_{t≥0} be a two-dimensional stochastic process, defined as follows. We then define the quasi-likelihood function H^2_n(θ) for θ as follows.
Then the maximum-likelihood-type estimator for θ is defined accordingly. The quasi-(log-)likelihood function H^1_n is defined in the same way as in [16]. Since ∆X follows a normal distribution, we can construct such a Gaussian quasi-likelihood function even for nonsynchronous data. When the coefficients are random, the distribution of ∆X is no longer Gaussian, but such a Gaussian-type quasi-likelihood function is still valid due to the local Gaussian property of diffusion processes. The Gaussian mean coming from the drift part is ignored when we construct the quasi-likelihood H^1_n. When we estimate the parameter θ for the drift part, we subtract the mean in X(θ) to construct the quasi-likelihood function H^2_n. Since the effect of the drift term on the estimation of σ is small, it works well to estimate σ in this way and then plug σ̂_n into S_n to construct the estimator for θ. Thus, we can speed up the calculation by separating the estimation of σ and θ.
Remark 2.1. H^1_n(σ) and H^2_n(θ) are well-defined only if det S_n(σ) > 0 and det S_n(σ̂_n) > 0, respectively. For the covariance matrix S_n of the nonsynchronous observations ∆X, it is not trivial to check these conditions. Proposition 1 in Section 2 of [16] shows that they are satisfied if b_t(σ) is continuous on [0, ∞) × clos(Θ_1) and inf_{t,σ} det(b_t b^⊤_t(σ)) > 0. We assume such conditions in our setting (Assumption (A1) in Section 2.2).

Asymptotic normality of the estimator
In this section, we state the assumptions for our main results and state the asymptotic normality of the estimator.
For m ∈ N, an open subset U ⊂ R^m is said to admit Sobolev's inequality if for any p > m there exists a positive constant C, depending on U and p, such that sup_{x∈U} |u(x)| ≤ C Σ_{k=0,1} (∫_U |∂^k_x u(x)|^p dx)^{1/p} for any u ∈ C^1(U). This is the case when U has a Lipschitz boundary. We assume that Θ, Θ_1, and Θ_2 admit Sobolev's inequality.

Assumption (A4).
There exist positive constants a^1_0 and a^2_0 such that the corresponding limits hold for l ∈ {1, 2} and any partition (s_k)_{k=0}^∞ ∈ S. Moreover, for any p ∈ N, there exists a nonnegative constant a^1_p such that the corresponding limit holds.

Assumption (A5). For p ∈ Z_+, there exist nonnegative constants f^{1,1}_p, f^{1,2}_p, and f^{2,2}_p such that the corresponding convergences hold as n → ∞ for any partition (s_k)_{k=0}^∞ ∈ S.

Assumption (A4) corresponds to [A3′] in Ogihara and Yoshida [16]. The functionals in (A4) and (A5) appear in H^1_n and H^2_n, and hence we cannot specify the limits of H^1_n and H^2_n unless we assume the existence of the limits of these functionals. It is difficult to check (A4) and (A5) directly for a general sampling scheme; we study sufficient conditions for them in Section 2.4.
Assumption (A6). The constant a^1_1 in (A4) is positive, and there exist positive constants c_3 and c_4 such that the stated lim sup bounds hold.

Assumption (A6) is necessary to identify the parameters σ and θ from the data. If a^1_1 = 0, then a^1_p = 0 for any p ∈ N. This implies that the off-diagonal components of the covariance matrix S_n are negligible in the limit, and then we cannot consistently estimate the parameter in ρ_t(σ). This is why we need the assumption a^1_1 > 0 (see Proposition 3.2 and the discussion following it for the consistency).

Local asymptotic normality
Next, to discuss the optimality of the estimator, we turn to local asymptotic normality of the statistical model. In this section, local asymptotic normality of our model is shown, and the maximum-likelihood-type estimator is shown to be asymptotically efficient.
Let N be the set of all positive integers. Let α_0 ∈ Θ, where Θ is an open subset of R^d, and let {P_{α,n}}_{α∈Θ} be a family of probability measures defined on a measurable space (X_n, A_n) for n ∈ N. As usual, we refer to dP_{α_2,n}/dP_{α_1,n}, the derivative of the absolutely continuous component of the measure P_{α_2,n} with respect to the measure P_{α_1,n} at the observation x, as the likelihood ratio. The following definition of local asymptotic normality is Definition 2.1 in Chapter II of Ibragimov and Has'minskiĭ [9].

Definition 2.1. A family {P_{α,n}} is called locally asymptotically normal (LAN) at a point α_0 ∈ Θ as n → ∞ if for some nondegenerate d × d matrix ε_n and any u ∈ R^d, the likelihood ratio admits the representation below as n → ∞.
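The representation referred to in Definition 2.1 is missing from this extraction. For the reader's convenience, the standard LAN expansion in Ibragimov and Has'minskiĭ, which is presumably what is asserted here, reads in the notation above:

```latex
\frac{dP_{\alpha_0 + \epsilon_n u,\, n}}{dP_{\alpha_0, n}}
  = \exp\Bigl( u^{\top} \Delta_{n}(\alpha_0)
               - \tfrac{1}{2}\lvert u \rvert^{2}
               + \psi_{n}(u, \alpha_0) \Bigr),
```

where Δ_n(α_0) → N(0, E_d) in distribution and ψ_n(u, α_0) → 0 in probability under P_{α_0,n}, and E_d denotes the d × d identity matrix (matching the optimal asymptotic variance E_d quoted after Theorem 2.2).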
For α ∈ Θ, let P_{α,n} be the probability measure generated by the observations {S^{n,l}_i}_{i,l} and {X^l_{S^{n,l}_i}}_{i,l}.

Theorem 2.2. Assume (A1)-(A6). Then {P_{α,n}}_{α,n} satisfies the LAN property at α = α_0.

The proof is left to Section 3.6. Theorem 11.2 in Chapter II of Ibragimov and Has'minskiĭ [9] gives lower bounds on the estimation errors of any regular estimator of the parameters under the LAN property. Then the optimal asymptotic variance of ε_n^{-1}(T_n − α_0) for a regular estimator T_n is E_d. Therefore, Theorem 2.2 ensures that our estimator (σ̂_n, θ̂_n) is asymptotically efficient in this sense under the assumptions of the theorem (we can show that (σ̂_n, θ̂_n) is regular by the proof of Theorem 2.2, (3.49), (3.9), (3.31), (3.35), and Theorem 2 in [10]).

Sufficient conditions for the assumptions
It is not easy to check Assumptions (A4) and (A5) directly for a general random sampling scheme. In this section, we study tractable sufficient conditions for these assumptions. The proofs of the results in this section are left to Section 3.6.
Let q > 0 and N^{n,l}_t = Σ_{i=1}^{M_l} 1_{{S^{n,l}_i ≤ t}}. We consider the following conditions for the point processes N^{n,l}_t.
Assumption (B1-q).

For example, let (Ñ^1_t, Ñ^2_t) be two independent homogeneous Poisson processes with positive intensities λ_1 and λ_2, respectively, and let N^{n,l}_t = Ñ^l_{h_n^{-1} t}. Then (B1-q) obviously holds for any q > 0. Moreover, (B2-q) holds for any q > 0.

To give sufficient conditions for (A4) and (A5), we consider mixing properties of N^{n,l}. That is, we assume conditions on the mixing coefficient α^n_k defined below.

Proposition 2.1. Assume that (B1-q) and (B2-q) hold and that (2.1) is satisfied for any q > 0. Moreover, assume that there exist positive constants a^1_0 and a^2_0 and nonnegative constants a^1_p such that the corresponding convergences hold as n → ∞ for p ∈ Z_+, l ∈ {1, 2}, and any partition (s_k)_{k=0}^∞ ∈ S. Then (A4) holds.

In the following, let (Ñ^l_t)_{t≥0} be an exponentially α-mixing point process for l ∈ {1, 2}. Assume that the distribution of (Ñ^l_{t+t_k} − Ñ^l_{t+t_{k−1}})_{1≤k≤K, l=1,2} does not depend on t ≥ 0 for any K ∈ N and 0 ≤ t_0 < t_1 < ⋯ < t_K.

Proposition 2.2. Assume that (B1-q) and (B2-q) hold and that (2.1) is satisfied for any q > 0. Moreover, assume that there exist nonnegative constants f^{1,1}_p, f^{1,2}_p, and f^{2,2}_p such that the corresponding convergences hold as n → ∞ for p ∈ Z_+ and any partition (s_k)_{k=0}^∞ ∈ S. Then (A5) holds.
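The time-changed Poisson example above can be made concrete with a short simulation. The sketch below (intensities, horizon, and the law-of-large-numbers check are illustrative choices, not quantities from the paper) generates the two nonsynchronous observation grids S^{n,l}_i = h_n τ^l_i and verifies the basic behaviour behind (B1-q): the number of observations on [0, nh_n] grows like λ_l n, so the average observation gap is of order h_n.

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_jump_times(lam, horizon, rng):
    """Jump times of a homogeneous Poisson process with intensity lam
    on [0, horizon], built from i.i.d. exponential inter-arrivals."""
    gaps = rng.exponential(1.0 / lam, size=int(3 * lam * horizon) + 100)
    times = np.cumsum(gaps)
    return times[times <= horizon]

n, h_n = 5000, 0.01            # terminal time T_n = n * h_n = 50
lam1, lam2 = 1.0, 2.0          # intensities of the latent processes

# Time change from the text: N^{n,l}_t = \tilde N^l_{t / h_n}, i.e. the
# observation times are S^{n,l}_i = h_n * tau^l_i, where tau^l_i are the
# jump times of \tilde N^l on [0, n].
S1 = h_n * poisson_jump_times(lam1, n, rng)
S2 = h_n * poisson_jump_times(lam2, n, rng)

# Counts concentrate around lam_l * n, so the mean observation gap
# is approximately h_n / lam_l.
print(len(S1) / n, len(S2) / n)
print(np.mean(np.diff(S1)) / h_n, np.mean(np.diff(S2)) / h_n)
```

The two grids are strictly increasing and nonsynchronous by construction, which is exactly the observation pattern assumed in the Settings section.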
By the above results, we obtain simple tractable sufficient conditions for the assumptions of the sampling scheme.

Preliminary results
For a real number a, [a] denotes the maximum integer which is not greater than a.
C denotes a generic positive constant whose value may vary depending on context. We often omit the parameters σ and θ in generic functions f(σ) and g(θ).
For a sequence p_n of positive numbers, we denote by {R̄_n(p_n)}_{n∈N} a sequence of random variables (which may also depend on other parameters) satisfying the bound above. Let ∆_{i,t}U = ∆^{ψ(i)}_{ϕ(i),t}U for t ≥ 0 and a two-dimensional stochastic process (U_t)_{t≥0} = ((U^1_t, U^2_t))_{t≥0}.
Proof. Since all the elements of G are nonnegative, we have the bound above. Since ‖G^⊤‖ = ‖G‖, we obtain the conclusion.
Proposition 3.2. There exists a positive constant χ such that the stated bound holds.

Proof. The proof is based on the ideas of the proof of Lemma 5 in [16]. Let Ȧ^l_{k,p} be as above, and let Ã^l_{k,p} be obtained from Ȧ^l_{k,p} by the analogous replacement; then we have the identities above. For l ∈ {1, 2}, defining G_k as above, we obtain the stated expansion. Let (λ_i)_i be all the eigenvalues of G_k G^⊤_k. Then we have

Moreover, by setting g_{k,0} as above and F(x) = 1 − x + log x, we have the stated bound. Together with Lemma 11 in [16] and the corresponding estimate, and by a similar argument for F_{2,k}, there exists a positive random variable χ which depends on neither k nor n such that the stated bound holds. Together with (3.19), we obtain the desired inequality. Letting n → ∞, (A4) and (A6) yield the conclusion.
Proof. It is sufficient to show (3.25) and (3.26). Thanks to Lemmas A.1, 3.6, and 3.3, the first term on the right-hand side is calculated as above. Moreover, Lemmas 3.3, 3.5, 3.7, and A.1 yield the stated estimate as n → ∞. Therefore, we have (3.26).
Therefore, by choosing q sufficiently large so that nh_n^{1+qη} → 0, we obtain the stated bound on the expected maximum. Together with the assumptions, we obtain the conclusion.
Proof of Proposition 2.2.
We use the proof of Proposition 6 in [16] again. We define b_n and t_k in the same way as in the previous proposition, and define the quantities above. Then, similarly to (31) in that proof, there exists η > 0 such that for any q ≥ 4 there exists C_q > 0 with the bound C_q (p + 1)^{q−1} h_n^{qη}.
Together with the assumptions and similar estimates for I_1 E_1(k)(GG^⊤)^p G I_2 and I_2 E_2(k)(G^⊤G)^p I_2, we obtain the conclusion.

Proof of Proposition 2.3.
We can show the results by an approach similar to the proof of Proposition 9 in [16]. Under (B2-q), P(N_{t+N h_n} − N_t = 0) is small enough to estimate the denominators for sufficiently large n. Then we obtain estimates for the numerators by using the inequality above.

Proof of Lemma 2.1.
We only show the first maximum bound; the other results are obtained similarly.
(2.1) is satisfied because α^n_k ≤ c_1 e^{−c_2 k} for some positive constants c_1 and c_2. Let τ^l_i be the i-th jump time of Ñ^l. Then we have S^{n,l}_i = h_n τ^l_i. Let Ḡ be a matrix of infinite size defined as above, and consider [(GG^⊤)^p]_{ii}.