Quasi-likelihood analysis and its applications

The Ibragimov–Khasminskii theory established a scheme that derives the asymptotic properties of the likelihood estimators from the convergence of the likelihood ratio random field. This scheme has been extended to various nonlinear stochastic processes, combined with a polynomial type large deviation inequality proved for a general locally asymptotically quadratic quasi-likelihood random field. We give an overview of the quasi-likelihood analysis and its applications to ergodic/non-ergodic statistics for stochastic processes.


Introduction
We consider an m-dimensional semimartingale Y = (Y_t)_{t∈[0,T]} having a decomposition (1.1) on a stochastic basis (Ω, F, F, P) with a filtration F = (F_t)_{t∈[0,T]}. The time horizon T is fixed. The process w = (w_t)_{t∈[0,T]} is an r-dimensional Wiener process with respect to F, and σ: R^d × Θ → R^m ⊗ R^r is a given function. The processes b = (b_t)_{t∈[0,T]} and X = (X_t)_{t∈[0,T]} are respectively m-dimensional and d-dimensional progressively measurable processes. The process b is unobservable, but we observe the data (X_{t_j}, Y_{t_j})_{j=0,1,...,n} for t_j = t_j^n = jT/n. We aim at estimation of the unknown parameter θ ∈ Θ, Θ being a bounded open set in R^p. If Y = X is a diffusion process, the likelihood function would be L_n(θ) = ∏_{j=1}^n p_h(X_{t_{j-1}}, X_{t_j}, θ) when the distribution of the initial value X_0 does not depend on θ, where p_h(x, y, θ) is the transition density of the diffusion process. However, we do not know the drift by assumption, so we cannot use the function L_n for estimation. Even if we knew the drift, it is in general not easy to compute p_h(X_{t_{j-1}}, X_{t_j}, θ), since it is a solution to a partial differential equation; besides, optimization is necessary after obtaining L_n(θ). The situation is more severe when Y ≠ X, since no information about the structure of X is given. In any case, it is more realistic to replace the likelihood function L_n by some other easily handled utility function.
A candidate utility function is the quasi-log likelihood function H_n defined by

H_n(θ) = −(1/2) Σ_{j=1}^n { log det S(X_{t_{j−1}}, θ) + h^{−1} S(X_{t_{j−1}}, θ)^{−1}[(Δ_j Y)^{⊗2}] },   (1.2)

where h = T/n, S = σσ^⋆, ⋆ denoting the matrix transpose, and Δ_j Y = Y_{t_j} − Y_{t_{j−1}}. The brackets [···] stand for the inner product; for example, M[v^{⊗2}] = Σ_{i,j} M_{i,j} v^i v^j for a square matrix M = (M_{i,j}) and a vector v = (v^i). A quasi-maximum likelihood estimator θ̂_n is obtained by maximizing H_n with respect to θ. Then θ̂_n is asymptotically mixed normal and asymptotically efficient in Hájek's sense. To establish this property, we need to show that the risk function, e.g., the L^p-risk E_θ[|√n(θ̂_n − θ)|^p], (locally) asymptotically attains the lower bound of risks; in particular, that the L^p-norm of (the scaled error of) θ̂_n is bounded.
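As a toy illustration of maximizing a quasi-log likelihood of this Gaussian type, the following sketch (not from the paper; the model Y_t = θ w_t, the sample size, and the grid are illustrative assumptions) fits a scalar scale parameter from discrete increments:

```python
import numpy as np

# Toy model: Y_t = theta * w_t, so S(x, theta) = theta^2 (m = 1).
rng = np.random.default_rng(0)
T, n, theta_true = 1.0, 1000, 1.5
h = T / n
dY = theta_true * rng.normal(0.0, np.sqrt(h), size=n)  # increments Delta_j Y

def H(theta):
    """Gaussian quasi-log likelihood of type (1.2) for the scalar scale model."""
    S = theta ** 2
    return -0.5 * np.sum(np.log(S) + dY ** 2 / (h * S))

# Quasi-maximum likelihood estimator by a crude grid search.
grid = np.linspace(0.5, 3.0, 2501)
theta_hat = grid[np.argmax([H(t) for t in grid])]

# For this model the maximizer is available in closed form as a sanity check:
# the square root of the realized variance normalized by n*h = T.
theta_closed = np.sqrt(np.sum(dY ** 2) / (n * h))
```

The grid maximizer agrees with the closed-form maximizer up to the grid spacing, and both are close to the true scale for large n.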
By using differentiability in θ of H_n, we can derive a stochastic expansion of θ̂_n, namely, û_n := √n(θ̂_n − θ*) = Γ^{−1} M_n + n^{−1/2} N_n, where θ* is the true value of θ and Γ is the random Fisher information matrix defined by (5.1). The variable M_n is a martingale-type term involving I_r, the r-dimensional identity matrix, and Δ_j w = w_{t_j} − w_{t_{j−1}}; the variable N_n is of O_p(1) and has a complicated expression involving multiple stochastic integrals. An asymptotic expansion of the distribution of û_n can be obtained if we apply the martingale expansion of Yoshida (2013) (updated by arXiv:1210.3680v3). More precisely, we can derive an asymptotic expansion of the joint distribution of (Z_n, Γ) for Z_n = M_n + n^{−1/2} N_n, and next obtain the expansion for û_n by transforming (Z_n, Γ) into it. In this procedure, the Malliavin calculus is applied, and then we need L^p-boundedness of û_n. As a matter of fact, boundedness of a smooth deformation of û_n in the Sobolev space D^{s,p}(R^p) can be shown with the aid of the L^p-boundedness of û_n. A relatively high order of integrability (i.e., a large p) is necessary to carry out this plan because the integration-by-parts formula in the Malliavin calculus requires algebras of variables. The asymptotic expansion of Skorokhod integrals recently presented by Nualart and Yoshida (2019), combined with Yoshida (2020), is also applicable to this problem in place of the martingale expansion. The L^p-boundedness of an estimator is a key to developing the asymptotic theory. The celebrated Ibragimov–Khasminskii theory (Ibragimov and Khas'minskii 1973a, b; Ibragimov and Has'minskiĭ 1981) answered this important question. Epoch-making was their introduction of the notion of weak convergence of the likelihood ratio random field, from which the asymptotic properties of the likelihood estimators (i.e., the maximum likelihood estimator and the Bayesian estimator) are induced in a unified way.
The likelihood ratio random field Z_n is defined by Z_n(u) = L_n(θ* + a_n u)/L_n(θ*) with the likelihood function L_n and a scaling matrix a_n. They proved the convergence in a certain space of continuous functions on R^p for suitably extended Z_n. When Z_n is locally asymptotically normal (Le Cam 1960), the limit becomes

Z(u) = exp( Δ[u] − (1/2) I(θ*)[u^{⊗2}] )   (1.3)

with the Fisher information matrix I(θ*) at θ* and a random vector Δ ∼ N_p(0, I(θ*)). Thanks to the functional convergence (1.3), roughly speaking, we can apply the argmax operator to both sides of (1.3) to obtain the convergence û_n = a_n^{−1}(θ̂_n − θ*) →^d I(θ*)^{−1}Δ. For the Bayesian estimator θ̃_n with respect to the quadratic loss function and a prior density ϖ, the error of θ̃_n has the expression

ũ_n := a_n^{−1}(θ̃_n − θ*) = ( ∫ Z_n(u) ϖ(θ* + a_n u) du )^{−1} ∫ u Z_n(u) ϖ(θ* + a_n u) du.
Then the convergence (1.3) suggests the joint convergence of

( ∫ Z_n(u) ϖ(θ* + a_n u) du, ∫ u Z_n(u) ϖ(θ* + a_n u) du ),   (1.5)

and hence ũ_n →^d I(θ*)^{−1}Δ if (1.4) holds. One crucial point we should pay attention to is that the integrals appearing in (1.5) are essentially integrals over a non-compact space, since the domain tends to R^p as n → ∞ even if Θ is bounded. To control these integrals, we need fast decay of the random field Z_n. The Ibragimov–Khasminskii theory features the large deviation inequality

sup_n P[ sup_{u: |u| ≥ r} Z_n(u) ≥ e^{−r^α} ] ≤ e(r)   (1.6)

for the likelihood ratio random field Z_n, where α is a positive constant and e(r) is a function of the form c_0 e^{−c_1 r^{c_2}} or c_0 r^{−L}. Then the L^p-boundedness of û_n is a consequence of (1.6).
The inequality (1.6) is extremely important since it quantitatively estimates the tail of Z_n. As well as for the maximum likelihood estimator, the L^p-boundedness of ũ_n also follows from (1.6). Kutoyants (1984, 1994, 2004, 2012) successfully applied the Ibragimov–Khasminskii theory to semimartingales. Motivated by his pioneering works, the author has tried to approach inference for stochastic processes by means of a quasi-likelihood (Yoshida 1990, 2011, 2021). In the applications of this article, the statistical models are differentiable and the limit theorems used are standard. So our interest is in the large deviation inequality (1.6). The aim of this paper is to give an overview of the quasi-likelihood analysis and its applications.

Quasi-likelihood analysis
In this section, we recall a simplified version of the quasi-likelihood analysis. We refer the interested reader to Yoshida (2011, 2021) for details.

Polynomial type large deviation inequality
We will work with a sequence of random fields H_T: Ω × Θ → R (T ∈ T) for a probability space (Ω, F, P) and a bounded open set Θ in R^p. The set T drives the asymptotic theory and is supposed to satisfy T ⊂ R and sup T = ∞. As for the regularity of the random field H_T, since the most basic case is that where the map θ ↦ H_T(ω, θ) ∈ R is of class C^2(Θ) for a.s. ω, we suppose this regularity. Though this smoothness assumption is much stronger than that assumed in the Ibragimov–Khasminskii theory and Le Cam's LAN theory, it simplifies our theory and still covers many applications in practice.
The targeted value of θ is denoted by θ* ∈ Θ. The limit of the observed information is denoted by Γ, a p × p random symmetric matrix. The minimum eigenvalue of a symmetric matrix M is denoted by λ_min(M). We need identifiability of θ*, expressed in terms of a random field Y: Ω × Θ → R that will be related to H_T at (2.4), and non-degeneracy of Γ, as follows.
[T1] (i) There exists a positive random variable χ_0 such that the following conditions are fulfilled.
Condition [T1] is almost trivial in ergodic statistics because χ_0 is a constant and Γ is a deterministic matrix. We remark that χ_0^{−1} ∈ L^{∞−} = ∩_{p>1} L^p under [T1] (i-2). Let a_T be a p × p regular matrix such that |a_T| → 0 as T → ∞. The matrix a_T will specify the rate of convergence of the QLA estimators.
We also use a positive number b_T satisfying condition (2.2) for all T ∈ T, for some constant C_0 ∈ [1, ∞). A typical example is b_T = n and a_T = n^{−1/2} I_p, I_p being the identity matrix. Define a p-dimensional random variable Δ_T and a p × p random matrix Γ_T(θ) by

Δ_T = ∂_θ H_T(θ*)[a_T] and Γ_T(θ) = −a_T^⋆ ∂_θ^2 H_T(θ) a_T,

respectively. Consistency of an estimator based on the random field H_T is established when Y is associated with the random field

Y_T(θ) = b_T^{−1} ( H_T(θ) − H_T(θ*) ).   (2.4)

We assume the following set of conditions.
[T2] There exist positive numbers β_1 and β_2 such that the following conditions (i)–(iii) are satisfied for all p > 1. To verify Conditions [T2] (ii) and (iii), one can apply Sobolev's embedding inequality, the Garsia–Rodemich–Rumsey inequality, or Kolmogorov's continuity theorem.
Let U_T = {u ∈ R^p; θ* + a_T u ∈ Θ}. We define the random field Z_T on U_T by

Z_T(u) = exp( H_T(θ* + a_T u) − H_T(θ*) )   (2.5)

for u ∈ U_T. When H_T is a log likelihood function, the random field Z_T(u) is the likelihood ratio between θ = θ* + a_T u and θ*. For r_T(u) defined by r_T(u) = log Z_T(u) − Δ_T[u] + (1/2)Γ[u^{⊗2}], the random field Z_T admits the representation

Z_T(u) = exp( Δ_T[u] − (1/2) Γ[u^{⊗2}] + r_T(u) )   (2.6)

with r_T(u) →^p 0 as T → ∞ for every u ∈ R^p; i.e., the random field Z_T is locally asymptotically quadratic (LAQ) at θ*. The property (2.6) suggests that the tail of the random field Z_T is light like a Gaussian kernel. This fact is stated as a polynomial type large deviation inequality. Write V_T(r) = {u ∈ U_T; |u| ≥ r} for r > 0.
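The LAQ representation (2.6) can be checked numerically in the scalar scale model of the Introduction (a toy setting, not from the paper; the sample size and the range of u are illustrative assumptions): with a_n = n^{−1/2}, the log of Z_n(u) is close to Δ_n u − Γ_n u²/2 uniformly on compacts.

```python
import numpy as np

# Scalar scale model dY = theta dw with theta* = 1.5; H_n of Gaussian type.
rng = np.random.default_rng(1)
n, theta0 = 10_000, 1.5
h = 1.0 / n
dY = theta0 * rng.normal(0.0, np.sqrt(h), size=n)
Q = np.sum(dY ** 2) / h  # sufficient statistic of the model

def H(th):  # quasi-log likelihood, additive constants dropped
    return -n * np.log(th) - Q / (2.0 * th ** 2)

# Exact first and second derivatives of H at theta0, scaled by a_n = n^{-1/2}.
Delta = (-n / theta0 + Q / theta0 ** 3) / np.sqrt(n)
Gamma = -(n / theta0 ** 2 - 3.0 * Q / theta0 ** 4) / n

u = np.linspace(-2.0, 2.0, 81)
logZ = np.array([H(theta0 + ui / np.sqrt(n)) - H(theta0) for ui in u])
quad = Delta * u - 0.5 * Gamma * u ** 2
remainder = np.max(np.abs(logZ - quad))  # sup of |r_n(u)|, small for large n
```

Here Γ_n is close to the Fisher information 2/θ₀² of the scale parameter, and the remainder shrinks at the rate n^{−1/2} on compacts.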
Theorem 2.1 For every L > 0, there exists a constant C_L such that

P[ sup_{u ∈ V_T(r)} Z_T(u) ≥ e^{−r} ] ≤ C_L r^{−L}   (2.7)

for all r > 0 and T ∈ T. The supremum over the empty set should read −∞.
The polynomial type large deviation inequality (2.7) ensures the L^p-boundedness of the scaled error of the QLA estimators. Usually, a common L^p-inequality based on some kind of orthogonality, such as a martingale or mixing property, can verify Condition [T2], combined with a uniform estimate like Sobolev's inequality. This enables flexible applications of the scheme to nonlinear stochastic processes, as we will see in various applications later.

Quasi-maximum likelihood estimator
A quasi-maximum likelihood estimator (QMLE) θ̂_T^M is a measurable mapping satisfying H_T(θ̂_T^M) = max_{θ ∈ Θ̄} H_T(θ). (2.9) We assume the convergence Δ_T →^d Δ (2.8) as T → ∞, for some p-dimensional random vector Δ on some extension of (Ω, F, P).
Such a measurable function always exists, as is ensured by the measurable selection theorem applied to H_T, a continuous random field on the compact set Θ̄. We do not assume uniqueness of the maximizer.

Theorem 2.2 Suppose that Conditions [T1] and [T2] are satisfied and that the convergence (2.8) holds. Then û_T^M := a_T^{−1}(θ̂_T^M − θ*) →^d Γ^{−1}Δ as T → ∞, with convergence of moments E[f(û_T^M)] → E[f(Γ^{−1}Δ)] for continuous f of at most polynomial growth.

Sketch of proof: define the limit random field Z by Z(u) = exp( Δ[u] − (1/2)Γ[u^{⊗2}] ) for u ∈ R^p. From (2.7) and (2.8), we obtain the convergence (2.11): Z_T →^d Z in a suitable space of continuous functions on R^p that is equipped with the supremum norm. Roughly speaking, by applying the argmax operator to both sides of (2.11), we obtain the convergence û_T^M →^d argmax_u Z(u) = Γ^{−1}Δ. This is a smart way, but one can bypass the discussion on the space C(R^p); see, e.g., Yoshida (2021).

Quasi-Bayesian estimator
The mapping

θ̂_T^B = ( ∫_Θ exp(H_T(θ)) ϖ(θ) dθ )^{−1} ∫_Θ θ exp(H_T(θ)) ϖ(θ) dθ   (2.12)

is called a quasi-Bayesian estimator (QBE) with respect to the prior density ϖ. The QBE θ̂_T^B is the Bayesian estimator with respect to the quadratic loss function when H_T is the log likelihood function. The QBE θ̂_T^B takes values in the convex hull of Θ; therefore the values are bounded but may be outside of Θ. It is assumed that ϖ is continuous and bounded away from zero on Θ. As seen above, the QLA is constructed in an abstract way, and Conditions [T1] and [T2] are easy to verify. For this reason, the QLA theory has been widely applied, in particular, to nonlinear stochastic processes. We will discuss several applications in the following sections.

Quasi-likelihood analysis for ergodic diffusion processes
Suppose that a d-dimensional stationary mixing diffusion process X = (X_t)_{t∈R_+} satisfies the stochastic differential equation

dX_t = a(X_t, θ_2) dt + b(X_t, θ_1) dw_t,   (3.1)

where θ_i ∈ Θ_i, a bounded open set in R^{p_i} (i = 1, 2). We also assume that the functions a and b are continuously extended to R^d × ∂Θ_2 and R^d × ∂Θ_1, respectively, and that b is uniformly non-degenerate.
The process X is observed at discrete times t_j = jh (j ∈ {0, 1, ..., n}) for a positive value h = h_n depending on n. We assume that h → 0, nh → ∞ and nh² → 0 as n → ∞; that is, we have high frequency, long-run data. For estimation of the parameter θ = (θ_1, θ_2), we consider a random field H_n given by

H_n(θ) = −(1/2) Σ_{j=1}^n { log det B(X_{t_{j−1}}, θ_1) + h^{−1} B(X_{t_{j−1}}, θ_1)^{−1}[(Δ_j X − h a(X_{t_{j−1}}, θ_2))^{⊗2}] },   (3.2)

where B = bb^⋆ and Δ_j X = X_{t_j} − X_{t_{j−1}}. The asymptotic properties of the QMLE θ̂_n^M = (θ̂_{1,n}^M, θ̂_{2,n}^M) with respect to H_n of (3.2) can be shown by the QLA approach recalled in Sect. 2.2, as was done in Yoshida (2011). Denote by θ* = (θ_1*, θ_2*) the true value of θ. To obtain the asymptotic properties of θ̂_{1,n}^M for θ_1, we can use a suitable random field for H_T in the general theory. In the second step, another random field is used for H_T in the proof of the asymptotic properties of θ̂_{2,n}^M. Since the relation (a) of Theorem 2.2 has been obtained for each component θ̂_{i,n}^M, we have the joint convergence of these components. As a matter of fact, Yoshida (2011) gave the convergence

E[ f( √n(θ̂_{1,n}^M − θ_1*), √(nh)(θ̂_{2,n}^M − θ_2*) ) ] → E[ f(ξ_1, ξ_2) ]   (3.3)

as n → ∞ for any continuous function f of at most polynomial growth, where ξ_i is a p_i-dimensional centered Gaussian random vector with covariance matrix Γ_i^{−1} for each i = 1, 2, and ξ_1 and ξ_2 are independent. More precisely, Γ_1 and Γ_2 are given by (3.4) and (3.5), where ν is the stationary probability measure of X.

The condition (2.2) is not restrictive, since one can choose a suitable random field H_n with a single scale at each step of the proof of the asymptotic properties, although there are two different rates of convergence of the estimators. Condition [T1] is trivial in the present situation, and Condition [T2] can be checked with the help of Sobolev's inequality and a Rosenthal type inequality; this task is easy. For quasi-Bayesian inference, Yoshida (2011) proposed the adaptive Bayesian method (adaBayes). Though the adaptive Bayesian estimator was defined for a general H_T having k scales, in the present situation it becomes as follows.
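The two-rate, two-step structure can be sketched in the simplest Ornstein–Uhlenbeck case (a toy setting, not the paper's general scheme; the parameter values, step sizes, and closed forms are illustrative assumptions — for scalar OU the two quasi-likelihood maximizations reduce to the explicit expressions below):

```python
import numpy as np

# Toy ergodic diffusion: dX = -theta2 * X dt + theta1 dw (scalar OU).
rng = np.random.default_rng(2)
theta1, theta2 = 1.0, 1.0
n, h = 40_000, 0.005  # nh = 200 (long run); nh^2 = 1 is only a toy choice
X = np.empty(n + 1)
X[0] = 0.0
for j in range(n):  # Euler scheme used to simulate the discrete data
    X[j + 1] = X[j] - theta2 * X[j] * h + theta1 * np.sqrt(h) * rng.normal()
dX = np.diff(X)

# Step 1: diffusion parameter, rate sqrt(n); the drift can be ignored here.
theta1_hat = np.sqrt(np.sum(dX ** 2) / (n * h))

# Step 2: drift parameter, rate sqrt(nh); for scalar OU the quasi-likelihood
# maximizer has the closed form below (the plug-in of theta1_hat cancels).
theta2_hat = -np.sum(X[:-1] * dX) / (h * np.sum(X[:-1] ** 2))
```

The diffusion estimate is far more accurate than the drift estimate, reflecting the rates √n and √(nh) in (3.3).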
For an arbitrarily fixed value θ_2^0 of θ_2, we define the quasi-Bayesian estimator θ̂_{1,n}^B for θ_1 by (3.6) for a prior density ϖ_1 for θ_1, and next define the quasi-Bayesian estimator θ̂_{2,n}^B for θ_2 by (3.7) for a prior density ϖ_2 for θ_2. Then, for the adaptive Bayesian estimator θ̂_n^B = (θ̂_{1,n}^B, θ̂_{2,n}^B), we can apply the scheme in Sect. 2.3 twice to obtain the convergence

E[ f( √n(θ̂_{1,n}^B − θ_1*), √(nh)(θ̂_{2,n}^B − θ_2*) ) ] → E[ f(ξ_1, ξ_2) ]   (3.8)

as n → ∞ for f and ξ_i (i = 1, 2) described at (3.3).
YUIMA is an R package for simulation and statistical analysis of stochastic processes. It constructs a yuima object from the user's model and data for a stochastic differential equation. The YUIMA function "qmle" implements the QMLE, and the function "adaBayes" the adaptive Bayesian estimator. For example, "qmle" applied to a yuima object returns the estimated value and the standard error of the QMLE. See Brouste et al. (2014) and Iacus and Yoshida (2018).

Adaptive methods
An advantage of the adaptive method is that it can suppress the dimension of the integral in the computation of the Bayesian estimate. The maximum likelihood type estimator also enjoys this merit through the idea of adaptive estimation. Adaptive methods for diffusion models were studied by Yoshida (1992) and Kessler (1995). The condition nh² → 0 is called the condition for rapidly increasing experimental design (Prakasa Rao 1983, 1988). The relaxation of this condition to nh³ → 0 was given by Yoshida (1992) with a higher-order expansion of the transition probability of the diffusion process, and this was extended by Kessler (1997) to achieve nh^p → 0 for any p > 0.

Uchida and Yoshida (2012) proposed various adaptive methods of the maximum likelihood type for the stochastic differential equation (3.1). They introduced a ladder of annealed random fields for Z_T, and applied the QLA to prove that their adaptive schemes give the same convergence as (3.3) at the last stage of the algorithm under the assumption nh^p → 0, as explained below. Consider a sequence of estimating functions U_{p,n} (p = 1, 2, ...), constructed as follows. Denote by θ̂_{1,n} a maximum likelihood (ML) type estimator for θ_1. For p ≥ 2, let k_0 = ⌊p/2⌋ and l_0 = ⌊(p − 1)/2⌋. Then the function U_{p,n} for p ≥ 2 is defined with the functions r^{(k_0)}(h, x, θ), D^{(k)}(x, θ) and E^{(k)}(x, θ) coming from an expansion of the semigroup of the diffusion process satisfying (3.1). The adaptive ML type estimators θ̂_{1,p,n}^{(k)} and θ̂_{2,p,n}^{(k)} are defined for k = 1, 2, ..., k_0. Remark that l_0 ≤ k_0 ≤ l_0 + 1. In Uchida and Yoshida (2012), the QLA provided L^{∞−} = ∩_{q>1} L^q-boundedness of (nh)^{k/(p−1)}(θ̂_{2,p,n}^{(k)} − θ_2*) when p ≥ 2k + 1, and of n^{(k+1)/p}(θ̂_{1,p,n}^{(k)} − θ_1*) when p ≥ 2(k + 1). Climbing up the ladder, under the balance condition nh^p → 0, they proved the convergence of the same form as (3.3) as n → ∞ for any continuous function f of at most polynomial growth, where ξ_1 and ξ_2 are given in Sect. 3.1.
Uchida and Yoshida (2014) provided three types of algorithms for adaptive quasi-Bayesian estimation. The QLA worked effectively to show that the spiral of estimators based on the annealed quasi-likelihood functions attains the convergence (3.3). To reduce the computational load of the adaptive estimators, Kamatani and Uchida (2014) proposed hybrid multi-step estimators for diffusion processes. According to their numerical studies, the hybrid multi-step estimator with an initial QBE gives stable estimates.
Recently, Kutoyants (2017) proposed a multi-step MLE for ergodic diffusions, and Kutoyants (2014) presented an approximation of the solution of the backward stochastic differential equation by a multi-step method. Dabye et al. (2018) gave moment estimators and a multi-step MLE for Poisson processes.

Jump diffusion process
Let us consider a d-dimensional ergodic process X = (X_t)_{t∈R_+} satisfying the stochastic differential equation

dX_t = a(X_t, θ_2) dt + b(X_t, θ_1) dw_t + ∫_E c(X_{t−}, z, θ_2) p(dt, dz),   (4.1)

whose coefficients satisfy some mild regularity conditions. In (4.1), w = (w_t)_{t∈R_+} is an r-dimensional F-Wiener process, and p(dt, dz) is a Poisson random measure on R_+ × E with the deterministic F-compensator q_{θ_2}(dt, dz). We want to estimate the true value θ* = (θ_1*, θ_2*) of the unknown parameter θ = (θ_1, θ_2) by observing the data (X_{t_j})_{j=0,1,...,n} with t_j = jh.
Estimation of a semimartingale with jumps has technical aspects different from those of a continuous semimartingale. A natural idea for constructing an estimator is to apply a Gaussian type likelihood to the continuous part of X and a Poissonian likelihood to the jump component of X. However, this idea is naive as it stands: it is impossible to tell whether an increment Δ_j X = X_{t_j} − X_{t_{j−1}} contains jumps, since only temporally discrete observations are available. We need some filter that detects jumps from the increments (Δ_j X)_{j=1,...,n}. Shimizu and Yoshida (2006) proposed an estimator for θ by a threshold method and showed its asymptotic normality. The threshold method is a standard technique going back at least to studies of Lévy processes.
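The effect of a threshold filter can be seen in a small simulation (the sizes, the jump distribution, and the tuning constants D and ρ below are illustrative assumptions): increments larger than Dh^ρ are attributed to jumps and discarded when estimating the continuous-part variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
h = 1.0 / n          # time horizon T = 1
sigma = 2.0
dX = sigma * rng.normal(0.0, np.sqrt(h), size=n)   # continuous part
jump_idx = rng.choice(n, size=25, replace=False)
dX[jump_idx] += rng.choice([-0.5, 0.5], size=25)   # add 25 jumps

D, rho = 10.0, 0.49                                # threshold D * h^rho
keep = np.abs(dX) <= D * h ** rho                  # jump filter

rv_naive = np.sum(dX ** 2)           # biased upward by the jump part
rv_filtered = np.sum(dX[keep] ** 2)  # estimates sigma^2 * T with T = 1
sigma_hat = np.sqrt(rv_filtered)
```

With ρ < 1/2, the threshold Dh^ρ is eventually much larger than a typical continuous increment (of order √h) and much smaller than a jump, so the filter separates the two regimes.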
The QLA was presented by Ogihara and Yoshida (2011) for the jump-diffusion process (4.1) satisfying a mixing condition. They considered the time-discretization step size h = h_n satisfying n^{−3/5} ≲ h ≲ n^{−4/7}, where a_n ≲ b_n means a_n ≤ C b_n for all n ∈ N, for some constant C, for sequences (a_n) and (b_n) of numbers. This balance condition is equivalent to n^{2/5} ≲ nh ≲ n^{3/7} and n^{−1/5} ≲ nh² ≲ n^{−1/7}. Suppose that for each (θ_2, x), the mapping z ↦ y = c(x, z, θ_2) is an injection from E into E and has an inverse z = c^{−1}(x, y, θ_2) from the image of c onto E. We assume that q_{θ_2}(dt, dz) = f_{θ_2}(z) dz dt with a density f_{θ_2}(z) (possibly ∫ f_{θ_2}(z) dz = 1). We suppose that B(x) = Im c(x, ·, θ_2) is independent of θ_2 ∈ Θ_2. With B = bb^⋆, a positive constant ρ less than 1/2 and a positive constant D, Ogihara and Yoshida (2011) constructed a quasi-log likelihood function H_n of the form (4.2), where Δ̄_j X = Δ_j X − h a(X_{t_{j−1}}, θ_2) and ϕ_n is a truncation function that removes extremely small or extremely large increments. Denote by ν the invariant probability measure of the jump diffusion process X. By applying the QLA theory, Ogihara and Yoshida (2011) obtained the convergence (3.3) of the QMLE θ̂_n^M = (θ̂_{1,n}^M, θ̂_{2,n}^M) with respect to H_n of (4.2), with Γ_1 given by (3.4) and Γ_2 by (4.3) instead of (3.5), where A(x) := {y ∈ B(x); ψ_{θ_2}(y, x) = 0} is supposed to be independent of θ_2 ∈ Θ_2. The adaptive QBE θ̂_n^B = (θ̂_{1,n}^B, θ̂_{2,n}^B) is defined by (3.6) and (3.7) with H_n of (4.2). The QLA ensures the convergence (3.8) with Γ_1 of (3.4) and Γ_2 of (4.3).

Gaussian quasi-likelihood to Lévy driven stochastic differential equation
Given a stochastic basis (Ω, F, F, P) with a filtration F = (F_t)_{t∈R_+}, we consider an F-adapted process X = (X_t)_{t∈R_+} satisfying the stochastic differential equation

dX_t = a(X_t, θ_2) dt + b(X_t, θ_1) dw_t + c(X_{t−}, θ_1) dJ_t,

where w = (w_t)_{t∈R_+} is an r-dimensional Wiener process, and J = (J_t)_{t∈R_+} is an r_1-dimensional pure-jump Lévy process with Lévy measure λ. The functions a, b and c are supposed to satisfy certain regularity conditions. The parameter spaces Θ_1 and Θ_2 are bounded open sets in R^{p_1} and R^{p_2}, respectively, each having a nice boundary.
In this section, our interest is in the phenomena arising when the Gaussian quasi-likelihood is applied to a Lévy driven stochastic differential equation. Needless to say, this is a quasi-likelihood approach. This problem is of practical importance because implementation with the Gaussian quasi-likelihood is easy, as in YUIMA (Brouste et al. 2014; Iacus and Yoshida 2018).
We assume that |J_1| ∈ L^{∞−}, E[J_1] = 0 and E[J_1^{⊗2}] = I_{r_1}, the r_1-dimensional identity matrix. Let V = bb^⋆ + cc^⋆ and assume the non-degeneracy of V. The fact that b and c share a common parameter θ_1 and the use of the function V suggest that only the variance structure of the increments will be paid attention to in what follows.
We assume high frequency, long-run data; that is, the data consist of (X_{t_j})_{j=0,1,...,n} with t_j = jh, h = h_n, and nh → ∞ and nh² → 0 as n → ∞. Let Θ = Θ_1 × Θ_2, θ = (θ_1, θ_2) ∈ Θ and p = p_1 + p_2. For estimation of θ, we work with the Gaussian quasi-likelihood

Q_n(θ) = −(1/2) Σ_{j=1}^n { log det V(X_{t_{j−1}}, θ_1) + h^{−1} V(X_{t_{j−1}}, θ_1)^{−1}[(Δ_j X − h a(X_{t_{j−1}}, θ_2))^{⊗2}] }.

A measurable mapping θ̂_n = (θ̂_{1,n}, θ̂_{2,n}) is called a Gaussian quasi-maximum likelihood estimator (GQMLE) if Q_n(θ̂_n) = max_{θ∈Θ̄} Q_n(θ). By a not straightforward application of the QLA theory, under ergodicity, Masuda (2013) showed the convergence

E[ f( √n(θ̂_{1,n} − θ_1*), √(nh)(θ̂_{2,n} − θ_2*) ) ] → E[ f(ξ*) ]

as n → ∞ for any continuous function f: R^p → R of at most polynomial growth, where θ* denotes the true value of θ, ξ* ∼ N_p(0, Σ), and Σ is a p × p positive-definite matrix. The matrix Σ is not necessarily block diagonal; in other words, θ_1 and θ_2 are not necessarily orthogonal.
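A minimal numerical sketch of this phenomenon (the Laplace driver, parameter values, and the reduction to a pure-scale model are illustrative assumptions, not Masuda's setting): the Gaussian quasi-likelihood still recovers the scale parameter when the driving noise is a standardized non-Gaussian Lévy process.

```python
import numpy as np

rng = np.random.default_rng(4)
n, h, theta1 = 50_000, 0.01, 1.5   # nh = 500 (long run)
# Standardized non-Gaussian driver: Laplace increments with mean 0 and
# variance h, mimicking E[J_1] = 0, E[J_1^(x)2] = 1 per unit time.
dJ = rng.laplace(0.0, np.sqrt(h / 2.0), size=n)
dX = theta1 * dJ                   # toy model dX_t = theta1 dJ_t (no drift)

# Gaussian QMLE of the scale: the maximizer of the Gaussian quasi-likelihood
# is the square root of the normalized realized variance.
theta1_hat = np.sqrt(np.sum(dX ** 2) / (n * h))
```

Consistency holds although the increments are far from Gaussian; the asymptotic variance, however, involves the fourth moment of the driver, so standard errors computed as if the noise were Gaussian would be misleading.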

Estimation of volatility
We treated inference for ergodic processes in Sects. 3 and 4. If the time horizon is finite, two probability measures for different values of the drift parameter can be mutually absolutely continuous, which means the drift parameter cannot be estimated consistently. Therefore, only the diffusion parameter is targeted by asymptotic statistics. Let us go back to the estimation problem for the unknown parameter θ in the m-dimensional semimartingale Y = (Y_t)_{t∈[0,T]} having the decomposition (1.1). We observe the data (X_{t_j}, Y_{t_j})_{j=0,1,...,n} for t_j = t_j^n = jT/n with fixed T, while the process b is unobservable. The parameter space Θ is supposed to be a p-dimensional bounded open set having a good boundary admitting Sobolev's embedding inequality.

Uchida and Yoshida (2013) constructed the QLA for the model (1.1). The quasi-log likelihood function H_n is defined by (1.2). Based on H_n of (1.2), the QMLE θ̂_n^M is characterized by (2.9) with n in place of T, and the QBE θ̂_n^B is defined by (2.12) with n for T. The true value of θ is denoted by θ*. The information matrix Γ at θ* is defined by

Γ[u^{⊗2}] = (1/(2T)) ∫_0^T tr( (∂_θ S)(X_t, θ*)[u] S^{−1}(X_t, θ*) (∂_θ S)(X_t, θ*)[u] S^{−1}(X_t, θ*) ) dt   (5.1)

for u ∈ R^p. The matrix Γ is symmetric and random because it involves the process X. We prepare the random field Y: Θ → R as

Y(θ) = −(1/(2T)) ∫_0^T { tr( S(X_t, θ)^{−1} S(X_t, θ*) − I_m ) + log( det S(X_t, θ) / det S(X_t, θ*) ) } dt,

where I_m is the m-dimensional identity matrix. The key index to estimation is

χ_0 = inf_{θ ∈ Θ, θ ≠ θ*} ( −Y(θ) / |θ − θ*|² ).

If the key index χ_0 satisfies the non-degeneracy (2.1), i.e. [T1] (i-2), then under mild regularity conditions,

E[ f( √n(θ̂_n − θ*) ) Φ ] → E[ f( Γ^{−1/2} ζ ) Φ ]   (5.2)

as n → ∞ for any continuous function f: R^p → R of at most polynomial growth and any F-measurable random variable Φ ∈ ∪_{p>1} L^p, where ζ is a p-dimensional standard Gaussian random vector independent of F. This is non-ergodic statistics. Handy criteria for Condition (2.1) are available; see Uchida and Yoshida (2013) for details. The asymptotic mixed normality of the QMLE was presented by Genon-Catalot and Jacod (1993).

Jump filters
It is necessary to modify the estimating function (1.2) when the process Y has a jump component. Instead of (1.1), we consider a semimartingale Y having a decomposition with an additional jump component J = (J_t)_{t∈[0,T]}, a random step process, and estimate θ from the data (X_{t_j}, Y_{t_j})_{j=0,1,...,n}. As discussed in Sect. 4.1, we need some jump filter. The classical filter |Δ_j Y| > Dh^ρ is a possibility. However, it is known that the performance of the classical filter strongly depends on the tuning parameters; see, e.g., Iacus and Yoshida (2018).
Recently, Inatsugu and Yoshida (2021a) (updated by arXiv:1806.10706) proposed global jump filters to enable stable and precise estimation of the volatility parameter θ. The global filter uses the order statistics of the increments. Let V_j = |(S_{n,j−1})^{−1/2} Δ_j Y| with an initial estimator S_{n,j−1} for the spot volatility S(X_{t_{j−1}}, θ*), up to a possibly unknown scaling constant. The r-th order statistic of {V_j}_{j=1,...,n} is denoted by V_{(r)}. For a preset constant α ∈ (0, 1), the global jump filter is specified by the index set

J_n(α) = { j ∈ {1, ..., n}; V_j < V_{(s_n(α))} }

with s_n(α) = ⌊n(1 − α)⌋. Then the α-quasi-log likelihood function H_n(θ; α) is defined by modifying (1.2) so that only the increments with j ∈ J_n(α) are used, with a bias correction determined by the constant c(α), the upper α-quantile of the chi-squared distribution with m degrees of freedom, i.e., P[V ≤ c(α)] = 1 − α for a random variable V ∼ χ²(m). The cap function K_{n,j} = 1_{ {|Δ_j Y| < C_* n^{−1/4}} }, with a positive constant C_*, is a very loose filter used only for removing technical assumptions on the distribution of the jumps J_t; in practice, K_{n,j} is almost always 1. A measurable mapping θ̂_n^{M,α} maximizing H_n(θ; α) in θ ∈ Θ̄ is called an α-quasi-maximum likelihood estimator (α-QMLE). Inatsugu and Yoshida (2021a) gave a rate of convergence in L^p-norm of the α-QMLE θ̂_n^{M,α} with the help of an annealed quasi-likelihood ratio process and a resulting polynomial type large deviation inequality. A quasi-Bayesian estimator is also treated there. Moreover, they introduced the QMLE θ̂_n^{M,α_n} and the QBE θ̂_n^{B,α_n} with a shrinking α_n to obtain the same convergence as (5.2), that is,

E[ f( √n(θ̂_n − θ*) ) Φ ] → E[ f( Γ^{−1/2} ζ ) Φ ]

as n → ∞ for any continuous function f: R^p → R of at most polynomial growth and any F-measurable random variable Φ ∈ ∪_{p>1} L^p, where ζ is a p-dimensional standard Gaussian random vector independent of F. The global filter was applied to the realized volatility by Inatsugu and Yoshida (2021b).
The realized volatility with a global jump filter outperforms the bipower variation and the minimum realized volatility, which have been regarded as estimators robust against jumps.
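A constant-volatility toy version of the global filter can be sketched as follows (all constants, including the bias-correction factor, are illustrative assumptions; the paper's V_j also standardizes by an initial spot-volatility estimate, which is unnecessary when the volatility is constant):

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 20_000, 2.0
h = 1.0 / n                       # time horizon T = 1
dY = sigma * rng.normal(0.0, np.sqrt(h), size=n)
jump_idx = rng.choice(n, size=20, replace=False)
dY[jump_idx] += rng.choice([-0.5, 0.5], size=20)   # contaminate with jumps

alpha = 0.2
s = int(np.floor(n * (1.0 - alpha)))
order = np.argsort(np.abs(dY))
J = order[:s]                     # J_n(alpha): the s smallest |increments|

# Bias correction: E[Z^2 1{|Z| <= a}] for Z ~ N(0,1), where a is the
# (1 - alpha)-quantile of |Z|, i.e. a = Phi^{-1}(1 - alpha/2) = 1.28155...
a = 1.2815515655446004
m_alpha = (1.0 - alpha) - 2.0 * a * np.exp(-a * a / 2.0) / np.sqrt(2.0 * np.pi)

sigma2_hat = np.sum(dY[J] ** 2) / (m_alpha * n * h)  # filtered estimate
rv_naive = np.sum(dY ** 2)                           # jump-biased
```

The correction factor compensates for discarding the largest α fraction of (standardized) increments, so that the truncated sum of squares is an asymptotically unbiased estimate of σ², while the naive realized variance is inflated by the jumps.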

Non-synchronous observations
Different components of high frequency data often have asynchronous timestamps. A seemingly natural idea for estimating the covariance between two components is to use the ordinary realized covolatility after synchronizing the data by some interpolation method. However, it is known as the Epps effect (Epps 1979) that any such interpolation causes a severe bias: the estimated correlation vanishes as the frequency of the observations diverges. This problem of measurement of multivariate volatilities was solved by Malliavin and Mancino (2002) with a Fourier series method and by Hayashi and Yoshida (2005) with an association kernel. These estimators are nonparametric.

Ogihara and Yoshida (2014) studied parametric estimation of volatility with nonsynchronous data. Consider the model (1.1). Suppose that Y is two-dimensional, X is possibly multi-dimensional, and different components of (Y, X) are observed in a non-synchronous manner. The quasi-likelihood function is based on a local Gaussian approximation, but a global quadratic form having many nonzero off-diagonal elements appears due to the non-synchronicity. Though the theoretical treatment is fairly complicated, it is possible to construct a QLA, and consequently to prove asymptotic mixed normality and moment convergence of the QMLE and QBE. The nonsynchronous covariance estimator (H–Y estimator) has a central limit theorem (Hayashi and Yoshida 2008, 2011). It is said that the H–Y estimator attains the minimum asymptotic variance among nonparametric estimators. If we consider a simple model having a constant diffusion matrix, then we can compare two estimates of the covariance between the two components of Y: one obtained from the QMLE, and the other from the H–Y estimator. It is shown that the parametric estimator achieves better precision, as expected, since it uses information about the structure of the model.
Ogihara (2015) proved the local asymptotic mixed normality for a non-synchronously observed diffusion process, and concluded that the QMLE and QBE are asymptotically optimal.
Model selection

Eguchi and Masuda (2018) applied the QLA to a Schwarz type model selection criterion for stochastic processes. In short, they considered the integral

I_n = ∫_Θ exp( H_n(θ) ) ϖ_n(θ) dθ   (6.1)

for a quasi-log likelihood random field H_n: Ω × Θ → R, n ∈ N, given a probability space (Ω, F, P). The parameter space Θ is a bounded open set in R^p, and ϖ_n is a prior density on Θ. The technical essence of the quasi-Bayesian information criterion (QBIC) is to validate an L¹-approximation of F_n = −2 log I_n by the statistic QBIC_n, where θ̂_n is a QMLE for θ. The expression (6.1) of I_n is useful for the estimate since

F_n = −2 H_n(θ*) − 2 log |det a_n| − 2 log ∫_{U_n} Z_n(u) ϖ_n(θ* + a_n u) du.
Then it is possible to estimate the L¹-norm of F_n − QBIC_n by using a polynomial type large deviation inequality for Z_n. Finally, they reached the statistic

QBIC_n = −2 H_n(θ̂_n) + log det( −∂_θ² H_n(θ̂_n) ).

The QBIC is valid for non-ergodic models, not only for ergodic models. Eguchi and Masuda (2018) showed that the QBIC performs well for volatility model selection and for selection of ergodic diffusion models. The results are practically promising, as well as showing the effectiveness of the QLA theory. Further considerations may be possible in some philosophical and technical aspects.
In the context of information criteria for semimartingales, the QLA was first applied by Uchida (2010), who treated the exact likelihood of a sampled diffusion process by means of the Malliavin calculus to derive the contrast-based information criterion (CIC).
The QLA found another application in Umezu et al. (2019) to the AIC for the non-concave penalized likelihood method. Sparse estimation is a new direction of the QLA theory. The polynomial type large deviation (PLD) inequality for a random field H_T having the LAQ property can be transferred to a PLD inequality for the penalized random field H_T^† = H_T − (penalty terms) (Kinoshita and Yoshida 2019). Therefore, the PLD inequality is basic even in the theory of sparse estimation for stochastic processes. Related papers are Masuda and Shimizu (2017) and Suzuki and Yoshida (2020). Partial quasi-likelihood analysis is another direction of extension of the theory (Yoshida 2018).

Point processes
Recently, modeling with point processes has been attracting attention in applications to ultra-high frequency financial data (cf. Abergel and Jedidi 2015). Multi-dimensional point processes are used to model the limit order book.
On a stochastic basis (Ω, F, F, P), F = (F_t)_{t∈R_+}, we consider a d-dimensional point process N = (N_t)_{t∈R_+} = (N_t^α)_{α∈I, t∈R_+}, I = {1, ..., d}. More precisely, N_0 = 0 and each N^α = (N_t^α)_{t∈R_+} is a simple counting process. We assume that the components N^α (α ∈ I) have no common jumps. Further, we assume that the F-compensator of N has an intensity process λ* = (λ_t*)_{t∈R_+}; that is, λ* is a nonnegative locally integrable predictable process such that N − ∫_0^· λ_s* ds is a d-dimensional local martingale with respect to F. For statistical modeling of this point process, we consider a random field λ: Ω × R_+ × Θ → R_+^d. The set Θ is a bounded open set in R^p as usual. Suppose that the map R_+ ∋ t ↦ λ(t, θ) = (λ^α(t, θ))_{α∈I} is an R_+^d-valued left-continuous adapted process for every θ ∈ Θ, and that the map θ ↦ λ(s, θ) is of class C³(Θ) for every s ∈ R_+. We refer the reader to Clinet and Yoshida (2017) for other regularity conditions and details of this section. We assume that λ_t* = λ(t, θ*) for some θ* ∈ Θ. Statistically, the value θ* is unknown and to be estimated from the data (X_t)_{t∈[0,T]}. The quasi-log likelihood function for estimation of θ is

H_T(θ) = Σ_{α∈I} { ∫_0^T log λ^α(t, θ) dN_t^α − ∫_0^T λ^α(t, θ) dt }.

Let E = R_+ × R_+ × R^p. For ergodicity, we assume that for each α ∈ I, there exists a mapping π_α: C_b(E) × Θ → R such that

(1/T) ∫_0^T ψ( λ^α(t, θ*), λ^α(t, θ), ∂_θ λ^α(t, θ) ) dt →^p π_α(ψ, θ)   (T → ∞)

for every (ψ, θ) ∈ C_b(E) × Θ. Then, under an identifiability condition, Clinet and Yoshida (2017) showed asymptotic normality and moment convergence for the QMLE and QBE as T → ∞. These results were applied to a multivariate Hawkes process, for which ergodicity can be verified. Today the Hawkes process is one of the standard models in the analysis of limit order books.
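A minimal sketch of this quasi-likelihood for a one-dimensional point process (the parametric family λ(t, θ) = θ(1 + 0.5 sin t), the horizon, and the thinning simulation are illustrative assumptions, far simpler than a Hawkes model):

```python
import numpy as np

rng = np.random.default_rng(7)
T, theta_true = 200.0, 2.0
g = lambda t: 1.0 + 0.5 * np.sin(t)   # known baseline; lambda(t, theta) = theta * g(t)
lam_max = theta_true * 1.5            # upper bound for thinning

# Simulate the inhomogeneous Poisson process by thinning a homogeneous one.
m = rng.poisson(lam_max * T)
cand = rng.uniform(0.0, T, size=m)
t_obs = cand[rng.uniform(0.0, lam_max, size=m) < theta_true * g(cand)]

# Quasi-log likelihood H_T(theta) = sum_i log(theta g(t_i)) - theta int_0^T g dt.
G = T - 0.5 * (np.cos(T) - 1.0)       # int_0^T (1 + 0.5 sin t) dt in closed form

def H(th):
    return len(t_obs) * np.log(th) - th * G   # theta-dependent part only

theta_hat = len(t_obs) / G                    # closed-form maximizer of H
grid = np.linspace(1.0, 3.0, 2001)
theta_grid = grid[np.argmax(H(grid))]         # numerical check of the maximizer
```

For this one-parameter family the QMLE is simply the observed count divided by the integrated baseline; in a genuine Hawkes model the intensity depends on the past of N and the maximization must be done numerically.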
Muni Toke and Yoshida discussed modeling of the intensities of order flows in a limit order book (Muni Toke and Yoshida 2017), analysis of order flows in limit order books with ratios of Cox-type intensities (Muni Toke and Yoshida 2019), and marked point processes and intensity ratios for limit order book modeling (Muni Toke and Yoshida 2020). The QLA and information criteria for stochastic processes were applied. Flexible modeling is possible thanks to the formulation of the QLA, and the resulting models incorporate various effective covariates. This approach enables us to predict the next market order more precisely than the traditional models.