Marked point processes and intensity ratios for limit order book modeling

This paper extends the analysis of Muni Toke and Yoshida (2020) to the case of marked point processes. We consider multiple marked point processes with intensities defined by three multiplicative components, namely a common baseline intensity, a state-dependent component specific to each process, and a state-dependent component specific to each mark within each process. We show that for specific mark distributions, this model is a combination of the ratio models defined in Muni Toke and Yoshida (2020). We prove convergence results for the quasi-maximum and quasi-Bayesian likelihood estimators of this model and provide numerical illustrations of the asymptotic variances. We use these ratio processes to model transactions occurring in a limit order book. Model flexibility allows us to investigate both state-dependency (emphasizing the role of imbalance and spread as significant signals) and clustering. Calibration, model selection and prediction results are reported for high-frequency trading data on multiple stocks traded on Euronext Paris. We show that the marked ratio model outperforms other intensity-based methods (such as “pure” Hawkes-based methods) in predicting the sign and aggressiveness of market orders on financial markets.


Introduction
The limit order book is the central structure that aggregates the buy and sell intentions of all market participants on a given exchange. This structure typically evolves at a very high frequency: on the Euronext Paris stock exchange, the limit order book of a common stock is modified several hundred thousand times per day. Among these changes, thousands or tens of thousands of events account for a transaction between two participants. The remaining events indicate either the intention to buy/sell at a limit price lower/higher than currently available, or the cancellation of such intentions (Abergel et al. 2016).
Empirical observation of high-frequency events on a limit order book may reveal irregular interval times (durations), clustering, intraday seasonality, etc. (Chakraborti et al. 2011). Stochastic point processes are, thus, natural candidates for the modeling of such systems and their time series (Hautsch 2011). In particular, Hawkes processes have been successfully suggested for the modeling of limit order book events (Bowsher 2007; Large 2007; Bacry et al. 2012, 2013; Muni Toke and Pomponio 2012; Lallouache and Challet 2016; Lu and Abergel 2018).
One drawback of such models is the difficulty of accounting for high intraday variability. Another drawback is the lack of state-dependency: the observed state of the limit order book does not influence the dynamics of the events. One may try to include state-dependency by specifying a fully parametric model (Muni Toke and Yoshida 2017), which is a cumbersome solution. Another solution is to extend the Hawkes framework with marks (Rambaldi et al. 2017) or with state-dependent kernels (Morariu-Patrichi and Pakkanen 2018). Muni Toke and Yoshida (2020) have shown that state-dependency can be efficiently tackled by a multiplicative model with two components: a shared baseline intensity and a state-dependent process-specific component. An intensity ratio model can then allow for efficient estimation of state-dependency. Several microstructure examples are worked out there, including a ratio model for the prediction of the next trade sign.
In this work, we extend the framework of Muni Toke and Yoshida (2020) to some cases of marked point processes, by adding a third term to the multiplicative definition of the intensity, which accounts for some mark distribution. We use this extension to deepen our investigation of limit order book data. In financial microstructure, one of the characteristics of an order sent to a financial exchange is its aggressiveness (Biais et al. 1995;Harris and Hasbrouck 1996). We will say here that an order is aggressive if it moves the price. A ratio model with marks can, thus, be used to analyze both the side (bid or ask) and aggressiveness of market orders.
The rest of the paper is organized as follows. In Sect. 2, we show that some marked models can be viewed as combinations of intensity ratios of non-marked processes. Section 3 defines the quasi-maximum likelihood and quasi-Bayesian estimators and proceeds to the analysis of the estimation. Theorem 1 states the convergence result and a numerical illustration follows. We then turn to the main financial application in Sect. 4, and show how the two-step ratio model can efficiently predict (in a theoretical setting) the sign and aggressiveness of the next trade. Finally, the full proof of Theorem 1 is given in Sect. 5, and for completeness elements on quasi-likelihood analysis are recalled in Sect. 6.

Marked process models as two-step ratio models
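Let $I = \{0, 1, \ldots, \bar{i}\}$ and $\mathbb{R}_+ = [0, \infty)$. We consider certain marked point processes $N^i = (N^i_t)_{t \in \mathbb{R}_+}$, $i \in I$. For each $i \in I$, let $\bar{k}_i$ be a positive integer, and let $K_i = \{0, 1, \ldots, \bar{k}_i\}$ be a space of marks for the process $N^i$. We denote by $N^{i,k_i} = (N^{i,k_i}_t)_{t \in \mathbb{R}_+}$ the process counting events of type $i$ with mark $k_i \in K_i$. Obviously, $N^i = \sum_{k_i \in K_i} N^{i,k_i}$. Let $\check{I} = \cup_{i \in I} \{i\} \times K_i$. We assume that the intensity of the process $N^i$ with mark $k_i$, i.e., the intensity of $N^{i,k_i}$, is given by

$$\lambda^{i,k_i}(t, \vartheta^i, \varrho^i) = \lambda_0(t) \exp\Big( \sum_{j \in J} \vartheta^i_j X_j(t) \Big)\, p_i^{k_i}(t, \varrho^i)$$

at time $t$ for $(i, k_i) \in \check{I}$, where $\vartheta^i = (\vartheta^i_j)_{j \in J}$ $(i \in I)$ and $\varrho^i$ $(i \in I)$ are unknown parameters. More precisely, given a probability space $(\Omega, \mathcal{F}, P)$ equipped with a right-continuous filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{R}_+}$, $\lambda_0 = (\lambda_0(t))_{t \in \mathbb{R}_+}$ is a non-negative predictable process, $X_j = (X_j(t))_{t \in \mathbb{R}_+}$ is a predictable process for each $j \in J = \{1, \ldots, \bar{j}\}$, and $p_i^{k_i}(t, \varrho^i)$ is a non-negative predictable process for each $(i, k_i) \in \check{I}$. Later, we will put a condition so that the mapping $t \mapsto \lambda^{i,k_i}(t, \vartheta^i, \varrho^i)$ is locally integrable with respect to $dt$, and we assume that $N^{i,k_i}_0 = 0$, and for each $(i, k_i) \in \check{I}$, the process

$$N^{i,k_i}_t - \int_0^t \lambda^{i,k_i}(s, \vartheta^i, \varrho^i)\, ds \qquad (t \in \mathbb{R}_+)$$

is a local martingale for a value $\big((\vartheta^i)^*, (\varrho^i)^*\big)$ of the parameter $(\vartheta^i, \varrho^i)$. We assume that the counting processes $N^{i,k_i}$ $(i \in I;\ k_i \in K_i)$ have no common jumps.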
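In what follows, we consider the processes $p_i^{k_i}(t, \varrho^i)$ such that

$$\sum_{k_i \in K_i} p_i^{k_i}(t, \varrho^i) = 1, \qquad (2.1)$$

so that $p_i^{k_i}(t, \varrho^i)$ gives the conditional distribution of the event $k_i$ when the event $i$ occurred. Under (2.1), the intensity process of $N^i$ becomes

$$\lambda^i(t, \vartheta^i) = \lambda_0(t) \exp\Big( \sum_{j \in J} \vartheta^i_j X_j(t) \Big).$$

The process $\lambda_0$ is called a baseline intensity, whose structure will not be specified; in other words, $\lambda_0$ will be treated as a nuisance parameter, differently from the use of Cox regression as in Muni Toke and Yoshida (2017). The baseline intensity may represent the global market activity in finance, for example, and its irregular change may limit the reliability of estimation procedures and predictions for any model fitted to it. Muni Toke and Yoshida (2020) took an approach with an unstructured baseline intensity process and showed advantages of such modeling. Statistically, the process $X(t) = (X_j(t))_{j \in J}$ is an observable covariate process. Since the effect of these covariate processes on the amplitude of $\lambda^i(t, \vartheta^i)$ is contaminated by the unobservable and structurally unknown baseline intensity, a more interesting measure of dependency of $\lambda^i(t, \vartheta^i)$ on $X(t)$ is the ratio

$$\frac{\lambda^i(t, \vartheta^i)}{\sum_{i' \in I} \lambda^{i'}(t, \vartheta^{i'})}.$$

We set $\theta^i_j = \vartheta^i_j - \vartheta^0_j$ ($\theta^0_j = 0$ in particular) and consider the ratios

$$r^i(t, \theta) = \frac{\exp\big( \sum_{j \in J} \theta^i_j X_j(t) \big)}{\sum_{i' \in I} \exp\big( \sum_{j \in J} \theta^{i'}_j X_j(t) \big)} \qquad (2.3)$$

for $i \in I$, where $\theta = (\theta^i_j)_{i \in I_0, j \in J}$ and $I_0 = \{1, \ldots, \bar{i}\}$. In this paper, we further assume that the factor $p_i^{k_i}(t, \varrho^i)$ is given by

$$p_i^{k_i}(t, \varrho^i) = \frac{\exp\big( \sum_{j_i \in J_i} \varrho^{i,k_i}_{j_i} Y^i_{j_i}(t) \big)}{\sum_{k \in K_i} \exp\big( \sum_{j_i \in J_i} \varrho^{i,k}_{j_i} Y^i_{j_i}(t) \big)}.$$

Setting $\rho^{i,k_i}_{j_i} = \varrho^{i,k_i}_{j_i} - \varrho^{i,0}_{j_i}$ ($\rho^{i,0}_{j_i} = 0$ in particular), this factor coincides with the ratio

$$q_i^{k_i}(t, \rho^i) = \frac{\exp\big( \sum_{j_i \in J_i} \rho^{i,k_i}_{j_i} Y^i_{j_i}(t) \big)}{\sum_{k \in K_i} \exp\big( \sum_{j_i \in J_i} \rho^{i,k}_{j_i} Y^i_{j_i}(t) \big)}, \qquad (2.4)$$

where $Y^i_{j_i}$ $(j_i \in J_i)$ are observable covariate processes, $J_i$ being a finite index set. This is a multinomial logistic regression model.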
Let $\Theta$ be a bounded open convex set in $\mathbb{R}^p$ with $p = \bar{i}\,\bar{j}$. For each $i \in I$, $R_i$ denotes a bounded open convex set in $\mathbb{R}^{p_i}$ with $p_i = \bar{j}_i \bar{k}_i$. Write $\rho = (\rho^i)_{i \in I}$. Let $R = \prod_{i \in I} R_i$. We will consider $\Theta \times R$ as the parameter space of $(\theta, \rho)$.
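Because the baseline intensity is common to all processes, it cancels from the ratios, and the conditional probability that an event is of type $(i, k_i)$ is the product of a first-step ratio and a second-step ratio. The following Python sketch illustrates this under the multinomial logistic specification; the function names and the nested-list layout of the parameters are illustrative assumptions, not notation from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax of a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def side_ratios(theta, x):
    """First-step ratios r^i(t, theta), i in I.

    theta[i] holds the coefficients theta^i_j (theta[0] is the zero
    reference row); x holds the covariate values X_j(t).
    """
    scores = [sum(c * xj for c, xj in zip(row, x)) for row in theta]
    return softmax(scores)

def mark_ratios(rho_i, y_i):
    """Second-step ratios q_i^k(t, rho^i), k in K_i (row 0 is the zero reference)."""
    scores = [sum(c * yj for c, yj in zip(row, y_i)) for row in rho_i]
    return softmax(scores)

def joint_probabilities(theta, rho, x, y):
    """Probability r^i * q_i^k of each pair (i, k); the baseline has cancelled."""
    r = side_ratios(theta, x)
    return {(i, k): r[i] * q
            for i in range(len(theta))
            for k, q in enumerate(mark_ratios(rho[i], y[i]))}
```

With two processes, one covariate equal to 1 and $\theta^1_1 = \log 3$, `side_ratios` returns the ratios 0.25 and 0.75, whatever the value of the baseline intensity.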

Remark 1 The marked ratio model is in general not equivalent to a non-marked ratio model in larger dimension, in which we would write the intensity of the counting process of events of type $i \in I$ with mark $k_i \in K_i$ as

$$\tilde{\lambda}_0(t) \exp\Big( \sum_{j \in \tilde{J}} \tilde{\vartheta}^{i,k_i}_j Z_j(t) \Big)$$

for some covariate processes $Z_j$, $j \in \tilde{J}$. Equivalence of the models would require these expressions to coincide for some sets of covariates and parameters. However, if $Z_j(t) = 0$ for all $j \in \tilde{J}$, then necessarily $X_j(t) = 0$ for all $j \in J$ and $Y^i_{j_i}(t) = 0$ for all $i \in I$ and $j_i \in J_i$. This in turn implies $1/|K_i| = \tilde{\lambda}_0(t)/\lambda_0(t)$ for all $i \in I$, which is generally not true. In Sect. 4.5, a non-marked ratio model is used as a benchmark to assess the performances of the marked ratio model. Prediction performances are indeed shown to be different.

Quasi-maximum likelihood estimator and quasi-Bayesian estimator
The two-step marked ratio model consists of the two kinds of ratio models (2.3) and (2.4). Estimation of this model can be carried out with multiple successive ratio models.
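In the first step, we consider the parameter $\theta = (\theta^i_j)_{i \in I_0, j \in J}$ and the ratios (2.3) for $i \in I$. The quasi-log-likelihood based on observations on $[0, T]$ for this ratio model is

$$H_T(\theta) = \sum_{i \in I} \int_0^T \log r^i(t, \theta)\, dN^i_t. \qquad (3.1)$$

This comes from the multinomial logistic regression. A quasi-maximum likelihood estimator (QMLE) for $\theta$ is a measurable mapping $\hat{\theta}^M_T : \Omega \to \bar{\Theta}$ satisfying

$$H_T(\hat{\theta}^M_T) = \max_{\theta \in \bar{\Theta}} H_T(\theta)$$

for all $\omega \in \Omega$.

In the second step, we consider the ratios (2.4) and the associated quasi-log-likelihood

$$H^{(i)}_T(\rho^i) = \sum_{k_i \in K_i} \int_0^T \log q_i^{k_i}(t, \rho^i)\, dN^{i,k_i}_t \qquad (i \in I). \qquad (3.2)$$

It is possible to pool these estimating functions by the single estimating function

$$H_T(\theta, \rho) = H_T(\theta) + \sum_{i \in I} H^{(i)}_T(\rho^i). \qquad (3.3)$$

In other words,

$$H_T(\theta, \rho) = \sum_{i \in I} \int_0^T \log r^i(t, \theta)\, dN^i_t + \sum_{i \in I} \sum_{k_i \in K_i} \int_0^T \log q_i^{k_i}(t, \rho^i)\, dN^{i,k_i}_t. \qquad (3.4)$$

The collection of QMLEs $\big(\hat{\theta}^M_T, (\hat{\rho}^{i,M}_T)_{i \in I}\big)$ is a QMLE for $H_T(\theta, \rho)$. Use of $H_T(\theta, \rho)$ is convenient when we consider the asymptotic distribution of the estimators $\hat{\theta}^M_T$ and $\hat{\rho}^{i,M}_T$ $(i \in I)$ jointly.

The quasi-Bayesian estimator (QBE) $\big(\hat{\theta}^B_T, (\hat{\rho}^{i,B}_T)_{i \in I}\big)$ is defined by

$$\hat{\theta}^B_T = \Big[ \int_{\Theta \times R} \exp\big(H_T(\theta, \rho)\big) \varpi(\theta, \rho)\, d\theta\, d\rho \Big]^{-1} \int_{\Theta \times R} \theta \exp\big(H_T(\theta, \rho)\big) \varpi(\theta, \rho)\, d\theta\, d\rho \qquad (3.5)$$

and

$$\hat{\rho}^{i,B}_T = \Big[ \int_{\Theta \times R} \exp\big(H_T(\theta, \rho)\big) \varpi(\theta, \rho)\, d\theta\, d\rho \Big]^{-1} \int_{\Theta \times R} \rho^i \exp\big(H_T(\theta, \rho)\big) \varpi(\theta, \rho)\, d\theta\, d\rho \qquad (3.6)$$

for a prior probability density $\varpi(\theta, \rho)$ on $\Theta \times R$. We assume that $\varpi : \Theta \times R \to \mathbb{R}_+$ is continuous and

$$0 < \inf_{(\theta, \rho) \in \Theta \times R} \varpi(\theta, \rho) \le \sup_{(\theta, \rho) \in \Theta \times R} \varpi(\theta, \rho) < \infty.$$

Since $H_T(\theta)$ and $H^{(i)}_T(\rho^i)$ have no common parameters, the maximization of $H_T(\theta, \rho)$ with respect to the parameters $\theta$ and $\rho^i$ $(i \in I)$ can be carried out separately. However, these components are not always individually treated for the QBE. If $\varpi(\theta, \rho)$ is a product of prior densities as $\varpi(\theta, \rho) = \varpi(\theta) \prod_{i \in I} \varpi_i(\rho^i)$, then each integral in (3.5) and (3.6) is simplified and we can compute $\hat{\theta}^B_T$ and $\hat{\rho}^{i,B}_T$ $(i \in I)$ separately:

$$\hat{\theta}^B_T = \Big[ \int_{\Theta} \exp\big(H_T(\theta)\big) \varpi(\theta)\, d\theta \Big]^{-1} \int_{\Theta} \theta \exp\big(H_T(\theta)\big) \varpi(\theta)\, d\theta,$$

$$\hat{\rho}^{i,B}_T = \Big[ \int_{R_i} \exp\big(H^{(i)}_T(\rho^i)\big) \varpi_i(\rho^i)\, d\rho^i \Big]^{-1} \int_{R_i} \rho^i \exp\big(H^{(i)}_T(\rho^i)\big) \varpi_i(\rho^i)\, d\rho^i.$$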
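As an illustration of the first estimation step, the sketch below evaluates a quasi-log-likelihood of multinomial logistic type in the two-process case, where it reduces to a binary logistic log-likelihood summed over the observed events, and maximizes it numerically. The event list layout, the function names and the use of plain gradient ascent are assumptions made for this example; they are not the authors' implementation.

```python
import math

def quasi_log_likelihood(theta, events):
    """Sum over events of log r^{i_n}(t_n, theta), two processes (i in {0, 1}).

    events is a list of (i, x): the type of the n-th event and the covariate
    vector x = X(t_n) observed at the event time. Only the order of events
    matters, not their timestamps.
    """
    h = 0.0
    for i, x in events:
        s = sum(c * xj for c, xj in zip(theta, x))
        # log r^1 = s - log(1 + e^s) and log r^0 = -log(1 + e^s)
        h += i * s - math.log1p(math.exp(s))
    return h

def qmle_gradient_ascent(events, dim, step=0.5, n_iter=2000):
    """Maximize the concave quasi-log-likelihood by gradient ascent (assumed optimizer)."""
    theta = [0.0] * dim
    n = len(events)
    for _ in range(n_iter):
        grad = [0.0] * dim
        for i, x in events:
            s = sum(c * xj for c, xj in zip(theta, x))
            r1 = 1.0 / (1.0 + math.exp(-s))  # ratio r^1 at the event time
            for j in range(dim):
                grad[j] += (i - r1) * x[j]
        theta = [c + step * g / n for c, g in zip(theta, grad)]
    return theta
```

The second step can reuse the same routine on the events of a given type $i$, with the mark in place of the event type and the covariates $Y^i$ in place of $X$.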

Quasi-likelihood analysis
Let We consider the following conditions.

Condition [M1]
is not restrictive since the covariates can often be regarded as bounded in applications.
The alpha mixing coefficient α(h) is defined by where for I ⊂ R + , B I denotes the σ -field generated by λ 0 (t), (X j (t)) j∈J , In the two-step ratios model, the category (i, k i ) is selected with twofold multinomial distributions of sample size equal to 1. First the class i ∈ I is selected when ξ i = 1 for some random variable ξ = (ξ 0 , ..., ξ¯i ) ∼ Multinomial(1; π 0 , ...., π¯i ).
If ξ i = 1 for a class i ∈ I, then the class k i ∈ K i is chosen as k i = k when η i k = 1 for some independent random variable Let us introduce some notations used in the following analysis. For a tensor T = .., ] stand for a multilinear mapping. We denote by u ⊗r = u ⊗ · · · ⊗ u the r times tensor product of u.
Denote by ∂ (θ,ρ) the differential operator with respect to (θ, ρ). Let and let Γ T = Γ T (θ * , ρ * ). Then, as detailed in Section A.2, Therefore, Hence, The symmetric matrices Γ (θ) and Γ i (ρ i ) are defined by for u ∈ R p , and and in particular set An identifiability condition will be imposed.
as T → ∞ for A ∈ {M, B} and every f ∈ C(Rp) of at most polynomial growth, where ζ is ap-dimensional standard Gaussian random vector.
Example 1 As an illustration we consider the case with two processes (I = {0, 1}), and two marks for each process (K 0 = K 1 = {0, 1}). The first state-dependent term takes into account one covariate X 1 (i.e., J = {1}). The mark distributions both depend on another covariate Y 1 (i.e. J 0 = J 1 = {1}). In this example, we assume that X 1 and Y 1 are independent Markov chains with values in {−1, 1} and constant transition intensities λ X and λ Y . We assume that λ 0 is the intensity of a Hawkes process In this specific case, the matrix Γ of Eq. (3.13) is a 3 × 3-diagonal matrix, and a direct computation shows that the diagonal coefficients are We run 1000 simulations of the processes (N 0 , N 1 ) with their marks for various values of horizon T . Numerical values used in these simulations are the following: For each simulation, we compute the quasimaximum likelihood estimators (θ 1 1 ,ρ 0,1 1 ,ρ 1,1 1 ) with the two-step ratios described above. Table 1  Asymptotic values predicted by Theorem 1 are indeed empirically retrieved, which ends this numerical illustration.

Intensities of the processes counting market orders
We consider the market orders submitted to a given limit order book. Let $N^0$ be the process counting the market orders submitted on the bid side (sell market orders) and $N^1$ the process counting the market orders submitted on the ask side (buy market orders). On each side, we further consider whether the order is an aggressive order that moves the price (labeled with mark 1), or a non-aggressive order that does not move the price (labeled with mark 0). We assume that the intensity of an order of type $i \in I = \{0, 1\}$ with mark $k_i \in K_i = \{0, 1\}$ is

$$\lambda^{i,k_i}(t, \vartheta^i, \varrho^i) = \lambda_0(t) \exp\Big( \sum_{j \in J} \vartheta^i_j X_j(t) \Big) \frac{\exp\big( \sum_{j \in J_i} \varrho^{i,k_i}_j Y^i_j(t) \big)}{\sum_{k \in K_i} \exp\big( \sum_{j \in J_i} \varrho^{i,k}_j Y^i_j(t) \big)}. \qquad (4.1)$$

In the following applications, we will consider several possible models defined with various sets of covariates $X_j$, $j \in J$ and $Y^i_j$, $j \in J_i$, $i = 0, 1$. The tested sets of covariates will all be subsets of the following list of possible covariates (besides $Z_0 = 1$, common to all models):
- $Z_1$: the imbalance, computed from the quantities available at the best bid and at the best ask at time $t$;
- $Z_2$: $\epsilon(t)$, where $\epsilon(t)$ is the sign of the last market order at time $t$ (1 for an ask market order, −1 for a bid market order);
- $Z_3$: $s(t)\epsilon(t)$, the signed spread, where $s(t)$ is the observed spread in currency at time $t$;
- $Z_4$ to $Z_9$: Hawkes covariates, the last one, $Z_9$, being the Hawkes covariate for ask market orders.
With these Hawkes covariates, the ratio model can actually be seen as a kind of nonlinear Hawkes process. When the theory is applied, ergodicity is an assumption. In the present model, it depends on the nature of the process $\lambda_0$, which was left general. Brémaud and Massoulié (1996) treated a stability problem for a nonlinear Hawkes process. If the system has a Markovian representation, it may be possible to apply a drift condition as in Abergel and Jedidi (2015) and Clinet and Yoshida (2017). On the other hand, intraday stationarity (ergodicity) is not essential. As described in Section 3.2 of Muni Toke and Yoshida (2020), in parallel to the simple stationary case, we can relax the assumption of intraday stationarity by considering a repeated measurements model. Then, we only need the more realistic ergodicity of the data across the long-run repeated measurements, under which the methods remain valid.

Limit order book data
We use tick-by-tick data for 36 stocks traded on Euronext Paris. The sample spans the whole year 2015, i.e., roughly 200 trading days for each stock, although some days are missing for some stocks. Table 3 in Sect. 7 lists the stocks investigated and the number of trading days available. Raw data consist of a TRTH (Thomson-Reuters Tick History) database: for each trading day and each stock, one file lists the transactions (quantities and prices) and one file lists the modifications of the limit order book (level, price and quantities). Timestamps are given with millisecond precision. Synchronization of both files and reconstruction of the limit order book are carried out with the procedure described in Muni Toke (2016). One strong advantage of the ratio model is that it does not require precise timestamps in itself, since timestamps do not appear explicitly in the quasi-likelihood of the ratios, while fitting other intensity-based models (e.g., Hawkes processes) requires unique precise timestamps for log-likelihood computation. Here, if Hawkes fits are used as covariates (covariates $Z_4$ to $Z_9$ in our application), then we choose to consider only unique timestamps, i.e., we aggregate orders of the same type occurring at the same timestamp.
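The aggregation of orders of the same type sharing a timestamp can be sketched as follows. The tuple layout `(timestamp_ms, order_type, quantity)` and the choice of summing quantities are assumptions made for illustration; the text only states that such orders are aggregated.

```python
def aggregate_same_timestamp(orders):
    """Merge orders of the same type occurring at the same timestamp.

    orders: iterable of (timestamp_ms, order_type, quantity) tuples.
    Returns one tuple per (timestamp, type) pair, sorted by timestamp then
    type, with quantities summed (an assumed aggregation rule).
    """
    aggregated = {}
    for ts, order_type, qty in orders:
        key = (ts, order_type)
        aggregated[key] = aggregated.get(key, 0) + qty
    return [(ts, order_type, qty)
            for (ts, order_type), qty in sorted(aggregated.items())]
```

After this step, each order type has at most one event per millisecond, which is what a Hawkes log-likelihood computation needs.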

Estimation procedure of the two-step ratio model
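Following Sects. 2 and 3, estimation of the model defined at Eq. (4.1) can be carried out with multiple successive ratio models. In the first step, we consider the difference parameters $\theta^i_j = \vartheta^i_j - \vartheta^0_j$, $i \in I \setminus \{0\}$, $j \in J$, and the ratios ($i \in I \setminus \{0\}$):

$$r^i(t, \theta) = \frac{\exp\big( \sum_{j \in J} \theta^i_j X_j(t) \big)}{1 + \sum_{i' \in I \setminus \{0\}} \exp\big( \sum_{j \in J} \theta^{i'}_j X_j(t) \big)}. \qquad (4.2)$$

The quasi-log-likelihood based on the observation on $[0, T]$ for this ratio model is defined at Eq. (3.1). In the second step, we consider the ratios ($i \in I$, $k_i \in K_i \setminus \{0\}$)

$$q_i^{k_i}(t, \rho^i) = \frac{\exp\big( \sum_{j \in J_i} \rho^{i,k_i}_j Y^i_j(t) \big)}{1 + \sum_{k \in K_i \setminus \{0\}} \exp\big( \sum_{j \in J_i} \rho^{i,k}_j Y^i_j(t) \big)} \qquad (4.3)$$

and the associated quasi-log-likelihood of Eq. (3.2). Consistency and asymptotic normality of the quasi-maximum likelihood estimators are guaranteed by Theorem 1.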
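As a worked special case, with two processes ($I = \{0, 1\}$) the multinomial logistic form of the first-step ratio reduces to a logistic function of the covariates. The display below is a sketch derived under that assumption, using the difference parameters $\theta^1_j = \vartheta^1_j - \vartheta^0_j$:

```latex
r^1(t,\theta) = \frac{1}{1 + \exp\!\big(-\sum_{j \in J} \theta^1_j X_j(t)\big)},
\qquad
r^0(t,\theta) = 1 - r^1(t,\theta),
\qquad
\log \frac{r^1(t,\theta)}{r^0(t,\theta)} = \sum_{j \in J} \theta^1_j X_j(t).
```

Each coefficient $\theta^1_j$ is thus the change in the log-odds of an ask market order versus a bid market order per unit of the covariate $X_j$. The same reading applies to the second-step ratios with two marks, with the log-odds of an aggressive versus a non-aggressive order.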

In-sample model selection with QAIC and QBIC
In this first application, we perform in-sample model selection to assess the relevance of the different possible sets of covariates. For each stock and each trading day, we fix a set of covariates. We use the indices of the tested covariates to name the models: the model 146 is, thus, the model with covariates $(Z_1, Z_4, Z_6)$. If required, we estimate the parameters of all the Hawkes covariates on the previous day and then compute the Hawkes covariates using these fitted parameters. This procedure ensures that the predictability of the covariates is not violated. We finally fit three ratio models following the above procedure: one for the processes $(N^0, N^1)$ (sign of the market orders), one for the processes $(N^{0,0}, N^{0,1})$ (aggressiveness of the bid market orders) and one for the processes $(N^{1,0}, N^{1,1})$ (aggressiveness of the ask market orders). For each trading day, we then select the model minimizing some information criterion. For the side determination ratio, the criterion is

$$-2 H_T(\hat{\theta}^M_T) + a_T |J|, \qquad (4.4)$$

where $|J|$ is the cardinality of the set $J$, $a_T = 2$ for the QAIC criterion, and $a_T = \log(T)$ for the QBIC criterion. For the aggressiveness ratios, the criterion is

$$-2 H^{(i)}_T(\hat{\rho}^{i,M}_T) + a_T |J_i|. \qquad (4.5)$$

For side determination, the models 14689, 124689, 134689 and 1234689 are the four most often chosen models: the selected model is among these four models more than 80% of the time on average across stocks using QAIC, and close to 90% of the time using QBIC. As expected, QBIC favors the smallest model 14689. Imbalance, Hawkes covariates for bid and ask market orders, and Hawkes covariates for aggressive bid and ask market orders, thus, appear to be the most informative covariates.
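The selection rule can be sketched in a few lines of Python, assuming the usual penalized form: minus twice the maximized quasi-log-likelihood plus $a_T$ times the number of fitted parameters. The dictionary layout, the function names and the numerical values in the usage note are hypothetical.

```python
import math

def information_criterion(h_max, n_params, a_t):
    """-2 * (maximized quasi-log-likelihood) + a_T * (number of parameters)."""
    return -2.0 * h_max + a_t * n_params

def select_model(fits, horizon, criterion="QBIC"):
    """Pick the model label minimizing QAIC (a_T = 2) or QBIC (a_T = log T).

    fits maps a model label such as '146' to a pair (h_max, n_params);
    horizon is the observation horizon T.
    """
    a_t = 2.0 if criterion == "QAIC" else math.log(horizon)
    scores = {label: information_criterion(h, k, a_t)
              for label, (h, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

For instance, with made-up fits `{"146": (-1000.0, 3), "12346": (-997.0, 5)}` and a horizon of 30600, QAIC picks the larger model while QBIC, whose penalty grows with $\log T$, picks the smaller one.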
For aggressiveness determination, the model 146 is the most often selected by QBIC. This is in line with intuition: imbalance is known to be a significant proxy for price change (see, e.g., Lipton et al. 2013) and Hawkes covariates for aggressive bid and aggressive ask are specific to the targeted events. QAIC selection is more dispersed and favors a larger model (as expected), namely 12346. Note also that for several stocks, models with "symmetric" sets of covariates can also be chosen: for ask aggressiveness, 1679 is often selected, i.e., imbalance and all available ask Hawkes covariates; symmetrically, 1458 is selected for bid aggressiveness, i.e., imbalance and all available bid Hawkes covariates.
One may in particular observe that these results confirm the primary role of the spread measured in ticks in the theory of financial microstructure. Stocks for which the observed spread is mostly equal to one tick are labeled 'large-tick stocks', implying that market participants are constrained by the price grid when submitting orders to the limit order book. Other stocks may be labeled 'small-tick stocks' (Eisler et al. 2012). Using our sample, we compute the mean observed spread in ticks for each stock and each available trading day, and group these values in bins of equal sizes. Then inside each bin, we compute the frequency of selection of the covariate $Z_3$ (signed spread) by QBIC for the aggressiveness ratio estimation of Equation (4.3). A bar plot is provided in Fig. 5 (left). We observe an increase in the frequency of selection of the spread covariate when the mean observed spread increases from 1 tick (its minimal possible value) to roughly 2.5 ticks. For larger spread values, the frequency remains high and then seems to decrease at high values. This indicates that the significance of covariates, especially the spread, is not the same for large-tick and small-tick stocks, and that even for small-tick stocks, the dependency is not uniform. This visual observation can, for example, be complemented by the following statistical test. For all stocks and trading days, we compute the empirical cumulative distribution functions of the daily mean spread in ticks (i) when the spread covariate is selected by QBIC in the aggressiveness ratios, and (ii) when the spread covariate is not selected. A one-sided Kolmogorov-Smirnov test rejects (with p-value $10^{-53}$) the hypothesis that both distributions are identical, in favor of the alternative hypothesis that the spread covariate is selected more often for larger observed spreads.
Recall that many microstructure models are developed for large-tick stocks, since assuming a constant spread equal to one tick often simplifies the analysis of the limit order book dynamics. Our observation advocates for the definition of specific microstructure models for small-tick stocks, taking into account the spread dynamics.
Model selection consistency validates the use of QBIC. See Eguchi and Masuda (2018), or follow Muni Toke and Yoshida (2020) for a direct proof for consistency including other criteria. However, the real performance in prediction of a selected model is more important than the model selection consistency. It is worth trying QAIC, or the consistent QAIC.

Out-of-sample prediction performance
In this section, we use intensity and ratio models to predict the sign and aggressiveness of an incoming market order. For all tested models, the procedure is the following. On a given trading day, the model is fitted. Fitted parameters are then used on the following trading day (available in the database) to compute the intensities (or ratios for ratio models) at all times. The type of an incoming event is then predicted to be the type with the highest intensity or ratio. The exercise is theoretical in the sense that we assume that these computations are instantaneous, so that intensities or ratios are available at all times.
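The prediction rule amounts to an argmax over event types. The sketch below shows it for a generic score dictionary and for a two-step model, where each pair (side, mark) is scored by the product of the side ratio and the mark ratio. Scoring the pair by this product is an assumption consistent with the multiplicative intensity, and all names are illustrative.

```python
def predict_next_event(scores):
    """Return the event type with the highest intensity or ratio."""
    return max(scores, key=scores.get)

def predict_two_step(side_ratio, mark_ratio):
    """Score each (side, mark) pair by r^i * q_i^k and return the argmax.

    side_ratio: dict side -> first-step ratio r^i.
    mark_ratio: dict side -> dict mark -> second-step ratio q_i^k.
    """
    joint = {(i, k): side_ratio[i] * q
             for i, marks in mark_ratio.items()
             for k, q in marks.items()}
    return predict_next_event(joint), joint
```

With side ratios 0.4 and 0.6 and mark ratios (0.1, 0.9) on side 0 and (0.5, 0.5) on side 1, the most likely pair is (0, 1) with score 0.36, even though side 1 is the more likely side on its own.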
Recall the notation $N = (N^{i,k_i})_{i \in \{0,1\}, k_i \in \{0,1\}}$ for the four-dimensional point process counting bid aggressive market orders, bid non-aggressive market orders, ask aggressive market orders and ask non-aggressive market orders. We use two benchmark models.
The first benchmark model is the Hawkes model. Here, $N$ is assumed to be a four-dimensional Hawkes process with a single exponential kernel, its intensity being written in vector notation. Estimation and ratio computation can be found in, e.g., Bowsher (2007) and Muni Toke and Pomponio (2012). This model is labeled 'Hawkes'. The second benchmark model is the four-dimensional ratio model without marks (Muni Toke and Yoshida 2020). In this model, the intensity of each counting process $N^{i,k_i}$ is specified with some unobserved baseline intensity $\lambda_{0,R}(t)$. Given the previous observations, we choose the set of covariates $(Z_1, Z_4, Z_6, Z_8, Z_9)$ for this benchmark. It is natural to choose these covariates (imbalance, Hawkes for aggressive orders and Hawkes for all orders) given the results on model selection of Sect. 4.4. Estimation and ratio computation are detailed in Muni Toke and Yoshida (2020). This model is labeled 'Ratio-14689'.
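For readers unfamiliar with the Hawkes benchmark, the sketch below evaluates the intensity vector of a multivariate Hawkes process with exponential kernels at a given time. The parametrization (baseline vector `mu`, excitation matrix `alpha`, a single decay rate `beta` shared by all components) is an assumption for illustration; the exact parametrization used for the benchmark is not reproduced here.

```python
import math

def hawkes_intensities(events, mu, alpha, beta, t_eval):
    """Intensities lambda_m(t_eval) = mu_m + sum_n alpha[m][n] * decay_n(t_eval).

    events: time-sorted list of (time, component); only events strictly
    before t_eval contribute. decay_n(t) is the sum over past events of
    component n of exp(-beta * (t - t_k)), updated recursively.
    """
    dim = len(mu)
    decay = [0.0] * dim
    last = 0.0
    for t, n in events:
        if t >= t_eval:
            break
        factor = math.exp(-beta * (t - last))
        decay = [d * factor for d in decay]  # discount to the new event time
        decay[n] += 1.0                      # jump caused by the new event
        last = t
    factor = math.exp(-beta * (t_eval - last))
    decay = [d * factor for d in decay]
    return [mu[m] + sum(alpha[m][n] * decay[n] for n in range(dim))
            for m in range(dim)]
```

Dividing each intensity by the sum of all components gives ratios that can be compared directly with those of the ratio models.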

Fig. 6 Out-of-sample prediction performances for the benchmark models and the marked ratio models. Label explanation is in the text

These two benchmarks are used to assess the performances of two marked ratio models (or two-step ratio models) described in this paper. The first marked ratio model uses the covariates $(Z_4, Z_5, Z_6, Z_7)$ for both steps. These covariates are based on the Hawkes processes of the benchmark Hawkes model. The second marked ratio model uses the covariates $(Z_1, Z_4, Z_6, Z_8, Z_9)$ for the first-step ratio (side determination) and $(Z_1, Z_4, Z_6)$ for both second-step ratios (bid and ask aggressiveness). Again, these choices are natural given the results on model selection of Sect. 4.4. These models are labeled 'MarkedRatio-4567-4567-4567' and 'MarkedRatio-14689-146-146', respectively. Figure 6 plots the results for each stock for the two benchmark models and the two marked ratio models. For completeness, the partial performances for side determination and aggressiveness determination of the trades are provided on Fig. 7. Finally, Table 2 lists the partial and global prediction performances of these models averaged across stocks. The benchmark Hawkes model correctly predicts the sign and aggressiveness of an incoming order with an accuracy in the range [40%, 60%] for all stocks, with a 50% average. The marked ratio model with only Hawkes parameters and no dependency on the state of the limit order book closely reproduces these performances. The non-marked ratio model 'Ratio-14689' slightly improves on the global performances of the two previous models. When looking at the partial accuracies, we observe that this improvement is mainly due to a better side prediction. Finally, 'MarkedRatio-14689-146-146', which appeared to be on average the best model with respect to the QBIC selection, strongly outperforms all other models. The global accuracy is in the range [60%, 80%] for all stocks, with a 67% average, i.e., we are theoretically able to correctly predict both the sign and aggressiveness of an incoming market order two times out of three. Finally, we observe by comparing side determination of 'Ratio-14689' and 'MarkedRatio-14689-146-146' that the decoupling of the side and aggressiveness ratios in the marked ratio model significantly improves the prediction performance over the one-step four-dimensional case, while using the same covariates.

Fig. 7 Out-of-sample partial prediction performances for the side prediction (left) and aggressiveness prediction (right), for the benchmark models and the marked ratio models. Label explanation is in the text

Table 2 Side accuracy gives the fraction of correctly signed trades. Aggressiveness accuracy gives the proportion of trades with a correctly predicted aggressiveness. Global accuracy gives the fraction of orders with correctly predicted side and aggressiveness

These results show that the two-step ratio model for marked point processes is a significant improvement over existing intensity models. As in the standard ratio model of Muni Toke and Yoshida (2020), this provides an easy way to have both clustering and state-dependency. However, it is important to note that the two-step ratio strongly improves the performance of the standard ratio model in a multidimensional setting. In this example, flexibility in the choice of covariates allows for precise model selection for both sign and aggressiveness.
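The three accuracy measures (side, aggressiveness, global) can be computed from paired predictions and observations as in the following sketch. The representation of an order as a `(side, mark)` tuple and the function name are illustrative assumptions.

```python
def prediction_accuracies(predicted, observed):
    """Side, aggressiveness and global accuracies of (side, mark) predictions.

    predicted and observed are equal-length, non-empty lists of (side, mark) pairs.
    """
    n = len(observed)
    pairs = list(zip(predicted, observed))
    side = sum(p[0] == o[0] for p, o in pairs) / n
    aggressiveness = sum(p[1] == o[1] for p, o in pairs) / n
    global_acc = sum(p == o for p, o in pairs) / n
    return {"side": side, "aggressiveness": aggressiveness, "global": global_acc}
```

The global accuracy can never exceed either partial accuracy, since a globally correct prediction must be correct on both the side and the aggressiveness.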

Proof of Theorem 1
The convergence given in Theorem 1 can be obtained by the quasi-likelihood analysis, which we recall in Section 6. We will apply Theorems 3 and 5 of Sect. 6 to the double ratio model. In the present situation, the scaling factor is $b_T = T$, the joint parameter $(\theta, \rho)$ plays the role of $\theta$ in Section 6, and the dimension $p + \sum_{i \in I} p_i$ of the full parameter space replaces the $p$ of Section 6. Fix a set of values of parameters $(\alpha, \beta_1, \beta_2, \rho, \rho_1, \rho_2)$ so that Condition [L1] (Section 6) is met with $\rho = 2$.

Score functions and a central limit theorem
The score function for ρ i is given by Then, . By some calculus with (2.1) and p k i i (t, i ) = q k i i (t, ρ i ), we see We are assuming that the counting processes N i,k i (i ∈ I; j i ∈ K i ) have no common jumps. Then, the p i × p i matrix valued process and Therefore, the mixing property [M2] gives the convergence The score function for θ is the p-dimensional process where r (t, θ) = (r i (t, θ)) i∈I 0 . Evaluated at θ * , Then, the p × p matrix valued process F has the expression Then, the mixing property [M2] provides the convergence The full information matrix is thep ×p block diagonal matrix Let Δ T = T −1/2 F T , (F (i) T ) i∈I . Now, by the martingale central limit theorem, it is easy to obtain the convergence where ζ is ap-dimensional standard Gaussian random vector. The joint convergence (Δ T , Γ ) → d (Γ 1/2 ζ, Γ ) is obvious since Γ is deterministic.

Condition [L4]
According to (6.2), we define the random field Y T : for H T (θ, ρ) given in (3.3). From the expression (3.4) of H T (θ, ρ), we have By definition, where C is a constant depending on the diameters of Θ and R. Therefore, under for k ∈ N, where the constant appearing at each < ∼ depends only onp, k and the constant of the Burkholder-Davis-Gundy inequality. By induction, we obtain for every p > 1 and ∈ {0, 1}. Then, Sobolev's inequality gives for every p > 1. Let More precisely, where (3.12) was used. Similarly, from (5.5), equivalently, for i, i ∈ I 0 and j, j ∈ J. Obviously, In a way similar to the derivation of (5.12), as a matter of fact it is easier, we can show

Proof of Theorem 1
We have verified Conditions [L1]-[L4] in the present situation. Theorem 1 now follows from Theorems 3 and 5.

Quasi-likelihood analysis
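This section recalls the quasi-likelihood analysis. Let $\Theta$ be a bounded open set in $\mathbb{R}^p$. Given a probability space $(\Omega, \mathcal{F}, P)$, suppose that $H_T : \Omega \times \Theta \to \mathbb{R}$ is of class $C^3$, that is, the mapping $\Theta \ni \theta \mapsto H_T(\omega, \theta) \in \mathbb{R}$ is continuously extended to $\bar{\Theta}$ and of class $C^3$ for every $\omega \in \Omega$, and the mapping $\Omega \ni \omega \mapsto H_T(\omega, \theta) \in \mathbb{R}$ is measurable for every $\theta \in \Theta$. Let $\Gamma$ be a $p \times p$ random matrix.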
Let θ * ∈ Θ. For a sequence a T ∈ G L(p) satisfying lim T →∞ |a T | = 0, let where denotes the matrix transpose. We consider a random field 2) which will be assumed to converge to a random field Y : Ω × Θ → R. Only for simplifying presentation, we will assume that a T = b −1/2 T I p for diverging sequence (b T ) T >0 of positive numbers, where I p is the identity matrix. In what follows, we fix a positive number L.
for all T > 0 and r > 0. Here, the supremum of the empty set should read −∞ by convention.
We comment on some points. Parameters satisfying [L1] exist. Nondegeneracy conditions in [L3] are obvious in ergodic cases. In this paper, we will apply Theorem 2 under ergodicity of the stochastic system. Theorem 2 asserts that a polynomial type large deviation inequality can be obtained once the boundedness of moments of some random variables is verified. Condition [L4] is easy to obtain because each variable is usually a simple additive functional. The polynomial type large deviation inequality in Theorem 2 enables us to easily apply the scheme by Ibragimov and Has'minskiȋ (1981) and Kutoyants (1984, 2012) to various dependence structures. Let $u \in \mathbb{R}^p$. Define $r_T(u)$ $(u \in U_T)$ by

$$Z_T(u) = \exp\Big( \Delta_T[u] - \frac{1}{2} \Gamma[u^{\otimes 2}] + r_T(u) \Big).$$

It is said that $Z_T$ is locally asymptotically quadratic (LAQ) at $\theta^*$ if $r_T(u) \to^p 0$ as $T \to \infty$ for every $u \in \mathbb{R}^p$, and hence $\log Z_T(u)$ is asymptotically approximated by a random quadratic function of $u$.
We will confine our attention to a very standard case where Z T is locally asymptotically mixed normal, though the general theory of the quasi-likelihood analysis is framed more generally.
Any measurable mappingθ M T : Ω → Θ is called a quasi-maximum likelihood estimator (QMLE) for H T if When H T is continuous on the compact Θ, such a measurable function always exists, which is ensured by the measurable selection theorem.
Proof We will sketch the proof to convey the concepts of the quasi-likelihood analysis to the reader. See Yoshida (2011)  for u ∈ R p . The term r T (u) admits the expression In this situation, we can apply Taylor's formula even though the whole Θ is not convex. Condition [L4] (iii) and the convergence of Δ T ensures tightness of the random fields Z T | B(0,R) T >T 0 for every R > 0, where B(0, R) = {u ∈ R p } and T 0 is a sufficiently large number depending on R. Combining this property with the polynomial type large deviation inequality given by Theorem 2, we obtain the convergence Z T → Z inĈ(R p ) for the random field Z T extended as an element ofĈ(R p ) so that sup R p \U T Z T (u) ≤ sup u∈∂U T Z T (u). Consequently,û T →û = argmax u∈R p Z(u). It is known that a measurable version of extension of Z T exists.
A polynomial type large deviation, even weaker than the one in Theorem 2, serves to show L q -boundedness of {|û T | q } for L > q > p. Then, the family {û T } is uniformly integrable, and hence we obtain the convergence of E[ f (û T )].

Remark 2 In Theorem 3, if Δ
An advantage of the quasi-likelihood analysis is that the asymptotic behavior of the quasi-Bayesian estimator can be obtained as well as that of the quasi-maximum likelihood estimator and its moments convergence. The mappinĝ is called a quasi-Bayesian estimator (QBE) with respect to the prior density . The QBEθ B T takes values in the convex-hull of Θ. We will assume is continuous and 0 < inf θ∈Θ (θ) ≤ sup θ∈Θ (θ) < ∞. We will give a concise exposition in the following among many possible ways. The reader is referred to Yoshida (2011) for further information. Recall that p is the dimension of Θ, and B(R) denotes the open ball of radius R centered at the origin. C(B(R)) is the space of all continuous functions on B(R), and it is equipped with the supremum norm. Recall V T (r ) = {u ∈ U T ; |u| ≥ r }. As before,û = Γ −1/2 ζ with a p-dimensional standard Gaussian random vector ζ independent of Γ . Writeû B T = a −1 T (θ B T − θ * ). Theorem 4 Let p ≥ 1, L > p +1, D > p+ p. Suppose that (Δ T , Γ ) → d (Γ 1/2 ζ, Γ ) as T → ∞, where ζ is a p-dimensional standard Gaussian random vector independent of Γ . Moreover, suppose the following conditions are satisfied.
(i) For every R > 0, as T → ∞, where Z is given in (6.4). (ii) There exist positive constants T 0 , C 1 and C 2 such that P sup for all T ≥ T 0 and r > 0. (6.8) Then, as T → ∞ for any continuous function f : R p → R satisfying sup u∈R p (1 + |u|) − p | f (u)| < ∞.
Proof We will give a brief summary of the proof; see Yoshida (2011)  The random field Z inherits a tail estimate from (6.7), and henceû(R) is approximated by Combining these estimates, we can concludeû B T → dû as T → ∞. Convergence of the expectation is a consequence of uniform integrability of |û B T | p ensured by (6.7). Remark 3 (a) It is possible to relax the conditions of Theorem 4 to only ensure the convergenceû B T →û. (b) In Theorem 4, if Δ T → d Γ 1/2 ζ F-stably, thenû B T →û F-stably. (c) Usually, the condition (iii) of Theorem 4 is easily verified; See Lemma 2 of Yoshida (2011). (d) We refer the reader to Yoshida (2021) for a simplified quasilikelihood analysis for a locally asymptotically quadratic random field.
The following result follows from Theorem 4.

Proof
The convergence (6.6) holds, as shown in the proof of Theorem 3. The polynomial type large deviation inequality (6.7) is a consequence of Theorem 2; the number D is arbitrary. Fix δ > 0. Then, there exists T 0 > 0 such that B(δ) ⊂ Θ. In particular, r T (u) admits the representation (6.5) for all u ∈ B(δ). Since M 3 = L(β − ρ 1 ) −1 > p, M 4 = L(2β 1 (1 − α) −1 − ρ 1 ) −1 > p and p > p, we have p := min{M 3 , M 4 , p} > p and E[|r T (u)| p ] ≤ C 0 |u| p (u ∈ B(δ)) for some constant C 0 . Then Lemma 2 of Yoshida (2011) gives the estimate by a constant C 1 depending on ( p , p, δ, C 0 ) and the supremums appearing in [L4](i),(iii),(iv), but C 1 is independent of T ≥ T 0 . Therefore (6.8) holds true. Thus, we can apply Theorem 4 to conclude the proof.  on the computational resources used for this paper, some trading days for few very liquid stocks were not used for some of the marked ratio models tested in Sect. 4.4. In this case, only the trading days where all models have been computed have been used. This is the last column of the table.