1 Introduction

The limit order book is the central structure that aggregates the buy and sell intentions of all market participants on a given exchange. This structure typically evolves at a very high frequency: on the Euronext Paris stock exchange, the limit order book of a common stock is modified several hundred thousand times per day. Among these changes, thousands or tens of thousands of events correspond to a transaction between two participants. The rest of the events indicate either the intention to buy/sell at a limit price lower/higher than the best available price, or the cancellation of such intentions (Abergel et al. 2016).

Empirical observation of high-frequency events on a limit order book may reveal irregular interval times (durations), clustering, intraday seasonality, etc. (Chakraborti et al. 2011). Stochastic point processes are, thus, natural candidates for the modeling of such systems and their time series (Hautsch 2011). In particular, Hawkes processes have been successfully suggested for the modeling of limit order book events (Bowsher 2007; Large 2007; Bacry et al. 2012, 2013; Muni Toke and Pomponio 2012; Lallouache and Challet 2016; Lu and Abergel 2018).

One drawback of such models is the difficulty of accounting for high intraday variability. Another drawback is the lack of state-dependency: the observed state of the limit order book does not influence the dynamics of the events. One may try to include state-dependency by specifying a fully parametric model (Muni Toke and Yoshida 2017), which is a cumbersome solution. Another solution is to extend the Hawkes framework with marks (Rambaldi et al. 2017) or with state-dependent kernels (Morariu-Patrichi and Pakkanen 2018). Muni Toke and Yoshida (2020) have shown that state-dependency can be efficiently tackled by a multiplicative model with two components: a shared baseline intensity and a state-dependent process-specific component. An intensity ratio model then allows for efficient estimation of state-dependency. Several microstructure examples are worked out, including a ratio model for the prediction of the sign of the next trade.

In this work, we extend the framework of Muni Toke and Yoshida (2020) to some cases of marked point processes, by adding a third term to the multiplicative definition of the intensity, which accounts for some mark distribution. We use this extension to deepen our investigation of limit order book data. In financial microstructure, one of the characteristics of an order sent to a financial exchange is its aggressiveness (Biais et al. 1995; Harris and Hasbrouck 1996). We will say here that an order is aggressive if it moves the price. A ratio model with marks can, thus, be used to analyze both the side (bid or ask) and aggressiveness of market orders.

The rest of the paper is organized as follows. In Sect. 2, we show that some marked models can be viewed as combinations of intensity ratios of non-marked processes. Section 3 defines the quasi-likelihood maximum and Bayesian estimators and proceeds to the analysis of the estimation. Theorem 1 states the convergence result, and a numerical illustration follows. We then turn to the main financial application in Sect. 4, and show how the two-step ratio model can efficiently predict (in a theoretical setting) the sign and aggressiveness of the next trade. Finally, the full proof of Theorem 1 is given in Sect. 5, and for completeness, elements of quasi-likelihood analysis are recalled in Sect. 6.

2 Marked process models as two-step ratio models

Let \({{\mathbb {I}}}=\{0,1,...,{\bar{i}}\}\) and \({{\mathbb {R}}}_+=[0,\infty )\). We consider certain marked point processes \(N^i=(N^i_t)_{t\in {{\mathbb {R}}}_+}\), \(i\in {{\mathbb {I}}}\). For each \(i\in {{\mathbb {I}}}\), let \({\bar{k}}_i\) be a positive integer, and let \({{\mathbb {K}}}_i=\{0,1,...,{\bar{k}}_i\}\) be the space of marks for the process \(N^i\). We denote by \(N^{i,k_i}=(N^{i,k_i}_t)_{t\in {{\mathbb {R}}}_+}\) the process counting events of type \(i\) with mark \(k_i\in {{\mathbb {K}}}_i\), so that obviously \(N^i=\sum _{k_i\in {{\mathbb {K}}}_i}N^{i,k_i}\). Let \({\check{{{\mathbb {I}}}}}=\cup _{i\in {{\mathbb {I}}}}\big (\{i\}\times {{\mathbb {K}}}_{i}\big )\). We assume that the intensity of the process \(N^i\) with mark \(k_i\), i.e., the intensity of \(N^{i,k_i}\), is given by

$$\begin{aligned} \lambda ^{i,k_i}(t,\vartheta ^i,\varrho ^{i})= & {} \lambda _0(t)\exp \bigg (\sum _{j\in {{\mathbb {J}}}} \vartheta ^i_jX_j(t)\bigg )\>p^{k_i}_i(t,\varrho ^i) \end{aligned}$$

at time t for \((i,k_i)\in {\check{{{\mathbb {I}}}}}\), where \(\vartheta ^i=(\vartheta ^i_j)_{j\in {{\mathbb {J}}}}\) (\(i\in {{\mathbb {I}}}\)) and \(\varrho ^i\) (\(i\in {{\mathbb {I}}}\)) are unknown parameters. More precisely, given a probability space \((\varOmega ,\mathcal{F},P)\) equipped with a right-continuous filtration \({{\mathbb {F}}}=(\mathcal{F}_t)_{t\in {{\mathbb {R}}}_+}\), \(\lambda _0=(\lambda _0(t))_{t\in {{\mathbb {R}}}_+}\) is a non-negative predictable process, \(X_j=(X_j(t))_{t\in {{\mathbb {R}}}_+}\) is a predictable process for each \(j\in {{\mathbb {J}}}=\{1,...,{\bar{j}}\}\), and \(p_i^{k_i}(t,\varrho ^i)\) is a non-negative predictable process for each \((i,k_i)\in {\check{{{\mathbb {I}}}}}\). Later, we will impose a condition ensuring that the mapping \(t\mapsto \lambda ^{i,k_i}(t,\vartheta ^i,\varrho ^i)\) is locally integrable with respect to \(\mathrm{d}t\). We assume that \(N^{i,k_i}_0=0\) and that, for each \((i,k_i)\in {\check{{{\mathbb {I}}}}}\), the process

$$\begin{aligned} {\tilde{N}}^{i,k_i}_t= & {} N^{i,k_i}_t-\int _0^t\lambda ^{i,k_i}(s,(\vartheta ^i)^*,(\varrho ^i)^*)\mathrm{d}s \end{aligned}$$

is a local martingale for a value \(\big ((\vartheta ^i)^*,(\varrho ^i)^*\big )\) of the parameter \(\big (\vartheta ^i,\varrho ^i\big )\). We assume that the counting processes \(N^{i,k_i}\) (\(i\in {{\mathbb {I}}};\>k_i\in {{\mathbb {K}}}_i\)) have no common jumps.

In what follows, we consider the processes \(p^{k_i}_i(t,\varrho ^i)\) such that

$$\begin{aligned} \sum _{k_i\in {{\mathbb {K}}}_i}p^{k_i}_i(t,\varrho ^i)= & {} 1 \end{aligned}$$
(2.1)

for \(i\in {{\mathbb {I}}}\). Then, the \((1+{\bar{k}}_i)\)-dimensional process \((p^{k_i}_i(t,\varrho ^i))_{k_i\in {{\mathbb {K}}}_i}\) gives the conditional distribution of the mark \(k_i\) given that an event of type \(i\) occurs. Under (2.1), the intensity process of \(N^i\) becomes

$$\begin{aligned} \lambda ^i(t,\vartheta ^i)= & {} \sum _{k_i\in {{\mathbb {K}}}_i}\lambda ^{i,k_i}(t,\vartheta ^i,\varrho ^i) \>=\>\lambda _0(t)\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\vartheta ^i_jX_j(t)\bigg ). \end{aligned}$$
(2.2)

The process \(\lambda _0\) is called the baseline intensity. Its structure will not be specified; in other words, \(\lambda _0\) will be treated as a nuisance parameter, in contrast with the Cox regression approach of Muni Toke and Yoshida (2017). In finance, for example, the baseline intensity may represent the global market activity, and its irregular change may limit the reliability of estimation procedures and predictions for any model fitted to it. Muni Toke and Yoshida (2020) took an approach with an unstructured baseline intensity process and showed the advantages of such modeling. Statistically, the process \({{\mathbb {X}}}(t)=(X_j(t))_{j\in {{\mathbb {J}}}}\) is an observable covariate process. Since the effect of these covariate processes on the amplitude of \(\lambda ^i(t,\vartheta ^i)\) is contaminated by the unobservable and structurally unknown baseline intensity, a more interesting measure of the dependency of \(\lambda ^i(t,\vartheta ^i)\) on \({{\mathbb {X}}}(t)\) is the ratio

$$\begin{aligned} \lambda ^i(t,\vartheta ^i)/\sum _{i'\in {{\mathbb {I}}}}\lambda ^{i'}(t,\vartheta ^{i'}) \end{aligned}$$

for \(i\in {{\mathbb {I}}}\). Thus, we introduce the difference parameters \(\theta ^i_j=\vartheta ^i_j-\vartheta ^0_j\) (\(i\in {{\mathbb {I}}},\>j\in {{\mathbb {J}}}\)), (\(\theta ^0_j=0\) in particular) and consider the ratios

$$\begin{aligned} r^i(t,\theta )= & {} \frac{\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\vartheta ^i_jX_j(t)\bigg )}{\sum _{i'\in {{\mathbb {I}}}}\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\vartheta ^{i'}_jX_j(t)\bigg )} \>=\>\frac{\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\theta ^i_jX_j(t)\bigg )}{1+\sum _{i'\in {{\mathbb {I}}}_0}\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\theta ^{i'}_jX_j(t)\bigg )} \end{aligned}$$
(2.3)

for \(i\in {{\mathbb {I}}}\), where \(\theta =(\theta ^i_j)_{i\in {{\mathbb {I}}}_0,j\in {{\mathbb {J}}}}\) with \({{\mathbb {I}}}_0={{\mathbb {I}}}\setminus \{0\}=\{1,...,{\bar{i}}\}\).

In this paper, we further assume that the factor \(p^{k_i}_i(t,\varrho ^i)\) is given by

$$\begin{aligned} p^{k_i}_i(t,\varrho ^i)= & {} \frac{\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\varrho ^{i,k_i}_{j_i}Y^i_{j_i}(t)\bigg )}{\sum _{k_i'\in {{\mathbb {K}}}_i}\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i} \varrho ^{i,k_i'}_{j_i}Y^i_{j_i}(t)\bigg )} \end{aligned}$$

for \((i,k_i)\in {\check{{{\mathbb {I}}}}}\), \({{\mathbb {J}}}_i=\{1,...,{\bar{j}}_i\}\). Obviously, \(p^{k_i}_i(t,\varrho ^i)\) coincides with \(q^{k_i}_i(t,\rho ^i)\) defined by

$$\begin{aligned} q^{k_i}_i(t,\rho ^i)= & {} \frac{\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\rho ^{i,k_i}_{j_i}Y^i_{j_i}(t)\bigg )}{1+\sum _{k_i'\in {{\mathbb {K}}}_{i,0}}\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i} \rho ^{i,k_i'}_{j_i}Y^i_{j_i}(t)\bigg )} \end{aligned}$$
(2.4)

for \((i,k_i)\in {\check{{{\mathbb {I}}}}}\), where \(\rho ^{i,k_i}_{j_i}=\varrho ^{i,k_i}_{j_i}-\varrho ^{i,0}_{j_i}\) \((k_i\in {{\mathbb {K}}}_i,\>j_i\in {{\mathbb {J}}}_i,\>i\in {{\mathbb {I}}})\), with \(\rho ^{i,0}_{j_i}=0\) in particular, and \(\rho ^i=(\rho ^{i,k_i}_{j_i})_{k_i\in {{\mathbb {K}}}_{i,0},j_i\in {{\mathbb {J}}}_i}\) (\(i\in {{\mathbb {I}}}\)) with \({{\mathbb {K}}}_{i,0}={{\mathbb {K}}}_i\setminus \{0\}=\{1,...,{\bar{k}}_i\}\). The predictable processes \((Y^i_{j_i}(t))_{t\in {{\mathbb {R}}}_+}\) (\(i\in {{\mathbb {I}}},\>j_i\in {{\mathbb {J}}}_i\)) are observable covariate processes, \({{\mathbb {J}}}_i\) being a finite index set. This is a multinomial logistic regression model.
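Both ratio families (2.3) and (2.4) are softmax transforms of linear predictors in the covariates. As a minimal illustration (our own sketch, not part of the original paper; the function name and array shapes are conventions of ours), the following Python snippet evaluates such ratios in a numerically stable way:

```python
import numpy as np

def ratios(x, theta):
    """Evaluate softmax ratios such as r^i(t, theta) in Eq. (2.3).

    x     : array (j_bar,), covariate values X_j(t) at time t.
    theta : array (i_bar, j_bar), difference parameters theta^i_j
            (the reference class i = 0 has linear predictor 0).
    Returns an array (1 + i_bar,) of ratios summing to one; the same
    function evaluates q^{k_i}_i of Eq. (2.4) with (Y^i(t), rho^i).
    """
    z = np.concatenate(([0.0], theta @ x))  # linear predictors, class 0 first
    z -= z.max()                            # stabilize before exponentiating
    w = np.exp(z)
    return w / w.sum()
```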

Let \(\varTheta \) be a bounded open convex set in \({{\mathbb {R}}}^{\textsf {p}}\) with \({\textsf {p}}={\bar{i}}\>{\bar{j}}\). For each \(i\in {{\mathbb {I}}}\), \(\mathcal{R}_i\) denotes a bounded open convex set in \({{\mathbb {R}}}^{{\textsf {p}}_i}\) with \({\textsf {p}}_i={\bar{j}}_i\>{\bar{k}}_i\). Write \(\rho =(\rho ^i)_{i\in {{\mathbb {I}}}}\). Let \(\mathcal{R}=\varPi _{i\in {{\mathbb {I}}}}\mathcal{R}_i\). We will consider \({\overline{\varTheta }}\times {\overline{\mathcal{R}}}\) as the parameter space of \((\theta ,\rho )\).

Remark 1

The marked ratio model

$$\begin{aligned} \lambda ^{i,k_i}(t,\vartheta ^i,\varrho ^{i})= & {} \lambda _0(t)\exp \bigg (\sum _{j\in {{\mathbb {J}}}} \vartheta ^i_jX_j(t)\bigg )\> \frac{\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\varrho ^{i,k_i}_{j_i}Y^i_{j_i}(t)\bigg )}{\sum _{k_i'\in {{\mathbb {K}}}_i}\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\varrho ^{i,k_i'}_{j_i} Y^i_{j_i}(t)\bigg )} \end{aligned}$$

is in general not equivalent to a non-marked ratio model in larger dimension, in which we would write the intensity of the counting process of events of type \(i\in {\mathbb {I}}\) with mark \(k_i\in {\mathbb {K}}_i\) as

$$\begin{aligned} \lambda ^{i,k_i}(t,\vartheta ^{i,k_i})= & {} {\tilde{\lambda }}_0(t)\exp \bigg (\sum _{j\in {\tilde{{{\mathbb {J}}}}}} \vartheta ^{i,k_i}_jZ_j(t)\bigg ) \end{aligned}$$

for some covariate processes \(Z_j, j\in {\tilde{{{\mathbb {J}}}}}\). Equivalence of the models would require these expressions to coincide for some sets of covariates and parameters. However, if \(Z_j(t)=0\) for all \(j\in {\tilde{{{\mathbb {J}}}}}\), then necessarily \(X_j(t)=0\) for all \(j\in {{\mathbb {J}}}\) and \(Y^i_{j_i}(t)=0\) for all \(i\in {\mathbb {I}}\) and \(j_i\in {\mathbb {J}}_i\). This in turn implies \( \frac{1}{|{\mathbb {K}}_i|} = \frac{{\tilde{\lambda }}_0(t)}{\lambda _0(t)}\) for all \(i\in {\mathbb {I}}\), which is generally not true. In Sect. 4.5, a non-marked ratio model is used as a benchmark to assess the performances of the marked ratio model. Prediction performances are indeed shown to be different.

3 Quasi-likelihood estimation of two-step ratio model

3.1 Quasi-maximum likelihood estimator and quasi-Bayesian estimator

The two-step marked ratio model consists of the two kinds of ratio models (2.3) and (2.4). Estimation of this model can be carried out with multiple successive ratio models.

In the first step, we consider the parameter \(\theta =(\theta ^i_j)_{i\in {{\mathbb {I}}}_0,j\in {{\mathbb {J}}}}\) and the ratios (2.3) for \(i\in {{\mathbb {I}}}\). The quasi-log-likelihood based on observations on [0, T] for this ratio model is

$$\begin{aligned} {{\mathbb {H}}}_T(\theta )= & {} \sum _{i\in {{\mathbb {I}}}}\int _0^T\log r^i(t,\theta )\>\mathrm{d}N^i_t. \end{aligned}$$
(3.1)

This comes from the multinomial logistic regression. A quasi-maximum likelihood estimator (QMLE) for \(\theta \) is a measurable mapping \({\hat{\theta }}_T^M:\varOmega \rightarrow {\overline{\varTheta }}\) satisfying

$$\begin{aligned} {{\mathbb {H}}}_T({\hat{\theta }}_T^M)= & {} \max _{\theta \in {\overline{\varTheta }}}{{\mathbb {H}}}_T(\theta ) \end{aligned}$$

for all \(\omega \in \varOmega \).

In the second step, we consider the ratios (2.4) and the associated quasi-log-likelihood

$$\begin{aligned} {{\mathbb {H}}}_T^{(i)}(\rho ^i)= & {} \sum _{k_i\in {{\mathbb {K}}}_i} \int _0^T\log q^{k_i}_i(t,\rho ^i)\>\mathrm{d}N^{i,k_i}_t \end{aligned}$$
(3.2)

for \(i\in {{\mathbb {I}}}\). Then, a measurable mapping \({\hat{\rho }}^{i,M}_T:\varOmega \rightarrow {\overline{\mathcal{R}}}_i\) is called a quasi-maximum likelihood estimator (QMLE) for \(\rho ^i\) if

$$\begin{aligned} {{\mathbb {H}}}_T^{(i)}({\hat{\rho }}^{i,M}_T)= & {} \max _{\rho ^i\in {\overline{\mathcal{R}}}_i}{{\mathbb {H}}}_T^{(i)}(\rho ^i). \end{aligned}$$

It is possible to pool these estimating functions into the single estimating function

$$\begin{aligned} {{\mathbb {H}}}_T(\theta ,\rho )= & {} {{\mathbb {H}}}_T(\theta ) + \sum _{i\in {{\mathbb {I}}}}{{\mathbb {H}}}_T^{(i)}(\rho ^i). \end{aligned}$$
(3.3)

In other words,

$$\begin{aligned} {{\mathbb {H}}}_T(\theta ,\rho )= & {} \sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i} \int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\>\mathrm{d}N^{i,k_i}_t. \end{aligned}$$
(3.4)

The collection of QMLEs \(\big ({\hat{\theta }}_T^M,({\hat{\rho }}_T^{i,M})_{i\in {{\mathbb {I}}}}\big )\) is a QMLE for \({{\mathbb {H}}}_T(\theta ,\rho )\). The use of \({{\mathbb {H}}}_T(\theta ,\rho )\) is convenient when we consider the asymptotic distribution of the estimators \({\hat{\theta }}_T^M\) and \({\hat{\rho }}_T^{i,M}\) (\(i\in {{\mathbb {I}}}\)) jointly.
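In practice, each step is a multinomial logistic quasi-likelihood, so the maximization reduces to a smooth concave optimization over the covariate values observed at event times. A minimal sketch (our own illustration; function and argument names are assumptions of ours):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def neg_qll(theta_flat, types, X, i_bar, j_bar):
    """Negative quasi-log-likelihood (3.1): -sum_n log r^{i_n}(t_n, theta).

    types : integer array (n,), type i in {0,...,i_bar} of each event.
    X     : array (n, j_bar), covariates observed just before each event.
    """
    theta = theta_flat.reshape(i_bar, j_bar)
    z = np.hstack([np.zeros((len(types), 1)), X @ theta.T])  # class 0 reference
    log_r = z - logsumexp(z, axis=1, keepdims=True)
    return -log_r[np.arange(len(types)), types].sum()

def qmle(types, X, i_bar, j_bar):
    """First-step QMLE of theta; the second-step estimators of each rho^i
    are obtained with the same routine applied to (marks, Y^i)."""
    res = minimize(neg_qll, np.zeros(i_bar * j_bar),
                   args=(types, X, i_bar, j_bar), method="BFGS")
    return res.x.reshape(i_bar, j_bar)
```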

The quasi-Bayesian estimator (QBE) \(\big ({\hat{\theta }}_T^B,({\hat{\rho }}_T^{i,B})_{i\in {{\mathbb {I}}}}\big )\) is defined by

$$\begin{aligned} {\hat{\theta }}_T^B= & {} \bigg [\int _{\varTheta \times \mathcal{R}}\exp \big ({{\mathbb {H}}}_T(\theta ,\rho )\big )\> \varpi (\theta ,\rho )\>\mathrm{d}\theta \mathrm{d}\rho \bigg ]^{-1} \nonumber \\&\times \int _{\varTheta \times \mathcal{R}}\theta \exp \big ({{\mathbb {H}}}_T(\theta ,\rho )\big )\> \varpi (\theta ,\rho )\>\mathrm{d}\theta \mathrm{d}\rho \end{aligned}$$
(3.5)

and

$$\begin{aligned} {\hat{\rho }}^{i,B}_T= & {} \bigg [\int _{\varTheta \times \mathcal{R}}\exp \big ({{\mathbb {H}}}_T(\theta ,\rho )\big )\> \varpi (\theta ,\rho )\>\mathrm{d}\theta \mathrm{d}\rho \bigg ]^{-1} \nonumber \\&\times \int _{\varTheta \times \mathcal{R}}\rho ^i\exp \big ({{\mathbb {H}}}_T(\theta ,\rho )\big )\> \varpi (\theta ,\rho )\>\mathrm{d}\theta \mathrm{d}\rho \end{aligned}$$
(3.6)

for a prior probability density \(\varpi (\theta ,\rho )\) on \(\varTheta \times \mathcal{R}\). We assume that \(\varpi :\varTheta \times \mathcal{R}\rightarrow {{\mathbb {R}}}_+\) is continuous and

$$\begin{aligned} 0<\inf _{(\theta ,\rho )\in \varTheta \times \mathcal{R}}\varpi (\theta ,\rho ) \le \sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}}\varpi (\theta ,\rho )<\infty . \end{aligned}$$
(3.7)

Since \({{\mathbb {H}}}_T(\theta )\) and \({{\mathbb {H}}}_T^{(i)}(\rho ^i)\) have no common parameters, the maximization of \({{\mathbb {H}}}_T(\theta ,\rho )\) with respect to the parameters \(\theta \) and \(\rho ^i\) \((i\in {{\mathbb {I}}})\) can be carried out separately. However, these components cannot always be treated individually for the QBE. If \(\varpi (\theta ,\rho )\) is a product of prior densities, \(\varpi (\theta ,\rho )=\varpi '(\theta )\varPi _{i\in {{\mathbb {I}}}}\varpi ^i(\rho ^i)\), then each integral in (3.5) and (3.6) factorizes and we can compute \({\hat{\theta }}_T^B\) and \({\hat{\rho }}_T^{i,B}\) (\(i\in {{\mathbb {I}}}\)) separately:

$$\begin{aligned} {\hat{\theta }}_T^B= & {} \bigg [\int _{\varTheta }\exp \big ({{\mathbb {H}}}_T(\theta )\big )\>\varpi '(\theta )\>d \theta \bigg ]^{-1} \int _{\varTheta }\theta \exp \big ({{\mathbb {H}}}_T(\theta )\big )\>\varpi '(\theta )\>d \theta \end{aligned}$$

and

$$\begin{aligned} {\hat{\rho }}^{i,B}_T= & {} \bigg [\int _{\mathcal{R}_i}\exp \big ({{\mathbb {H}}}_T^{(i)}(\rho ^i)\big )\>\varpi ^i(\rho ^i)\>d \rho ^i\bigg ]^{-1} \int _{\mathcal{R}_i}\rho ^i\exp \big ({{\mathbb {H}}}_T^{(i)}(\rho ^i)\big )\>\varpi ^i(\rho ^i)\>d \rho ^i \end{aligned}$$

for \(i\in {{\mathbb {I}}}\).
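Under a product prior, each QBE is thus a posterior mean with respect to the density proportional to \(\exp ({{\mathbb {H}}}_T)\varpi \), which can be approximated by standard MCMC. A minimal random-walk Metropolis sketch, assuming (for illustration only) a uniform prior on a box; names and default values are ours:

```python
import numpy as np

def quasi_bayes(log_h, dim, bound=5.0, n_iter=20_000, step=0.05, seed=0):
    """Posterior-mean QBE for a uniform prior on [-bound, bound]^dim,
    targeting the density proportional to exp(H_T).  `log_h` maps a
    parameter vector to the quasi-log-likelihood, e.g. Eq. (3.1)."""
    rng = np.random.default_rng(seed)
    x, lx = np.zeros(dim), log_h(np.zeros(dim))
    draws = []
    for _ in range(n_iter):
        y = x + step * rng.standard_normal(dim)
        if np.all(np.abs(y) <= bound):           # stay in the prior support
            ly = log_h(y)
            if np.log(rng.uniform()) < ly - lx:  # Metropolis acceptance
                x, lx = y, ly
        draws.append(x)
    return np.mean(draws[n_iter // 2:], axis=0)  # posterior mean after burn-in
```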

3.2 Quasi-likelihood analysis

Let \({{\mathbb {X}}}(t)=(X_j(t))_{j\in {{\mathbb {J}}}}\) and let \({{\mathbb {Y}}}^i(t)=\big (Y^i_{j_i}(t)\big )_{j_i\in {{\mathbb {J}}}_i}\) for \(i\in {{\mathbb {I}}}\). We consider the following conditions.

[M1] The process \(\big (\lambda _0(t), {{\mathbb {X}}}(t),({{\mathbb {Y}}}^i(t))_{i\in {{\mathbb {I}}}}\big )\) is a stationary process, and the random variables \(\lambda _0(0)\), \(\exp (|X_j(0)|)\) and \(\exp (|Y^i_{j_i}(0)|)\) belong to \(L^{\infty \text {--}}=\cap _{p>1}L^p\) for \(j\in {{\mathbb {J}}}\), \(j_i\in {{\mathbb {J}}}_i\) and \(i\in {{\mathbb {I}}}\).

Condition [M1] is not restrictive since the covariates can often be regarded as bounded in applications.

The alpha mixing coefficient \(\alpha (h)\) is defined by

$$\begin{aligned} \alpha (h)= & {} \sup _{t\in {{\mathbb {R}}}_+}\sup _{\genfrac{}{}{0.0pt}{}{A\in \mathcal{B}_{[0,t]}}{B\in \mathcal{B}_{[t+h,\infty )}}} \big |P[A\cap B]-P[A]P[B]\big |, \end{aligned}$$

where for \(I\subset {{\mathbb {R}}}_+\), \(\mathcal{B}_I\) denotes the \(\sigma \)-field generated by \(\big (\lambda _0(t),(X_j(t))_{j\in {{\mathbb {J}}}}, (Y^{i}_{j_i}(t))_{i\in {{\mathbb {I}}}, j_i\in {{\mathbb {J}}}_i}\big )_{t\in I}\).

[M2] The alpha mixing coefficient \(\alpha (h)\) is rapidly decreasing in the sense that \(\alpha (h)h^L\rightarrow 0\) as \(h\rightarrow \infty \) for every \(L>0\).

In the two-step ratio model, the category \((i,k_i)\) is selected by two successive multinomial draws, each of sample size one. First, the class \(i\in {{\mathbb {I}}}\) is selected when \(\xi _i=1\) for some random variable

$$\begin{aligned} \xi =(\xi _0,...,\xi _{{\bar{i}}})\sim \text {Multinomial}(1;\pi _0,..., \pi _{{\bar{i}}}). \end{aligned}$$

If \(\xi _i=1\) for a class \(i\in {{\mathbb {I}}}\), then the mark \(k_i\in {{\mathbb {K}}}_i\) is chosen as \(k_i=k\) when \(\eta ^i_k=1\) for some independent random variable

$$\begin{aligned} \eta ^i=(\eta ^i_0,...,\eta ^i_{{\bar{k}}_i})\sim \text {Multinomial} (1;\pi _0',...,\pi '_{{\bar{k}}_i}). \end{aligned}$$

Denote by \(\mathsf{V}(x,\theta )\) the variance matrix of the \((1+{\overline{i}})\)-dimensional multinomial distribution \(\text {Multinomial}(1;\pi _0,\pi _1,...,\pi _{{\overline{i}}})\) with \(\pi _i={\dot{r}}^i(x,\theta )\), \(i\in {\mathbb {I}}\), where

$$\begin{aligned} {\dot{r}}^i(x,\theta )= & {} \frac{\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\vartheta ^i_jx_j\bigg )}{\sum _{i'\in {{\mathbb {I}}}}\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\vartheta ^{i'}_jx_j\bigg )} \>=\>\frac{\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\theta ^i_jx_j\bigg )}{1+\sum _{i'\in {{\mathbb {I}}}_0}\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\theta ^{i'}_jx_j\bigg )}, \quad x=(x_j)_{j\in {{\mathbb {J}}}}. \end{aligned}$$

Denote by \(\mathsf{V}^i(y^i,\rho ^i)\) the variance matrix of the \((1+{\overline{k}}_i)\)-dimensional multinomial distribution \(\text {Multinomial}(1;\pi _0',\pi _1',...,\pi _{{\overline{k}}_i}')\) with \(\pi _{k_i}'={\dot{q}}_i^{k_i}(y^i,\rho ^i)\), \(k_i\in {{\mathbb {K}}}_i\), where

$$\begin{aligned} {\dot{q}}_i^{k_i}(y^i,\rho ^i)&= \frac{\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\varrho ^{i,k_i}_{j_i}y^i_{j_i}\bigg )}{\sum _{k_i'\in {{\mathbb {K}}}_i}\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\varrho ^{i,k_i'}_{j_i} y^i_{j_i}\bigg )} \\&= \frac{\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\rho ^{i,k_i}_{j_i}y^i_{j_i}\bigg )}{1+\sum _{k_i'\in {{\mathbb {K}}}_{i,0}}\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i} \rho ^{i,k_i'}_{j_i}y^i_{j_i}\bigg )}, \quad y^i=(y^i_{j_i})\in {{\mathbb {R}}}^{{\bar{j}}_i}\quad (i\in {{\mathbb {I}}}). \end{aligned}$$

Let us introduce some notations used in the following analysis. For a tensor \(\mathsf{T}=(\mathsf{T}_{i_1,...,i_k})_{i_1,...,i_k}\), we write

$$\begin{aligned} \mathsf{T}[u_1,...,u_k] = \mathsf{T}[u_1\otimes \cdots \otimes u_k] = \sum _{i_1,...,i_k}\mathsf{T}_{i_1,...,i_k} u_1^{i_1}\cdots u_k^{i_k} \end{aligned}$$
(3.8)

for \(u_1=(u_1^{i_1})_{i_1}\),..., \(u_k=(u_k^{i_k})_{i_k}\). Brackets \([\ ,..., \ ]\) stand for a multilinear mapping. We denote by \(u^{\otimes r}=u\otimes \cdots \otimes u\) the r times tensor product of u.

Denote by \(\partial _{(\theta ,\rho )}\) the differential operator with respect to \((\theta ,\rho )\). Let

$$\begin{aligned} \varGamma _T(\theta ,\rho )= & {} -T^{-1}\partial _{(\theta ,\rho )}^2{{\mathbb {H}}}_T(\theta ,\rho ) \end{aligned}$$

and let \(\varGamma _T=\varGamma _T(\theta ^*,\rho ^*)\). Then, as detailed in Section A.2,

$$\begin{aligned} \varGamma _T(\theta ,\rho )= & {} \text {diag}\big [ \varGamma _T(\theta ),\varGamma _T^0(\rho ^0),\varGamma _T^1(\rho ^1),...,\varGamma _T^{{\bar{i}}} (\rho ^{{\bar{i}}}) \big ], \end{aligned}$$

where

$$\begin{aligned} \varGamma _T(\theta ) [u^{\otimes 2}] = \frac{1}{T}\int _0^T \bigg (\mathsf{V}_0({\mathbb {X}}(t),\theta )\otimes {\mathbb {X}}(t)^{\otimes 2}\bigg ) [u^{\otimes 2}]\sum _{i\in {\mathbb {I}}}\mathrm{d}N^i_t \quad (u\in {{\mathbb {R}}}^{{\textsf {p}}}) \end{aligned}$$
(3.9)

with \(\mathsf{V}_0(x,\theta )=(\mathsf{V}(x,\theta )_{i,i'})_{i,i'\in {\mathbb {I}}_0}\), and

$$\begin{aligned} \varGamma ^i_T(\rho ^i) [(u^i)^{\otimes 2}] = \frac{1}{T}\int _0^T \bigg (\mathsf{V}^i_0({{\mathbb {Y}}}^i(t),\rho ^i)\otimes {{\mathbb {Y}}}^i(t)^{\otimes 2}\bigg ) [(u^i)^{\otimes 2}]\mathrm{d}N^i_t \quad (u^i\in {{\mathbb {R}}}^{{\textsf {p}}_i}) \end{aligned}$$

with \(\mathsf{V}^i_0(y^i,\rho ^i)=(\mathsf{V}^i(y^i,\rho ^i)_{k_i,k_i'})_{k_i,k_i'\in {{\mathbb {K}}}_{i,0}}\).

Let

$$\begin{aligned} \varLambda (w,x) = w\sum _{i\in {\mathbb {I}}}\exp \big (x\big [\vartheta ^{*i}\big ]\big ) \end{aligned}$$
(3.10)

for \(w\in {\mathbb {R}}_+\) and \(x\in {\mathbb {R}}^{{\overline{j}}}\).

We have

$$\begin{aligned} \mathsf{V}(x,\theta )_{i,i'}= & {} 1_{\{i=i'\}}{\dot{r}}^i(x,\theta )-{\dot{r}}^i(x,\theta ) {\dot{r}}^{i'}(x,\theta ). \end{aligned}$$

Therefore,

$$\begin{aligned} \mathsf{V}({{\mathbb {X}}}(t),\theta )_{i,i'}= & {} 1_{\{i=i'\}}r^i(t,\theta )-r^i(t,\theta )r^{i'}(t,\theta ) \end{aligned}$$
(3.11)

and \(\mathsf{V}_0({{\mathbb {X}}}(t),\theta )_{i,i'}=\mathsf{V}({{\mathbb {X}}}(t),\theta )_{i,i'}\) for \(i,i'\in {{\mathbb {I}}}_0\). Write \(\mathsf{V}_0(x)=\mathsf{V}_0(x,\theta ^*)\).

We have

$$\begin{aligned} \mathsf{V}^i(y^i,\rho ^i)_{k_i,k_i'}= & {} 1_{\{k_i=k_i'\}}{\dot{q}}_i^{k_i}(y^i,\rho ^i)-{\dot{q}}_i^{k_i} (y^i,\rho ^i){\dot{q}}_i^{k_i'}(y^i,\rho ^i). \end{aligned}$$

Hence,

$$\begin{aligned} \mathsf{V}^i({{\mathbb {Y}}}^i(t),\rho ^i)_{k_i,k_i'}= & {} 1_{\{k_i=k_i'\}}q^{k_i}_i(t,\rho ^i)-q^{k_i}_i(t,\rho ^i)q^{k_i'}_i(t,\rho ^i) \end{aligned}$$
(3.12)

and \(\mathsf{V}_0^i({{\mathbb {Y}}}^i(t),\rho ^i)_{k_i,k_i'}=\mathsf{V}^i({{\mathbb {Y}}}^i(t),\rho ^i)_{k_i,k_i'}\) for \(k_i,k_i'\in {{\mathbb {K}}}_{i,0}\). We denote \(\mathsf{V}^i_0(y^i)=\mathsf{V}^i_0(y^i,(\rho ^i)^*)\).
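Both variance matrices share the standard multinomial form \(\mathrm{diag}(\pi )-\pi \pi ^\top \); a one-line sketch (our own illustration):

```python
import numpy as np

def multinomial_variance(pi):
    """Variance matrix of Multinomial(1; pi), as in Eqs. (3.11)-(3.12):
    diag(pi) - pi pi^T, evaluated at the ratios pi = (r^i) or (q^{k_i}_i)."""
    pi = np.asarray(pi)
    return np.diag(pi) - np.outer(pi, pi)
```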

The symmetric matrices \(\varGamma (\theta )\) and \(\varGamma ^i(\rho ^i)\) are defined by

$$\begin{aligned} \varGamma (\theta )[u^{\otimes 2}]= & {} E\bigg [\bigg (\mathsf{V}_0({\mathbb {X}}(0),\theta )\otimes {\mathbb {X}}(0)^{\otimes 2}\bigg )[u^{\otimes 2}] \varLambda (\lambda _0(0),{\mathbb {X}}(0))\bigg ] \end{aligned}$$

for \(u\in {{\mathbb {R}}}^{\textsf {p}}\), and

$$\begin{aligned} \varGamma ^i(\rho ^i)[(u^i)^{\otimes 2}]= & {} E\bigg [\bigg (\mathsf{V}^i_0({{\mathbb {Y}}}^i(0),\rho ^i)\otimes {{\mathbb {Y}}}^i(0)^{\otimes 2}\bigg )[(u^i)^{\otimes 2}] \varLambda (\lambda _0(0),{{\mathbb {X}}}(0))r^i(0,\theta ^*)\bigg ] \end{aligned}$$

for \(u^i\in {{\mathbb {R}}}^{{\textsf {p}}_i}\), \(i\in {{\mathbb {I}}}\), respectively. Let \({\check{{\textsf {p}}}}={\textsf {p}}+\sum _{i\in {{\mathbb {I}}}}{\textsf {p}}_i={\bar{i}}\>{\bar{j}} +\sum _{i\in {{\mathbb {I}}}}{\bar{k}}_i{\bar{j}}_i\). The full information matrix is the \({\check{{\textsf {p}}}}\times {\check{{\textsf {p}}}}\) block diagonal matrix

$$\begin{aligned} \varGamma (\theta ,\rho )= & {} \text {diag}\big [\varGamma (\theta ),\varGamma ^0(\rho ^0), \varGamma ^1(\rho ^1),...,\varGamma ^{{\bar{i}}}(\rho ^{{\bar{i}}})\big ], \end{aligned}$$

and in particular set

$$\begin{aligned} \varGamma = \varGamma (\theta ^*,\rho ^*). \end{aligned}$$
(3.13)

An identifiability condition will be imposed.

[M3] \(\displaystyle \inf _{\theta \in \varTheta }\inf _{u\in {{\mathbb {R}}}^{\textsf {p}}:\>|u|=1}\varGamma (\theta ) [u^{\otimes 2}]>0\) and \(\displaystyle \inf _{\rho ^i\in \mathcal{R}_i}\inf _{u^i\in {{\mathbb {R}}}^{{\textsf {p}}_i}:\>|u^i|=1}\varGamma ^i(\rho ^i) [(u^i)^{\otimes 2}]>0\) for every \(i\in {{\mathbb {I}}}\).

For the QMLE \({\hat{\psi }}_T^M=({\hat{\theta }}^M_T,{\hat{\rho }}^M_T)\) and the QBE \({\hat{\psi }}_T^B=({\hat{\theta }}^B_T,{\hat{\rho }}^B_T)\) of \(\psi =(\theta ,\rho )=(\theta ,\rho ^0,\rho ^1,...,\rho ^{{\bar{i}}})\), let

$$\begin{aligned} {\hat{u}}_T^\mathsf{A}= & {} T^{1/2}\big ({\hat{\psi }}_T^\mathsf{A}-\psi ^*) \qquad (\mathsf{A}\in \{M,B\}). \end{aligned}$$

Theorem 1

Suppose that Conditions [M1], [M2] and [M3] are satisfied. Then,

$$\begin{aligned} E[f({\hat{u}}_T^\mathsf{A})]\rightarrow & {} {{\mathbb {E}}}[f(\varGamma ^{-1/2}\zeta )] \end{aligned}$$

as \(T\rightarrow \infty \) for \(\mathsf{A}\in \{M,B\}\) and every \(f\in C({{\mathbb {R}}}^{{\check{{\textsf {p}}}}})\) of at most polynomial growth, where \(\zeta \) is a \({\check{{\textsf {p}}}}\)-dimensional standard Gaussian random vector.

Example 1

As an illustration, we consider the case with two processes (\({\mathbb {I}}=\{0,1\}\)) and two marks for each process (\({\mathbb {K}}_0={\mathbb {K}}_1=\{0,1\}\)). The first state-dependent term takes into account one covariate \(X_1\) (i.e., \({\mathbb {J}}=\{1\}\)). The mark distributions both depend on another covariate \(Y_1\) (i.e., \({\mathbb {J}}_0={\mathbb {J}}_1=\{1\}\)). In this example, we assume that \(X_1\) and \(Y_1\) are independent Markov chains with values in \(\{-1,1\}\) and constant transition intensities \(\lambda _X\) and \(\lambda _Y\). We assume that \(\lambda _0\) is the intensity of a Hawkes process \((H_t)_{t\ge 0}\) with a single exponential kernel, i.e., \(\lambda _0(t)=\mu +\int _{0}^t \alpha e^{-\beta (t-s)}\,dH_s\), with \((\alpha ,\beta )\in ({\mathbb {R}}_+^*)^2\) and \(\frac{\alpha }{\beta }<1\).

The two-step ratio model estimates the parameters \((\theta ^1_1,\rho ^{0,1}_1,\rho ^{1,1}_1)\) defined as \(\theta ^1_1=\vartheta ^1_1-\vartheta ^0_1\) and \(\rho ^{i,1}_1=\varrho ^{i,1}_1-\varrho ^{i,0}_1\), \(i=0,1\). In this specific case, the matrix \(\varGamma \) of Eq. (3.13) is a \(3\times 3\) diagonal matrix, and a direct computation shows that the diagonal coefficients are

$$\begin{aligned} \varGamma _{0,0}&= \frac{\mu }{1-\frac{\alpha }{\beta }} \frac{e^{\theta ^1_1}}{1+e^{\theta ^1_1}}\left( \cosh \vartheta ^0_1 + \cosh \vartheta ^1_1 \right) , \\ \varGamma _{1,1}&= \frac{\mu }{1-\frac{\alpha }{\beta }} \frac{e^{\rho ^{0,1}_1}}{1+e^{\rho ^{0,1}_1}} \frac{e^{\theta ^1_1/2}}{1+e^{\theta ^1_1}} \left( \cosh \frac{\vartheta ^0_1+\vartheta ^1_1}{2} +\cosh \frac{3\vartheta ^1_1-\vartheta ^0_1}{2}\right) , \\ \varGamma _{2,2}&= \frac{\mu }{1-\frac{\alpha }{\beta }} \frac{e^{\rho ^{1,1}_1}}{1+e^{\rho ^{1,1}_1}} \frac{e^{\theta ^1_1/2}}{1+e^{\theta ^1_1}} \left( \cosh \frac{\vartheta ^0_1+\vartheta ^1_1}{2} +\cosh \frac{3\vartheta ^1_1-\vartheta ^0_1}{2}\right) . \end{aligned}$$
Table 1 Numerical results for the estimation of the model of Example 1

We run 1000 simulations of the processes \((N^0, N^1)\) with their marks for various values of horizon T. Numerical values used in these simulations are the following: \(\mu =0.5\), \(\alpha =1.0\), \(\beta = 2.0\), \(\lambda _X=\lambda _Y=0.5\), \(\vartheta ^0_1=-0.75\), \(\vartheta ^1_1=0.75\), \(\varrho ^{0,0}_1=-0.5\), \(\varrho ^{0,1}_1=0.5\), \(\varrho ^{1,0}_1=-1.0\), \(\varrho ^{1,1}_1=1.0\). For each simulation, we compute the quasi-maximum likelihood estimators \(({{\hat{\theta }}}^1_1,{{\hat{\rho }}}^{0,1}_1,{{\hat{\rho }}}^{1,1}_1)\) with the two-step ratios described above. Table 1 gives the mean estimators and the true values of the parameters, as well as the empirical standard deviation, compared to the theoretical values \(T^{-\frac{1}{2}}\varGamma _{i,i}^{-\frac{1}{2}}\), \(i=0,1,2\) from Theorem 1, for various values of T.For completeness, Figure 1 also plots the empirical standard deviations of the three estimators and the theoretical standard deviation \(T^{-\frac{1}{2}}\varGamma _{i,i}^{-\frac{1}{2}}\), \(i=0,1,2\) of Theorem 1, as a function of the horizon T.
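Such a simulation can be implemented by Ogata's thinning algorithm. A minimal sketch follows (our own illustration, not the authors' code; we assume, as one possible reading of Example 1, that the baseline Hawkes process \(H\) is the pooled event flow \(N^0+N^1\)):

```python
import numpy as np

def simulate_example1(T, mu, a, b, lam_X, lam_Y, vartheta, varrho, seed=0):
    """Ogata thinning for the two-step model of Example 1.
    vartheta = (vartheta^0_1, vartheta^1_1);
    varrho[i] = (varrho^{i,0}_1, varrho^{i,1}_1) for i = 0, 1.
    Returns a list of events (t, i, k)."""
    rng = np.random.default_rng(seed)
    t, g, events = 0.0, 0.0, []      # g: self-exciting part of lambda_0
    X, Y = 1, 1                      # Markov-chain covariates on {-1, +1}
    tX = rng.exponential(1 / lam_X)  # next switching times of X and Y
    tY = rng.exponential(1 / lam_Y)
    while True:
        # Dominating rate: g only decays between events and |X| = 1.
        M = (mu + g) * (np.exp(abs(vartheta[0])) + np.exp(abs(vartheta[1])))
        dt = rng.exponential(1 / M)
        t += dt
        if t >= T:
            return events
        while tX < t:                # update the covariates up to time t
            X, tX = -X, tX + rng.exponential(1 / lam_X)
        while tY < t:
            Y, tY = -Y, tY + rng.exponential(1 / lam_Y)
        g *= np.exp(-b * dt)         # kernel decay since the last candidate
        lam = (mu + g) * (np.exp(vartheta[0] * X) + np.exp(vartheta[1] * X))
        if rng.uniform() < lam / M:  # accept the candidate event
            w = np.exp(np.array(vartheta) * X)
            i = rng.choice(2, p=w / w.sum())    # side, ratio r^i(t)
            q = np.exp(np.array(varrho[i]) * Y)
            k = rng.choice(2, p=q / q.sum())    # mark, ratio p^k_i(t)
            events.append((t, i, k))
            g += a                   # excitation jump of lambda_0
```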

Fig. 1

Empirical and theoretical standard deviation of the quasi-maximum likelihood estimators \({{\hat{\theta }}}^1_1\) (left), \({{\hat{\rho }}}^{0,1}_1\) (center) and \({{\hat{\rho }}}^{1,1}_1\) (right)

Asymptotic values predicted by Theorem 1 are indeed empirically retrieved, which concludes this numerical illustration.

4 Modeling and predicting sign and aggressiveness of market orders

4.1 Intensities of the processes counting market orders

We consider the market orders submitted to a given limit order book. Let \(N^0\) be the process counting the market orders submitted on the bid side (sell market orders) and \(N^1\) the process counting the market orders submitted on the ask side (buy market orders). On each side, we further consider whether the order is an aggressive order that moves the price (labeled with mark 1), or a non-aggressive order that does not move the price (labeled with mark 0).

We assume that the intensity of an order of type \(i\in {\mathbb {I}}=\{0,1\}\) with mark \(k_i\in {\mathbb {K}}={\mathbb {K}}_0={\mathbb {K}}_1=\{0,1\}\) is

$$\begin{aligned} \lambda ^{i, k_i}(t, \vartheta ^i, \varrho ^{i}) = \lambda _0(t) \exp \left( \! \sum _{j\in {\mathbb {J}}} \vartheta ^i_j X_j(t) \!\right) \frac{\exp \left( \sum _{{j_i}\in {\mathbb {J}}_i} \varrho ^{i,k_i}_{j_i} Y^i_{j_i}(t) \right) }{\sum _{k'_i\in {\mathbb {K}}_i} \exp \left( \!\sum _{{j_i}\in {\mathbb {J}}_i} \varrho ^{i,k'_i}_{j_i} Y^i_{j_i}(t)\! \right) }. \end{aligned}$$
(4.1)

In the following applications, we will consider several possible models, defined with various sets of covariates \(X_j\), \(j\in {\mathbb {J}}\), and \(Y^i_j\), \(j\in {\mathbb {J}}_i\), \(i=0,1\). These sets will all be subsets of the following list of possible covariates (besides \(Z_0=1\), common to all models); a sketch of the computation of the Hawkes covariates follows the list:

  • \({Z_1} \): \(\frac{q^B(t)-q^A(t)}{q^B(t){+}q^A(t)}\), where \(q^B(t)\) (resp. \(q^A(t)\)) is the quantity available at the best bid (resp. ask) at time t (i.e., the imbalance);

  • \({Z_2}\): \(\epsilon (t)\), where \(\epsilon (t)\) is the sign of the last market order at time t (1 for an ask market order, \(-1\) for a bid market order);

  • \({Z_3}\): \({s(t)}\epsilon (t)\), the signed spread, where s(t) is the observed spread in currency at time t;

  • \(Z_4\): \(H^{0,1}(t) = \log \left( \mu ^{0,1} + \int _0^t \alpha ^{0,1} e^{-\beta ^{0,1}(t-s)} \mathrm{d}N^{0,1}_s \right) \) (Hawkes covariate for aggressive bid market orders)

  • \(Z_5\): \(H^{0,0}(t) = \log \left( \mu ^{0,0} + \int _0^t \alpha ^{0,0} e^{-\beta ^{0,0}(t-s)} \mathrm{d}N^{0,0}_s \right) \) (Hawkes covariate for non-aggressive bid market orders)

  • \(Z_6\): \(H^{1,1}(t) = \log \left( \mu ^{1,1} + \int _0^t \alpha ^{1,1} e^{-\beta ^{1,1}(t-s)} \mathrm{d}N^{1,1}_s \right) \) (Hawkes covariate for aggressive ask market orders)

  • \(Z_7\): \(H^{1,0}(t) = \log \left( \mu ^{1,0} + \int _0^t \alpha ^{1,0} e^{-\beta ^{1,0}(t-s)} \mathrm{d}N^{1,0}_s \right) \) (Hawkes covariate for non-aggressive ask market orders)

  • \(Z_8\): \(H^{0}(t) = \log \left( \mu ^{0} + \int _0^t \alpha ^{0} e^{-\beta ^{0}(t-s)} \mathrm{d}N^{0}_s \right) \) (Hawkes covariate for bid market orders)

  • \(Z_9\): \(H^{1}(t) = \log \left( \mu ^{1} + \int _0^t \alpha ^{1} e^{-\beta ^{1}(t-s)} \mathrm{d}N^{1}_s \right) \) (Hawkes covariate for ask market orders).
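These Hawkes covariates all share the same exponential-kernel structure, which admits a standard \(O(n)\) recursive computation. As a minimal sketch (our own illustration with hypothetical names; the parameters \(\mu ,\alpha ,\beta \) are assumed to have been fitted beforehand, as in Sect. 4.4):

```python
import numpy as np

def hawkes_covariate(event_times, query_times, mu, alpha, beta):
    """Compute Z(t) = log(mu + sum_{s < t} alpha * exp(-beta * (t - s)))
    at sorted query times, where the sum runs over past event times s.
    The exponential kernel allows an O(n) recursive update."""
    g, last, k, out = 0.0, 0.0, 0, []   # g: kernel sum at time `last`
    for t in query_times:
        # Fold in all events occurring strictly before the query time t.
        while k < len(event_times) and event_times[k] < t:
            g = g * np.exp(-beta * (event_times[k] - last)) + alpha
            last = event_times[k]
            k += 1
        out.append(np.log(mu + g * np.exp(-beta * (t - last))))
    return np.array(out)
```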

With these Hawkes covariates, the ratio model can actually be seen as a kind of non-linear Hawkes process. When the theory is applied, ergodicity is an assumption. In the present model, it depends on the nature of the process \(\lambda _0\), which was left unspecified. Brémaud and Massoulié (1996) treated the stability problem for nonlinear Hawkes processes. If the system has a Markovian representation, a drift condition in the spirit of Abergel and Jedidi (2015) and Clinet and Yoshida (2017) may be applicable. On the other hand, intraday stationarity (ergodicity) is not essential. As described in Section 3.2 of Muni Toke and Yoshida (2020), quite in parallel with the simple stationary case, we can relax the assumption of intraday stationarity by considering a repeated measurements model. Then, we only need a more realistic ergodicity of the data across the long-run repeated measurements, and the methods can be validated.

4.2 Limit order book data

We use tick-by-tick data for 36 stocks traded on Euronext Paris. The sample spans the whole year 2015, i.e., roughly 200 trading days for each stock, although some days are missing for some stocks. Table 3 in Sect. 7 lists the stocks investigated and the number of trading days available. Raw data consist of a TRTH (Thomson Reuters Tick History) database: for each trading day and each stock, one file lists the transactions (quantities and prices) and one file lists the modifications of the limit order book (level, price and quantities). Timestamps are given with millisecond precision. Synchronization of both files and reconstruction of the limit order book are carried out with the procedure described in Muni Toke (2016). One strong advantage of the ratio model is that it does not require precise timestamps in itself, since timestamps do not appear explicitly in the quasi-likelihood of the ratios, while fitting other intensity-based models (e.g., Hawkes processes) requires unique precise timestamps for the log-likelihood computation. Here, if Hawkes fits are used as covariates (covariates \(Z_4\) to \(Z_9\) in our application), then we choose to consider only unique timestamps, i.e., we aggregate orders of the same type occurring at the same timestamp.

4.3 Estimation procedure of the two-step ratio model

Following Sects. 2 and 3, estimation of the model defined at Eq. (4.1) can be carried out with multiple successive ratio models. In the first step, we consider the difference parameters \(\theta ^i_j = \vartheta ^i_j - \vartheta ^0_j, i\in {\mathbb {I}}\setminus \{0\}, j\in {\mathbb {J}}\) and the ratios \((i\in {\mathbb {I}}\setminus \{0\})\):

$$\begin{aligned} r^i(t,\theta )&= \frac{ \exp \left( \sum _{j\in {\mathbb {J}}} \vartheta ^i_j X_ j(t)\right) }{ \sum _{i'\in {\mathbb {I}}} \exp \left( \sum _{j\in {\mathbb {J}}} \vartheta ^{i'}_j X_ j(t)\right) } = \left[ \sum _{i'\in {\mathbb {I}}} \exp \left( \sum _{j\in {\mathbb {J}}} ( \theta ^{i'}_j - \theta ^i_j) X_ j(t)\right) \right] ^{-1}. \end{aligned}$$
(4.2)

The quasi-log-likelihood based on the observation on [0, T] for this ratio model is defined at Eq. (3.1). In the second step, we consider the ratios

$$\begin{aligned} p^{k_i}_i(t, \varrho ^i)&= \frac{\exp \left( \sum _{{j_i}\in {\mathbb {J}}_i} \varrho ^{i,k_i}_{j_i} Y^i_{j_i}(t) \right) }{\sum _{k'_i\in {\mathbb {K}}_i} \exp \left( \sum _{{j_i}\in {\mathbb {J}}_i} \varrho ^{i,k'_i}_{j_i} Y^i_{j_i}(t) \right) } = \left[ \sum _{k'_i\in {\mathbb {K}}_i} \exp \left( \sum _{{j_i}\in {\mathbb {J}}_i} ( \varrho ^{i,k'_i}_{j_i} - \varrho ^{i,k_i}_{j_i}) Y^i_ {j_i}(t)\right) \right] ^{-1}, \end{aligned}$$
(4.3)

and the associated quasi-log-likelihood of Eq. (3.2). Consistency and asymptotic normality of the quasi-maximum likelihood estimators are guaranteed by Theorem 1.

4.4 In-sample model selection with QAIC and QBIC

In this first application, we perform in-sample model selection to assess the relevance of the different possible sets of covariates. For each stock and each trading day, we fix a set of covariates. We use the indices of the tested covariates to name the models: the model 146 is, thus, the model with covariates \((Z_1,Z_4,Z_6)\). If required, we estimate the parameters of all the Hawkes covariates on the previous day and then compute the Hawkes covariates using these fitted parameters. This procedure ensures that the predictability of the covariates is not violated. We finally fit three ratio models following the above procedure: one for the processes \((N^0, N^1)\) (side of the market orders), one for the processes \((N^{0,0}, N^{0,1})\) (aggressiveness of the bid market orders) and one for the processes \((N^{1,0}, N^{1,1})\) (aggressiveness of the ask market orders).

For each trading day, we then select the model minimizing some information criterion. For the ratio for the side determination, the criterion is

$$\begin{aligned} -2 {\mathbb {H}}_T({\hat{\theta }}^M_T)+ {a_T} |{\mathbb {J}}|, \end{aligned}$$
(4.4)

where \(|{\mathbb {J}}|\) is the cardinality of \({\mathbb {J}}\), and \(a_T=2\) for the QAIC criterion, \(a_T=\log (T)\) for the QBIC criterion. For the aggressiveness ratios, the criterion is

$$\begin{aligned} -2 {\mathbb {H}}^{(i)}_T({\hat{\rho }}^{i,M}_T)+{a_T} |{\mathbb {J}}_i| \quad (i\in {\mathbb {I}}). \end{aligned}$$
(4.5)
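In code, these criteria are straightforward penalizations of the maximized quasi-log-likelihood; a minimal sketch (our own, with hypothetical names):

```python
import numpy as np

def qaic(max_qll, n_cov):
    """QAIC of Eqs. (4.4)-(4.5): penalty a_T = 2 per covariate."""
    return -2.0 * max_qll + 2.0 * n_cov

def qbic(max_qll, n_cov, T):
    """QBIC of Eqs. (4.4)-(4.5): penalty a_T = log(T) per covariate."""
    return -2.0 * max_qll + np.log(T) * n_cov

# Hypothetical daily selection: keep the covariate set minimizing QBIC.
# best_model = min(fitted_models, key=lambda m: qbic(m.max_qll, m.n_cov, T))
```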

We finally compute for each stock the frequencies of selection of the different sets of covariates (i.e., the number of trading days on which a model is selected by QAIC or QBIC over the total number of trading days in the sample for this stock). Figures 2, 3 and 4 plot the results as a model \(\times \) stock heatmap for each of these three ratios. For completeness, Tables 4, 5 and 6 in Sect. 8 list for each ratio model (side, bid aggressiveness, ask aggressiveness) the frequency of selection averaged across stocks for each model and each information criterion.

Fig. 2

Side of market orders—Frequency of selection of each model by the QAIC and QBIC criteria, for each stock

Fig. 3

Aggressiveness of bid market orders—Frequency of selection of each model by the QAIC and QBIC criteria, for each stock

Fig. 4

Aggressiveness of ask market orders—Frequency of selection of each model by the QAIC and QBIC criteria, for each stock

For side determination, the models 14689, 124689, 134689 and 1234689 are the four most often chosen: the selected model is among these four more than \(80\%\) of the time on average across stocks using QAIC, and close to \(90\%\) of the time using QBIC. As expected, QBIC favors the smallest model, 14689. Imbalance, Hawkes covariates for bid and ask market orders, and Hawkes covariates for aggressive bid and ask market orders thus appear to be the most informative covariates.

For aggressiveness determination, the model 146 is the one most often selected by QBIC. This is in line with intuition: imbalance is known to be a significant proxy for price change (see, e.g., Lipton et al. 2013), and the Hawkes covariates for aggressive bid and aggressive ask orders are specific to the targeted events. QAIC selection is more widespread and favors a larger model (as expected), namely 12346. Note also that for several stocks, models with “symmetric” sets of covariates can also be chosen: for ask aggressiveness, 1679 is often selected, i.e., imbalance and all available ask Hawkes covariates; symmetrically, 1458 is selected for bid aggressiveness, i.e., imbalance and all available bid Hawkes covariates.

Fig. 5

Frequency of spread selection among all stocks and trading days in the aggressiveness ratio model as a function of the mean observed spread in ticks (5%-quantile bins)

One may in particular observe that these results confirm the primary role of the spread measured in ticks in the theory of financial microstructure. Stocks for which the observed spread is mostly equal to one tick are labeled 'large-tick stocks', implying that market participants are constrained by the price grid when submitting orders to the limit order book. Other stocks may be labeled 'small-tick stocks' (Eisler et al. 2012). Using our sample, we compute the mean observed spread in ticks for each stock and each available trading day, and group these values in bins of equal sizes. Then, inside each bin, we compute the frequency of selection of the covariate \({Z_3}\) (signed spread) by QBIC for the aggressiveness ratio estimation of Equation (4.3). A bar plot is provided in Fig. 5 (left). We observe an increase of the frequency of selection of the spread covariate when the mean observed spread increases from 1 tick (its minimal possible value) to roughly 2.5 ticks. For larger spread values, the frequency remains high and then seems to decrease at the highest values. This indicates that the significance of the covariates, especially the spread, is not the same for large-tick and small-tick stocks, and that even among small-tick stocks, the dependency is not uniform. This visual observation can, for example, be complemented by the following statistical test. For all stocks and trading days, we compute the empirical cumulative distribution functions of the daily mean spread in ticks (i) when the spread covariate is selected by QBIC in the aggressiveness ratios, and (ii) when the spread covariate is not selected. A one-sided Kolmogorov–Smirnov test rejects the hypothesis that both distributions are identical (p-value \(10^{-53}\)) in favor of the alternative that the spread covariate is selected more often for larger observed spreads. Recall that many microstructure models are developed for large-tick stocks, since assuming a constant spread equal to one tick often simplifies the analysis of the limit order book dynamics. Our observation advocates for the definition of specific microstructure models for small-tick stocks, taking the spread dynamics into account.
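This test can be reproduced with standard tools; a sketch using SciPy (the arrays `spread_not_selected` and `spread_selected` are hypothetical containers for the two groups of daily mean spreads):

```python
from scipy.stats import ks_2samp

# One-sided two-sample KS test: the alternative is that the CDF of the
# first sample lies above that of the second, i.e., spreads on days where
# Z_3 is selected are stochastically larger.
stat, p_value = ks_2samp(spread_not_selected, spread_selected,
                         alternative="greater")
```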

Model selection consistency validates the use of QBIC; see Eguchi and Masuda (2018), or follow Muni Toke and Yoshida (2020) for a direct proof of consistency covering other criteria. However, the actual predictive performance of a selected model is more important than model selection consistency, so it is also worth trying QAIC, or the consistent QAIC.

4.5 Out-of-sample prediction performance

In this section, we use intensity and ratio models to predict the sign and aggressiveness of an incoming market order. For all tested models, the procedure is the following. On a given trading day, the model is fitted. The fitted parameters are then used on the following trading day (available in the database) to compute the intensities (or the ratios, for ratio models) at all times. The type of an incoming event is then predicted to be the type with the highest intensity or ratio. The exercise is theoretical in the sense that we assume that these computations are instantaneous, so that intensities or ratios are available at all times.
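Concretely, for the marked ratio models, one natural reading of this rule is that the predicted category maximizes the product of the side ratio and the corresponding aggressiveness ratio, both evaluated with the previous day's parameters; a minimal sketch (names are ours):

```python
import numpy as np

def predict_type(r_side, q_bid, q_ask):
    """Predict (side, mark) as the argmax of r^i(t) * q^{k}_i(t).
    r_side: [r^0, r^1] (bid/ask ratios); q_bid, q_ask: [q^0, q^1]
    (non-aggressive/aggressive mark ratios for each side)."""
    scores = np.array([[r_side[0] * q_bid[0], r_side[0] * q_bid[1]],
                       [r_side[1] * q_ask[0], r_side[1] * q_ask[1]]])
    i, k = np.unravel_index(np.argmax(scores), scores.shape)
    return i, k  # i = 0 (bid) / 1 (ask); k = 0 (non-aggressive) / 1 (aggressive)
```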

Recall the notation \(N=(N^{i,k_i})_{i\in \{0,1\}, k_i\in \{0,1\}}\) for the four-dimensional point process counting bid aggressive market orders, bid non-aggressive market orders, ask aggressive market orders and ask non-aggressive market orders. We use two benchmark models.

The first benchmark model is the Hawkes model. Here, N is assumed to be a four-dimensional Hawkes process with a single exponential kernel. In vector notation, the intensity is written as

$$\begin{aligned} \lambda (t) = \mu + \int _0^t \alpha e^{-\beta (t-s)}\,\mathrm{d}N_s, \end{aligned}$$

where \(\mu \in {{\mathbb {R}}}_+^4\) and \(\alpha \) and \(\beta \) are \(4\times 4\) non-negative matrices, the exponential being applied entrywise. Estimation and ratio computation can be found in, e.g., Bowsher (2007); Muni Toke and Pomponio (2012). This model is labeled ‘Hawkes’.

The second benchmark model is the four-dimensional ratio model without marks (Muni Toke and Yoshida (2020)). In this model, the intensity of the counting process \((i,k_i)\) is

$$\begin{aligned} \lambda _{R}^{i,k_i}(t) = \lambda _{0,R}(t) \exp \bigg (\sum _{j\in {{\mathbb {J}}}} \vartheta ^{i,k_i}_j X_j(t) \bigg ), \end{aligned}$$

with some unobserved baseline intensity \(\lambda _{0,R}(t)\). Given the previous observations, we choose the set of covariates \((Z_1,Z_4,Z_6,Z_8,Z_9)\) for this benchmark. It is natural to choose these covariates (imbalance, Hawkes for aggressive orders and Hawkes for all orders) given the results on model selection of Sect. 4.4. Estimation and ratio computation are detailed in Muni Toke and Yoshida (2020). This model is labeled ’Ratio-14689’.

These two benchmarks are used to assess the performances of two marked ratio models (or two-step ratio models) described in this paper. The first marked ratio model uses the covariates \((Z_4,Z_5,Z_6,Z_7)\) for both steps. These covariates are based on the Hawkes processes of the benchmark Hawkes model. The second marked ratio model uses the covariates \((Z_1,Z_4,Z_6,Z_8,Z_9)\) for the first-step ratio (side determination) and \((Z_1,Z_4,Z_6)\) for both second-step ratios (bid and ask aggressiveness). Again, these choices are natural given the results on model selection of Sect. 4.4. These models are labeled ’MarkedRatio-4567-4567-4567’ and ’MarkedRatio-14689-146-146’, respectively.

Fig. 6
figure 6

Out-of-sample prediction performances for the benchmark models and the marked ratio models. Label explanation is in the text

Fig. 7
figure 7

Out-of-sample partial prediction performances for the side prediction (left) and aggressiveness prediction (right), for the benchmark models and the marked ratio models. Label explanation is in the text

Figure 6 plots the results for each stock for the two benchmark models and the two marked ratio models. For completeness, the partial performances for side determination and aggressiveness determination of the trades are provided in Fig. 7. Finally, Table 2 lists the partial and global prediction performances of these models averaged across stocks. The benchmark Hawkes model correctly predicts the sign and aggressiveness of an incoming order with an accuracy in the range \([40\%,60\%]\) for all stocks, with a \(50\%\) average. The marked ratio model with only Hawkes covariates (’MarkedRatio-4567-4567-4567’) and no dependency on the state of the limit order book closely reproduces these performances. The non-marked ratio model ’Ratio-14689’ slightly improves on the global performances of the two previous models. Looking at the partial accuracies, we observe that this improvement is mainly due to a better side prediction. Finally, the ’MarkedRatio-14689-146-146’ model, which appeared on average to be the best model with respect to QBIC selection, strongly outperforms all other models. Its global accuracy is in the range \([60\%,80\%]\) for all stocks, with a \(67\%\) average, i.e., we are theoretically able to correctly predict both the sign and aggressiveness of an incoming market order two times out of three. Finally, comparing the side determination of ’Ratio-14689’ and ’MarkedRatio-14689-146-146’ shows that the decoupling of the side and aggressiveness ratios in the marked ratio model significantly improves the prediction performance over the one-step four-dimensional case, while using the same covariates.

Table 2 Prediction performances of selected models averaged across stocks.

These results show that the two-step ratio model for marked point processes is a significant improvement over existing intensity models. As in the standard ratio model of Muni Toke and Yoshida (2020), this provides an easy way to obtain both clustering and state-dependency. It is important to note, however, that the two-step ratio strongly improves the performance of the standard ratio model in a multidimensional setting. In this example, flexibility in the choice of covariates allows for precise model selection for both sign and aggressiveness.

5 Proof of Theorem 1

The convergence given in Theorem 1 can be obtained by the quasi-likelihood analysis recalled in Section 6. We will apply Theorems 3 and 5 of Sect. 6 to the double ratio model. In the present situation, the scaling factor is \(b_T=T\), the joint parameter \((\theta ,\rho )\) plays the role of \(\theta \) in Section 6, and the dimension of the full parameter space is \({\check{{\textsf {p}}}}\) in place of \({\textsf {p}}\) of Section 6. Fix a set of values of the parameters \((\alpha ,\beta _1,\beta _2,\rho , \rho _1,\rho _2)\) of Section 6 so that Condition [L1] (Section 6) is met with \(\rho =2\).

5.1 Score functions and a central limit theorem

The score function for \(\rho ^i\) is given by

$$\begin{aligned} F^{(i)}_T(\rho ^i)= & {} \partial _{\rho ^i}{{\mathbb {H}}}_T^{(i)}(\rho ^i) \>=\>\sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T\partial _{\rho ^i}\log q^{k_i}_i(t,\rho ^i)\mathrm{d}N^{i,k_i}_t. \end{aligned}$$

Then,

$$\begin{aligned} F^{(i)}_T(\rho ^i)= & {} \sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T \big ( 1_{\{k_i\}}(\cdot )-q^\flat _i(t,\rho ^i)\big )\otimes {{\mathbb {Y}}}^i(t)\mathrm{d}N^{i,k_i}_t, \end{aligned}$$
(5.1)

where \(q^\flat _i(t,\rho ^i)=(q_i^{k_i}(t,\rho ^i))_{k_i\in {{\mathbb {K}}}_{i,0}}\), \({{\mathbb {Y}}}^i(t)=(Y^i_{j_i}(t))_{j_i\in {{\mathbb {J}}}_i}\) and \(1_{\{k_i\}}(\kappa )=\big (1_{\{\kappa =k_i\}}\big )_{\kappa \in {{\mathbb {K}}}_{i,0}}\). By some calculus with (2.1) and \(p^{k_i}_i(t,\varrho ^i)=q^{k_i}_i(t,\rho ^i)\), we see

$$\begin{aligned} F^{(i)}_T:=F^{(i)}_T((\rho ^i)^*)= & {} \sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T \big ( 1_{\{k_i\}}(\cdot )-q^\flat _i(t,(\rho ^i)^*)\big )\otimes {{\mathbb {Y}}}^i(t)d {\tilde{N}}^{i,k_i}_t. \end{aligned}$$
(5.2)

We are assuming that the counting processes \(N^{i,k_i}\) (\(i\in {{\mathbb {I}}};\>k_i\in {{\mathbb {K}}}_i\)) have no common jumps. Then, the \({\textsf {p}}_i\times {\textsf {p}}_{i'}\) matrix-valued process \(\langle F^{(i)},F^{(i')}\rangle \) satisfies

$$\begin{aligned} \langle F^{(i)},F^{(i')}\rangle _T= & {} 0\quad (i,i'\in {{\mathbb {I}}},\>i\not =i') \end{aligned}$$
(5.3)

and

$$\begin{aligned} \langle F^{(i)}\rangle _T= & {} \sum _{k_{i}\in {{\mathbb {K}}}_i}\int _0^T\bigg \{\big (1_{\{k_i\}}(\cdot )-q^\flat _i(t,(\rho ^i)^*)\big )\otimes {{\mathbb {Y}}}^i(t) \bigg \}^{\otimes 2} r^i(t,\theta ^*)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))q^{k_i}_i(t,(\rho ^i)^*)\mathrm{d}t \\ = & {} \int _0^T\mathsf{V}^i_0({{\mathbb {Y}}}^i(t),(\rho ^i)^*)\otimes ({{\mathbb {Y}}}^i(t))^{\otimes 2} \>\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))r^i(t,\theta ^*)\mathrm{d}t \quad (i\in {{\mathbb {I}}}). \end{aligned}$$

Therefore, the mixing property [M2] gives the convergence

$$\begin{aligned}&T^{-1}\langle F^{(i)}\rangle _T \rightarrow ^p \varGamma ^{i}((\rho ^i)^*)\nonumber \\&\quad = E\bigg [\mathsf{V}^i_0({{\mathbb {Y}}}^i(0),(\rho ^i)^*)\otimes ({{\mathbb {Y}}}^i(0))^{\otimes 2} \>\varLambda (\lambda _0(0),{{\mathbb {X}}}(0))r^i(0,\theta ^*)\bigg ] \end{aligned}$$
(5.4)

as \(T\rightarrow \infty \), with the aid of [M1].

The score function for \(\theta \) is the \({\textsf {p}}\)-dimensional process

$$\begin{aligned} F_T(\theta )= & {} \partial _{\theta }{{\mathbb {H}}}_T(\theta ) \>=\>\sum _{i\in {{\mathbb {I}}}}\int _0^T\partial _{\theta }\log r^i(t,\theta )\mathrm{d}N^i_t \nonumber \\ {}= & {} \sum _{i\in {{\mathbb {I}}}}\int _0^T \big (1_{\{i\}}(\cdot )-r^\flat (t,\theta )\big )\otimes {{\mathbb {X}}}(t)\mathrm{d}N^i_t, \end{aligned}$$
(5.5)

where \(r^\flat (t,\theta )=(r^i(t,\theta ))_{i\in {{\mathbb {I}}}_0}\). Evaluated at \(\theta ^*\),

$$\begin{aligned} F_T= & {} F_T(\theta ^*) \>=\>\sum _{i\in {{\mathbb {I}}}}\int _0^T\big (1_{\{i\}}(\cdot )-r^\flat (t,\theta ^*)\big )\otimes {{\mathbb {X}}}(t)d{\tilde{N}}^i_t \nonumber \\= & {} \sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T\big (1_{\{i\}}(\cdot ) -r^\flat (t,\theta ^*)\big )\otimes {{\mathbb {X}}}(t)d{\tilde{N}}^{i,k_i}_t. \end{aligned}$$
(5.6)

Then, the \({\textsf {p}}\times {\textsf {p}}\) matrix valued process \(\langle F\rangle \) has the expression

$$\begin{aligned} \langle F\rangle _T= & {} \sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T\big (1_{\{i\}}(\cdot ) -r^\flat (t,\theta ^*)\big )^{\otimes 2} \otimes {{\mathbb {X}}}(t)^{\otimes 2} r^i(t,\theta ^*)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))q^{k_i}_i(t,(\rho ^i)^*)\mathrm{d}t \\ = & {} \sum _{i\in {{\mathbb {I}}}}\int _0^T\big (1_{\{i\}}(\cdot )-r^\flat (t,\theta ^*) \big )^{\otimes 2} \otimes {{\mathbb {X}}}(t)^{\otimes 2} r^i(t,\theta ^*)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))\mathrm{d}t \\ = & {} \int _0^T\mathsf{V}_0({{\mathbb {X}}}(t))\otimes {{\mathbb {X}}}(t)^{\otimes 2}\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))\mathrm{d}t. \end{aligned}$$

Then, the mixing property [M2] provides the convergence

$$\begin{aligned} T^{-1}\langle F\rangle _T&\rightarrow ^p&\varGamma (\theta ^*) \>=\>E\bigg [\bigg (\mathsf{V}_0({\mathbb {X}}(0))\otimes {\mathbb {X}}(0)^{\otimes 2}\bigg ) \varLambda (\lambda _0(0),{\mathbb {X}}(0))\bigg ] \end{aligned}$$
(5.7)

as \(T\rightarrow \infty \).

For \(i\in {{\mathbb {I}}}\),

$$\begin{aligned} \langle F,F^{(i)}\rangle _T= & {} \sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T \big (1_{\{i\}}(\cdot )-r^\flat (t,\theta ^*)\big )\otimes \big (1_{\{k_i\}}(\cdot )-q^\flat _i(t,(\rho ^i)^*)\big ) \otimes {{\mathbb {X}}}(t)\otimes {{\mathbb {Y}}}^i(t) \nonumber \\&\times r^i(t,\theta ^*)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))q^{k_i}_i(t,(\rho ^i)^*)\mathrm{d}t \nonumber \\= & {} 0 \end{aligned}$$
(5.8)

since

$$\begin{aligned} \sum _{k_i\in {{\mathbb {K}}}_i}\big (1_{\{k_i\}}(\cdot )-q^\flat _i(t,(\rho ^i)^*) \big )q^{k_i}_i(t,(\rho ^i)^*)= & {} 0. \end{aligned}$$

The full information matrix is the \({\check{{\textsf {p}}}}\times {\check{{\textsf {p}}}}\) block diagonal matrix

$$\begin{aligned} \varGamma \>=\>\varGamma (\theta ^*,\rho ^*)= & {} \text {diag}\big [\varGamma (\theta ^*),\varGamma ^0((\rho ^0)^*), \varGamma ^1((\rho ^1)^*),...,\varGamma ^{{\bar{i}}}((\rho ^{{\bar{i}}})^*)\big ]. \end{aligned}$$

Let \(\varDelta _T=T^{-1/2}\big (F_T,(F^{(i)}_T)_{i\in {{\mathbb {I}}}}\big )\). Now, by the martingale central limit theorem, it is easy to obtain the convergence

$$\begin{aligned} \varDelta _T&\rightarrow ^d&\varGamma ^{1/2}\zeta \quad (T\rightarrow \infty ), \end{aligned}$$

where \(\zeta \) is a \({\check{{\textsf {p}}}}\)-dimensional standard Gaussian random vector. The joint convergence \((\varDelta _T,\varGamma )\rightarrow ^d(\varGamma ^{1/2}\zeta ,\varGamma )\) is obvious since \(\varGamma \) is deterministic.

5.2 Condition [L4]

According to (6.2), we define the random field \({{\mathbb {Y}}}_T:\varOmega \times {\overline{\varTheta }}\times {\overline{\mathcal{R}}}\rightarrow {{\mathbb {R}}}\) by

$$\begin{aligned} {{\mathbb {Y}}}_T(\theta ,\rho ) = T^{-1}\big ({{\mathbb {H}}}_T(\theta ,\rho )-{{\mathbb {H}}}_T(\theta ^*,\rho ^*)\big ) \end{aligned}$$

for \({{\mathbb {H}}}_T(\theta ,\rho )\) given in (3.3). From the expression (3.4) of \({{\mathbb {H}}}_T(\theta ,\rho )\), we have

$$\begin{aligned} T^{-1}{{\mathbb {H}}}_T(\theta ,\rho )= & {} T^{-1}\sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i} \int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big ) \mathrm{d}N^{i,k_i}_t \\= & {} T^{-1}\sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i} \int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )d{\tilde{N}}^{i,k_i}_t \\&+T^{-1}\sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\\&\times \lambda _0(t)\exp \bigg (\sum _{j\in {{\mathbb {J}}}} (\vartheta ^*)^i_jX_j(t)\bigg )\>p_i^{k_i}(t,(\varrho ^i)^*)\mathrm{d}t. \end{aligned}$$

By definition,

$$\begin{aligned} \big |\partial _{(\theta ,\rho )}^\ell \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\big |\le & {} C\bigg (1+\sum _{j\in {{\mathbb {J}}}}|X_j(t)|+\sum _{i\in {{\mathbb {I}}}}\sum _{j_i\in {{\mathbb {J}}}_i} |Y^{k_i}_{j_i}(t)|\bigg )\, (\ell =0,1), \end{aligned}$$

where C is a constant depending on the diameters of \(\varTheta \) and \(\mathcal{R}\). Therefore, under Condition [M1],

$$\begin{aligned}&E\bigg [\bigg |\partial _{(\theta ,\rho )}^\ell T^{-1/2} \int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\mathrm{d}{\tilde{N}}^{i,k_i}_t \bigg |^{2^k}\bigg ] \\&\quad \lesssim \ E\bigg [\bigg (T^{-1} \int _0^T\big |\partial _{(\theta ,\rho )}^\ell \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\big |^2 \mathrm{d}N^{i,k_i}_t\bigg )^{2^{k-1}}\bigg ]\\&\quad \lesssim \ E\bigg [T^{-1} \int _0^T\big |\partial _{(\theta ,\rho )}^\ell \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\big |^{2^k} \lambda ^{i,k_i}(t,(\vartheta ^i)^*,(\varrho ^{i,k_i})^*)\mathrm{d}t\bigg ]\\&\qquad +T^{-2^{k-2}}E\bigg [\bigg (T^{-1/2} \int _0^T\big |\partial _{(\theta ,\rho )}^\ell \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\big |^2 \mathrm{d}{\tilde{N}}^{i,k_i}_t\bigg )^{2^{k-1}}\bigg ] \\&\quad = O(1)+T^{-2^{k-2}}E\bigg [\bigg (T^{-1/2} \int _0^T\big |\partial _{(\theta ,\rho )}^\ell \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\big |^2 \mathrm{d}{\tilde{N}}^{i,k_i}_t\bigg )^{2^{k-1}}\bigg ] \end{aligned}$$

for \(k\in {{\mathbb {N}}}\), where the constant appearing at each \(\lesssim \) depends only on \({\check{{\textsf {p}}}}\), k and the constant of the Burkholder–Davis–Gundy inequality. By induction, we obtain

$$\begin{aligned} \sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}}\sup _{T\ge 1}\bigg \Vert \partial _{(\theta ,\rho )}^\ell T^{-1/2} \int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\mathrm{d}{\tilde{N}}^{i,k_i}_t\bigg \Vert _p< & {} \infty \end{aligned}$$
(5.9)

for every \(p>1\) and \(\ell \in \{0,1\}\). Then, Sobolev’s inequality gives

$$\begin{aligned} \sup _{T\ge 1} \bigg \Vert \sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}} \bigg |T^{-1/2}\int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\mathrm{d}{\tilde{N}}^{i,k_i}_t\bigg | \>\bigg \Vert _p< & {} \infty \end{aligned}$$
(5.10)

for every \(p>1\).

Let

$$\begin{aligned} \varPhi (t,\theta ,\rho )= & {} {\sum _{i\in {{\mathbb {I}}}} \sum _{k_i\in {{\mathbb {K}}}_i}\bigg \{r^i(t,\theta ^*) p_i^{k_i}(t,(\varrho ^i)^*) \log \frac{r^i(t,\theta )q_i^{k_i}(t,\rho ^i)}{r^i(t,\theta ^*)q_i^{k_i} (t,(\rho ^i)^*)}\bigg \} } \\&\qquad \times \lambda _0(t)\sum _{i'\in {{\mathbb {I}}}}\exp \bigg (\sum _{j\in {{\mathbb {J}}}} (\vartheta ^*)^{i'}_jX_j(t)\bigg ). \end{aligned}$$

Then, Conditions [M1] and [M2] imply a Rosenthal-type inequality under the mixing condition (cf. Rio 2017)

$$\begin{aligned} \sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}} \sup _{T\ge 1}\bigg \Vert T^{-1/2}\int _0^T\partial _{(\theta ,\rho )}^\ell \big (\varPhi (t,\theta ,\rho )-E[\varPhi (t,\theta ,\rho )]\big )\mathrm{d}t\bigg \Vert _p< & {} \infty \end{aligned}$$

for every \(p>1\) and \(\ell \in \{0,1\}\). This entails

$$\begin{aligned} \sup _{T\ge 1}\bigg \Vert T^{1/2}\sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}} \bigg |T^{-1}\int _0^T\big (\varPhi (t,\theta ,\rho )-E[\varPhi (t,\theta ,\rho )] \big )\mathrm{d}t\bigg |\bigg \Vert _p< & {} \infty \end{aligned}$$
(5.11)

for every \(p>1\).

Combining (5.11) with (5.10), we obtain

$$\begin{aligned} \sup _{T\ge 1}E\bigg [\bigg (T^{1/2}\sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}} \big |{{\mathbb {Y}}}_T(\theta ,\rho )-{{\mathbb {Y}}}(\theta ,\rho )\big |\bigg )^p\bigg ]< & {} \infty \end{aligned}$$
(5.12)

for every \(p>1\), if we set

$$\begin{aligned} {{\mathbb {Y}}}(\theta ,\rho )= & {} {E[\varPhi (0,\theta ,\rho )]} \nonumber \\= & {} E\bigg [{\sum _{i\in {{\mathbb {I}}}} \sum _{k_i\in {{\mathbb {K}}}_i}\bigg \{r^i(0,\theta ^*) p_i^{k_i}(0,(\varrho ^i)^*) \log \frac{r^i(0,\theta )q_i^{k_i}(0,\rho ^i)}{r^i(0,\theta ^*)q_i^{k_i} (0,(\rho ^i)^*)}\bigg \} } \\&\qquad \times \lambda _0(0)\sum _{i'\in {{\mathbb {I}}}}\exp \bigg (\sum _{j\in {{\mathbb {J}}}} (\vartheta ^*)^{i'}_jX_j(0)\bigg )\bigg ]. \end{aligned}$$

This verifies Condition [L4](ii).

As in (6.1), we define \(\varGamma _T(\theta ,\rho )\) by

$$\begin{aligned} \varGamma _T(\theta ,\rho )= & {} -T^{-1}\partial _{(\theta ,\rho )}^2{{\mathbb {H}}}_T(\theta ,\rho ). \end{aligned}$$

From (5.1),

$$\begin{aligned} \partial _{\rho ^i}^2{{\mathbb {H}}}_T^{(i)}(\rho ^i)= & {} -\sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T \partial _{\rho ^i}q^\flat _i(t,\rho ^i)\otimes {{\mathbb {Y}}}^i(t)\mathrm{d}N^{i,k_i}_t. \end{aligned}$$

More precisely,

$$\begin{aligned} \partial _{\rho ^{i,k_i}_{j_i}}\partial _{\rho ^{i,k_i'}_{j_i'}} {{\mathbb {H}}}_T^{(i)}(\rho ^i)= & {} -\sum _{k_i''\in {{\mathbb {K}}}_i}\int _0^T \bigg \{ 1_{\{k_i=k_i'\}}q^{k_i}_i(t,\rho ^i) -q^{k_i}_i(t,\rho ^i)q^{k_i'}_i(t,\rho ^i) \bigg \}{{\mathbb {Y}}}^i_{j_i}(t){{\mathbb {Y}}}^i_{j_i'}(t)\mathrm{d}N^{i,k_i''}_t \\ {}= & {} -\sum _{k_i''\in {{\mathbb {K}}}_i}\int _0^T \bigg \{ 1_{\{k_i=k_i'\}}q^{k_i}_i(t,\rho ^i) -q^{k_i}_i(t,\rho ^i)q^{k_i'}_i(t,\rho ^i) \bigg \}{{\mathbb {Y}}}^i_{j_i}(t){{\mathbb {Y}}}^i_{j_i'}(t)\mathrm{d}{\tilde{N}}^{i,k_i''}_t \\&-\int _0^T \bigg \{ 1_{\{k_i=k_i'\}}q^{k_i}_i(t,\rho ^i) -q^{k_i}_i(t,\rho ^i)q^{k_i'}_i(t,\rho ^i) \bigg \}{{\mathbb {Y}}}^i_{j_i}(t){{\mathbb {Y}}}^i_{j_i'}(t) \\&\qquad \qquad \qquad \times r^i(t,\theta ^*)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))\mathrm{d}t \\ {}= & {} -\sum _{k_i''\in {{\mathbb {K}}}_i}\int _0^T \mathsf{V}_0^i({{\mathbb {Y}}}^i(t),\rho ^i)_{k_i,k_i'}{{\mathbb {Y}}}^i_{j_i}(t){{\mathbb {Y}}}^i_{j_i'} (t)\mathrm{d}{\tilde{N}}^{i,k_i''}_t \\&-\int _0^T \mathsf{V}_0^i({{\mathbb {Y}}}^i(t),\rho ^i)_{k_i,k_i'} {{\mathbb {Y}}}^i_{j_i}(t){{\mathbb {Y}}}^i_{j_i'}(t) \varLambda (\lambda _0(t),{{\mathbb {X}}}(t))r^i(t,\theta ^*)\mathrm{d}t \end{aligned}$$

for \(k_i,k_i'\in {{\mathbb {K}}}_{i,0}\), \(j_i,j_i'\in {{\mathbb {J}}}_i\) and \(i\in {{\mathbb {I}}}\), where (3.12) was used. Similarly, from (5.5),

$$\begin{aligned} \partial _\theta ^2{{\mathbb {H}}}_T(\theta )= & {} -\sum _{i\in {{\mathbb {I}}}}\int _0^T\partial _\theta r^\flat (t,\theta )\otimes {{\mathbb {X}}}(t)\mathrm{d}N^i_t, \end{aligned}$$

equivalently,

$$\begin{aligned} \partial _{\theta ^i_j}\partial _{\theta ^{i'}_{j'}}{{\mathbb {H}}}_T(\theta )= & {} -\sum _{i''\in {{\mathbb {I}}}}\int _0^T\mathsf{V}_0({{\mathbb {X}}}(t),\theta )_{i,i'}X_j(t)X_{j'}(t)\mathrm{d}{\tilde{N}}^{i''}_t \\&-\int _0^T\mathsf{V}_0({{\mathbb {X}}}(t),\theta )_{i,i'}X_j(t)X_{j'}(t)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))\mathrm{d}t \end{aligned}$$

for \(i,i'\in {{\mathbb {I}}}_0\) and \(j,j'\in {{\mathbb {J}}}\). Obviously,

$$\begin{aligned} \partial _{\theta }\partial _{\rho ^i}{{\mathbb {H}}}_T^{(i)}(\rho ^i)\>=\>0 \quad \text {and}\quad \partial _{\rho ^{i'}}\partial _{\rho ^i}{{\mathbb {H}}}_T^{(i)}(\rho ^i)= & {} 0 \quad (i',i\in {{\mathbb {I}}}:\>i'\not =i). \end{aligned}$$

In a way similar to the derivation of (5.12), and in fact more easily, we can show

$$\begin{aligned} \sup _{T\ge 1}E\big [\big (T^{1/2}|\varGamma _T(\theta ^*,\rho ^*) -\varGamma |\big )^p\big ]< & {} \infty \end{aligned}$$

for every \(p>1\) under Conditions [M1] and [M2]. Therefore, Condition [L4](iv) is verified with \(\beta _1=1/2\). Condition [L4](iii) can be shown in a similar fashion, using the mixing property and Sobolev's inequality. Condition [L4](i) was already checked in (5.9). Thus, Condition [L4] has been verified.

5.3 Conditions [L2] and [L3]

We see

$$\begin{aligned} \partial _{(\theta ,\rho )}^2{{\mathbb {Y}}}(\theta ,\rho )=-\varGamma (\theta ,\rho ), \end{aligned}$$

and by [M3], we conclude that \({{\mathbb {Y}}}(\theta ,\rho )\) is a strictly concave function on \({\overline{\varTheta }}\times {\overline{\mathcal{R}}}={\overline{\varTheta }} \times \varPi _{i\in {{\mathbb {I}}}}{\overline{\mathcal{R}}}_i\). For some neighborhood U of \((\theta ^*,\rho ^*)\) and some positive number \(\chi _1\),

$$\begin{aligned} {{\mathbb {Y}}}(\theta ,\rho )\le -\chi _1|(\theta ,\rho )-(\theta ^*,\rho ^*)|^2\qquad \big ((\theta ,\rho )\in U\big ) \end{aligned}$$

by the non-degeneracy of \(\varGamma (\theta ^*,\rho ^*)\) (made explicit in the Taylor bound at the end of this subsection). Moreover, \(\sup _{(\theta ,\rho )\in (\varTheta \times \mathcal{R})\setminus U}{{\mathbb {Y}}}(\theta ,\rho )<0\). In fact, if there were a point \((\theta ^+,\rho ^+)\not \in U\) such that \({{\mathbb {Y}}}(\theta ^+,\rho ^+)=0\), then \(\varGamma (\theta ,\rho )\) would degenerate at some point on the segment connecting \((\theta ^*,\rho ^*)\) and \((\theta ^+,\rho ^+)\), which contradicts [M3]. As a consequence, Condition [L2] is verified for \(\rho =2\) and some (deterministic) positive number \(\chi _0\), since the parameter space is bounded. Condition [L3] is now obvious.
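To spell out the neighborhood bound: since \({{\mathbb {Y}}}(\theta ^*,\rho ^*)=0\) and \(\partial _{(\theta ,\rho )}{{\mathbb {Y}}}(\theta ^*,\rho ^*)=0\), Taylor's formula with integral remainder gives, for U a sufficiently small ball around \((\theta ^*,\rho ^*)\) and \(h=(\theta ,\rho )-(\theta ^*,\rho ^*)\),

$$\begin{aligned} {{\mathbb {Y}}}(\theta ,\rho )\>=\>-\int _0^1(1-s)\,\varGamma \big ((\theta ^*,\rho ^*)+sh\big )[h^{\otimes 2}]\,\mathrm{d}s \>\le \>-\chi _1|h|^2 \qquad \big ((\theta ,\rho )\in U\big ), \end{aligned}$$

where one may take \(\chi _1\) as half of the infimum over U of the smallest eigenvalue of \(\varGamma \), which is positive by [M3] and continuity.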

5.4 Proof of Theorem 1

We have verified Conditions [L1]-[L4] in the present situation. Theorem 1 now follows from Theorems 3 and 5. \(\square \)

6 Quasi-likelihood analysis

This section recalls the quasi-likelihood analysis. Let \(\varTheta \) be a bounded open set in \({{\mathbb {R}}}^{\textsf {p}}\). Given a probability space \((\varOmega ,\mathcal{F},P)\), suppose that \({{\mathbb {H}}}_T:\varOmega \times {\overline{\varTheta }}\rightarrow {{\mathbb {R}}}\) is of class \(C^3\), that is, the mapping \(\varTheta \ni \theta \mapsto {{\mathbb {H}}}_T(\omega ,\theta )\in {{\mathbb {R}}}\) is continuously extended to \({\overline{\varTheta }}\) and of class \(C^3\) for every \(\omega \in \varOmega \), and the mapping \(\varOmega \ni \omega \mapsto {{\mathbb {H}}}_T(\omega ,\theta )\in {{\mathbb {R}}}\) is measurable for every \(\theta \in \varTheta \). Let \(\varGamma \) be a \({\textsf {p}}\times {\textsf {p}}\) random matrix.

Let \(\theta ^*\in \varTheta \). For a sequence \(a_T\in GL({\textsf {p}})\) satisfying \(\lim _{T\rightarrow \infty }|a_T|=0\), let

$$\begin{aligned} \varDelta _T = \partial _\theta {{\mathbb {H}}}_T(\theta ^*)a_T \quad \text {and}\quad \varGamma _T(\theta ) = -a_T^\star \partial _\theta ^2{{\mathbb {H}}}_T(\theta )a_T, \end{aligned}$$
(6.1)

where \(\star \) denotes the matrix transpose. We consider a random field

$$\begin{aligned} {{\mathbb {Y}}}_T(\theta ) = b_T^{-1}\big ({{\mathbb {H}}}_T(\theta )-{{\mathbb {H}}}_T(\theta ^*)\big ), \end{aligned}$$
(6.2)

which will be assumed to converge to a random field \({{\mathbb {Y}}}:\varOmega \times \varTheta \rightarrow {{\mathbb {R}}}\). For simplicity of presentation, we will assume that \(a_T=b_T^{-1/2}I_{\textsf {p}}\) for a diverging sequence \((b_T)_{T>0}\) of positive numbers, where \(I_{\textsf {p}}\) is the identity matrix; in the application of Sect. 5, \(b_T=T\). In what follows, we fix a positive number L.

We will give a simplified exposition of Yoshida (2011) on the polynomial type large deviation inequality. Let \(\alpha \), \(\beta _1\), \(\beta _2\), \(\rho \), \(\rho _1\) and \(\rho _2\) be numbers.

[L1] The numbers \(\alpha \), \(\beta _1\), \(\beta _2\), \(\rho \), \(\rho _1\) and \(\rho _2\) satisfy the following inequalities:

$$\begin{aligned}&0<\alpha<1,\quad 0<\beta _1<1/2,\quad 0<\rho _1<\min \{1,\alpha (1-\alpha )^{-1},2 \beta _1(1-\alpha )^{-1}\},\\&\alpha \rho <\rho _2,\quad \beta _2\ge 0\quad \text {and}\quad 1-2\beta _2-\rho _2>0. \end{aligned}$$

Let \(\beta =\alpha (1-\alpha )^{-1}\).

[L2] There is a positive random variable \(\chi _0\) such that

$$\begin{aligned} {{\mathbb {Y}}}(\theta )\>=\>{{\mathbb {Y}}}(\theta )-{{\mathbb {Y}}}(\theta ^*)\le & {} -\chi _0|\theta -\theta ^*|^\rho \end{aligned}$$

for all \(\theta \in \varTheta \).

[L3] There exists a constant \(C_L\) such that

$$\begin{aligned} P\big [\chi _0\le r^{-(\rho _2-\alpha \rho )}\big ]\le & {} \frac{C_L}{r^L}\quad (r>0) \end{aligned}$$

and

$$\begin{aligned} P\big [\lambda _{\text {min}}(\varGamma )<4r^{-\rho _1}\big ]\le & {} \frac{C_L}{r^L}\quad (r>0). \end{aligned}$$
[L4] (i) For \(M_1=L(1-\rho _1)^{-1}\), \(\displaystyle \sup _{T>0}E\big [|\varDelta _T|^{M_1}\big ] <\infty . \)

(ii) For \(M_2=L(1-2\beta _2-\rho _2)^{-1}\),

$$\begin{aligned} \sup _{T>0}E\bigg [\bigg (\sup _{h:|h|\ge b_T^{-\alpha /2}} b_T^{\frac{1}{2}-\beta _2}\big |{{\mathbb {Y}}}_T(\theta ^*+h)-{{\mathbb {Y}}}(\theta ^*+h) \big |\bigg )^{M_2}\bigg ]< & {} \infty . \end{aligned}$$
(iii) For \(M_3=L(\beta -\rho _1)^{-1}\),

$$\begin{aligned} \sup _{T>0}E\bigg [\bigg (b_T^{-1}\sup _{\theta \in \varTheta }\big | \partial _\theta ^3{{\mathbb {H}}}_T(\theta ) \big |\bigg )^{M_3}\bigg ]< & {} \infty . \end{aligned}$$
(iv) For \(M_4=L\big (2\beta _1(1-\alpha )^{-1}-\rho _1\big )^{-1}\),

$$\begin{aligned} \sup _{T>0}E\bigg [\bigg (b_T^{\beta _1} \big |\varGamma _T(\theta ^*)-\varGamma \big |\bigg )^{M_4}\bigg ]< & {} \infty . \end{aligned}$$

Let \({{\mathbb {U}}}_T=\{u\in {{\mathbb {R}}}^{\textsf {p}};\>\theta ^*+a_Tu\in \varTheta \}\) and \({{\mathbb {V}}}_T(r)=\{u\in {{\mathbb {U}}}_T;\>|u|\ge r\}\) for \(r>0\), and define the quasi-likelihood ratio random field \({{\mathbb {Z}}}_T(u)=\exp \big ({{\mathbb {H}}}_T(\theta ^*+a_Tu)-{{\mathbb {H}}}_T(\theta ^*)\big )\) for \(u\in {{\mathbb {U}}}_T\).

Theorem 2

(Yoshida (2011)) Suppose that Conditions [L1]-[L4] are satisfied. Then, there exists a constant C such that

$$\begin{aligned} P\bigg [\sup _{u\in {{\mathbb {V}}}_T(r)}{{\mathbb {Z}}}_T(u)\ge \exp \big (-2^{-1}r^{2-(\rho _1\vee \rho _2)}\big ) \bigg ]\le & {} \frac{C}{r^L} \end{aligned}$$

for all \(T>0\) and \(r>0\). Here, the supremum of the empty set should read \(-\infty \) by convention.

We comment on some points. Parameters satisfying [L1] exist. The nondegeneracy conditions in [L3] are obvious in ergodic cases; in this paper, we apply Theorem 2 under ergodicity of the stochastic system. Theorem 2 asserts that a polynomial type large deviation inequality can be obtained once the boundedness of moments of certain random variables is verified. Condition [L4] is easy to verify because each variable is usually a simple additive functional. The polynomial type large deviation inequality in Theorem 2 enables us to easily apply the scheme of Ibragimov and Has’minskiĭ (1981) and Kutoyants (1984, 2012) to various dependence structures.

Define \(r_T(u)\) for \(u\in {{\mathbb {U}}}_T\) by

$$\begin{aligned} {{\mathbb {Z}}}_T(u)= & {} \exp \bigg (\varDelta _T[u]-\frac{1}{2}\varGamma [u^{\otimes 2}]+r_T(u)\bigg ) \quad (u\in {{\mathbb {U}}}_T). \end{aligned}$$
(6.3)

It is said that \({{\mathbb {Z}}}_T\) is locally asymptotically quadratic (LAQ) at \(\theta ^*\) if \(r_T(u)\rightarrow ^p0\) as \(T\rightarrow \infty \) for every \(u\in {{\mathbb {R}}}^{\textsf {p}}\); in that case, \(\log {{\mathbb {Z}}}_T(u)\) is asymptotically approximated by a random quadratic function of u.

We will confine our attention to a very standard case where \({{\mathbb {Z}}}_T\) is locally asymptotically mixed normal, though the quasi-likelihood analysis is framed in greater generality.

Any measurable mapping \({\hat{\theta }}_T^M:\varOmega \rightarrow {\overline{\varTheta }}\) is called a quasi-maximum likelihood estimator (QMLE) for \({{\mathbb {H}}}_T\) if

$$\begin{aligned} {{\mathbb {H}}}_T({\hat{\theta }}_T^M)= & {} \max _{\theta \in {\overline{\varTheta }}}{{\mathbb {H}}}_T(\theta ). \end{aligned}$$

When \({{\mathbb {H}}}_T\) is continuous on the compact set \({\overline{\varTheta }}\), such a measurable function always exists, as ensured by the measurable selection theorem. Let \({\hat{u}}_T^M=a_T^{-1}({\hat{\theta }}_T^M-\theta ^*)\) for the QMLE \({\hat{\theta }}_T^M\).
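As a purely illustrative aside (not the paper's implementation), a QMLE can be computed numerically by maximizing the quasi-log-likelihood over the bounded parameter set; in the sketch below, `H_T`, the box `bounds` standing in for \({\overline{\varTheta }}\), and the multi-start heuristic are our assumptions.

```python
# Minimal sketch (ours) of a QMLE: maximize a user-supplied
# quasi-log-likelihood H_T over a box standing in for the bounded
# parameter set; H_T, bounds and the multi-start loop are assumptions.
import numpy as np
from scipy.optimize import minimize

def qmle(H_T, bounds, n_starts=10, seed=0):
    """Return an approximate argmax of H_T over the box `bounds`."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    best_theta, best_val = None, -np.inf
    for _ in range(n_starts):  # multi-start guards against local maxima
        theta0 = lo + (hi - lo) * rng.random(len(bounds))
        res = minimize(lambda th: -H_T(th), theta0,
                       method="L-BFGS-B", bounds=bounds)
        if -res.fun > best_val:
            best_theta, best_val = res.x, -res.fun
    return best_theta, best_val
```

For a strictly concave \({{\mathbb {H}}}_T\), a single start suffices; the loop is only a cheap safeguard on small parameter boxes.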

Theorem 3

Let \(L>p>0\). Suppose that Conditions [L1]-[L4] are satisfied and that \((\varDelta _T,\varGamma )\rightarrow ^d(\varGamma ^{1/2}\zeta ,\varGamma )\) as \(T\rightarrow \infty \), where \(\zeta \) is a \({\textsf {p}}\)-dimensional standard Gaussian random vector independent of \(\varGamma \). Then,

$$\begin{aligned} E\big [f({\hat{u}}_T^M)\big ]\rightarrow & {} {{\mathbb {E}}}\big [f({\hat{u}})\big ]\quad (T\rightarrow \infty ) \end{aligned}$$

for \({\hat{u}}=\varGamma ^{-1/2}\zeta \) and for any \(f\in C({{\mathbb {R}}}^{\textsf {p}})\) satisfying \(\lim _{|u|\rightarrow \infty }|u|^{-p}|f(u)|<\infty \).

Proof

We will sketch the proof to convey the concepts of the quasi-likelihood analysis to the reader. See Yoshida (2011) for details. The space \({\hat{C}}({{\mathbb {R}}}^{\textsf {p}})\) is the linear space of all continuous functions \(f:{{\mathbb {R}}}^{\textsf {p}}\rightarrow {{\mathbb {R}}}\) satisfying \(\lim _{|u|\rightarrow \infty }f(u)=0\). The space \({\hat{C}}({{\mathbb {R}}}^{\textsf {p}})\) becomes a separable Banach space equipped with the supremum norm \(\Vert f\Vert _\infty =\sup _{u\in {{\mathbb {R}}}^{\textsf {p}}}|f(u)|\). Moreover, \({\hat{C}}({{\mathbb {R}}}^{\textsf {p}})\) is regarded as a measurable space with the Borel \(\sigma \)-field. Let

$$\begin{aligned} {{\mathbb {Z}}}(u)= & {} \exp \bigg (\varGamma ^{1/2}\zeta [u]-\frac{1}{2}\varGamma [u^{\otimes 2}]\bigg ) \end{aligned}$$
(6.4)

for \(u\in {{\mathbb {R}}}^{\textsf {p}}\).

The term \(r_T(u)\) admits the expression

$$\begin{aligned} r_T(u)= & {} \int _0^1(1-s)\big \{\varGamma [u^{\otimes 2}]-\varGamma _T(\theta ^*+sa_Tu)[u^{\otimes 2}]\big \}\,\mathrm{d}s \end{aligned}$$
(6.5)

for u such that \(|u|\le b_T^{(1-\alpha )/2}\) and T such that \(B(\theta ^*,b_T^{-\alpha /2})\subset \varTheta \). In this situation, we can apply Taylor's formula even though the whole \(\varTheta \) is not convex. Condition [L4] (iii) and the convergence of \(\varDelta _T\) ensure tightness of the random fields \(\big \{{{\mathbb {Z}}}_T|_{\overline{B(0,R)}}\big \}_{T>T_0}\) for every \(R>0\), where \(B(0,R)=\{u\in {{\mathbb {R}}}^{\textsf {p}};\>|u|<R\}\) and \(T_0\) is a sufficiently large number depending on R. Combining this property with the polynomial type large deviation inequality given by Theorem 2, we obtain the convergence \({{\mathbb {Z}}}_T\rightarrow {{\mathbb {Z}}}\) in \({\hat{C}}({{\mathbb {R}}}^{\textsf {p}})\) for the random field \({{\mathbb {Z}}}_T\) extended as an element of \({\hat{C}}({{\mathbb {R}}}^{\textsf {p}})\) so that \(\sup _{{{\mathbb {R}}}^{\textsf {p}}\setminus {{\mathbb {U}}}_T}{{\mathbb {Z}}}_T(u)\le \sup _{u\in \partial {{\mathbb {U}}}_T}{{\mathbb {Z}}}_T(u)\). Consequently, \({\hat{u}}_T^M\rightarrow ^d{\hat{u}}=\text {argmax}_{u\in {{\mathbb {R}}}^{\textsf {p}}}{{\mathbb {Z}}}(u)\). It is known that a measurable version of the extension of \({{\mathbb {Z}}}_T\) exists.
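For completeness, (6.5) is just the second-order Taylor formula with integral remainder applied to \(\log {{\mathbb {Z}}}_T(u)={{\mathbb {H}}}_T(\theta ^*+a_Tu)-{{\mathbb {H}}}_T(\theta ^*)\):

$$\begin{aligned} \log {{\mathbb {Z}}}_T(u)&= \partial _\theta {{\mathbb {H}}}_T(\theta ^*)[a_Tu] +\int _0^1(1-s)\,\partial _\theta ^2{{\mathbb {H}}}_T(\theta ^*+sa_Tu)[(a_Tu)^{\otimes 2}]\,\mathrm{d}s \\&= \varDelta _T[u]-\int _0^1(1-s)\,\varGamma _T(\theta ^*+sa_Tu)[u^{\otimes 2}]\,\mathrm{d}s \end{aligned}$$

by (6.1); adding and subtracting \(\frac{1}{2}\varGamma [u^{\otimes 2}]=\int _0^1(1-s)\,\varGamma [u^{\otimes 2}]\,\mathrm{d}s\) and comparing with (6.3) yields the remainder \(r_T(u)\) of (6.5).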

A polynomial type large deviation inequality, even weaker than the one in Theorem 2, serves to show the \(L^q\)-boundedness of \(\{{\hat{u}}_T^M\}_T\) for \(L>q>p\). Then, the family \(\{f({\hat{u}}_T^M)\}_T\) is uniformly integrable, and hence we obtain the convergence of \(E[f({\hat{u}}_T^M)]\). \(\square \)

Remark 2

In Theorem 3, if \(\varDelta _T\rightarrow ^d\varGamma ^{1/2}\zeta \) \(\mathcal{F}\)-stably, then \((\varDelta _T,\varGamma )\rightarrow ^d(\varGamma ^{1/2}\zeta ,\varGamma )\) and \({\hat{u}}^M_T\rightarrow {\hat{u}}\) \(\mathcal{F}\)-stably.

An advantage of the quasi-likelihood analysis is that the asymptotic behavior of the quasi-Bayesian estimator, together with the convergence of its moments, can be obtained in the same way as for the quasi-maximum likelihood estimator. The mapping

$$\begin{aligned} {\hat{\theta }}_T^B= & {} \bigg [\int _\varTheta \exp \big ({{\mathbb {H}}}_T(\theta )\big )\varpi (\theta )\mathrm{d} \theta \bigg ]^{-1} \int _\varTheta \theta \exp \big ({{\mathbb {H}}}_T(\theta )\big )\varpi (\theta )\mathrm{d}\theta \end{aligned}$$

is called a quasi-Bayesian estimator (QBE) with respect to the prior density \(\varpi \). The QBE \({\hat{\theta }}_T^B\) takes values in the convex hull of \({\overline{\varTheta }}\). We will assume that \(\varpi \) is continuous and that \(0<\inf _{\theta \in \varTheta }\varpi (\theta )\le \sup _{\theta \in \varTheta } \varpi (\theta )<\infty \). Among many possible presentations, we give a concise exposition in the following; the reader is referred to Yoshida (2011) for further information. Recall that \({\textsf {p}}\) is the dimension of \(\varTheta \), and that B(R) denotes the open ball of radius R centered at the origin. \(C(\overline{B(R)})\) is the space of all continuous functions on \(\overline{B(R)}\), equipped with the supremum norm. Recall \({{\mathbb {V}}}_T(r)=\{u\in {{\mathbb {U}}}_T;\>|u|\ge r\}\). As before, \({\hat{u}}=\varGamma ^{-1/2}\zeta \) with a \({\textsf {p}}\)-dimensional standard Gaussian random vector \(\zeta \) independent of \(\varGamma \). Write \({\hat{u}}_T^B=a_T^{-1}({\hat{\theta }}_T^B-\theta ^*)\).
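Numerically, on a low-dimensional parameter box with a uniform prior, the ratio of integrals defining \({\hat{\theta }}_T^B\) can be approximated on a tensor grid. The sketch below is ours and purely illustrative: `H_T`, the box `bounds` and the grid size are assumptions, and the integrand is stabilized by subtracting \(\max {{\mathbb {H}}}_T\) before exponentiating (a constant that cancels in the ratio, as do the prior constant and the volume element).

```python
# Minimal sketch (ours) of the quasi-Bayesian estimator on a parameter
# box with a uniform prior: both integrals are approximated on a tensor
# grid, so the prior constant and the volume element cancel in the ratio.
import numpy as np

def qbe(H_T, bounds, n_grid=201):
    """Grid approximation of the QBE for a quasi-log-likelihood H_T."""
    axes = [np.linspace(lo, hi, n_grid) for lo, hi in bounds]
    mesh = np.meshgrid(*axes, indexing="ij")
    thetas = np.stack([m.ravel() for m in mesh], axis=-1)  # all grid points
    logw = np.array([H_T(th) for th in thetas])
    w = np.exp(logw - logw.max())   # stabilized weights proportional to exp(H_T)
    w /= w.sum()
    return w @ thetas               # weighted average = posterior-mean-type estimator

# Toy usage: a concave quasi-log-likelihood peaked at (0.3, -0.2).
H = lambda th: -50.0 * ((th[0] - 0.3) ** 2 + (th[1] + 0.2) ** 2)
print(qbe(H, [(-1.0, 1.0), (-1.0, 1.0)]))  # approximately [0.3, -0.2]
```

In higher dimensions, the grid would be replaced by Monte Carlo sampling, but the stabilization trick is the same.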

Theorem 4

Let \(p\ge 1\), \(L>p+1\), \(D>{\textsf {p}}+p\). Suppose that \((\varDelta _T,\varGamma )\rightarrow ^d(\varGamma ^{1/2}\zeta ,\varGamma )\) as \(T\rightarrow \infty \), where \(\zeta \) is a \({\textsf {p}}\)-dimensional standard Gaussian random vector independent of \(\varGamma \). Moreover, suppose the following conditions are satisfied.

(i) For every \(R>0\),

    $$\begin{aligned} {{\mathbb {Z}}}_T|_{\overline{B(R)}}&\rightarrow ^d&{{\mathbb {Z}}}|_{\overline{B(R)}} \quad \text {in }C(\overline{B(R)}) \end{aligned}$$
    (6.6)

    as \(T\rightarrow \infty \), where \({{\mathbb {Z}}}\) is given in (6.4).

(ii) There exist positive constants \(T_0\), \(C_1\) and \(C_2\) such that

    $$\begin{aligned} P\bigg [\sup _{{{\mathbb {V}}}_T(r)}{{\mathbb {Z}}}_T\ge C_1r^{-D}\bigg ]\le & {} C_2r^{-L} \end{aligned}$$
    (6.7)

    for all \(T\ge T_0\) and \(r>0\).

(iii) For some \(T_0>0\),

    $$\begin{aligned} \sup _{T\ge T_0}E\bigg [\bigg (\int _{{{\mathbb {U}}}_T}{{\mathbb {Z}}}_T(u)\mathrm{d}u\bigg )^{-1}\bigg ]< & {} \infty . \end{aligned}$$
    (6.8)

Then,

$$\begin{aligned} E\big [f({\hat{u}}_T^B)\big ]\rightarrow & {} E\big [f({\hat{u}})\big ] \end{aligned}$$
(6.9)

as \(T\rightarrow \infty \) for any continuous function \(f:{{\mathbb {R}}}^{\textsf {p}}\rightarrow {{\mathbb {R}}}\) satisfying \(\sup _{u\in {{\mathbb {R}}}^{\textsf {p}}}\big \{(1+|u|)^{-p}|f(u)|\big \}<\infty \).

Proof

We will give a brief summary of the proof; see Yoshida (2011) for details. The variable \({\hat{u}}_T^B\) has the expression

$$\begin{aligned} {\hat{u}}_T^B= & {} \bigg [\int _{{{\mathbb {U}}}_T}{{\mathbb {Z}}}_T(u)\varpi (\theta ^*+a_Tu)\mathrm{d}u\bigg ]^{-1} \int _{{{\mathbb {U}}}_T} u{{\mathbb {Z}}}_T(u)\varpi (\theta ^*+a_Tu)\mathrm{d}u. \end{aligned}$$

By (6.7) and the properties of \(\varpi \), we can approximate \({\hat{u}}_T^B\) by

$$\begin{aligned} {\tilde{u}}_T= & {} \bigg [\int _{B(R)}{{\mathbb {Z}}}_T(u)\mathrm{d}u\bigg ]^{-1} \int _{B(R)} u{{\mathbb {Z}}}_T(u)\mathrm{d}u \end{aligned}$$

at the cost of a small error when R is large. By (6.6),

$$\begin{aligned} {\tilde{u}}_T&\rightarrow ^d&\bigg [\int _{B(R)}{{\mathbb {Z}}}(u)\mathrm{d}u\bigg ]^{-1} \int _{B(R)} u{{\mathbb {Z}}}(u)\mathrm{d}u=:{\hat{u}}(R). \end{aligned}$$

The random field \({{\mathbb {Z}}}\) inherits a tail estimate from (6.7), and hence \({\hat{u}}(R)\) is approximated by

$$\begin{aligned} \bigg [\int _{{{\mathbb {R}}}^{\textsf {p}}}{{\mathbb {Z}}}(u)\mathrm{d}u\bigg ]^{-1}\int _{{{\mathbb {R}}}^{\textsf {p}}} u{{\mathbb {Z}}}(u)\mathrm{d}u \>=\>\varGamma ^{-1/2}\zeta \>=\>{\hat{u}}. \end{aligned}$$

Combining these estimates, we can conclude \({\hat{u}}_T^B\rightarrow ^d{\hat{u}}\) as \(T\rightarrow \infty \). Convergence of the expectation is a consequence of uniform integrability of \(|{\hat{u}}_T^B|^p\) ensured by (6.7). \(\square \)

Remark 3

(a) It is possible to relax the conditions of Theorem 4 so as to ensure only the convergence \({\hat{u}}^B_T\rightarrow {\hat{u}}\). (b) In Theorem 4, if \(\varDelta _T\rightarrow ^d\varGamma ^{1/2}\zeta \) \(\mathcal{F}\)-stably, then \({\hat{u}}^B_T\rightarrow {\hat{u}}\) \(\mathcal{F}\)-stably. (c) Usually, condition (iii) of Theorem 4 is easily verified; see Lemma 2 of Yoshida (2011). (d) We refer the reader to Yoshida (2021) for a simplified quasi-likelihood analysis for a locally asymptotically quadratic random field.

The following result follows from Theorem 4.

Theorem 5

Let \(p>{\textsf {p}}\) and

$$\begin{aligned} L>\max \bigg \{p+1,{\textsf {p}}(\beta -\rho _1),{\textsf {p}}(2\beta _1(1-\alpha )^{-1} -\rho _1)\bigg \}. \end{aligned}$$

Suppose that Conditions [L1]-[L4] are satisfied, that \(E[|\varGamma |^p]<\infty \), and that \((\varDelta _T,\varGamma )\rightarrow ^d(\varGamma ^{1/2}\zeta ,\varGamma )\) as \(T\rightarrow \infty \), where \(\zeta \) is a \({\textsf {p}}\)-dimensional standard Gaussian random vector independent of \(\varGamma \). Then,

$$\begin{aligned} E\big [f({\hat{u}}_T^B)\big ]\rightarrow & {} {{\mathbb {E}}}\big [f({\hat{u}})\big ]\quad (T\rightarrow \infty ) \end{aligned}$$

for \({\hat{u}}=\varGamma ^{-1/2}\zeta \) and for any \(f\in C({{\mathbb {R}}}^{\textsf {p}})\) satisfying \(\lim _{|u|\rightarrow \infty }|u|^{-p}|f(u)|<\infty \).

Proof

The convergence (6.6) holds, as shown in the proof of Theorem 3. The polynomial type large deviation inequality (6.7) is a consequence of Theorem 2; the number D is arbitrary. Fix \(\delta >0\). Then, there exists \(T_0>0\) such that \(B(\delta )\subset {{\mathbb {U}}}_T\) for all \(T\ge T_0\). In particular, \(r_T(u)\) admits the representation (6.5) for all \(u\in B(\delta )\). Since \(M_3=L(\beta -\rho _1)^{-1}>{\textsf {p}}\), \(M_4=L(2\beta _1(1-\alpha )^{-1}-\rho _1)^{-1}>{\textsf {p}}\) and \(p>{\textsf {p}}\), we have \(p':=\min \{M_3,M_4,p\}>{\textsf {p}}\) and

$$\begin{aligned} E[|r_T(u)|^{p'}]\le & {} C_0|u|^{p'}\quad (u\in B(\delta )) \end{aligned}$$

for some constant \(C_0\). Then Lemma 2 of Yoshida (2011) gives the estimate

$$\begin{aligned} E\bigg [\bigg (\int _{B(\delta )}{{\mathbb {Z}}}_T(u)\mathrm{d}u\bigg )^{-1}\bigg ]\le & {} C_1 \end{aligned}$$

with a constant \(C_1\) depending on \((p',{\textsf {p}},\delta ,C_0)\) and on the suprema appearing in [L4](i), (iii) and (iv), but independent of \(T\ge T_0\). Therefore, (6.8) holds true. Thus, we can apply Theorem 4 to conclude the proof. \(\square \)

Table 3 List of stocks investigated in this paper. The sample covers the whole year 2015, representing roughly 230 trading days for all stocks except LAGA.PA and PEUP.PA, for which roughly 70 trading days are missing

7 List of stocks

Table 3 lists all the stocks investigated in the paper. For each stock, the total number of days available in the sample is given. Note that, owing to limits on the computation time available for this paper, some trading days for a few very liquid stocks were not used for some of the marked ratio models tested in Sect. 4.4. In such cases, only the trading days for which all models have been computed are used; their number is reported in the last column of the table.

8 QAIC and QBIC selection: detailed results

See Tables 4, 5, 6.

Table 4 Side determination—Frequency of QAIC and QBIC selection (and their difference) of each tested model, averaged across stocks
Table 5 Bid aggressiveness determination—Frequency of QAIC and QBIC selection (and their difference) of each tested model, averaged across stocks
Table 6 Ask aggressiveness determination—Frequency of QAIC and QBIC selection (and their difference) of each tested model, averaged across stocks
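As a purely schematic illustration (ours, not the paper's code), selection frequencies of this kind can be tabulated per trading day from the maximized quasi-log-likelihoods of the competing models. The penalty forms below, an Akaike-type \(2\times \dim \) and a Schwarz-type \(\dim \times \log T\), as well as the `fits` data structure, are assumptions of the sketch; the paper's own QAIC/QBIC definitions take precedence.

```python
# Schematic sketch (ours) of tabulating per-day QAIC/QBIC selection
# frequencies. Assumed penalties: Akaike-type 2*dim and Schwarz-type
# dim*log(T); the paper's own QAIC/QBIC definitions take precedence.
import numpy as np

def selection_frequencies(fits):
    """fits: list of days; each day maps model -> (maximized H_T, dim, T)."""
    counts = {}  # model -> [QAIC wins, QBIC wins]
    for day in fits:
        qaic = {m: -2.0 * h + 2.0 * d for m, (h, d, T) in day.items()}
        qbic = {m: -2.0 * h + d * np.log(T) for m, (h, d, T) in day.items()}
        counts.setdefault(min(qaic, key=qaic.get), [0, 0])[0] += 1
        counts.setdefault(min(qbic, key=qbic.get), [0, 0])[1] += 1
    n = len(fits)
    return {m: (a / n, b / n) for m, (a, b) in counts.items()}
```

Averaging the per-stock output of such a routine across stocks yields rows comparable in format to Tables 4, 5 and 6.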