1 Introduction

The limit order book is the central structure that aggregates the buy and sell intentions of all market participants on a given exchange. This structure typically evolves at a very high frequency: on the Euronext Paris stock exchange, the limit order book of a common stock is modified several hundred thousand times per day. Among these changes, thousands or tens of thousands of events correspond to a transaction between two participants. The rest of the events indicate either the intention to buy/sell at a limit price lower/higher than the best available price, or the cancellation of such intentions (Abergel et al. 2016).

Empirical observation of high-frequency events on a limit order book may reveal irregular interval times (durations), clustering, intraday seasonality, etc. (Chakraborti et al. 2011). Stochastic point processes are, thus, natural candidates for the modeling of such systems and their time series (Hautsch 2011). In particular, Hawkes processes have been successfully suggested for the modeling of limit order book events (Bowsher 2007; Large 2007; Bacry et al. 2012, 2013; Muni Toke and Pomponio 2012; Lallouache and Challet 2016; Lu and Abergel 2018).

One drawback of such models is the difficulty of accounting for high intraday variability. Another drawback is the lack of state-dependency: the observed state of the limit order book does not influence the dynamics of the events. One may try to include state-dependency by specifying a fully parametric model (Muni Toke and Yoshida 2017), which is a cumbersome solution. Another solution is to extend the Hawkes framework with marks (Rambaldi et al. 2017) or with state-dependent kernels (Morariu-Patrichi and Pakkanen 2018). Muni Toke and Yoshida (2020) have shown that state-dependency can be efficiently tackled by a multiplicative model with two components: a shared baseline intensity and a state-dependent process-specific component. An intensity ratio model then allows for efficient estimation of state-dependency. Several microstructure examples are worked out, including a ratio model for the prediction of the sign of the next trade.

In this work, we extend the framework of Muni Toke and Yoshida (2020) to some cases of marked point processes, by adding a third term to the multiplicative definition of the intensity, which accounts for some mark distribution. We use this extension to deepen our investigation of limit order book data. In financial microstructure, one of the characteristics of an order sent to a financial exchange is its aggressiveness (Biais et al. 1995; Harris and Hasbrouck 1996). We will say here that an order is aggressive if it moves the price. A ratio model with marks can, thus, be used to analyze both the side (bid or ask) and aggressiveness of market orders.

The rest of the paper is organized as follows. In Sect. 2, we show that some marked models can be viewed as combinations of intensity ratios of non-marked processes. Section 3 defines the quasi-likelihood maximum and Bayesian estimators and proceeds to the analysis of the estimation. Theorem 1 states the convergence result, and a numerical illustration follows. We then turn to the main financial application in Sect. 4, and show how the two-step ratio model can efficiently predict (in a theoretical setting) the sign and aggressiveness of the next trade. Finally, the full proof of Theorem 1 is given in Sect. 5, and for completeness, elements of quasi-likelihood analysis are recalled in Sect. 6.

2 Marked process models as two-step ratio models

Let \({{\mathbb {I}}}=\{0,1,...,{\bar{i}}\}\) and \({{\mathbb {R}}}_+=[0,\infty )\). We consider certain marked point processes \(N^i=(N^i_t)_{t\in {{\mathbb {R}}}_+}\), \(i\in {{\mathbb {I}}}\). For each \(i\in {{\mathbb {I}}}\), let \({\bar{k}}_i\) be a positive integer, and let \({{\mathbb {K}}}_i=\{0,1,...,{\bar{k}}_i\}\) be the space of marks for the process \(N^i\). We denote by \(N^{i,k_i}=(N^{i,k_i}_t)_{t\in {{\mathbb {R}}}_+}\) the process counting events of type \(i\) with mark \(k_i\in {{\mathbb {K}}}_i\), so that obviously \(N^i=\sum _{k_i\in {{\mathbb {K}}}_i}N^{i,k_i}\). Let \({\check{{{\mathbb {I}}}}}=\cup _{i\in {{\mathbb {I}}}}\big (\{i\}\times {{\mathbb {K}}}_{i}\big )\). We assume that the intensity of the process \(N^i\) with mark \(k_i\), i.e., the intensity of \(N^{i,k_i}\), is given by

$$\begin{aligned} \lambda ^{i,k_i}(t,\vartheta ^i,\varrho ^{i})= & {} \lambda _0(t)\exp \bigg (\sum _{j\in {{\mathbb {J}}}} \vartheta ^i_jX_j(t)\bigg )\>p^{k_i}_i(t,\varrho ^i) \end{aligned}$$

at time t for \((i,k_i)\in {\check{{{\mathbb {I}}}}}\), where \(\vartheta ^i=(\vartheta ^i_j)_{j\in {{\mathbb {J}}}}\) (\(i\in {{\mathbb {I}}}\)) and \(\varrho ^i\) (\(i\in {{\mathbb {I}}}\)) are unknown parameters. More precisely, given a probability space \((\varOmega ,\mathcal{F},P)\) equipped with a right-continuous filtration \({{\mathbb {F}}}=(\mathcal{F}_t)_{t\in {{\mathbb {R}}}_+}\), \(\lambda _0=(\lambda _0(t))_{t\in {{\mathbb {R}}}_+}\) is a non-negative predictable process, \(X_j=(X_j(t))_{t\in {{\mathbb {R}}}_+}\) is a predictable process for each \(j\in {{\mathbb {J}}}=\{1,...,{\bar{j}}\}\), and \(p_i^{k_i}(t,\varrho ^i)\) is a non-negative predictable process for each \((i,k_i)\in {\check{{{\mathbb {I}}}}}\). Later, we will impose a condition ensuring that the mapping \(t\mapsto \lambda ^{i,k_i}(t,\vartheta ^i,\varrho ^i)\) is locally integrable with respect to \(\mathrm{d}t\). We assume that \(N^{i,k_i}_0=0\) and that, for each \((i,k_i)\in {\check{{{\mathbb {I}}}}}\), the process

$$\begin{aligned} {\tilde{N}}^{i,k_i}_t= & {} N^{i,k_i}_t-\int _0^t\lambda ^{i,k_i}(s,(\vartheta ^i)^*,(\varrho ^i)^*)\mathrm{d}s \end{aligned}$$

is a local martingale for a value \(\big ((\vartheta ^i)^*,(\varrho ^i)^*\big )\) of the parameter \(\big (\vartheta ^i,\varrho ^i\big )\). We assume that the counting processes \(N^{i,k_i}\) (\(i\in {{\mathbb {I}}};\>k_i\in {{\mathbb {K}}}_i\)) have no common jumps.

In what follows, we consider the processes \(p^{k_i}_i(t,\varrho ^i)\) such that

$$\begin{aligned} \sum _{k_i\in {{\mathbb {K}}}_i}p^{k_i}_i(t,\varrho ^i)= & {} 1 \end{aligned}$$
(2.1)

for \(i\in {{\mathbb {I}}}\). Then, the \((1+{\bar{k}}_i)\)-dimensional process \((p^{k_i}_i(t,\varrho ^i))_{k_i\in {{\mathbb {K}}}_i}\) gives the conditional distribution of the mark \(k_i\) given that an event of type \(i\) occurs. Under (2.1), the intensity process of \(N^i\) becomes

$$\begin{aligned} \lambda ^i(t,\vartheta ^i)= & {} \sum _{k_i\in {{\mathbb {K}}}_i}\lambda ^{i,k_i}(t,\vartheta ^i,\varrho ^i) \>=\>\lambda _0(t)\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\vartheta ^i_jX_j(t)\bigg ). \end{aligned}$$
(2.2)

The process \(\lambda _0\) is called the baseline intensity. Its structure will not be specified; in other words, \(\lambda _0\) will be treated as a nuisance parameter, in contrast with the Cox regression approach of Muni Toke and Yoshida (2017). In finance, for example, the baseline intensity may represent the global market activity, and its irregular change may limit the reliability of estimation procedures and predictions for any model fitted to it. Muni Toke and Yoshida (2020) took an approach with an unstructured baseline intensity process and showed the advantages of such modeling. Statistically, the process \({{\mathbb {X}}}(t)=(X_j(t))_{j\in {{\mathbb {J}}}}\) is an observable covariate process. Since the effect of these covariate processes on the amplitude of \(\lambda ^i(t,\vartheta ^i)\) is contaminated by the unobservable and structurally unknown baseline intensity, a more interesting measure of the dependency of \(\lambda ^i(t,\vartheta ^i)\) on \({{\mathbb {X}}}(t)\) is the ratio

$$\begin{aligned} \lambda ^i(t,\vartheta ^i)/\sum _{i'\in {{\mathbb {I}}}}\lambda ^{i'}(t,\vartheta ^{i'}) \end{aligned}$$

for \(i\in {{\mathbb {I}}}\). Thus, we introduce the difference parameters \(\theta ^i_j=\vartheta ^i_j-\vartheta ^0_j\) (\(i\in {{\mathbb {I}}},\>j\in {{\mathbb {J}}}\)), (\(\theta ^0_j=0\) in particular) and consider the ratios

$$\begin{aligned} r^i(t,\theta )= & {} \frac{\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\vartheta ^i_jX_j(t)\bigg )}{\sum _{i'\in {{\mathbb {I}}}}\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\vartheta ^{i'}_jX_j(t)\bigg )} \>=\>\frac{\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\theta ^i_jX_j(t)\bigg )}{1+\sum _{i'\in {{\mathbb {I}}}_0}\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\theta ^{i'}_jX_j(t)\bigg )} \end{aligned}$$
(2.3)

for \(i\in {{\mathbb {I}}}\), where \(\theta =(\theta ^i_j)_{i\in {{\mathbb {I}}}_0,j\in {{\mathbb {J}}}}\) with \({{\mathbb {I}}}_0={{\mathbb {I}}}\setminus \{0\}=\{1,...,{\bar{i}}\}\).

In this paper, we further assume that the factor \(p^{k_i}_i(t,\varrho ^i)\) is given by

$$\begin{aligned} p^{k_i}_i(t,\varrho ^i)= & {} \frac{\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\varrho ^{i,k_i}_{j_i}Y^i_{j_i}(t)\bigg )}{\sum _{k_i'\in {{\mathbb {K}}}_i}\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i} \varrho ^{i,k_i'}_{j_i}Y^i_{j_i}(t)\bigg )} \end{aligned}$$

for \((i,k_i)\in {\check{{{\mathbb {I}}}}}\), \({{\mathbb {J}}}_i=\{1,...,{\bar{j}}_i\}\). Obviously, \(p^{k_i}_i(t,\varrho ^i)\) coincides with \(q^{k_i}_i(t,\rho ^i)\) defined by

$$\begin{aligned} q^{k_i}_i(t,\rho ^i)= & {} \frac{\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\rho ^{i,k_i}_{j_i}Y^i_{j_i}(t)\bigg )}{1+\sum _{k_i'\in {{\mathbb {K}}}_{i,0}}\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i} \rho ^{i,k_i'}_{j_i}Y^i_{j_i}(t)\bigg )} \end{aligned}$$
(2.4)

for \((i,k_i)\in {\check{{{\mathbb {I}}}}}\), where \(\rho ^{i,k_i}_{j_i}=\varrho ^{i,k_i}_{j_i}-\varrho ^{i,0}_{j_i}\) \((k_i\in {{\mathbb {K}}}_i,\>j_i\in {{\mathbb {J}}}_i,\>i\in {{\mathbb {I}}})\), with \(\rho ^{i,0}_{j_i}=0\) in particular, and \(\rho ^i=(\rho ^{i,k_i}_{j_i})_{k_i\in {{\mathbb {K}}}_{i,0},j_i\in {{\mathbb {J}}}_i}\) (\(i\in {{\mathbb {I}}}\)) with \({{\mathbb {K}}}_{i,0}={{\mathbb {K}}}_i\setminus \{0\}=\{1,...,{\bar{k}}_i\}\). The predictable processes \((Y^i_{j_i}(t))_{t\in {{\mathbb {R}}}_+}\) (\(i\in {{\mathbb {I}}},\>j_i\in {{\mathbb {J}}}_i\)) are observable covariate processes, \({{\mathbb {J}}}_i\) being a finite index set. This is a multinomial logistic regression model.
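Both ratio families (2.3) and (2.4) are softmax transforms of linear predictors in the covariates. As a minimal illustration (our own sketch, not part of the original paper; the function name and array shapes are conventions of ours), the following Python snippet evaluates such ratios in a numerically stable way:

```python
import numpy as np

def ratios(x, theta):
    """Evaluate softmax ratios such as r^i(t, theta) in Eq. (2.3).

    x     : array (j_bar,), covariate values X_j(t) at time t.
    theta : array (i_bar, j_bar), difference parameters theta^i_j
            (the reference class i = 0 has linear predictor 0).
    Returns an array (1 + i_bar,) of ratios summing to one; the same
    function evaluates q^{k_i}_i of Eq. (2.4) with (Y^i(t), rho^i).
    """
    z = np.concatenate(([0.0], theta @ x))  # linear predictors, class 0 first
    z -= z.max()                            # stabilize before exponentiating
    w = np.exp(z)
    return w / w.sum()
```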

Let \(\varTheta \) be a bounded open convex set in \({{\mathbb {R}}}^{\textsf {p}}\) with \({\textsf {p}}={\bar{i}}\>{\bar{j}}\). For each \(i\in {{\mathbb {I}}}\), \(\mathcal{R}_i\) denotes a bounded open convex set in \({{\mathbb {R}}}^{{\textsf {p}}_i}\) with \({\textsf {p}}_i={\bar{j}}_i\>{\bar{k}}_i\). Write \(\rho =(\rho ^i)_{i\in {{\mathbb {I}}}}\). Let \(\mathcal{R}=\varPi _{i\in {{\mathbb {I}}}}\mathcal{R}_i\). We will consider \({\overline{\varTheta }}\times {\overline{\mathcal{R}}}\) as the parameter space of \((\theta ,\rho )\).

Remark 1

The marked ratio model

$$\begin{aligned} \lambda ^{i,k_i}(t,\vartheta ^i,\varrho ^{i})= & {} \lambda _0(t)\exp \bigg (\sum _{j\in {{\mathbb {J}}}} \vartheta ^i_jX_j(t)\bigg )\> \frac{\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\varrho ^{i,k_i}_{j_i}Y^i_{j_i}(t)\bigg )}{\sum _{k_i'\in {{\mathbb {K}}}_i}\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\varrho ^{i,k_i'}_{j_i} Y^i_{j_i}(t)\bigg )} \end{aligned}$$

is in general not equivalent to a non-marked ratio model in larger dimension, in which we would write the intensity of the counting process of events of type \(i\in {\mathbb {I}}\) with mark \(k_i\in {\mathbb {K}}_i\) as

$$\begin{aligned} \lambda ^{i,k_i}(t,\vartheta ^{i,k_i})= & {} {\tilde{\lambda }}_0(t)\exp \bigg (\sum _{j\in {\tilde{{{\mathbb {J}}}}}} \vartheta ^{i,k_i}_jZ_j(t)\bigg ) \end{aligned}$$

for some covariate processes \(Z_j, j\in {\tilde{{{\mathbb {J}}}}}\). Equivalence of the models would require these expressions to coincide for some sets of covariates and parameters. However, if \(Z_j(t)=0\) for all \(j\in {\tilde{{{\mathbb {J}}}}}\), then necessarily \(X_j(t)=0\) for all \(j\in {{\mathbb {J}}}\) and \(Y^i_{j_i}(t)=0\) for all \(i\in {\mathbb {I}}\) and \(j_i\in {\mathbb {J}}_i\). This in turn implies \( \frac{1}{|{\mathbb {K}}_i|} = \frac{{\tilde{\lambda }}_0(t)}{\lambda _0(t)}\) for all \(i\in {\mathbb {I}}\), which is generally not true. In Sect. 4.5, a non-marked ratio model is used as a benchmark to assess the performances of the marked ratio model. Prediction performances are indeed shown to be different.

3 Quasi-likelihood estimation of two-step ratio model

3.1 Quasi-maximum likelihood estimator and quasi-Bayesian estimator

The two-step marked ratio model consists of the two kinds of ratio models (2.3) and (2.4). Estimation of this model can be carried out with multiple successive ratio models.

In the first step, we consider the parameter \(\theta =(\theta ^i_j)_{i\in {{\mathbb {I}}}_0,j\in {{\mathbb {J}}}}\) and the ratios (2.3) for \(i\in {{\mathbb {I}}}\). The quasi-log-likelihood based on observations on [0, T] for this ratio model is

$$\begin{aligned} {{\mathbb {H}}}_T(\theta )= & {} \sum _{i\in {{\mathbb {I}}}}\int _0^T\log r^i(t,\theta )\>\mathrm{d}N^i_t. \end{aligned}$$
(3.1)

This comes from the multinomial logistic regression. A quasi-maximum likelihood estimator (QMLE) for \(\theta \) is a measurable mapping \({\hat{\theta }}_T^M:\varOmega \rightarrow {\overline{\varTheta }}\) satisfying

$$\begin{aligned} {{\mathbb {H}}}_T({\hat{\theta }}_T^M)= & {} \max _{\theta \in {\overline{\varTheta }}}{{\mathbb {H}}}_T(\theta ) \end{aligned}$$

for all \(\omega \in \varOmega \).

In the second step, we consider the ratios (2.4) and the associated quasi-log-likelihood

$$\begin{aligned} {{\mathbb {H}}}_T^{(i)}(\rho ^i)= & {} \sum _{k_i\in {{\mathbb {K}}}_i} \int _0^T\log q^{k_i}_i(t,\rho ^i)\>\mathrm{d}N^{i,k_i}_t \end{aligned}$$
(3.2)

for \(i\in {{\mathbb {I}}}\). Then, a measurable mapping \({\hat{\rho }}^{i,M}_T:\varOmega \rightarrow {\overline{\mathcal{R}}}_i\) is called a quasi-maximum likelihood estimator (QMLE) for \(\rho ^i\) if

$$\begin{aligned} {{\mathbb {H}}}_T^{(i)}({\hat{\rho }}^{i,M}_T)= & {} \max _{\rho ^i\in {\overline{\mathcal{R}}}_i}{{\mathbb {H}}}_T^{(i)}(\rho ^i). \end{aligned}$$

It is possible to pool these estimating functions into the single estimating function

$$\begin{aligned} {{\mathbb {H}}}_T(\theta ,\rho )= & {} {{\mathbb {H}}}_T(\theta ) + \sum _{i\in {{\mathbb {I}}}}{{\mathbb {H}}}_T^{(i)}(\rho ^i). \end{aligned}$$
(3.3)

In other words,

$$\begin{aligned} {{\mathbb {H}}}_T(\theta ,\rho )= & {} \sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i} \int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\>\mathrm{d}N^{i,k_i}_t. \end{aligned}$$
(3.4)

The collection of QMLEs \(\big ({\hat{\theta }}_T^M,({\hat{\rho }}_T^{i,M})_{i\in {{\mathbb {I}}}}\big )\) is a QMLE for \({{\mathbb {H}}}_T(\theta ,\rho )\). The use of \({{\mathbb {H}}}_T(\theta ,\rho )\) is convenient when we consider the asymptotic distribution of the estimators \({\hat{\theta }}_T^M\) and \({\hat{\rho }}_T^{i,M}\) (\(i\in {{\mathbb {I}}}\)) jointly.
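In practice, each step is a multinomial logistic quasi-likelihood, so the maximization reduces to a smooth concave optimization over the covariate values observed at event times. A minimal sketch (our own illustration; function and argument names are assumptions of ours):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def neg_qll(theta_flat, types, X, i_bar, j_bar):
    """Negative quasi-log-likelihood (3.1): -sum_n log r^{i_n}(t_n, theta).

    types : integer array (n,), type i in {0,...,i_bar} of each event.
    X     : array (n, j_bar), covariates observed just before each event.
    """
    theta = theta_flat.reshape(i_bar, j_bar)
    z = np.hstack([np.zeros((len(types), 1)), X @ theta.T])  # class 0 reference
    log_r = z - logsumexp(z, axis=1, keepdims=True)
    return -log_r[np.arange(len(types)), types].sum()

def qmle(types, X, i_bar, j_bar):
    """First-step QMLE of theta; the second-step estimators of each rho^i
    are obtained with the same routine applied to (marks, Y^i)."""
    res = minimize(neg_qll, np.zeros(i_bar * j_bar),
                   args=(types, X, i_bar, j_bar), method="BFGS")
    return res.x.reshape(i_bar, j_bar)
```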

The quasi-Bayesian estimator (QBE) \(\big ({\hat{\theta }}_T^B,({\hat{\rho }}_T^{i,B})_{i\in {{\mathbb {I}}}}\big )\) is defined by

$$\begin{aligned} {\hat{\theta }}_T^B= & {} \bigg [\int _{\varTheta \times \mathcal{R}}\exp \big ({{\mathbb {H}}}_T(\theta ,\rho )\big )\> \varpi (\theta ,\rho )\>\mathrm{d}\theta \mathrm{d}\rho \bigg ]^{-1} \nonumber \\&\times \int _{\varTheta \times \mathcal{R}}\theta \exp \big ({{\mathbb {H}}}_T(\theta ,\rho )\big )\> \varpi (\theta ,\rho )\>\mathrm{d}\theta \mathrm{d}\rho \end{aligned}$$
(3.5)

and

$$\begin{aligned} {\hat{\rho }}^{i,B}_T= & {} \bigg [\int _{\varTheta \times \mathcal{R}}\exp \big ({{\mathbb {H}}}_T(\theta ,\rho )\big )\> \varpi (\theta ,\rho )\>\mathrm{d}\theta \mathrm{d}\rho \bigg ]^{-1} \nonumber \\&\times \int _{\varTheta \times \mathcal{R}}\rho ^i\exp \big ({{\mathbb {H}}}_T(\theta ,\rho )\big )\> \varpi (\theta ,\rho )\>\mathrm{d}\theta \mathrm{d}\rho \end{aligned}$$
(3.6)

for a prior probability density \(\varpi (\theta ,\rho )\) on \(\varTheta \times \mathcal{R}\). We assume that \(\varpi :\varTheta \times \mathcal{R}\rightarrow {{\mathbb {R}}}_+\) is continuous and

$$\begin{aligned} 0<\inf _{(\theta ,\rho )\in \varTheta \times \mathcal{R}}\varpi (\theta ,\rho ) \le \sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}}\varpi (\theta ,\rho )<\infty . \end{aligned}$$
(3.7)

Since \({{\mathbb {H}}}_T(\theta )\) and \({{\mathbb {H}}}_T^{(i)}(\rho ^i)\) have no common parameters, the maximization of \({{\mathbb {H}}}_T(\theta ,\rho )\) with respect to the parameters \(\theta \) and \(\rho ^i\) \((i\in {{\mathbb {I}}})\) can be carried out separately. However, these components cannot always be treated individually for the QBE. If \(\varpi (\theta ,\rho )\) is a product of prior densities, \(\varpi (\theta ,\rho )=\varpi '(\theta )\varPi _{i\in {{\mathbb {I}}}}\varpi ^i(\rho ^i)\), then each integral in (3.5) and (3.6) factorizes and we can compute \({\hat{\theta }}_T^B\) and \({\hat{\rho }}_T^{i,B}\) (\(i\in {{\mathbb {I}}}\)) separately:

$$\begin{aligned} {\hat{\theta }}_T^B= & {} \bigg [\int _{\varTheta }\exp \big ({{\mathbb {H}}}_T(\theta )\big )\>\varpi '(\theta )\>d \theta \bigg ]^{-1} \int _{\varTheta }\theta \exp \big ({{\mathbb {H}}}_T(\theta )\big )\>\varpi '(\theta )\>d \theta \end{aligned}$$

and

$$\begin{aligned} {\hat{\rho }}^{i,B}_T= & {} \bigg [\int _{\mathcal{R}_i}\exp \big ({{\mathbb {H}}}_T^{(i)}(\rho ^i)\big )\>\varpi ^i(\rho ^i)\>d \rho ^i\bigg ]^{-1} \int _{\mathcal{R}_i}\rho ^i\exp \big ({{\mathbb {H}}}_T^{(i)}(\rho ^i)\big )\>\varpi ^i(\rho ^i)\>d \rho ^i \end{aligned}$$

for \(i\in {{\mathbb {I}}}\).
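Under a product prior, each QBE is thus a posterior mean with respect to the density proportional to \(\exp ({{\mathbb {H}}}_T)\varpi \), which can be approximated by standard MCMC. A minimal random-walk Metropolis sketch, assuming (for illustration only) a uniform prior on a box; names and default values are ours:

```python
import numpy as np

def quasi_bayes(log_h, dim, bound=5.0, n_iter=20_000, step=0.05, seed=0):
    """Posterior-mean QBE for a uniform prior on [-bound, bound]^dim,
    targeting the density proportional to exp(H_T).  `log_h` maps a
    parameter vector to the quasi-log-likelihood, e.g. Eq. (3.1)."""
    rng = np.random.default_rng(seed)
    x, lx = np.zeros(dim), log_h(np.zeros(dim))
    draws = []
    for _ in range(n_iter):
        y = x + step * rng.standard_normal(dim)
        if np.all(np.abs(y) <= bound):           # stay in the prior support
            ly = log_h(y)
            if np.log(rng.uniform()) < ly - lx:  # Metropolis acceptance
                x, lx = y, ly
        draws.append(x)
    return np.mean(draws[n_iter // 2:], axis=0)  # posterior mean after burn-in
```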

3.2 Quasi-likelihood analysis

Let \({{\mathbb {X}}}(t)=(X_j(t))_{j\in {{\mathbb {J}}}}\) and let \({{\mathbb {Y}}}^i(t)=\big (Y^i_{j_i}(t)\big )_{j_i\in {{\mathbb {J}}}_i}\) for \(i\in {{\mathbb {I}}}\). We consider the following conditions.

[M1] The process \(\big (\lambda _0(t), {{\mathbb {X}}}(t),({{\mathbb {Y}}}^i(t))_{i\in {{\mathbb {I}}}}\big )\) is a stationary process, and the random variables \(\lambda _0(0)\), \(\exp (|X_j(0)|)\) and \(\exp (|Y^i_{j_i}(0)|)\) belong to \(L^{\infty \text {--}}=\cap _{p>1}L^p\) for \(j\in {{\mathbb {J}}}\), \(j_i\in {{\mathbb {J}}}_i\) and \(i\in {{\mathbb {I}}}\).

Condition [M1] is not restrictive since the covariates can often be regarded as bounded in applications.

The alpha mixing coefficient \(\alpha (h)\) is defined by

$$\begin{aligned} \alpha (h)= & {} \sup _{t\in {{\mathbb {R}}}_+}\sup _{\genfrac{}{}{0.0pt}{}{A\in \mathcal{B}_{[0,t]}}{B\in \mathcal{B}_{[t+h,\infty )}}} \big |P[A\cap B]-P[A]P[B]\big |, \end{aligned}$$

where for \(I\subset {{\mathbb {R}}}_+\), \(\mathcal{B}_I\) denotes the \(\sigma \)-field generated by \(\big (\lambda _0(t),(X_j(t))_{j\in {{\mathbb {J}}}}, (Y^{i}_{j_i}(t))_{i\in {{\mathbb {I}}}, j_i\in {{\mathbb {J}}}_i}\big )_{t\in I}\).

[M2] The alpha mixing coefficient \(\alpha (h)\) is rapidly decreasing in the sense that \(\alpha (h)h^L\rightarrow 0\) as \(h\rightarrow \infty \) for every \(L>0\).

In the two-step ratio model, the category \((i,k_i)\) is selected by two successive multinomial draws, each of sample size one. First, the class \(i\in {{\mathbb {I}}}\) is selected when \(\xi _i=1\) for some random variable

$$\begin{aligned} \xi =(\xi _0,...,\xi _{{\bar{i}}})\sim \text {Multinomial}(1;\pi _0,..., \pi _{{\bar{i}}}). \end{aligned}$$

If \(\xi _i=1\) for a class \(i\in {{\mathbb {I}}}\), then the mark \(k_i\in {{\mathbb {K}}}_i\) is chosen as \(k_i=k\) when \(\eta ^i_k=1\) for some independent random variable

$$\begin{aligned} \eta ^i=(\eta ^i_0,...,\eta ^i_{{\bar{k}}_i})\sim \text {Multinomial} (1;\pi _0',...,\pi '_{{\bar{k}}_i}). \end{aligned}$$

Denote by \(\mathsf{V}(x,\theta )\) the variance matrix of the \((1+{\overline{i}})\)-dimensional multinomial distribution \(\text {Multinomial}(1;\pi _0,\pi _1,...,\pi _{{\overline{i}}})\) with \(\pi _i={\dot{r}}^i(x,\theta )\), \(i\in {\mathbb {I}}\), where

$$\begin{aligned} {\dot{r}}^i(x,\theta )= & {} \frac{\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\vartheta ^i_jx_j\bigg )}{\sum _{i'\in {{\mathbb {I}}}}\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\vartheta ^{i'}_jx_j\bigg )} \>=\>\frac{\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\theta ^i_jx_j\bigg )}{1+\sum _{i'\in {{\mathbb {I}}}_0}\exp \bigg (\sum _{j\in {{\mathbb {J}}}}\theta ^{i'}_jx_j\bigg )}, \quad x=(x_j)_{j\in {{\mathbb {J}}}}. \end{aligned}$$

Denote by \(\mathsf{V}^i(y^i,\rho ^i)\) the variance matrix of the \((1+{\overline{k}}_i)\)-dimensional multinomial distribution \(\text {Multinomial}(1;\pi _0',\pi _1',...,\pi _{{\overline{k}}_i}')\) with \(\pi _{k_i}'={\dot{q}}_i^{k_i}(y^i,\rho ^i)\), \(k_i\in {{\mathbb {K}}}_i\), where

$$\begin{aligned} {\dot{q}}_i^{k_i}(y^i,\rho ^i)&= \frac{\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\varrho ^{i,k_i}_{j_i}y^i_{j_i}\bigg )}{\sum _{k_i'\in {{\mathbb {K}}}_i}\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\varrho ^{i,k_i'}_{j_i} y^i_{j_i}\bigg )} \\&= \frac{\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i}\rho ^{i,k_i}_{j_i}y^i_{j_i}\bigg )}{1+\sum _{k_i'\in {{\mathbb {K}}}_{i,0}}\exp \bigg (\sum _{j_i\in {{\mathbb {J}}}_i} \rho ^{i,k_i'}_{j_i}y^i_{j_i}\bigg )}, \quad y^i=(y^i_{j_i})\in {{\mathbb {R}}}^{{\bar{j}}_i}\quad (i\in {{\mathbb {I}}}). \end{aligned}$$

Let us introduce some notations used in the following analysis. For a tensor \(\mathsf{T}=(\mathsf{T}_{i_1,...,i_k})_{i_1,...,i_k}\), we write

$$\begin{aligned} \mathsf{T}[u_1,...,u_k] = \mathsf{T}[u_1\otimes \cdots \otimes u_k] = \sum _{i_1,...,i_k}\mathsf{T}_{i_1,...,i_k} u_1^{i_1}\cdots u_k^{i_k} \end{aligned}$$
(3.8)

for \(u_1=(u_1^{i_1})_{i_1}\),..., \(u_k=(u_k^{i_k})_{i_k}\). Brackets \([\ ,..., \ ]\) stand for a multilinear mapping. We denote by \(u^{\otimes r}=u\otimes \cdots \otimes u\) the r times tensor product of u.

Denote by \(\partial _{(\theta ,\rho )}\) the differential operator with respect to \((\theta ,\rho )\). Let

$$\begin{aligned} \varGamma _T(\theta ,\rho )= & {} -T^{-1}\partial _{(\theta ,\rho )}^2{{\mathbb {H}}}_T(\theta ,\rho ) \end{aligned}$$

and let \(\varGamma _T=\varGamma _T(\theta ^*,\rho ^*)\). Then, as detailed in Section A.2,

$$\begin{aligned} \varGamma _T(\theta ,\rho )= & {} \text {diag}\big [ \varGamma _T(\theta ),\varGamma _T^0(\rho ^0),\varGamma _T^1(\rho ^1),...,\varGamma _T^{{\bar{i}}} (\rho ^{{\bar{i}}}) \big ], \end{aligned}$$

where

$$\begin{aligned} \varGamma _T(\theta ) [u^{\otimes 2}] = \frac{1}{T}\int _0^T \bigg (\mathsf{V}_0({\mathbb {X}}(t),\theta )\otimes {\mathbb {X}}(t)^{\otimes 2}\bigg ) [u^{\otimes 2}]\sum _{i\in {\mathbb {I}}}\mathrm{d}N^i_t \quad (u\in {{\mathbb {R}}}^{{\textsf {p}}}) \end{aligned}$$
(3.9)

with \(\mathsf{V}_0(x,\theta )=(\mathsf{V}(x,\theta )_{i,i'})_{i,i'\in {\mathbb {I}}_0}\), and

$$\begin{aligned} \varGamma ^i_T(\rho ^i) [(u^i)^{\otimes 2}] = \frac{1}{T}\int _0^T \bigg (\mathsf{V}^i_0({{\mathbb {Y}}}^i(t),\rho ^i)\otimes {{\mathbb {Y}}}^i(t)^{\otimes 2}\bigg ) [(u^i)^{\otimes 2}]\mathrm{d}N^i_t \quad (u^i\in {{\mathbb {R}}}^{{\textsf {p}}_i}) \end{aligned}$$

with \(\mathsf{V}^i_0(y^i,\rho ^i)=(\mathsf{V}^i(y^i,\rho ^i)_{k_i,k_i'})_{k_i,k_i'\in {{\mathbb {K}}}_{i,0}}\).

Let

$$\begin{aligned} \varLambda (w,x) = w\sum _{i\in {\mathbb {I}}}\exp \big (x\big [\vartheta ^{*i}\big ]\big ) \end{aligned}$$
(3.10)

for \(w\in {\mathbb {R}}_+\) and \(x\in {\mathbb {R}}^{{\overline{j}}}\).

We have

$$\begin{aligned} \mathsf{V}(x,\theta )_{i,i'}= & {} 1_{\{i=i'\}}{\dot{r}}^i(x,\theta )-{\dot{r}}^i(x,\theta ) {\dot{r}}^{i'}(x,\theta ). \end{aligned}$$

Therefore,

$$\begin{aligned} \mathsf{V}({{\mathbb {X}}}(t),\theta )_{i,i'}= & {} 1_{\{i=i'\}}r^i(t,\theta )-r^i(t,\theta )r^{i'}(t,\theta ) \end{aligned}$$
(3.11)

and \(\mathsf{V}_0({{\mathbb {X}}}(t),\theta )_{i,i'}=\mathsf{V}({{\mathbb {X}}}(t),\theta )_{i,i'}\) for \(i,i'\in {{\mathbb {I}}}_0\). Write \(\mathsf{V}_0(x)=\mathsf{V}_0(x,\theta ^*)\).

We have

$$\begin{aligned} \mathsf{V}^i(y^i,\rho ^i)_{k_i,k_i'}= & {} 1_{\{k_i=k_i'\}}{\dot{q}}_i^{k_i}(y^i,\rho ^i)-{\dot{q}}_i^{k_i} (y^i,\rho ^i){\dot{q}}_i^{k_i'}(y^i,\rho ^i). \end{aligned}$$

Hence,

$$\begin{aligned} \mathsf{V}^i({{\mathbb {Y}}}^i(t),\rho ^i)_{k_i,k_i'}= & {} 1_{\{k_i=k_i'\}}q^{k_i}_i(t,\rho ^i)-q^{k_i}_i(t,\rho ^i)q^{k_i'}_i(t,\rho ^i) \end{aligned}$$
(3.12)

and \(\mathsf{V}_0^i({{\mathbb {Y}}}^i(t),\rho ^i)_{k_i,k_i'}=\mathsf{V}^i({{\mathbb {Y}}}^i(t),\rho ^i)_{k_i,k_i'}\) for \(k_i,k_i'\in {{\mathbb {K}}}_{i,0}\). We denote \(\mathsf{V}^i_0(y^i)=\mathsf{V}^i_0(y^i,(\rho ^i)^*)\).
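Both variance matrices share the standard multinomial form \(\mathrm{diag}(\pi )-\pi \pi ^\top \); a one-line sketch (our own illustration):

```python
import numpy as np

def multinomial_variance(pi):
    """Variance matrix of Multinomial(1; pi), as in Eqs. (3.11)-(3.12):
    diag(pi) - pi pi^T, evaluated at the ratios pi = (r^i) or (q^{k_i}_i)."""
    pi = np.asarray(pi)
    return np.diag(pi) - np.outer(pi, pi)
```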

The symmetric matrices \(\varGamma (\theta )\) and \(\varGamma ^i(\rho ^i)\) are defined by

$$\begin{aligned} \varGamma (\theta )[u^{\otimes 2}]= & {} E\bigg [\bigg (\mathsf{V}_0({\mathbb {X}}(0),\theta )\otimes {\mathbb {X}}(0)^{\otimes 2}\bigg )[u^{\otimes 2}] \varLambda (\lambda _0(0),{\mathbb {X}}(0))\bigg ] \end{aligned}$$

for \(u\in {{\mathbb {R}}}^{\textsf {p}}\), and

$$\begin{aligned} \varGamma ^i(\rho ^i)[(u^i)^{\otimes 2}]= & {} E\bigg [\bigg (\mathsf{V}^i_0({{\mathbb {Y}}}^i(0),\rho ^i)\otimes {{\mathbb {Y}}}^i(0)^{\otimes 2}\bigg )[(u^i)^{\otimes 2}] \varLambda (\lambda _0(0),{{\mathbb {X}}}(0))r^i(0,\theta ^*)\bigg ] \end{aligned}$$

for \(u^i\in {{\mathbb {R}}}^{{\textsf {p}}_i}\), \(i\in {{\mathbb {I}}}\), respectively. Let \({\check{{\textsf {p}}}}={\textsf {p}}+\sum _{i\in {{\mathbb {I}}}}{\textsf {p}}_i={\bar{i}}\>{\bar{j}} +\sum _{i\in {{\mathbb {I}}}}{\bar{k}}_i{\bar{j}}_i\). The full information matrix is the \({\check{{\textsf {p}}}}\times {\check{{\textsf {p}}}}\) block diagonal matrix

$$\begin{aligned} \varGamma (\theta ,\rho )= & {} \text {diag}\big [\varGamma (\theta ),\varGamma ^0(\rho ^0), \varGamma ^1(\rho ^1),...,\varGamma ^{{\bar{i}}}(\rho ^{{\bar{i}}})\big ], \end{aligned}$$

and in particular set

$$\begin{aligned} \varGamma = \varGamma (\theta ^*,\rho ^*). \end{aligned}$$
(3.13)

An identifiability condition will be imposed.

[M3] \(\displaystyle \inf _{\theta \in \varTheta }\inf _{u\in {{\mathbb {R}}}^{\textsf {p}}:\>|u|=1}\varGamma (\theta ) [u^{\otimes 2}]>0\) and \(\displaystyle \inf _{\rho ^i\in \mathcal{R}_i}\inf _{u^i\in {{\mathbb {R}}}^{{\textsf {p}}_i}:\>|u^i|=1}\varGamma ^i(\rho ^i) [(u^i)^{\otimes 2}]>0\) for every \(i\in {{\mathbb {I}}}\).

For the QMLE \({\hat{\psi }}_T^M=({\hat{\theta }}^M_T,{\hat{\rho }}^M_T)\) and the QBE \({\hat{\psi }}_T^B=({\hat{\theta }}^B_T,{\hat{\rho }}^B_T)\) of \(\psi =(\theta ,\rho )=(\theta ,\rho ^0,\rho ^1,...,\rho ^{{\bar{i}}})\), let

$$\begin{aligned} {\hat{u}}_T^\mathsf{A}= & {} T^{1/2}\big ({\hat{\psi }}_T^\mathsf{A}-\psi ^*) \qquad (\mathsf{A}\in \{M,B\}). \end{aligned}$$

Theorem 1

Suppose that Conditions [M1], [M2] and [M3] are satisfied. Then,

$$\begin{aligned} E[f({\hat{u}}_T^\mathsf{A})]\rightarrow & {} {{\mathbb {E}}}[f(\varGamma ^{-1/2}\zeta )] \end{aligned}$$

as \(T\rightarrow \infty \) for \(\mathsf{A}\in \{M,B\}\) and every \(f\in C({{\mathbb {R}}}^{{\check{{\textsf {p}}}}})\) of at most polynomial growth, where \(\zeta \) is a \({\check{{\textsf {p}}}}\)-dimensional standard Gaussian random vector.

Example 1

As an illustration, we consider the case with two processes (\({\mathbb {I}}=\{0,1\}\)) and two marks for each process (\({\mathbb {K}}_0={\mathbb {K}}_1=\{0,1\}\)). The first state-dependent term takes into account one covariate \(X_1\) (i.e., \({\mathbb {J}}=\{1\}\)). The mark distributions both depend on another covariate \(Y_1\) (i.e., \({\mathbb {J}}_0={\mathbb {J}}_1=\{1\}\)). In this example, we assume that \(X_1\) and \(Y_1\) are independent Markov chains with values in \(\{-1,1\}\) and constant transition intensities \(\lambda _X\) and \(\lambda _Y\). We assume that \(\lambda _0\) is the intensity of a Hawkes process \((H_t)_{t\ge 0}\) with a single exponential kernel, i.e., \(\lambda _0(t)=\mu +\int _{0}^t \alpha e^{-\beta (t-s)}\,dH_s\), with \((\alpha ,\beta )\in ({\mathbb {R}}_+^*)^2\) and \(\frac{\alpha }{\beta }<1\).

The two-step ratio model estimates the parameters \((\theta ^1_1,\rho ^{0,1}_1,\rho ^{1,1}_1)\) defined as \(\theta ^1_1=\vartheta ^1_1-\vartheta ^0_1\) and \(\rho ^{i,1}_1=\varrho ^{i,1}_1-\varrho ^{i,0}_1\), \(i=0,1\). In this specific case, the matrix \(\varGamma \) of Eq. (3.13) is a \(3\times 3\) diagonal matrix, and a direct computation shows that the diagonal coefficients are

$$\begin{aligned} \varGamma _{0,0}&= \frac{\mu }{1-\frac{\alpha }{\beta }} \frac{e^{\theta ^1_1}}{1+e^{\theta ^1_1}}\left( \cosh \vartheta ^0_1 + \cosh \vartheta ^1_1 \right) , \\ \varGamma _{1,1}&= \frac{\mu }{1-\frac{\alpha }{\beta }} \frac{e^{\rho ^{0,1}_1}}{1+e^{\rho ^{0,1}_1}} \frac{e^{\theta ^1_1/2}}{1+e^{\theta ^1_1}} \left( \cosh \frac{\vartheta ^0_1+\vartheta ^1_1}{2} +\cosh \frac{3\vartheta ^1_1-\vartheta ^0_1}{2}\right) , \\ \varGamma _{2,2}&= \frac{\mu }{1-\frac{\alpha }{\beta }} \frac{e^{\rho ^{1,1}_1}}{1+e^{\rho ^{1,1}_1}} \frac{e^{\theta ^1_1/2}}{1+e^{\theta ^1_1}} \left( \cosh \frac{\vartheta ^0_1+\vartheta ^1_1}{2} +\cosh \frac{3\vartheta ^1_1-\vartheta ^0_1}{2}\right) . \end{aligned}$$
Table 1 Numerical results for the estimation of the model of Example 1

We run 1000 simulations of the processes \((N^0, N^1)\) with their marks for various values of horizon T. Numerical values used in these simulations are the following: \(\mu =0.5\), \(\alpha =1.0\), \(\beta = 2.0\), \(\lambda _X=\lambda _Y=0.5\), \(\vartheta ^0_1=-0.75\), \(\vartheta ^1_1=0.75\), \(\varrho ^{0,0}_1=-0.5\), \(\varrho ^{0,1}_1=0.5\), \(\varrho ^{1,0}_1=-1.0\), \(\varrho ^{1,1}_1=1.0\). For each simulation, we compute the quasi-maximum likelihood estimators \(({{\hat{\theta }}}^1_1,{{\hat{\rho }}}^{0,1}_1,{{\hat{\rho }}}^{1,1}_1)\) with the two-step ratios described above. Table 1 gives the mean estimators and the true values of the parameters, as well as the empirical standard deviation, compared to the theoretical values \(T^{-\frac{1}{2}}\varGamma _{i,i}^{-\frac{1}{2}}\), \(i=0,1,2\) from Theorem 1, for various values of T.For completeness, Figure 1 also plots the empirical standard deviations of the three estimators and the theoretical standard deviation \(T^{-\frac{1}{2}}\varGamma _{i,i}^{-\frac{1}{2}}\), \(i=0,1,2\) of Theorem 1, as a function of the horizon T.
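Such a simulation can be implemented by Ogata's thinning algorithm. A minimal sketch follows (our own illustration, not the authors' code; we assume, as one possible reading of Example 1, that the baseline Hawkes process \(H\) is the pooled event flow \(N^0+N^1\)):

```python
import numpy as np

def simulate_example1(T, mu, a, b, lam_X, lam_Y, vartheta, varrho, seed=0):
    """Ogata thinning for the two-step model of Example 1.
    vartheta = (vartheta^0_1, vartheta^1_1);
    varrho[i] = (varrho^{i,0}_1, varrho^{i,1}_1) for i = 0, 1.
    Returns a list of events (t, i, k)."""
    rng = np.random.default_rng(seed)
    t, g, events = 0.0, 0.0, []      # g: self-exciting part of lambda_0
    X, Y = 1, 1                      # Markov-chain covariates on {-1, +1}
    tX = rng.exponential(1 / lam_X)  # next switching times of X and Y
    tY = rng.exponential(1 / lam_Y)
    while True:
        # Dominating rate: g only decays between events and |X| = 1.
        M = (mu + g) * (np.exp(abs(vartheta[0])) + np.exp(abs(vartheta[1])))
        dt = rng.exponential(1 / M)
        t += dt
        if t >= T:
            return events
        while tX < t:                # update the covariates up to time t
            X, tX = -X, tX + rng.exponential(1 / lam_X)
        while tY < t:
            Y, tY = -Y, tY + rng.exponential(1 / lam_Y)
        g *= np.exp(-b * dt)         # kernel decay since the last candidate
        lam = (mu + g) * (np.exp(vartheta[0] * X) + np.exp(vartheta[1] * X))
        if rng.uniform() < lam / M:  # accept the candidate event
            w = np.exp(np.array(vartheta) * X)
            i = rng.choice(2, p=w / w.sum())    # side, ratio r^i(t)
            q = np.exp(np.array(varrho[i]) * Y)
            k = rng.choice(2, p=q / q.sum())    # mark, ratio p^k_i(t)
            events.append((t, i, k))
            g += a                   # excitation jump of lambda_0
```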

Fig. 1

Empirical and theoretical standard deviation of the quasi-maximum likelihood estimators \({{\hat{\theta }}}^1_1\) (left), \({{\hat{\rho }}}^{0,1}_1\) (center) and \({{\hat{\rho }}}^{1,1}_1\) (right)

Asymptotic values predicted by Theorem 1 are indeed empirically retrieved, which concludes this numerical illustration.

4 Modeling and predicting sign and aggressiveness of market orders

4.1 Intensities of the processes counting market orders

We consider the market orders submitted to a given limit order book. Let \(N^0\) be the process counting the market orders submitted on the bid side (sell market orders) and \(N^1\) the process counting the market orders submitted on the ask side (buy market orders). On each side, we further consider whether the order is an aggressive order that moves the price (labeled with mark 1), or a non-aggressive order that does not move the price (labeled with mark 0).

We assume that the intensity of an order of type \(i\in {\mathbb {I}}=\{0,1\}\) with mark \(k_i\in {\mathbb {K}}={\mathbb {K}}_0={\mathbb {K}}_1=\{0,1\}\) is

$$\begin{aligned} \lambda ^{i, k_i}(t, \vartheta ^i, \varrho ^{i}) = \lambda _0(t) \exp \left( \! \sum _{j\in {\mathbb {J}}} \vartheta ^i_j X_j(t) \!\right) \frac{\exp \left( \sum _{{j_i}\in {\mathbb {J}}_i} \varrho ^{i,k_i}_{j_i} Y^i_{j_i}(t) \right) }{\sum _{k'_i\in {\mathbb {K}}_i} \exp \left( \!\sum _{{j_i}\in {\mathbb {J}}_i} \varrho ^{i,k'_i}_{j_i} Y^i_{j_i}(t)\! \right) }. \end{aligned}$$
(4.1)

In the following applications, we will consider several possible models, defined with various sets of covariates \(X_j\), \(j\in {\mathbb {J}}\), and \(Y^i_j\), \(j\in {\mathbb {J}}_i\), \(i=0,1\). These sets will all be subsets of the following list of possible covariates (besides \(Z_0=1\), common to all models); a sketch of the computation of the Hawkes covariates follows the list:

  • \({Z_1} \): \(\frac{q^B(t)-q^A(t)}{q^B(t){+}q^A(t)}\), where \(q^B(t)\) (resp. \(q^A(t)\)) is the quantity available at the best bid (resp. ask) at time t (i.e., the imbalance);

  • \({Z_2}\): \(\epsilon (t)\), where \(\epsilon (t)\) is the sign of the last market order at time t (1 for an ask market order, \(-1\) for a bid market order);

  • \({Z_3}\): \({s(t)}\epsilon (t)\), the signed spread, where s(t) is the observed spread in currency at time t;

  • \(Z_4\): \(H^{0,1}(t) = \log \left( \mu ^{0,1} + \int _0^t \alpha ^{0,1} e^{-\beta ^{0,1}(t-s)} \mathrm{d}N^{0,1}_s \right) \) (Hawkes covariate for aggressive bid market orders)

  • \(Z_5\): \(H^{0,0}(t) = \log \left( \mu ^{0,0} + \int _0^t \alpha ^{0,0} e^{-\beta ^{0,0}(t-s)} \mathrm{d}N^{0,0}_s \right) \) (Hawkes covariate for non-aggressive bid market orders)

  • \(Z_6\): \(H^{1,1}(t) = \log \left( \mu ^{1,1} + \int _0^t \alpha ^{1,1} e^{-\beta ^{1,1}(t-s)} \mathrm{d}N^{1,1}_s \right) \) (Hawkes covariate for aggressive ask market orders)

  • \(Z_7\): \(H^{1,0}(t) = \log \left( \mu ^{1,0} + \int _0^t \alpha ^{1,0} e^{-\beta ^{1,0}(t-s)} \mathrm{d}N^{1,0}_s \right) \) (Hawkes covariate for non-aggressive ask market orders)

  • \(Z_8\): \(H^{0}(t) = \log \left( \mu ^{0} + \int _0^t \alpha ^{0} e^{-\beta ^{0}(t-s)} \mathrm{d}N^{0}_s \right) \) (Hawkes covariate for bid market orders)

  • \(Z_9\): \(H^{1}(t) = \log \left( \mu ^{1} + \int _0^t \alpha ^{1} e^{-\beta ^{1}(t-s)} \mathrm{d}N^{1}_s \right) \) (Hawkes covariate for ask market orders).
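These Hawkes covariates all share the same exponential-kernel structure, which admits a standard \(O(n)\) recursive computation. As a minimal sketch (our own illustration with hypothetical names; the parameters \(\mu ,\alpha ,\beta \) are assumed to have been fitted beforehand, as in Sect. 4.4):

```python
import numpy as np

def hawkes_covariate(event_times, query_times, mu, alpha, beta):
    """Compute Z(t) = log(mu + sum_{s < t} alpha * exp(-beta * (t - s)))
    at sorted query times, where the sum runs over past event times s.
    The exponential kernel allows an O(n) recursive update."""
    g, last, k, out = 0.0, 0.0, 0, []   # g: kernel sum at time `last`
    for t in query_times:
        # Fold in all events occurring strictly before the query time t.
        while k < len(event_times) and event_times[k] < t:
            g = g * np.exp(-beta * (event_times[k] - last)) + alpha
            last = event_times[k]
            k += 1
        out.append(np.log(mu + g * np.exp(-beta * (t - last))))
    return np.array(out)
```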

With these Hawkes covariates, the ratio model can actually be seen as a kind of non-linear Hawkes process. When the theory is applied, ergodicity is an assumption. In the present model, it depends on the nature of the process \(\lambda _0\), which was left unspecified. Brémaud and Massoulié (1996) treated the stability problem for nonlinear Hawkes processes. If the system has a Markovian representation, a drift condition in the spirit of Abergel and Jedidi (2015) and Clinet and Yoshida (2017) may be applicable. On the other hand, intraday stationarity (ergodicity) is not essential. As described in Section 3.2 of Muni Toke and Yoshida (2020), quite in parallel with the simple stationary case, we can relax the assumption of intraday stationarity by considering a repeated measurements model. Then, we only need a more realistic ergodicity of the data across the long-run repeated measurements, and the methods can be validated.

4.2 Limit order book data

We use tick-by-tick data for 36 stocks traded on Euronext Paris. The sample spans the whole year 2015, i.e., roughly 200 trading days for each stock, although some days are missing for some stocks. Table 3 in Sect. 7 lists the stocks investigated and the number of trading days available. Raw data consist of a TRTH (Thomson Reuters Tick History) database: for each trading day and each stock, one file lists the transactions (quantities and prices) and one file lists the modifications of the limit order book (level, price and quantities). Timestamps are given with millisecond precision. Synchronization of both files and reconstruction of the limit order book are carried out with the procedure described in Muni Toke (2016). One strong advantage of the ratio model is that it does not require precise timestamps in itself, since timestamps do not appear explicitly in the quasi-likelihood of the ratios, while fitting other intensity-based models (e.g., Hawkes processes) requires unique precise timestamps for the log-likelihood computation. Here, if Hawkes fits are used as covariates (covariates \(Z_4\) to \(Z_9\) in our application), then we choose to consider only unique timestamps, i.e., we aggregate orders of the same type occurring at the same timestamp.

4.3 Estimation procedure of the two-step ratio model

Following Sects. 2 and 3, estimation of the model defined at Eq. (4.1) can be carried out with multiple successive ratio models. In the first step, we consider the difference parameters \(\theta ^i_j = \vartheta ^i_j - \vartheta ^0_j, i\in {\mathbb {I}}\setminus \{0\}, j\in {\mathbb {J}}\) and the ratios \((i\in {\mathbb {I}}\setminus \{0\})\):

$$\begin{aligned} r^i(t,\theta )&= \frac{ \exp \left( \sum _{j\in {\mathbb {J}}} \vartheta ^i_j X_ j(t)\right) }{ \sum _{i'\in {\mathbb {I}}} \exp \left( \sum _{j\in {\mathbb {J}}} \vartheta ^{i'}_j X_ j(t)\right) } = \left[ \sum _{i'\in {\mathbb {I}}} \exp \left( \sum _{j\in {\mathbb {J}}} ( \theta ^{i'}_j - \theta ^i_j) X_ j(t)\right) \right] ^{-1}. \end{aligned}$$
(4.2)

The quasi-log-likelihood based on the observation on [0, T] for this ratio model is defined at Eq. (3.1). In the second step, we consider the ratios

$$\begin{aligned} p^{k_i}_i(t, \varrho ^i)&= \frac{\exp \left( \sum _{{j_i}\in {\mathbb {J}}_i} \varrho ^{i,k_i}_{j_i} Y^i_{j_i}(t) \right) }{\sum _{k'_i\in {\mathbb {K}}_i} \exp \left( \sum _{{j_i}\in {\mathbb {J}}_i} \varrho ^{i,k'_i}_{j_i} Y^i_{j_i}(t) \right) } = \left[ \sum _{k'_i\in {\mathbb {K}}_i} \exp \left( \sum _{{j_i}\in {\mathbb {J}}_i} ( \varrho ^{i,k'_i}_{j_i} - \varrho ^{i,k_i}_{j_i}) Y^i_ {j_i}(t)\right) \right] ^{-1}, \end{aligned}$$
(4.3)

and the associated quasi-log-likelihood of Eq. (3.2). Consistency and asymptotic normality of the quasi-maximum likelihood estimators are guaranteed by Theorem 1.

4.4 In-sample model selection with QAIC and QBIC

In this first application, we perform in-sample model selection to assess the relevance of the different possible sets of covariates. For each stock and each trading day, we fix a set of covariates. We use the indices of the tested covariates to name the models: the model 146 is, thus, the model with covariates \((Z_1,Z_4,Z_6)\). If required, we estimate the parameters of all the Hawkes covariates on the previous day and then compute the Hawkes covariates using these fitted parameters. This procedure ensures that the predictability of the covariates is not violated. We finally fit three ratio models following the above procedure: one for the processes \((N^0, N^1)\) (side of the market orders), one for the processes \((N^{0,0}, N^{0,1})\) (aggressiveness of the bid market orders) and one for the processes \((N^{1,0}, N^{1,1})\) (aggressiveness of the ask market orders).

For each trading day, we then select the model minimizing some information criterion. For the ratio for the side determination, the criterion is

$$\begin{aligned} -2 {\mathbb {H}}_T({\hat{\theta }}^M_T)+ {a_T} |{\mathbb {J}}|, \end{aligned}$$
(4.4)

where \(|{\mathbb {J}}|\) is the cardinality of \({\mathbb {J}}\), and \(a_T=2\) for the QAIC criterion, \(a_T=\log (T)\) for the QBIC criterion. For the aggressiveness ratios, the criterion is

$$\begin{aligned} -2 {\mathbb {H}}^{(i)}_T({\hat{\rho }}^{i,M}_T)+{a_T} |{\mathbb {J}}_i| \quad (i\in {\mathbb {I}}). \end{aligned}$$
(4.5)
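In code, these criteria are straightforward penalizations of the maximized quasi-log-likelihood; a minimal sketch (our own, with hypothetical names):

```python
import numpy as np

def qaic(max_qll, n_cov):
    """QAIC of Eqs. (4.4)-(4.5): penalty a_T = 2 per covariate."""
    return -2.0 * max_qll + 2.0 * n_cov

def qbic(max_qll, n_cov, T):
    """QBIC of Eqs. (4.4)-(4.5): penalty a_T = log(T) per covariate."""
    return -2.0 * max_qll + np.log(T) * n_cov

# Hypothetical daily selection: keep the covariate set minimizing QBIC.
# best_model = min(fitted_models, key=lambda m: qbic(m.max_qll, m.n_cov, T))
```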

We finally compute for each stock the frequencies of selection of the different sets of covariates (i.e., the number of trading days on which a model is selected by QAIC or QBIC over the total number of trading days in the sample for this stock). Figures 2, 3 and 4 plot the results as a model \(\times \) stock heatmap for each of these three ratios. For completeness, Tables 4, 5 and 6 in Sect. 8 list for each ratio model (side, bid aggressiveness, ask aggressiveness) the frequency of selection averaged across stocks for each model and each information criterion.

Fig. 2

Side of market orders—Frequency of selection of each model by the QAIC and QBIC criteria, for each stock

Fig. 3

Aggressiveness of bid market orders—Frequency of selection of each model by the QAIC and QBIC criteria, for each stock

Fig. 4

Aggressiveness of ask market orders—Frequency of selection of each model by the QAIC and QBIC criteria, for each stock

For side determination, the models 14689, 124689, 134689 and 1234689 are the four most often chosen: the selected model is among these four more than \(80\%\) of the time on average across stocks using QAIC, and close to \(90\%\) of the time using QBIC. As expected, QBIC favors the smallest model, 14689. Imbalance, Hawkes covariates for bid and ask market orders, and Hawkes covariates for aggressive bid and ask market orders thus appear to be the most informative covariates.

For aggressiveness determination, the model 146 is the one most often selected by QBIC. This is in line with intuition: imbalance is known to be a significant proxy for price change (see, e.g., Lipton et al. 2013), and the Hawkes covariates for aggressive bid and aggressive ask orders are specific to the targeted events. QAIC selection is more widespread and favors a larger model (as expected), namely 12346. Note also that for several stocks, models with “symmetric” sets of covariates can also be chosen: for ask aggressiveness, 1679 is often selected, i.e., imbalance and all available ask Hawkes covariates; symmetrically, 1458 is selected for bid aggressiveness, i.e., imbalance and all available bid Hawkes covariates.

Fig. 5

Frequency of spread selection among all stocks and trading days in the aggressiveness ratio model as a function of the mean observed spread in ticks (5%-quantile bins)

One may in particular observe that these results confirm the primary role of the spread measured in ticks in the theory of financial microstructure. Stocks for which the observed spread is mostly equal to one tick are labeled 'large-tick stocks', implying that market participants are constrained by the price grid when submitting orders to the limit order book. Other stocks may be labeled 'small-tick stocks' (Eisler et al. 2012). Using our sample, we compute the mean observed spread in ticks for each stock and each available trading day, and group these values in bins of equal sizes. Then, inside each bin, we compute the frequency of selection of the covariate \({Z_3}\) (signed spread) by QBIC for the aggressiveness ratio estimation of Equation (4.3). A bar plot is provided in Fig. 5 (left). We observe an increase of the frequency of selection of the spread covariate when the mean observed spread increases from 1 tick (its minimal possible value) to roughly 2.5 ticks. For larger spread values, the frequency remains high and then seems to decrease at the highest values. This indicates that the significance of the covariates, especially the spread, is not the same for large-tick and small-tick stocks, and that even among small-tick stocks, the dependency is not uniform. This visual observation can, for example, be complemented by the following statistical test. For all stocks and trading days, we compute the empirical cumulative distribution functions of the daily mean spread in ticks (i) when the spread covariate is selected by QBIC in the aggressiveness ratios, and (ii) when the spread covariate is not selected. A one-sided Kolmogorov–Smirnov test rejects the hypothesis that both distributions are identical (p-value \(10^{-53}\)) in favor of the alternative that the spread covariate is selected more often for larger observed spreads. Recall that many microstructure models are developed for large-tick stocks, since assuming a constant spread equal to one tick often simplifies the analysis of the limit order book dynamics. Our observation advocates for the definition of specific microstructure models for small-tick stocks, taking the spread dynamics into account.
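This test can be reproduced with standard tools; a sketch using SciPy (the arrays `spread_not_selected` and `spread_selected` are hypothetical containers for the two groups of daily mean spreads):

```python
from scipy.stats import ks_2samp

# One-sided two-sample KS test: the alternative is that the CDF of the
# first sample lies above that of the second, i.e., spreads on days where
# Z_3 is selected are stochastically larger.
stat, p_value = ks_2samp(spread_not_selected, spread_selected,
                         alternative="greater")
```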

Model selection consistency validates the use of QBIC; see Eguchi and Masuda (2018), or follow Muni Toke and Yoshida (2020) for a direct proof of consistency covering other criteria. However, the actual predictive performance of a selected model is more important than model selection consistency, so it is also worth trying QAIC, or the consistent QAIC.

4.5 Out-of-sample prediction performance

In this section, we use intensity and ratio models to predict the sign and aggressiveness of an incoming market order. For all tested models, the procedure is the following. On a given trading day, the model is fitted. The fitted parameters are then used on the following trading day (available in the database) to compute the intensities (or the ratios, for ratio models) at all times. The type of an incoming event is then predicted to be the type with the highest intensity or ratio. The exercise is theoretical in the sense that we assume that these computations are instantaneous, so that intensities or ratios are available at all times.
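Concretely, for the marked ratio models, one natural reading of this rule is that the predicted category maximizes the product of the side ratio and the corresponding aggressiveness ratio, both evaluated with the previous day's parameters; a minimal sketch (names are ours):

```python
import numpy as np

def predict_type(r_side, q_bid, q_ask):
    """Predict (side, mark) as the argmax of r^i(t) * q^{k}_i(t).
    r_side: [r^0, r^1] (bid/ask ratios); q_bid, q_ask: [q^0, q^1]
    (non-aggressive/aggressive mark ratios for each side)."""
    scores = np.array([[r_side[0] * q_bid[0], r_side[0] * q_bid[1]],
                       [r_side[1] * q_ask[0], r_side[1] * q_ask[1]]])
    i, k = np.unravel_index(np.argmax(scores), scores.shape)
    return i, k  # i = 0 (bid) / 1 (ask); k = 0 (non-aggressive) / 1 (aggressive)
```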

Recall the notation \(N=(N^{i,k_i})_{i\in \{0,1\}, k_i\in \{0,1\}}\) for the four-dimensional point process counting bid aggressive market orders, bid non-aggressive market orders, ask aggressive market orders and ask non-aggressive market orders. We use two benchmark models.

The first benchmark model is the Hawkes model. Here, N is assumed to be a four-dimensional Hawkes process with a single exponential kernel. In vector notation, the intensity is written as

$$\begin{aligned} \lambda (t) = \mu + \int _0^t \alpha e^{-\beta (t-s)}\,\mathrm{d}N_s, \end{aligned}$$

where \(\mu \in {{\mathbb {R}}}_+^4\) and \(\alpha \) and \(\beta \) are \(4\times 4\) non-negative matrices, the exponential being applied entrywise. Estimation and ratio computation can be found in, e.g., Bowsher (2007); Muni Toke and Pomponio (2012). This model is labeled ‘Hawkes’.

The second benchmark model is the four-dimensional ratio model without marks (Muni Toke and Yoshida (2020)). In this model, the intensity of the counting process \((i,k_i)\) is

$$\begin{aligned} \lambda _{R}^{i,k_i}(t) = \lambda _{0,R}(t) \exp \bigg (\sum _{j\in {{\mathbb {J}}}} \vartheta ^{i,k_i}_j X_j(t) \bigg ), \end{aligned}$$

with some unobserved baseline intensity \(\lambda _{0,R}(t)\). Given the previous observations, we choose the set of covariates \((Z_1,Z_4,Z_6,Z_8,Z_9)\) for this benchmark. It is natural to choose these covariates (imbalance, Hawkes for aggressive orders and Hawkes for all orders) given the results on model selection of Sect. 4.4. Estimation and ratio computation are detailed in Muni Toke and Yoshida (2020). This model is labeled ’Ratio-14689’.

These two benchmarks are used to assess the performances of two marked ratio models (or two-step ratio models) described in this paper. The first marked ratio model uses the covariates \((Z_4,Z_5,Z_6,Z_7)\) for both steps. These covariates are based on the Hawkes processes of the benchmark Hawkes model. The second marked ratio model uses the covariates \((Z_1,Z_4,Z_6,Z_8,Z_9)\) for the first-step ratio (side determination) and \((Z_1,Z_4,Z_6)\) for both second-step ratios (bid and ask aggressiveness). Again, these choices are natural given the results on model selection of Sect. 4.4. These models are labeled ’MarkedRatio-4567-4567-4567’ and ’MarkedRatio-14689-146-146’, respectively.

Fig. 6
figure 6

Out-of-sample prediction performances for the benchmark models and the marked ratio models. Label explanation is in the text

Fig. 7
figure 7

Out-of-sample partial prediction performances for the side prediction (left) and aggressiveness prediction (right), for the benchmark models and the marked ratio models. Label explanation is in the text

Figure 6 plots the results for each stock for the two benchmark models and the two marked ratio models. For completeness, the partial performances for side determination and aggressiveness determination of the trades are provided in Fig. 7. Finally, Table 2 lists the partial and global prediction performances of these models averaged across stocks. The benchmark Hawkes model correctly predicts the sign and aggressiveness of an incoming order with an accuracy in the range \([40\%,60\%]\) for all stocks, with a \(50\%\) average. The marked ratio model with only Hawkes covariates (’MarkedRatio-4567-4567-4567’) and no dependency on the state of the limit order book closely reproduces these performances. The non-marked ratio model ’Ratio-14689’ slightly improves on the global performances of the two previous models. Looking at the partial accuracies, we observe that this improvement is mainly due to a better side prediction. Finally, the ’MarkedRatio-14689-146-146’ model, which appeared on average to be the best model with respect to QBIC selection, strongly outperforms all other models. Its global accuracy is in the range \([60\%,80\%]\) for all stocks, with a \(67\%\) average, i.e., we are theoretically able to correctly predict both the sign and aggressiveness of an incoming market order two times out of three. Finally, comparing the side determination of ’Ratio-14689’ and ’MarkedRatio-14689-146-146’ shows that the decoupling of the side and aggressiveness ratios in the marked ratio model significantly improves the prediction performance over the one-step four-dimensional case, while using the same covariates.

Table 2 Prediction performances of selected models averaged across stocks.

These results show that the two-step ratio model for marked point processes is a significant improvement over existing intensity models. As in the standard ratio model of Muni Toke and Yoshida (2020), this provides an easy way to obtain both clustering and state-dependency. It is important to note, however, that the two-step ratio strongly improves the performance of the standard ratio model in a multidimensional setting. In this example, flexibility in the choice of covariates allows for precise model selection for both sign and aggressiveness.

5 Proof of Theorem 1

The convergence given in Theorem 1 can be obtained by the quasi-likelihood analysis recalled in Section 6. We will apply Theorems 3 and 5 of Sect. 6 to the double ratio model. In the present situation, the scaling factor is \(b_T=T\), the joint parameter \((\theta ,\rho )\) plays the role of \(\theta \) in Section 6, and the dimension of the full parameter space is \({\check{{\textsf {p}}}}\) in place of \({\textsf {p}}\) of Section 6. Fix a set of values of the parameters \((\alpha ,\beta _1,\beta _2,\rho , \rho _1,\rho _2)\) of Section 6 so that Condition [L1] (Section 6) is met with \(\rho =2\).

5.1 Score functions and a central limit theorem

The score function for \(\rho ^i\) is given by

$$\begin{aligned} F^{(i)}_T(\rho ^i)= & {} \partial _{\rho ^i}{{\mathbb {H}}}_T^{(i)}(\rho ^i) \>=\>\sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T\partial _{\rho ^i}\log q^{k_i}_i(t,\rho ^i)\mathrm{d}N^{i,k_i}_t. \end{aligned}$$

Then,

$$\begin{aligned} F^{(i)}_T(\rho ^i)= & {} \sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T \big ( 1_{\{k_i\}}(\cdot )-q^\flat _i(t,\rho ^i)\big )\otimes {{\mathbb {Y}}}^i(t)\mathrm{d}N^{i,k_i}_t, \end{aligned}$$
(5.1)

where \(q^\flat _i(t,\rho ^i)=(q_i^{k_i}(t,\rho ^i))_{k_i\in {{\mathbb {K}}}_{i,0}}\), \({{\mathbb {Y}}}^i(t)=(Y^i_{j_i}(t))_{j_i\in {{\mathbb {J}}}_i}\) and \(1_{\{k_i\}}(\kappa )=\big (1_{\{\kappa =k_i\}}\big )_{\kappa \in {{\mathbb {K}}}_{i,0}}\). By some calculus with (2.1) and \(p^{k_i}_i(t,\varrho ^i)=q^{k_i}_i(t,\rho ^i)\), we see

$$\begin{aligned} F^{(i)}_T:=F^{(i)}_T((\rho ^i)^*)= & {} \sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T \big ( 1_{\{k_i\}}(\cdot )-q^\flat _i(t,(\rho ^i)^*)\big )\otimes {{\mathbb {Y}}}^i(t)d {\tilde{N}}^{i,k_i}_t. \end{aligned}$$
(5.2)

We are assuming that the counting processes \(N^{i,k_i}\) (\(i\in {{\mathbb {I}}};\>k_i\in {{\mathbb {K}}}_i\)) have no common jumps. Then, the \({\textsf {p}}_i\times {\textsf {p}}_{i'}\) matrix-valued process \(\langle F^{(i)},F^{(i')}\rangle \) satisfies

$$\begin{aligned} \langle F^{(i)},F^{(i')}\rangle _T= & {} 0\quad (i,i'\in {{\mathbb {I}}},\>i\not =i') \end{aligned}$$
(5.3)

and

$$\begin{aligned} \langle F^{(i)}\rangle _T= & {} \sum _{k_{i}\in {{\mathbb {K}}}_i}\int _0^T\bigg \{\big (1_{\{k_i\}}(\cdot )-q^\flat _i(t,(\rho ^i)^*)\big )\otimes {{\mathbb {Y}}}^i(t) \bigg \}^{\otimes 2} r^i(t,\theta ^*)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))q^{k_i}_i(t,(\rho ^i)^*)\mathrm{d}t \\ = & {} \int _0^T\mathsf{V}^i_0({{\mathbb {Y}}}^i(t),(\rho ^i)^*)\otimes ({{\mathbb {Y}}}^i(t))^{\otimes 2} \>\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))r^i(t,\theta ^*)\mathrm{d}t \quad (i\in {{\mathbb {I}}}). \end{aligned}$$

Therefore, the mixing property [M2] gives the convergence

$$\begin{aligned}&T^{-1}\langle F^{(i)}\rangle _T \rightarrow ^p \varGamma ^{i}((\rho ^i)^*)\nonumber \\&\quad = E\bigg [\mathsf{V}^i_0({{\mathbb {Y}}}^i(0),(\rho ^i)^*)\otimes ({{\mathbb {Y}}}^i(0))^{\otimes 2} \>\varLambda (\lambda _0(0),{{\mathbb {X}}}(0))r^i(0,\theta ^*)\bigg ] \end{aligned}$$
(5.4)

as \(T\rightarrow \infty \), with the aid of [M1].

The score function for \(\theta \) is the \({\textsf {p}}\)-dimensional process

$$\begin{aligned} F_T(\theta )= & {} \partial _{\theta }{{\mathbb {H}}}_T(\theta ) \>=\>\sum _{i\in {{\mathbb {I}}}}\int _0^T\partial _{\theta }\log r^i(t,\theta )\mathrm{d}N^i_t \nonumber \\ {}= & {} \sum _{i\in {{\mathbb {I}}}}\int _0^T \big (1_{\{i\}}(\cdot )-r^\flat (t,\theta )\big )\otimes {{\mathbb {X}}}(t)\mathrm{d}N^i_t, \end{aligned}$$
(5.5)

where \(r^\flat (t,\theta )=(r^i(t,\theta ))_{i\in {{\mathbb {I}}}_0}\). Evaluated at \(\theta ^*\),

$$\begin{aligned} F_T= & {} F_T(\theta ^*) \>=\>\sum _{i\in {{\mathbb {I}}}}\int _0^T\big (1_{\{i\}}(\cdot )-r^\flat (t,\theta ^*)\big )\otimes {{\mathbb {X}}}(t)d{\tilde{N}}^i_t \nonumber \\= & {} \sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T\big (1_{\{i\}}(\cdot ) -r^\flat (t,\theta ^*)\big )\otimes {{\mathbb {X}}}(t)d{\tilde{N}}^{i,k_i}_t. \end{aligned}$$
(5.6)

Then, the \({\textsf {p}}\times {\textsf {p}}\) matrix valued process \(\langle F\rangle \) has the expression

$$\begin{aligned} \langle F\rangle _T= & {} \sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T\big (1_{\{i\}}(\cdot ) -r^\flat (t,\theta ^*)\big )^{\otimes 2} \otimes {{\mathbb {X}}}(t)^{\otimes 2} r^i(t,\theta ^*)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))q^{k_i}_i(t,(\rho ^i)^*)\mathrm{d}t \\ = & {} \sum _{i\in {{\mathbb {I}}}}\int _0^T\big (1_{\{i\}}(\cdot )-r^\flat (t,\theta ^*) \big )^{\otimes 2} \otimes {{\mathbb {X}}}(t)^{\otimes 2} r^i(t,\theta ^*)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))\mathrm{d}t \\ = & {} \int _0^T\mathsf{V}_0({{\mathbb {X}}}(t))\otimes {{\mathbb {X}}}(t)^{\otimes 2}\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))\mathrm{d}t. \end{aligned}$$

Then, the mixing property [M2] provides the convergence

$$\begin{aligned} T^{-1}\langle F\rangle _T&\rightarrow ^p&\varGamma (\theta ^*) \>=\>E\bigg [\bigg (\mathsf{V}_0({\mathbb {X}}(0))\otimes {\mathbb {X}}(0)^{\otimes 2}\bigg ) \varLambda (\lambda _0(0),{\mathbb {X}}(0))\bigg ] \end{aligned}$$
(5.7)

as \(T\rightarrow \infty \).

For \(i\in {{\mathbb {I}}}\),

$$\begin{aligned} \langle F,F^{(i)}\rangle _T= & {} \sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T \big (1_{\{i\}}(\cdot )-r^\flat (t,\theta ^*)\big )\otimes \big (1_{\{k_i\}}(\cdot )-q^\flat _i(t,(\rho ^i)^*)\big ) \otimes {{\mathbb {X}}}(t)\otimes {{\mathbb {Y}}}^i(t) \nonumber \\&\times r^i(t,\theta ^*)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))q^{k_i}_i(t,(\rho ^i)^*)\mathrm{d}t \nonumber \\= & {} 0 \end{aligned}$$
(5.8)

since

$$\begin{aligned} \sum _{k_i\in {{\mathbb {K}}}_i}\big (1_{\{k_i\}}(\cdot )-q^\flat _i(t,(\rho ^i)^*) \big )q^{k_i}_i(t,(\rho ^i)^*)= & {} 0. \end{aligned}$$

The full information matrix is the \({\check{{\textsf {p}}}}\times {\check{{\textsf {p}}}}\) block diagonal matrix

$$\begin{aligned} \varGamma \>=\>\varGamma (\theta ^*,\rho ^*)= & {} \text {diag}\big [\varGamma (\theta ^*),\varGamma ^0((\rho ^0)^*), \varGamma ^1((\rho ^1)^*),...,\varGamma ^{{\bar{i}}}((\rho ^{{\bar{i}}})^*)\big ]. \end{aligned}$$

Let \(\varDelta _T=T^{-1/2}\big (F_T,(F^{(i)}_T)_{i\in {{\mathbb {I}}}}\big )\). Now, by the martingale central limit theorem, it is easy to obtain the convergence

$$\begin{aligned} \varDelta _T&\rightarrow ^d&\varGamma ^{1/2}\zeta \quad (T\rightarrow \infty ), \end{aligned}$$

where \(\zeta \) is a \({\check{{\textsf {p}}}}\)-dimensional standard Gaussian random vector. The joint convergence \((\varDelta _T,\varGamma )\rightarrow ^d(\varGamma ^{1/2}\zeta ,\varGamma )\) is obvious since \(\varGamma \) is deterministic.

5.2 Condition [L4]

According to (6.2), we define the random field \({{\mathbb {Y}}}_T:\varOmega \times {\overline{\varTheta }}\times {\overline{\mathcal{R}}}\rightarrow {{\mathbb {R}}}\) by

$$\begin{aligned} {{\mathbb {Y}}}_T(\theta ,\rho ) = T^{-1}\big ({{\mathbb {H}}}_T(\theta ,\rho )-{{\mathbb {H}}}_T(\theta ^*,\rho ^*)\big ) \end{aligned}$$

for \({{\mathbb {H}}}_T(\theta ,\rho )\) given in (3.3). From the expression (3.4) of \({{\mathbb {H}}}_T(\theta ,\rho )\), we have

$$\begin{aligned} T^{-1}{{\mathbb {H}}}_T(\theta ,\rho )= & {} T^{-1}\sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i} \int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big ) \mathrm{d}N^{i,k_i}_t \\= & {} T^{-1}\sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i} \int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )d{\tilde{N}}^{i,k_i}_t \\&+T^{-1}\sum _{i\in {{\mathbb {I}}}}\sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\\&\times \lambda _0(t)\exp \bigg (\sum _{j\in {{\mathbb {J}}}} (\vartheta ^*)^i_jX_j(t)\bigg )\>p_i^{k_i}(t,(\varrho ^i)^*)\mathrm{d}t. \end{aligned}$$

By definition,

$$\begin{aligned} \big |\partial _{(\theta ,\rho )}^\ell \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\big |\le & {} C\bigg (1+\sum _{j\in {{\mathbb {J}}}}|X_j(t)|+\sum _{i\in {{\mathbb {I}}}}\sum _{j_i\in {{\mathbb {J}}}_i} |Y^{k_i}_{j_i}(t)|\bigg )\, (\ell =0,1), \end{aligned}$$

where C is a constant depending on the diameters of \(\varTheta \) and \(\mathcal{R}\). Therefore, under Condition [M1],

$$\begin{aligned}&E\bigg [\bigg |\partial _{(\theta ,\rho )}^\ell T^{-1/2} \int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\mathrm{d}{\tilde{N}}^{i,k_i}_t \bigg |^{2^k}\bigg ] \\&\quad \lesssim \ E\bigg [\bigg (T^{-1} \int _0^T\big |\partial _{(\theta ,\rho )}^\ell \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\big |^2 \mathrm{d}N^{i,k_i}_t\bigg )^{2^{k-1}}\bigg ]\\&\quad \lesssim \ E\bigg [T^{-1} \int _0^T\big |\partial _{(\theta ,\rho )}^\ell \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\big |^{2^k} \lambda ^{i,k_i}(t,(\vartheta ^i)^*,(\varrho ^{i,k_i})^*)\mathrm{d}t\bigg ]\\&\qquad +T^{-2^{k-2}}E\bigg [\bigg (T^{-1/2} \int _0^T\big |\partial _{(\theta ,\rho )}^\ell \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\big |^2 \mathrm{d}{\tilde{N}}^{i,k_i}_t\bigg )^{2^{k-1}}\bigg ] \\&\quad = O(1)+T^{-2^{k-2}}E\bigg [\bigg (T^{-1/2} \int _0^T\big |\partial _{(\theta ,\rho )}^\ell \log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\big |^2 \mathrm{d}{\tilde{N}}^{i,k_i}_t\bigg )^{2^{k-1}}\bigg ] \end{aligned}$$

for \(k\in {{\mathbb {N}}}\), where the constant appearing at each \(\lesssim \) depends only on \({\check{{\textsf {p}}}}\), k and the constant of the Burkholder–Davis–Gundy inequality. By induction, we obtain

$$\begin{aligned} \sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}}\sup _{T\ge 1}\bigg \Vert \partial _{(\theta ,\rho )}^\ell T^{-1/2} \int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\mathrm{d}{\tilde{N}}^{i,k_i}_t\bigg \Vert _p< & {} \infty \end{aligned}$$
(5.9)

for every \(p>1\) and \(\ell \in \{0,1\}\). Then, Sobolev’s inequality gives

$$\begin{aligned} \sup _{T\ge 1} \bigg \Vert \sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}} \bigg |T^{-1/2}\int _0^T\log \big (r^i(t,\theta )q^{k_i}_i(t,\rho ^i)\big )\mathrm{d}{\tilde{N}}^{i,k_i}_t\bigg | \>\bigg \Vert _p< & {} \infty \end{aligned}$$
(5.10)

for every \(p>1\).

Let

$$\begin{aligned} \varPhi (t,\theta ,\rho )= & {} {\sum _{i\in {{\mathbb {I}}}} \sum _{k_i\in {{\mathbb {K}}}_i}\bigg \{r^i(t,\theta ^*) p_i^{k_i}(t,(\varrho ^i)^*) \log \frac{r^i(t,\theta )q_i^{k_i}(t,\rho ^i)}{r^i(t,\theta ^*)q_i^{k_i} (t,(\rho ^i)^*)}\bigg \} } \\&\qquad \times \lambda _0(t)\sum _{i'\in {{\mathbb {I}}}}\exp \bigg (\sum _{j\in {{\mathbb {J}}}} (\vartheta ^*)^{i'}_jX_j(t)\bigg ). \end{aligned}$$

Then, Conditions [M1] and [M2] imply a Rosenthal-type inequality under the mixing condition (cf. Rio 2017)

$$\begin{aligned} \sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}} \sup _{T\ge 1}\bigg \Vert T^{-1/2}\int _0^T\partial _{(\theta ,\rho )}^\ell \big (\varPhi (t,\theta ,\rho )-E[\varPhi (t,\theta ,\rho )]\big )\mathrm{d}t\bigg \Vert _p< & {} \infty \end{aligned}$$

for every \(p>1\) and \(\ell \in \{0,1\}\). This entails

$$\begin{aligned} \sup _{T\ge 1}\bigg \Vert T^{1/2}\sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}} \bigg |T^{-1}\int _0^T\big (\varPhi (t,\theta ,\rho )-E[\varPhi (t,\theta ,\rho )] \big )\mathrm{d}t\bigg |\bigg \Vert _p< & {} \infty \end{aligned}$$
(5.11)

for every \(p>1\).

Combining (5.11) with (5.10), we obtain

$$\begin{aligned} \sup _{T\ge 1}E\bigg [\bigg (T^{1/2}\sup _{(\theta ,\rho )\in \varTheta \times \mathcal{R}} \big |{{\mathbb {Y}}}_T(\theta ,\rho )-{{\mathbb {Y}}}(\theta ,\rho )\big |\bigg )^p\bigg ]< & {} \infty \end{aligned}$$
(5.12)

for every \(p>1\), if we set

$$\begin{aligned} {{\mathbb {Y}}}(\theta ,\rho )= & {} {E[\varPhi (0,\theta ,\rho )]} \nonumber \\= & {} E\bigg [{\sum _{i\in {{\mathbb {I}}}} \sum _{k_i\in {{\mathbb {K}}}_i}\bigg \{r^i(0,\theta ^*) p_i^{k_i}(0,(\varrho ^i)^*) \log \frac{r^i(0,\theta )q_i^{k_i}(0,\rho ^i)}{r^i(0,\theta ^*)q_i^{k_i} (0,(\rho ^i)^*)}\bigg \} } \\&\qquad \times \lambda _0(0)\sum _{i'\in {{\mathbb {I}}}}\exp \bigg (\sum _{j\in {{\mathbb {J}}}} (\vartheta ^*)^{i'}_jX_j(0)\bigg )\bigg ]. \end{aligned}$$

This verifies Condition [L4](ii).

As in (6.1), we define \(\varGamma _T(\theta ,\rho )\) by

$$\begin{aligned} \varGamma _T(\theta ,\rho )= & {} -T^{-1}\partial _{(\theta ,\rho )}^2{{\mathbb {H}}}_T(\theta ,\rho ). \end{aligned}$$

From (5.1),

$$\begin{aligned} \partial _{\rho ^i}^2{{\mathbb {H}}}_T^{(i)}(\rho ^i)= & {} -\sum _{k_i\in {{\mathbb {K}}}_i}\int _0^T \partial _{\rho ^i}q^\flat _i(t,\rho ^i)\otimes {{\mathbb {Y}}}^i(t)\mathrm{d}N^{i,k_i}_t. \end{aligned}$$

More precisely,

$$\begin{aligned} \partial _{\rho ^{i,k_i}_{j_i}}\partial _{\rho ^{i,k_i'}_{j_i'}} {{\mathbb {H}}}_T^{(i)}(\rho ^i)= & {} -\sum _{k_i''\in {{\mathbb {K}}}_i}\int _0^T \bigg \{ 1_{\{k_i=k_i'\}}q^{k_i}_i(t,\rho ^i) -q^{k_i}_i(t,\rho ^i)q^{k_i'}_i(t,\rho ^i) \bigg \}{{\mathbb {Y}}}^i_{j_i}(t){{\mathbb {Y}}}^i_{j_i'}(t)\mathrm{d}N^{i,k_i''}_t \\ {}= & {} -\sum _{k_i''\in {{\mathbb {K}}}_i}\int _0^T \bigg \{ 1_{\{k_i=k_i'\}}q^{k_i}_i(t,\rho ^i) -q^{k_i}_i(t,\rho ^i)q^{k_i'}_i(t,\rho ^i) \bigg \}{{\mathbb {Y}}}^i_{j_i}(t){{\mathbb {Y}}}^i_{j_i'}(t)\mathrm{d}{\tilde{N}}^{i,k_i''}_t \\&-\int _0^T \bigg \{ 1_{\{k_i=k_i'\}}q^{k_i}_i(t,\rho ^i) -q^{k_i}_i(t,\rho ^i)q^{k_i'}_i(t,\rho ^i) \bigg \}{{\mathbb {Y}}}^i_{j_i}(t){{\mathbb {Y}}}^i_{j_i'}(t) \\&\qquad \qquad \qquad \times r^i(t,\theta ^*)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))\mathrm{d}t \\ {}= & {} -\sum _{k_i''\in {{\mathbb {K}}}_i}\int _0^T \mathsf{V}_0^i({{\mathbb {Y}}}^i(t),\rho ^i)_{k_i,k_i'}{{\mathbb {Y}}}^i_{j_i}(t){{\mathbb {Y}}}^i_{j_i'} (t)\mathrm{d}{\tilde{N}}^{i,k_i''}_t \\&-\int _0^T \mathsf{V}_0^i({{\mathbb {Y}}}^i(t),\rho ^i)_{k_i,k_i'} {{\mathbb {Y}}}^i_{j_i}(t){{\mathbb {Y}}}^i_{j_i'}(t) \varLambda (\lambda _0(t),{{\mathbb {X}}}(t))r^i(t,\theta ^*)\mathrm{d}t \end{aligned}$$

for \(k_i,k_i'\in {{\mathbb {K}}}_{i,0}\), \(j_i,j_i'\in {{\mathbb {J}}}_i\) and \(i\in {{\mathbb {I}}}\), where (3.12) was used. Similarly, from (5.5),

$$\begin{aligned} \partial _\theta ^2{{\mathbb {H}}}_T(\theta )= & {} -\sum _{i\in {{\mathbb {I}}}}\int _0^T\partial _\theta r^\flat (t,\theta )\otimes {{\mathbb {X}}}(t)\mathrm{d}N^i_t, \end{aligned}$$

equivalently,

$$\begin{aligned} \partial _{\theta ^i_j}\partial _{\theta ^{i'}_{j'}}{{\mathbb {H}}}_T(\theta )= & {} -\sum _{i''\in {{\mathbb {I}}}}\int _0^T\mathsf{V}_0({{\mathbb {X}}}(t),\theta )_{i,i'}X_j(t)X_{j'}(t)\mathrm{d}{\tilde{N}}^{i''}_t \\&-\int _0^T\mathsf{V}_0({{\mathbb {X}}}(t),\theta )_{i,i'}X_j(t)X_{j'}(t)\varLambda (\lambda _0(t),{{\mathbb {X}}}(t))\mathrm{d}t \end{aligned}$$

for \(i,i'\in {{\mathbb {I}}}_0\) and \(j,j'\in {{\mathbb {J}}}\). Obviously,

$$\begin{aligned} \partial _{\theta }\partial _{\rho ^i}{{\mathbb {H}}}_T^{(i)}(\rho ^i)\>=\>0 \quad \text {and}\quad \partial _{\rho ^{i'}}\partial _{\rho ^i}{{\mathbb {H}}}_T^{(i)}(\rho ^i)= & {} 0 \quad (i',i\in {{\mathbb {I}}}:\>i'\not =i). \end{aligned}$$

In a way similar to the derivation of (5.12), and in fact more easily, we can show

$$\begin{aligned} \sup _{T\ge 1}E\big [\big (T^{1/2}|\varGamma _T(\theta ^*,\rho ^*) -\varGamma |\big )^p\big ]< & {} \infty \end{aligned}$$

for every \(p>1\) under Conditions [M1] and [M2]. Therefore, Condition [L4](iv) is verified with \(\beta _1=1/2\). Condition [L4](iii) can be shown in a similar fashion, using the mixing property and Sobolev's inequality. Condition [L4](i) was already checked in (5.9). Thus, Condition [L4] has been verified.

5.3 Conditions [L2] and [L3]

We see

$$\begin{aligned} \partial _{(\theta ,\rho )}^2{{\mathbb {Y}}}(\theta ,\rho )=-\varGamma (\theta ,\rho ), \end{aligned}$$

and by [M3], we conclude that \({{\mathbb {Y}}}(\theta ,\rho )\) is a strictly concave function on \({\overline{\varTheta }}\times {\overline{\mathcal{R}}}={\overline{\varTheta }} \times \varPi _{i\in {{\mathbb {I}}}}{\overline{\mathcal{R}}}_i\). For some neighborhood U of \((\theta ^*,\rho ^*)\) and some positive number \(\chi _1\),

$$\begin{aligned} {{\mathbb {Y}}}(\theta ,\rho )\le -\chi _1|(\theta ,\rho )-(\theta ^*,\rho ^*)|^2\qquad \big ((\theta ,\rho )\in U\big ) \end{aligned}$$

by the non-degeneracy of \(\varGamma (\theta ^*,\rho ^*)\) (made explicit in the Taylor bound at the end of this subsection). Moreover, \(\sup _{(\theta ,\rho )\in (\varTheta \times \mathcal{R})\setminus U}{{\mathbb {Y}}}(\theta ,\rho )<0\). In fact, if there were a point \((\theta ^+,\rho ^+)\not \in U\) such that \({{\mathbb {Y}}}(\theta ^+,\rho ^+)=0\), then \(\varGamma (\theta ,\rho )\) would degenerate at some point on the segment connecting \((\theta ^*,\rho ^*)\) and \((\theta ^+,\rho ^+)\), which contradicts [M3]. As a consequence, Condition [L2] is verified for \(\rho =2\) and some (deterministic) positive number \(\chi _0\), since the parameter space is bounded. Condition [L3] is now obvious.
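To spell out the neighborhood bound: since \({{\mathbb {Y}}}(\theta ^*,\rho ^*)=0\) and \(\partial _{(\theta ,\rho )}{{\mathbb {Y}}}(\theta ^*,\rho ^*)=0\), Taylor's formula with integral remainder gives, for U a sufficiently small ball around \((\theta ^*,\rho ^*)\) and \(h=(\theta ,\rho )-(\theta ^*,\rho ^*)\),

$$\begin{aligned} {{\mathbb {Y}}}(\theta ,\rho )\>=\>-\int _0^1(1-s)\,\varGamma \big ((\theta ^*,\rho ^*)+sh\big )[h^{\otimes 2}]\,\mathrm{d}s \>\le \>-\chi _1|h|^2 \qquad \big ((\theta ,\rho )\in U\big ), \end{aligned}$$

where one may take \(\chi _1\) as half of the infimum over U of the smallest eigenvalue of \(\varGamma \), which is positive by [M3] and continuity.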

5.4 Proof of Theorem 1

We have verified Conditions [L1]-[L4] in the present situation. Theorem 1 now follows from Theorems 3 and 5. \(\square \)

6 Quasi-likelihood analysis

This section recalls the quasi-likelihood analysis. Let \(\varTheta \) be a bounded open set in \({{\mathbb {R}}}^{\textsf {p}}\). Given a probability space \((\varOmega ,\mathcal{F},P)\), suppose that \({{\mathbb {H}}}_T:\varOmega \times {\overline{\varTheta }}\rightarrow {{\mathbb {R}}}\) is of class \(C^3\), that is, the mapping \(\varTheta \ni \theta \mapsto {{\mathbb {H}}}_T(\omega ,\theta )\in {{\mathbb {R}}}\) is continuously extended to \({\overline{\varTheta }}\) and of class \(C^3\) for every \(\omega \in \varOmega \), and the mapping \(\varOmega \ni \omega \mapsto {{\mathbb {H}}}_T(\omega ,\theta )\in {{\mathbb {R}}}\) is measurable for every \(\theta \in \varTheta \). Let \(\varGamma \) be a \({\textsf {p}}\times {\textsf {p}}\) random matrix.

Let \(\theta ^*\in \varTheta \). For a sequence \(a_T\in GL({\textsf {p}})\) satisfying \(\lim _{T\rightarrow \infty }|a_T|=0\), let

$$\begin{aligned} \varDelta _T = \partial _\theta {{\mathbb {H}}}_T(\theta ^*)a_T \quad \text {and}\quad \varGamma _T(\theta ) = -a_T^\star \partial _\theta ^2{{\mathbb {H}}}_T(\theta )a_T, \end{aligned}$$
(6.1)

where \(\star \) denotes the matrix transpose. We consider a random field

$$\begin{aligned} {{\mathbb {Y}}}_T(\theta ) = b_T^{-1}\big ({{\mathbb {H}}}_T(\theta )-{{\mathbb {H}}}_T(\theta ^*)\big ), \end{aligned}$$
(6.2)

which will be assumed to converge to a random field \({{\mathbb {Y}}}:\varOmega \times \varTheta \rightarrow {{\mathbb {R}}}\). For simplicity of presentation, we will assume that \(a_T=b_T^{-1/2}I_{\textsf {p}}\) for a diverging sequence \((b_T)_{T>0}\) of positive numbers, where \(I_{\textsf {p}}\) is the identity matrix; in the application of Sect. 5, \(b_T=T\). In what follows, we fix a positive number L.

We will give a simplified exposition of Yoshida (2011) on the polynomial type large deviation inequality. Let \(\alpha \), \(\beta _1\), \(\beta _2\), \(\rho \), \(\rho _1\) and \(\rho _2\) be numbers.

[L1] The numbers \(\alpha \), \(\beta _1\), \(\beta _2\), \(\rho \), \(\rho _1\) and \(\rho _2\) satisfy the following inequalities:

$$\begin{aligned}&0<\alpha<1,\quad 0<\beta _1<1/2,\quad 0<\rho _1<\min \{1,\alpha (1-\alpha )^{-1},2 \beta _1(1-\alpha )^{-1}\},\\&\alpha \rho <\rho _2,\quad \beta _2\ge 0\quad \text {and}\quad 1-2\beta _2-\rho _2>0. \end{aligned}$$

Let \(\beta =\alpha (1-\alpha )^{-1}\).

[L2] There is a positive random variable \(\chi _0\) such that

$$\begin{aligned} {{\mathbb {Y}}}(\theta )\>=\>{{\mathbb {Y}}}(\theta )-{{\mathbb {Y}}}(\theta ^*)\le & {} -\chi _0|\theta -\theta ^*|^\rho \end{aligned}$$

for all \(\theta \in \varTheta \).

[L3] There exists a constant \(C_L\) such that

$$\begin{aligned} P\big [\chi _0\le r^{-(\rho _2-\alpha \rho )}\big ]\le & {} \frac{C_L}{r^L}\quad (r>0) \end{aligned}$$

and

$$\begin{aligned} P\big [\lambda _{\text {min}}(\varGamma )<4r^{-\rho _1}\big ]\le & {} \frac{C_L}{r^L}\quad (r>0). \end{aligned}$$
[L4] (i) For \(M_1=L(1-\rho _1)^{-1}\), \(\displaystyle \sup _{T>0}E\big [|\varDelta _T|^{M_1}\big ] <\infty . \)

(ii) For \(M_2=L(1-2\beta _2-\rho _2)^{-1}\),

$$\begin{aligned} \sup _{T>0}E\bigg [\bigg (\sup _{h:|h|\ge b_T^{-\alpha /2}} b_T^{\frac{1}{2}-\beta _2}\big |{{\mathbb {Y}}}_T(\theta ^*+h)-{{\mathbb {Y}}}(\theta ^*+h) \big |\bigg )^{M_2}\bigg ]< & {} \infty . \end{aligned}$$
(iii) For \(M_3=L(\beta -\rho _1)^{-1}\),

$$\begin{aligned} \sup _{T>0}E\bigg [\bigg (b_T^{-1}\sup _{\theta \in \varTheta }\big | \partial _\theta ^3{{\mathbb {H}}}_T(\theta ) \big |\bigg )^{M_3}\bigg ]< & {} \infty . \end{aligned}$$
(iv) For \(M_4=L\big (2\beta _1(1-\alpha )^{-1}-\rho _1\big )^{-1}\),

$$\begin{aligned} \sup _{T>0}E\bigg [\bigg (b_T^{\beta _1} \big |\varGamma _T(\theta ^*)-\varGamma \big |\bigg )^{M_4}\bigg ]< & {} \infty . \end{aligned}$$

Let \({{\mathbb {U}}}_T=\{u\in {{\mathbb {R}}}^{\textsf {p}};\>\theta ^*+a_Tu\in \varTheta \}\) and \({{\mathbb {V}}}_T(r)=\{u\in {{\mathbb {U}}}_T;\>|u|\ge r\}\) for \(r>0\), and define the quasi-likelihood ratio random field \({{\mathbb {Z}}}_T(u)=\exp \big ({{\mathbb {H}}}_T(\theta ^*+a_Tu)-{{\mathbb {H}}}_T(\theta ^*)\big )\) for \(u\in {{\mathbb {U}}}_T\).

Theorem 2

(Yoshida (2011)) Suppose that Conditions [L1]-[L4] are satisfied. Then, there exists a constant C such that

$$\begin{aligned} P\bigg [\sup _{u\in {{\mathbb {V}}}_T(r)}{{\mathbb {Z}}}_T(u)\ge \exp \big (-2^{-1}r^{2-(\rho _1\vee \rho _2)}\big ) \bigg ]\le & {} \frac{C}{r^L} \end{aligned}$$

for all \(T>0\) and \(r>0\). Here, the supremum of the empty set should read \(-\infty \) by convention.

We comment on some points. Parameters satisfying [L1] exist. The nondegeneracy conditions in [L3] are obvious in ergodic cases; in this paper, we apply Theorem 2 under ergodicity of the stochastic system. Theorem 2 asserts that a polynomial type large deviation inequality can be obtained once the boundedness of moments of certain random variables is verified. Condition [L4] is easy to verify because each variable is usually a simple additive functional. The polynomial type large deviation inequality in Theorem 2 enables us to easily apply the scheme of Ibragimov and Has’minskiĭ (1981) and Kutoyants (1984, 2012) to various dependence structures.

Define \(r_T(u)\) for \(u\in {{\mathbb {U}}}_T\) by

$$\begin{aligned} {{\mathbb {Z}}}_T(u)= & {} \exp \bigg (\varDelta _T[u]-\frac{1}{2}\varGamma [u^{\otimes 2}]+r_T(u)\bigg ) \quad (u\in {{\mathbb {U}}}_T). \end{aligned}$$
(6.3)

It is said that \({{\mathbb {Z}}}_T\) is locally asymptotically quadratic (LAQ) at \(\theta ^*\) if \(r_T(u)\rightarrow ^p0\) as \(T\rightarrow \infty \) for every \(u\in {{\mathbb {R}}}^{\textsf {p}}\); in that case, \(\log {{\mathbb {Z}}}_T(u)\) is asymptotically approximated by a random quadratic function of u.

We will confine our attention to a very standard case where \({{\mathbb {Z}}}_T\) is locally asymptotically mixed normal, though the quasi-likelihood analysis is framed in greater generality.

Any measurable mapping \({\hat{\theta }}_T^M:\varOmega \rightarrow {\overline{\varTheta }}\) is called a quasi-maximum likelihood estimator (QMLE) for \({{\mathbb {H}}}_T\) if

$$\begin{aligned} {{\mathbb {H}}}_T({\hat{\theta }}_T^M)= & {} \max _{\theta \in {\overline{\varTheta }}}{{\mathbb {H}}}_T(\theta ). \end{aligned}$$

When \({{\mathbb {H}}}_T\) is continuous on the compact set \({\overline{\varTheta }}\), such a measurable function always exists, as ensured by the measurable selection theorem. Let \({\hat{u}}_T^M=a_T^{-1}({\hat{\theta }}_T^M-\theta ^*)\) for the QMLE \({\hat{\theta }}_T^M\).
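As a purely illustrative aside (not the paper's implementation), a QMLE can be computed numerically by maximizing the quasi-log-likelihood over the bounded parameter set; in the sketch below, `H_T`, the box `bounds` standing in for \({\overline{\varTheta }}\), and the multi-start heuristic are our assumptions.

```python
# Minimal sketch (ours) of a QMLE: maximize a user-supplied
# quasi-log-likelihood H_T over a box standing in for the bounded
# parameter set; H_T, bounds and the multi-start loop are assumptions.
import numpy as np
from scipy.optimize import minimize

def qmle(H_T, bounds, n_starts=10, seed=0):
    """Return an approximate argmax of H_T over the box `bounds`."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    best_theta, best_val = None, -np.inf
    for _ in range(n_starts):  # multi-start guards against local maxima
        theta0 = lo + (hi - lo) * rng.random(len(bounds))
        res = minimize(lambda th: -H_T(th), theta0,
                       method="L-BFGS-B", bounds=bounds)
        if -res.fun > best_val:
            best_theta, best_val = res.x, -res.fun
    return best_theta, best_val
```

For a strictly concave \({{\mathbb {H}}}_T\), a single start suffices; the loop is only a cheap safeguard on small parameter boxes.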

Theorem 3

Let \(L>p>0\). Suppose that Conditions [L1]-[L4] are satisfied and that \((\varDelta _T,\varGamma )\rightarrow ^d(\varGamma ^{1/2}\zeta ,\varGamma )\) as \(T\rightarrow \infty \), where \(\zeta \) is a \({\textsf {p}}\)-dimensional standard Gaussian random vector independent of \(\varGamma \). Then,

$$\begin{aligned} E\big [f({\hat{u}}_T^M)\big ]\rightarrow & {} {{\mathbb {E}}}\big [f({\hat{u}})\big ]\quad (T\rightarrow \infty ) \end{aligned}$$

for \({\hat{u}}=\varGamma ^{-1/2}\zeta \) and for any \(f\in C({{\mathbb {R}}}^{\textsf {p}})\) satisfying \(\lim _{|u|\rightarrow \infty }|u|^{-p}|f(u)|<\infty \).

Proof

We will sketch the proof to convey the concepts of the quasi-likelihood analysis to the reader. See Yoshida (2011) for details. The space \({\hat{C}}({{\mathbb {R}}}^{\textsf {p}})\) is the linear space of all continuous functions \(f:{{\mathbb {R}}}^{\textsf {p}}\rightarrow {{\mathbb {R}}}\) satisfying \(\lim _{|u|\rightarrow \infty }f(u)=0\). The space \({\hat{C}}({{\mathbb {R}}}^{\textsf {p}})\) becomes a separable Banach space equipped with the supremum norm \(\Vert f\Vert _\infty =\sup _{u\in {{\mathbb {R}}}^{\textsf {p}}}|f(u)|\). Moreover, \({\hat{C}}({{\mathbb {R}}}^{\textsf {p}})\) is regarded as a measurable space with the Borel \(\sigma \)-field. Let

$$\begin{aligned} {{\mathbb {Z}}}(u)= & {} \exp \bigg (\varGamma ^{1/2}\zeta [u]-\frac{1}{2}\varGamma [u^{\otimes 2}]\bigg ) \end{aligned}$$
(6.4)

for \(u\in {{\mathbb {R}}}^{\textsf {p}}\).

The term \(r_T(u)\) admits the expression

$$\begin{aligned} r_T(u)= & {} \int _0^1(1-s)\big \{\varGamma [u^{\otimes 2}]-\varGamma _T(\theta ^*+sa_Tu)[u^{\otimes 2}]\big \}\,\mathrm{d}s \end{aligned}$$
(6.5)

for u such that \(|u|\le b_T^{(1-\alpha )/2}\) and T such that \(B(\theta ^*,b_T^{-\alpha /2})\subset \varTheta \). In this situation, we can apply Taylor's formula even though the whole \(\varTheta \) is not convex. Condition [L4] (iii) and the convergence of \(\varDelta _T\) ensure tightness of the random fields \(\big \{{{\mathbb {Z}}}_T|_{\overline{B(0,R)}}\big \}_{T>T_0}\) for every \(R>0\), where \(B(0,R)=\{u\in {{\mathbb {R}}}^{\textsf {p}};\>|u|<R\}\) and \(T_0\) is a sufficiently large number depending on R. Combining this property with the polynomial type large deviation inequality given by Theorem 2, we obtain the convergence \({{\mathbb {Z}}}_T\rightarrow {{\mathbb {Z}}}\) in \({\hat{C}}({{\mathbb {R}}}^{\textsf {p}})\) for the random field \({{\mathbb {Z}}}_T\) extended as an element of \({\hat{C}}({{\mathbb {R}}}^{\textsf {p}})\) so that \(\sup _{{{\mathbb {R}}}^{\textsf {p}}\setminus {{\mathbb {U}}}_T}{{\mathbb {Z}}}_T(u)\le \sup _{u\in \partial {{\mathbb {U}}}_T}{{\mathbb {Z}}}_T(u)\). Consequently, \({\hat{u}}_T^M\rightarrow ^d{\hat{u}}=\text {argmax}_{u\in {{\mathbb {R}}}^{\textsf {p}}}{{\mathbb {Z}}}(u)\). It is known that a measurable version of the extension of \({{\mathbb {Z}}}_T\) exists.
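For completeness, (6.5) is just the second-order Taylor formula with integral remainder applied to \(\log {{\mathbb {Z}}}_T(u)={{\mathbb {H}}}_T(\theta ^*+a_Tu)-{{\mathbb {H}}}_T(\theta ^*)\):

$$\begin{aligned} \log {{\mathbb {Z}}}_T(u)&= \partial _\theta {{\mathbb {H}}}_T(\theta ^*)[a_Tu] +\int _0^1(1-s)\,\partial _\theta ^2{{\mathbb {H}}}_T(\theta ^*+sa_Tu)[(a_Tu)^{\otimes 2}]\,\mathrm{d}s \\&= \varDelta _T[u]-\int _0^1(1-s)\,\varGamma _T(\theta ^*+sa_Tu)[u^{\otimes 2}]\,\mathrm{d}s \end{aligned}$$

by (6.1); adding and subtracting \(\frac{1}{2}\varGamma [u^{\otimes 2}]=\int _0^1(1-s)\,\varGamma [u^{\otimes 2}]\,\mathrm{d}s\) and comparing with (6.3) yields the remainder \(r_T(u)\) of (6.5).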

A polynomial type large deviation inequality, even weaker than the one in Theorem 2, serves to show the \(L^q\)-boundedness of \(\{{\hat{u}}_T^M\}_T\) for \(L>q>p\). Then, the family \(\{f({\hat{u}}_T^M)\}_T\) is uniformly integrable, and hence we obtain the convergence of \(E[f({\hat{u}}_T^M)]\). \(\square \)

Remark 2

In Theorem 3, if \(\varDelta _T\rightarrow ^d\varGamma ^{1/2}\zeta \) \(\mathcal{F}\)-stably, then \((\varDelta _T,\varGamma )\rightarrow ^d(\varGamma ^{1/2}\zeta ,\varGamma )\) and \({\hat{u}}^M_T\rightarrow {\hat{u}}\) \(\mathcal{F}\)-stably.

An advantage of the quasi-likelihood analysis is that the asymptotic behavior of the quasi-Bayesian estimator, together with the convergence of its moments, can be obtained in the same way as for the quasi-maximum likelihood estimator. The mapping

$$\begin{aligned} {\hat{\theta }}_T^B= & {} \bigg [\int _\varTheta \exp \big ({{\mathbb {H}}}_T(\theta )\big )\varpi (\theta )\mathrm{d} \theta \bigg ]^{-1} \int _\varTheta \theta \exp \big ({{\mathbb {H}}}_T(\theta )\big )\varpi (\theta )\mathrm{d}\theta \end{aligned}$$

is called a quasi-Bayesian estimator (QBE) with respect to the prior density \(\varpi \). The QBE \({\hat{\theta }}_T^B\) takes values in the convex hull of \({\overline{\varTheta }}\). We will assume that \(\varpi \) is continuous and that \(0<\inf _{\theta \in \varTheta }\varpi (\theta )\le \sup _{\theta \in \varTheta } \varpi (\theta )<\infty \). Among many possible presentations, we give a concise exposition in the following; the reader is referred to Yoshida (2011) for further information. Recall that \({\textsf {p}}\) is the dimension of \(\varTheta \), and that B(R) denotes the open ball of radius R centered at the origin. \(C(\overline{B(R)})\) is the space of all continuous functions on \(\overline{B(R)}\), equipped with the supremum norm. Recall \({{\mathbb {V}}}_T(r)=\{u\in {{\mathbb {U}}}_T;\>|u|\ge r\}\). As before, \({\hat{u}}=\varGamma ^{-1/2}\zeta \) with a \({\textsf {p}}\)-dimensional standard Gaussian random vector \(\zeta \) independent of \(\varGamma \). Write \({\hat{u}}_T^B=a_T^{-1}({\hat{\theta }}_T^B-\theta ^*)\).
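Numerically, on a low-dimensional parameter box with a uniform prior, the ratio of integrals defining \({\hat{\theta }}_T^B\) can be approximated on a tensor grid. The sketch below is ours and purely illustrative: `H_T`, the box `bounds` and the grid size are assumptions, and the integrand is stabilized by subtracting \(\max {{\mathbb {H}}}_T\) before exponentiating (a constant that cancels in the ratio, as do the prior constant and the volume element).

```python
# Minimal sketch (ours) of the quasi-Bayesian estimator on a parameter
# box with a uniform prior: both integrals are approximated on a tensor
# grid, so the prior constant and the volume element cancel in the ratio.
import numpy as np

def qbe(H_T, bounds, n_grid=201):
    """Grid approximation of the QBE for a quasi-log-likelihood H_T."""
    axes = [np.linspace(lo, hi, n_grid) for lo, hi in bounds]
    mesh = np.meshgrid(*axes, indexing="ij")
    thetas = np.stack([m.ravel() for m in mesh], axis=-1)  # all grid points
    logw = np.array([H_T(th) for th in thetas])
    w = np.exp(logw - logw.max())   # stabilized weights proportional to exp(H_T)
    w /= w.sum()
    return w @ thetas               # weighted average = posterior-mean-type estimator

# Toy usage: a concave quasi-log-likelihood peaked at (0.3, -0.2).
H = lambda th: -50.0 * ((th[0] - 0.3) ** 2 + (th[1] + 0.2) ** 2)
print(qbe(H, [(-1.0, 1.0), (-1.0, 1.0)]))  # approximately [0.3, -0.2]
```

In higher dimensions, the grid would be replaced by Monte Carlo sampling, but the stabilization trick is the same.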

Theorem 4

Let \(p\ge 1\), \(L>p+1\), \(D>{\textsf {p}}+p\). Suppose that \((\varDelta _T,\varGamma )\rightarrow ^d(\varGamma ^{1/2}\zeta ,\varGamma )\) as \(T\rightarrow \infty \), where \(\zeta \) is a \({\textsf {p}}\)-dimensional standard Gaussian random vector independent of \(\varGamma \). Moreover, suppose the following conditions are satisfied.

(i) For every \(R>0\),

    $$\begin{aligned} {{\mathbb {Z}}}_T|_{\overline{B(R)}}&\rightarrow ^d&{{\mathbb {Z}}}|_{\overline{B(R)}} \quad \text {in }C(\overline{B(R)}) \end{aligned}$$
    (6.6)

    as \(T\rightarrow \infty \), where \({{\mathbb {Z}}}\) is given in (6.4).

(ii) There exist positive constants \(T_0\), \(C_1\) and \(C_2\) such that

    $$\begin{aligned} P\bigg [\sup _{{{\mathbb {V}}}_T(r)}{{\mathbb {Z}}}_T\ge C_1r^{-D}\bigg ]\le & {} C_2r^{-L} \end{aligned}$$
    (6.7)

    for all \(T\ge T_0\) and \(r>0\).

(iii) For some \(T_0>0\),

    $$\begin{aligned} \sup _{T\ge T_0}E\bigg [\bigg (\int _{{{\mathbb {U}}}_T}{{\mathbb {Z}}}_T(u)\mathrm{d}u\bigg )^{-1}\bigg ]< & {} \infty . \end{aligned}$$
    (6.8)

Then,

$$\begin{aligned} E\big [f({\hat{u}}_T^B)\big ]\rightarrow & {} E\big [f({\hat{u}})\big ] \end{aligned}$$
(6.9)

as \(T\rightarrow \infty \) for any continuous function \(f:{{\mathbb {R}}}^{\textsf {p}}\rightarrow {{\mathbb {R}}}\) satisfying \(\sup _{u\in {{\mathbb {R}}}^{\textsf {p}}}\big \{(1+|u|)^{-p}|f(u)|\big \}<\infty \).

Proof

We will give a brief summary of the proof; see Yoshida (2011) for details. The variable \({\hat{u}}_T^B\) has the expression

$$\begin{aligned} {\hat{u}}_T^B= & {} \bigg [\int _{{{\mathbb {U}}}_T}{{\mathbb {Z}}}_T(u)\varpi (\theta ^*+a_Tu)\mathrm{d}u\bigg ]^{-1} \int _{{{\mathbb {U}}}_T} u{{\mathbb {Z}}}_T(u)\varpi (\theta ^*+a_Tu)\mathrm{d}u. \end{aligned}$$

By (6.7) and the properties of \(\varpi \), we can approximate \({\hat{u}}_T^B\) by

$$\begin{aligned} {\tilde{u}}_T= & {} \bigg [\int _{B(R)}{{\mathbb {Z}}}_T(u)\mathrm{d}u\bigg ]^{-1} \int _{B(R)} u{{\mathbb {Z}}}_T(u)\mathrm{d}u \end{aligned}$$

at the cost of a small error when R is large. By (6.6),

$$\begin{aligned} {\tilde{u}}_T&\rightarrow ^d&\bigg [\int _{B(R)}{{\mathbb {Z}}}(u)\mathrm{d}u\bigg ]^{-1} \int _{B(R)} u{{\mathbb {Z}}}(u)\mathrm{d}u=:{\hat{u}}(R). \end{aligned}$$

The random field \({{\mathbb {Z}}}\) inherits a tail estimate from (6.7), and hence \({\hat{u}}(R)\) is approximated by

$$\begin{aligned} \bigg [\int _{{{\mathbb {R}}}^{\textsf {p}}}{{\mathbb {Z}}}(u)\mathrm{d}u\bigg ]^{-1}\int _{{{\mathbb {R}}}^{\textsf {p}}} u{{\mathbb {Z}}}(u)\mathrm{d}u \>=\>\varGamma ^{-1/2}\zeta \>=\>{\hat{u}}. \end{aligned}$$

Combining these estimates, we can conclude \({\hat{u}}_T^B\rightarrow ^d{\hat{u}}\) as \(T\rightarrow \infty \). Convergence of the expectation is a consequence of uniform integrability of \(|{\hat{u}}_T^B|^p\) ensured by (6.7). \(\square \)

Remark 3

(a) It is possible to relax the conditions of Theorem 4 so as to ensure only the convergence \({\hat{u}}^B_T\rightarrow {\hat{u}}\). (b) In Theorem 4, if \(\varDelta _T\rightarrow ^d\varGamma ^{1/2}\zeta \) \(\mathcal{F}\)-stably, then \({\hat{u}}^B_T\rightarrow {\hat{u}}\) \(\mathcal{F}\)-stably. (c) Usually, condition (iii) of Theorem 4 is easily verified; see Lemma 2 of Yoshida (2011). (d) We refer the reader to Yoshida (2021) for a simplified quasi-likelihood analysis for a locally asymptotically quadratic random field.

The following result follows from Theorem 4.

Theorem 5

Let \(p>{\textsf {p}}\) and

$$\begin{aligned} L>\max \bigg \{p+1,{\textsf {p}}(\beta -\rho _1),{\textsf {p}}(2\beta _1(1-\alpha )^{-1} -\rho _1)\bigg \}. \end{aligned}$$

Suppose that Conditions [L1]-[L4] are satisfied, that \(E[|\varGamma |^p]<\infty \), and that \((\varDelta _T,\varGamma )\rightarrow ^d(\varGamma ^{1/2}\zeta ,\varGamma )\) as \(T\rightarrow \infty \), where \(\zeta \) is a \({\textsf {p}}\)-dimensional standard Gaussian random vector independent of \(\varGamma \). Then,

$$\begin{aligned} E\big [f({\hat{u}}_T^B)\big ]\rightarrow & {} {{\mathbb {E}}}\big [f({\hat{u}})\big ]\quad (T\rightarrow \infty ) \end{aligned}$$

for \({\hat{u}}=\varGamma ^{-1/2}\zeta \) and for any \(f\in C({{\mathbb {R}}}^{\textsf {p}})\) satisfying \(\lim _{|u|\rightarrow \infty }|u|^{-p}|f(u)|<\infty \).

Proof

The convergence (6.6) holds, as shown in the proof of Theorem 3. The polynomial type large deviation inequality (6.7) is a consequence of Theorem 2; the number D is arbitrary. Fix \(\delta >0\). Then, there exists \(T_0>0\) such that \(B(\delta )\subset {{\mathbb {U}}}_T\) for all \(T\ge T_0\). In particular, \(r_T(u)\) admits the representation (6.5) for all \(u\in B(\delta )\). Since \(M_3=L(\beta -\rho _1)^{-1}>{\textsf {p}}\), \(M_4=L(2\beta _1(1-\alpha )^{-1}-\rho _1)^{-1}>{\textsf {p}}\) and \(p>{\textsf {p}}\), we have \(p':=\min \{M_3,M_4,p\}>{\textsf {p}}\) and

$$\begin{aligned} E[|r_T(u)|^{p'}]\le & {} C_0|u|^{p'}\quad (u\in B(\delta )) \end{aligned}$$

for some constant \(C_0\). Then Lemma 2 of Yoshida (2011) gives the estimate

$$\begin{aligned} E\bigg [\bigg (\int _{B(\delta )}{{\mathbb {Z}}}_T(u)\mathrm{d}u\bigg )^{-1}\bigg ]\le & {} C_1 \end{aligned}$$

with a constant \(C_1\) depending on \((p',{\textsf {p}},\delta ,C_0)\) and on the suprema appearing in [L4](i), (iii) and (iv), but independent of \(T\ge T_0\). Therefore, (6.8) holds true. Thus, we can apply Theorem 4 to conclude the proof. \(\square \)

Table 3 List of stocks investigated in this paper. The sample covers the whole year 2015, representing roughly 230 trading days for all stocks except LAGA.PA and PEUP.PA, for which roughly 70 trading days are missing

7 List of stocks

Table 3 lists all the stocks investigated in the paper. For each stock, the total number of days available in the sample is given. Note that, owing to limits on the computation time available for this paper, some trading days for a few very liquid stocks were not used for some of the marked ratio models tested in Sect. 4.4. In such cases, only the trading days for which all models have been computed are used; their number is reported in the last column of the table.

8 QAIC and QBIC selection: detailed results

See Tables 4, 5, 6.

Table 4 Side determination—Frequency of QAIC and QBIC selection (and their difference) of each tested model, averaged across stocks
Table 5 Bid aggressiveness determination—Frequency of QAIC and QBIC selection (and their difference) of each tested model, averaged across stocks
Table 6 Ask aggressiveness determination—Frequency of QAIC and QBIC selection (and their difference) of each tested model, averaged across stocks
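As a purely schematic illustration (ours, not the paper's code), selection frequencies of this kind can be tabulated per trading day from the maximized quasi-log-likelihoods of the competing models. The penalty forms below, an Akaike-type \(2\times \dim \) and a Schwarz-type \(\dim \times \log T\), as well as the `fits` data structure, are assumptions of the sketch; the paper's own QAIC/QBIC definitions take precedence.

```python
# Schematic sketch (ours) of tabulating per-day QAIC/QBIC selection
# frequencies. Assumed penalties: Akaike-type 2*dim and Schwarz-type
# dim*log(T); the paper's own QAIC/QBIC definitions take precedence.
import numpy as np

def selection_frequencies(fits):
    """fits: list of days; each day maps model -> (maximized H_T, dim, T)."""
    counts = {}  # model -> [QAIC wins, QBIC wins]
    for day in fits:
        qaic = {m: -2.0 * h + 2.0 * d for m, (h, d, T) in day.items()}
        qbic = {m: -2.0 * h + d * np.log(T) for m, (h, d, T) in day.items()}
        counts.setdefault(min(qaic, key=qaic.get), [0, 0])[0] += 1
        counts.setdefault(min(qbic, key=qbic.get), [0, 0])[1] += 1
    n = len(fits)
    return {m: (a / n, b / n) for m, (a, b) in counts.items()}
```

Averaging the per-stock output of such a routine across stocks yields rows comparable in format to Tables 4, 5 and 6.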