1 Introduction

The microstructure of financial markets has been studied quantitatively and empirically in econophysics [1,2,3,4]. Econophysicists have proposed various dynamical models, both at the limit-order-book level (e.g., the Santa Fe model [5,6,7], the \(\epsilon \)-intelligence model [8], and the latent order book model [9]) and at the individual-trader level (e.g., the dealer model [10,11,12,13,14]), in the hope that the statistical physics program is useful even for financial modelling. This paper focuses on a microscopic model of market-order submissions proposed by Lillo, Mike, and Farmer (LMF) in 2005 [15], which was based on the hypothesised order-splitting behaviour of individual traders.

The LMF model is a stylised dynamical model to explain the persistence of market-order flows. A market order is an order to buy or sell the stock immediately at the best available price. In the following, the market-order sign \(\epsilon _t\) is defined as \(\epsilon _t=+1\) (\(\epsilon _t=-1\)) for a buy (sell) market order at time t. In financial data analyses, the binary order-sign sequence of market-order flows is known to be predictable over long horizons: the autocorrelation function (ACF) of the order-sign sequence decays slowly as a power law, such that

$$\begin{aligned} C_\tau := \langle \epsilon _t\epsilon _{t+\tau }\rangle \simeq c_0 \tau ^{-\gamma } \>\>\> \text{ for } \text{ large } \tau , \>\>\> \gamma \in (0,1). \end{aligned}$$
(1)

Here the empirical average of any stochastic variable A is denoted by \(\langle A\rangle \), \(c_0\) is the ACF prefactor, and \(\gamma \) is the ACF power-law exponent. This slow decay is called the long-range correlation (LRC) of the order flows, and its origin has been under debate in econophysics and market microstructure for a long time [3]. For example, some researchers state that the LRC is a consequence of herding among traders [16,17,18]. However, from the viewpoint of empirical support, the current most promising microscopic hypothesis is the order-splitting hypothesis stating that the LRC originates from the order-splitting behaviour of institutional investors. The LMF model is based on this order-splitting hypothesis in describing the LRC from the microscopic dynamics in the spirit of the statistical-physics programs.
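As an illustration of definition (1), the following minimal Python sketch estimates the order-sign ACF from a finite sign sequence. The function name `sign_acf` and the toy sequence are hypothetical and introduced only for illustration. The estimator is the plain (uncentred) sample average of \(\epsilon _t\epsilon _{t+\tau }\), which presumes that buy and sell orders are roughly balanced.

```python
import numpy as np

def sign_acf(signs, max_lag):
    """Sample estimate of C_tau = <eps_t * eps_{t+tau}> for tau = 1, ..., max_lag."""
    eps = np.asarray(signs, dtype=float)
    n = eps.size
    # For each lag, average the product of the sequence with its shifted copy.
    return np.array([np.mean(eps[: n - tau] * eps[tau:])
                     for tau in range(1, max_lag + 1)])

# Hypothetical toy sequence made of runs of identical signs.
toy = np.array([+1, +1, +1, -1, -1, +1, +1, +1, +1, -1])
acf = sign_acf(toy, max_lag=3)
```

For real order flows, the exponent \(\gamma \) would then be read off from the slope of the estimated ACF on a log-log plot at large lags, where the estimate is also the noisiest.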

The order-splitting hypothesis states that there are traders who split large metaorders into a long sequence of child orders. Because all the child orders share the same sign for a while, the LRC naturally appears in this scenario. The LMF model is a simple stochastic model implementing this order-splitting picture. In the original article [15], they made the following assumptions:

  • There are M traders in the financial market, and M is constant in time (i.e., a closed system).

  • All traders are order-splitting traders characterised by the identical microscopic parameters: i.e., the homogeneity in the trading strategy is assumed across all the agents.

  • The distribution of metaorder length L is given by the discrete Pareto distribution (\(L=1,2,...\)), such that

    $$\begin{aligned} \rho (L) \simeq \alpha L^{-\alpha -1} \>\>\> \text{ with } \>\>\alpha \in (1,2). \end{aligned}$$
    (2)
  • The traders randomly submit market orders with the same order-submission probability.

While these microscopic dynamics are described as a Markovian stochastic process (whose dimension is \(2M+1\); see Sect. 2.2), LMF solved this model with heuristic but reasonable approximations to study the LRC in the ACF as its macroscopic dynamical behaviour. They showed that the ACF obeys the LRC asymptotics (1), with the power-law exponent \(\gamma \) and the prefactor \(c_0\) given by

$$\begin{aligned} \gamma&= \alpha -1, \end{aligned}$$
(3a)
$$\begin{aligned} c_0&= \frac{1}{\alpha M^{2-\alpha }}. \end{aligned}$$
(3b)

They also showed numerically that the power-law exponent formula (3a) holds robustly even for an open-system version, where the total number of traders M fluctuates in time. Since the predictive formula (3) quantitatively connects the macroscopic LRC phenomenon to the microscopic parameters, the LMF theory is a typical statistical physics program and is theoretically very appealing to econophysicists.

Several empirical studies support both the order-splitting hypothesis and the LMF model. While the original LMF paper could not establish the prediction (3a) at a quantitative level (footnote 1) because high-quality microscopic datasets were unavailable in 2005, Refs. [19, 20] showed with real datasets that the assumption of a power-law metaorder size distribution is plausible. In addition, Tóth et al. showed very convincing qualitative evidence in 2015 that order splitting is the main cause of the LRC by decomposing the total ACF [21]. Furthermore, Sato and Kanazawa showed crucial evidence in 2023 that the LMF prediction (3a) works well even at a quantitative level [22, 23], using a large microscopic dataset of the Tokyo Stock Exchange (TSE) market.

While the LMF model describes the power-law decay of the ACF well, its predictive power is expected to be limited regarding the prefactor \(c_0\), because the LMF model was historically proposed to characterise the power-law exponent \(\gamma \) rather than the prefactor \(c_0\). Indeed, during the data analyses for Refs. [22, 23], we noticed that heterogeneity of order-splitting strategies is present and that such heterogeneity can theoretically impact the prefactor \(c_0\), while the power-law formula (3a) robustly holds. Given the recent breakthrough in data analyses, we believe the classical LMF theory can be updated to describe the prefactor \(c_0\) better by taking into account the heterogeneity in trading strategies, toward precise data calibration.

In this report, we propose a generalised LMF model by incorporating heterogeneity of order-splitting strategies. In addition, we solve the generalised LMF model exactly to show the following three results: (i) The power-law exponent formula (3a) robustly holds true even in the presence of heterogeneous order-submission probability distributions. (ii) The prefactor formula (3b) is replaced with a new formula that is sensitive to the order-submission probability distribution. (iii) Furthermore, the classical prefactor formula (3b) systematically underestimates the actual prefactor in the presence of heterogeneity among agents. Our results imply that while the interpretation of the ACF power-law exponent is robust and straightforward, the interpretation of the ACF prefactor needs more careful investigation for data calibration.

This report is organised as follows. Section 2 describes our model and mathematical notation under the assumption of a closed system. We show the exact solution for the generalised LMF model in Sect. 3. In Sect. 4, we study several specific but important cases with numerical verifications. Sections 5 and 6 discuss the implications of our heterogeneous LMF formulas for realistic data calibration. We conclude our paper with some remarks in Sect. 7. Ten appendices supplement the main text at the end of this report.

2 Model

In this section, let us define the stochastic dynamics of our generalised LMF model.

2.1 Mathematical Notation

In this report, the probability density function (PDF) of a stochastic variable A is written as P(A). If the stochastic variable explicitly depends on time t, such that \(A_t\), the PDF of \(A_t\) is denoted by \(P_t(A)\). We note that any PDF must satisfy the normalisation condition \(\sum _A P(A) = 1\). We also define the cumulative distribution function (CDF) and the complementary cumulative distribution function (CCDF) by

$$\begin{aligned} P_<(A) := \sum _{A'< A} P(A'), \>\>\> P_{\ge }(A) := \sum _{A'\ge A} P(A') = 1 - \sum _{A' < A}P(A'), \end{aligned}$$
(4)

respectively. The stationary PDF and the stationary ensemble average are respectively defined by

$$\begin{aligned} P_{\textrm{st}} (A) := \lim _{t\rightarrow \infty }P_t(A), \>\>\> \langle A\rangle _{\textrm{st}} := \lim _{t\rightarrow \infty }\langle A_t\rangle = \sum _A AP_{\textrm{st}}(A), \end{aligned}$$
(5)

if \(P_{\textrm{st}} (A)\) and \(\langle A\rangle _{\textrm{st}}\) exist. Also, under the condition B, the conditional PDF and conditional average of A are respectively defined by

$$\begin{aligned} P(A|B) := \frac{P(A,B)}{P(B)}, \>\>\> \langle A | B\rangle = \sum _{A'} A' P(A'|B). \end{aligned}$$
(6)

2.2 Model Parameters and Variables

Fig. 1 Schematic of our generalised LMF model. The total number of traders is \(M:=|{\Omega }|\), which is a time-independent positive integer. Any trader i is characterised by the order-submission probability \(\lambda ^{(i)}\) and the run-length (metaorder-length) distribution \(\rho ^{(i)}(L)\). Here, a run length is defined as the number of successive identical order signs (e.g., \(L=3\) for \(+++\)), and is called the metaorder length in this paper. The intensities and the metaorder-length distributions must satisfy the normalisation conditions \(\sum _{i'\in {\Omega }}\lambda ^{(i')}=1\) and \(\sum _{L=1}^\infty \rho ^{(i)}(L)=1\) for any \(i\in {\Omega }\). At each timestep, a trader \(\mathfrak {i}_{t}\) is randomly selected according to the probability distribution \(\{\lambda ^{(i)}\}_{i\in {\Omega }}\) (i.e., a discrete-time Poisson process), and then submits a market order.

\({\Omega }\) denotes the set of all the traders, and the system is assumed to be closed, such that \(M:= |{\Omega }|=\textrm{const}\) (see Fig. 1). \(|{\Omega }|\) is a positive integer, and the set of traders \({\Omega }\) can be written as

$$\begin{aligned} {\Omega } = \{1,2,...,M\} \end{aligned}$$
(7)

without loss of generality. We incorporate the heterogeneity of trading strategies into our model, and the characteristic parameters of the ith trader are given by the order-submission probability \(\lambda ^{(i)}\) and metaorder-length (or run-length) distribution \(\rho ^{(i)}(L)\). For simplicity, we assume that the executed volume size is always the minimum unit of transactions. In other words, our model is completely characterised by the following parameter set

$$\begin{aligned} \mathcal {P} := \left( M, \{\lambda ^{(i)}\}_{i\in {\Omega }}, \{\rho ^{(i)}(L)\}_{i\in {\Omega }}\right) , \end{aligned}$$
(8)

where the submission probability and the metaorder-length distribution satisfy the normalisation of the probability

$$\begin{aligned} \sum _{i'\in {\Omega }} \lambda ^{(i')}=1, \>\>\> \sum _{L=1}^{\infty } \rho ^{(i)}(L)=1 \end{aligned}$$
(9)

for any \(i \in {\Omega }\).

We next define the state variable of the ith trader. The trader i has two state variables \(\epsilon ^{(i)}_t\) and \(R^{(i)}_t\), representing the order-sign of the metaorder (\(\epsilon ^{(i)}_t=+1\) denotes buy and \(\epsilon ^{(i)}_t=-1\) denotes sell) and the remaining metaorder length, respectively. The order sign of the whole market \(\epsilon _t\) is defined by \(\epsilon _t:=\epsilon _t^{(\mathfrak {i}_t)}\), where \(\mathfrak {i}_t\) is the trader identifier (ID) who submits the market order at time t. Thus, this system is specified by the point in the phase space

$$\begin{aligned} X_t: = \left( \epsilon _t; \epsilon _t^{(1)}, R_t^{(1)}; \dots ; \epsilon _t^{(M)}, R_t^{(M)}\right) \end{aligned}$$
(10)

and is designed as a Markovian stochastic process with dimension \(2M+1\).

2.3 Stochastic Dynamics

We next define the stochastic dynamics. Let \(\mathfrak {i}_t\) be the stochastic variable representing the identifier of the trader who submits the market order at time t, such that \(\mathfrak {i}_{t} \in {\Omega }\). We assume that \(\mathfrak {i}_{t+1}\) obeys the PDF \(\{\lambda ^{(\mathfrak {i})}\}_{\mathfrak {i}\in {\Omega }}\). In other words, the probability distribution of \(\mathfrak {i}_{t+1}\) is given by

$$\begin{aligned} P_{t+1}(\mathfrak {i}) = \lambda ^{(\mathfrak {i})} \end{aligned}$$
(11a)

as an independent and identically distributed (IID) sequence \(\{\mathfrak {i}_t\}_{t}\). After the execution by the trader \(\mathfrak {i}_{t+1}\), the remaining volume \(R^{(\mathfrak {i}_{t+1})}_{t+1}\) decreases by one if \(R^{(\mathfrak {i}_{t+1})}_{t} > 1\). If the whole metaorder has been executed at time \(t+1\) (i.e., \(R^{(\mathfrak {i}_{t+1})}_t=1\)), the metaorder length and its sign are randomly reset for the trader \(\mathfrak {i}_{t+1}\). In summary, the dynamics of \(X_t\) are given as follows for all \(i \in {\Omega }\) (see Fig. 1 for a schematic):

$$\begin{aligned} R^{(i)}_{t+1}&= {\left\{ \begin{array}{ll} R^{(i)}_t &{} \text{ if } i \ne \mathfrak {i}_{t+1} \\ R^{(i)}_t - 1 &{} \text{ if } i = \mathfrak {i}_{t+1} \text{ and } R^{(i)}_{t}>1 \\ L &{} \text{ if } i= \mathfrak {i}_{t+1} \text{ and } R^{(i)}_t=1;\,L \text{ obeys } \rho ^{(i)}(L) \end{array}\right. }, \end{aligned}$$
(11b)
$$\begin{aligned} \epsilon ^{(i)}_{t+1}&= {\left\{ \begin{array}{ll} \epsilon ^{(i)}_t &{} \text{ if } i \ne \mathfrak {i}_{t+1} \text{ or } R^{(i)}_{t}>1 \\ +1 &{} \text{ with } \text{ prob. } 1/2, \text{ if } i = \mathfrak {i}_{t+1} \text{ and } R^{(i)}_{t}=1 \\ -1 &{} \text{ with } \text{ prob. } 1/2, \text{ if } i = \mathfrak {i}_{t+1} \text{ and } R^{(i)}_{t}=1 \end{array}\right. }, \end{aligned}$$
(11c)
$$\begin{aligned} \epsilon _{t+1}&= \epsilon _{t}^{(\mathfrak {i}_{t+1})}. \end{aligned}$$
(11d)

Here the metaorder length is replenished according to the PDF \(\{\rho ^{(i)}(L)\}_L\) when the previous metaorder is terminated (i.e., if \(R_t^{(\mathfrak {i}_{t+1})}=1\)).
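The update rules (11a)-(11d) translate almost line by line into a simulation. The following is a minimal Python sketch under stated assumptions, not a reference implementation: the names `simulate_lmf` and `pareto_length` are hypothetical, the initial remaining lengths are drawn from \(\rho ^{(i)}(L)\) rather than from the stationary distribution (so an initial burn-in segment should be discarded), and the example sampler assumes a discrete power law with CCDF \(L^{-\alpha }\), generated by inverse-transform sampling.

```python
import numpy as np

def pareto_length(alpha):
    """Hypothetical sampler: draws L >= 1 with P(L >= l) = l**(-alpha)."""
    def draw(i, rng):
        u = 1.0 - rng.random()                      # uniform on (0, 1]
        return int(np.floor(u ** (-1.0 / alpha)))   # inverse-transform sampling
    return draw

def simulate_lmf(lam, sample_length, n_steps, seed=0):
    """Simulate the market order-sign sequence following Eqs. (11a)-(11d).

    lam           : order-submission probabilities lambda^(i), summing to one
    sample_length : callable (i, rng) -> int >= 1, a draw from rho^(i)(L)
    """
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam, dtype=float)
    m = lam.size
    sign = rng.choice([-1, 1], size=m)              # epsilon^(i)
    remaining = np.array([sample_length(i, rng) for i in range(m)])  # R^(i)
    traders = rng.choice(m, size=n_steps, p=lam)    # IID trader IDs, Eq. (11a)
    out = np.empty(n_steps, dtype=int)
    for t, i in enumerate(traders):
        out[t] = sign[i]                            # Eq. (11d): sign before any reset
        if remaining[i] > 1:
            remaining[i] -= 1                       # Eq. (11b), second case
        else:
            remaining[i] = sample_length(i, rng)    # Eq. (11b), third case
            sign[i] = rng.choice([-1, 1])           # Eq. (11c): fresh random sign
    return out
```

Setting `lam = [1 / M] * M` with the same sampler for every trader corresponds to the homogeneous setting of the original LMF model, while unequal entries of `lam` introduce the heterogeneity studied in this report.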

2.4 Relationship with the Original LMF Model

Our model is a natural generalisation of the original LMF model to include the heterogeneity of the order-splitting behaviour. Indeed, our model reduces to the original LMF model by setting the parameter \(\mathcal {P}\) as

$$\begin{aligned} \lambda ^{(i)} = \frac{1}{M}, \>\>\> \rho ^{(i)}(L) = \rho (L) \>\>\> \text{ for } \text{ all } i \in {\Omega } \end{aligned}$$
(12)

by removing the heterogeneity in the order-splitting strategies.

3 Exact Solutions

In this section, we derive the exact solutions to our generalised LMF model. Particularly, we are interested in the order-sign autocorrelation function (ACF) in the stationary state:

$$\begin{aligned} C_\tau := \lim _{t\rightarrow \infty }\langle \epsilon _t\epsilon _{t+\tau }\rangle = \langle \epsilon _1\epsilon _{\tau +1}\rangle _{\textrm{st}}. \end{aligned}$$
(13)
Fig. 2 Schematic of the ACF decomposition for the case with \(i=1\) and \(R_{t=0}^{(i=1)}=10\). The issuer of the market orders at \(t=1\) and \(t=\tau +1\) is the same, such that \(\mathfrak {i}_1=\mathfrak {i}_{\tau +1}=i=1\). Since \(R_{t=\tau }^{(i=1)}=R_{t=0}^{(i=1)}-N_{t=\tau }^{(i=1)}=3\ge 1\), both orders at \(t=1\) and \(t=\tau +1\) belong to the same metaorder, and, thus, the condition \({u_{1,\tau +1}}=1\) is met.

3.1 Preliminary Calculation

Before deriving the explicit formula of the exact ACF, we transform the definition of the ACF. Let us introduce a flag variable \(u_{t_s,t_e}\) such that \(u_{t_s,t_e}=1\) if the market order executed at time \(t=t_e\) belongs to the same metaorder as the market order executed at time \(t=t_s\), and \(u_{t_s,t_e}=0\) otherwise. Conditioning on \(u_{1,\tau +1}\), \(\mathfrak {i}_{\tau +1}\), and \(\mathfrak {i}_1\), we decompose the ACF as

$$\begin{aligned} C_\tau = \sum _{u'\in \{0,1\}}\sum _{i\in {\Omega }}\sum _{j\in {\Omega }}\langle \epsilon _1\epsilon _{\tau +1} | {u_{1,\tau +1}=u'}, \mathfrak {i}_{\tau +1}=i, \mathfrak {i}_1=j \rangle _{\textrm{st}}P({u_{1,\tau +1}=u'}, \mathfrak {i}_{\tau +1}=i, \mathfrak {i}_1=j). \end{aligned}$$
(14)

See Fig. 2 for a schematic of this decomposition. By construction, there is no correlation between order signs belonging to two different metaorders. On the other hand, order signs within the same metaorder are perfectly correlated. We thus obtain

$$\begin{aligned} \langle \epsilon _1\epsilon _{\tau +1} | {u_{1,\tau +1}=u'}, \mathfrak {i}_{\tau +1}=i, \mathfrak {i}_1=j \rangle _{\textrm{st}}=\delta _{u',1}. \end{aligned}$$
(15)

In addition, \(\mathfrak {i}_{\tau +1}\) and \(\mathfrak {i}_1\) are independently generated, and

$$\begin{aligned} P({u_{1,\tau +1}}=1, \mathfrak {i}_{\tau +1}=i, \mathfrak {i}_1=j)&= P({u_{1,\tau +1}}=1 | \mathfrak {i}_{\tau +1}=i, \mathfrak {i}_1=j)P(\mathfrak {i}_{\tau +1}=i)P(\mathfrak {i}_1=j) \nonumber \\ {}&= P({u_{1,\tau +1}}=1 | \mathfrak {i}_{\tau +1}=\mathfrak {i}_1=i)\left( \lambda ^{(i)}\right) ^2\delta _{i,j}. \end{aligned}$$
(16)

We obtain

$$\begin{aligned} C_\tau = \sum _{i\in {\Omega }} \left( \lambda ^{(i)}\right) ^2P({u_{1,\tau +1}}=1 | \mathfrak {i}_{\tau +1} = \mathfrak {i}_1=i). \end{aligned}$$
(17)

We next introduce the conditioning on \(R_{t=0}^{(i)}\) as

$$\begin{aligned} P({u_{1,\tau +1}}=1 | \mathfrak {i}_{\tau +1} = \mathfrak {i}_1=i) = \sum _{R^{(i)}_0=2}^\infty P({u_{1,\tau +1}}=1 | \mathfrak {i}_{\tau +1} = \mathfrak {i}_1=i, R_0^{(i)})P_{\textrm{st}}(R_0^{(i)}), \end{aligned}$$
(18)

where we use the identity (footnote 2) \(P(A|B)=\sum _C P(A|B,C)P(C|B)\), and the relationships \(P_\textrm{st}(R_0^{(i)} | \mathfrak {i}_{\tau +1}=\mathfrak {i}_1=i)=P_{\textrm{st}}(R_0^{(i)})\) and \(P({u_{1,\tau +1}}=1 | \mathfrak {i}_{\tau +1} = \mathfrak {i}_1=i, R_0^{(i)}=1)=0\).

The term \(P({u_{1,\tau +1}}=1 | \mathfrak {i}_{\tau +1} = \mathfrak {i}_1=i, R_0^{(i)})\) is directly related to the survival probability of a metaorder whose initial volume is \(R_0^{(i)}\). Indeed, by defining \(N^{(i)}_{\tau }\) as the total number of market orders executed by the trader i during \([1,\tau ]\), the condition \({u_{1,\tau +1}}=1\) is equivalent to \(R_0^{(i)}-N_{\tau }^{(i)}\ge 1\) (see Fig. 2), or equivalently,

$$\begin{aligned} P({u_{1,\tau +1}}=1 | \mathfrak {i}_{\tau +1} = \mathfrak {i}_1=i, R_0^{(i)}) = P(N^{(i)}_{\tau } \le R_0^{(i)}-1). \end{aligned}$$
(22)

In other words, this is the probability that the discrete-time Poisson counting process \(N^{(i)}_{\tau }\) remains within the range \([1,R_0^{(i)}-1]\) during the time interval \([1,\tau ]\) with the initial condition \(N_{t=1}^{(i)}=1\).

Thus, we can exactly decompose the total ACF as

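$$\begin{aligned} C_\tau = \sum _{i\in {\Omega }} \left( \lambda ^{(i)}\right) ^2 \sum _{R_0^{(i)}=2}^\infty P_{\textrm{st}}(R_0^{(i)}) P(N^{(i)}_{\tau } \le R_0^{(i)}-1). \end{aligned}$$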

The explicit formulas for \(P_{\textrm{st}}(R_0^{(i)})\) and \(P(N^{(i)}_{\tau } \le R_0^{(i)}-1)\) can be derived using the master equation approach. Regarding \(P_{\textrm{st}}(R_0^{(i)})\), we obtain the following formulas (see Appendix A):

$$\begin{aligned} P_{\textrm{st}}(R_0^{(i)}) = c_R^{(i)} \rho _{\ge }^{(i)}(R_0^{(i)}),\>\>\> \rho ^{(i)}_{\ge }(L) := \sum _{L'=L}^\infty \rho ^{(i)}(L'), \>\>\> c_R^{(i)} = P_{\textrm{st}}(1):=\frac{1}{\sum _{L=1}^\infty \rho _{\ge }^{(i)}(L)}=\frac{1}{L_{\textrm{avg}}} \end{aligned}$$
(23)

with \(L_{\textrm{avg}}:=\sum _{L'=1}^\infty L'\rho ^{(i)}(L')\). The metaorder survival probability \(P(N^{(i)}_{\tau } \le R_0^{(i)}-1)\) is given by a partial sum of the following binomial distribution (see Appendix B):

$$\begin{aligned} P_t(N^{(i)}) = {\left\{ \begin{array}{ll} \mathcal {B}_{t-1,\lambda ^{(i)}}(N^{(i)}-1) &{} (\text{ for } N^{(i)} \in [1, t]) \\ 0 &{} (\text{ for } N^{(i)} \not \in [1, t]) \end{array}\right. }, \>\>\> \mathcal {B}_{t,\lambda }(x) := \frac{t!}{x!(t-x)!} \lambda ^{x} \left( 1-\lambda \right) ^{t-x}. \end{aligned}$$
(19)
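The identity \(c_R^{(i)}=1/L_{\textrm{avg}}\) used in Eq. (23) follows from exchanging the order of the two sums. As a short worked step (a sketch of the standard argument, not a reproduction of Appendix A):

$$\begin{aligned} \sum _{L=1}^\infty \rho _{\ge }^{(i)}(L) = \sum _{L=1}^\infty \sum _{L'=L}^\infty \rho ^{(i)}(L') = \sum _{L'=1}^\infty \sum _{L=1}^{L'} \rho ^{(i)}(L') = \sum _{L'=1}^\infty L'\rho ^{(i)}(L') = L_{\textrm{avg}}. \end{aligned}$$

The exchange is justified whenever the mean metaorder length is finite, which holds for the power-law case with \(\alpha >1\).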

In summary, we obtain the exact order-sign ACF formula in an explicit form as

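$$\begin{aligned} C_\tau = \sum _{i\in {\Omega }} C_\tau ^{(i)}, \>\>\> C_\tau ^{(i)} := \left( \lambda ^{(i)}\right) ^2 c_R^{(i)} \sum _{R_0^{(i)}=2}^\infty \rho _{\ge }^{(i)}(R_0^{(i)}) \sum _{N=1}^{\min (R_0^{(i)}-1,\tau )} \mathcal {B}_{\tau -1,\lambda ^{(i)}}(N-1). \end{aligned}$$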
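The exact ACF can be evaluated numerically by combining the stationary distribution of Eq. (23) with the binomial law of Eq. (19). The Python sketch below is one possible way to do so under stated assumptions: the function names `survival_prob` and `exact_acf` are hypothetical, the infinite sum over \(R_0^{(i)}\) is truncated at a user-chosen `r_max` (which matters for heavy-tailed \(\rho ^{(i)}\)), and the direct use of binomial coefficients is only practical for moderate \(\tau \).

```python
from math import comb

def survival_prob(tau, lam, r0):
    """P(N_tau <= r0 - 1), where N_tau - 1 ~ Binomial(tau - 1, lam) as in Eq. (19)."""
    upper = min(r0 - 1, tau)
    return sum(comb(tau - 1, n - 1) * lam ** (n - 1) * (1 - lam) ** (tau - n)
               for n in range(1, upper + 1))

def exact_acf(tau, lam, ccdf, r_max):
    """Truncated evaluation of
    sum_i lam_i^2 * sum_{R0=2}^{r_max} P_st(R0) * P(N_tau <= R0 - 1).

    lam  : list of order-submission probabilities lambda^(i)
    ccdf : callable (i, L) -> rho_>=^(i)(L)
    """
    total = 0.0
    for i, lam_i in enumerate(lam):
        # Truncated normalisation: approximates 1 / c_R^(i) = L_avg.
        norm = sum(ccdf(i, L) for L in range(1, r_max + 1))
        inner = sum(ccdf(i, r0) * survival_prob(tau, lam_i, r0)
                    for r0 in range(2, r_max + 1))
        total += lam_i ** 2 * inner / norm
    return total
```

For example, `exact_acf(5, [0.1] * 10, lambda i, L: L ** -1.5, r_max=5000)` gives a truncated estimate for a homogeneous power-law market; increasing `r_max` indicates how sensitive the value is to the truncation.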

3.2 Remark on the Original Derivation

Let us focus on the homogeneous case \(\lambda ^{(i)}=\lambda =1/M\) for all \(i\in {\Omega }\). In the original LMF argument, they heuristically estimated that the original metaorder length at \(\tau =1\) should obey the PDF

$$\begin{aligned} Q(L) = \frac{L\rho (L)}{\sum _{L=1}^\infty L\rho (L)} \end{aligned}$$
(25)

because a longer metaorder is likely to be observed with a higher probability. Furthermore, they assumed that the remaining metaorder length \(R_{\tau =1}^{(i)}\) is uniformly distributed within [1, L]. On these heuristic but reasonable assumptions, they estimated the order-sign ACF as

$$\begin{aligned} C_{\tau }^\textrm{LMF} = \frac{1}{L_{\textrm{avg}}} \sum _{L=1}^\infty \sum _{j=1}^{L-2}\sum _{h=0}^j \rho (L) \frac{(\tau -1)!}{h!(\tau -1-h)!}\lambda ^{h+1}(1-\lambda )^{\tau -1-h}, \>\>\> L_{\textrm{avg}}:= \sum _{L=1}^\infty L\rho (L). \end{aligned}$$
(26)

Our derivation is essentially similar to the original LMF argument. However, our derivation is more systematic and rigorous than the original LMF derivation in the sense that ours is based on the master equations without heuristic arguments.

Indeed, their heuristic formula is equivalent to ours except for a minor typo, as follows. By switching the order of the sums over L and j, we obtain

$$\begin{aligned} C_{\tau }^\textrm{LMF}&= \frac{1}{L_{\textrm{avg}}} \sum _{j=1}^{\infty }\sum _{h=0}^j\sum _{L=j+2}^\infty \rho (L) \frac{(\tau -1)!}{h!(\tau -1-h)!}\lambda ^{h+1}(1-\lambda )^{\tau -1-h} \nonumber \\&= \frac{1}{L_{\textrm{avg}}} \sum _{j=1}^{\infty }\sum _{h=0}^j \rho _{\ge }(j+2) \frac{(\tau -1)!}{h!(\tau -1-h)!}\lambda ^{h+1}(1-\lambda )^{\tau -1-h} \nonumber \\&= \frac{1}{L_{\textrm{avg}}} \sum _{R_0=3}^{\infty }\sum _{N=1}^{R_0-1} \rho _{\ge }(R_0) \frac{(\tau -1)!}{(N-1)!(\tau -N)!}\lambda ^{N}(1-\lambda )^{\tau -N} \nonumber \\&= \frac{\lambda }{L_{\textrm{avg}}} \sum _{R_0=3}^{\infty }\rho _{\ge }(R_0) P(N_{\tau }\le R_0-1) \end{aligned}$$
(27)

with formal replacements of the dummy variables between the second and third lines as \(j=R_0-2\) and \(h=N-1\).

For comparison, in the homogeneous case, our exact formula (21) reduces to

$$\begin{aligned} C^\textrm{SK}_{\tau } = \frac{\lambda }{L_{\textrm{avg}}}\sum _{R_0=2}^\infty \rho _{\ge }(R_0)P(N_{\tau }\le R_0-1). \end{aligned}$$
(28)

Therefore, the LMF estimation \(C^\textrm{LMF}_{\tau }\) in Ref. [15] is consistent with our exact formula \(C^\textrm{SK}_{\tau }\) for the homogeneous case except for a very minor contribution from \(R_0=2\). We believe this missing term is merely a typo without significant consequence, and our formula is a natural and rigorous extension of the original LMF theory.
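To make the size of this discrepancy explicit, the \(R_0=2\) term can be written out by subtracting Eq. (27) from Eq. (28) and using Eq. (19), which gives \(P(N_{\tau }\le 1)=\mathcal {B}_{\tau -1,\lambda }(0)=(1-\lambda )^{\tau -1}\). The following is a short worked step under the homogeneous assumption \(\lambda =1/M\):

$$\begin{aligned} C^\textrm{SK}_{\tau } - C^\textrm{LMF}_{\tau } = \frac{\lambda }{L_{\textrm{avg}}}\rho _{\ge }(2)P(N_{\tau }\le 1) = \frac{\lambda }{L_{\textrm{avg}}}\rho _{\ge }(2)(1-\lambda )^{\tau -1}. \end{aligned}$$

The difference therefore decays exponentially in \(\tau \) and cannot alter the power-law tail of the ACF.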

4 Examples and Numerical Verification

Let us derive the asymptotic behaviour of the order-sign ACF for several important cases.

4.1 Case 0: Random Traders

Let us consider the most trivial case, where the trader i submits her orders as independent random variables. This case corresponds to the setting

$$\begin{aligned} \rho ^{(i)}(L) = \delta _{L,1} \>\>\> \Longleftrightarrow \>\>\> \rho _{\ge }^{(i)}(L) = \delta _{L,1} \>\>\> \text{ for } \text{ all } L\ge 1. \end{aligned}$$
(29)

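From Eq. (24), every term of the sum over \(R_0^{(i)}\ge 2\) vanishes because \(\rho _{\ge }^{(i)}(R_0^{(i)})=0\) there, and we obtain the order-sign ACF without any correlation as

$$\begin{aligned} C_\tau ^{(i)} = 0 \>\>\> \text{ for } \tau \ge 1. \end{aligned}$$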
(30)

4.2 Case 1: Exponential Metaorder Length Distribution

Fig. 3 Comparisons between the numerical results and the theoretical prediction (33), assuming that all the traders are exponentially-splitting traders with the same parameters \(L^{*(i)}=\bar{L}\), \(\lambda ^{(i)}=\bar{\lambda }\) for all \(i\in {\Omega }\). The market ACF is theoretically given by \(C_\tau = \sum _{i\in {\Omega }}C_\tau ^{(i)}=\bar{\lambda }e^{-1/\bar{L}}(1-\bar{\lambda }(1-e^{-1/\bar{L}}))^{\tau -1}\) for \(\tau \ge 1\). a The numerical autocorrelation functions of the generalised LMF model. The green line denotes the numerical autocorrelation function with \((M,\bar{L},\bar{\lambda })=(10,2,0.1)\), the orange line with \((M,\bar{L},\bar{\lambda })=(10,5,0.1)\), and the red line with \((M,\bar{L},\bar{\lambda })=(10,10,0.1)\). b The result of scaling analysis on the relationships

Let us consider the case where the metaorder length obeys the exponential law:

$$\begin{aligned} \rho _{\ge }^{(i)}(L) = e^{-(L-1)/L^{*(i)}}, \>\>\> {L^{*(i)}}>0 \>\>\> \Longleftrightarrow \>\>\> \rho ^{(i)}(L) = \left( e^{1/L^{*(i)}}-1\right) e^{-L/L^{*(i)}}. \end{aligned}$$
(31)

Note that \(c_{R}^{(i)}:= 1/\sum _{L=1}^\infty \rho _{\ge }^{(i)}(L) = 1-e^{-1/L^{*(i)}}\). For this case, we obtain an exact ACF formula, such that

$$\begin{aligned} C_\tau ^{(i)} = \left( \lambda ^{(i)}\right) ^2e^{-1/L^{*(i)}}\left( 1-\lambda ^{(i)}+\lambda ^{(i)}e^{-1/L^{*(i)}}\right) ^{\tau -1} \>\>\>\text{ for } \tau \ge 1. \end{aligned}$$
(32)

See Appendix C for the detailed derivation. This equation can be rewritten as

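$$\begin{aligned} C_\tau ^{(i)} = \left( \lambda ^{(i)}\right) ^2e^{-1/L^{*(i)}}\exp \left[ (\tau -1)\ln \left( 1-\lambda ^{(i)}\left( 1-e^{-1/L^{*(i)}}\right) \right) \right] \>\>\>\text{ for } \tau \ge 1. \end{aligned}$$
(33)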

This implies that the exponential decay appears in the order-sign ACF as a fast-decaying tail, which is consistent with empirical observations. We numerically checked the validity of this formula as shown in Fig. 3.
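The closed form (32) can be cross-checked against a direct evaluation of the double sum over \(R_0^{(i)}\) and \(N^{(i)}\) for a single trader. The Python sketch below is only a numerical sanity check under stated assumptions: the function names are hypothetical, and the sum over \(R_0^{(i)}\) is truncated at `r_max`, which is harmless here because \(\rho _{\ge }^{(i)}\) decays exponentially.

```python
import math

def acf_exponential_closed(tau, lam, l_star):
    """Closed form of Eq. (32) for one exponentially-splitting trader."""
    q = math.exp(-1.0 / l_star)
    return lam ** 2 * q * (1.0 - lam + lam * q) ** (tau - 1)

def acf_exponential_sum(tau, lam, l_star, r_max=2000):
    """Direct sum lam^2 * c_R * sum_{R0>=2} rho_>=(R0) * P(N_tau <= R0 - 1)."""
    q = math.exp(-1.0 / l_star)
    c_r = 1.0 - q                                   # c_R for the exponential law
    total = 0.0
    for r0 in range(2, r_max + 1):
        upper = min(r0 - 1, tau)
        # Survival probability from the shifted binomial law of Eq. (19).
        surv = sum(math.comb(tau - 1, n - 1) * lam ** (n - 1)
                   * (1.0 - lam) ** (tau - n) for n in range(1, upper + 1))
        total += q ** (r0 - 1) * surv               # rho_>=(R0) = q**(R0 - 1)
    return lam ** 2 * c_r * total
```

Agreement of the two functions for several values of \(\tau \) reflects the fact that the sum over \(R_0^{(i)}\) is geometric and the remaining sum over \(N^{(i)}\) is a binomial expansion.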

4.3 Case 2: Power-Law Metaorder Length Distribution

Fig. 4 Comparisons between the numerical results and the theoretical prediction when both random and power-law splitting traders coexist with \(|\Omega _{\textrm{PT}}|=M_{\textrm{PT}}\), \(\alpha ^{(i)}=\alpha \), and \(\lambda ^{(i)}=\bar{\lambda }:=\mu /M_\textrm{PT}\) for all \(i\in \Omega _{\textrm{PT}}\). The total order-submission probability of the random traders is \(1-\mu \). The green, orange, and red lines show the results for \(\mu =1.00\), \(\mu =0.85\), and \(\mu =0.70\), respectively. The market ACF is theoretically given by \(C_\tau = \sum _{i\in {\Omega }}C_\tau ^{(i)}\simeq (\mu \bar{\lambda }^{2-\alpha }/\alpha )\tau ^{-\alpha +1}\) for \(\tau \gg 1\). We used the following parameters: \((\alpha , M_{\textrm{PT}})=(1.5,10)\) for a, \((\alpha ,M_{\textrm{PT}})=(1.5,100)\) for b, and \((\alpha ,M_{\textrm{PT}})=(2.5,10)\) for c.

We next study the case where the metaorder length obeys the power law:

$$\begin{aligned} \rho ^{(i)}_{\ge }(L) = L^{-\alpha ^{(i)}} \end{aligned}$$
(34)

with a positive constant \(\alpha ^{(i)}>1\). This means that the density profile is approximately given by

$$\begin{aligned} \rho ^{(i)}(L) \simeq -{\frac{d}{dL}}\rho ^{(i)}_{\ge }(L) = \alpha ^{(i)} L^{-\alpha ^{(i)}-1}. \end{aligned}$$
(35)

For this case, by using an integral approximation of the sum, we obtain

$$\begin{aligned} \frac{1}{c_R^{(i)}} = L_{\textrm{avg}} := \sum _{L=1}^\infty L \rho (L) \simeq \int _1^\infty \alpha ^{(i)}L^{-\alpha ^{(i)}}dL = \frac{\alpha ^{(i)}}{\alpha ^{(i)}-1}. \end{aligned}$$
(36)

For sufficiently large \(\tau \gg 1\), we asymptotically obtain

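$$\begin{aligned} C_\tau ^{(i)} \simeq \frac{\left( \lambda ^{(i)}\right) ^{3-\alpha ^{(i)}}}{\alpha ^{(i)}}\tau ^{-\alpha ^{(i)}+1}. \end{aligned}$$
(37)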

For the detailed derivation, see Appendix D.
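The scaling of this asymptotic result can be understood from a heuristic sketch, which is not the derivation of Appendix D. It assumes that, for \(\tau \gg 1\), the number of executions \(N_\tau ^{(i)}\) concentrates around its mean \(\lambda ^{(i)}\tau \), so that the survival probability \(P(N^{(i)}_{\tau } \le R_0^{(i)}-1)\) acts as a sharp cutoff at \(R_0^{(i)}\approx \lambda ^{(i)}\tau \), and that the remaining sum can be replaced by an integral. Together with Eqs. (34) and (36),

$$\begin{aligned} C_\tau ^{(i)}&\approx \left( \lambda ^{(i)}\right) ^2 c_R^{(i)} \sum _{R_0 > \lambda ^{(i)}\tau } R_0^{-\alpha ^{(i)}} \approx \left( \lambda ^{(i)}\right) ^2 \frac{\alpha ^{(i)}-1}{\alpha ^{(i)}} \int _{\lambda ^{(i)}\tau }^\infty R^{-\alpha ^{(i)}}dR \nonumber \\&= \left( \lambda ^{(i)}\right) ^2 \frac{\alpha ^{(i)}-1}{\alpha ^{(i)}} \frac{\left( \lambda ^{(i)}\tau \right) ^{1-\alpha ^{(i)}}}{\alpha ^{(i)}-1} = \frac{\left( \lambda ^{(i)}\right) ^{3-\alpha ^{(i)}}}{\alpha ^{(i)}}\tau ^{-(\alpha ^{(i)}-1)}. \end{aligned}$$

Summing this expression over M identical traders with \(\lambda ^{(i)}=1/M\) reproduces the classical prefactor \(1/(\alpha M^{2-\alpha })\) of Eq. (3b).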

4.4 ACF Formula with Heterogeneous Strategies

Let us summarise the above formulas in the presence of heterogeneous order-splitting strategies. Let us consider a market where the following types of traders coexist:

  • random traders (whose set is denoted by \(\Omega _{\textrm{RT}}\)),

  • exponentially-splitting traders (whose set is denoted by \(\Omega _{\textrm{ET}}\)), and

  • power-law splitting traders (whose set is denoted by \(\Omega _{\textrm{PT}}\)).

The total ACF asymptotically obeys

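$$\begin{aligned} C_\tau = \sum _{i\in {\Omega }}C_\tau ^{(i)} \simeq \sum _{i\in \Omega _{\textrm{ET}}} \left( \lambda ^{(i)}\right) ^2e^{-1/L^{*(i)}}\left( 1-\lambda ^{(i)}+\lambda ^{(i)}e^{-1/L^{*(i)}}\right) ^{\tau -1} + \sum _{i\in \Omega _{\textrm{PT}}} \frac{\left( \lambda ^{(i)}\right) ^{3-\alpha ^{(i)}}}{\alpha ^{(i)}}\tau ^{-\alpha ^{(i)}+1}. \end{aligned}$$
(38)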

Thus, while we observe the fast decay characterised by the exponential law for relatively small \(\tau \), the slow decay is dominant for large \(\tau \). Such features are consistent with the empirical observations.

4.4.1 Remark 1: Consistency with the Original LMF Formula for the Homogeneous Case

Let us assume that all traders are power-law splitting traders with homogeneous order-submission probability, such that

$$\begin{aligned} \alpha ^{(i)} = \alpha , \>\>\> \lambda ^{(i)} = \frac{1}{M} \>\>\> \text{ for } \text{ all } i\in \Omega _{\textrm{PT}} = {\Omega }. \end{aligned}$$
(39)

For this case, we obtain

$$\begin{aligned} C_\tau \simeq \frac{\tau ^{-\gamma }}{\alpha M^{2-\alpha }}, \>\>\> \gamma := \alpha -1, \end{aligned}$$
(40)

which is equivalent to the original LMF formula (3).

4.4.2 Remark 2: The Importance of the Minimum Power-Law Exponent \(\alpha _{\min }\)

The ACF is finally characterised by the power law

$$\begin{aligned} C_\tau \propto \tau ^{-\alpha _{\min }+1}\>\>\> \text{ for } \text{ large } \tau , \>\>\> \alpha _{\min } := \min _{i\in {\Omega _{\textrm{PT}}}} \alpha ^{(i)}. \end{aligned}$$
(41)

Thus, \(\alpha _{\min }\) is the most important parameter characterising the final asymptotic behaviour of the ACF.

This property is relevant to data calibration. Indeed, a typical quantity that is empirically available is the aggregated metaorder-length distribution among all the splitting traders \(\Omega _{\textrm{ST}}:=\Omega _{\textrm{ET}} \cup \Omega _{\textrm{PT}}\), such that

$$\begin{aligned} \rho _{\textrm{ST}}^\textrm{empirical}(L) := \frac{1}{N_\textrm{tot}}\sum _{k=1}^{N_{\textrm{tot}}}\delta (L-L_k), \end{aligned}$$
(42)

where \(N_{\textrm{tot}}\) is the total number of metaorder lengths and \(L_k\) is the kth metaorder length among all the splitting traders. Let us decompose this aggregated empirical distribution as

$$\begin{aligned} \rho _{\textrm{ST}}^\textrm{empirical}(L) = \frac{1}{N_{\textrm{tot}}}\sum _{i\in \Omega _{\textrm{ST}}}\sum _{k=1}^{N_{\textrm{tot}}^{(i)}}\delta (L-L_k^{(i)}) = \sum _{i\in \Omega _{\textrm{ST}}}\frac{N_{\textrm{tot}}^{(i)}}{N_\textrm{tot}}\left\{ \frac{1}{N_{\textrm{tot}}^{(i)}}\sum _{k=1}^{N_\textrm{tot}^{(i)}}\delta (L-L_k^{(i)})\right\} \end{aligned}$$
(43)

with \(L_{k}^{(i)}\) being the kth metaorder length of the trader i and \(N_{\textrm{tot}}^{(i)}\) being the total number of metaorder lengths of the trader i. Here we use the ergodicity regarding the empirical distributions

$$\begin{aligned} \rho ^{(i)}(L) = \lim _{N_{\textrm{tot}}^{(i)} \rightarrow \infty }\frac{1}{N_\textrm{tot}^{(i)}}\sum _{k=1}^{N_{\textrm{tot}}^{(i)}}\delta (L-L_k^{(i)}). \end{aligned}$$
(44)

Also, we can evaluate the following quantities for a long-time simulation with the simulation time t as

$$\begin{aligned} N_{\textrm{tot}}^{(i)} \simeq \frac{\lambda ^{(i)}t}{\langle L_i\rangle }, \>\>\> N_{\textrm{tot}} \simeq \frac{\lambda _{\textrm{ST}} t}{\langle L\rangle }, \>\>\> \lambda _{\textrm{ST}} := \sum _{i\in \Omega _{\textrm{ST}}}\lambda ^{(i)}. \end{aligned}$$
(45)

We thus obtain

$$\begin{aligned} \rho _{\textrm{ST}}^\textrm{empirical}(L) \simeq \sum _{i\in \Omega _{\textrm{ST}}} w^{(i)}\rho ^{(i)}(L) \propto L^{-{\alpha _{\min }}-1} \>\> \text{ for } \text{ a } \text{ large } \text{ L } \text{ with } \>\> w^{(i)} := \frac{\lambda ^{(i)}}{\lambda _\textrm{ST}}\frac{\langle L\rangle }{\langle L_i\rangle }. \end{aligned}$$
(46)

This relation implies that it is acceptable to use the aggregated metaorder-length distributions among all the splitting traders in determining \(\alpha _{\min }\).
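The mixture weights \(w^{(i)}\) in Eq. (46) can be computed directly from the order-submission probabilities and the mean metaorder lengths. The Python sketch below assumes the consistency condition \(N_{\textrm{tot}}=\sum _{i}N_{\textrm{tot}}^{(i)}\) applied to Eq. (45), under which \(w^{(i)}\) is simply the share of metaorders started by trader i; the function name `mixture_weights` and the numerical inputs are hypothetical.

```python
import numpy as np

def mixture_weights(lam, mean_len):
    """Weights w^(i) of Eq. (46) for the aggregated metaorder-length distribution.

    lam      : order-submission probabilities lambda^(i) of the splitting traders
    mean_len : mean metaorder lengths <L_i> of the same traders
    """
    lam = np.asarray(lam, dtype=float)
    mean_len = np.asarray(mean_len, dtype=float)
    # Rate at which trader i starts new metaorders, cf. N_tot^(i) in Eq. (45).
    rate = lam / mean_len
    # Dividing by the total rate equals (lam_i / lam_ST) * (<L> / <L_i>).
    return rate / rate.sum()

# Hypothetical example: two traders who start metaorders at the same rate.
w = mixture_weights([0.2, 0.3], [2.0, 3.0])
```

A trader who submits orders frequently but uses very long metaorders therefore carries less weight in the aggregated distribution than \(\lambda ^{(i)}\) alone would suggest.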

We note that these formulas are derived by assuming that the sample size is infinity and the metaorder-length distributions obey the true power-law without cutoffs. Technically, the straightforward applicability of these formulas is rather limited for real data analyses, where the sample size is finite and the metaorder-length PDFs obey truncated power laws. Indeed, the numerical convergence speed of the asymptotic relation (46) was slow regarding the sample size (see Appendix E for the numerical results). However, the above analysis highlights the conceptual relation between the metaorder-length PDF for individual traders and the aggregated PDF that is empirically accessible.

5 Theoretical Discussion 1: Data Calibration Based on the Power-Law Splitting Assumption

Here we discuss the implication of our heterogeneous-LMF formula (38) for the data calibration. Particularly, in this section, we only make a simple assumption

$$\begin{aligned} \alpha ^{(i)} = \alpha \>\> \text{ for } \text{ all } i \in \Omega _{\textrm{PT}} \end{aligned}$$
(47)

with the heterogeneity included in the intensities \(\{\lambda ^{(i)}\}_{i \in \Omega _{\textrm{PT}}}\) among the power-law splitting traders. Also, the total order-submission probability \(\mu \) and the total number \(M_{\textrm{PT}}\) of the power-law splitting traders are defined by

$$\begin{aligned} \mu := \sum _{i \in \Omega _{\textrm{PT}}} \lambda ^{(i)}, \>\>\> M_{\textrm{PT}} := |\Omega _{\textrm{PT}}|, \end{aligned}$$
(48)

respectively. For this case, the asymptotic behaviour is described by

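$$\begin{aligned} C_{\tau } \simeq c_0^\textrm{SK} \tau ^{-\gamma }, \>\>\> \gamma =\alpha -1, \>\>\> c_0^\textrm{SK} := \frac{1}{\alpha }\sum _{i\in \Omega _{\textrm{PT}}} \left( \lambda ^{(i)}\right) ^{3-\alpha }. \end{aligned}$$
(49)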

5.1 Robust Power-Law Exponent Formula

What is the implication of the heterogeneous LMF formula (49) for data calibration? One of the most important implications is that the power-law exponent \(\gamma \) is insensitive to the heterogeneity of the order-submission probability distribution \(\{\lambda ^{(i)}\}_{i \in \Omega _{\textrm{PT}}}\). This is a very important feature of our heterogeneous LMF model because it implies that the power-law exponent is a robust measurable quantity: even if the order-submission probabilities are heterogeneous, the LMF prediction \(\gamma = \alpha -1\) remains reliable. In real datasets, the average waiting time \(\tau ^{(i)}:= 1/\lambda ^{(i)}\) is expected to be widely distributed, for example according to the power-law distribution \(P(\tau ):=(1/M)\sum _{i\in \Omega _{\textrm{PT}}}\delta (\tau -\tau ^{(i)})\propto \tau ^{-\chi -1}\) for large \(\tau \) with \(\chi > 0\). This assumption is equivalent to power-law asymptotics of the order-submission probability distribution near zero, such that \(P(\lambda )\propto \lambda ^{\chi -1}\) for small \(\lambda \). The relation (49) states that such widely distributed waiting times (or intensities) have no impact on the macroscopic power-law exponent \(\gamma \) in the ACF, which is non-trivial. From this viewpoint, the heterogeneity in the LMF model is not essential for understanding the LRC; the original LMF model is sufficient.
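The equivalence between the two power laws stated above follows from the change of variables \(\lambda =1/\tau \). As a brief sketch, treating \(P\) as a smooth density (an idealisation of the empirical sum of delta functions):

$$\begin{aligned} P(\lambda ) = P(\tau )\left| \frac{d\tau }{d\lambda }\right| _{\tau =1/\lambda } \propto \lambda ^{\chi +1}\cdot \lambda ^{-2} = \lambda ^{\chi -1} \>\>\> \text{ for } \text{ small } \lambda . \end{aligned}$$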

In addition, since the formula \(\gamma =\alpha -1\) does not depend on the intensities \(\{\lambda ^{(i)}\}_i\), the power-law ACF formula will hold even when the intensities have slow time dependence if \(\alpha \) is time independent. Indeed, in the presence of the time inhomogeneity of \(\{\lambda ^{(i)}(t)\}_i\), the ACF formula will be replaced by

$$\begin{aligned} C_{\tau } \simeq \bar{c}_0^\textrm{SK} \tau ^{-\gamma }, \>\>\> \gamma =\alpha -1, \>\>\> \bar{c}_0^\textrm{SK} := \frac{1}{\alpha T_\textrm{fin}}\sum _{i\in \Omega _{\textrm{PT}}}\int _0^{T_{\textrm{fin}}} \left( \lambda ^{(i)}(t)\right) ^{3-\alpha }dt \end{aligned}$$
(50)

with the final observation time \(T_{\textrm{fin}}\), if the time dependence of \(\{\lambda ^{(i)}(t)\}_i\) is sufficiently slow. Given that the intensities \(\{\lambda ^{(i)}(t)\}_i\) will change day by day, it is reassuring that the power-law ACF formula holds independently of the intensities, at least for their slow time variation.
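The time-averaged prefactor in Eq. (50) can be evaluated numerically once intensity profiles are specified. The following is a minimal sketch, not part of the original analysis: the function name `time_averaged_prefactor`, the trapezoidal discretisation, and the sinusoidal intensity profiles are all hypothetical choices made for illustration only.

```python
import math


def time_averaged_prefactor(intensity_fns, alpha, t_fin, n_steps=10_000):
    """Approximate the time-averaged prefactor of Eq. (50) by the trapezoidal rule.

    intensity_fns: list of callables t -> lambda^{(i)}(t) (hypothetical inputs).
    """
    dt = t_fin / n_steps
    total = 0.0
    for lam in intensity_fns:
        integral = 0.0
        for k in range(n_steps + 1):
            weight = 0.5 if k in (0, n_steps) else 1.0  # trapezoidal end-point weights
            integral += weight * lam(k * dt) ** (3.0 - alpha)
        total += integral * dt
    return total / (alpha * t_fin)


alpha, t_fin = 1.5, 1.0

# Constant intensities: Eq. (50) reduces to (1/alpha) * sum_i lambda_i^(3 - alpha).
c_const = time_averaged_prefactor([lambda t: 0.2, lambda t: 0.6], alpha, t_fin)

# Hypothetical slowly modulated intensities with the same time averages.
slow_fns = [
    lambda t: 0.2 + 0.05 * math.sin(2.0 * math.pi * t / t_fin),
    lambda t: 0.6 - 0.05 * math.sin(2.0 * math.pi * t / t_fin),
]
c_bar = time_averaged_prefactor(slow_fns, alpha, t_fin)

print(f"constant intensities: {c_const:.4f}")
print(f"modulated intensities: {c_bar:.4f}")
```

In this sketch the modulation changes only the prefactor; the exponent \(\gamma =\alpha -1\) does not enter the calculation at all, in line with the statement above.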

5.2 Non-robust Prefactor Formula

On the other hand, the prefactor \(c_0^\textrm{SK}\) is very sensitive to the heterogeneity of the order-submission probability distribution \(\{\lambda ^{(i)}\}_{i \in \Omega _{\textrm{PT}}}\). Indeed, the prefactor \(c_0^\textrm{SK}\) is different from the homogeneous LMF formula:

$$\begin{aligned} c_0^\textrm{SK} \ne \frac{1}{\alpha M_{\textrm{PT}}^{2-\alpha }}, \end{aligned}$$
(51)

if \(\lambda ^{(i)}\ne 1/M_{\textrm{PT}}\) for some \(i \in \Omega _\textrm{PT}\). This implies that the interpretation of the prefactor is not straightforward because it depends sensitively on the underlying microscopic assumptions. The homogeneous assumption on the order-submission probability distribution is certainly unrealistic for real datasets.

Furthermore, we have assumed that the metaorder-length PDF exactly obeys the Pareto distribution over the entire range. This assumption is also unrealistic. It is more realistic to assume that the power law holds only asymptotically:

$$\begin{aligned} \rho _{\ge }(L) \simeq c_{\rho } L^{-\alpha } \>\>\> \text{ for } \text{ large } L. \end{aligned}$$
(52)

In fact, we validated this weaker assumption in our microscopic datasets of the TSE market (see Refs. [22, 23]). Under this assumption, the prefactor is slightly modified. In any case, the prefactor is sensitive to the model-specific assumptions.

5.3 Systematic Underestimation of the Prefactor by the Homogeneous LMF Model

Furthermore, our heterogeneous LMF formula (49) implies that the prefactor is systematically biased in the presence of heterogeneous intensities. To clarify this point, let us consider the homogeneous assumption on the intensities among the power-law splitting traders, such that

$$\begin{aligned} \lambda ^{(i)}_{\textrm{LMF}} = \frac{\mu }{M_{\textrm{PT}}}, \end{aligned}$$
(53)

while we assume that random and exponentially-splitting traders can be present (i.e., \(\mu \) can be different from unity). For this case, the prefactor is given by

$$\begin{aligned} c_0^\textrm{LMF} = \frac{\mu ^{3-\alpha }}{\alpha M_{\textrm{PT}}^{2-\alpha }}, \end{aligned}$$
(54)

which reduces to Eq. (3b) for \(\mu =1\). Here we can prove that the prefactor is systematically underestimated by the homogeneous LMF model with \(\alpha \in (1,2)\):

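$$\begin{aligned} c_0^\textrm{LMF} = \frac{\mu ^{3-\alpha }}{\alpha M_{\textrm{PT}}^{2-\alpha }} \le c_0^\textrm{SK} \le \frac{\mu ^{3-\alpha }}{\alpha }. \end{aligned}$$
(55)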

The lower-bound inequality is derived by applying Hölder’s inequality, and the upper-bound inequality is derived by mathematical induction. See Appendix F for the detailed proof. The lower-bound equality holds when the intensities are homogeneous, such that \(\lambda ^{(i)}=\mu /M_\textrm{PT}\) for all \(i\in \Omega _{\textrm{PT}}\). In addition, the upper-bound equality holds when there is a single power-law splitter (\(M_{\textrm{PT}}=1\)).

These inequalities highlight the impact of the heterogeneous trading strategies on the prefactor estimation, and are the final main result of this report. The inequality (55) is practically relevant to the evaluation of the ACF prefactors by data calibration.
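The bounds can be checked numerically on randomly drawn intensities. The sketch below rests on two assumptions that should be stated plainly: the heterogeneous prefactor is taken as \(c_0^\textrm{SK}=(1/\alpha )\sum _i (\lambda ^{(i)})^{3-\alpha }\), i.e. the constant-intensity case of Eq. (50), and the upper bound is taken as the single-splitter value \(\mu ^{3-\alpha }/\alpha \), i.e. Eq. (54) with \(M_{\textrm{PT}}=1\). The helper names and the random sampling scheme are hypothetical.

```python
import random


def prefactor_sk(lams, alpha):
    """Heterogeneous prefactor, assumed to be (1/alpha) * sum_i lambda_i^(3 - alpha)."""
    return sum(lam ** (3.0 - alpha) for lam in lams) / alpha


def prefactor_lmf(mu, m_pt, alpha):
    """Homogeneous LMF prefactor of Eq. (54)."""
    return mu ** (3.0 - alpha) / (alpha * m_pt ** (2.0 - alpha))


def random_intensities(m_pt, mu, rng):
    """Draw m_pt positive intensities that sum to mu (hypothetical sampling scheme)."""
    raw = [rng.random() + 1e-12 for _ in range(m_pt)]
    norm = sum(raw)
    return [mu * r / norm for r in raw]


rng = random.Random(0)
alpha, mu = 1.5, 0.8
tol = 1e-12  # tolerance for floating-point round-off
bounds_hold = True
for _ in range(1000):
    m_pt = rng.randint(1, 20)
    lams = random_intensities(m_pt, mu, rng)
    c_sk = prefactor_sk(lams, alpha)
    lower = prefactor_lmf(mu, m_pt, alpha)   # homogeneous value
    upper = mu ** (3.0 - alpha) / alpha      # single-splitter value
    bounds_hold = bounds_hold and (lower - tol <= c_sk <= upper + tol)

print("bounds hold for all samples:", bounds_hold)
```

A random check of this kind is of course no substitute for the proof in Appendix F; it only illustrates that the homogeneous value sits at the bottom of the admissible range.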

5.3.1 Estimation of the Lower Bound of the Total Number of Order-Splitting Traders

The inequality (55) is useful for the estimation of the total number of order-splitting traders. Indeed, we can estimate the lower bound of the total number of the power-law order-splitting traders as

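$$\begin{aligned} M_{\textrm{PT}} \ge M_{\textrm{PT}}^\textrm{LB} := \left( \frac{\mu ^{3-\alpha }}{\alpha c_0^\textrm{SK}}\right) ^{\frac{1}{2-\alpha }} \end{aligned}$$
(56)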

by assuming \(c_{0}^\textrm{SK}\simeq c_0^\textrm{dat}\), where \(c_0^\textrm{dat}\) is the empirically accessible quantity. Since \(\gamma \) is directly measurable from the ACF, \(\alpha \) is also indirectly measurable by the relationship \(\alpha =\gamma +1\). While \(\mu \) is not easily accessible from public data, we assume \(\mu =0.8\) because this value was typical in the TSE market from 2012 to 2020. Thus, we have approximate access to \(M_{\textrm{PT}}^\textrm{LB}\).

5.3.2 How to Use the Inequality (56)

Let us discuss how to use the inequality (56) for the evaluation of the total number of traders \(M_{\textrm{PT}}\) from the empirical ACF. Our question is whether \(M^\textrm{LB}_{\textrm{PT}}\) carries useful information on the true value of \(M_{\textrm{PT}}\). For simplicity, let us consider the case where one has to decide whether there are one or two order splitters. We also assume that the true values are given by

$$\begin{aligned} M_{\textrm{PT}}=2, \>\>\> \mu = 0.8, \>\>\> \lambda ^{(1)} = 0.2,\>\>\> \lambda ^{(2)} = 0.6,\>\>\> \alpha ^{(1)} = \alpha ^{(2)} = 1.5. \end{aligned}$$
(57)

Under these assumptions, the theoretical value of the prefactor is given by \(c_0\simeq 0.37\). Using inequality (56), we can estimate the lower bound of the number of power-law splitting traders \(M_{\textrm{PT}}\) as

$$\begin{aligned} M_{\textrm{PT}} \gtrsim M_{\textrm{PT}}^\textrm{LB} \approx 1.66. \end{aligned}$$
(58)

This result rejects the possibility of the case with \(M_{\textrm{PT}}=1\), and is close to the true value \(M_{\textrm{PT}}=2\). By parallel considerations, the lower bound \(M^\textrm{LB}_{\textrm{PT}}\) carries useful information on the true value of \(M_{\textrm{PT}}\) because it rejects the possibility of the case with \(M_{\textrm{PT}}< M^\textrm{LB}_{\textrm{PT}}\).
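The numbers in this example can be reproduced with a few lines of code. The sketch below assumes the prefactor \(c_0^\textrm{SK}=(1/\alpha )\sum _i(\lambda ^{(i)})^{3-\alpha }\) (the constant-intensity case of Eq. (50)) and obtains the lower bound by solving \(c_0^\textrm{SK}\ge c_0^\textrm{LMF}\) with Eq. (54) for \(M_{\textrm{PT}}\); the function names are hypothetical.

```python
def prefactor_sk(lams, alpha):
    """Heterogeneous prefactor, assumed to be (1/alpha) * sum_i lambda_i^(3 - alpha)."""
    return sum(lam ** (3.0 - alpha) for lam in lams) / alpha


def lower_bound_traders(c0, mu, alpha):
    """Lower bound on M_PT from c0 >= mu^(3 - alpha) / (alpha * M_PT^(2 - alpha))."""
    return (mu ** (3.0 - alpha) / (alpha * c0)) ** (1.0 / (2.0 - alpha))


# Parameter values of Eq. (57)
lams = [0.2, 0.6]
alpha = 1.5
mu = sum(lams)

c0 = prefactor_sk(lams, alpha)
m_lb = lower_bound_traders(c0, mu, alpha)
# Same bound when the prefactor is first rounded to two decimals
m_lb_rounded = lower_bound_traders(round(c0, 2), mu, alpha)

print(f"c0 = {c0:.4f}")
print(f"lower bound (unrounded c0) = {m_lb:.3f}")
print(f"lower bound (c0 rounded to 0.37) = {m_lb_rounded:.3f}")
```

With the unrounded prefactor the bound comes out near 1.667, and with the prefactor rounded to 0.37 it comes out near 1.662, so the quoted value 1.66 is presumably based on the rounded prefactor. Either way the bound lies strictly between 1 and 2.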

5.3.3 Remark on the Practical Interpretation of \(M_{\textrm{PT}}\)

In practice, it is not easy to define the total number of traders \(M_{\textrm{PT}}\). We here remark on this technical issue. For example, let us consider the case where one mutual fund joins a financial market with one trading account. If we regard a trading account as the unit of the trader, the contribution of the mutual fund to \(M_{\textrm{PT}}\) is one. However, a mutual fund has many clients behind it and aggregates their orders, so it might be plausible to count the “hidden” clients in the contribution to \(M_{\textrm{PT}}\). We are unsure which interpretation is appropriate for the LMF calibration. This example illustrates the difficulty in defining the total number of traders in practice from the viewpoint of data analysts. In other words, if we attempt to correctly predict \(M_{\textrm{PT}}\), many microscopic details start to matter. Accordingly, \(M_{\textrm{PT}}^\textrm{LB}\) provides an approximate lower bound but not the exact lower bound in light of the difficulty in defining the true \(M_{\textrm{PT}}\).

6 Theoretical Discussion 2: Superposition of the Exponential Splitting Traders

In Sect. 5, we discussed the theoretical scenario in which power-law splitting traders are overwhelmingly present, to understand the origin of the LRC in the market-order flow. While this scenario is the most promising and plausible, here we discuss another theoretical possibility: that the LRC appears as the superposition of exponential splitting traders. In other words, we assume the absence of power-law splitting traders (\(M_{\textrm{PT}}=0\)) but the dominant presence of exponential splitting traders (\(M_\textrm{ET}\ne 0\)). Interestingly, the LMF prediction \(\gamma = \alpha -1\) still holds even for this alternative scenario, suggesting the robustness of the LMF prediction.

Let us define the empirical distribution function of \((L^{*(i)}, \lambda ^{(i)})\), which characterises the exponential splitting traders:

$$\begin{aligned} P_{\textrm{ET}}(L^*,\lambda ) := \frac{1}{M_{\textrm{ET}}}\sum _{i\in \Omega _{\textrm{ET}}}\delta (L^*-L^{*(i)})\delta (\lambda -\lambda ^{(i)}), \>\>\> M_{\textrm{ET}} := |\Omega _{\textrm{ET}}|. \end{aligned}$$
(59)

For simplicity, we assume that \(M_{\textrm{ET}}\) is large enough for \(P_{\textrm{ET}}(L^*,\lambda )\) to be approximated as a continuous function and that \(L^*\) and \(\lambda \) are statistically independent, such that

$$\begin{aligned} P_{\textrm{ET}}(L^*,\lambda ) = P_{\textrm{ET}}(L^*)P_{\textrm{ET}}(\lambda ) \end{aligned}$$
(60)

with \(P_{\textrm{ET}}(L^*):=(1/M_{\textrm{ET}})\sum _{i\in \Omega _{\textrm{ET}}} \delta (L^*-L^{*(i)})\) and \(P_{\textrm{ET}}(\lambda ):=(1/M_\textrm{ET})\sum _{i\in \Omega _{\textrm{ET}}} \delta (\lambda -\lambda ^{(i)})\). In addition, we assume the total number of exponential splitting traders is sufficiently large \(M_{\textrm{ET}}\gg 1\) and there is no single trader overwhelmingly contributing to the total market orders, such that

$$\begin{aligned} \lambda ^{(i)} \ll 1 \>\> \text{ for } \text{ all } i\in \Omega _{\textrm{ET}}. \end{aligned}$$
(61)

6.1 Scenario Based on the Fat-Tailed Decay Length Distribution

Fig. 5 Comparisons between the numerical results and the theoretical prediction in the system constituted by exponential splitting traders with various decay lengths. a, b The aggregated metaorder length distribution and the autocorrelation function of the order flow under \(|\Omega _{\textrm{EX}}|=10^3\), \(\lambda ^{(i)}=|\Omega _\textrm{EX}|^{-1}\), and \(P(L^{*(i)})=\left( L^{*(i)}\right) ^{-0.5}\). c, d The aggregated metaorder length distribution and the autocorrelation function of the order flow under \(|\Omega _\textrm{EX}|=10^3\), \(\lambda ^{(i)}=|\Omega _{\textrm{EX}}|^{-1}\), and \(P(L^{*(i)})=\left( L^{*(i)}\right) ^{-1.5}\)

Let us focus on the strong inhomogeneity in the decay length \(L^{*(i)}\), such that

$$\begin{aligned} P(L^*)\simeq ( \vartheta -1)\left( L^{*}\right) ^{-\vartheta } \end{aligned}$$
(62)

with \(\vartheta \in (1,2)\). This scenario can be interpreted from financial viewpoints as follows: the typical metaorder length is assumed to be correlated with the size of the trading institutions, such that large (small) metaorders are likely to be associated with large (small) institutions. The typical lengths of metaorders are assumed to be homogeneous within the same institution (e.g., with exponential distributions). Finally, the heterogeneity in the institution sizes obeys power laws.
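The generic mechanism behind this scenario, namely that a mixture of exponential distributions with power-law distributed decay lengths has a power-law tail, can be illustrated numerically. The sketch below is only an illustration under simplifying assumptions that are not taken from the text: it uses the continuous exponential PDF \(e^{-L/L^*}/L^*\), an unweighted mixture over \(L^*\ge 1\) with the density of Eq. (62), and a hypothetical function name `mixture_pdf`. For this unweighted mixture the substitution \(u=L/L^*\) gives a tail \(\propto L^{-\vartheta }\); the aggregated empirical PDF of Eq. (42) involves trader-dependent weights, so its exponent need not coincide with this one.

```python
import math


def mixture_pdf(length, theta, n_grid=20_000, lstar_max=1e8):
    """Unweighted mixture of exponential PDFs exp(-L/L*)/L* over decay lengths L* >= 1.

    The mixing density is P(L*) = (theta - 1) * L*^(-theta), as in Eq. (62).
    The integral over L* is evaluated by the trapezoidal rule in x = log(L*).
    """
    h = math.log(lstar_max) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        lstar = math.exp(k * h)
        # Integrand in x: P(L*) * exp(-L/L*)/L* * dL*/dx, with dL*/dx = L*
        f = (theta - 1.0) * lstar ** (-theta) * math.exp(-length / lstar)
        total += (0.5 if k in (0, n_grid) else 1.0) * f
    return total * h


theta = 1.5
rho_100 = mixture_pdf(100.0, theta)
rho_1000 = mixture_pdf(1000.0, theta)

# Local log-log slope of the mixture PDF between L = 100 and L = 1000
slope = math.log(rho_1000 / rho_100) / math.log(10.0)
print(f"log-log slope = {slope:.3f} (compare with -theta = {-theta})")
```

Although each component decays exponentially, the mixture decays algebraically because arbitrarily long decay lengths are present with power-law frequency.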

Under this assumption, we find that both the market order ACF \(C_{\tau }\) and the aggregated empirical metaorder-length PDF \(\rho _{\textrm{ST}}^\textrm{empirical}(L)\), defined by Eq. (42), obey the power law (see Fig. 5; see Appendices 1 and 1 for the derivation and the technical details of numerical simulations):

figure i

6.1.1 Relationship to the Previous BBDG Model

Let us discuss the relationship to the previous model in the textbook [3] by Bouchaud, Bonart, Donier, and Gould (BBDG). BBDG proposed a variant of the LMF model (which we call the BBDG model in this article to distinguish the two models; see Appendix I) to simplify the algebraic calculations for the ACF formulas. The BBDG model is based on the assumption that the stopping probability of order splitting obeys a power law. Our scenario based on superposition of exponential splitters is essentially similar to the BBDG model. Indeed, by setting \(\lambda ^{(i)}=\mu /M_{\textrm{ET}}\), we obtain

$$\begin{aligned} q_0^\textrm{BBDG} := \Gamma (\alpha )\frac{\mu ^{3-\alpha }}{M_\textrm{ET}^{2-\alpha }}, \end{aligned}$$
(64)

which is equivalent to the formula in Ref. [3] when \(\mu =1\).

6.1.2 Robustness of the Power-Law Formula

The results (63) highlight the robustness of the LMF power-law prediction \(\gamma =\alpha -1\): if the empirical aggregated metaorder-length PDF \(\rho _{\textrm{ST}}^\textrm{empirical}(L)\) obeys the power law with exponent \(\alpha \), we can expect the ACF power-law decay with exponent \(\gamma =\alpha -1\) whether \(\rho _{\textrm{ST}}^\textrm{empirical}(L)\) is composed of power-law splitters or a superposition of exponential-law splitters. This prediction is robust and insensitive to the details of the underlying microscopic dynamics, including the type of splitters (power-law or exponential-law). This property is convenient for data analyses.

6.1.3 Robustness of the Prefactor Formulas

The prefactor formula (63) is very similar to the prefactor formula (49) in Sect. 5. Indeed, we have

figure j

In other words, the prefactor is smallest if and only if the submission intensities are homogeneous, such that \(\lambda ^{(i)}=\mu /M_{\textrm{ET}}\) for all \(i\in \Omega _{\textrm{ET}}\).

6.1.4 Remark on the Essential Similarity Between the LMF and BBDG Models

In addition, we find that the prefactor formulas between \(c_0\) and \(q_0\) are essentially similar in the sense that

figure k

In other words, the prefactors are almost the same between the two scenarios, differing by a factor of at most 2.
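For the homogeneous case this factor can be checked directly from Eqs. (54) and (64). The sketch below assumes, purely for illustration, that both scenarios share the same \(\mu \) and the same number of splitters (here 0.8 and 50, both hypothetical values); the ratio \(q_0^\textrm{BBDG}/c_0^\textrm{LMF}\) then equals \(\alpha \Gamma (\alpha )=\Gamma (\alpha +1)\), which lies between 1 and 2 for \(\alpha \in (1,2)\).

```python
import math


def c0_lmf(mu, m, alpha):
    """Homogeneous LMF prefactor, Eq. (54)."""
    return mu ** (3.0 - alpha) / (alpha * m ** (2.0 - alpha))


def q0_bbdg(mu, m, alpha):
    """Homogeneous BBDG-type prefactor, Eq. (64)."""
    return math.gamma(alpha) * mu ** (3.0 - alpha) / m ** (2.0 - alpha)


mu, m = 0.8, 50  # hypothetical values shared by both scenarios
alphas = [1.0 + 0.01 * k for k in range(1, 100)]  # grid over (1, 2)
ratios = [q0_bbdg(mu, m, a) / c0_lmf(mu, m, a) for a in alphas]

print(f"min ratio = {min(ratios):.4f}, max ratio = {max(ratios):.4f}")
```

The ratio is independent of \(\mu \) and of the number of splitters, since both cancel between the two formulas.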

6.2 Open Question: The Power-Law Splitter Scenario vs. the Superposed Exponential-Law Splitter Scenario

We presented various theoretical scenarios to derive the market ACF from the order-splitting hypothesis. For example, the power-law exponent formula \(\gamma =\alpha -1\) and the prefactor inequality (55) robustly hold for both scenarios, i.e., in the presence of power-law splitters or of a superposition of exponential splitters with various decay lengths. Note that the power-law ACF can be derived even for the scenario of a superposition of exponential splitters with various intensities (see Appendix J).

The natural question is which scenario is more plausible in reality: the power-law splitter scenario or the superposition of exponential-law splitters. While the robustness of the LMF predictions is a welcome property when applying the LMF theory, this very robustness means that verifying the LMF predictions does not by itself reject either scenario. In other words, while our previous reports [22, 23] establish the relationship \(\gamma =\alpha -1\) and the inequality (55), technically, they do not distinguish between the two scenarios of power-law splitting and the superposition of exponential splitting. Rejecting one of the two scenarios in real data analyses is therefore a crucial issue. Testing these scenarios is feasible by studying the metaorder-length distributions of individual traders, which we plan to do in a subsequent study by analysing the TSE microscopic dataset.

7 Conclusion

We have proposed a generalised Lillo–Mike–Farmer model by incorporating the heterogeneity of order-splitting strategies. This model is exactly solved to evaluate the impact of the heterogeneous strategies regarding both the power-law exponent and the prefactor in the order-sign autocorrelation function. Our theoretical formulas imply that (i) the power-law exponent formula \(\gamma =\alpha -1\) robustly holds even in the presence of the heterogeneous intensity distributions. On the other hand, (ii) the prefactor formula is sensitive to the underlying microscopic assumptions. Indeed, the formula explicitly depends on the intensity distributions among the power-law splitting traders. Furthermore, we find that (iii) the prefactor formula for the homogeneous LMF model systematically underestimates the actual prefactor in the presence of the heterogeneous order-submission probability distributions. We believe that points (i)–(iii) are essential in examining the LMF model for data calibration.

These days, the availability of high-quality microscopic datasets has been significantly enhanced, and our recent articles [22, 23] have verified the LMF prediction quantitatively. Considering such updates from the data-analytic side, we believe that the classical LMF theory should be updated for precise empirical validation.

We must admit that our generalisation is just a first step forward for data calibration, and there is plenty of room to improve the trader model for market order submissions. While only the heterogeneity of the order-splitting strategies is included in our generalised LMF model, other features, such as the trend-following (herding) behaviour among traders, are not included. Trend-following behaviour is empirically observed at the level of individual traders [24] and could be included in market order submission models for a more precise market description.