Exact solution to a generalised Lillo-Mike-Farmer model with heterogeneous order-splitting strategies

The Lillo-Mike-Farmer (LMF) model is an established econophysics model describing the order-splitting behaviour of institutional investors in financial markets. In the original article (LMF, Physical Review E 71, 066122 (2005)), LMF assumed that all traders follow a homogeneous order-splitting strategy and derived a power-law asymptotic solution to the order-sign autocorrelation function (ACF) using several heuristic arguments. This report proposes a generalised LMF model that incorporates heterogeneity in traders' order-splitting behaviour and solves it exactly without heuristics. We find that the power-law exponent of the order-sign ACF is robust for arbitrary heterogeneous intensity distributions. On the other hand, the ACF prefactor is very sensitive to heterogeneity in trading strategies and is systematically underestimated by the original homogeneous LMF model. Our work highlights that, in data analyses, the ACF prefactor should be interpreted more carefully than the ACF power-law exponent.


Introduction
Market microstructure has been studied quantitatively and empirically in econophysics [1,2,3,4]. Econophysicists have proposed various dynamical models, both at the limit-order-book level (e.g., the Santa Fe model [5,6,7], the zero-intelligence model [8], and the latent order-book model [9]) and at the level of individual traders (e.g., the dealer model [10,11,12,13,14]), in the hope that the statistical-physics program is also useful for financial modelling. This paper focuses on a microscopic model of market-order submissions proposed by Lillo, Mike, and Farmer (LMF) in 2005 [15], which is based on the hypothesised order-splitting behaviour of individual traders.
The LMF model is a stylised dynamical model explaining the persistence of market-order flows. In financial data analyses, the binary order-sign sequence of market orders is known to be predictable over long horizons: writing a buy (sell) order at time $t$ as $\epsilon_t = +1$ ($\epsilon_t = -1$), the autocorrelation function (ACF) of the order-sign sequence decays slowly according to a power law,
$$C_\tau := \langle \epsilon_t \epsilon_{t+\tau} \rangle \simeq c_0 \tau^{-\gamma} \quad \text{for large } \tau, \qquad \gamma \in (0, 1). \tag{1}$$
Here the ensemble average of any stochastic variable $A$ is denoted by $\langle A \rangle$, $c_0$ is the ACF prefactor, and $\gamma$ is the ACF power-law exponent. This slow decay is called the long-range correlation (LRC) of the order flow and has long been debated in econophysics and market microstructure [3]. For example, some researchers argue that the LRC is a consequence of herding among traders [16,17,18]. However, in terms of empirical support, the most promising microscopic hypothesis at present is the order-splitting hypothesis, which states that the LRC originates from the order-splitting behaviour of institutional investors: some traders split large metaorders into long sequences of child orders, and because all the child orders of a metaorder share the same sign for a while, the LRC appears naturally. The LMF model is a simple stochastic model implementing this order-splitting picture, describing the LRC from microscopic dynamics in the spirit of the statistical-physics program. In the original article [15], LMF made the following assumptions:
- There are $M$ traders in the market, and $M$ is constant in time (i.e., the system is closed).
- All traders are order-splitting traders, and their strategies are assumed to be homogeneous.
- The metaorder length $L$ ($L = 1, 2, \dots$) obeys a discrete Pareto distribution with tail
$$\rho(L) \simeq \alpha L^{-\alpha-1}, \qquad \alpha > 1. \tag{2}$$
- All traders randomly submit market orders with the same intensity.

While this microscopic dynamics is a $(2M+1)$-dimensional stochastic process, LMF solved the model with heuristic but reasonable approximations to study the LRC of the ACF as its macroscopic behaviour. They showed that the ACF obeys the LRC asymptotics (1), with the power-law exponent $\gamma$ and the prefactor $c_0$ given by
$$\gamma = \alpha - 1, \tag{3a}$$
$$c_0 = \frac{M^{\alpha-2}}{\alpha}. \tag{3b}$$
They also showed numerically that the exponent formula (3a) holds robustly even for an open-system version in which the total number of traders $M$ fluctuates in time. Because formula (3) quantitatively connects the macroscopic LRC phenomenon with the microscopic parameters, the LMF theory is a typical statistical-physics program and is particularly appealing to econophysicists. Several empirical studies support both the order-splitting hypothesis and the LMF model. While the original LMF paper could not establish prediction (3a) quantitatively because high-quality microscopic datasets were unavailable in 2005, Tóth et al. showed in 2015, by decomposing the total ACF, very convincing qualitative evidence that order splitting is the main cause of the LRC [19]. Furthermore, using a large microscopic dataset of the Tokyo Stock Exchange (TSE), Sato and Kanazawa showed in 2023 that the LMF prediction (3a) holds precisely even at a quantitative level [20].
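For orientation, the homogeneous LMF predictions can be evaluated directly. The short Python sketch below assumes the forms $\gamma = \alpha - 1$ for (3a) and $c_0 = M^{\alpha-2}/\alpha$ for (3b); the function name `lmf_prediction` and the parameter values are illustrative only.

```python
def lmf_prediction(alpha: float, num_traders: int) -> tuple:
    """Homogeneous LMF predictions: gamma = alpha - 1 and c0 = M^(alpha-2) / alpha.

    Only the LRC regime 1 < alpha < 2 (so that 0 < gamma < 1) is accepted.
    """
    if not 1.0 < alpha < 2.0:
        raise ValueError("the LRC regime requires 1 < alpha < 2")
    gamma = alpha - 1.0                          # exponent formula (3a)
    c0 = num_traders ** (alpha - 2.0) / alpha    # prefactor formula (3b)
    return gamma, c0


# Illustrative parameters (not calibrated to any market).
gamma, c0 = lmf_prediction(alpha=1.5, num_traders=100)
print(f"gamma = {gamma:.2f}, c0 = {c0:.4f}")
```

Note that for $\alpha < 2$ the prefactor decreases as $M$ grows, so more splitting traders dilute the correlation.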
At the same time, the predictive power of the LMF model is theoretically expected to be limited regarding the prefactor $c_0$, because formula (3b) should depend on system-specific details of the underlying microscopic dynamics. Indeed, during the data analyses for Ref. [20], we noticed that order-splitting strategies are heterogeneous and that such heterogeneity can affect the prefactor $c_0$, while the exponent formula (3a) holds robustly. Given the recent breakthroughs in data analysis, we believe the classical LMF theory should be updated to account for heterogeneity in trading strategies, toward precise data calibration.
In this report, we propose a generalised LMF model that incorporates heterogeneity of order-splitting strategies. We solve the generalised LMF model exactly and show the following three properties: (i) The power-law exponent formula (3a) holds robustly even for heterogeneous intensity distributions. (ii) The prefactor formula (3b) is replaced by a new formula that is sensitive to the intensity distribution. (iii) The classical prefactor formula (3b) systematically underestimates the actual prefactor when traders are heterogeneous. Our results imply that, while the interpretation of the ACF power-law exponent is robust and straightforward, the interpretation of the ACF prefactor requires more careful investigation in data calibration.
This report is organised as follows. Section 2 describes our model and mathematical notation. We present the exact solution of the generalised LMF model in Sec. 3. In Sec. 4, we study several specific but important cases with numerical verification. Section 5 discusses the implications of our heterogeneous LMF formulas for realistic data calibration. We conclude with some remarks in Sec. 6. Four appendices supplement the main text.

Fig. 1 Schematic of our generalised LMF model (panels: microscopic model; macroscopic phenomena, showing the run length). The total number of traders is $M := |\Omega_{\rm TR}|$, a positive integer constant in time. Each trader $i$ is characterised by the intensity $\lambda^{(i)}$ and the run-length (metaorder-length) distribution $\rho^{(i)}(L)$. Here, a run length is defined as the number of successive identical order signs (e.g., $L = 3$ for $+\,+\,+$) and is called the metaorder length in this paper. The intensities and the metaorder-length distributions must satisfy the normalisation conditions $\sum_{i\in\Omega_{\rm TR}} \lambda^{(i)} = 1$ and $\sum_{L=1}^{\infty} \rho^{(i)}(L) = 1$ for any $i \in \Omega_{\rm TR}$. At each timestep, a trader $i_t$ is randomly selected according to the probability distribution $\{\lambda^{(i)}\}_{i\in\Omega_{\rm TR}}$ (i.e., a discrete-time Poisson process) and then submits a market order.

Model
In this section, let us define the stochastic dynamics of our generalised LMF model.

Mathematical notation
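In this report, the probability distribution function (PDF) of a stochastic variable $A$ is written as $P(A)$. If the stochastic variable explicitly depends on time $t$, such as $A_t$, the PDF of $A_t$ is denoted by $P_t(A)$. Any PDF must satisfy the normalisation condition $\sum_A P(A) = 1$. We also define the cumulative distribution function (CDF) and the complementary cumulative distribution function (CCDF) by
$$P_{\le}(A) := \sum_{A' \le A} P(A'), \qquad P_{>}(A) := \sum_{A' > A} P(A') = 1 - P_{\le}(A),$$
respectively. The stationary PDF and the stationary ensemble average are respectively defined by
$$P_{\rm st}(A) := \lim_{t\to\infty} P_t(A), \qquad \langle A \rangle_{\rm st} := \sum_A A\, P_{\rm st}(A).$$
Also, under the condition $B$, the conditional PDF and the conditional average of $A$ are respectively defined by
$$P(A|B) := \frac{P(A, B)}{P(B)}, \qquad \langle A | B \rangle := \sum_A A\, P(A|B).$$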

Model parameters and variables
$\Omega_{\rm TR}$ denotes the set of all traders, and the system is assumed to be closed, such that $M := |\Omega_{\rm TR}| = \text{const}$ (see Fig. 1). Since $M$ is a positive integer, the trader set can be written as
$$\Omega_{\rm TR} = \{1, 2, \dots, M\}$$
without loss of generality. We incorporate the heterogeneity of trading strategies into our model: the characteristic parameters of the $i$th trader are the submission rate (intensity) $\lambda^{(i)}$ and the metaorder-length (or run-length) distribution $\rho^{(i)}(L)$. For simplicity, we assume that the executed volume is always the minimum unit of transaction. In other words, our model is completely characterised by the parameter set
$$\mathcal{P} := \{\lambda^{(i)}, \rho^{(i)}(L)\}_{i\in\Omega_{\rm TR}},$$
where the submission rates and the metaorder-length distributions satisfy the normalisation conditions
$$\sum_{i\in\Omega_{\rm TR}} \lambda^{(i)} = 1, \qquad \sum_{L=1}^{\infty} \rho^{(i)}(L) = 1$$
for any $i \in \Omega_{\rm TR}$.
We next define the state variables of the $i$th trader. Trader $i$ has two state variables, $\epsilon^{(i)}_t$ and $R^{(i)}_t$, representing the sign of the current metaorder ($\epsilon^{(i)}_t = +1$ denotes buy and $\epsilon^{(i)}_t = -1$ denotes sell) and the remaining metaorder length, respectively. The order sign of the whole market is denoted by $\epsilon_t$. Thus, the system is specified by a point in the phase space,
$$X_t := (\epsilon^{(1)}_t, R^{(1)}_t; \dots; \epsilon^{(M)}_t, R^{(M)}_t; \epsilon_t),$$
and is designed as a Markovian stochastic process of dimension $2M + 1$.

Stochastic dynamics
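We next define the stochastic dynamics. Let $i_t \in \Omega_{\rm TR}$ be the stochastic variable representing the identifier (ID) of the trader who submits the market order at time $t$. We assume that $i_{t+1}$ obeys the PDF $\{\lambda^{(i)}\}_{i\in\Omega_{\rm TR}}$, i.e.,
$$P(i_{t+1} = i) = \lambda^{(i)},$$
so that $\{i_t\}_t$ is an independent and identically distributed sequence. After the execution by trader $i_{t+1}$, the remaining volume $R^{(i_{t+1})}_t$ decreases by one. If the whole metaorder has been executed at time $t + 1$ (i.e., $R^{(i_{t+1})}_t = 1$), the metaorder length and its sign are randomly reset for trader $i_{t+1}$. In summary, the dynamics of $X_t$ is given as follows for all $i \in \Omega_{\rm TR}$ (see Fig. 1 for a schematic):
$$\epsilon_{t+1} = \epsilon^{(i_{t+1})}_t,$$
$$(\epsilon^{(i)}_{t+1}, R^{(i)}_{t+1}) = \begin{cases} (\epsilon^{(i)}_t, R^{(i)}_t) & \text{if } i \neq i_{t+1}, \\ (\epsilon^{(i)}_t, R^{(i)}_t - 1) & \text{if } i = i_{t+1} \text{ and } R^{(i)}_t > 1, \\ (\eta, L) & \text{if } i = i_{t+1} \text{ and } R^{(i)}_t = 1, \end{cases}$$
where $\eta = \pm 1$ with probability $1/2$ each. Here the metaorder length $L$ is replenished according to the PDF $\{\rho^{(i)}(L)\}_L$ when the previous metaorder terminates (i.e., if $R^{(i_{t+1})}_t = 1$).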
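A direct Monte Carlo simulation of these dynamics is a useful cross-check of the exact results derived later. The Python sketch below is one minimal implementation, not the authors' simulation code. It assumes, purely for illustration, geometric metaorder lengths $\rho(L) = (1-q)q^{L-1}$ drawn by the hypothetical helper `geometric_sampler`, and it starts every trader with a fresh metaorder, which is already stationary only for such memoryless lengths.

```python
import itertools
import random


def geometric_sampler(q):
    """Hypothetical helper: draw L >= 1 with rho(L) = (1 - q) q^(L - 1)."""
    def draw(rng):
        length = 1
        while rng.random() < q:
            length += 1
        return length
    return draw


def simulate_lmf(intensities, samplers, num_steps, seed=0):
    """Simulate the generalised LMF model and return the market signs eps_1..eps_T.

    intensities : lambda^(i) for each trader, summing to one
    samplers    : samplers[i](rng) draws a metaorder length L ~ rho^(i)(L)
    """
    rng = random.Random(seed)
    traders = list(range(len(intensities)))
    cum_weights = list(itertools.accumulate(intensities))
    # Initial state: a fresh metaorder for every trader (a simplification).
    signs = [rng.choice((-1, 1)) for _ in traders]
    remaining = [samplers[i](rng) for i in traders]
    market = []
    for _ in range(num_steps):
        i = rng.choices(traders, cum_weights=cum_weights)[0]  # i_{t+1} ~ {lambda^(i)}
        market.append(signs[i])                  # eps_{t+1} = eps_t^(i_{t+1})
        if remaining[i] > 1:
            remaining[i] -= 1                    # metaorder continues
        else:                                    # metaorder completed: reset
            signs[i] = rng.choice((-1, 1))
            remaining[i] = samplers[i](rng)
    return market


def empirical_acf(x, tau):
    """Sample estimate of <eps_t eps_{t+tau}> (the mean sign is zero by symmetry)."""
    n = len(x) - tau
    return sum(x[t] * x[t + tau] for t in range(n)) / n


# Illustrative run: M = 10 identical traders with lambda = 0.1 and q = 0.5.
x = simulate_lmf([0.1] * 10, [geometric_sampler(0.5)] * 10, 100_000, seed=1)
print([round(empirical_acf(x, tau), 3) for tau in (1, 10, 50)])
```

At lag one, two consecutive orders share a metaorder only if the same trader is drawn twice and her metaorder has at least two orders left, so the estimate should be close to $\sum_i (\lambda^{(i)})^2 q = 0.05$ for these settings.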

Relationship with the original LMF model
Our model is a natural generalisation of the original LMF model that includes heterogeneity in order-splitting behaviour. Indeed, our model reduces to the original LMF model when the heterogeneity in order-splitting strategies is removed by setting the parameter set $\mathcal{P}$ as
$$\lambda^{(i)} = \frac{1}{M}, \qquad \rho^{(i)}(L) = \rho(L) \qquad \text{for all } i \in \Omega_{\rm TR}.$$

Fig. 2 Schematic of the ACF decomposition for the case with $i_1 = i_{\tau+1} = 1$ and $R^{(1)}_{t=0} = 10$. The market orders at $t = 1$ and $t = \tau + 1$ are submitted by the same trader and belong to the same metaorder, because trader 1's metaorder has not been completed before $t = \tau + 1$ (e.g., $R^{(1)}_1 = R^{(1)}_0 - 1 \ge 1$ after the first execution); thus, the condition $u = 1$ is met.

Exact solutions
In this section, we derive the exact solution of our generalised LMF model. In particular, we are interested in the order-sign autocorrelation function (ACF) in the stationary state:
$$C_\tau := \langle \epsilon_1 \epsilon_{\tau+1} \rangle_{\rm st}.$$

Preliminary calculation
Before deriving the explicit formula for the exact ACF, we rewrite the definition of the ACF in a more convenient form.
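Let us introduce a flag variable $u$ such that $u = 1$ if the market order executed at time $t = \tau + 1$ belongs to the same metaorder as the one executed at time $t = 1$, and $u = 0$ otherwise. Conditioning on $u$, $i_{\tau+1}$, and $i_1$, we decompose the ACF as
$$C_\tau = \sum_{u\in\{0,1\}} \sum_{i, j \in \Omega_{\rm TR}} \langle \epsilon_1 \epsilon_{\tau+1} \mid u, i_{\tau+1} = j, i_1 = i \rangle\, P(u, i_{\tau+1} = j, i_1 = i).$$
See Fig. 2 for a schematic of this decomposition. By construction, order signs belonging to two different metaorders are uncorrelated, whereas order signs belonging to the same metaorder are perfectly correlated. Moreover, $u = 1$ requires $i_1 = i_{\tau+1}$. We thus obtain
$$C_\tau = \sum_{i\in\Omega_{\rm TR}} P(u = 1 \mid i_{\tau+1} = i, i_1 = i)\, P(i_{\tau+1} = i, i_1 = i).$$
In addition, $i_{\tau+1}$ and $i_1$ are generated independently, so that $P(i_{\tau+1} = i, i_1 = i) = (\lambda^{(i)})^2$. We obtain
$$C_\tau = \sum_{i\in\Omega_{\rm TR}} (\lambda^{(i)})^2 P(u = 1 \mid i_{\tau+1} = i, i_1 = i).$$
We next introduce conditioning on $R^{(i)}_0 := R^{(i)}_{t=0}$ as
$$P(u = 1 \mid i_{\tau+1} = i, i_1 = i) = \sum_{R^{(i)}_0 = 1}^{\infty} P(u = 1 \mid i_{\tau+1} = i, i_1 = i, R^{(i)}_0)\, P_{\rm st}(R^{(i)}_0),$$
where we use the identity $P(A|B) = \sum_C P(A|B, C) P(C|B)$ and the relationship $P(R^{(i)}_0 \mid i_{\tau+1} = i, i_1 = i) = P_{\rm st}(R^{(i)}_0)$, which holds because the trader IDs are drawn independently of the state in the stationary regime. The conditional probability $P(u = 1 \mid i_{\tau+1} = i, i_1 = i, R^{(i)}_0)$ is directly related to the survival probability of a metaorder whose initial remaining volume is $R^{(i)}_0$. Indeed, defining $N^{(i)}_\tau$ as the total number of metaorder executions by trader $i$ during $[1, \tau]$, the condition $u = 1$ is equivalent to $R^{(i)}_\tau = R^{(i)}_0 - N^{(i)}_\tau \ge 1$ (see Fig. 2), or equivalently,
$$P(u = 1 \mid i_{\tau+1} = i, i_1 = i, R^{(i)}_0) = P(N^{(i)}_\tau \le R^{(i)}_0 - 1 \mid i_1 = i).$$
In other words, this is the probability that the discrete-time Poisson counting process $N^{(i)}_\tau$ does not exceed $R^{(i)}_0 - 1$ up to time $\tau$. In summary, we can exactly decompose the total ACF as
$$C_\tau = \sum_{i\in\Omega_{\rm TR}} (\lambda^{(i)})^2 \sum_{R^{(i)}_0 = 1}^{\infty} P_{\rm st}(R^{(i)}_0)\, P(N^{(i)}_\tau \le R^{(i)}_0 - 1 \mid i_1 = i),$$
which requires explicit formulas for $P_{\rm st}(R^{(i)}_0)$ and $P(N^{(i)}_\tau \le R^{(i)}_0 - 1 \mid i_1 = i)$. In the following subsections, we derive exact formulas for these quantities using the master-equation approach.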

Stationary PDF for the remaining metaorder length
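Let us derive the stationary PDF $P_{\rm st}(R^{(i)})$ of the remaining metaorder length $R^{(i)}$ via the master-equation approach. The master equation for the PDF $P_t(R^{(i)})$ of the remaining metaorder length is given by
$$\Delta_t P_t(R^{(i)}) = \lambda^{(i)} \left[ P_t(R^{(i)} + 1) - P_t(R^{(i)}) + P_t(1)\, \rho^{(i)}(R^{(i)}) \right]$$
for $R^{(i)} \ge 1$, where $\Delta_t P_t(R^{(i)}) := P_{t+1}(R^{(i)}) - P_t(R^{(i)})$ (see Appendix A for the derivation). In the stationary state $\Delta_t P_t(R^{(i)}) = 0$, we obtain the stationary distribution $P_{\rm st}(R^{(i)})$ in the exact form
$$P_{\rm st}(R^{(i)}) = c^{(i)} \sum_{L = R^{(i)}}^{\infty} \rho^{(i)}(L), \qquad c^{(i)} := P_{\rm st}(1).$$
Normalisation fixes the constant: since $\sum_{R=1}^{\infty} \sum_{L=R}^{\infty} \rho^{(i)}(L) = \sum_{L=1}^{\infty} L \rho^{(i)}(L)$, we have
$$c^{(i)} = \left( \sum_{L=1}^{\infty} L\, \rho^{(i)}(L) \right)^{-1},$$
i.e., the inverse of the mean metaorder length.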
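The stationary solution can be checked with exact rational arithmetic. The Python sketch below assumes a finite-support $\rho^{(i)}(L)$ (the example distribution is arbitrary) and verifies the stationary balance $P_{\rm st}(R) = P_{\rm st}(R+1) + P_{\rm st}(1)\rho^{(i)}(R)$ obtained by setting $\Delta_t P_t = 0$ in the master equation; the function names are hypothetical.

```python
from fractions import Fraction


def stationary_remaining(rho):
    """P_st(R) = c * sum_{L >= R} rho(L) with c = 1 / <L>.

    rho : dict {L: probability} with finite support.
    Returns ({R: P_st(R)}, c).
    """
    l_max = max(rho)
    mean_length = sum(L * p for L, p in rho.items())
    c = 1 / mean_length
    p_st = {R: c * sum(rho.get(L, 0) for L in range(R, l_max + 1))
            for R in range(1, l_max + 1)}
    return p_st, c


def is_stationary(rho, p_st):
    """Check P_st(R) = P_st(R + 1) + P_st(1) rho(R) for every R in the support."""
    return all(p_st[R] == p_st.get(R + 1, 0) + p_st[1] * rho.get(R, 0)
               for R in range(1, max(rho) + 1))


rho = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}
p_st, c = stationary_remaining(rho)
print(p_st, c, is_stationary(rho, p_st))
```

Fractions are used so that the equality checks are exact rather than subject to floating-point rounding.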

Survival probability of the metaorder length
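We next study the survival probability of the metaorder, which is formulated as the CDF $P(N^{(i)}_\tau \le R^{(i)}_0 - 1 \mid i_1 = i)$. Because the discrete-time Poisson counting process is equivalent to a Bernoulli process with success probability $\lambda^{(i)}$, the PDF of $N^{(i)}_t$ obeys the binomial distribution
$$P_t(N^{(i)}) = \binom{t-1}{N^{(i)} - 1} (\lambda^{(i)})^{N^{(i)} - 1} (1 - \lambda^{(i)})^{t - N^{(i)}},$$
which satisfies the initial condition $P_{t=1}(N^{(i)}) = \delta_{N^{(i)},1}$. Finally, its CDF is given by
$$P(N^{(i)}_t \le n \mid i_1 = i) = \begin{cases} I_{1-\lambda^{(i)}}(t - n, n) & \text{for } 1 \le n \le t - 1, \\ 1 & \text{for } n \ge t, \end{cases} \tag{28}$$
with the regularised incomplete Beta function $I_x(a, b)$ defined by
$$I_x(a, b) := \frac{\int_0^x s^{a-1} (1 - s)^{b-1}\, ds}{\int_0^1 s^{a-1} (1 - s)^{b-1}\, ds}$$
for real numbers $x \in [0, 1]$ and $a, b > 0$.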
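For integer arguments, the regularised incomplete Beta function has the standard binomial-sum form $I_x(a, b) = \sum_{j=a}^{a+b-1} \binom{a+b-1}{j} x^j (1-x)^{a+b-1-j}$, which allows a dependency-free check of the CDF above. The Python sketch below compares the direct binomial CDF of $N^{(i)}_t$ with $I_{1-\lambda}(t-n, n)$; the values $t = 20$ and $\lambda = 0.3$ are arbitrary illustrations.

```python
from math import comb


def counting_cdf(n, t, lam):
    """P(N_t <= n | i_1 = i): N_t - 1 ~ Binomial(t - 1, lam), with N_1 = 1."""
    return sum(comb(t - 1, k - 1) * lam ** (k - 1) * (1 - lam) ** (t - k)
               for k in range(1, min(n, t) + 1))


def reg_inc_beta_int(x, a, b):
    """I_x(a, b) for positive integers a, b via the binomial-sum identity."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))


t, lam = 20, 0.3
max_gap = max(abs(counting_cdf(n, t, lam) - reg_inc_beta_int(1 - lam, t - n, n))
              for n in range(1, t))
print(f"max |CDF - I_x(a, b)| = {max_gap:.1e}")
```

For non-integer arguments, a library routine for the regularised incomplete Beta function would be needed instead of the binomial sum.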

Exact form of the order-sign ACF
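We finally obtain the exact order-sign ACF in an explicit form as
$$C_\tau = \sum_{i\in\Omega_{\rm TR}} (\lambda^{(i)})^2 c^{(i)} \left[ \sum_{R_0 = 2}^{\tau} I_{1-\lambda^{(i)}}(\tau - R_0 + 1, R_0 - 1) \sum_{L = R_0}^{\infty} \rho^{(i)}(L) + \sum_{R_0 = \tau + 1}^{\infty} \sum_{L = R_0}^{\infty} \rho^{(i)}(L) \right],$$
where $c^{(i)} = \left(\sum_{L=1}^{\infty} L \rho^{(i)}(L)\right)^{-1}$ and the term $R_0 = 1$ vanishes because $N^{(i)}_\tau \ge 1$.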
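The exact formula can be evaluated numerically term by term. The Python sketch below is a minimal illustration, assuming every $\rho^{(i)}(L)$ has finite support so that all sums terminate; it computes the counting-process CDF directly from the binomial law rather than through $I_x(a, b)$, and the names `counting_cdf` and `exact_acf` are hypothetical.

```python
from math import comb


def counting_cdf(n, tau, lam):
    """P(N_tau <= n | i_1 = i) with N_tau - 1 ~ Binomial(tau - 1, lam)."""
    return sum(comb(tau - 1, k - 1) * lam ** (k - 1) * (1 - lam) ** (tau - k)
               for k in range(1, min(n, tau) + 1))


def exact_acf(tau, intensities, rhos):
    """C_tau = sum_i lam_i^2 sum_{R0 >= 2} P_st(R0) P(N_tau <= R0 - 1 | i_1 = i).

    rhos[i] is a dict {L: rho^(i)(L)} with finite support.
    """
    total = 0.0
    for lam, rho in zip(intensities, rhos):
        mean_length = sum(L * p for L, p in rho.items())
        acc = 0.0
        for r0 in range(2, max(rho) + 1):          # R0 = 1 cannot survive
            p_st = sum(p for L, p in rho.items() if L >= r0) / mean_length
            acc += p_st * counting_cdf(r0 - 1, tau, lam)
        total += lam ** 2 * acc
    return total


# One trader (lambda = 1) whose metaorders always have length 3.
print([exact_acf(tau, [1.0], [{3: 1.0}]) for tau in (1, 2, 3)])
```

In this toy case, the sign sequence consists of blocks of three identical signs, so the ACF should be $2/3$, $1/3$, and $0$ at lags 1, 2, and 3, which the sketch reproduces.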

Remark on the original derivation
Let us focus on the homogeneous case $\lambda^{(i)} = \lambda = 1/M$ for all $i \in \Omega_{\rm TR}$. In the original LMF argument, the length of the metaorder observed at $\tau = 1$ was heuristically estimated to obey the size-biased PDF $L\rho(L)/\sum_{L'} L'\rho(L')$, because a longer metaorder is more likely to be observed. Furthermore, LMF assumed that the remaining metaorder length is uniformly distributed between $1$ and $L$. Under these heuristic but reasonable assumptions, they estimated the order-sign ACF $C^{\rm LMF}_\tau$. Our derivation is essentially similar to the original LMF argument, but it is a more systematic and rigorous version based on the master-equation approach without heuristic arguments. Indeed, their heuristic formula is equivalent to ours except for a minor typo, as can be seen by exchanging the sums over $L$ and $j$ and formally replacing the dummy variables as $j = R_0 - 2$ and $h = N - 1$.
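The size-biased heuristic can be compared directly with the stationary solution of the master equation. The Python sketch below assumes our reading of the LMF heuristic (size-biased total length, then a uniformly distributed remaining length) and checks, with exact fractions and an arbitrary finite-support $\rho(L)$, that it yields $P_{\rm st}(R) \propto \sum_{L \ge R} \rho(L)$; the function names are hypothetical.

```python
from fractions import Fraction


def lmf_heuristic_remaining(rho):
    """Size-biased length L ~ L rho(L) / <L>, then R uniform on {1, ..., L}."""
    mean_length = sum(L * p for L, p in rho.items())
    p = {}
    for L, pl in rho.items():
        size_biased = L * pl / mean_length
        for R in range(1, L + 1):
            p[R] = p.get(R, 0) + size_biased / L   # uniform remaining length
    return p


def exact_remaining(rho):
    """Stationary solution P_st(R) = sum_{L >= R} rho(L) / <L>."""
    mean_length = sum(L * p for L, p in rho.items())
    return {R: sum(p for L, p in rho.items() if L >= R) / mean_length
            for R in range(1, max(rho) + 1)}


rho = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}
print(lmf_heuristic_remaining(rho) == exact_remaining(rho))
```

The agreement is algebraic: the factor $L$ of the size bias cancels the $1/L$ of the uniform position, leaving $\sum_{L \ge R} \rho(L)/\langle L \rangle$.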
For the homogeneous case, our exact formula (21), denoted by $C^{\rm SK}_\tau$, simplifies by setting $\lambda^{(i)} = 1/M$ and $\rho^{(i)}(L) = \rho(L)$ for all $i$. Therefore, the LMF estimate $C^{\rm LMF}_\tau$ in Ref. [15] is consistent with our exact formula $C^{\rm SK}_\tau$ for the homogeneous case except for a very minor contribution from $R_0 = 2$. We regard this minor discrepancy as a typo without significant consequences, and our formula is a natural and rigorous extension of the original LMF theory.

Examples and numerical verification
Let us derive the asymptotic behaviour of the order-sign ACF for several important cases.

Case 0: Random traders
Let us consider the most trivial case, where trader $i$ submits her orders purely at random. This case corresponds to the setting $\rho^{(i)}(L) = \delta_{L,1}$, i.e., every metaorder consists of a single order. From Eq. (30), we obtain the order-sign ACF without any correlation as $C^{(i)}_\tau = 0$ for $\tau \ge 1$.

Fig. 3 Comparison between the numerical results and the theoretical prediction (39), assuming that all traders are exponentially-splitting traders with the same parameters $L^{*(i)} = L$ and $\lambda^{(i)} = \lambda$ for all $i \in \Omega_{\rm TR}$. The market ACF is theoretically given by $C_\tau = \sum_{i\in\Omega_{\rm TR}} C^{(i)}_\tau$. The numerical ACFs of the generalised LMF model are shown for (a) $(M, L, \lambda) = (10, 2, 0.1)$ as the green line, (b) $(M, L, \lambda) = (10, 5, 0.1)$ as the orange line, and (c) $(M, L, \lambda) = (10, 10, 0.1)$ as the red line.

Case 1: Exponential metaorder length distribution
Let us consider the case where the metaorder length of trader $i$ obeys an exponential law with characteristic length $L^{*(i)}$. For this case, we obtain an exact ACF formula (see Appendix C for the detailed derivation), which can be rewritten in the form (39), with the characteristic decay time $\tau^{(i)}_{\rm ET}$ defined in (39b). This implies that exponential decay appears in the order-sign ACF as a fast-decaying contribution, which is consistent with empirical observations. We numerically checked the validity of this formula, as shown in Fig. 3.
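As a worked example, the exponential case admits a short closed form if one assumes the geometric parameterisation $\rho(L) = (1-q)q^{L-1}$; this parameterisation is our assumption and may differ from the one built on $L^{*(i)}$ in Appendix C. Memorylessness keeps $P_{\rm st}(R)$ geometric, and averaging $P(R_0 \ge N_\tau + 1) = q^{N_\tau}$ over the binomial count gives $C^{(i)}_\tau = \lambda^2 q\,[1 - \lambda(1-q)]^{\tau-1}$. The Python sketch below compares this with a truncated direct sum; all names are hypothetical.

```python
import math


def geometric_acf(tau, lam, q):
    """Closed form lam^2 * q * (1 - lam (1 - q))^(tau - 1) for rho(L) = (1-q) q^(L-1)."""
    return lam ** 2 * q * (1.0 - lam * (1.0 - q)) ** (tau - 1)


def decay_time(lam, q):
    """Characteristic time of the exponential decay: C_{tau+1}/C_tau = exp(-1/decay_time)."""
    return -1.0 / math.log(1.0 - lam * (1.0 - q))


def brute_force_acf(tau, lam, q, r_max=400):
    """Direct evaluation of lam^2 sum_{R0} P_st(R0) P(N_tau <= R0 - 1), truncated at r_max."""
    total = 0.0
    for r0 in range(2, r_max + 1):
        p_st = (1.0 - q) * q ** (r0 - 1)   # geometric lengths keep a geometric P_st
        survival = sum(math.comb(tau - 1, k - 1) * lam ** (k - 1) * (1.0 - lam) ** (tau - k)
                       for k in range(1, min(r0 - 1, tau) + 1))
        total += p_st * survival
    return lam ** 2 * total


print(geometric_acf(10, 0.1, 0.5), brute_force_acf(10, 0.1, 0.5), decay_time(0.1, 0.5))
```

Setting $q = 0$ recovers the random-trader case with zero correlation.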

Case 2: Power-law metaorder length distribution
We next study the case where the metaorder length obeys a power law with tail exponent $\alpha^{(i)} > 1$. This means that the PDF is approximately given by
$$\rho^{(i)}(L) \simeq \alpha^{(i)} L^{-\alpha^{(i)} - 1}.$$
For this case, by approximating the sums by integrals, we obtain for sufficiently large $\tau \gg 1$ the asymptotic form
$$C^{(i)}_\tau \simeq \frac{(\lambda^{(i)})^{3-\alpha^{(i)}}}{\alpha^{(i)}} \tau^{-(\alpha^{(i)} - 1)}. \tag{43}$$
For the detailed derivation, see Appendix D.

ACF formula with heterogeneous strategies
Let us summarise the above formulas in the presence of heterogeneous order-splitting strategies. Consider a market where the following types of traders coexist: random traders (whose set is denoted by $\Omega_{\rm RT}$), exponentially-splitting traders (whose set is denoted by $\Omega_{\rm ET}$), and power-law-splitting traders (whose set is denoted by $\Omega_{\rm PT}$).

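The total ACF asymptotically obeys
$$C_\tau \simeq \sum_{i\in\Omega_{\rm ET}} C^{(i)}_\tau + \sum_{i\in\Omega_{\rm PT}} \frac{(\lambda^{(i)})^{3-\alpha^{(i)}}}{\alpha^{(i)}} \tau^{-(\alpha^{(i)} - 1)}, \tag{44}$$
where the random traders do not contribute and each exponentially-splitting contribution $C^{(i)}_\tau$ decays exponentially as in (39). Thus, while the fast exponential decay is visible for relatively small $\tau$, the slow power-law decay dominates for large $\tau$. These features are consistent with empirical observations.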
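The crossover described above can be made concrete numerically. The Python sketch below sums per-trader contributions, assuming the large-$\tau$ power-law form $(\lambda^{(i)})^{3-\alpha^{(i)}}\tau^{-(\alpha^{(i)}-1)}/\alpha^{(i)}$ for power-law splitters and, for the exponential splitters, the geometric closed form $\lambda^2 q[1-\lambda(1-q)]^{\tau-1}$ obtained under the assumed law $\rho(L) = (1-q)q^{L-1}$. The trader mix is hypothetical. It then measures the local log-log slope, which should approach $-(\alpha_{\min} - 1)$.

```python
import math


def power_law_term(tau, lam, alpha):
    """Large-tau contribution of one power-law splitter."""
    return lam ** (3.0 - alpha) / alpha * tau ** (1.0 - alpha)


def geometric_term(tau, lam, q):
    """Contribution of one splitter with assumed rho(L) = (1-q) q^(L-1)."""
    return lam ** 2 * q * (1.0 - lam * (1.0 - q)) ** (tau - 1)


def total_acf(tau, power_law_traders, geometric_traders=()):
    """Sum (lam, alpha) and (lam, q) contributions; random traders add zero."""
    return (sum(power_law_term(tau, lam, a) for lam, a in power_law_traders)
            + sum(geometric_term(tau, lam, q) for lam, q in geometric_traders))


def local_exponent(tau, *traders):
    """Log-log slope of C_tau between tau and 2 tau."""
    return math.log(total_acf(2 * tau, *traders) / total_acf(tau, *traders)) / math.log(2)


# Hypothetical mix: two power-law splitters and one exponential splitter.
pt = [(0.3, 1.4), (0.3, 1.8)]
et = [(0.4, 0.9)]
for tau in (10, 10 ** 3, 10 ** 6):
    print(tau, round(local_exponent(tau, pt, et), 3))
```

At small lags the exponential term steepens the slope, while at large lags the slope settles near $-0.4$, set by the smallest exponent $\alpha = 1.4$.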

Remark 1: consistency with the original LMF formula for the homogeneous case
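Let us assume that all traders are power-law-splitting traders with homogeneous intensity and exponent, such that
$$\lambda^{(i)} = \frac{1}{M}, \qquad \alpha^{(i)} = \alpha \qquad \text{for all } i \in \Omega_{\rm TR} = \Omega_{\rm PT}.$$
For this case, we obtain
$$C_\tau \simeq \frac{M^{\alpha-2}}{\alpha} \tau^{-(\alpha-1)},$$
which is equivalent to the original LMF formula (3).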

Remark 2: the importance of the minimum power-law exponent α min
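For large $\tau$, the ACF is eventually characterised by the power law $C_\tau \propto \tau^{-\alpha_{\min} + 1}$, with
$$\alpha_{\min} := \min_{i\in\Omega_{\rm PT}} \alpha^{(i)}.$$
Thus, $\alpha_{\min}$ is the most important parameter characterising the asymptotic behaviour of the ACF. This property is relevant to data calibration. Indeed, a typical empirically available quantity is the aggregated metaorder-length distribution among all the splitting traders $\Omega_{\rm ST} := \Omega_{\rm ET} \cup \Omega_{\rm PT}$,
$$P_{\rm agg}(L) := \frac{1}{N_{\rm tot}} \sum_{k=1}^{N_{\rm tot}} \delta_{L, L_k},$$
where $N_{\rm tot}$ is the total number of metaorders and $L_k$ is the $k$th metaorder length among all the splitting traders. Let us decompose this aggregated empirical distribution as
$$P_{\rm agg}(L) = \sum_{i\in\Omega_{\rm ST}} \frac{N^{(i)}_{\rm tot}}{N_{\rm tot}} \frac{1}{N^{(i)}_{\rm tot}} \sum_{k=1}^{N^{(i)}_{\rm tot}} \delta_{L, L^{(i)}_k},$$
with $L^{(i)}_k$ being the $k$th metaorder length of trader $i$ and $N^{(i)}_{\rm tot}$ being the total number of metaorders of trader $i$. Here we use ergodicity for the empirical distributions,
$$\frac{1}{N^{(i)}_{\rm tot}} \sum_{k=1}^{N^{(i)}_{\rm tot}} \delta_{L, L^{(i)}_k} \simeq \rho^{(i)}(L).$$
Also, for a long-time simulation of duration $t$, we can evaluate
$$N^{(i)}_{\rm tot} \simeq \frac{\lambda^{(i)} t}{\langle L \rangle^{(i)}}, \qquad N_{\rm tot} = \sum_{j\in\Omega_{\rm ST}} N^{(j)}_{\rm tot}, \qquad \langle L \rangle^{(i)} := \sum_{L=1}^{\infty} L \rho^{(i)}(L).$$
We thus obtain
$$P_{\rm agg}(L) \simeq \sum_{i\in\Omega_{\rm ST}} w^{(i)} \rho^{(i)}(L), \qquad w^{(i)} := \frac{\lambda^{(i)}/\langle L \rangle^{(i)}}{\sum_{j\in\Omega_{\rm ST}} \lambda^{(j)}/\langle L \rangle^{(j)}},$$
so that the tail of $P_{\rm agg}(L)$ is dominated by the smallest exponent, $P_{\rm agg}(L) \propto L^{-\alpha_{\min} - 1}$ for large $L$. This relation implies that it is acceptable to use the aggregated metaorder-length distribution among all the splitting traders to determine $\alpha_{\min}$.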
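To illustrate why pooling is harmless for estimating $\alpha_{\min}$, the Python sketch below pools samples from two hypothetical power-law splitters and applies the Hill tail estimator. The Hill estimator is our choice of technique, not a method prescribed here, and the sketch uses continuous Pareto variates from `random.paretovariate` (CCDF $L^{-\alpha}$ for $L \ge 1$) as a stand-in for integer metaorder lengths.

```python
import math
import random


def hill_estimator(samples, k):
    """Hill estimate of a CCDF tail exponent from the k largest samples."""
    xs = sorted(samples, reverse=True)
    threshold = xs[k]                      # (k+1)-th largest value
    return k / sum(math.log(x / threshold) for x in xs[:k])


rng = random.Random(42)
# Two hypothetical splitters with equal weights and CCDF ~ L^(-alpha):
# alpha = 1.5 and alpha = 3.0, pooled into one aggregated sample.
pooled = [rng.paretovariate(1.5) for _ in range(100_000)]
pooled += [rng.paretovariate(3.0) for _ in range(100_000)]
alpha_hat = hill_estimator(pooled, k=2_000)
print(f"Hill estimate from pooled lengths: {alpha_hat:.2f} (alpha_min = 1.5)")
```

Because the top order statistics come almost entirely from the heavier-tailed trader, the pooled estimate lands near $\alpha_{\min}$ rather than at an average of the two exponents.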

Discussion for data calibration
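Here we discuss the implications of our heterogeneous-LMF formula (44) for data calibration. In this section, we make the simplifying assumption that heterogeneity appears only in the intensities $\{\lambda^{(i)}\}_{i\in\Omega_{\rm PT}}$ of the power-law-splitting traders, who share a common exponent $\alpha^{(i)} = \alpha \in (1, 2)$. The total intensity and the total number of the power-law-splitting traders are denoted by
$$\mu := \sum_{i\in\Omega_{\rm PT}} \lambda^{(i)}, \qquad M_{\rm PT} := |\Omega_{\rm PT}|,$$
respectively. For this case, the asymptotic behaviour is described by
$$C_\tau \simeq c_0 \tau^{-\gamma}, \qquad \gamma = \alpha - 1, \qquad c_0 = \frac{1}{\alpha} \sum_{i\in\Omega_{\rm PT}} (\lambda^{(i)})^{3-\alpha},$$
and the prefactor satisfies
$$c_0 \ge \frac{\mu^{3-\alpha} M_{\rm PT}^{\alpha-2}}{\alpha}, \tag{60}$$
with equality for homogeneous intensities $\lambda^{(i)} = \mu/M_{\rm PT}$. Because the right-hand side is the prefactor of a homogeneous LMF model with $M_{\rm PT}$ traders sharing the total intensity $\mu$ (which reduces to (3b) for $\mu = 1$), the homogeneous formula underestimates the actual prefactor.
Proof. The inequality (60) is proved by Hölder's inequality:
$$\sum_k a_k b_k \le \left( \sum_k |a_k|^p \right)^{1/p} \left( \sum_k |b_k|^q \right)^{1/q}$$
for any series $\{a_k\}_k$, $\{b_k\}_k$ and any real numbers $p, q > 1$ satisfying $1/p + 1/q = 1$. By putting $a_k = \lambda^{(k)}$, $b_k = 1$, $p = 3 - \alpha$, and $q = (3-\alpha)/(2-\alpha)$ for $k \in \Omega_{\rm PT}$, we obtain
$$\mu \le \left( \sum_{k\in\Omega_{\rm PT}} (\lambda^{(k)})^{3-\alpha} \right)^{1/(3-\alpha)} M_{\rm PT}^{(2-\alpha)/(3-\alpha)},$$
which is equivalent to
$$\sum_{k\in\Omega_{\rm PT}} (\lambda^{(k)})^{3-\alpha} \ge \mu^{3-\alpha} M_{\rm PT}^{\alpha-2}.$$
We thus obtain the inequality (60).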
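A quick numerical sanity check of the inequality can complement the Hölder argument. The Python sketch below draws random heterogeneous intensities for power-law splitters sharing a common $\alpha$ and compares $c_0 = \alpha^{-1}\sum_i (\lambda^{(i)})^{3-\alpha}$ with the homogeneous value $\mu^{3-\alpha} M_{\rm PT}^{\alpha-2}/\alpha$. The parameter values and function names are illustrative assumptions.

```python
import random


def hetero_prefactor(intensities, alpha):
    """c0 = (1/alpha) * sum_i lam_i^(3 - alpha) for splitters sharing exponent alpha."""
    return sum(lam ** (3.0 - alpha) for lam in intensities) / alpha


def homo_prefactor(mu, m_pt, alpha):
    """Homogeneous value mu^(3-alpha) M_PT^(alpha-2) / alpha."""
    return mu ** (3.0 - alpha) * m_pt ** (alpha - 2.0) / alpha


rng = random.Random(0)
alpha, m_pt = 1.5, 50
worst_ratio = float("inf")
for _ in range(1000):
    weights = [rng.random() for _ in range(m_pt)]
    total = sum(weights)
    mu = rng.uniform(0.1, 1.0)
    lams = [mu * w / total for w in weights]   # random heterogeneous intensities
    ratio = hetero_prefactor(lams, alpha) / homo_prefactor(mu, m_pt, alpha)
    worst_ratio = min(worst_ratio, ratio)
print(f"smallest c0 / homogeneous value over 1000 draws: {worst_ratio:.4f}")
```

Equal intensities attain the bound, so any heterogeneity pushes the ratio above one.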

Estimation of the lower bound of the total number of order-splitting traders
The inequality (60) is useful for estimating the total number of order-splitting traders. Indeed, because $\alpha - 2 < 0$, it yields a lower bound on the total number of power-law order-splitting traders,
$$M_{\rm PT} \ge \left( \frac{\mu^{3-\alpha}}{\alpha c_0} \right)^{1/(2-\alpha)},$$
where $c_0$ is the empirically available prefactor. Since $\gamma$ is directly measurable from the ACF, $\alpha$ is also indirectly measurable through the relationship $\alpha = \gamma + 1$. While $\mu$ is not observable from public data, Ref. [20] reports that $\mu$ was typically 0.8 in the Tokyo Stock Exchange market from 2012 to 2020. Thus, it might be possible to roughly evaluate the lower bound of the total number of order-splitting traders $M_{\rm PT}$ from this inequality.
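The lower bound can be turned into a one-line estimator. The Python sketch below solves $c_0 \ge \mu^{3-\alpha} M_{\rm PT}^{\alpha-2}/\alpha$ for $M_{\rm PT}$ with $\alpha = \gamma + 1$; the function name is hypothetical, and apart from the value $\mu = 0.8$ quoted from Ref. [20], the inputs are purely illustrative.

```python
def min_power_law_splitters(c0, gamma, mu):
    """Lower bound on M_PT implied by c0 >= mu^(3-alpha) M_PT^(alpha-2) / alpha.

    Uses alpha = gamma + 1 with 0 < gamma < 1, so alpha - 2 < 0 and the
    inequality flips when solved for M_PT.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    alpha = gamma + 1.0
    return (mu ** (3.0 - alpha) / (alpha * c0)) ** (1.0 / (2.0 - alpha))


# Illustrative inputs: hypothetical c0 and gamma, with mu = 0.8 as quoted from Ref. [20].
print(min_power_law_splitters(c0=0.05, gamma=0.5, mu=0.8))
```

For these illustrative inputs, the bound evaluates to roughly 91 traders. Feeding in the prefactor of a homogeneous market with 100 splitters returns exactly 100, as it should.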

Conclusion
We have proposed a generalised Lillo-Mike-Farmer model by incorporating the heterogeneity of order-splitting strategies.This model is exactly solved to evaluate the impact of the heterogeneous strategies regarding both the power-law exponent and the prefactor in the order-sign autocorrelation function.Our theoretical formulas imply that (i) the power-law exponent formula γ = α − 1 robustly holds even in the presence of the heterogeneous intensity distributions.On the other hand, (ii) the prefactor formula is sensitive to the underlying microscopic assumptions.Indeed, the formula explicitly depends on the intensity distributions among the power-law splitting traders.Furthermore, we find that (iii) the prefactor formula for the homogeneous LMF model systematically underestimates the actual prefactor in the presence of the heterogeneous intensity distributions.We believe that points (i)-(iii) are essential in examining the LMF model for data calibration.
Nowadays, the availability of high-quality microscopic datasets has improved significantly, and our recent article [20] has verified the LMF prediction quantitatively. Considering such updates on the data-analytic side, we believe that the classical LMF theory should be updated for precise empirical validation.
We must admit that our generalisation is only a first step toward data calibration, and there is plenty of room to improve trader models for market-order submissions. While only the heterogeneity of order-splitting strategies is included in our generalised LMF model, other features, such as trend-following (herding) behaviour among traders, are not. Trend-following behaviour is empirically observed at the level of individual traders [21] and could be included in market-order submission models for a more precise description of the market.