Deep Hawkes Process for High-Frequency Market Making

High-frequency market making is a liquidity-providing trading strategy that simultaneously generates many bids and asks for a security at ultra-low latency while maintaining a relatively neutral position. The strategy makes a profit from the bid-ask spread for every buy and sell transaction, against the risk of adverse selection, uncertain execution and inventory risk. We design realistic simulations of limit order markets and develop a high-frequency market making strategy in which agents process order book information to post the optimal price, order type and execution time. By introducing the Deep Hawkes process to the high-frequency market making strategy, we allow a feedback loop to be created between order arrival and the state of the limit order book, together with self- and cross-excitation effects. Our high-frequency market making strategy accounts for the cancellation of orders that influence order queue position, profitability, bid-ask spread and the value of the order. The experimental results show that our trading agent outperforms the baseline strategy, which uses a probability density estimate of the fundamental price. We investigate the effect of cancellations on market quality and the agent's profitability. We validate how closely the simulation framework approximates reality by reproducing stylised facts from the empirical analysis of the simulated order book data.


Introduction
Technological innovations and regulatory initiatives in the financial market have led to the traditional exchange floor being displaced by the electronic exchange. The electronic exchange is a fully automated trading system programmed to rigorously enforce order precedence, pricing and the matching of buy and sell orders. Each order's pricing, submission and execution is performed using sophisticated algorithmic trading strategies, which account for 85% of the equity market's trading volume (Mukerji et al., 2019). High-frequency trading (HFT, or high-frequency trader), a subset of algorithmic trading, is characterised by exceptionally high speeds, minuscule timeframes and complex programs for initiating and liquidating positions (SEC, 2014). The critical discussion on the role of HFT in a fragmented market was reignited by the Flash Crash of 6 May 2010 (Kirilenko et al., 2017). This systemic intra-day anomaly lasted only a couple of minutes, but temporarily wiped away a trillion dollars in market value. The analysis of agent-resolved, transaction-level data in the E-mini by Kirilenko et al. (2017) also looks at the behaviour of market makers, whose inventory dynamics remain stationary in conditions of fluctuating liquidity. Even though the market design of the E-mini has no high-frequency market maker liability, unlike equity markets, this seminal paper (Kirilenko et al., 2017) gave a boost to research aimed at understanding high-frequency market making and other liquidity-providing strategies in an algorithmic trading setting.
Market making is a liquidity-providing trading strategy that quotes numerous bids and asks for a security in anticipation of making a profit from the bid-ask spread, while maintaining a relatively neutral position (Chakraborty and Kearns, 2011). The high-frequency market making strategy can be characterised as a subset of HFT that uses latency, at a scale of nanoseconds, to trade in a fragmented market (Menkveld, 2013). A growing literature reports that market makers provide quality liquidity, improve market quality, contribute to price efficiency and have a positive but moderate welfare effect (Kirilenko et al., 2017; Brogaard et al., 2014; Menkveld, 2013). However, another strand of the literature argues that the quality of this liquidity is deceptive: the orders are characterised as phantom liquidity, which disappears before other market participants can access it. The optimal design of market making strategies is therefore an important question for practical applicability, market design and security exchange regulations.
The research on market making spans numerous disciplines, including finance (Ho and Stoll, 1981; Glosten and Milgrom, 1985; O'Hara and Oldfield, 1986; Avellaneda and Stoikov, 2008; Guéant et al., 2013; Cartea et al., 2014; Ait-Sahalia and Saglam, 2017), agent-based modelling (Das, 2005; Preis et al., 2006; Wah et al., 2017; Chao et al., 2018), and artificial intelligence (Spooner et al., 2018; Ganesh et al., 2019; Kumar, 2020). Inspired by the seminal work of Ho and Stoll (1981) and its mathematical formulation (Avellaneda and Stoikov, 2008), the quintessential research in finance considers market making as a stochastic optimal control problem. In a simplistic setting, the market is modelled as a stochastic process, in which market makers try to maximise the expected utility of their profit and loss under inventory constraints (Guéant et al., 2013). In parallel to inventory-based models, Glosten and Milgrom (1985) proposed information-based models, in which market makers face adverse selection risk emerging from informed traders. The unrealistic assumptions that these market models require in order to derive the market maker's asset pricing analytically force researchers to look beyond stochastic optimal control approaches.
Market making has also been extensively investigated in the agent-based modelling (ABM, or agent-based model) literature (Das, 2005; Wah et al., 2017). The ABMs in market making evolved from zero intelligence to an intelligent variant by incorporating order book microstructure for order placement, execution and pricing policy. For example, Muni Toke (2011) reinforced the zero-intelligence market maker model with order arrival following mutually exciting Hawkes processes. The Hawkes process has been used extensively in the empirical estimation and calibration of market microstructure models deemed essential for designing optimal market making strategies (Muni Toke, 2011; Hawkes, 2018; Morariu-Patrichi and Pakkanen, 2018). In these models, the arrival rate of orders does not depend on the state of the limit order book. However, empirical results suggest the existence of a feedback loop between order arrival and the state of the limit order book, together with self- and cross-excitation effects, for which current models fail to account (Gonzalez and Schervish, 2017; Morariu-Patrichi and Pakkanen, 2018). In addition, the Hawkes process constrains the parametric specification of the conditional intensity, which limits the model's expressiveness. To tackle the parametric specification problem, Mei and Eisner (2017) proposed the Neural Hawkes process, in which the Hawkes process is generalised by calculating the event intensities from the hidden state of a long short-term memory (LSTM). Despite the success of the Neural Hawkes process in natural language processing (Mei and Eisner, 2017), the simple LSTM architecture might be inadequate when it comes to modelling noisy, asynchronous order book events.
In recent years, deep learning has made significant inroads into high-frequency finance. The Convolutional Neural Network (CNN) architecture and its variants were used to model price-formation mechanisms using order book events as input (Sirignano, 2019; Cont and Sirignano, 2019; Tashiro et al., 2019; Tsantekidis et al., 2017). However, the CNN architectures are not sophisticated enough to capture self- and cross-excitation effects in the limit order book (LOB) (Zhang et al., 2019). The deep long short-term memory (DLSTM) architecture performs a hierarchical processing of complex order book events, and as such is able to capture the temporal structures of the LOB (Sagheer and Kotb, 2019a,b). However, training the DLSTM model directly through stochastic gradient descent, initialised with random parameters, may lead to the backpropagation algorithm being trapped within multiple local minima (Sagheer and Kotb, 2019b; Vincent et al., 2010). To circumvent the aforementioned limitations, the literature proposed an unsupervised pre-training of each layer and a stacking of many convolutional layers (Vincent et al., 2010). We use Stacked Denoising Autoencoders (SDAEs) together with DLSTM to resolve the random weight initialisation problem in the base architecture (Sagheer and Kotb, 2019b). In addition, SDAEs are quite effective at filtering out noise in order-level data at minuscule resolutions.
The exemplary predictive performance of deep learning models has encouraged researchers to augment order book data with agent-based artificial market simulation, for the purpose of investigating algorithmic trading strategies (Maeda et al., 2020). The success of such a model depends on how realistic the simulation framework of the financial market is. However, algorithmic trading research is still waiting for market simulators that could be used for developing, training and testing algorithms in a manner similar to the classic Atari 2600 games simulator (Mnih et al., 2015). In this paper, we develop realistic simulations of the financial market and use them to design a high-frequency market making agent using the Deep Hawkes process (DHP). The DHP models the streams of order book events by constructing a self-exciting multivariate Hawkes process and a limit order state process, which are coupled and interact with each other. Based on a long stream of high-frequency, transaction-level order book data for the different events (e.g. buy, sell, cancel), the high-frequency market makers use the DHP to predict every held-out event.

Contribution
This paper is the first to incorporate the DHP into a market making strategy, which allows feedback loops between order arrival and the state of the limit order book, together with self- and cross-excitation effects. We extend the neurally self-modulating multivariate point process (Mei and Eisner, 2017) to the deep framework by stacking SDAE with DLSTM, resulting in DLSTM-SDAE. The SDAE resolves the problems associated with weight initialisation, multiple local minima and ultra-noisy order book data that the stacked recurrent network fails to address. Our approach outperforms the Neural Hawkes process (NHP) in predicting the next order type and its time. The gained predictive power helps agents to outperform the benchmark market making strategy, which uses a probability density estimate of the fundamental price. We outline our contributions below:
1. We designed a multi-asset simulation framework that is scalable and can augment markets of substantial size. The framework is built on realistic market architecture, interface kernels, a matching engine and the Financial Information eXchange (FIX) protocol.
2. We are the first to introduce a feedback loop between order arrival and the state of the order book using the DHP in the high-frequency market making setting.
3. We compare the predictive and trading performance of the agents against the benchmark.
4. We explore the effect of cancellation on order queue position, agent's profitability, bid-ask spread, and value of order relative to queue position, in order to verify the existing empirical findings (Moallemi and Yuan, 2017;Dahlström et al., 2018).

Structure
The paper is organised as follows. Section 2 presents the background to the limit order book, market making, the Hawkes process, the Neural Hawkes process, and long short-term memory. Section 3 explores the research streams in market making strategies. Section 4 explains the novel Deep Hawkes process. Section 5 illustrates the multi-agent simulation framework. Section 6 elaborates on the experimental configuration. Section 7 provides the results of the experiments. Section 8 presents our conclusions.

Background
In this section, we first introduce the limit order book together with basic definitions. We then briefly present examples of market making strategies, and review essential tools for designing and investigating the classical market making model. We start with the Hawkes process, the assumptions of which are violated in the financial markets. To address the shortcomings of the Hawkes process, we describe the Neural Hawkes process. Finally, we discuss the LSTM framework, which departs from selecting a parametric form for the conditional intensity, as is done in the Hawkes process variants.

Limit Order Books
The LOB is a centralised database of outstanding orders submitted by traders to buy or sell a specified number of securities on an exchange. In an exchange, traders can primarily submit three different types of orders: limit orders, market orders, and cancellation orders. A limit order is an order to buy or sell a particular number of securities at a specified price or better. A market order is an order to immediately buy or sell a certain number of securities at the best available price in the LOB. An arriving market order is executed instantly against the best available price in the limit order book. Orders that exceed the size available at the best price automatically spill over to the next best available price. Unlike market orders, limit orders rest in the order book, pending execution against a market order or partial or full cancellation by the trader. Limit orders posted close to the best bid and ask are typically executed quickly, but execution may be delayed if market prices diverge from the requested price. Market makers utilise this attribute to design optimal trading strategies (Spooner et al., 2018).
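The spill-over of a market order across price levels can be illustrated with a minimal sketch. This is not the matching engine of the simulation framework described later; the function name `execute_market_buy` and the `(price, size)` representation of the ask side are assumptions made purely for illustration.

```python
from typing import List, Tuple

Level = Tuple[float, int]  # (price, resting size) at one ask level


def execute_market_buy(asks: List[Level], quantity: int) -> Tuple[List[Level], int]:
    """Fill a market buy against resting asks, lowest price first.

    Returns the list of (price, filled size) pairs and any unfilled quantity.
    """
    fills: List[Level] = []
    remaining = quantity
    for price, size in sorted(asks):  # best (lowest) ask first
        if remaining == 0:
            break
        traded = min(size, remaining)
        fills.append((price, traded))
        remaining -= traded  # whatever is left spills over to the next level
    return fills, remaining


asks = [(100.20, 80), (100.10, 50), (100.30, 40)]
fills, unfilled = execute_market_buy(asks, 100)
print(fills, unfilled)
```

In this example a buy order for 100 units exhausts the 50 units at the best ask and takes the remaining 50 from the next level, so the average execution price is worse than the best quoted price.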
In an exchange, several orders posted by traders can have the same price at a given time t. To match the orders within each discrete price level, LOBs employ various priority mechanisms. The mechanism most commonly employed by exchanges is price-time priority. In this case, the matching algorithm gives priority to the buy orders with the highest price and to the sell orders with the lowest price. In the event of ties, preference is given to the orders with the earliest submission time (Gould et al., 2013). Other prominent priority mechanisms include pro-rata and price-size. Under the pro-rata mechanism, ties at a given price are broken by allocating executions across the resting orders in proportion to their size, while the price-size mechanism breaks ties by giving higher priority to larger orders. Our study makes use of the price-time priority mechanism, as it encourages market makers to submit limit orders when designing their trading strategies.
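Price-time priority amounts to a two-key sort of the resting orders on each side of the book. The sketch below illustrates the rule only; the `LimitOrder` fields and the helper `priority_queue` are hypothetical names, and an integer timestamp is assumed for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LimitOrder:
    order_id: str
    side: str       # "buy" or "sell"
    price: float
    timestamp: int  # submission time; a smaller value means an earlier order


def priority_queue(orders: List[LimitOrder], side: str) -> List[LimitOrder]:
    """Order one side of the book by price-time priority."""
    same_side = [o for o in orders if o.side == side]
    if side == "buy":
        # highest bid first, ties broken by earliest submission
        return sorted(same_side, key=lambda o: (-o.price, o.timestamp))
    # lowest ask first, ties broken by earliest submission
    return sorted(same_side, key=lambda o: (o.price, o.timestamp))


book = [
    LimitOrder("A", "buy", 100.00, 3),
    LimitOrder("B", "buy", 100.05, 5),
    LimitOrder("C", "buy", 100.05, 2),
    LimitOrder("D", "sell", 100.10, 1),
]
print([o.order_id for o in priority_queue(book, "buy")])
```

Orders B and C share the best bid, and C is served first because it was submitted earlier. This is why queue position, and hence the cancellation of orders ahead in the queue, matters to a market maker under this mechanism.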

High-Frequency Market Making
Market making is a liquidity-providing trading strategy that quotes numerous bids and asks for a security in anticipation of making a profit in the form of the bid-ask spread, while maintaining a relatively neutral position (Chakraborty and Kearns, 2011). The modern, or high-frequency, market making strategy can also be characterised as a subset of HFT, in which the trader is "lightning fast with a latency (inter-message time) upper bound of 1 millisecond, only engages in proprietary trading, generates many trades (it participates in 14.4% of all trades, split almost evenly across both markets), and starts and ends most trading days with a zero net position" (Menkveld, 2013). The design of optimal market making strategies rests on optimising order-handling costs, the cost of being adversely selected on a bid or ask quote, and the risk-aversion cost of holding a non-zero position (Menkveld, 2013).
In order to better understand high-frequency market making, let us consider a simple example from the LOB illustrated in Figure 1. The market making strategy places a buy order at 100.00 and a sell order at 100.10. The execution of both orders gives the trader a profit of 0.10, which is also the spread. Because a market maker's trades are executed in very large numbers, these small per-trade profits can accumulate into substantial sums.
However, high-frequency market making strategies are always exposed to the risks of adverse price movements, uncertain execution and adverse selection (Penalva et al., 2015). To mitigate these risks, high-frequency market making strategies seek to compete in the market through lower order-processing costs, fast matching engines and low latency. A fast matching engine reduces adverse selection risk by allowing a market making strategy to update its quotes immediately on the arrival of order book information, while low latency reduces the time spent searching across market venues to offload a costly non-zero inventory position (Penalva et al., 2015; Menkveld, 2013). The array of sophisticated statistical and machine learning models applied to real-time order book data serves to optimise transaction costs, which are directly related to profitability.
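The trade-off between earning the spread and carrying inventory can be made concrete with a small worked example, using the quotes of 100.00 and 100.10 from the illustration above. The helper `mark_to_market_pnl` and the mark prices are hypothetical; prices are passed as decimal strings to avoid floating-point rounding.

```python
from decimal import Decimal
from typing import List, Tuple

Fill = Tuple[str, str, int]  # (side, price as a decimal string, quantity)


def mark_to_market_pnl(fills: List[Fill], mark_price: str) -> Decimal:
    """Cash from executed quotes plus open inventory valued at mark_price."""
    cash = Decimal("0")
    inventory = 0
    for side, price, qty in fills:
        if side == "buy":
            cash -= Decimal(price) * qty
            inventory += qty
        else:
            cash += Decimal(price) * qty
            inventory -= qty
    return cash + inventory * Decimal(mark_price)


# Both quotes execute: the market maker earns the spread and ends flat.
round_trip = mark_to_market_pnl([("buy", "100.00", 1), ("sell", "100.10", 1)], "100.05")

# Only the bid executes and the price then falls to 99.90: the open
# position is marked at a loss (uncertain execution plus adverse movement).
one_sided = mark_to_market_pnl([("buy", "100.00", 1)], "99.90")

print(round_trip, one_sided)
```

The first case reproduces the 0.10 profit of the example, while the second shows how a single unhedged fill followed by an adverse move of the same size erases it.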

Hawkes Process
To set the stage for the Hawkes process, we first introduce the key concepts of the counting process, the point process and the conditional intensity function from the classical literature (Gao et al., 2018; Hawkes, 2018; Bacry et al., 2015; Embrechts et al., 2011).
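Let us consider a sequence of positive event arrival times $\{t_n\}_{n \in \mathbb{N}}$ such that $t_n < t_{n+1}$ for all $n \in \mathbb{N}$, defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with a complete information filtration $(\mathcal{F}_t)_{t \geq 0}$ such that $\mathcal{F} = \mathcal{F}^{N}(t)$, $t \geq 0$. The counting process $N(t)$, an almost surely finite, right-continuous step function defined for all $t \in \mathbb{R}_+$, and its analogous point process $L(t)$ are then defined as:

$$N(t) = \sum_{n \in \mathbb{N}} \mathbb{I}_{\{t_n \leq t\}}, \qquad L(t) = \sum_{n \in \mathbb{N}} \varepsilon_n \, \mathbb{I}_{\{t_n \leq t\}}, \tag{2.1}$$

where $\mathbb{I}_{\{\cdot\}}$ is the indicator function and $\{\varepsilon_n : n \geq 1\}$ is a sequence of independent and identically distributed non-negative random variables.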
In the academic literature, counting process and point process terminology is often used interchangeably, and the reader is expected to infer the nature of the process from the context. For example, point processes are often characterised by the distribution function of the occurrence of the next event conditioned on the past, but this conditional arrival distribution is difficult to work with in practice. Instead, the conditional intensity function is used. For a counting process $N(t)$ with associated history $\mathcal{H}(t)$ and adapted to a filtration $\mathcal{F}$, we define the conditional intensity as:

$$\lambda(t \mid \mathcal{H}(t)) = \lim_{h \downarrow 0} \frac{\mathbb{E}\left[ N(t + h) - N(t) \mid \mathcal{H}(t) \right]}{h}. \tag{2.2}$$

We now define the Hawkes process, characterised by its intensity $\lambda(t)$ with respect to its natural filtration, as:

$$\lambda(t) = \mu + \int_{0}^{t} \phi(t - s) \, dN(s) = \mu + \sum_{t_n < t} \phi(t - t_n), \tag{2.3}$$

where $\mu$ is the baseline intensity and $\phi$ is a non-negative kernel function such that $\|\phi\|_1 = \int_0^\infty \phi(\tau) \, d\tau < 1$. A prominent kernel function used in finance is the exponential decay $\phi(t) = \alpha e^{-\beta t}$, where the parameter $\alpha$ is the weight given to a previous event and $\beta$ governs how long the influence of a past event lasts. Now, we consider $N(t) = \{N^i_t\}_{i=1}^{M}$ as the $M$-dimensional counting process, and its analogous point process as $\{L^i_t\}_{i=1}^{M}$. Similarly, we define the multidimensional Hawkes process as:

$$\lambda^i(t) = \mu^i + \sum_{j=1}^{M} \int_{0}^{t} \phi^{i,j}(t - s) \, dN^j_s, \qquad i = 1, \dots, M, \tag{2.4}$$

where $\mu = \{\mu^i\}_{i=1}^{M}$ is a baseline intensity vector and $\phi(t) = \{\phi^{i,j}(t)\}_{i,j=1}^{M}$ is a matrix-valued kernel that is component-wise non-negative, causal, and $L^1$-integrable (Bacry et al., 2015).
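To make the exponential-kernel case concrete, the following minimal sketch evaluates the univariate intensity and simulates event times with Ogata's thinning algorithm. It is an illustration under stated assumptions rather than the estimation or simulation code used in this paper: the function names and the parameter values in the usage lines are hypothetical, and only the Python standard library is used.

```python
import math
import random
from typing import List


def hawkes_intensity(t: float, events: List[float], mu: float, alpha: float, beta: float) -> float:
    """lambda(t) = mu + sum over past events t_n < t of alpha * exp(-beta * (t - t_n))."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def simulate_hawkes(mu: float, alpha: float, beta: float, horizon: float, seed: int = 0) -> List[float]:
    """Simulate event times on [0, horizon) by thinning."""
    if alpha / beta >= 1.0:
        raise ValueError("the kernel norm alpha / beta must be below 1")
    rng = random.Random(seed)
    events: List[float] = []
    t = 0.0
    while True:
        # Every recorded event is <= t, so this includes the jump of an event
        # accepted exactly at t. The kernel only decays afterwards, which makes
        # lam_bar an upper bound on the intensity until the next event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)  # candidate arrival time
        if t >= horizon:
            return events
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)


events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=1)
print(len(events), hawkes_intensity(50.0, events, 0.5, 0.8, 1.2))
```

With these hypothetical parameters the kernel norm is 0.8 / 1.2, roughly 0.67, so on average about two thirds of an additional event is triggered by each event, which produces the clustering of arrivals that motivates the use of Hawkes processes for order flow.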
We can also enrich Equation 2.4 by associating each event with its time $t_n$, its component $\kappa_n$ and its mark $k_n$. An example is the modelling of trades performed at event time $t_n$ with different volume $k_n$ and drawdown intensity $\kappa_n$ (Bacry et al., 2015). The vector intensity function of the multivariate marked Hawkes process is defined as:

$$\lambda^i(t) = \mu^i + \sum_{j=1}^{M} \; \sum_{n \,:\, t_n < t, \; \kappa_n = j} \phi^{i,j}(t - t_n) \, \psi^{j}(k_n), \qquad i = 1, \dots, M, \tag{2.5}$$

where $\mu = \{\mu^i\}_{i=1}^{M}$ is a vector of exogenous intensities, $\phi(t) = \{\phi^{i,j}(t)\}_{i,j=1}^{M}$ is a matrix-valued kernel and $\psi^{j}$ is an impact function of the marks.
The properties of the Hawkes process can be characterised thoroughly in an analytical manner due to the linear structure of the stochastic intensity (Hawkes, 2018; Bacry et al., 2015). The linear properties of a Hawkes process enable linear predictions from the model, given its base intensity and kernels as parameters (the kernels are non-parametrically calibrated from the data). A Hawkes process can be appropriately approximated by an autoregressive process, by Wiener processes, or through a clustering representation (Bacry et al., 2015). As Mei and Eisner (2017) put it, "the Hawkes process supposes that past events can temporarily raise the probability of future events, assuming that such excitation is positive, additive over the past events, and exponentially decaying with time". In a simplified setting, these properties might be useful for modelling constrained processes, but they might not be applicable to real-world examples. For example, at large scales, the microscopic variables of the price formation process, as they relate to order book events, do not diffuse toward Wiener processes. Similarly, the massive cancellation of orders in the LOB might inhibit price movements rather than excite them.

Neural Hawkes Process
The Neural Hawkes Process (NHP) was introduced to address the unrealistic assumptions of the Hawkes process. Building on the earlier formulation, the positivity constraints on the baseline intensity vector µ, the kernels φ and the decay rate δ limit the expressiveness of the Hawkes process. It fails to capture inhibition and inherent inertia effects, which are characteristic of realistic financial markets (Mei and Eisner, 2017).
Let $\{t_n, \kappa_n, k_n\}_{n \in \mathbb{N}}$ be an event stream, where $t_n$ is the time of occurrence of an event, $\kappa_n$ its component and $k_n \in \{1, \dots, K\}$ its corresponding mark. Then, the probability that the next event occurs at time $t_n$ and is of type $k_n$ is $P\{(t_n, \kappa_n, k_n) \mid \mathcal{H}_n, (t_n - t_{n-1})\}\,dt$. The associated intensity function, conditioned on the past events $h$, for the self-exciting multivariate point process (the Hawkes process with an exponentially decaying kernel function) is:

$$\lambda_k(t) = \mu_k + \sum_{h \,:\, t_h < t} \alpha_{k_h, k} \exp\left( -\delta_{k_h, k} (t - t_h) \right). \tag{2.6}$$

Inhibition and inherent inertia effects are introduced in the self-modulating model, in which we relax the positivity constraints on the parameters α and µ. The total activation, which may now be negative, is then passed through a non-linear transfer function $f_k$ (e.g. the rectified linear unit (ReLU) function or the softplus function) such that:

$$\lambda_k(t) = f_k\left( \tilde{\lambda}_k(t) \right), \tag{2.7}$$

$$\tilde{\lambda}_k(t) = \mu_k + \sum_{h \,:\, t_h < t} \alpha_{k_h, k} \exp\left( -\delta_{k_h, k} (t - t_h) \right), \tag{2.8}$$

where $\mu_k < 0$ and $\alpha_{j,k} < 0$ allow inertia and inhibition effects respectively (Mei and Eisner, 2017). The summation in Equation 2.8 places a restriction on $\tilde{\lambda}_k(t)$, in that past events have an independent and additive influence. This deviates from reality, which is characterised by complex dependence between the intensities in terms of the number of order event types and past event timings (Bacry et al., 2015). Mei and Eisner (2017) proposed the Neural Hawkes Process to learn and predict this complex dependency by replacing the summation with a novel recurrent neural network. In this process, the hidden state vector $h(t)$ controls the dynamics of the time-varying event intensities, and in turn depends on a vector $c(t)$ of memory cells in a continuous-time long short-term memory (Hochreiter and Schmidhuber, 1997).
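The effect of relaxing the positivity constraints can be seen in a minimal sketch of the self-modulating intensity. The sketch assumes a plain softplus as the transfer function (Mei and Eisner use a scaled variant), indexes event types from zero, and uses hypothetical parameter values chosen only to show inhibition.

```python
import math
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); always strictly positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def self_modulating_intensity(
    t: float,
    k: int,
    history: List[Tuple[float, int]],   # past events as (time t_h, type k_h)
    mu: Sequence[float],                # baseline per type, may be negative (inertia)
    alpha: Sequence[Sequence[float]],   # alpha[j][k]: effect of type j on type k, may be negative (inhibition)
    delta: Sequence[Sequence[float]],   # delta[j][k] > 0: decay rate of that effect
) -> float:
    """Intensity of type k at time t: softplus of the possibly negative activation."""
    activation = mu[k] + sum(
        alpha[k_h][k] * math.exp(-delta[k_h][k] * (t - t_h))
        for t_h, k_h in history
        if t_h < t
    )
    return softplus(activation)


mu = [0.5, -0.2]
alpha = [[0.8, -1.5], [0.3, 0.4]]
delta = [[1.0, 1.0], [1.0, 1.0]]
history = [(0.0, 0)]  # one event of type 0 at time 0

inhibited = self_modulating_intensity(0.5, 1, history, mu, alpha, delta)
print(inhibited, softplus(mu[1]))
```

Here an event of type 0 lowers the intensity of type 1 below its baseline level, which a classical Hawkes process with non-negative kernels cannot express, while the transfer function keeps the intensity positive.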

Long Short-Term Memory
The central idea of the LSTM is the use of a memory cell, which overcomes the problem associated with vanishing gradients (Arras et al., 2019). The memory cell of the LSTM is a composite unit, built from simpler nodes in a specific connectivity pattern, with the novel inclusion of multiplicative nodes, as represented in Figure 2. A typical LSTM memory cell architecture contains a cell input activation vector x t , an input gate i t , a forget gate f t , a cell c t , an output gate o t and an output response h t . The distinctive features of the LSTM approach, the input gate and the forget gate, govern the information flow into and out of the cell according to gate logic, whereas the output gate controls the amount of information flowing from the cell to the output h t . A self-connected recurrent edge with fixed unit weight in the memory cell ensures that the error can flow across many time steps without vanishing or exploding.
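At each time step $t$, the recursive computation in the LSTM model proceeds according to the following equations:

$$\begin{aligned}
i_t &= \sigma\left( W_{xi} x_t + W_{hi} h_{t-1} + b_i \right), \\
f_t &= \sigma\left( W_{xf} x_t + W_{hf} h_{t-1} + b_f \right), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \varphi\left( W_{xc} x_t + W_{hc} h_{t-1} + b_c \right), \\
o_t &= \sigma\left( W_{xo} x_t + W_{ho} h_{t-1} + b_o \right), \\
h_t &= o_t \odot \varphi(c_t),
\end{aligned} \tag{2.9}$$

where $x_t$ is the input vector at time $t$, the activation functions are the sigmoid $\sigma(x) = 1/(1 + e^{-x})$ and $\varphi(x) = \tanh(x)$, $W_{AB}$ is the weight matrix between $A$ and $B$ (e.g., $W_{xi}$ is the weight matrix from the inputs $x_t$ to the input gates $i_t$), $b_B$ denotes the bias term of $B$ with $B \in \{i, f, c, o\}$, and $\odot$ denotes point-wise multiplication of two vectors. For consistency with the standard literature, we follow the naming conventions of Zhu et al. (2016).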
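The recursion can be sketched in a few lines of plain Python. The sketch assumes the common LSTM variant without peephole connections and stores the parameters in a dictionary keyed by names such as `W_xi` and `b_i`, following the naming convention described above; the dictionary layout and the function names are illustrative choices, not part of the model specification.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W_x: Matrix, x: Vector, W_h: Matrix, h: Vector, b: Vector) -> Vector:
    """Return W_x x + W_h h + b, one entry per hidden unit."""
    return [
        sum(w * v for w, v in zip(row_x, x)) + sum(w * v for w, v in zip(row_h, h)) + bias
        for row_x, row_h, bias in zip(W_x, W_h, b)
    ]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector, p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns the new output h_t and cell state c_t."""
    i = [sigmoid(a) for a in affine(p["W_xi"], x, p["W_hi"], h_prev, p["b_i"])]    # input gate
    f = [sigmoid(a) for a in affine(p["W_xf"], x, p["W_hf"], h_prev, p["b_f"])]    # forget gate
    o = [sigmoid(a) for a in affine(p["W_xo"], x, p["W_ho"], h_prev, p["b_o"])]    # output gate
    g = [math.tanh(a) for a in affine(p["W_xc"], x, p["W_hc"], h_prev, p["b_c"])]  # candidate cell input
    # The cell keeps a gated share of its previous state and adds gated new input.
    c = [f_j * c_j + i_j * g_j for f_j, c_j, i_j, g_j in zip(f, c_prev, i, g)]
    h = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c)]
    return h, c
```

Stacking several such layers, with the output h t of one layer serving as the input x t of the next, gives the deep variant discussed later in the paper.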

Literature Review
In this section, we briefly review three prominent streams of research into how the Hawkes process is used to design market making strategies.We are well aware of blurred and overlapping boundaries across research streams, but strongly believe in capturing the essence of Hawkes models in market making.

Stochastic Optimal Control
The classical finance literature employs a stochastic optimal control framework to determine the market maker's optimal quotes for bid and ask in the presence of inventory risk, adverse selection, information asymmetry and latency (Ho and Stoll, 1981; Sandas, 2001; Avellaneda and Stoikov, 2008; Penalva et al., 2015). The market maker aims to maximise the expected utility of the profit and loss at closing time. By integrating a utility framework into the microstructure of the LOB, Avellaneda and Stoikov (2008) were able to analytically derive optimal bid and ask quotes. In the continuous-time model, the mid-prices evolve according to Brownian motion, and the order arrival follows a Poisson process. In one particular case, a non-homogeneous Poisson process is reduced to the Hawkes process (Bacry et al., 2015). The unrealistic assumptions limit the model's ability to capture adverse selection effects, market impact and autocorrelation structures in the security being traded. Building on the seminal work of Avellaneda and Stoikov (2008), numerous researchers looked at price impact, adverse selection effects, and latency, together with different objective functions (Penalva et al., 2015).
All of these improvements to the model aim to provide numerical approximations for the associated stochastic differential equations. However, problems related to model ambiguity have yet to be addressed (Nystrom et al., 2014).
The profusion of transaction-level data at a minuscule resolution provides an unprecedented opportunity to apply Hawkes processes to the study of market microstructure with a view to deciphering price formation mechanisms, liquidity dynamics and volatility (Morariu-Patrichi and Pakkanen, 2018). The use of Hawkes processes to model extreme price moves and order flow is well integrated with the high-frequency market making model (Nystrom et al., 2014; Bacry et al., 2015). For example, Nystrom et al. (2014) proposed a market making model based on model risk or uncertainty, in which inventory dynamics, fill rates and price formation are modelled using two independent Hawkes processes. While the market making models based on Hawkes processes have been moderately successful in integrating a realistic order arrival process, they fail to incorporate the complex interaction between the self- and cross-excitation effects of the endogenous state variables that describe the LOB (Morariu-Patrichi and Pakkanen, 2018). For the classical "buy low and sell high" market making strategies, Cartea et al. (2014) used a multivariate Hawkes process to capture the interaction between market orders and the state of the LOB. However, the constraints that the Hawkes process places on the parametric specification of the conditional intensity limit the model's expressiveness.

Agent-Based Models
Market making has also been extensively investigated in ABM (Das, 2005; Wah et al., 2017). Using a bottom-up approach, ABMs try to artificially simulate a system's aggregate behaviour caused by the actions of, and interactions between, heterogeneous autonomous agents (Samanidou et al., 2007). The ABMs in market making range from the simple "zero intelligence" of mainstream economics to representing in detail the full complexity of the order book and market microstructure. The seminal work of Muni Toke (2011) assimilated a microscopic, dynamical statistical model for the continuous double auction (Smith et al., 2003) into the Hawkes framework. The reinforced zero-intelligence market making model, populated with a liquidity provider and a liquidity taker, uses mutually exciting Hawkes processes for the order arrival process. However, the parametric specification of the exponential kernels contradicts the empirical results on the existence of a feedback loop between order arrival and the state of the LOB (Gonzalez and Schervish, 2017; Morariu-Patrichi and Pakkanen, 2018). To tackle the parametric specification problem, Mei and Eisner (2017) proposed the Neural Hawkes process, which they applied to natural language processing and which generalises the Hawkes process by calculating the event intensities from the hidden state of an LSTM.
Another prominent ABM in market making research (Wah et al., 2017) uses empirical game-theoretic analysis to examine the effect of market making on market performance in different scenarios. In that paper, the authors use multiple background traders (e.g. traditional zero-intelligence agents) to produce realistic market microstructure. Bayesian market makers were then used to investigate the welfare effect, trading gains and strategic behaviour. The simplistic adaptive trading strategies adopted by the market makers ought to be sufficient to investigate market equilibria, but would face serious challenges in relation to developing realistic market making strategies that also take account of market microstructure. The academic literature needs to assimilate the success of artificial intelligence in designing trading strategies that interact with close-to-reality market simulations.

Artificial Intelligence
The success of deep reinforcement learning in board games (Silver et al., 2017) and video games (Mnih et al., 2015) soon sent ripples through the world of finance. Drawing on the classical mathematical setup of Avellaneda and Stoikov (2008), Gueant and Manziuk (2019) used a model-based deep actor-critic algorithm to find the optimal bid and ask quotes across a high-dimensional space of corporate bonds. Spooner et al. (2018) reconstructed a limit order book from historical data and used it to construct a market making agent using temporal-difference reinforcement learning. The next natural progression, a multi-agent simulation of a dealer market, was developed by Ganesh et al. (2019) to understand the behaviour of market making agents. The success of this model depends on a realistic simulation framework of the financial market. However, algorithmic trading research is still evolving, and does not yet have market simulators that can be used to develop, train and test algorithms in a similar vein to the classic Atari 2600 games simulator (Mnih et al., 2015).
Despite the exemplary predictive performance of deep learning models, market making has not yet been comprehensively addressed. If we return to stochastic optimal control in market making, we notice that the dynamics of order flow are mostly described by variants of Hawkes processes, while the resultant control algorithms are associated with continuous semi-martingales, which are hard to solve recursively (Gueant and Manziuk, 2019). In addition, the unavailability of a realistic simulation framework and high-quality order-level data restricts researchers' ability to replicate the successes of deep learning in market making.
Nevertheless, deep learning models (based on CNN architecture) have done moderately well when it comes to modelling price-formation mechanisms using order book events as input (Sirignano, 2019; Cont and Sirignano, 2019; Tashiro et al., 2019; Tsantekidis et al., 2017). The literature is proof that empirical investigation of order-level data has always augmented the model of choice. In short, the ABM, statistical modelling and artificial intelligence approaches to market making all have desirable attributes that the others lack. For example, the CNN architectures fail to capture self- and cross-excitation effects when modelling LOB dynamics (Zhang et al., 2019). The DLSTM architecture performs hierarchical processing of complex order book events, and as such is able to efficiently capture the temporal structures of the LOB (Sagheer and Kotb, 2019a,b). As market making involves placing optimal bids and asks, the DLSTM architecture, together with the Hawkes process, would efficiently model order arrival and price-formation mechanisms by constructing a self-exciting multivariate Hawkes process and a limit order state process, which are coupled and interact with each other.

Deep Hawkes Process
In this section, we propose that the Deep Hawkes Process (DHP) be used to concurrently model the order book event timings and associated event types. By assimilating it into the order arrival process, the market makers have control over the sending of different orders at specific points in time. The basic idea behind our approach is to view the conditional intensity of the Hawkes process as a nonlinear deterministic function of past history, and to use DLSTM to automatically learn a high-dimensional representation from the data. We believe that combining the constraints of the Hawkes process with a recurrent network architecture makes it possible to replicate, in market making, the success achieved in natural language processing (Du et al., 2016; Mei and Eisner, 2017) and time series prediction (Sagheer and Kotb, 2019a,b).

Deep Long Short-Term Memory
Despite the success of recurrent marked temporal point processes and the Neural Hawkes process across disciplines, the conventional shallow recurrent network architecture and its variant, long short-term memory (LSTM), might be inadequate when it comes to modelling noisy asynchronous order book events. Recently, the DLSTM architecture (Sagheer and Kotb, 2019a,b; Zhu et al., 2016), used in action modelling and multivariate time series forecasting, has been shown to outperform the traditional LSTM architecture. The architecture of the DLSTM is the same as that of the previously introduced LSTM, except that it involves multiple LSTM layers stacked on top of each other.

DLSTM-SDAE
In an ideal setting, the performance of the DLSTM model proved to be empirically better than existing contemporary statistical and deep models. However, as the number of non-linear variables is scaled up, the overall learning of the model suffers greatly because the backpropagation algorithm becomes trapped in one of multiple local minima (Sagheer and Kotb, 2019b). This may be the biggest hurdle when it comes to modelling limit order book events that comprise complex, asynchronous, non-linear multivariate time series. The ultra-noisy order book data also makes processing more challenging. To circumvent the limitations of the conventional stacked LSTM model, we use the stacked denoising auto-encoder (SDAE) together with DLSTM. The SDAE enables deep neural networks with multiple nonlinear hidden layers to learn complex features from noisy limit order book data (Li et al., 2019; Vincent et al., 2010) and resolves the problem of random weight initialisation of the LSTM units in DLSTM (Sagheer and Kotb, 2019b). Figure 3 shows the proposed DLSTM-based SDAE architecture for DHP. As shown in Figure 3, the reconstructed order book data is denoised by the SDAE layers in the DLSTM-SDAE architecture. In this method, at the first layer the input $x_t$ is corrupted into $\tilde{x}_t$ using the stochastic mapping $\tilde{x}_t \sim S_D(\tilde{x}_t \mid x_t)$. Then, the autoencoder maps the corrupted input $\tilde{x}_t$ to a hidden representation $h = f_\theta(\tilde{x}_t)$ with encoder $f_\theta(\tilde{x}_t) = \sigma(W\tilde{x}_t + b)$. Lastly, the decoder $g_{\theta'}$ reconstructs $z = g_{\theta'}(h)$ from the hidden representation $h$. The parameters $\theta$ and $\theta'$ are trained using stochastic gradient descent to minimize the reconstruction error, measured by the squared error loss $L_2(x, z) = \lVert x - z \rVert^2$. Once the mapping is learned, the high-level hidden state $h$ is used for training the next layer. For the detailed learning procedure of the SDAE, please refer to the seminal paper on the subject (Vincent et al., 2010).
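The corruption, encoding, decoding and loss steps of a single denoising auto-encoder layer can be sketched as follows. This is only an illustration under stated assumptions: masking noise is used as the stochastic mapping $S_D$ (one of the corruption schemes discussed by Vincent et al., 2010), the decoder is assumed to be a sigmoid layer with its own weights, and the names `corrupt`, `dae_forward`, `W_dec`, `b_dec` and `drop_prob` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def corrupt(x, drop_prob, rng):
    """Assumed masking noise: zero each component independently with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def dae_forward(x, W, b, W_dec, b_dec, drop_prob, rng):
    """One denoising auto-encoder pass: corrupt, encode, decode, squared error."""
    x_tilde = corrupt(x, drop_prob, rng)   # corrupted input
    h = sigmoid(W @ x_tilde + b)           # hidden representation (encoder)
    z = sigmoid(W_dec @ h + b_dec)         # reconstruction (assumed sigmoid decoder)
    loss = float(np.sum((x - z) ** 2))     # squared error against the CLEAN input
    return h, z, loss

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 4                      # illustrative sizes only
x = rng.random(n_in)
W = rng.normal(scale=0.1, size=(n_hidden, n_in))
b = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.1, size=(n_in, n_hidden))
b_dec = np.zeros(n_in)
h, z, loss = dae_forward(x, W, b, W_dec, b_dec, drop_prob=0.3, rng=rng)
```

Note that the loss compares the reconstruction with the uncorrupted input, which is what forces the layer to learn to denoise rather than to copy.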
At time $t$, the denoised input $x_t$ from the SDAE is passed to the first LSTM layer together with the previous hidden state $h^1_{t-1}$. The hidden state at time $t$, $h^1_t$, is calculated using the recursive LSTM procedure discussed in Equation 2.9. It is then passed on to the next time step and to the next LSTM layer. In the second layer, the hidden state $h^1_t$ and the previous hidden state $h^2_{t-1}$ are used to compute $h^2_t$, and the procedure repeats until the last layer has been computed.

Model Formulation
Let $\{t_n, \kappa_n, k_n\}_{n \in \mathbb{N}}$ be a stream of order book events, where $t_n$ is the time of occurrence of the $n$-th event, $\kappa_n$ its component, and $k_n \in \{1, \ldots, K\}$ its corresponding mark. Then, the probability that the next event occurs at time We are interested in a model that predicts the next event $\{t_n, \kappa_n, k_n\}$ given the past history of events, evaluates its likelihood, and simulates the next event stream by learning from the past event stream. Equation 4.1 shows the associated intensity function of the DHP with relaxed positivity constraints: The hidden state $h(t)$ is updated from the memory cell $c(t)$ as in Equation 4.2.
The interval $(t_{n-1}, t_n]$ ends with the next event $k_n$ at time $t_n$. The DLSTM then reads $\{t_n, \kappa_n, k_n\}$ and updates the current memory cell $c(t)$ to $c_{n+1}$, associated with the hidden state $h(t_n)$. The other parameters of the DLSTM are recursively updated according to Equation 4.3.
where $x_n$ is the $n$-th input vector, represented by a one-hot encoding of the new order book event $k_n$; the activation functions $\sigma(x)$ and $\phi(x)$ are the sigmoid and hyperbolic tangent functions, respectively; $W_{AB}$ (e.g., $W_{ci}$) is the weight matrix from the memory cell to the input gate vector; $b_B$ denotes the bias term of $B$ with $B \in \{i, f, c, o, d\}$; $s$ is the exponential decay parameter; and $f(x) = s \log(1 + \exp(x/s))$, $s > 0$, is the scaled softplus function. As can be seen in Equation 4.3, the parameters are updated using the hidden state $h(t_n)$ at time $t_n$, after its decay over the interval $t_n - t_{n-1}$, rather than the previous hidden state (Equation 2.9). The memory cell $c(t)$ on the interval $(t_{n-1}, t_n]$ follows a power-law distribution decaying from $c_{n+1}$ to $\bar{c}_{n+1}$ and is defined as: The DHP, with its novel discrete update of the stacked LSTM state, allows the model to capture a delayed response, fit non-interacting event pairs, and cope with partially observed event streams. Mei and Eisner (2017) discuss these benefits of the neural version of the model in detail. In order to ensure mathematical tractability, we have illustrated the parameter updates for one of the layers of the stacked LSTM, but this can be easily extended to the deep architecture. For example, the hidden state $h_b$ at block $b$ in the stacked LSTM is recursively computed for $b = 1 : N$ and $t = 1 : T$ using
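As a small illustration of the scaled softplus $f(x) = s \log(1 + \exp(x/s))$ used above to keep the intensity positive, the following sketch evaluates it in a numerically stable way. The function name `scaled_softplus` is hypothetical; `np.logaddexp(0, y)` computes $\log(1 + e^{y})$ without overflow for large $y$.

```python
import numpy as np

def scaled_softplus(x, s):
    """f(x) = s * log(1 + exp(x / s)) for s > 0, evaluated without overflow."""
    if s <= 0:
        raise ValueError("s must be positive")
    x = np.asarray(x, dtype=float)
    return s * np.logaddexp(0.0, x / s)

# The output is strictly positive, and approaches max(x, 0) as s shrinks.
values = scaled_softplus([-2.0, 0.0, 2.0], s=0.01)
```

For small $s$ the function behaves almost like a rectifier, while larger $s$ gives a smoother transition around zero.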

Feedback Loop Exploration
The empirical results (Morariu-Patrichi and Pakkanen, 2018; Gonzalez and Schervish, 2017) indicate the existence of a feedback loop between the order flow and the shape of the LOB, together with self- and cross-excitation effects. In order to efficiently capture this feedback effect in a high-dimensional parameter space, we infuse the feedback loop exploration process into the DLSTM-SDAE architecture, as shown in Figure 3. In designing the deep network architecture and an appropriate regularisation, we ensure that the network can automatically explore distinct feedback loops for different types of events and their co-dependency, for example, the feedback effect of market buy and sell orders on the price, volume and bid-ask spread of the LOB.
Consequently, we design a fully connected deep network in which each neuron represents an LOB state or feedback effect of the preceding layer, in order to automatically explore the feedback loop. Furthermore, the neurons in the same layer are partitioned into $B$ blocks to take into account different combinations of feedback loops. The corresponding regularisation is incorporated into the loss function and described by: where $L$ is the loss function of the DLSTM, and the other two terms are the feedback loop regularisation applied to each block in the network (Zhu et al., 2016). $W_{xB} \in \mathbb{R}^{N_N \times K_J}$ is the weight matrix, with number of neurons $N_N$ and input dimension $K_J$. $S$ denotes the set of gates and the cell in the LSTM neurons for each block in the DLSTM. Lastly, $\lVert W \rVert_{2,1} = \sum_i \sqrt{\sum_j w_{i,j}^2}$ is the structural $\ell_{2,1}$ norm. The loss function (Equation 4.5) was minimised using Adaptive Moment Estimation (Adam) (Kingma and Ba, 2015). Adam is an extension of stochastic gradient descent that is memory efficient, typically requires little hyperparameter tuning, works with sparse gradients, is appropriate for non-stationary objectives, and adapts the step size on a per-parameter basis. It is well suited to highly noisy and/or sparse-gradient order book data.
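The structural $\ell_{2,1}$ norm is simply the sum of the Euclidean norms of the rows of a weight matrix. A minimal sketch follows; the function name `l21_norm` is hypothetical, and how the norm is weighted inside Equation 4.5 is not shown here.

```python
import numpy as np

def l21_norm(W):
    """Sum over rows i of sqrt(sum over columns j of w_ij^2)."""
    W = np.asarray(W, dtype=float)
    return float(np.sum(np.sqrt(np.sum(W ** 2, axis=1))))

W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])
penalty = l21_norm(W)   # row norms are 5, 0 and 13
```

Because the penalty acts on whole rows, minimising it tends to drive entire rows to zero, which is what lets the network switch individual inputs on or off for a given block.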

Parameters Estimation
Given a collection $S$ of sequences of order book events $\{t_n, \kappa_n, k_n\}_{n \in \mathbb{N}}$, the log-likelihood of the model can be expressed as the sum of the log-intensities of the events that occurred minus an integral of the aggregate intensity over the whole interval observed up to $T$: $L = \sum_{n:\, t_n \le T} \log \lambda_{k_n}(t_n) - \int_0^T \lambda(t)\,dt$, where $\lambda(t) = \sum_{k=1}^{K} \lambda_k(t)$. The parameters of $\lambda_k(t)$ are estimated by maximizing $L$ using Adam (Kingma and Ba, 2015), with the integral approximated by Monte Carlo methods (Mei and Eisner, 2017). The frequently used thinning algorithm, adapted from the multivariate Hawkes process, is then used to sample random sequences from the model.
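The two computational steps in this paragraph, sampling by thinning and a Monte Carlo estimate of the integral in the log-likelihood, can be sketched on a deliberately simplified model. The example below is an assumption-laden stand-in, not the DHP itself: it uses a univariate Hawkes process with a classical exponential kernel, $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$, in place of the learned intensity, and the names `intensity`, `sample_thinning`, `log_likelihood_mc` and the parameter values are hypothetical.

```python
import numpy as np

def intensity(t, events, mu, alpha, beta):
    """Exponential-kernel intensity at time t, using only events strictly before t."""
    past = events[events < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

def sample_thinning(mu, alpha, beta, T, rng):
    """Ogata-style thinning: propose from an upper bound, accept with ratio lam / bound."""
    events, t = [], 0.0
    while True:
        past = np.asarray(events, dtype=float)
        # The kernel only decays, so the intensity right after t bounds it until the next event.
        lam_bar = mu + alpha * np.sum(np.exp(-beta * (t - past)))
        t += rng.exponential(1.0 / lam_bar)
        if t > T:
            return np.asarray(events, dtype=float)
        lam_t = mu + alpha * np.sum(np.exp(-beta * (t - past)))
        if rng.random() * lam_bar <= lam_t:
            events.append(t)

def log_likelihood_mc(events, mu, alpha, beta, T, n_samples, rng):
    """Sum of log-intensities at the events minus a Monte Carlo estimate of the integral."""
    log_term = sum(np.log(intensity(t, events, mu, alpha, beta)) for t in events)
    u = rng.uniform(0.0, T, size=n_samples)
    integral = T * np.mean([intensity(s, events, mu, alpha, beta) for s in u])
    return float(log_term - integral)

rng = np.random.default_rng(0)
mu, alpha, beta, T = 1.0, 0.5, 1.5, 20.0   # illustrative values with alpha / beta < 1
events = sample_thinning(mu, alpha, beta, T, rng)
ll = log_likelihood_mc(events, mu, alpha, beta, T, n_samples=20000, rng=rng)
```

For this toy kernel the integral has a closed form, which makes it a convenient check on the Monte Carlo estimate; for a neural intensity no closed form is available, which is why sampling is needed.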

Multi-Agent Simulation Framework
Multi-agent-based modelling is a bottom-up computational modelling approach intended to artificially simulate the aggregate behaviours of systems caused by the actions and interactions between heterogeneous autonomous agents in an environment or environments (Samanidou et al., 2007). In this section, we describe the important components of multi-agent-based modelling, including environment (market simulator), agent ecology (trading strategies) and reward (profit and loss), to study the behaviour of market making agents whose strategies employ Deep Hawkes processes.

Market Simulator
The market simulator is an essential tool for designing, evaluating and backtesting algorithmic trading strategies under various market scenarios. The market is flooded with various proprietary and open-source financial simulation frameworks, but the constraints associated with licences, application interfaces and software design limit their usability (Maeda et al., 2020; Izumi and Toriumi, 2009). In this paper, we have designed a multi-asset market simulator from scratch, which is scalable to markets of substantial size. The asynchronous event-based interface is built over realistic market architecture, interface kernels, a matching engine and the Financial Information eXchange (FIX) protocol, an open electronic communications protocol standard used to carry out trades in electronic exchanges.

Market Architecture
The market architecture consolidates the communication interface, market and matching engine. Figure 4 outlines the key components of the market architecture and their interaction from a high-level perspective. The agents connect to the market via a kernel that hosts order management details. This acts as a transmission channel between agents and markets, providing high throughput and low latency for order transactions. It also throttles the number of transactions, as per the market requirement. As such, there is a guarantee of fairness between agents waiting to place orders. The markets represent an information interchange, in which heterogeneous agents communicate through kernels for order transactions, processing and execution according to the matching engine, as per the financial instruments. The markets respond to order status by sending an execution report that covers the period from the start to the market reset event. This provides opportunities for agents to tweak the parameters in their trading strategies after every trading period if required.

Figure 4: Market Simulator. Components shown: Agents, Kernel, Markets, Matching Engines.

Communication Interface
The communication interface between agents and markets follows the FIX standard protocol to increase efficiency, competition and innovation. It is an electronic communications protocol designed for the real-time exchange of information between agents and exchanges, including agent identifiers, order identifiers, order handling, trade notifications, broadcasts and execution reports. The widespread adoption of the FIX protocol across financial markets reduces costs associated with connectivity, regulatory compliance, liquidity searches and transactions. Using the FIX protocol, multiple agents can interact with the market via kernels, simultaneously and independently.

Matching Engine
At the core of the market simulator are several matching engines for different financial instruments. Each matching engine matches bids and asks to execute trades in a specific instrument. The orders are matched using a price-time priority mechanism. In this context, the matching algorithm gives priority to the bid with the highest price and the ask with the lowest price. Ties are broken by giving preference to the order with the earliest submission time. Other well-known priority mechanisms are pro-rata and price-size (Gould et al., 2013).
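A minimal sketch of price-time priority matching is given below. It is illustrative only and is not the simulator's matching engine: the class `OrderBook`, its `submit` method, and the convention that a trade executes at the resting order's price are assumptions. Two heaps keep the best bid and the best ask on top, and a submission counter breaks price ties in favour of the earlier order.

```python
import heapq
import itertools

class OrderBook:
    """Toy limit order book for one instrument with price-time priority."""

    def __init__(self):
        self._bids = []                  # heap keyed on (-price, seq): highest bid first
        self._asks = []                  # heap keyed on (price, seq): lowest ask first
        self._seq = itertools.count()    # submission order, used as the time-priority tie-break
        self.trades = []                 # (resting_id, incoming_id, price, quantity)

    def submit(self, side, price, qty, order_id):
        """Match an incoming limit order against the opposite side, then rest the remainder."""
        book, opposite = (self._bids, self._asks) if side == "buy" else (self._asks, self._bids)
        while qty > 0 and opposite:
            key, seq, rest_price, rest_qty, rest_id = opposite[0]
            crosses = price >= rest_price if side == "buy" else price <= rest_price
            if not crosses:
                break
            fill = min(qty, rest_qty)
            self.trades.append((rest_id, order_id, rest_price, fill))
            qty -= fill
            if fill == rest_qty:
                heapq.heappop(opposite)
            else:
                # Partial fill: the resting order keeps its place in the queue.
                heapq.heapreplace(opposite, (key, seq, rest_price, rest_qty - fill, rest_id))
        if qty > 0:
            key = -price if side == "buy" else price
            heapq.heappush(book, (key, next(self._seq), price, qty, order_id))

book = OrderBook()
book.submit("sell", 101, 5, "A")
book.submit("sell", 100, 5, "B")
book.submit("sell", 100, 5, "C")   # same price as B, but submitted later
book.submit("buy", 101, 12, "D")   # fills B, then C, then part of A
```

In this usage example the incoming buy order trades first with B and C at the better price of 100, B before C because it arrived earlier, and only then with A at 101.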

Market Ecology
The success of a financial simulation framework depends on the precise representation of market ecology that can adequately mimic the real market design. In finance literature, the term "market ecology" (Farmer, 2002) refers to the composition of heterogeneous trading strategies that keep evolving over time in response to pressure from a contrasting market. The correct mapping of financial market ecology requires the availability of agent-resolved order-level data, which enables identification of sources and events in the market.
In the absence of agent-resolved data, the trading strategies are classified by theoretical considerations, the results of surveys, direct investigations of the trading profile of classes of investors and proxies for HFT (Kirilenko et al., 2017; Mankad and Michailidis, 2013). In this paper, we adapt the market ecology from the different strands of academic literature (McGroarty et al., 2019; Paulin et al., 2019; Musciotto et al., 2018; Kirilenko et al., 2017; Leal et al., 2016; Mankad and Michailidis, 2013; Toth et al., 2012).

Deep Hawkes Market Makers
Despite the existence of sophisticated market making strategies, the classical "buy low and sell high" strategy (Cartea et al., 2014) is a preferred strategy to make money in the securities market. The success of making a profit from short-term price predictions on bids and asks hinges on placing orders at precisely the right time. The mathematical modelling of the complicated order arrival process is commonly done using the Poisson process (Chakraborti et al., 2011), which assumes that orders arrive randomly. However, this is contrary to empirical findings, which show that order arrival times are strongly connected (Rambaldi et al., 2017), have self- and cross-excitation effects (Morariu-Patrichi and Pakkanen, 2018) and that a feedback loop exists between the order arrival and the state of the LOB (Gonzalez and Schervish, 2017). In this article, we incorporate the feedback loop between order arrival and the state of the limit order book, together with self- and cross-excitation effects, using DHP, to design a high-frequency market making strategy.
The order book events in the securities market are stochastically excited or impeded by patterns in the past event streams. The market makers are interested in learning the distribution and structure of the order book event stream in order to accurately predict the next order (limit orders, market orders, cancellations, etc.) together with its associated labels (price, volume, etc.). Given a stream of order book events $\{t_n, \kappa_n, k_n\}_{n \in \mathbb{N}}$, the market makers calculate the probability that the next event occurs at time $t_n$ and is of type $k_n$, and its probability density conditioned on the history of events $H_n$, by: (5.1) To predict the time and the next event with minimum loss, without information about the time $t_n$, we choose $\hat{t}_n$ = The associated intensity function for calculating the next order book events is the same as that of the DHP described in Equation 4.1.
The real securities market has numerous distinct order book events, which are difficult to incorporate in our simulation framework. For the sake of computational tractability, we decided to include only limit order buy/sell, market order buy/sell, and partial/full cancellations. We also assume that the high-frequency market maker is trading a single security in a market whose price at time $t$ is denoted by $p_t$. Unlike traditional market making strategies (Spooner et al., 2018), the DHP market makers can trade with rational limit or market order quantity, price surge and bid-ask spread distributions.
The Deep Hawkes market maker (DHMM) places orders at a specified depth relative to the mid-price, $p_t$. At each time step $t$, the DHMM agent's pricing mechanism is given by: where $p_t$ is the mid-price at time $t$, $J_{i,u}$ is the number of upward jumps of $i$ ticks, and $J_{i,d}$ is the number of downward jumps of $i$ ticks between $0$ and $t$, $i = 1, \ldots, \delta$. The intensities of $J_{i,u}$ and $J_{i,d}$ are $\lambda_{k,u}(t)$ and $\lambda_{k,d}(t)$, respectively, (5.4) The parameters of Equation 5.4 are calculated as discussed under model formulation in Section 4.3.
Most of the quantitative finance research into the high-frequency market making problem is based on the assumption of constant order size (Huang et al., 2015). However, empirical analyses suggest that order sizes have striking statistical distributions at different timescales (Lu and Abergel, 2018; Rambaldi et al., 2017; Mu et al., 2009). The limit order size follows a q-Gamma distribution (Mu et al., 2009). The market maker's willingness to sell or buy specified quantities of securities is defined as: where $I_t$ is the inventory at time $t$, $\bar{I}$ is the maximum inventory, and $\Gamma(\alpha, \beta)$ is the q-Gamma distribution, described as: One striking feature of equity markets is the existence of short-lived limit orders that are modified or cancelled once every 50 milliseconds (Dahlström et al., 2018). Limit order cancellation is an important characteristic of market making strategies and is related to expected profit, bid-ask spread and order queue position. We model cancellation sizes as follows: where $P_c(q; Q)$ is a truncated geometric distribution (Lu and Abergel, 2018).
The LOB is represented as $[Q_{-i} : i = 1, \ldots, L]$ and $[Q_i : i = 1, \ldots, L]$ with corresponding quantities $q_i$. The truncated geometric distribution is defined as: Finally, the market order size follows a mixture of a truncated geometric distribution and the Dirac delta distribution (Lu and Abergel, 2018). The market order size that a market maker is willing to buy or sell is described as: The parameters $\{p^0_c, p^0_m, \theta_0, \theta_k, \theta_\infty\}$ are estimated using a maximum likelihood method. The details of estimation and calibration can be found in Lu and Abergel (2018). Market orders are used to clear the unexecuted inventory at the end of trading.

Probabilistic Market Makers
To ensure fair competition with DHMM and incorporate the existing state-of-the-art, we include a probabilistic estimate-based benchmark strategy (Das, 2005) adapted to our simulation framework. In this market making strategy, the agent attempts to track the fundamental price of securities by maintaining a probability density estimate of the fundamental price.
The probabilistic market makers (PMM) intend to sell or buy $q$ units of a security at time $t$ for price $p^{a,b}_t$ in a market populated with uninformed, informed and noisy informed agents. Let us assume that the fundamental price of the security at time $t$ is $f_t$, that $\xi$ is the fraction of informed agents, and that the probability of a buy or sell order by the uninformed agents is $\zeta$. The noisy informed agents assume that the price of the security follows a normal distribution, $p_t = f_t + N_s(0, \sigma_n^2)$, whereas the fundamental price of the security evolves according to a jump process. The order book event defines the jump, and prices follow a normal distribution. The PMM ask and bid prices at time $t$ are then defined as: (5.8) where $P_{Buy,Sell}$ is the a priori probability of a buy or sell order and $N_s(0, \sigma_n)$ is a sample from the normal distribution. The derivation of the bid/ask equations, their approximate solutions, the density estimate update, and the algorithm are discussed in the benchmark paper (Das, 2005). We add layers of complexity to the benchmark algorithm by allowing the PMM to sample order or cancellation sizes from a normal distribution. The order cancellation size at time $t$ is determined as follows:

Fundamental Traders
Fundamental traders decide to trade based on the presumption that securities prices will eventually return to their basic, intrinsic or fundamental value. Therefore, they strive to buy (sell) the security when the price at time $t$ is below (above) its fundamental value. Fundamental traders are predominantly categorised as buyers or sellers, depending on the inventory at the end of a trading day. The accumulation of directional net positions is an important element in identifying buyers or sellers, since the latter acquire sizable net positions by executing numerous small-size orders, while the former only execute a couple of large orders (Kirilenko et al., 2017; Mankad and Michailidis, 2013). According to the agent ecology literature, the fundamental traders assume that the fundamental value of a security will follow a random walk: Given the last mid-price at time $t$, the limit order price set by fundamental traders is determined by: Finally, the order size under the fundamental traders' strategy is calculated as follows: The decision to buy or sell is governed by the following logic: $\begin{cases} \text{Buy}, & q_{t;f} \ge 0 \\ \text{Sell}, & q_{t;f} < 0 \end{cases}$ (5.13)

Chartist Traders
Unlike fundamental traders, the chartist or technical trader's strategy depends on predicting future price direction based on past price movement. The chartist traders in our simulation framework use a simple trend-following strategy described in Leal et al. (2016). The price, order size and trade direction are described below: $\begin{cases} \text{Buy}, & q_{t;c} \ge 0 \\ \text{Sell}, & q_{t;c} < 0 \end{cases}$ (5.16)

Noise Traders
In the securities market, noise traders make trading decisions based solely on non-information. In the models, they serve as an essential proxy for randomness, no trade and no speculation. We incorporate the slightly more evolved noise or background traders from a seminal paper by Wah et al. (2017). The noise trader's ask or bid price is determined by its fundamental private valuation and trading strategy. The fundamental value evolves according to a mean-reverting stochastic process (Wah et al., 2017).
The private valuation for the noise traders at time $t$ is given by: The noise trader calculates its private value and decides to buy or sell, with equal probability of 1/2, an order of size $q_{t;n}$ sampled from a normal distribution, $N(0, \sigma_n^2)$.

Reward Design
Unlike traditional reward design, in which an agent's performance is assessed at the end of a trading period, we calculate the agent's instantaneous reward at each timestep $t$. The reward function for agent $j$ comprises profit and loss (PnL), inventory cost (IC) and transaction cost (TC). The PnL is the simple profit or loss made by the agent through buying or selling the security at the exchange. It is defined as: As an agent's inventory is exposed to the volatility of the market price, we incorporate this into our reward design using a term associated with inventory cost. It is given by: (5.20) Finally, we include a quadratic penalty on the number of shares executed to account for transaction cost. Specifically, the transaction cost for the orders $q^e_t$ executed by agent $j$ up to time $t$ is: The reward function is the sum of the profit and loss and the inventory cost, less the transaction cost penalty: $R_{t;j} = PnL_{t;j} + IC_{t;j} - TC_{t;j}$. (5.22)
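The way the three terms combine in Equation 5.22 can be sketched as follows. Only the combination $R = PnL + IC - TC$ and the quadratic dependence of the transaction cost on executed shares are taken from the text; the coefficient `eta`, the function names, and treating PnL and IC as already-computed signed inputs are assumptions made for illustration.

```python
def transaction_cost(executed_qty, eta):
    """Assumed quadratic penalty on the number of shares executed (eta is hypothetical)."""
    return eta * executed_qty ** 2

def instantaneous_reward(pnl, inventory_cost, executed_qty, eta):
    """Combine the terms as in Equation 5.22: R = PnL + IC - TC."""
    return pnl + inventory_cost - transaction_cost(executed_qty, eta)

# Example: a small profit, a negative inventory term, and 10 shares executed.
r = instantaneous_reward(pnl=5.0, inventory_cost=-1.5, executed_qty=10, eta=0.01)
```

Because the penalty grows with the square of the executed quantity, a single large execution is penalised more heavily than the same volume split across several timesteps.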

Capital Allocation
The amount of currency units held by an agent is represented by capital. Prior to the securities market opening in the simulation framework, each heterogeneous agent is endowed with a different amount of capital drawn from a power-law distribution. The agent's initial capital $c_a$ follows a power law if it is drawn from a probability distribution $p(c_a) \propto c_a^{-\alpha_a}$. The exponent $\alpha_a$ is referred to as the scaling parameter, which ordinarily lies between 2 and 3 (Clauset et al., 2009).
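A continuous power law can be sampled by inverse-transform sampling, as described by Clauset et al. (2009). The sketch below illustrates one way to endow agents with capital; the lower cut-off `c_min`, the function name `sample_initial_capital`, and the specific values used are assumptions, since the text only specifies the form of the distribution and the typical range of the scaling parameter.

```python
import numpy as np

def sample_initial_capital(n_agents, alpha, c_min, rng):
    """Draw capital from p(c) proportional to c^(-alpha) for c >= c_min, with alpha > 1."""
    if alpha <= 1.0 or c_min <= 0.0:
        raise ValueError("need alpha > 1 and c_min > 0")
    u = rng.random(n_agents)                        # uniform on [0, 1)
    return c_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

rng = np.random.default_rng(42)
capital = sample_initial_capital(100_000, alpha=2.5, c_min=1000.0, rng=rng)
```

Most agents end up close to the lower cut-off while a few receive very large endowments, which is the heavy-tailed heterogeneity the power law is meant to provide.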

Experiments
In this section, we elaborate on data, its processing, performance metrics, benchmarks, training and the parameter configuration for the proposed model.

Data
We use the publicly available historical Nasdaq TotalView-ITCH 5.0 data feed sample to reconstruct the limit order book (Huang and Polak, 2011). The reconstructed database provides tick-by-tick details of full order book depth by listing every quote and order at each price level for Nasdaq-, NYSE- and regional-listed securities on Nasdaq. The raw data feed, in binary format, consists of a series of sequenced messages describing system, securities, order, and trade events at nanosecond resolution. The event stream at nanosecond timestamps guarantees the inclusion of stochastically missing events, which might increase the predictive accuracy of the Deep Hawkes model. Although the Neural Hawkes model (Mei and Eisner, 2017) is expressive enough to take account of missing events in the stream, it makes sense to assess the performance of the Deep Hawkes model on order book data at millisecond resolution as well as at nanosecond resolution. Nasdaq uses multiple messages to indicate the status of current order, trading, system, and circuit breaker events, as discussed in the technical report (Nasdaq, 2020). For computational tractability, we have sampled high-frequency data for the hundred most liquid securities over eight days from the reconstructed order book. The extracted sample consists of approximately a billion transaction records at nanosecond resolution, covering the possible events of limit order buy/sell, market order buy/sell, and partial/full cancellations. The reconstructed limit order book for Apple at 11:21 on March 29th, 2018 is shown in Figure 1.
The reconstructed order book data is divided into training, validation and test sets. For a single security at nanosecond and millisecond resolution, the descriptive statistics are given in Table 1. The validation set is included to optimize the model's hyper-parameters while training, thereby controlling over-fitting. To avoid high variance in the data set, we only record the average value over multiple splits, denoted by ≈ in Table 1.

Performance Metrics
DHMM agents use DHP to accurately predict order book events (buy, sell or cancel) and their timing. The accuracy of DHMM's predictions in terms of events and time is an important determinant of its trading profitability. The performance metrics are vital for measuring the reliability of the trained model's predictions against observed test data. The widely used scale-dependent metric root mean square error (RMSE) (Hyndman and Koehler, 2006) and the classification error rate (ER) were used to evaluate the prediction performance of the Neural Hawkes model. Following Mei and Eisner (2017), we predict each observed order book event $\{t_n, \kappa_n, k_n\}$ from the past event stream $H_n$ and evaluate the prediction using RMSE and ER.
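Both prediction metrics are straightforward to compute once predicted and observed values are aligned event by event. The following sketch uses hypothetical function names and toy inputs; RMSE is applied to event times and ER to event types.

```python
import numpy as np

def rmse(t_true, t_pred):
    """Root mean square error between observed and predicted event times."""
    t_true = np.asarray(t_true, dtype=float)
    t_pred = np.asarray(t_pred, dtype=float)
    return float(np.sqrt(np.mean((t_pred - t_true) ** 2)))

def error_rate(k_true, k_pred):
    """Fraction of events whose predicted type differs from the observed type."""
    k_true = np.asarray(k_true)
    k_pred = np.asarray(k_pred)
    return float(np.mean(k_true != k_pred))

time_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])     # one prediction is off by 2
type_error = error_rate([0, 1, 2, 1], [0, 2, 2, 1])     # one of four types is wrong
```

Since RMSE is scale-dependent, values at nanosecond and millisecond resolution are only comparable after the times are expressed in the same unit.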
The predominant metric for evaluating the performance of the agents is profit and loss at the end of the trading period. However, this approach may be misleading, as agents are tested across heterogeneous securities with varying pricing and liquidity structures. Alternatively, to efficiently capture spread, we use a normalised PnL (NPnL) with inventory and quadratic transaction costs. The NPnL is calculated every hour by dividing the total reward by the weighted average market spread. To take account of the small inventories maintained by the market maker, Spooner et al. (2018) introduced the mean absolute position (MAP) metric. An extreme score under this metric indicates a risky speculative strategy, while a moderate one indicates a strategy based on a stagnant market. We record the variability for NPnL and MAP using the standard deviation and mean, respectively.
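The two trading metrics can be sketched as below. The text does not say how the average spread is weighted, so the `weights` argument (for example, traded volume per observation) is an assumption, as are the function names and the toy numbers.

```python
import numpy as np

def normalised_pnl(total_reward, spreads, weights):
    """Total reward over a period divided by the weighted average market spread."""
    avg_spread = np.average(np.asarray(spreads, dtype=float), weights=weights)
    return float(total_reward / avg_spread)

def mean_absolute_position(inventory):
    """Mean of the absolute inventory held over the period."""
    return float(np.mean(np.abs(np.asarray(inventory, dtype=float))))

npnl = normalised_pnl(total_reward=10.0, spreads=[0.01, 0.03], weights=[1.0, 1.0])
map_score = mean_absolute_position([-2, 0, 4])
```

Dividing by the spread expresses the reward in units of the typical spread of the security, which is what makes results comparable across instruments with different price levels and liquidity.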

Benchmarks
We aim to evaluate the performance of the DHMM against a modified probabilistic estimate-based benchmark strategy (Das, 2005). This market making strategy is an extension of the classic information-based model (Glosten and Milgrom, 1985), in which agents use probability estimates of the fundamental price of securities to set bid and ask prices. The agents can sample limit, market or cancellation order sizes from the normal distribution, in contrast to a unit market order. We implement the probabilistic estimate-based strategy on top of our simulation framework as a continuous-time simulation rather than a discrete-time simulation. This provides a test-bed to assess the performance of the simulation framework in extending discrete-time mechanisms to continuous time, where heterogeneous agents interact asynchronously.
The DHP extends the seminal Neural Hawkes process to the deep learning framework in a market making setting. We introduce the novel architecture to circumvent complications related to random weight initialisation, training and noisy order-level data (Sagheer and Kotb, 2019b). Given that the Neural Hawkes process is the kernel of our proposed deep model, we evaluate the performance of market making agents that use the earlier model in their trading strategies. For comparison purposes, we use the same architecture and training mechanism as discussed in the seminal paper (Mei and Eisner, 2017). The neurally self-modulating multivariate Hawkes process also acts as a benchmark model for evaluating DHP's performance on the prediction of order book events and their timing on the reconstructed limit order book data.

Training
The high-frequency market making agents use DHP to learn from reconstructed limit order book data to place bids, asks or cancellations at a suitable time. The learned prediction is then infused into the market making strategy to trade within the simulation framework. The agents learn the system parameters in a two-step process. Firstly, in the preprocessed order book stream, the $n$-th event $k_n$ is embedded into a latent space before being passed into the SDAE layer together with its timing $t_n$. The deep network, which consists of a stack of multiple DAEs, generates a higher-level representation of convoluted order book event interactions. The high-level denoised representations are then fed into the DLSTM to predict the next order's type and time and to evaluate the loss. The DLSTM-SDAE learns the deep representation in two phases: pre-training and fine-tuning. In pre-training, a greedy layer-wise structure is used to train each DAE layer iteratively, to form a three-layer SDAE. At the end of pre-training, a stack of three LSTMs is produced as an output of the SDAE. Secondly, the parameters of the DLSTM-SDAE are fine-tuned to minimise the error in predicting events and time, using stochastic gradient descent and the Adam optimisation algorithm. Early stopping based on log-likelihood performance on the held-out validation set was used to avoid overfitting. We also add isotropic Gaussian noise to improve generalisation in the classification of events. Table 2 lists the hyper-parameters tuned by validation set performance for the DLSTM-SDAE network architecture. The other non-LSTM parameters include $s_n \in \mathbb{R}$ and $W_n \in \mathbb{R}^D$, as discussed in Section 4. The market making agent using NHP uses a single-layer LSTM and chooses the number of hidden nodes from a small set (64, 128, 256, 512, 1024), as described in the base paper (Mei and Eisner, 2017). The hyperparameters are optimized based on performance on the validation set.
The high-frequency market making agents are trained in the simulation framework for 1000 trading days. Each trading day starts at 9:30 and lasts until 16:00. Two hundred trading days were used to fine-tune the hyperparameters using random search. We acknowledge the existence of differences between the real data and simulated data, but firmly believe that they are generated from the same mechanisms, a claim substantiated by agent-based models that reproduce stylised facts similar to empirical findings. Taking the above into consideration, we train market making agents using DHP and NHP for 100 trading days five times. The aim of this exercise is to synchronise the agents' learning over different sets of data generated from the same stochastic process. We then test the performance of the agents against the benchmark for 300 trading days. To ensure fair competition with market making agents, we use a heterogeneous market ecology consisting of fundamental, chartist and random agents. The important parameters pertaining to the trading agents in the simulation framework are given in Table 2.

Results
In this section, we investigate the performance of market making agents in predicting types of order book events and their timestamps. Having learned which orders to send and at what time, we evaluate the agent's trading performance in the simulation framework. We tweak order cancellations to examine the impact on the agent's profitability and the microstructure of the order book. We then check the robustness of the model by performing sensitivity analysis. Finally, we validate our simulation framework by reproducing stylised facts with our simulated data.

Predictive Performance
Given a stream of order book events {t_n, κ_n, k_n}, n ∈ N, the market makers seek to predict the next event type and its time. We evaluate the predictive performance for t_n and k_n using root mean squared error (RMSE) and error rate (ER), respectively. To avoid overfitting, we divide the training set into sub-training and validation sets. We train the DHP and NHP models on the sub-training set and choose hyperparameters on the validation set. Following the training procedure of Mei and Eisner (2017), we report the predictive performance of the market making agents on reconstructed order book data at nanosecond resolution in Figure 5. As is evident from the figure, neither model is invariably better at predicting events or, in particular, time. It seems that neither model explicitly addresses the complex dynamics of asynchronous order book data at nanosecond resolution. The event dynamics at nanosecond timestamps need much more sophisticated models to filter noise and to model event interaction and non-linearity. In Figure 6, we evaluate the predictive performance of the high-frequency market making agents using millisecond data sampled from the reconstructed order book. Compared to the earlier results, the DHP model's performance has improved drastically. In addition, its time prediction is consistently better than that of the NHP. The deep model, with its novel architecture and pre-training module, is sophisticated enough to capture excitation and feedback effects in the order book. The results also substantiate the claim of the NHP model regarding stochastically missing data. The order book events at millisecond timestamps theoretically omit the events at the finer time resolution, but they are generated by a different mechanism. This is the reason why the results at the two time resolutions are completely different. The Deep Hawkes model presented here is expressive enough to learn the true predictive distribution with stochastically missing events, but only if they are generated from the same mechanism. By integrating the predictive capabilities into the market making strategies, the agents trade in a simulation framework populated with heterogeneous trading strategies. In the next section, we explore the agents' trading performance.
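The two evaluation metrics used in this subsection can be written down directly. The sketch below is illustrative only: it assumes RMSE is computed over predicted versus realised event times and ER is the fraction of misclassified event types, and the function names and sample values are hypothetical.

```python
import math


def rmse(predicted_times, actual_times):
    """Root mean squared error between predicted and realised event times."""
    if len(predicted_times) != len(actual_times) or not actual_times:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted_times, actual_times)]
    return math.sqrt(sum(squared) / len(squared))


def error_rate(predicted_types, actual_types):
    """Fraction of events whose predicted type differs from the realised type."""
    if len(predicted_types) != len(actual_types) or not actual_types:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for p, a in zip(predicted_types, actual_types) if p != a)
    return wrong / len(actual_types)


if __name__ == "__main__":
    # Hypothetical event times (arbitrary units) and event labels.
    print(rmse([1.0, 2.0, 4.0], [1.0, 2.0, 2.0]))
    print(error_rate(["bid", "ask", "cancel", "bid"], ["bid", "ask", "bid", "bid"]))
```

Lower values are better for both metrics, which is how the comparisons in Figures 5 and 6 should be read.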

Trading Performance
The trading performance of the high-frequency market making agents is reported in Table 3. Our simulation framework evaluates various models, including the Neural Hawkes model, the benchmark probabilistic estimate model, and our proposed Deep Hawkes model. According to the performance metrics specified in Table 3, our proposed agent using DHP (DHMM) consistently outperforms the PMM and NHMM, which suggests that the proposed trading agent benefits from learning the robust microstructure of order book data. Further, the DHP lets the agent capture the self- and cross-excitation effects of the limit order book together with a feedback loop, to place the right order at the right time. We discuss the performance of each agent in detail below. As shown in Figure 6, the DHMM is better than the NHMM at predicting the type of order and its time. This is an important element of the market making strategy. The novel DLSTM-SDAE architecture allows the agents to learn the hidden representation of the noisy order book data, and therefore to place orders that add to their profitability. Furthermore, the DHMM exhibits a faster convergence rate than the NHMM, as shown in Figure 7.
The baseline strategy used by the PMM maintains a probability density estimated on the basis of the fundamental price. The fundamental price evolves according to a jump process, following a normal distribution. This works in favour of the PMM, which makes more profit at the beginning, as shown in Figure 7. Over time, however, the DHMM and NHMM learn to place the right order with the right intensity. Afterwards, the profitability of the PMM falls dramatically. The PMM might perform better if it took a long position over several days, rather than trading intraday. It would be interesting to check the performance of the PMM agents with different probability density estimates conditioned on the joint distribution of microstructure features.

Order Cancellation Effect
Massive numbers of order cancellations in a short period are a distinctive attribute of the equity market. For example, at Nasdaq Nordic, order cancellations typically account for 40% of submitted limit orders on a particular trading day. Market making strategies using limit order cancellations contribute to the market maker's profit, bid-ask spread and order queue position (Dahlström et al., 2018). We study the distribution of profit, bid-ask spread and order queue position by removing the cancellation mechanism in the base simulation framework. We estimate the intrinsic value of the order relative to the queue position by applying the model developed by Moallemi and Yuan (2017). The agent's order queue position provides an estimate of the number of orders ahead of the agent's order at a particular price. A position at the front of the queue guarantees prompt execution, a higher fill rate, low latency and lower adverse selection cost. We estimate the queue position in the order book by reconstructing the limit order book from the simulated data feed. Let us suppose that the high-frequency market making agent places a limit order at time t = 0 at the best ask price p_a, which gets filled or cancelled at time τ. Filling the order pays the agent p_a, while a cancellation pays nothing. We describe the value of the order perceived by agents relative to the queue position as the fill probability multiplied by the difference between the liquidity spread premium and the adverse selection cost (Equation 7.1). To empirically calibrate the model (Equation 7.1), we take the same parameters used by Moallemi and Yuan (2017). These are exponential order size distribution, trade arrival rate (TAR), average trade size (ATS), trade size in the standard lot (TSS), cancellation arrival rate (CAR), average cancellation size (ACS), price jump arrival rate (PJR), average jump size (AJS), market impact (MI) and average queue size (AQS). The trade size is identified as the limit order or market order, contrary to aggressive market orders as described by Moallemi and Yuan (2017). Table 4 specifies the estimated parameters for simulated data with no cancellation mechanisms (Simulated NC), with cancellation mechanisms (Simulated WC) and an average (Simulated AV) over 21 days. The paper itself provides more detail regarding the parameters, calibration and model fitting (Moallemi and Yuan, 2017). Table 4 shows that an absence of cancellation mechanisms at the high-frequency market maker's end leads to a drastic increase in the average queue size. The decrease in the cancellation rate increases the queue size, which affects the high-frequency market maker's profitability, bid-ask spread and market impact. This claim is borne out by the value of the order as a function of the queue position, the bid-ask spread and the agent's profit, as illustrated in Figure 8. The wider bid-ask spread when agents are unable to cancel limit orders (Figure 8d) has a negative effect on profitability (Figure 8e) compared to scenarios with cancellations (Figures 8a and 8b). As stated in the model in Equation 7.1, the value of an order that is not filled is zero. Figure 8f shows that an increase in queue length decreases the probability of execution, and therefore the value. The value of the order becomes flat when the queue length is extremely large. Our results are consistent with the findings of Dahlström et al. (2018) on the determinants of order cancellations. It is difficult to infer causal relationships between cancellations and market microstructure variables based on artificially created scenarios, but this approach nonetheless paves the way for future investigation using order-level data.

Figure 8: Effect of limit order cancellations on the market. The top row (marked WC) represents the distribution when the market making agents can cancel limit orders. The bottom row (marked NC) represents a situation with no cancellations. BAD is the intraday bid-ask distribution, DHMMP is the profit distribution of DHMM over the trading day, and VOPQ is the value of the orders relative to queue position. The average queue length on a particular trading day is represented by a black triangle.
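The order valuation in Equation 7.1 reduces to a one-line computation once its three ingredients are known. The sketch below is a minimal illustration, not the calibrated model of Moallemi and Yuan (2017): the fill probability and the expected price at fill are passed in as plain numbers, whereas in the paper they depend on queue position and the calibrated parameters in Table 4. All argument names and numerical values are hypothetical.

```python
def order_value(fill_probability, ask_price, reference_price, expected_price_at_fill):
    """Value of a resting ask order following the structure of Equation 7.1.

    value = fill probability * (liquidity spread premium - adverse selection cost)

    fill_probability        -- P(FILL | F_t), assumed to be supplied by a queue model
    ask_price               -- p_a, the price at which the order rests
    reference_price         -- p_t, the price at the valuation time t
    expected_price_at_fill  -- E[p_tau | F_t, FILL], assumed to be supplied externally
    """
    if not 0.0 <= fill_probability <= 1.0:
        raise ValueError("fill_probability must lie in [0, 1]")
    liquidity_spread_premium = ask_price - reference_price
    adverse_selection_cost = expected_price_at_fill - reference_price
    return fill_probability * (liquidity_spread_premium - adverse_selection_cost)


if __name__ == "__main__":
    # Hypothetical numbers: an order near the front of the queue versus one far back.
    print(order_value(0.60, 100.05, 100.00, 100.02))
    print(order_value(0.05, 100.05, 100.00, 100.02))
```

With the premium and cost held fixed, the value scales with the fill probability, which is consistent with the text's observation that longer queues lower the execution probability and hence the order's value.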

Validation
The validation of the trading simulation framework is performed by measuring how successfully the simulation's output exhibits persistent empirical patterns in the order book data. Such empirical patterns, which are common across various markets, instruments and even timescales, are often classified as "stylised facts" (Cont, 2001). We present a nominal set of stylised facts, reproduced from the empirical analysis of simulated order book data, as shown in Figure 9. Let p(t) be the price of a security at time t. Given a timescale Δt, we define the log return at Δt as r(t, Δt) = ln p(t + Δt) − ln p(t). The cumulative distribution function (CDF) of returns is given as F_Δt(x) = P[r(t, Δt) ≤ x]. Its derivative gives the probability density function (PDF), F'_Δt = f_Δt, empirically estimated for normalised simulated returns, as illustrated in Figure 9a. The cumulative distribution of returns follows a power law F_Δt ∼ |r|^(−α) with 2 < α < 5. In Figure 9b, the positive tail F+_Δt(x) = P[r(t, Δt) ≥ x] and the negative tail F−_Δt(x) = P[r(t, Δt) ≤ x] of the cumulative distribution, shown as yellow circles and green squares, exhibit a power law, as denoted by the red line with α = 2.8. In Figure 9c, we show the absence of autocorrelation of price changes, defined as ρ(τ) = Corr(r(t, Δt), r(t + τ, Δt)). The autocorrelation function (ACF) decays rapidly to zero within a few lags.

asynchronous stochastic events characterised by marked multivariate Hawkes processes.
5. Extract the agent's trading algorithm parameters directly from the order book data, rather than from random seeds or empirical literature.
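The return statistics defined in the validation paragraph (log returns, the empirical CDF and its tails, and the autocorrelation of returns) can be computed from a price series with a few lines of code. The sketch below is a plain illustration of those definitions, not the authors' analysis code; function names and the toy inputs are assumptions.

```python
import math


def log_returns(prices, lag=1):
    """r(t, dt) = ln p(t + dt) - ln p(t), with dt expressed as an index lag."""
    return [math.log(prices[i + lag]) - math.log(prices[i])
            for i in range(len(prices) - lag)]


def empirical_cdf(returns, x):
    """Empirical estimate of F(x) = P[r <= x]."""
    return sum(1 for r in returns if r <= x) / len(returns)


def positive_tail(returns, x):
    """Empirical estimate of the positive tail P[r >= x]."""
    return sum(1 for r in returns if r >= x) / len(returns)


def autocorrelation(returns, tau):
    """Sample autocorrelation of returns at lag tau.

    Uses the common biased estimator: lagged covariance sum divided by
    the full-sample sum of squared deviations.
    """
    n = len(returns)
    mean = sum(returns) / n
    variance_sum = sum((r - mean) ** 2 for r in returns)
    if variance_sum == 0.0:
        raise ValueError("returns have zero variance")
    covariance_sum = sum((returns[i] - mean) * (returns[i + tau] - mean)
                         for i in range(n - tau))
    return covariance_sum / variance_sum


if __name__ == "__main__":
    toy_prices = [100.0, 100.5, 100.2, 100.9, 100.4, 100.6]  # hypothetical
    r = log_returns(toy_prices)
    print(r)
    print(empirical_cdf(r, 0.0), positive_tail(r, 0.0))
    print([autocorrelation(r, lag) for lag in range(3)])
```

A tail exponent α such as the value 2.8 quoted for Figure 9b would be obtained by fitting a line to the tail estimates on log-log axes, a step omitted here.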

Figure 5: Performance evaluation of high-frequency market making agents in predicting order book events and time at nanosecond resolution. Error bars denote the standard deviation over 10 experiments using different train-val-test samples.

Figure 6: Performance evaluation of high-frequency market making agents in predicting order book events and time at millisecond resolution. Error bars denote the standard deviation over 10 experiments using different train-val-test samples.

Figure 7: Trading agents' performance with DHMM, NHMM and PMM during training, during testing and on a random day.
Gould et al. (2013) an example of the reconstructed LOB for Apple securities traded on NASDAQ. The smallest increment by which the price of the security can move is called a tick. The highest price at time t for which there is an outstanding buy order is called the bid price (168.60), while the lowest sell price is called the ask price (168.50). The bid-ask spread (0.10) at time t is defined as the difference between the ask and bid prices. The mid price (150.05) at time t is the arithmetic average of the bid and the ask. For an in-depth review of definitions, mechanisms and nomenclature, please refer to Gould et al. (2013).

Table 1: Descriptive statistics of the order book data.

Table 3: Mean and standard deviation of the daily normalised PnL (PnL) and mean absolute positions (MAP) for different market makers.

Equation 7.1: value of order = fill probability × (liquidity spread premium − adverse selection cost), where FP_t = P(FILL | F_t) is the fill probability, LSP_t = p_a − p_t is the liquidity spread premium, and ASC_t = E[p_τ − p_t | F_t, FILL] is the adverse selection cost.

Table 4: Estimated parameters for simulated order book data.

Table 5: Sensitivity to the number of hidden units and Gaussian noise. The DLSTM-SDAE used in our model has 3 DAE layers and 3 LSTM layers. In performing sensitivity analysis, we fix the 3 LSTM layers and change only the DAE layer.