Learning about latent dynamic trading demand

We present an equilibrium model of dynamic trading, learning, and pricing by strategic investors with trading targets and price impact. Since trading targets are private, investors filter the child order flow dynamically over time to estimate the latent underlying parent trading demand imbalance and to forecast its impact on subsequent price-pressure dynamics. We prove existence of an equilibrium and solve for equilibrium trading strategies and prices as the solution to a system of coupled ODEs. Trading strategies are combinations of trading towards investor targets, liquidity provision for other investors’ demands, and speculation based on learning about latent underlying trading-demand imbalances.

parent demands, by splitting their trading into dynamic sequences of smaller orders, called child orders (see O'Hara [32]), to minimize their price impact. Since the parent demands driving child-order trading are private information, investors use information from arriving child orders to form inferences over time about the dynamically evolving fundamental state of the market. In particular, investors learn about imbalances in the underlying aggregate parent demands and the associated pressure on future market-clearing prices and incorporate this information in their current child orders. Given the widespread prevalence of optimized order-splitting of parent orders into flows of child orders, dynamic learning about aggregate parent demands is a critical part of market dynamics. 1 This paper is the first to provide an analytically tractable equilibrium model of dynamic learning, trading, and pricing with parent trading demands. We consider a continuous-time model with high-frequency trading at times t ∈ [0, 1] over short time-horizons with [0, 1] being a day or an hour. Trading occurs between price-sensitive optimizing traders with two different types of parent trading targets: One group has fixed individual targets, and the other group wants to track a stochastically evolving target over time. Since parent targets are initially not public, information about parent demand imbalances is partially revealed through market-clearing stock prices. Our analysis models the equilibrium dynamic learning process, stock holdings, and stock-price processes.
Our main results are:
• We construct and solve two different equilibrium models: A simpler price-friction equilibrium and a subgame perfect Nash financial-market equilibrium. In the price-friction equilibrium, price impact is due to an exogenous trading friction, but in the subgame perfect Nash equilibrium, price impact includes both exogenous frictions and an endogenous price impact due to market clearing with constrained market asset-holding capacity. We find that these two equilibria are numerically similar.
• Intraday price drifts due to price pressure change over the trading day and are path-dependent. This leads to time-varying incentives for investors to provide liquidity to the child orders of other investors.
• A practical application of our model is that we can compute total trading costs for investors given the effects of dynamic learning and optimal trading by other investors. We show these costs are quadratic in the rebalancers' trading targets.
• Trading in our model reflects a combination of liquidity provision and speculation but not predatory trading. We conjecture that the absence of predatory trading is because our model replaces the exogenous price-elastic residual supply used in both Brunnermeier and Pedersen [9] and Carlin, Lobo, and Viswanathan [10] with endogenous demands coming from rational profit-maximizing investors.
Our paper advances several strands of research on market microstructure. First, dynamic learning and trading have been extensively studied in the context of markets with strategic investors with long-lived asymmetric information as in Kyle [29]. However, equilibrium trading, learning, and pricing with optimal dynamic order-splitting by large uninformed investors are less understood. Thus, we model price pressure to equate supply and demand rather than adverse selection.
Second, Grossman and Miller [21] model pricing and liquidity provision with impatient traders who submit single orders equal to their parent demands and with symmetric payoff information. In contrast, we model liquidity provision with optimal order-splitting of parent demands into child order flows. Third, Choi, Larsen, and Seppi [12] construct an equilibrium with optimal dynamic trading and learning in a market with a strategic rebalancer with an end-of-day trading target and an informed investor who trades on private long-lived asset-payoff information. By filtering the order flow over time, the rebalancer learns about the underlying asset payoff, the informed investor learns about the rebalancer's trading target, and market makers learn about both when setting prices. That earlier paper provides a characterization result for equilibrium and gives numerical examples but does not have an existence proof or analytic solutions. In contrast, our model is solved analytically and gives the equilibrium in closed form. Fourth, Brunnermeier and Pedersen [9] and Carlin, Lobo, and Viswanathan [10] show how dynamic rebalancing by a large investor can lead to predatory trading. However, these papers abstract from the learning problem by assuming the parent trading needs are publicly observable. They also make an ad hoc assumption about the price sensitivity of a residual market-maker trading demand due to exogenous price-elastic noise traders. In contrast, our model assumes the underlying parent trading demands are private information, which leads to a learning problem. In addition, our prices are rationally set with no ad hoc residual demand. Fifth, a large body of research models optimal order-splitting strategies for a single strategic investor given an exogenous pricing rule with no learning about latent trading demands of other investors (see, e.g., Almgren and Chriss [3,4], Almgren [2], and Schied and Schöneborn [34]). 
In contrast, we solve for optimal trades, learning, and pricing jointly. van Kervel, Kwan, and Westerholm [25] solve for optimal trading strategies for two dynamic rebalancers with learning over time about each other's latent trading demands. This leads to predictions about the effect of aggregate parent demand on individual investor child orders, which are then verified empirically. However, they assume an ad hoc linear pricing rule, and there are no existence proofs or analytic solutions. In contrast, price pressure in our Nash model is partly endogenously determined in equilibrium, and we solve our model analytically. As in van Kervel, Kwan, and Westerholm [25], trading in our model is a combination of speculation on expected future price changes and trading-demand accommodation.
The mathematics of our model is tractable because we use a modeling approach from the asset-pricing literature for non-dividend paying stocks. The simplification involves finding equilibrium price drifts that clear the market without determining the levels of market-clearing prices as discounted future cash flows. Karatzas and Shreve [27,Chap. 4] use this approach in complete market settings, and Cuoco and He [14] consider an extension to incomplete markets. Atmaz and Basak [1] show that non-dividend paying stocks are relevant for asset pricing. However, the non-dividend paying stock approach is new in the mainstream microstructure literature. Gârleanu and Pedersen [20], Bouchard, Fukasawa, Herdegen, and Muhle-Karbe [7], and Noh and Weston [31] use the zero-dividend stock approach to model prices given exogenous transaction costs. We extend this approach to include learning and endogenous price impact.

Model
We model equilibrium trading, learning, and pricing in a market with a risky stock and a riskless bank account over a short time horizon [0, 1] (e.g., a trading day). For simplicity, the net supply of both the stock and bank account are set to zero. Since the time horizon is short, the risk-free interest rate on the bank account is set to zero. Stock differs from the bank account in two ways: First, investors have individual parent demands for the stock. Second, stock prices are stochastic over time. Stock valuation can be viewed as the sum of two components: One component is a fundamental valuation of future dividends absent price pressure from trading targets. The other component is incremental price pressure for markets to clear given parent trading demand imbalances. It is the price pressure component that is the focus of our analysis. Our analysis treats these two components as being orthogonal and, for simplicity, normalizes the dividend valuation component to zero. Thus, hereafter, when we refer to the "stock price", this is shorthand for the "price pressure valuation component of stock prices." Our prices are random due to random trading demand imbalances. In a more complicated model, a separate fundamental dividend valuation component could be added to our stock-price pressure valuation to get the full stock price.
Two different groups of investors trade in our equilibrium model.
(i) Price-sensitive rebalancers. Each rebalancer $i \in \{1, \ldots, M\}$ has a private parent trading target $\tilde a_i$, and the aggregate rebalancer target is $\tilde a_\Sigma := \sum_{i=1}^M \tilde a_i$ (2.1). Rebalancer $i$'s control is her stock holdings, which are denoted by $(\theta_{i,t})_{t\in[0,1]}$ for $i \in \{1, \ldots, M\}$. For simplicity, the initial endowed holdings of both the bank account and the stock are normalized to zero for all rebalancers. When $\tilde a_i$ is close to zero ($\tilde a_i \approx 0$), rebalancer $i$ is a "high-frequency" liquidity provider with inventory penalties. Because $\tilde a_i$ is private information for $i$, other traders $k \neq i$ do not know whether rebalancer $i$ has an active latent trading demand ($|\tilde a_i| \gg 0$) or is a liquidity provider ($\tilde a_i \approx 0$).
(ii) Price-sensitive trackers. Trackers $j \in \{M+1, \ldots, M+\bar M\}$ all track a dynamic target given by a common exogenous Brownian motion process $w_t := w_0 + w^\circ_t$ (2.2) over time $t \in [0,1]$, where the initial target is $w_0 \sim N(0, \sigma^2_{w_0})$ and $w^\circ_t$ is a standard Brownian motion that starts at zero and has zero drift and unit volatility. While trackers observe the same $w_t$ at time $t \in [0,1]$, rebalancers do not and instead filter $w_t$ over time $t \in [0,1]$. Tracker $j$'s control is her stock holdings, which are denoted by $(\theta_{j,t})_{t\in[0,1]}$ for $j \in \{M+1, \ldots, M+\bar M\}$. Their initial stock and money market holdings are also normalized to zero. We assume the random variables $(\tilde a_1, \ldots, \tilde a_M)$, $w_0$, and $(w^\circ_t)_{t\in[0,1]}$ are all independent.
van Kervel, Kwan, and Westerholm [25] show that interactions between multiple heterogeneous investors are an empirically important part of the trading process. Our model with $M \geq 1$ and $\bar M \geq 1$ lets us analyze such trading interactions. In the following, index $k \in \{1, \ldots, M+\bar M\}$ denotes any generic trader, index $i \in \{1, \ldots, M\}$ denotes a rebalancer, and index $j \in \{M+1, \ldots, M+\bar M\}$ denotes a tracker. This allows us to express the stock-market clearing condition as $\sum_{i=1}^{M} \theta_{i,t} + \sum_{j=M+1}^{M+\bar M} \theta_{j,t} = 0$ for $t \in [0,1]$ (2.3). Investor stock demands change over time due to stochastic shocks to the tracker target $w_t$ and due to randomness in imperfect learning about the rebalancer targets. As a result, the stock-price process that clears the market as in (2.3) changes randomly over time. Thus, stock-price randomness in our model, given that the fundamental dividend valuation is normalized to zero, comes from learning about traders' parent targets (which are initially private information of the individual rebalancers and the trackers) and from random changes over time in the trackers' target $w_t$. Investor information is represented as generic filtrations $\mathcal{F}_{i,t}$ and $\mathcal{F}_{j,t}$ for rebalancers and trackers. These filtrations are constructed explicitly in the equilibria considered below. In the price-friction equilibrium in Sect. 3, the filtrations are $\mathcal{F}_{i,t} := \sigma(\tilde a_i, (S_{i,u})_{u\in[0,t]})$ and $\mathcal{F}_{j,t} := \sigma((w_u, S_{j,u})_{u\in[0,t]})$ (2.4), where $S_{i,t}$ and $S_{j,t}$ denote perceived stock-price processes for a rebalancer $i$ and a tracker $j$.
In the Nash equilibrium in Sect. 4, more complicated filtrations are needed to derive traders' optimal off-equilibrium response functions.
Our model is a model of dynamic learning. As we shall see, trackers infer the aggregate target $\tilde a_\Sigma$ in (2.1) from the initial stock price, and so trackers have no need to filter the rebalancers' individual targets $(\tilde a_1, \ldots, \tilde a_M)$. The situation is different for each rebalancer $i \in \{1, \ldots, M\}$, who only observes her own target $\tilde a_i$ and past and current stock prices. When $\sigma_{w_0} > 0$, these observations are insufficient to infer $\tilde a_\Sigma$ and $w_t$ separately, so rebalancer $i$ filters based on $\tilde a_i$ and on past and current stock-price observations to learn about the underlying latent parent demands $\tilde a_\Sigma$ and $w_t$. In contrast, when $\sigma_{w_0} := 0$, the model only has static learning about $\tilde a_\Sigma$ at time $t = 0$ from the initial stock price. At later times $t \in (0,1]$, the rebalancers can then infer $w_t$ from their stock-price observations. The static learning model with $\sigma_{w_0} := 0$ was developed in Choi, Larsen, and Seppi [13].

Individual maximization problems
This section introduces the individual maximization problems. A generic trader $k$'s optimal stock holdings are determined by a trade-off between expected terminal wealth $X_{k,1}$ and a penalty for deviations of the trader's holdings $\theta_{k,t}$ over time from the parent target $\tilde a_i$ (rebalancers) or the Brownian motion $w_t$ (trackers). An investor's terminal wealth $X_{k,1}$ depends on the stock prices $S_{k,t}$ associated with $k$'s holdings $\theta_{k,t}$ over time. An exogenous continuous (deterministic) function $\kappa : [0,1] \to [0,\infty]$ models the severity of the target penalty over time. For example, more severe target penalties later in the day would be associated with a penalty severity function $\kappa$ that is increasing with time $t$. The rebalancer and tracker objectives are given in (2.5), where $\tilde a_i$ is the ideal holdings for rebalancer $i$ and $w_t$ is the ideal holdings for tracker $j$ at time $t \in [0,1]$. However, stock-market clearing prevents $\theta_{i,t}$ and $\theta_{j,t}$ from being $\tilde a_i$ and $w_t$. The suprema in (2.5) are taken over progressively measurable holding processes $\theta_{i,t}$ and $\theta_{j,t}$ with respect to traders' filtrations $\mathcal{F}_{i,t}$ and $\mathcal{F}_{j,t}$. As we shall see in Sects. 3 and 4 below, our traders optimally use controls given as smooth functions evaluated at a finite set of state processes (i.e., Markov controls). The next section constructs such a set of Markovian state processes. To rule out doubling strategies, we require the square-integrability condition (2.6). Terminal wealth $X_{k,1}$ in (2.5) is generated by trader $k$'s perceived wealth process (2.7), which is affected by $k$'s holdings $\theta_{k,t}$ both directly and also indirectly via the impact of $k$'s holdings on an associated perceived stock-price process $S_{k,t}$. Trader $k$'s holdings $\theta_{k,t}$ are price sensitive because market-clearing price pressure affects price drifts and, thus, investor wealth. In (2.7), the zero initial wealth $X_{k,0} = 0$ is because trader $k$'s initial endowed money market and stock holdings are normalized to zero.
Thus, $\tilde a_i$ and $w_t$ are ideal holding changes relative to investors' normalized initial zero holdings. Given the objectives in (2.5), trading reflects a combination of motives: Investors seek to have stock holdings close to their own targets $\tilde a_i$ and $w_t$, but they also seek to increase their expected terminal wealth by trading on price pressure from other investors trading on their targets. Thus, traders demand liquidity (to come close to their targets), supply liquidity for markets to clear (by being willing to deviate from their targets so that other traders can trade towards their targets, given the appropriate price incentives), and speculate on future predictable price pressure. Our remaining model construction involves specifying investor stock-price perceptions $S_{i,t}$ and $S_{j,t}$ and the associated investor filtrations $\mathcal{F}_{i,t}$ and $\mathcal{F}_{j,t}$. We then state conditions that these perceptions and filtrations must satisfy in equilibrium. Finally, we give theoretical results that ensure equilibria exist.
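For concreteness, one linear-quadratic specification consistent with the verbal description above can be sketched as follows. The quadratic form of the penalty, the conditioning on time-0 information, and the exact form of the integrability and wealth equations are assumptions made for illustration; they are a reading of the text, not a reproduction of the displayed equations (2.5)-(2.7).

```latex
\begin{align*}
  &\sup_{\theta_i}\ \mathbb{E}\Big[ X_{i,1} - \int_0^1 \kappa(t)\,(\tilde a_i - \theta_{i,t})^2\,dt \,\Big|\, \mathcal{F}_{i,0}\Big],
  \qquad
  \sup_{\theta_j}\ \mathbb{E}\Big[ X_{j,1} - \int_0^1 \kappa(t)\,(w_t - \theta_{j,t})^2\,dt \,\Big|\, \mathcal{F}_{j,0}\Big], \\
  &\mathbb{E}\Big[\int_0^1 \theta_{k,t}^2\,dt\Big] < \infty,
  \qquad
  dX_{k,t} = \theta_{k,t}\,dS_{k,t}, \quad X_{k,0} = 0 .
\end{align*}
```

Under this reading, the expected-wealth term rewards holding the stock when its perceived drift is positive, while the $\kappa(t)$ term penalizes holdings that stray from the target.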

State processes
The fundamental underlying state of the market in our model depends on the aggregate parent demand imbalances $\tilde a_\Sigma$ and $\bar M w_t$. As already noted, there is a significant informational difference between trackers and rebalancers. Each tracker directly observes $w_t$ in (2.2) and, as we shall see, can therefore infer the aggregate rebalancer target $\tilde a_\Sigma$ in (2.1) from the initial stock price. In contrast, rebalancers learn about $w_t$ and $\tilde a_\Sigma$ using dynamic filtering. Thus, the rebalancer filtrations $\mathcal{F}_{i,t}$, $i \in \{1, \ldots, M\}$, and tracker filtrations $\mathcal{F}_{j,t}$, $j \in \{M+1, \ldots, M+\bar M\}$, are not nested. Rebalancers know prices and their individual target $\tilde a_i$, whereas trackers know $\tilde a_\Sigma$, $w_t$, and prices.
Before considering specific stock-price perceptions in Sects. 3 and 4 below, we describe a set of conjectured state processes $(Y_t, \eta_t, q_{i,t}, w_{i,t})$ for rebalancer $i \in \{1, \ldots, M\}$. These processes are all endogenous in the equilibria we construct. However, it is convenient to describe the state processes' informational properties first, before showing how they arise in equilibrium. The processes $(Y_t, \eta_t)$ are public in that they are adapted to $\mathcal{F}_{k,t}$ for all traders $k \in \{1, \ldots, M+\bar M\}$. Furthermore, $\eta_t$ will be adapted to $\sigma(Y_u)_{u\in[0,t]}$. The state processes $(q_{i,t}, w_{i,t})$ are specific to individual rebalancers. They are adapted to $i$'s filtration $\mathcal{F}_{i,t}$, but they are not adapted to other traders' filtrations $\mathcal{F}_{k,t}$ for $k \neq i$.
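Rebalancers learn by extracting information about aggregate demand imbalances from stock prices. In the equilibria we construct, the information extracted from stock prices over time $t$ is a state process $Y_t$, which has the form $Y_t := w_t - B(t)\tilde a_\Sigma$ (2.8), where $B : [0,1] \to \mathbb{R}$ is a smooth deterministic function of time that is endogenously determined in equilibrium. The function $B(t)$ controls how $\tilde a_\Sigma$ and $w_t$ are mixed in stock prices. The process $Y_t$ is not directly observable for the rebalancers, but Lemma 3.1 below shows that $Y_t$ can be inferred from stock prices. Because rebalancer $i \in \{1, \ldots, M\}$ also knows her own target $\tilde a_i$, by knowing $Y_t$ over time $t \in [0,1]$, she equivalently knows $Y_{i,t} := Y_t + B(t)\tilde a_i = w_t - B(t)(\tilde a_\Sigma - \tilde a_i)$. Unlike $Y_t$ in (2.8), the process $Y_{i,t}$ is independent of rebalancer $i$'s private trading target $\tilde a_i$ and satisfies $dY_{i,t} = dw_t - B'(t)(\tilde a_\Sigma - \tilde a_i)\,dt$. Rebalancers use knowledge of $Y_t$ to estimate $\tilde a_\Sigma$ and $w_t$ from stock prices at time $t$. For a continuously differentiable function $B : [0,1] \to \mathbb{R}$, we define two processes: the conditional expectation $q_{i,t} := \mathbb{E}[\tilde a_\Sigma - \tilde a_i \,|\, \sigma(Y_{i,u})_{u\in[0,t]}]$ and the process $w_{i,t}$ with dynamics $dw_{i,t} := dw_t - B'(t)(\tilde a_\Sigma - \tilde a_i - q_{i,t})\,dt$ (2.11). Let the function $\Sigma(t)$ denote the remaining variance $\Sigma(t) := \mathbb{V}[\tilde a_\Sigma - \tilde a_i - q_{i,t}] = \mathbb{E}[(\tilde a_\Sigma - \tilde a_i - q_{i,t})^2]$, where the second equality follows from the zero-mean assumptions for $(\tilde a_1, \ldots, \tilde a_M)$ and $w_0$. Because the targets $(\tilde a_1, \ldots, \tilde a_M)$ are assumed independent and homogeneously distributed, $\Sigma(t)$ does not depend on the rebalancer index $i$. The following result is a special case of the Kalman-Bucy result from filtering theory (see Appendix B for details).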

Lemma 2.1 (Kalman-Bucy) For a continuously differentiable function $B : [0,1] \to \mathbb{R}$, the process $w_{i,t}$ is independent of $\tilde a_i$, is a Brownian motion, and satisfies (2.14) (modulo $\mathbb{P}$-null sets). Furthermore, the remaining variance $\Sigma(t)$ at time $t$ is given by (2.15).

Thus, $w_{i,t}$ is rebalancer $i$'s innovations process in the sense that (2.14) holds. However, while $w_{i,t}$ on the left in (2.11) is observable by rebalancers, the individual terms $w_t$ and $\tilde a_\Sigma$ in $w_{i,t}$'s decomposition on the right of (2.11) are not. The stock-market clearing condition (2.3) lets us relate prices to the state processes driving investor demands. The sum $\sum_{i=1}^M q_{i,t}$ is an important term in this relation, so the following decomposition results are useful.

Lemma 2.2 The decomposition (2.16) of $\sum_{i=1}^M q_{i,t}$ holds with the process $\eta_t$ being adapted to $\sigma(Y_u)_{u\in[0,t]}$ with $Y_t$ in (2.8) and with (2.17). The inverse relation (2.18) holds with deterministic functions $F_1(t)$ and $F_2(t)$ given by the ODEs (2.19).

There are two key points: First, no investor knows $\sum_{i=1}^M q_{i,t}$, but it can be decomposed into a public term $\eta_t$ and a term $A(t)\tilde a_\Sigma$ that trackers know but the rebalancers do not. Second, from (2.17), the process $\eta_t$ depends on the path of $Y_s$ over time $s \in [0,t]$. Thus, the state process $\eta_t$ reflects common path dependence due to $w_t$. The expression (2.18) shows that the individual rebalancer expectation $q_{i,t}$ includes a common learning component $\eta_t / M$ and then the effect of $i$'s private information $\tilde a_i$. In particular, it follows from (2.19) that $F_1(t)$ and $F_2(t)$ are both positive so that, consistent with intuition, the loading on $\tilde a_i$ is negative in (2.18).
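The variance reduction from filtering can be illustrated numerically. The sketch below assumes that rebalancer $i$ observes $Y_{i,0} = w_0 - B(0)a$ and then $dY_{i,t} = dw_t - B'(t)\,a\,dt$, where $a$ stands for the unknown $\tilde a_\Sigma - \tilde a_i$; under that assumption the standard Kalman-Bucy remaining variance solves $\Sigma'(t) = -B'(t)^2\Sigma(t)^2$. The function names, the choice $B'(t) = 1 + t$ in the usage note, and all parameter values are hypothetical; in the model $B$ is determined in equilibrium.

```python
def variance_after_initial_price(var_a, var_w0, b0):
    """Var(a | Y_0) for Y_0 = w_0 - b0 * a with independent zero-mean Gaussians."""
    if b0 == 0.0:
        return var_a  # Y_0 carries no information about a
    return var_a * var_w0 / (var_w0 + b0 * b0 * var_a)


def remaining_variance_closed_form(sigma0, b_prime, t, n=20_000):
    """Sigma(t) = 1 / (1/Sigma(0) + int_0^t B'(s)^2 ds), integral by midpoint rule."""
    if sigma0 == 0.0:
        return 0.0  # nothing left to learn
    h = t / n
    integral = sum(b_prime((k + 0.5) * h) ** 2 for k in range(n)) * h
    return 1.0 / (1.0 / sigma0 + integral)


def remaining_variance_euler(sigma0, b_prime, t, n=20_000):
    """Explicit Euler scheme for the Riccati ODE Sigma' = -(B')^2 * Sigma^2."""
    h = t / n
    sigma = sigma0
    for k in range(n):
        sigma -= h * (b_prime(k * h) ** 2) * sigma ** 2
    return sigma
```

For example, with `sigma0 = variance_after_initial_price(3.0, 1.0, -0.5)` and `b_prime = lambda t: 1.0 + t`, both routines give a remaining variance at `t = 1.0` well below `sigma0`. A constant $B$ (`b_prime = lambda t: 0.0`) leaves the variance unchanged, and `var_w0 = 0.0` with `b0 != 0` gives zero variance after the initial price, which corresponds to the static-learning case.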

Price-friction equilibrium
Investor perceptions of the impact of their trading on stock prices are a key part of the optimizations in (2.5) and the resulting market equilibrium. We consider two specifications of investor stock-price perceptions. This section presents a simplified model in which perceived price impact is a fully exogenous trading friction. This approach is analogous to the exogenous price impact used in van Kervel, Kwan, and Westerholm [25]. We then solve for the endogenous stock-price process that clears the market (and also satisfies some weak consistency conditions) and the associated optimized investor-holding processes. Sect. 4 presents a richer model of price impact in which investor stock-price perceptions are partially endogenized in a subgame perfect Nash financial-market equilibrium.
Our equilibrium construction is a conjecture-and-verify analysis. Section 3.1 conjectures functional forms for investor perceptions of stock-price dynamics. Section 3.2 defines equilibrium and then solves for equilibrium price-perception coefficients and the associated price dynamics and holdings that satisfy the definition of equilibrium.

Stock-price perceptions
Recall that price pressure is different from the value of future dividends. It is a valuation adjustment needed to clear the stock market given trading demand imbalances. This allows us to model price pressure as zero-dividend asset prices as in, e.g., Karatzas and Shreve [27,Chap. 4].
Rebalancers optimize (2.5) with respect to perceived stock-price processes $S^f_{i,t}$ of the form (3.1), where the coefficient functions $f$ are continuous deterministic functions of time and $(\alpha, \gamma)$ are constants. The "$f$" superscript indicates that the perceived price $S^f_{i,t}$ is defined with respect to a particular set of coefficient functions $f$ in (3.1). The stock-price drift in (3.1) is perceived by rebalancer $i$ to be affine in a set of state processes. Consistent with intuition, we will see that in equilibrium the loadings $f_0(t)$ and $f_3(t)$ on $Y_t$ and $\eta_t$ are negative. In particular, $Y_t$ with $B(t) < 0$ measures a mix of aggregate demand from rebalancers and trackers, and $\eta_t$ reflects public expectations of aggregate private rebalancer expectations about other rebalancers' parent-demand imbalances, both of which depress price change expectations. The other coefficients describe the perceived impact of rebalancer $i$ on the stock-price drift.
Theorem 3.5 below endogenously determines the coefficient functions $f$ in equilibrium. The exogenous parameters $(\alpha, \gamma)$ can be found by calibrating model output to empirical data. The term $\alpha\theta_{i,t}$ allows for ad hoc trading frictions. The price-friction parameter $\alpha$ is an exogenous model input. Price taking is a special case with $\alpha := 0$, whereas the empirically relevant case is $\alpha < 0$ such that buy (sell) orders decrease (increase) the future stock-price drifts.
The innovations in the rebalancers' perceived stock prices dw i,t come from new information rebalancer i learns over time about the underlying parent-demand state variable Y t , which has both a direct effect on the future stock-price drift and an additional indirect effect via its impact on η t since η t is adapted to σ (Y u ) u∈[0,t] from Lemma 2.2.
The zero-dividend stock valuation approach (see, e.g., Chapter 4 in Karatzas and Shreve, [27]) has several consequences: First, we model perceived and equilibrium stock-price drifts rather than price levels. Second, in (3.1), the stock's volatility and initial value are not determined in equilibrium but rather are model inputs. For simplicity, we set the volatility to be a constant γ > 0 (i.e., positive demand innovations dw i,t increase prices), and the initial price is set to be Y 0 in (3.1). However, other choices of S 0 would work equally well as long as S 0 satisfies σ (S 0 ) = σ (Y 0 ).
The next result shows that w i,t is rebalancer i's innovations process in the sense that w i,t is a Brownian motion relative to i's filtration defined with perceived stock prices S f i,t in (3.1) and such that S f i,t and w i,t generate the same information.
♦ Thus, given a path of perceived prices generated by a price process S f i,t of the form in (3.1) and her personal targetã i , rebalancer i can infer the path of w i,t . Furthermore, given the path w i,t , rebalancer i can infer Y i,t using (2.14) and, thus, can infer Y t from (2.10).
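The inference of the innovations path from a price path can be sketched in discrete time. The snippet assumes a perceived price of the form $dS_t = \mu(\text{state}_t)\,dt + \gamma\,dw_{i,t}$ on an Euler grid, where the state is updated recursively from the recovered innovations. The drift function, the state update, and all names are hypothetical placeholders supplied by the caller; the code illustrates the recursion only, not the specific drift in (3.1).

```python
def simulate_prices(increments, gamma, h, drift, update, state0, s0):
    """Build a price path from given innovation increments (Euler scheme)."""
    prices, state = [s0], state0
    for dw in increments:
        prices.append(prices[-1] + drift(state) * h + gamma * dw)
        state = update(state, dw, h)
    return prices


def recover_innovations(prices, gamma, h, drift, update, state0):
    """Invert the Euler scheme: back out each innovation increment from prices."""
    state = state0
    increments = []
    for s_now, s_next in zip(prices, prices[1:]):
        # Price change minus perceived drift, scaled by volatility.
        dw = (s_next - s_now - drift(state) * h) / gamma
        increments.append(dw)
        # The state is rebuilt from recovered innovations only.
        state = update(state, dw, h)
    return increments
```

Because the state is updated only from recovered increments, the recursion uses nothing beyond the observed prices, the known coefficients, and the initial state. This mirrors the statement that the price path and the innovations path carry the same information.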
Trackers optimize (2.5) with respect to a perceived stock-price process of the form (3.3), where $\bar f_3, \bar f_4, \bar f_5 : [0,1] \to \mathbb{R}$ are continuous (deterministic) functions and $\alpha$ is a constant. Trackers have different information in that they observe $w_t$ directly and can infer $\tilde a_\Sigma$ from the initial stock price $Y_0$ using (2.8) and their knowledge of $w_0$. Therefore, their perceived stock prices differ from those of the rebalancers. Theorem 3.5 below endogenously determines $(\bar f_3, \bar f_4, \bar f_5)$ in equilibrium, and $(\alpha, \gamma)$ are exogenous model inputs. Again, $\alpha := 0$ is the special case of price-taking. The motivation for these price perceptions for the trackers is as follows. First, the perceptions in (3.3) allow trackers to condition their perceived price drift to take into account price pressure from target imbalances $\tilde a_\Sigma$ and $w_t$ that depress expected price changes. Since trackers and rebalancers trade differently on their targets, the price-drift impacts $\bar f_4$ and $\bar f_5$ are in general different. Second, the trackers understand that the state process $Y_t$ affects the rebalancer demand and, thus, the stock-price drift. However, $Y_t$ does not need to be included explicitly in the tracker perceived price drift in (3.3) since $Y_t$ is given by a linear combination of $\tilde a_\Sigma$ and $w_t$, which are already included in the drift. Third, trackers know that rebalancers can infer $\eta_t$ and that this potentially affects rebalancer price perceptions in (3.1) and, thus, is likely to affect their trading and, in turn, pricing. Thus, trackers allow for the pricing effect of $\eta_t$ in their perceptions in (3.3). Fourth, as already noted, $\alpha$ allows for possible exogenous trading frictions, if any.
An important difference between rebalancer and tracker perceived prices in (3.1) and (3.3) is that rebalancer price dynamics are based on the informational innovations dw i,t , whereas tracker price dynamics are based on the tracker target changes dw t . Reconciling the price perceptions of rebalancers and trackers will impose restrictions on equilibrium price perceptions and holdings and will rely on the relation between dw i,t and dw t in (2.11).
Given the price perceptions in (3.1) and (3.3), we solve (2.5) for optimal rebalancer and tracker holdings. Lemma 3.2 gives the maximizers $\hat\theta_{i,t}$ and $\hat\theta_{j,t}$ in (3.4), provided these holding processes satisfy the integrability condition (2.6). The proof of Lemma 3.2 shows that pointwise quadratic maximization gives the maximizers for (2.5) for rebalancers and trackers for arbitrary $f$ and $\bar f$ functions. Stock-price perceptions play two interconnected roles in our model. First, rebalancers and trackers solve their optimization problems in (2.5) based on their perceptions in (3.1) and (3.3) for how hypothetical holdings $\theta_{i,t}$ and $\theta_{j,t}$ affect price dynamics. Second, investor stock-price perceptions affect how they learn from observed prices. In particular, Lemma 3.1 shows that rebalancers use their stock-price perceptions (3.1) to infer the aggregate demand state variable $Y_t$ based on past and current stock prices. In other words, dynamic learning by rebalancers depends critically on their stock-price perceptions. Similarly, trackers also use their stock-price perception of $Y_0$ in (3.3) to infer the aggregate parent demand $\tilde a_\Sigma$ from the initial price at time $t = 0$. However, thereafter, there is no additional learning from prices by the trackers at $t > 0$ since they directly observe their target $w_t$.
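The pointwise quadratic maximization can be illustrated with a small sketch. It assumes that, at each time $t$, a trader maximizes $\theta\,(m + \alpha\theta) - \kappa\,(\text{target} - \theta)^2$ over $\theta$, where $m$ collects the part of the perceived drift that does not depend on the trader's own holdings. This integrand is a reading of the linear-quadratic structure described in the text, and the function and parameter names are hypothetical.

```python
def pointwise_objective(theta, m, alpha, kappa, target):
    """Instantaneous payoff: holdings times perceived drift minus target penalty."""
    return theta * (m + alpha * theta) - kappa * (target - theta) ** 2


def pointwise_maximizer(m, alpha, kappa, target):
    """Maximizer of the concave quadratic in theta; requires kappa - alpha > 0."""
    if kappa - alpha <= 0.0:
        raise ValueError("objective is not strictly concave in theta")
    # First-order condition: m + 2*alpha*theta + 2*kappa*(target - theta) = 0.
    return (m + 2.0 * kappa * target) / (2.0 * (kappa - alpha))
```

With price taking (`alpha = 0.0`), the maximizer reduces to `target + m / (2 * kappa)`: the trader holds her target plus a tilt proportional to the perceived drift. A negative `alpha` shrinks the response to both the drift and the target.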

Equilibrium
This section defines our first of two equilibrium concepts and then derives price-perception coefficients for the conjectured functional form in Sect. 3.1 that satisfy the equilibrium definition along with the associated equilibrium price dynamics and holdings. The notion of equilibrium in our first construction is relatively simple, being based just on market clearing and consistency of investor price perceptions.

Definition 3.3 Deterministic functions of time
(ii) Inserting trader k's maximizerθ k,t into the perceived stock-price processes (3.1) and (3.3) produces identical stock-price processes across all traders k ∈ {1, ..., M +M}. This common equilibrium stock-price process is denoted byŜ t .
(iii) The money and stock markets clear. ♦

Definition 3.3 places only minimal restrictions on the perceived stock-price coefficient functions in (3.1) and (3.3): Markets must clear and result in consistent perceived stock-price processes when all investors use their equilibrium strategies. Section 4 below considers a subgame perfect Nash extension of our basic model that imposes more restrictions on allowable off-equilibrium stock-price perceptions such as off-equilibrium market clearing and various consistency requirements. Definition 3.3(ii) requires that in equilibrium rebalancers and trackers perceive identical stock-price dynamics when using their equilibrium holdings. However, rebalancers and trackers have different information (i.e., rebalancers form imperfect inferences about $w_t$ and $\tilde a_\Sigma$, whereas trackers observe $w_t$ directly and infer $\tilde a_\Sigma$ at time 0). The resolution of this apparent paradox is investors' different information sets: Trackers and rebalancers all agree on $d\hat S_t$, but they disagree on how to decompose $d\hat S_t$ into drift and volatility components. Because the trackers observe $w_t$, they can use $dw_t$ in their decomposition of $d\hat S_t$. However, $w_t$ is not adapted to the rebalancers' filtrations and therefore cannot be used in their $d\hat S_t$ decompositions. Instead, rebalancers use their innovations processes $dw_{i,t}$ when decomposing $d\hat S_t$ into drift and volatility. By (2.11), consistency of the two decompositions requires that the drift relation (3.6) holds for all rebalancers $i \in \{1, \ldots, M\}$ and all trackers $j \in \{M+1, \ldots, M+\bar M\}$. We note that the right-hand side of (3.6) does not depend on the rebalancer index $i$. Matching up coefficients in front of the state processes and $Y_t$ in (2.8) produces five equations. In addition, inserting $\hat\theta_{i,t}$ and $\hat\theta_{j,t}$ in (3.4) into the market-clearing condition (2.3) and using (2.16) produces three more equations from matching $(\tilde a_\Sigma, \eta_t, w_t)$ coefficients. All in all, we have eight equilibrium restrictions, which give the equilibrium coefficient functions (A.1) in Appendix A and the ODE for $B(t)$ in (3.7) below.
Our equilibrium existence result is based on the following technical lemma. It guarantees the existence of a solution to an autonomous system of coupled ODEs. In particular, given rebalancer stock-price perceptions of the form in (3.1) with an aggregate demand state variable Y t process of the form in (2.8) (and the associated η t process), we must construct a deterministic function B(t) that gives an equilibrium.
The solutions to the ODEs for A(t) and (t) in (3.7) agree with the expressions in (2.15) and (2.17). The exogenous price-friction coefficient α does not appear in the ODEs (3.7). It is possible to restate the ODE system (3.7) using a single path-dependent ODE . The special case B(0) := − 1 M produces a model with no dynamic learning because B (t) = 0 implies (t) = 0 and so dq i,t = 0. The following theorem gives the price-friction equilibrium in terms of the ODEs (3.7). In this theorem, the price-friction parameter α, volatility γ , and initial value B(0) ∈ R are free parameters. The intuition for B(0) being free is discussed after our equilibrium construction in Theorem 3.5. in Appendix A. (ii) The equilibrium in (i) has holdingsθ i,t for rebalancer i andθ j,t for tracker j given bŷ The equilibrium in (i) has the equilibrium stock-price processŜ t given byŜ 0 := w 0 − B(0)ã and dynamics with respect to the trackers' given by

and dynamics with respect to the rebalancers' filtrations ♦

Several observations follow from Theorem 3.5:
1. Lemma 3.1 ensures that rebalancer i can infer her innovations process w_{i,t} from perceived prices S^f_{i,t} and ã_i, but rebalancer i cannot infer the trackers' target w_t from the equilibrium prices Ŝ_t in (3.9). This is because the aggregate target ã also appears in the drift of dŜ_t and ã is not observed by individual rebalancers.
2. The equilibrium holdings (3.8) follow from inserting the equilibrium coefficient functions in (A.1) in Appendix A into (3.4). Thus, the holdings in (3.8) are expressed in terms of the investors' state processes, which, in particular, are adapted to the investors' filtrations. However, these state processes are not mutually independent, and so we also give representations of (3.8) in terms of independent variables in (A.2) and (A.3) in Appendix A. First, the price-friction equilibrium rebalancer holdings θ̂_{i,t} in (3.8) can be written in terms of the independent variables (ã_i, ã − ã_i, w_0) and a residual orthogonal term given as a stochastic integral with respect to w°_t of a deterministic function of time. Likewise, the price-friction equilibrium tracker holdings θ̂_{j,t} can be written in terms of the independent variables (ã, w_0) and a residual orthogonal term given as a stochastic integral with respect to w°_t of a deterministic function of time. Both these residual terms are Gaussian. Section 3.4 illustrates the loading coefficients on these independent state processes.
3. Because the exogenous price-friction coefficient α ≤ 0 does not appear in the ODEs (3.7), α is irrelevant for the equilibrium stock-price dynamics (3.9). However, α does affect the equilibrium holdings in (3.8).
4. The stock-price volatility γ affects the stock-price drift and holdings via its impact on B(t) in (3.7) and, thus, on (A.1).
5. It can seem paradoxical that trackers and rebalancers all perceive the same equilibrium stock-price process Ŝ_t, but they decompose its dynamics dŜ_t into different perceived drifts and martingale terms (i.e., they have different Itô decompositions). The resolution lies in the rebalancers and trackers having different filtrations:⁷ The drift and martingale terms in (3.10) are not adapted to F_{j,t}, and the drift and martingale terms in (3.9) are not adapted to F_{i,t}. The dynamics (3.9) and (3.10) all produce the same process Ŝ_t because the innovations process w_{i,t} in (2.11) links dw_t with dw_{i,t} and the drift term B′(t)(ã − ã_i − q_{i,t})dt.
6. Investors' off-equilibrium perceived stock-price drifts differ linearly from their equilibrium drifts due to the differences θ_{k,t} − θ̂_{k,t} between their off-equilibrium and equilibrium holdings.⁸ Rebalancer i's perceived stock-price drift in (3.1) can be decomposed for arbitrary holdings θ_{i,t} as (3.11), where we have used the formulas for Continuity between equilibrium and off-equilibrium is a reasonable property of investor stock-price perceptions. The representation of the perceived rebalancer drift in (3.11) relative to θ̂_{i,t} from (3.8) also explains the presence of the rebalancer-specific terms (ã_i, q_{i,t}) in the rebalancers' perceptions in (3.1).
7. Investors initially use block trades at time 0 to trade to positions θ_{i,0} and θ_{j,0} from (3.8) that are generically different from their initial normalized holdings of 0. Thereafter, investors trade continuously at times t > 0.
8. Theorem 3.5 verifies that price-perception coefficients in (3.1) and (3.3) can be constructed such that an equilibrium satisfying Definition 3.3 exists. However, as with many other rational expectations models, we do not have a proof of uniqueness. For example, there may be other public state variables in addition to η_t that could hypothetically be included in the perceived price drifts and that might be associated with other equilibria as defined in Definition 3.3.

⁷ We nickname this our "Rashomon Theorem" after the 1950 movie in which different characters perceive the same event differently given their different perspectives. Rebalancers and trackers both start with private information, so their filtrations are not nested. However, in equilibrium, stock-price dynamics depend on w_t and ã. Because the trackers know w_0 at time t = 0, they infer ã from S_{j,0} = w_0 − B(0)ã, and so they have no need to filter at later times. On the other hand, rebalancer i only has noisy dynamic predictions E[ã | F_{i,t}] = q_{i,t} + ã_i of the aggregate parent imbalance ã given her inferences based on her individual parent target ã_i and stock-price observations.
⁸ Eqs. (3.11) and (3.12) are similar to Eq. (3.14) in Choi, Larsen, and Seppi [13].
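The information asymmetry between trackers and rebalancers can be illustrated with a short numerical sketch. It uses only the two relations quoted in the text, the initial price w_0 − B(0)ã and the aggregate demand state Y_t = w_t − B(t)ã. The function names and numerical values are hypothetical, and the sketch ignores both the rebalancer's knowledge of her own target ã_i and her filtering over time.

```python
def infer_aggregate_target(price_0: float, w_0: float, b_0: float) -> float:
    """Invert S_0 = w_0 - B(0) * a for the aggregate target a (requires B(0) != 0)."""
    if b_0 == 0.0:
        raise ValueError("B(0) = 0: the initial price does not reveal the aggregate target")
    return (w_0 - price_0) / b_0


def demand_state(w_t: float, a: float, b_t: float) -> float:
    """Aggregate demand state Y_t = w_t - B(t) * a."""
    return w_t - b_t * a


# Illustrative values only (not taken from the paper's figures).
b_0 = -0.2
w_0, a = 0.5, 1.5
price_0 = w_0 - b_0 * a

# A tracker knows w_0, so the initial price pins down the aggregate target.
a_inferred = infer_aggregate_target(price_0, w_0, b_0)

# An observer of Y_0 alone cannot separate w_0 from a:
# a different (w_0, a) pair produces the same value of the demand state.
w_alt, a_alt = 0.7, 0.5
same_state = abs(demand_state(w_0, a, b_0) - demand_state(w_alt, a_alt, b_0)) < 1e-12
```

In this toy setting the tracker recovers the aggregate target exactly at time 0, whereas the two (w_0, ã) pairs are indistinguishable from the linear combination alone, which is why rebalancers must filter.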
The function B(t) from (3.7) is key both in constructing the equilibrium and for interpreting the equilibrium price and holding processes. First, there is the issue that the initial value B(0) is a free input in Theorem 3.5. The intuition is that our model determines equilibrium stock-price drifts but not price levels. As can be seen in Theorem 3.5(iii), B(0) controls the initial price level in our model. Second, the relation between B(t) and price levels allows us to impose additional structure on B(t). In particular, w_t and ã represent different types of demand imbalances. Thus, if B(t) < 0, then Y_t in (2.8) plays the role of an aggregate demand state variable. How the two component quantities w_t and ã are mixed in the aggregate demand state variable Y_t is different given the two components' different informational dynamics (i.e., ã is not time dependent while w_t changes randomly over time) and given their different impacts on investor demands (i.e., each rebalancer only knows her personal ã_i component of ã, and other rebalancers' targets do not affect rebalancer i's parent demand, whereas w_t both affects an individual tracker's parent demand and is also information about other trackers' parent demands). It seems reasonable that the sign of the impact of w_t and ã on the price level should be the same, which imposes the additional restriction that B(t) < 0. From Lemma 3.4, a sufficient condition for B(t) < 0 for all t ∈ [0, 1] is M̄B(0) + 1 < 0, which implies B′(t) < 0.⁹

With the economically reasonable parametric restriction that B′(t) < 0 and given that α ≤ 0 so that α − 2κ(t) < 0, we can sign the impact of various quantities in the model on holdings and prices, which leads to the following comparative statics:
1. In (3.8), the equilibrium holdings θ̂_{i,t} of rebalancers are positively related to their parent targets ã_i. This is intuitive because rebalancers want holdings close to ã_i. Rebalancer holdings θ̂_{i,t} are negatively related to the aggregate demand imbalance state variable Y_t. The fact that θ_{i,t} is decreasing in Y_t is consistent with the theoretical results and empirical evidence in van Kervel, Kwan, and Westerholm [25] that investors buy less when there is a positive parent-demand imbalance for other investors in the market. The same intuition applies to the negative impact of the common component η_t on θ̂_{i,t}. However, the impact of q_{i,t} on θ̂_{i,t} is positive. The intuition is that when rebalancer i expects the other rebalancers (given i's ability to filter using her private target information ã_i) to have a net positive parent-demand imbalance E[ã − ã_i | F_{i,t}] from (2.11), she buys at time t to speculate on the resulting anticipated positive drift in future price pressure in (3.10).
2. In (3.8), the equilibrium holdings θ̂_{j,t} of trackers are increasing in w_t (which reflects both a tracker's own parent demand and also information about the parent demands of other trackers). Tracker holdings θ̂_{j,t} are also decreasing in η_t, which is related to imbalances in rebalancers' aggregate parent demand expectations. The negative effect of η_t is consistent with the van Kervel, Kwan, and Westerholm [25] liquidity-provision result and empirical evidence. However, the impact of ã is ambiguous in (3.8), and numerical calculations in Sect. 3.4 show that the sign is positive. This is again consistent with speculation on future predicted price pressure due to the tracker's superior information about aggregated latent parent-demand imbalances.
3. The equilibrium stock-price drift in (3.9) is decreasing in the tracker parent demand w_t. However, the impact of ã on the price drift is again ambiguous, which is related to information about ã being useful in forecasting future price pressure.
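As a quick check of the sign restriction, the sketch below evaluates the sufficient condition attributed to Lemma 3.4, read here as M̄B(0) + 1 < 0 with M̄ the number of trackers, for a few candidate initial values B(0). The helper name is hypothetical, and this reading of the condition is an assumption about the garbled source. The values M̄ = 10 and B(0) = −0.2 are the ones used in the numerical section.

```python
def satisfies_sign_condition(b_0: float, num_trackers: int) -> bool:
    """Sufficient condition for B(t) < 0 on [0, 1], as read from the text:
    Mbar * B(0) + 1 < 0 (assumed reading of the condition in Lemma 3.4)."""
    return num_trackers * b_0 + 1.0 < 0.0


num_trackers = 10  # Mbar in the numerical section
checks = {b_0: satisfies_sign_condition(b_0, num_trackers) for b_0 in (-0.2, -0.1, 0.05)}
```

Under this reading, B(0) = −0.2 satisfies the strict inequality, while the boundary value B(0) = −1/M̄ = −0.1 and any positive initial value do not.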

Tractability and model structure
This section discusses the key modeling features that make our model tractable. First, we assume all traders seek to maximize their individual objectives in (2.4). Linear-quadratic objectives have been used extensively in the literature because of their tractability. Such objectives have been used in, e.g., Sannikov and Skrzypacz [33], Gârleanu and Pedersen [20], and Bouchard, Fukasawa, Herdegen, and Muhle-Karbe [7]. The linear-quadratic objectives (2.5) allow us to solve for the optimal holdings in Lemma 3.2 using quadratic pointwise optimization. In the price-friction equilibrium, we could equivalently use dynamic programming to produce the same optimal holdings.

Second, our stock does not pay dividends, which means that only the stock drift can be endogenously determined in equilibrium. Models with non-dividend paying stocks have been used extensively in the literature. The monograph Karatzas and Shreve [27] gives an overview.¹⁰ In particular, non-dividend paying stock models have been used for short-horizon models like ours where consumption only takes place at the terminal time.¹¹ The rebalancers' dynamic learning produces forward-running filtering equations, and by considering a non-dividend paying stock, we circumvent having additional backward-running equations. Equilibrium models with both forward- and backward-running equations include Kyle [29], Foster and Viswanathan [18,19], Back, Cao, and Willard [5], and Choi, Larsen, and Seppi [12].

¹⁰ Similar to a money market account, a non-dividend paying stock is a financial asset in the sense that holding one stock at time t = 1 gives one unit of consumption at t = 1. Likewise, being short one stock at t = 1 means the trader provides one unit of consumption at t = 1. Both the bank account and the non-dividend paying stock have exogenous initial prices and volatilities. It is customary for the money market account's initial price to be one and its volatility to be zero. For the non-dividend paying stock, we set the initial price to be Y_0, its volatility to be a positive constant γ > 0, and determine the drift endogenously.
¹¹ There are long-lived non-dividend paying stocks too. For example, Atmaz and Basak [1] write: "For example, Hartzmark and Solomon [24] find that over the long-sample of 1927-2011, the average proportion of no-dividend stocks is around 35% and accounts for 21.3% of the aggregate US stock market capitalization. Similarly, by taking into account of rising share repurchase programs since the mid-1980ies, Boudoukh et al. [8] report that over the 1984-2003 period, the average proportion of no-dividend stocks is 64% and no-payout stocks, i.e., no dividends or no share repurchases, is 51% with the relative market capitalizations of 16.4% and 14.2%, respectively."
Third, price impact is often modeled as the impact of investor holdings and orders on price levels (e.g., as in Almgren [2]) or as the impact of orders on price changes (e.g., Kyle [29]). However, for the sake of tractability, we follow Cuoco and Cvitanić [15] and model price impact in terms of the impact of investor holdings on the price drift. Price impact matters for the trading decisions of strategic investors because of its effect on future expected price changes (e.g., high holding demand raises prices, which lowers expected future price appreciation). Our price-friction specification simply assumes directly that investor holdings affect expected future price changes. Thus, while our price-impact specification is a simplification, it is a reasonable one that preserves the essential economics of price impact.
Fourth, instead of exogenous noise traders, we use optimizing trackers with a Brownian motion target w_t. Grossman and Stiglitz [22] and Kyle [29] are standard references with an exogenous Gaussian stock supply. Gaussian noise traders are also used in the predatory trading models in Brunnermeier and Pedersen [9] and Carlin, Lobo, and Viswanathan [10]. In our setting, we could eliminate trackers by setting M̄ := 0 and replace the stock-market clearing condition (2.3) by using w_t to model the exogenous stock supply as in (3.13). Including noise traders as in (3.13) in the model would be tractable in the price-friction equilibrium. However, surprisingly, exogenous noise traders complicate constructing a Nash equilibrium with dynamic learning, whereas, as we show in Sect. 4, optimizing trackers and market clearing in (2.3) produce a subgame perfect Nash financial-market equilibrium in closed form. The models in Sannikov and Skrzypacz [33] and Choi, Larsen, and Seppi [13] have optimizing trackers but no dynamic learning.

Numerics
Our price-friction equilibrium is straightforward to compute numerically. This is because equilibrium stock prices and holdings are available in closed form given the solutions to the associated coupled ODEs in (3.7). We illustrate our model for several different parameterizations. In these parameterizations, there are M := 5 rebalancers and M̄ := 10 trackers. The penalty function is constant over the trading day and set to κ(t) := 1. The rebalancer target volatility is normalized to σ_ã := 1, whereas we consider σ_{w_0} ∈ {1/10, 1} to illustrate the impact of dynamic learning. Recall that σ_{w_0} := 0 gives the model with only initial learning of ã as developed in Choi, Larsen, and Seppi [13]. To be consistent with our negative B(t) restriction, we consider an initial value B(0) := −0.2. We consider two stock-price volatility parameters γ ∈ {1/2, 1} and a zero price-friction parameter α := 0 (i.e., the competitive equilibrium). As noted above, α does not affect the endogenous price-drift coefficients, but α does affect investor holdings.
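The parameterizations above can be collected in a small configuration grid. The sketch below is one way to organize these inputs before solving the ODEs in (3.7). The dictionary keys and function name are hypothetical, and the ODE solver itself is not shown because the system (3.7) is not reproduced here.

```python
from itertools import product

# Baseline inputs quoted in the text; the key names are hypothetical.
BASE = {
    "M": 5,          # number of rebalancers
    "M_bar": 10,     # number of trackers
    "kappa": 1.0,    # constant penalty function kappa(t)
    "sigma_a": 1.0,  # normalized rebalancer target volatility
    "B0": -0.2,      # initial value B(0)
    "alpha": 0.0,    # price-friction parameter (competitive case)
}
SIGMA_W0 = (0.1, 1.0)  # initial tracker target volatility
GAMMA = (0.5, 1.0)     # stock-price volatility


def parameter_grid():
    """Return one parameter dictionary per (sigma_w0, gamma) combination."""
    return [dict(BASE, sigma_w0=s, gamma=g) for s, g in product(SIGMA_W0, GAMMA)]


cases = parameter_grid()
```

The grid contains four cases, one for each combination of the two σ_{w_0} values and the two γ values.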

Equilibrium holdings
First, we consider equilibrium holdings. Fig. 1 shows the coefficient functions for the equilibrium stock holdings θ̂_{k,t} in (3.8) for rebalancers and trackers using their orthogonal representations in (A.2) and (A.3) in Appendix A. Alternatively, we could plot coefficient loadings on the state processes (ã_i, q_{i,t}, η_t, Y_t) and (η_t, w_t, ã) in (3.8). We prefer to illustrate orthogonal loadings to avoid cancelation effects in the different state processes. Fig. 1E shows rebalancer i's loadings over time on her own parent target ã_i. As expected, these loadings are positive, but they are less than 1 because trading towards a positive target depresses equilibrium price drifts in order for markets to clear. The rebalancer loading on ã_i is over 0.9, which implies a large initial block trade at time t = 0. The negative coefficients on ã − ã_i (for rebalancer i) in Fig. 1A and on ã (for tracker j) in Fig. 1B reflect demand accommodation. In particular, rebalancers and trackers reduce their holdings when other rebalancers want to buy.

The loadings on w_0 in Fig. 1C and 1D are more subtle. When the initial tracker target w_0 has a high volatility (as in the red and amber trajectories), the tracker holdings load positively on w_0 over time in Fig. 1D, and the negative rebalancer loadings in Fig. 1C indicate demand accommodation by the rebalancers. However, when the initial tracker target has low volatility (as in the green and blue trajectories), the initial positive tracker loadings on w_0 eventually flip signs, as do the initial negative rebalancer loadings. At first glance, this is puzzling. The explanation is that, as noted above, the trackers and rebalancers have different stock-price drift perceptions in (3.9) and (3.10) given their different filtrations. In particular, there is dynamic learning over time by the rebalancers based on the information Y_t inferred from prices, whereas the trackers are fully informed about ã and w_t (trackers infer ã at time 0). Fig. 3C and 3D below illustrate that the rebalancers' and trackers' stock-drift perceptions are quite different in these two low-σ_{w_0} parameterizations.
In addition to the effects illustrated in Fig. 1, investor holdings are also affected by the realized path of w_t = w_0 + w°_t over time. This is because of fluctuations in the underlying tracker parent demand and also due to the effect of w°_t on dynamic learning by the rebalancers. Appendix A shows the exact specification of this term in the tracker holdings (given as a dw°_u integral of a deterministic function). Given the linearity of investor holdings and since the Brownian motion w°_t has zero expected increments, this random path effect disappears in ex ante expected investor holdings.
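The claim that the path term vanishes in expectation can be checked with a small Monte Carlo sketch. It simulates a stochastic integral of a deterministic function against Brownian increments on [0, 1] and reports the sample mean and variance. The integrand 1 − t is a hypothetical stand-in, since the actual deterministic function is specified in Appendix A, and the function names are illustrative.

```python
import random


def simulate_path_term(f, n_steps: int, rng: random.Random) -> float:
    """One draw of the left-endpoint Ito sum of f(t_k) times Brownian increments on [0, 1]."""
    h = 1.0 / n_steps
    sd = h ** 0.5  # standard deviation of a Brownian increment over a step of length h
    return sum(f(k * h) * rng.gauss(0.0, sd) for k in range(n_steps))


def sample_moments(f, n_paths: int = 5000, n_steps: int = 50, seed: int = 0):
    """Sample mean and (unbiased) sample variance of the simulated path term."""
    rng = random.Random(seed)
    draws = [simulate_path_term(f, n_steps, rng) for _ in range(n_paths)]
    mean = sum(draws) / n_paths
    var = sum((x - mean) ** 2 for x in draws) / (n_paths - 1)
    return mean, var


# Hypothetical deterministic loading; not the integrand from Appendix A.
mean, var = sample_moments(lambda t: 1.0 - t)
```

The sample mean should be close to zero, while the sample variance should be close to the integral of the squared integrand (1/3 for this choice), so the path term matters for realized holdings but not for ex ante expected holdings.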
To summarize, Fig. 1 shows there are three main drivers of investor holdings: First, investors' holdings in most cases are drawn partially towards their own targets ã_i and w_t. Second, investors provide partial accommodation to other investors' parent demands. Third, dynamic learning and speculation on the price drift affect demand accommodation.

Interestingly, there is no evidence in Fig. 1 of predatory trading. Predatory trading differs from demand accommodation in that a predator first trades in the same direction as another investor and then subsequently unwinds her position. In this context, the hump shapes of the blue trajectories (for low σ_{w_0}) in Fig. 1C and 1D do not indicate predatory trading: Because w_0 is the trackers' own target, the blue hump in Fig. 1D cannot reflect predatory trading. Furthermore, the blue hump-shaped trajectory in Fig. 1C also differs from predatory trading because the tracker and rebalancer loadings have opposite signs, as seen in Fig. 1D. This is due to market clearing. For example, when the rebalancers are buying given w_0 > 0, the trackers are actually selling. Instead of predatory trading, we shall see below that the blue trajectories are explained by price perceptions and dynamic learning.

Fig. 2 plots the instantaneous intraday unconditional trading autocorrelations for the price-friction equilibrium holding processes for both the rebalancer and tracker in (3.8). These autocorrelations are scaled by the time step h > 0 (the unscaled versions converge to zero as h ↓ 0). Thus, consistent with empirical evidence, trading is autocorrelated due to order splitting. Fig. 2 shows that rebalancers' orders are positively autocorrelated (2A) whereas trackers' orders exhibit negative autocorrelation (2B). Market clearing forces the intraday instantaneous unconditional cross correlation between rebalancers' and trackers' holdings to be perfectly negative.

Equilibrium prices
Next, we consider the price-friction equilibrium stock-price dynamics in (3.9) and (3.10).
For the trackers, we can rewrite the drift in (3.9) in terms of the independent random variables (ã, w_0) and a residual orthogonal term given as a stochastic integral with respect to w°_t of a deterministic function of time. For the rebalancers, we can rewrite the perceived drift in (3.10) in terms of the independent random variables (ã − ã_i, w_0, ã_i) and a residual orthogonal term given as a stochastic integral with respect to w°_t of a deterministic function of time. These formulas are given in (A.4) and (A.5) in Appendix A and are illustrated in Fig. 3.

Fig. 3 shows that positive parent demands ã_i, ã − ã_i, and ã all depress perceived stock-price drifts. The same is true for the tracker perceived stock-price drift loading on the initial tracker parent demand w_0. However, the relation between the rebalancer perceived drift and w_0 is more nuanced. When the initial tracker demand volatility σ_{w_0} is high (red and amber lines in Fig. 3C and 3D), rebalancers perceive that w_0 depresses the price drift. However, when σ_{w_0} is low, the dynamic learning process, given the inability of rebalancers to observe w_0 directly, causes the rebalancer perceived stock-price drift loading on w_0 to change sign. The blue and green lines in Fig. 3C and 3D illustrate that low values of σ_{w_0} make the trackers use their superior knowledge of w_0 to manipulate stock-price perceptions to create gains from trade that outweigh their penalties. More specifically, the blue and green lines in Fig. 1C and 1D show that rebalancers have large positive stock holdings and trackers have large negative holdings based on a positive realization w_0 > 0. Such large negative holdings imply that trackers incur large inventory penalties because they deviate from the target trajectory w_t = w_0 + w°_t. Trackers find this behavior optimal because their blue and green lines in Fig. 3D are negative (giving trackers large gains from trade), and rebalancers are willing to hold these large positive stock positions because their blue and green lines in Fig. 3C are positive (also giving rebalancers large gains from trade).

Fig. 4A plots the instantaneous intraday unconditional stock-price autocorrelation, which is again scaled relative to h in (3.16), for the equilibrium stock-price process Ŝ_t. Price pressure from persistent parent demands leads to rising intraday price autocorrelation over the trading day. Fig. 4B plots the time trajectory of the unconditional variance of intraday price drifts over the trading day based on the trackers' equilibrium perceptions in (3.9). Predictable price drifts are important in actual markets as incentives for intraday liquidity provision by HFT market makers (represented in our model by rebalancers with realizations ã_i ≈ 0). We see that price-drift variability due to price pressure increases over the trading day.

Fig. 5A shows that σ_{w_0} > 0 controls the starting point Σ(0) > 0 whereas γ > 0 controls the speed of learning (i.e., how negative the slope of Σ(t) is). For example, the green and red lines (γ = 1) illustrate a slower speed of learning relative to the amber and blue lines (γ = 0.1). These effects on Σ(t) come from Fig. 5B and the formula for Σ(t) in terms of B′(t) in (2.15). The red line in Fig. 5A also shows that the remaining variance Σ(1) at t = 1 can be substantial.

Equilibrium welfare
In this section, we study the impact of the exogenous model input B(0) ∈ R on equilibrium welfare. There are many ways to measure social welfare (see, e.g., Vayanos [35, Section 6]). We follow Du and Zhu [17, Eq. 42] and consider maximizing the expected aggregate certainty equivalent for the M + M̄ investors. The certainty equivalent CE_k ∈ R for investor k ∈ {1, ..., M + M̄} is defined by the expressions in (2.5). The aggregate expected welfare is given by (3.17), where the expectation in (3.17) is ex ante in the sense that it is taken over the random variables (ã_1, ..., ã_M) and w_0 (Gaussian and independent).

Fig. 6 shows that in the price-friction equilibrium with α := 0, expected welfare is maximized at B(0) = −1/M̄. This is not too surprising because B(0) = −1/M̄ implies full revelation, and no dynamic learning takes place for t > 0 (see the discussion after Lemma 3.4). Aggregate welfare is decreasing in the initial tracker parent standard deviation σ_{w_0} both because more demand accommodation is required and because the rebalancer learning problem is more difficult. This effect can be seen by comparing the blue and green (low initial standard deviation) and amber and red (high initial standard deviation) cases in Fig. 6.

Subgame perfect Nash equilibrium
This section builds on the analysis in Section 3 by endogenizing stock-price perceptions and price impact. In particular, we partially endogenize the impact of an investor's hypothetical off-equilibrium holdings on off-equilibrium market-clearing stock prices based on her perceptions of how other investors perceive prices and on other investors' resulting optimal response functions to her off-equilibrium holdings. More specifically, a subgame perfect Nash equilibrium involves describing how each trader k_0 (who might be a rebalancer i_0 or a tracker j_0 with their different filtrations) perceives all other traders' price perceptions.
The major difference between the price-friction equilibrium in Sect. 3 and our subgame perfect Nash equilibrium lies in the traders' stock-price perceptions. For a subgame perfect Nash equilibrium, investor stock-price perceptions must be such that:
(i) Trader k_0's own stock-price perceptions must be consistent with market clearing for any off-equilibrium holdings θ_{k_0,t} used by k_0, when other traders' holding responses are optimal given the stock-price dynamics k_0 perceives other traders k ≠ k_0 to have. This off-equilibrium market-clearing requirement can be found in, e.g., Vayanos [35].
(ii) Trader k_0's equilibrium holdings are found by solving her optimization problem using her own market-clearing stock-price dynamics from (i).
(iii) All optimizers from (i) must be consistent with traders' equilibrium holdings in (ii).
Definition 4.3 below makes properties (i)-(iii) operational. We refer to the last property (iii) as a consistency requirement between off- and on-equilibrium holdings.

Optimal off-equilibrium responses
In our subgame perfect Nash model, a generic trader k_0 perceives that other rebalancers and trackers have stock-price perceptions of the form (4.1), where S^Z_{j,0} := Z_0 for j ∈ {M + 1, ..., M + M̄}, W_{i,t} and W_{j,t} are Brownian motions, and Z_t is an arbitrary Itô process (i.e., Z_t is a sum of drift and volatility). The "Z" superscript in (4.1) indicates that the perceived stock prices S^Z_{i,t} and S^Z_{j,t} are defined with respect to Z_t. We use the market-clearing condition (2.3) to construct two such Itô processes in (4.5) and (4.8) below. These Z_t processes differ from Y_t in (3.1) and (3.3) in that we use Z_t to capture the effect of arbitrary off-equilibrium stock holdings by trader k_0 on market-clearing prices given optimal responses by other investors k ≠ k_0. We then go on to determine endogenously the deterministic functions (μ_1, μ_2, μ_3, μ̄_4, μ̄_5) in equilibrium in Theorem 4.5 below.
Lemma 4.1 gives traders' optimal response to an arbitrary Itô process Z t and is the Nash equilibrium analogue of Lemma 3.2.

Market-clearing stock-price perceptions
Investor k_0's perceptions about other investors' stock-price perceptions ensure that the stock market clears for any choice of k_0's holdings. Thus, when solving for trader k_0's individual equilibrium holdings, we require that k_0's perceived stock-price process (denoted by S^ν_{k_0,t} below) clears the stock market for arbitrary hypothetical holdings θ_{k_0,t}. We assume that a given trader k_0 ∈ {1, ..., M + M̄} perceives that other traders k ≠ k_0 perceive the stock-price processes in (4.1). Hence, trader k_0 perceives that other traders k ≠ k_0 optimally hold θ^Z_{k,t} in (4.2) shares of stock. Given this, we then find market-clearing Z_{k_0,t} processes associated with arbitrary hypothetical holdings θ_{k_0,t} for trader k_0.
First, consider a rebalancer i_0 ∈ {1, ..., M}. We construct a process Z_{i_0,t} such that the stock market clears in the sense of (4.3), where θ_{i_0,t} denotes an arbitrary stock-holdings process for rebalancer i_0 and other investors' responses θ^{Z_{i_0}}_{k,t} are from (4.2) for Z_t := Z_{i_0,t}. Clearly, any solution Z_{i_0,t} of (4.3) is specific to rebalancer i_0. To describe one particular solution Z_{i_0,t}, we insert (4.2) into (4.3). This produces an affine equation in (θ_{i_0,t}, Z_{i_0,t}, ã_{i_0}, q_{i_0,t}, η_t, w_t, ã). Because rebalancer i_0 can neither observe nor infer w_t and ã separately, she has to filter based on observing a linear combination of w_t and ã given by Y_t := w_t − B(t)ã, where B : [0, 1] → R is a continuously differentiable function satisfying (4.4), where A(t) is as in (2.17). The specific form of (4.4) comes from rewriting (4.3) in terms of (4.5). The process Z_{i_0,t} in (4.5) captures the impact of arbitrary holdings θ_{i_0,t} by rebalancer i_0 on market-clearing stock prices given i_0's perceptions of how other traders k ≠ i_0 optimally respond using θ^{Z_{i_0}}_{k,t} from (4.2) with Z_t := Z_{i_0,t}.

Next, we describe rebalancer i_0's stock-price perceptions for i_0 ∈ {1, ..., M}. Rebalancer i_0 filters based on her own target ã_{i_0} and on observations of past and current perceived market-clearing stock prices S^ν_{i_0,u} defined by (4.6), where (ã_{i_0}, θ_{i_0,t}) are known and (Z_{i_0,t}, q_{i_0,t}, η_t) are inferred by rebalancer i_0. The "ν" superscript in (4.6) indicates that the perceived stock prices are defined with respect to a particular set of deterministic functions (ν_0, ν_1, ν_2, ν_3), which we endogenously determine in Theorem 4.5 below. More specifically, by observing ã_{i_0} and (S^ν_{i_0,u})_{u∈[0,t]} defined in (4.6), rebalancer i_0 infers Y_t := w_t − B(t)ã from (2.8) using the Volterra argument behind Lemma 3.1.
To see this, we insert (4.5) into (4.6) to produce rebalancer i_0's perceived market-clearing stock-price dynamics (4.7).

Next, consider a tracker j_0 ∈ {M + 1, ..., M + M̄}. For arbitrary off-equilibrium holdings θ_{j_0,t}, the market-clearing solution Z_{j_0,t} from (4.8) is given by (4.9), where A(t) is as in (2.17). Once again, Z_{j_0,t} captures tracker j_0's perceptions of the impact of her holdings θ_{j_0,t} on market-clearing stock prices given j_0's perceptions of other investors' responses θ^{Z_{j_0}}_{k,t} to θ_{j_0,t}. Tracker j_0's perceived market-clearing stock-price process is defined as

dS^ν̄_{j_0,t} := (Z_{j_0,t} + ν̄_3(t)η_t + ν̄_4(t)ã + ν̄_5(t)w_t + αθ_{j_0,t}) dt + γ dw_t. (4.10)

Inserting (4.9) into (4.10) gives tracker j_0's perceived market-clearing stock-price dynamics (4.11). We note that tracker j_0's perceived market-clearing stock-price dynamics dS^ν̄_{j_0,t} in (4.11) are driven by the exogenous Brownian motion w_t from (2.2), whereas rebalancer i_0's stock prices dS^ν_{i_0,t} in (4.7) are driven by i_0's innovations process dw_{i_0,t} from (2.11). This is due to the different information sets of rebalancers and trackers.
Unlike the price-friction equilibrium in Theorem 3.5, we see from (4.7) and (4.11) that, even with no direct price impact in the sense α := 0 in (4.6) and (4.10), the remaining net price impacts −2ν_0(t)κ(t)/(M + M̄ − 1) and −2κ(t)/(M + M̄ − 1) of θ_{i,t} and θ_{j,t} are nonzero. This is because price pressure in (4.7) and (4.11) clears the stock market for arbitrary holdings θ_{i,t} and θ_{j,t}.
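To give a sense of magnitudes, the sketch below evaluates the net price-impact coefficient −2ν_0(t)κ(t)/(M + M̄ − 1) for the investor counts and penalty used in the numerical section. The function name is hypothetical. The coefficient ν_0(t) is endogenous and not computed here, so the default value of 1 is a placeholder that reproduces the tracker expression −2κ(t)/(M + M̄ − 1).

```python
def net_price_impact(kappa: float, n_rebalancers: int, n_trackers: int, nu_0: float = 1.0) -> float:
    """Net price-impact coefficient -2 * nu_0 * kappa / (M + Mbar - 1) when alpha = 0.

    nu_0 = 1.0 is a placeholder that gives the tracker expression; the
    rebalancer coefficient needs the endogenous nu_0(t), which is not computed here.
    """
    n_others = n_rebalancers + n_trackers - 1
    if n_others <= 0:
        raise ValueError("need at least two investors")
    return -2.0 * nu_0 * kappa / n_others


# Values from the numerical section: M = 5, Mbar = 10, kappa = 1.
tracker_impact = net_price_impact(kappa=1.0, n_rebalancers=5, n_trackers=10)

# A hypothetical market that is ten times larger, for comparison.
larger_market = net_price_impact(kappa=1.0, n_rebalancers=50, n_trackers=100)
```

With M = 5 and M̄ = 10, the tracker coefficient equals −2/14, and it shrinks in magnitude as the number of other investors who absorb the off-equilibrium holdings grows.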
The next result gives the optimal holdings θ*_{k,t} for all traders k_0 := k ∈ {1, ..., M + M̄} given their perceptions of market-clearing stock prices in (4.7) and (4.11). While both θ*_{k,t} and the optimal response holdings θ^Z_{k,t} in (4.2) maximize (2.5), they differ because they are based on different perceived stock-price processes. On the one hand, the optimal responses θ^Z_{k,t} in (4.2) are based on the stock-price perceptions in (4.1). On the other hand, the optimizer θ*_{k,t} is based on the market-clearing stock-price perceptions in (4.7) and (4.11).
t] as in Lemma 4.1. Because these are i_0's off-equilibrium perceptions, this is allowable as long as they are consistent with i's equilibrium holdings. We require this consistency in Definition 4.3(iii) below. We also note from Lemma 4.1 that rebalancer i can infer Z_{i_0,t} in (4.5). In turn, this allows rebalancer i, i ≠ i_0, to also know the process in (4.13). However, knowing (4.13) is insufficient for rebalancer i, i ≠ i_0, to infer rebalancer i_0's private target ã_{i_0}.
(ii) For k ∈ {1, ..., M + M̄}, inserting trader k's maximizer θ*_{k,t} into the perceived market-clearing stock-price processes (4.7) and (4.11) produces identical stock-price processes across all traders. This common equilibrium stock-price process is denoted by S*_t.
(iii) Optimizers and equilibrium holdings must be consistent in the sense that trader k's perceived response to trader k_0's maximizer θ*_{k_0,t} is trader k's maximizer θ*_{k,t}.
(iv) The money and stock markets clear. ♦

Equilibrium
The identical stock-price requirement in Definition 4.3(ii) is similar to the one in Definition 3.3(ii). We see from the rebalancers' perceptions (4.6) that both the drifts and the martingale terms have i dependence. Similar to (3.5), we replace dw i,t in d S ν i,t in (4.6) with the decomposition of dw i,t in terms of dw t in (2.11) and rewrite d S ν i,t in (4.6) as (4.14). Therefore, to ensure identical equilibrium stock-price perceptions for all traders k ∈ {1, ..., M + M̄}, it suffices to match the drift of d Sν j,t in (4.10) for j ∈ {M + 1, ..., M + M̄} with the drift of d S ν i,t in (4.14) for the optimal holdings θ i,t := θ * i,t for i ∈ {1, ..., M}. This produces the requirement (4.15) for all rebalancers i ∈ {1, ..., M} and all trackers j ∈ {M + 1, ..., M + M̄}. The right-hand side of (4.15) does not depend on the rebalancer index i. In (4.15), the process Z * i,t is (4.5) evaluated at θ i,t := θ * i,t , and Z * j,t is (4.9) evaluated at θ j,t := θ * j,t so that: , produces a Nash model with no dynamic learning because B (t) = 0 implies (t) = 0 and so dη t = dq i,t = 0. The resulting subgame perfect Nash equilibrium model only has learning at t = 0 and can be seen as a special case of Choi, Larsen, and Seppi [13].
Our main theoretical result gives a Nash equilibrium in terms of the ODEs (4.19). In this theorem, the price-friction parameter α ≤ 0, volatility γ > 0, and initial value B(0) ∈ R are free parameters.  and dynamics with respect to the rebalancers' filtrations F i,t := σ (ã i , S ν i,u ) u∈[0,t] given by
The following observations follow from Theorem 4.5:
1. The logic for the initial value B(0) being a free input parameter is the same as in the price-friction equilibrium.
2. The price-friction parameter α and stock-price volatility γ affect the stock-price drift and holdings via their impact on B(t) in (4.19). The dependence on α is different from the price-friction equilibrium, where the corresponding B(t) in (3.7) is independent of α. The reason is that α affects the perceived optimal responses in (4.2).
3. Similar to (3.11) and (3.12), for an arbitrary trader k 0 ∈ {1, ..., M + M̄} and her arbitrary holdings θ k 0 ,t , the optimal responses in (4.2) can be decomposed as where the equilibrium holdings (θ * i,t , θ * j,t , θ * k 0 ,t ) are in (4.20).
4. The subgame perfect Nash financial-market equilibrium is attractive because of its reasonable off-equilibrium market-clearing perceptions. However, although much of the mathematical structure is similar, the expressions for the equilibrium stock price and holding coefficients are algebraically more complex. Nonetheless, our numerical results in Sect. 3.4 below show that the differences between the price-friction and the subgame perfect Nash financial-market equilibria are quantitatively small. This, in turn, suggests that the economic logic from the price-friction equilibrium carries over to the Nash equilibrium.

Numerics
We have experimented extensively with the subgame perfect Nash model's numerics, and they are very similar to those of the price-friction equilibrium in Sect. 3. The numerical similarity of the two equilibria suggests that the intuitions for the signs of the various coefficients in the price-friction equilibrium carry over to the subgame perfect Nash financial-market equilibrium. Because the two equilibria produce similar numerics, it appears that the in-equilibrium market-clearing requirement (common to both equilibria) has a much larger effect on equilibrium prices than the off-equilibrium market-clearing requirement (only present in the subgame perfect Nash equilibrium).

Empirical predictions
The primary contribution of our analysis is theoretical. The Kyle model has provided a tractable framework for a large body of theoretical research on price discovery and dynamic order splitting given long-lived asymmetric information about stock cash flows. However, no corresponding tractable framework exists for modeling price discovery and dynamic order splitting with private trading targets (e.g., by large index funds). Our model provides such a framework. While our zero-dividend modeling approach precludes statements about the impact of orders on price levels, our analysis does have empirical implications for intraday price drifts.
First, intraday price predictability is an important empirical driver of high-frequency liquidity provision. Our model's equilibrium price dynamics in (3.9) and (4.21) suggest that intraday price drifts are path dependent (via the η t term) and also that learning about parent-demand imbalances early in the trading day is associated with predictable price drifts later in the day.
Second, our analysis provides insights about the determinants of price impact as it relates to imbalance-related parent trading demands and toxic cumulative order flow. In particular, the holdings θ k,t are cumulative trading up through time t, and large parent targets ã i lead to toxic streams of orders. Our subgame perfect Nash model endogenizes the price-drift impact of investor holdings (i.e., cumulative trading). The Nash model's price-friction coefficient in the rebalancer's perceived stock-price dynamics (4.7) is given by (5.1), where we have inserted ν 0 (t) from (A.6). An implication of (5.1) is that if, as is widely believed, investor target penalties become stronger as time passes (i.e., if κ(t) increases with time), then our Nash model predicts that the total price impact in (5.1) should increase. On its face, this is contrary to evidence in Barardehi and Bernhardt [6] that price impact declines over the trading day. We conjecture, however, that a richer model can be reconciled with these stylized facts if the number of investors (and, thus, the available inventory-bearing capacity to absorb aggregate parent demand imbalances) is also allowed to grow as the market approaches the end of the trading day. Increased investor participation toward the end of the trading day is also empirically common.

Measuring execution costs
As an application, this section gives a measure of a rebalancer's costs of rebalancing from zero endowed shares at time t = 0 to a given target ã i . We present the measure in the price-friction equilibrium in Sect. 3 (the Nash analogue is logically similar and produces similar numerics). In the price-friction equilibrium, rebalancer i's value function is is a martingale with respect to F i,t . Because rebalancer i's objective in (2.5) is linear-quadratic, the value function J is again linear-quadratic in the state processes and can be written accordingly. The measure in (6.4) captures the change in profit (i.e., the change in value function) associated with a non-zero target ã i . The rebalancing cost RC in (6.4) for a target ã i is computed as the difference between the value function evaluated at ã i and the value function evaluated at ã i = 0. Since the value function J is highest at ã i = 0, the measure RC is positive. Figure 7 plots the rebalancer's value function J for different target values ã i for different model parameterizations. When the target ã i is close to zero, the rebalancers become high-frequency liquidity providers. Their value function is positive due to expected profit from liquidity provision and price-pressure speculation. As the target moves away from zero, the rebalancer starts to have larger stock-holding penalties that eventually drive the rebalancer's value function negative. Interestingly, the impact of the stock-price volatility parameter γ on the rebalancer's value function can be positive or negative. Liquidity-providing rebalancers are better off with a small γ, whereas rebalancers with large rebalancing targets are better off when γ is large.

Conclusion
This paper presents the first analytically tractable model of dynamic learning about parent trading-demand imbalances with optimized order-splitting. In particular, we provide closed-form expressions for prices and stock holdings in terms of solutions to systems of coupled ODEs in both the price-friction and subgame perfect Nash equilibria. Trading in our models reflects a combination of reaching investors' own trading targets, liquidity provision so that markets can clear, and speculation based on predictions of future price pressure.
There are many interesting directions for future research based on our analysis. First, replacing the zero-dividend stock approach with valuation based on a terminal payoff would be a significant technical step. Second, the model could be enriched by allowing for investor heterogeneity in the form of different penalty functions κ(t) and by having multiple tracker targets (which would weaken the trackers' informational advantage). Third, it would be interesting to investigate if other off-equilibrium refinements have larger equilibrium effects. Fourth, incorporating risk-aversion into the investors' objectives would be interesting too.

A.2.2 Price-friction equilibrium stock dynamics
For the trackers, we can rewrite the drift in (3.9) in terms of (ã Σ , w 0 ) and a residual orthogonal term as For the rebalancers, we can rewrite the drift in (3.10) in terms of (ã Σ − ã i , w 0 , ã i ) and a residual orthogonal term as To explicitly solve for ∑_{i=1}^M q i,t , we note We get the solution ∑_{i=1}^M q i,t by integrating For the second part, we write the solution to the Ornstein-Uhlenbeck SDE for dη t in (2.17) as where the deterministic functions F 1 (t) and F 2 (t) are given by the ODEs in (2.19). Similarly, the Ornstein-Uhlenbeck SDE for dq i,t in (B.2) has solution By comparing (C.6) and (C.7), we get (2.18). ♦
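The way deterministic ODEs characterize the solution of a linear SDE of Ornstein-Uhlenbeck type can be sketched numerically. The example below is illustrative only: it assumes a generic SDE dX t = −f(t)X t dt + g(t)dw t with hypothetical coefficient functions f and g, which are not the coefficients of (2.17) or (B.2). For such an SDE, the mean m(t) and variance v(t) satisfy m′ = −f m and v′ = −2f v + g², and the sketch integrates these two ODEs with an explicit Euler scheme.

```python
import math
from typing import Callable, Tuple


def ou_moments(
    f: Callable[[float], float],
    g: Callable[[float], float],
    x0: float,
    t_end: float = 1.0,
    n_steps: int = 10_000,
) -> Tuple[float, float]:
    """Mean and variance at t_end of dX = -f(t) X dt + g(t) dw with X_0 = x0.

    Uses explicit Euler steps on the moment ODEs
        m'(t) = -f(t) m(t),          m(0) = x0,
        v'(t) = -2 f(t) v(t) + g(t)^2, v(0) = 0.
    The coefficients f and g are hypothetical placeholders.
    """
    dt = t_end / n_steps
    m, v = x0, 0.0
    for k in range(n_steps):
        t = k * dt
        # Both updates use the values from the start of the step.
        m, v = m + dt * (-f(t) * m), v + dt * (-2.0 * f(t) * v + g(t) ** 2)
    return m, v


if __name__ == "__main__":
    # Constant coefficients admit closed forms that serve as a sanity check.
    rate, vol = 2.0, 0.5
    m, v = ou_moments(lambda t: rate, lambda t: vol, x0=1.0)
    print(m, math.exp(-rate))
    print(v, vol**2 / (2.0 * rate) * (1.0 - math.exp(-2.0 * rate)))
```

With constant coefficients, the numerical moments can be compared with the closed forms e^{−f t} x 0 and g²(1 − e^{−2f t})/(2f); for time-dependent coefficients, only the ODE route is generally available.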

Proof of Lemma 3.1
The inclusion "⊇" in (3.2) follows from (2.4), (2.10), and (2.14). To see the inclusion "⊆", we use Y t in (2.8), η t in (C.5), and q i,t in (B.2) to find deterministic functions h 0 , h, and H such that (C.8) We define The inclusion "⊆" in (3.2) will follow from the inclusion (C.13) The equality in (C.13) follows from the square integrability condition (2.6), which ensures that the stochastic integral ∫_0^s θ i,t dw i,t is a martingale with zero expectation. We can maximize the integrand in (C.13) pointwise because the second-order condition α < κ(t) holds. This gives the first formula in (3.4).
The second formula for a tracker j in (3.4) is proved similarly. ♦
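The pointwise maximization step can be illustrated with a stylized calculation. This is a sketch under assumptions, not the exact integrand from the proof: $\mu_t$ is a hypothetical placeholder for all drift terms that do not depend on the holdings $\theta$, $a_t$ stands for the trader's target, and the integrand is assumed to consist of the perceived drift gain $\theta(\mu_t + \alpha\theta)$ minus a quadratic target penalty $\kappa(t)(a_t - \theta)^2$.

```latex
\begin{align*}
f(\theta) &:= \theta\big(\mu_t + \alpha\,\theta\big) - \kappa(t)\big(a_t - \theta\big)^2,\\
f'(\theta) &= \mu_t + 2\alpha\,\theta + 2\kappa(t)\big(a_t - \theta\big) = 0
\quad\Longrightarrow\quad
\theta^{\star} = \frac{\mu_t + 2\kappa(t)\,a_t}{2\big(\kappa(t) - \alpha\big)},\\
f''(\theta) &= 2\big(\alpha - \kappa(t)\big) < 0
\quad\Longleftrightarrow\quad \alpha < \kappa(t).
\end{align*}
```

Under these assumptions, the first-order condition has a unique solution and the condition α < κ(t) makes the integrand strictly concave in θ, so the stationary point is the pointwise maximizer.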
To this end, we set K := ∫_0^1 κ(s) ds < ∞. (C.14) First, the Riccati ODE for Σ(t) has the explicit solution in (2.15), which cannot explode as t ↑ τ (even if B(t) should explode as t ↑ τ ).
Second, the initial value A(0) in (3.7) ensures A(0) ≥ −1. To see that this implies A(t) ≥ −1 for all t ∈ [0, τ ), we note This shows that A(t) cannot explode as t ↑ τ (even if B(t) should explode as t ↑ τ ). Third, we show that B(t) is uniformly bounded for t ∈ [0, τ ); hence, B(t) also cannot explode as t ↑ τ . This then gives the desired contradiction because of Theorem II.3.1 in Hartman [23]. The affine ODE for B(t) in (3.7) has the explicit solution where K is as in (C.14). Similar to (C.17), the explicit solution of (C.23) is where c 0 is defined in (C.25). Because κ(t) is continuous on t ∈ [0, 1], κ(t) is bounded, and from (C.19) we know that B(t) is bounded too. Therefore, from (C.29), we see that B′(t) is also uniformly bounded. Consequently, the variances V[q i,t ], V[η t ], and V[Y t ] are also uniformly bounded functions of t ∈ [0, 1]. As before, the coefficient functions for (ã i , q i,t , η t , Y t , w t , ã Σ ) in (4.20) are all uniformly bounded for t ∈ [0, 1]. Therefore, the square-integrability condition (2.6) holds.
The requirements in Definition 4.3 follow from the definition of the functions in (A.6). ♦
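The role of explicit solutions of affine ODEs in the boundedness arguments above can be summarized with the variation-of-constants formula. The coefficients $a(t)$ and $b(t)$ below are generic integrable placeholders (assumptions for illustration), not the specific coefficients of the ODEs for B(t).

```latex
\begin{align*}
B'(t) &= a(t)\,B(t) + b(t), \qquad B(0) = B_0,\\
B(t) &= e^{\int_0^t a(u)\,du}\Big(B_0 + \int_0^t e^{-\int_0^s a(u)\,du}\, b(s)\,ds\Big),\\
|B(t)| &\le e^{\int_0^1 |a(u)|\,du}\Big(|B_0| + \int_0^1 |b(s)|\,ds\Big),
\qquad t \in [0,1].
\end{align*}
```

In particular, if $|a(t)|$ and $|b(t)|$ are bounded by constant multiples of $\kappa(t)$, then the right-hand side is controlled by $K = \int_0^1 \kappa(s)\,ds$, which gives a bound that is uniform in $t$.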