Optimal pairs trading with dynamic mean-variance objective

Pairs trading is a typical example of a convergence trading strategy. Investors simultaneously buy relatively under-priced assets and sell relatively over-priced assets to exploit temporary mispricing. This study examines optimal pairs trading strategies under symmetric and non-symmetric trading constraints. Under the assumption that the price spread of a pair of correlated securities follows a mean-reverting Ornstein-Uhlenbeck (OU) process, analytical trading strategies are obtained under a mean-variance (MV) framework. Model estimation and empirical studies on trading strategies are conducted using data on pairs of stocks and futures traded on China’s securities market. The results indicate that the pairs trading strategies perform fairly well.


Introduction
Statistical arbitrage trading strategies have been widely used in financial markets. The implementation of statistical arbitrage trading strategies may restrain excessive speculation and enhance market liquidity. A convergence trade is a statistical arbitrage trade that exploits the mispricing of two assets with similar trends in future payoffs. As reported by Liu and Timmermann (2013), convergence trades include merger arbitrage (risk arbitrage), pairs trading (relative value trades), on-the-run/off-the-run bond trades, tranched structured securities, and arbitrage between the same stocks trading in different markets. Pairs trading was pioneered by Gerry Bamberger, and further developed by Nunzio Tartaglia's quantitative group at Morgan Stanley in the 1980s (Gatev et al. 2006). The core idea of pairs trading is to sell the overpriced security and buy the underpriced security when the price spread widens, and to clear the trading position when the price spread converges. Huck (2010) proposed a general and flexible framework for the selection of pairs and a multi-step-ahead forecast method. We refer the reader to Whistler (2004) and Reverre (2001) for more details about pairs trading.
Studies on pairs trading primarily focus on three major approaches, namely, the distance approach, the stochastic spread approach and the cointegration approach. The distance approach is a trading strategy that attempts to make a profit when the sum of squared differences between two stock prices triggers a prescribed threshold (Nath 2003). Despite its straightforward structure, the distance method lacks forecasting ability with respect to the convergence time and the expected holding period (Do et al. 2006). The stochastic spread approach (Elliott et al. 2005) describes the temporary divergence in the prices of two correlated securities. The divergence in prices may be attributed to liquidity shortages, and the prices are expected to converge to an equilibrium level in the future. Song and Zhang (2013) explored optimal stopping problems by maximizing the overall return under the mean-reverting assumption. Sperling and Siu (2018) further considered regime-switching by extending the model reported by Göncü and Akyildirim (2016). The cointegration approach is based on the premise that a pair of asset price series is cointegrated. Vidyamurthy (2004) and Gatev et al. (2006) pioneered the cointegration approach in pairs trading research. This approach was further developed by Lin et al. (2006) using optimal loss protection. Explicit optimal portfolio trading strategies were derived under the MV and expected utility objective functions (Liu and Timmermann 2013; Chiu and Wong 2013, 2015). Due to its tractability and flexibility, we consider the cointegration approach in this study. Markowitz (1952) pioneered the MV paradigm for portfolio selection in a single-period modelling framework. The MV criterion has been further investigated in the discrete-time multiperiod setting (Li and Ng 2000), continuous-time with bankruptcy prohibition (Bielecki et al. 2005), and mean-risk formulation (Cui et al. 2017).
The expected utility framework has also been studied widely in the context of the portfolio selection problem since the pioneering works of Merton (1969, 1971). These two frameworks represent different investment preferences of various market participants, and have attracted considerable attention in the finance literature. Mudchanatongsuk et al. (2008) and Tourin and Yan (2013) explored optimal pairs trading strategies with the expected utility of the terminal wealth. Inspired by these two works, we study the optimal pairs trading strategies of MV-preference investors. Wang and Zhou (2020) identified two main reasons for the popularity of the MV criterion. First, the MV criterion is intuitively appealing from a practical perspective. In addition, it is transparent in terms of capturing the tradeoff between risk and return, which is one of the main concerns of traders and investors. Second, the MV criterion leads to the theoretically intriguing issue of Bellman's inconsistency inherent to the underlying stochastic control problems. It may be noted that in some cases, the MV criterion leads to a simple solution to the portfolio selection problem with a practically meaningful interpretation, though the challenging issue of Bellman's inconsistency needs to be resolved before the simple solution can be obtained. As noted by, for example, Bielecki et al. (2005), the basic concept of the MV model is a foundation of neo-classical finance theory, including the mutual fund theorem and the capital asset pricing model.
In the MV framework, the failure of the iterated-expectations property makes the traditional dynamic programming approach inapplicable. This renders optimality conceptually unclear (Björk and Murgoci 2010). One remedy is the pre-commitment strategy, which seeks a control that maximises the objective at a fixed starting time while disregarding the fact that a decision maker or investor may have an incentive to deviate from the initial policy at a later time (Dang and Forsyth 2016; Kryger and Steffensen 2010). However, this strategy is not time-consistent. Specifically, when the same problem is solved at a later time, the resulting optimal control will be different from that obtained at the starting time. To address this time-inconsistency, Basak and Chabakauri (2010) adopted a game theoretic approach to solve a continuous-time MV problem for an investor who updates her nonlinear MV objective by taking future updates into account in a time-consistent manner, and derived an equilibrium control policy. For more details about time-consistent equilibrium controls, we refer the reader to Strotz (1955), Krusell and Smith (2003), Björk et al. (2014) and Huang and Nguyenhuu (2018). In this study, we consider time-consistent trading strategies for the pairs trading problems.
Building on existing works such as Mudchanatongsuk et al. (2008), Basak and Chabakauri (2010), Tourin and Yan (2013), and Gu et al. (2020), an optimal trading strategy is formulated as a dynamic MV portfolio selection problem. The price spread of two correlated securities is modelled by an OU process, which captures the mean-reverting property of the price spread. In Mudchanatongsuk et al. (2008) and Tourin and Yan (2013), the expected utility maximisation objective was considered using the Bellman principle. The objective of this study is to investigate time-consistent pairs trading strategies with an MV objective. By employing the approach based on the total variance formula in Basak and Chabakauri (2010), the original optimization problem is transformed into a quadratic form, and an analytical solution is obtained. To explore the potential implementation of the proposed approach, empirical studies on the optimal trading strategies are conducted using data on pairs of stocks and futures traded on China's securities market.
In summary, the key contributions of our paper are as follows. First, a closed-form optimal trading strategy is obtained under the assumption that the spread of the asset prices follows an OU process and the portfolio weights allocated to the two assets are symmetric. Second, we extend the model setup to allow for non-symmetric portfolio weights. This leads to a more general trading strategy. Third, we calibrate the model parameters for different pairs of assets from the Chinese securities market, including stocks and futures, to validate the analytical optimal solutions. The paper is structured as follows. The next section presents the model setup for pairs trading adopted from Mudchanatongsuk et al. (2008). Section 3 formulates the optimal pairs trading problems as dynamic MV problems under two different settings, and presents the time-consistent solutions in both settings. Section 4 presents empirical illustrations, and finally, Sect. 5 concludes the paper. The proofs and derivations of some results are provided in the "Appendix".

The model dynamics in pairs trading
In this section, the dynamics for the price spread and the pairs trading strategies are described in a continuous-time modeling framework, as in Mudchanatongsuk et al. (2008). A continuous-time financial market is considered, where the time parameter set is [0, T] (i.e., t ∈ [0, T]). Hereafter, we simply use the (continuous) time index t without referring to the time parameter set for convenience. The uncertainties are described by a complete probability space (Ω, F, P), where P is a real-world probability measure. We consider three tradeable securities in the market, namely, a risk-free asset and two risky assets, where the price dynamics of the two risky assets are assumed to be cointegrated. We also impose some standard assumptions for a perfect market as follows. There are no transaction costs or taxes in trading these securities, and short selling is allowed. The main purpose of this study is to obtain optimal time-consistent pairs trading strategies, and the method may be applicable when transaction costs or taxes are considered.
Let r be the continuously compounded rate of interest, which is assumed to be a positive constant for simplicity. The price of the risk-free asset at time t is denoted by M(t) and it satisfies the following differential equation:
$$dM(t) = r M(t)\,dt. \tag{1}$$
Let A(t) and B(t) denote the prices of the pair of assets A and B at time t, respectively. We assume that the price of stock B follows the geometric Brownian motion:
$$dB(t) = \mu B(t)\,dt + \sigma B(t)\,dZ(t), \tag{2}$$
where μ and σ are the constant drift and volatility, respectively, and {Z(t)} is a standard Brownian motion. Let X(t) denote the price spread of stocks A and B at time t, which is defined as follows:
$$X(t) = \ln A(t) - \ln B(t). \tag{3}$$
To capture the mean-reverting property, we assume that the above price spread follows an OU process:
$$dX(t) = k(\theta - X(t))\,dt + \eta\,dW(t), \tag{4}$$
where {W(t)} is another standard Brownian motion; k > 0 is the rate of mean reversion; θ is the long-term mean of the process; η > 0 is the volatility of the price spread; and ρ is the instantaneous correlation coefficient between the two Brownian motions {Z(t)} and {W(t)}. Therefore, by a straightforward calculation using Itô's lemma, we obtain
$$dA(t) = A(t)\left[\left(\mu + k(\theta - X(t)) + \tfrac{1}{2}\eta^2 + \rho\sigma\eta\right)dt + \sigma\,dZ(t) + \eta\,dW(t)\right]. \tag{5}$$
The information structure of the model is specified by a filtration {F_t}, which is the natural filtration generated by the two correlated Brownian motions {W(t)} and {Z(t)}, augmented by the P-null sets. For notational convenience, we denote the conditional expectation and the conditional variance given F_t under the probability measure P as E_t(·) and Var_t(·), respectively. We calibrate the proposed model by following an approach based on the maximum likelihood estimation method proposed by Mudchanatongsuk et al. (2008).
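To make the model dynamics concrete, the following is a minimal simulation sketch, assuming a log-price spread X = ln A − ln B, an Euler step for the OU spread and an exact log-step for the geometric Brownian motion of B. The function name `simulate_pair` and all parameter values are hypothetical and chosen only for illustration; this is not the authors' simulation code.

```python
import numpy as np


def simulate_pair(k, theta, eta, mu, sigma, rho, x0, b0, T, n_steps, seed=0):
    """Simulate the spread X, and prices A and B, on a regular time grid.

    Assumptions (illustrative): X = ln A - ln B, Euler discretisation for
    the OU spread, exact log-normal step for B, and corr(dZ, dW) = rho.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps

    # Correlated Brownian increments: dW = rho*dZ + sqrt(1 - rho^2)*dW_perp.
    dz = rng.standard_normal(n_steps) * np.sqrt(dt)
    dw_perp = rng.standard_normal(n_steps) * np.sqrt(dt)
    dw = rho * dz + np.sqrt(1.0 - rho**2) * dw_perp

    x = np.empty(n_steps + 1)
    log_b = np.empty(n_steps + 1)
    x[0] = x0
    log_b[0] = np.log(b0)
    for i in range(n_steps):
        # OU spread: dX = k(theta - X) dt + eta dW  (Euler step).
        x[i + 1] = x[i] + k * (theta - x[i]) * dt + eta * dw[i]
        # GBM for B: d ln B = (mu - sigma^2/2) dt + sigma dZ  (exact step).
        log_b[i + 1] = log_b[i] + (mu - 0.5 * sigma**2) * dt + sigma * dz[i]

    b = np.exp(log_b)
    a = b * np.exp(x)  # follows from X = ln A - ln B
    return x, a, b


if __name__ == "__main__":
    # Hypothetical parameters, one year of daily steps.
    x, a, b = simulate_pair(k=2.0, theta=0.0, eta=0.15, mu=0.05, sigma=0.2,
                            rho=0.3, x0=0.1, b0=10.0, T=1.0, n_steps=250)
    print(x[-1], a[-1], b[-1])
```

The price of A is recovered from B and the spread rather than simulated separately, which keeps the three series consistent with the definition of the spread by construction.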

The dynamic MV problem
In what follows, the optimal pairs trading problems are formulated as MV portfolio selection problems, following Basak and Chabakauri (2010) and Gu and Steffensen (2015). The MV problems for optimal pairs trading are solved by employing the dynamic programming principle, and two cases with different trading constraints are discussed. In the first case, the portfolio weights invested in the two risky assets are assumed to sum to zero. This constraint is relaxed in the second case. In both cases, the problems are formulated as quadratic optimization problems and are then solved by combining the Feynman-Kac formula with the resulting Hamilton-Jacobi-Bellman (HJB) equation. The main results on the time-consistent optimal solutions for the dynamic MV problems in the two cases are provided in Propositions 1 and 3.

Case I
Let V(t) be the value of a self-financing pairs trading portfolio. We denote h(t) and ĥ(t) as the portfolio weights invested in stocks A and B at time t, respectively. In this model, we assume that the stocks A and B can only be traded as pairs. Specifically, we are only allowed to short one of them and long the other one in equal units. Thus, we require h(t) = −ĥ(t). The wealth process V(t) becomes:
$$dV(t) = V(t)\left[h(t)\frac{dA(t)}{A(t)} + \hat{h}(t)\frac{dB(t)}{B(t)} + \frac{dM(t)}{M(t)}\right]. \tag{6}$$
Substituting Eq. (2) and Eq. (5) into Eq. (6) gives:
$$dV(t) = V(t)\left\{\left[h(t)\left(k(\theta - X(t)) + \tfrac{1}{2}\eta^2 + \rho\sigma\eta\right) + r\right]dt + h(t)\eta\,dW(t)\right\}. \tag{7}$$
Let π(t) = h(t)V(t), which denotes the present amount invested in the stocks. Eq. (7) can then be rewritten as follows:
$$dV(t) = \left[rV(t) + \pi(t)\left(k(\theta - X(t)) + \tfrac{1}{2}\eta^2 + \rho\sigma\eta\right)\right]dt + \pi(t)\eta\,dW(t), \tag{8}$$
or equivalently,
$$V(T) = V(t)e^{r(T-t)} + \int_t^T e^{r(T-s)}\pi(s)\left(k(\theta - X(s)) + \tfrac{1}{2}\eta^2 + \rho\sigma\eta\right)ds + \int_t^T e^{r(T-s)}\pi(s)\eta\,dW(s). \tag{9}$$
The objective of the dynamic MV problem is given by:
$$\max_{\pi}\; E_t[V(T)] + \lambda\,Var_t[V(T)], \tag{10}$$
where λ < 0. Note that by the joint Markov property of (X(t), V(t)) with respect to the filtration {F_t}, the conditional expectation E_t and conditional variance Var_t are indeed of the form E(·|X(t), V(t)) and Var(·|X(t), V(t)), respectively. Suppose that π*(·) denotes the time-consistent control and V*(·) denotes the respective wealth process. Then, we define the value function as follows:
$$J(t, X(t), V(t)) = E_t[V^*(T)] + \lambda\,Var_t[V^*(T)]. \tag{11}$$
For brevity, we also write J(t, X(t), V(t)) as J_t in what follows. We consider the situation where decisions are made in the time horizon [t, t + τ), and the decision-makers are known to follow the equilibrium law π*(s) after time t + τ. The objective function is different from the traditional dynamic one in the sense that there is a time-consistent adjustment term λVar_t[E_{t+τ}(V(T))]. The presence of this time-consistent adjustment term implies that {π*(s)}_{s≥t+τ} may not be optimal at time t, in addition to the failure of Bellman's optimality principle. The time-consistent adjustment term λVar_t[E_{t+τ}(V(T))] arises due to the "Total Variance Formula" (Basak and Chabakauri 2010). By applying the techniques of HJB dynamic programming while accounting for time consistency, the dynamic MV problem with the objective function in Eq. (10) and the dynamic budget constraint in Eq. (6) can be solved. The solution is presented in the following proposition.

Proposition 1 A time-consistent solution to the dynamic MV problem in Eq. (10) with the dynamic budget constraint in Eq. (6) is given by the control π*(t) in Eq. (12). The respective optimal weight in pairs trading is given by h*(t) = π*(t)/V*(t).
Proof The proof is given in the "Appendix".
Remark 1
- Proposition 1 implies that with an increase in volatility σ or an increase in the correlation coefficient ρ, the investor allocates more funds to risky assets. This makes intuitive sense, because when σ increases, the amount of uncertainty also increases. This may lead to more opportunities for arbitrage. Furthermore, with an increase in the correlation of price pairs, the price spread tends to converge. This may lead to higher profits upon investing in risky securities.
- From the expression for π*(t) in Eq. (12), we can see that π*(t) = O((T − t)^2). We also obtain that h*(t) → 0 as T → ∞. This means that when T is sufficiently large, the optimal weight in pairs trading is considerably small. This highlights the insight that to prevent volatility risk, traders may tend to hold small positions when the trading period is long.
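The qualitative claims in Remark 1 can be checked numerically with a small sketch. The closed form used below is not quoted from Proposition 1; it is a reconstruction that assumes the Case I wealth dynamics dV = [rV + πα(X)]dt + πη dW with α(x) = k(θ − x) + η²/2 + ρση, the objective E_t[V(T)] + λVar_t[V(T)] with λ < 0, and a quadratic ansatz for the expected-gains function f. The function name `pi_star` and the parameter values are hypothetical.

```python
import math


def pi_star(x, tau, k, theta, eta, sigma, rho, r, lam):
    """Sketch of a time-consistent amount invested in the pair.

    Reconstructed (assumed) closed form, with tau = T - t and lam < 0:
        pi* = -exp(-r*tau) / (lam*eta^2)
              * [(1/2 + k*tau) * alpha(x) + (1/2) * k^2 * n * tau^2],
    where n = eta^2/2 + rho*sigma*eta and alpha(x) = k*(theta - x) + n.
    """
    if lam >= 0:
        raise ValueError("lam must be negative (risk-averse MV investor)")
    n = 0.5 * eta**2 + rho * sigma * eta
    alpha = k * (theta - x) + n
    bracket = (0.5 + k * tau) * alpha + 0.5 * k**2 * n * tau**2
    return -math.exp(-r * tau) / (lam * eta**2) * bracket


if __name__ == "__main__":
    # Hypothetical parameters; the spread sits at its long-run mean (x = theta).
    base = dict(x=0.0, k=1.0, theta=0.0, eta=0.2, sigma=0.3, rho=0.5,
                r=0.05, lam=-1.5)
    for tau in (0.0, 1.0, 10.0, 100.0, 400.0):
        print(tau, pi_star(tau=tau, **base))
```

Under these assumptions the bracket grows quadratically in T − t while the discount factor e^{−r(T−t)} decays exponentially, so for r > 0 the position eventually shrinks as the horizon lengthens, which is in line with the second bullet of the remark.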
Proposition 2 (Verification Theorem) Assume that J is a solution of Eq. (18) with terminal condition J(T, X(T), V(T)) = V(T), and that the control π* realizes the supremum in Eq. (18). Then π* is an equilibrium control and the corresponding value function is J.
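Proof For any perturbation $\pi^{\varepsilon,u}(s) := u\,1_{s\in[t,t+\varepsilon)} + \pi^*(s)\,1_{s\in[t+\varepsilon,T]}$, we aim to prove that $\liminf_{\varepsilon\downarrow 0} \frac{1}{\varepsilon}\left[J(t,x,v;\pi^*) - J(t,x,v;\pi^{\varepsilon,u})\right] \ge 0$, where J(t, x, v; π) denotes the MV objective at time t under the control π. We skip the details of the proof, as it is similar to the proof of Theorem 7.1 in Björk and Murgoci (2010).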

Case II
In the above analysis, we require that h(t) = −ĥ(t). This subsection considers the general situation where this trading constraint is relaxed. In this case, the wealth equation for {V(t)} is given by Eq. (13), which implies Eq. (14), and the control problem becomes the dynamic MV problem in Eq. (15). As in Case I, the conditional expectation E_t and conditional variance Var_t are of the form E(·|X(t), V(t)) and Var(·|X(t), V(t)), respectively. Given the optimal policy π̂*(·) and the respective wealth process V̂*(·), the value function Ĵ is defined analogously to Case I, and we sometimes write Ĵ_t for short. The main result of this case is presented in the following proposition.

Proposition 3 A time-consistent solution to the dynamic MV problem in Eq. (15) with the dynamic budget constraint in Eq. (13) is given by the control π̂*(t), which is expressed in terms of an auxiliary function g. The respective optimal weights H*(t) then follow from π̂*(t).
Proof The proof of this proposition is given in the "Appendix".
Remark 2
- Similarly to Case I, H*(t) → 0 when T → ∞. This coincides with the previous case and again confirms that the investor is more cautious when the trading period is long.
- Similarly to Proposition 2, for the corresponding verification theorem, one may refer to the specific case of Theorem 7.1 in Björk and Murgoci (2010).
- Tourin and Yan (2013) analyze the optimal pairs trading strategies with the exponential utility function U(w) = −e^{−γw}. Comparing the optimal strategies under our setup with the exponential utility function when r = 0 against those for investors with MV preference when r = 0 shows that the optimal strategies for investors with different preferences are quite different from each other. Mudchanatongsuk et al. (2008) consider expected power utility investors with "symmetric" positions (the same as Case I in our setting); the optimal results obtained there are also quite different from ours, which are obtained with the MV criterion. Tourin and Yan (2013) investigate expected exponential utility investors with "asymmetric" positions (the same as Case II in our setting) allocated to each risky asset. The comparison above demonstrates the differences between their optimal strategies and ours. In summary, market participants with different preferences behave heterogeneously. Furthermore, it is unclear if the properties discussed in Remarks 1 and 2 would still hold for the optimal solutions obtained by Mudchanatongsuk et al. (2008) and Tourin and Yan (2013).

Empirical experiments
Following the maximum likelihood approach of Mudchanatongsuk et al. (2008), the related parameters are estimated with the selected training datasets. For the details of the analytical formulas for the parameter estimates, please refer to the "Appendix" of Mudchanatongsuk et al. (2008). We first focus on the three pairs of stocks. Figures 1, 3 and 5 present the dynamics of the pairs of stock prices, which show that the three price pairs converge at some time points.
For illustration, we set the interest rate r and the risk coefficient λ to 5% and −1.5, respectively. Using the moving-window method, we conduct out-of-sample testing for all stock datasets. We investigate the log-returns of our pairs trading strategies from 02 January 2014 to 31 March 2016 (2.25 years) and update the parameters on each trading day during this period. Specifically, we estimate the related parameters for each trading day by using the data of the previous year, and update them accordingly. One sample path of investors' wealth obtained from the time-consistent pairs trading strategies in Cases I and II (V*(·) and V̂*(·), respectively) with an initial endowment of 100 units is presented in Figs. 2, 4 and 6, where the blue lines represent the wealth dynamics obtained by applying the purely-buy-and-sell-securities strategy (with strict constraints), i.e. Case I. The red lines represent the wealth dynamics obtained by applying the trading strategy with relaxed constraints, i.e. Case II. Figures 2, 4 and 6 indicate the effectiveness of our strategies by comparing them with the wealth dynamics (yellow lines) obtained using a conservative investment strategy, which places the entire endowment in a bank account. All three figures show that the asymmetric strategies always dominate the symmetric ones. This phenomenon is reasonable, because the strategies in Case II are more flexible. Note that, since our model is asymmetric in the two assets, different choices of the risky assets assigned to A and B in Eq. (3) yield distinct optimal results. The optimal wealths obtained with alternative choices of A and B are presented in the "Appendix". Investors may use the maximum likelihood estimation method to determine the configuration of the risky asset pairs. For a deeper investigation of these experiments, we simulated the scenarios 1000 times, and the statistical results of the investors' annual log-returns are shown in Table 1. In this table, S.D. stands for standard deviation. Table 1 indicates that for each pair of selected stocks, the mean of the annual yield (log-returns) under relaxed constraints dominates the respective result under strict constraints. This phenomenon is consistent with the results shown in Figs. 2, 4 and 6.
Now, we examine the corresponding results for the selected pair of futures. Setting r = 5% and λ = −1.5, we estimate the parameters using the datasets in the period from 1 February 2016 to 31 May 2016. The price dynamics of the two futures are depicted in Fig. 7. Subsequently, we investigate the wealth dynamics using the time-consistent pairs trading strategies in Cases I and II (V*(·) and V̂*(·), respectively) and the conservative strategy with an initial endowment of 100 units from 1 June 2016 to 31 August 2016 (Fig. 8). Due to the short testing period (1 June-31 August 2016), we do not update the parameters. The wealth dynamics of the three strategies in Fig. 8 show that the results of this example are in agreement with those for the stock pairs. Table 2 reports the log-returns of investors with different risk parameters λ during the testing period (with 1000 simulations). We notice that the mean of the log-returns decreases as λ decreases. This is reasonable, because when the risk parameter λ decreases, the investor becomes more risk averse. This may result in lower expected profits. Overall, the obtained results show exceptional performance of the strategies. The following implicit assumptions may explain this phenomenon.
First, the liquidity of the strategies, especially for shorting assets, is assumed to be quite high. Second, we ignore the related transaction costs. Third, the pairs that we have chosen exhibit great convergence trends, while the short-run arbitrage opportunities do not always exist in reality.
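The moving-window re-estimation described in this section can be illustrated with a minimal sketch. It is a simplified stand-in rather than the joint estimator of Mudchanatongsuk et al. (2008): it fits only the OU parameters (k, θ, η) of the spread, using the exact AR(1) discretisation of the OU process and the conditional maximum likelihood estimates obtained from a least-squares regression of X_{i+1} on X_i. The function names `fit_ou` and `rolling_fit`, the window length and the time step are hypothetical.

```python
import numpy as np


def fit_ou(x, dt):
    """Conditional MLE of (k, theta, eta) for an OU spread sampled every dt.

    Uses the exact discretisation X_{i+1} = a + b*X_i + noise, with
    b = exp(-k*dt), a = theta*(1 - b), Var(noise) = eta^2*(1 - b^2)/(2k).
    """
    x = np.asarray(x, dtype=float)
    x_prev, x_next = x[:-1], x[1:]
    b, a = np.polyfit(x_prev, x_next, 1)  # slope first, then intercept
    if not 0.0 < b < 1.0:
        raise ValueError("no mean reversion detected in this window")
    resid = x_next - (a + b * x_prev)
    s2 = resid.var()  # 1/n normalisation, as in maximum likelihood
    k = -np.log(b) / dt
    theta = a / (1.0 - b)
    eta = np.sqrt(2.0 * k * s2 / (1.0 - b**2))
    return k, theta, eta


def rolling_fit(x, window, dt):
    """Re-estimate the parameters on each trailing window of `window` points."""
    return [fit_ou(x[i - window:i], dt) for i in range(window, len(x) + 1)]


if __name__ == "__main__":
    # Hypothetical spread: 500 daily observations of a simulated OU path.
    rng = np.random.default_rng(0)
    dt, k, theta, eta = 1.0 / 250, 5.0, 0.1, 0.2
    b = np.exp(-k * dt)
    sd = eta * np.sqrt((1 - b**2) / (2 * k))
    x = np.empty(500)
    x[0] = theta
    for i in range(499):
        x[i + 1] = theta * (1 - b) + b * x[i] + sd * rng.standard_normal()
    print(rolling_fit(x, window=250, dt=dt)[-1])
```

With a one-year window of daily data, the estimate of the mean-reversion rate k is typically noisy, which is one practical reason to re-estimate on every trading day as described above.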

Conclusion
This study provides analytical equilibrium control strategies for the optimal MV problem of pairs trading. Specifically, we assume that the price spread of a pair of correlated risky securities follows a mean-reverting OU process. Explicit time-consistent results are derived by solving optimization problems using the dynamic programming approach, and we examine explicit solutions using selected stocks and futures traded on China's securities market. The numerical experiments indicate that our pairs trading strategies yield an annual profit with a modest standard deviation.
In this work, we mainly focus on exploring optimal strategies in an ideal market. In reality, however, fund trades are subject to many constraints, for instance, limitations on short-selling, regulatory constraints, and other market regulations. Furthermore, funds are always confronted by liquidity and funding risks. Adapting our proposed strategies to these issues is a potential direction for future research.

Appendix
Eq. (17) implies that, as τ becomes small, the value function satisfies the HJB equation in Eq. (18). Substituting Eq. (9) into Eq. (11) expresses the value function as a sum of three terms, where c(x, t) represents the sum of the second and third terms for convenience. By Eq. (9), the function f is the expected gains or losses of the investor over the horizon T − t under the time-consistent control. Then Eq. (18) becomes Eq. (20), subject to J_T = V(T) and the constraint in Eq. (7). By Basak and Chabakauri (2010), f is a function of x and t only. Applying Itô's lemma and the Feynman-Kac theorem (Theorem 7.6, Karatzas and Shreve (2012)) to Eq. (20) gives an equation involving Dc, where Dc denotes the Dynkin operator on the function c(x, t). From this, we obtain the expression for π*(t) in Eq. (22). Applying the Feynman-Kac theorem to f gives Eq. (23), and substituting π*(t) into Eq. (23) gives Eq. (24). We then use an ansatz for f. With Eq. (24), we obtain the system of ODEs in Eq. (25). Solving the system of ODEs in Eq. (25) and using f(X(T), T) = 0 determines the unknown constants. Substituting f into Eq. (22) yields the reported result.
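The steps of this argument can be traced in a short worked sketch. It is a reconstruction under stated assumptions rather than the authors' exact equations: it assumes the Case I wealth dynamics dV = [rV + πα(X)]dt + πη dW, the objective E_t[V(T)] + λVar_t[V(T)] with λ < 0, and a quadratic ansatz for f. The shorthand α, n and the coefficient names a, b, d are introduced here for illustration only.

```latex
% Sketch under the assumptions stated above; \alpha, n, a, b, d are illustrative names.
\begin{align*}
&\text{Shorthand: } n = \tfrac12\eta^2 + \rho\sigma\eta, \qquad \alpha(x) = k(\theta - x) + n. \\[4pt]
&\text{Total variance: }
  \operatorname{Var}_t[V(T)] = E_t\!\left[\operatorname{Var}_{t+\tau}(V(T))\right]
  + \operatorname{Var}_t\!\left[E_{t+\tau}(V(T))\right], \\
&\text{hence } J_t = \max_{\pi}\Big\{ E_t[J_{t+\tau}]
  + \lambda \operatorname{Var}_t\!\left[E_{t+\tau}(V(T))\right] \Big\}. \\[4pt]
&\text{Write } E_t[V^*(T)] = V e^{r(T-t)} + f(x,t), \qquad
  J = V e^{r(T-t)} + c(x,t). \\
&\text{As } \tau \to 0: \quad
  0 = \max_{\pi}\Big\{ e^{r(T-t)}\pi\,\alpha(x) + \mathcal{D}c
  + \lambda\eta^2\big(e^{r(T-t)}\pi + f_x\big)^2 \Big\}, \\
&\text{with } \mathcal{D}c = c_t + k(\theta - x)c_x + \tfrac12\eta^2 c_{xx}. \\[4pt]
&\text{First-order condition: } \quad
  \pi^*(t) = -e^{-r(T-t)}\left[\frac{\alpha(x)}{2\lambda\eta^2} + f_x\right]. \\[4pt]
&\text{Feynman-Kac for } f: \quad
  f_t - n f_x + \tfrac12\eta^2 f_{xx} - \frac{\alpha(x)^2}{2\lambda\eta^2} = 0,
  \qquad f(x,T) = 0. \\[4pt]
&\text{Ansatz } f = a(t)x^2 + b(t)x + d(t), \text{ with } m = k\theta + n: \\
&\qquad a' = \frac{k^2}{2\lambda\eta^2}, \qquad
  b' = 2na - \frac{km}{\lambda\eta^2}, \qquad
  d' = nb - \eta^2 a + \frac{m^2}{2\lambda\eta^2}, \\
&\qquad a(T) = b(T) = d(T) = 0. \\[4pt]
&\text{Solution: } \quad
  a(t) = -\frac{k^2 (T-t)}{2\lambda\eta^2}, \qquad
  b(t) = \frac{1}{\lambda\eta^2}\Big[\tfrac12 n k^2 (T-t)^2 + km(T-t)\Big]. \\[4pt]
&\text{Therefore } \quad
  \pi^*(t) = -\frac{e^{-r(T-t)}}{\lambda\eta^2}
  \Big[\big(\tfrac12 + k(T-t)\big)\alpha(X(t)) + \tfrac12 k^2 n (T-t)^2\Big].
\end{align*}
```

Only f_x = 2a(t)x + b(t) enters the control, so the coefficient d(t) does not need to be solved explicitly for the strategy. The resulting expression is quadratic in T − t inside the bracket and increasing in n = η²/2 + ρση when α > 0, which matches the qualitative statements in Remark 1.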
The objective function (29) is equivalent to a quadratic problem in the control with matrix Q and vector b. Since Q is a symmetric positive definite matrix, this is a convex optimization problem and the optimal solution is given by π̂*(t) = −Q^{-1}b. Applying the Feynman-Kac theorem to f̂ and, as before, using an ansatz for f̂ leads to the corresponding equations. For notational convenience, in what follows, we denote Ã = μ + η²/2 + ρση − r.
Substituting Q^{-1} into the above equations gives: