1 Introduction

Portfolio Selection (PS) is a practical financial engineering problem that requires determining a strategy of investing wealth among a set of assets in order to achieve certain objectives, such as maximizing cumulative wealth or risk-adjusted return, in the long run. In this article, we investigate sequential portfolio selection (also termed online portfolio selection) strategies, which sequentially determine portfolios based on publicly available information.

Traditionally in finance, portfolios are often selected according to mean-variance theory (Markowitz 1952, 1959) or its variants, to trade off return against risk. In recent years, this problem has also been actively studied from a learning to select portfolio perspective, with roots in machine learning, data mining, information theory and statistics. Rather than trading a single stock using computational intelligence techniques, the learning to select portfolio approach focuses on a portfolio, which consists of multiple assets/stocks. Several approaches for online portfolio selection, often characterized by machine learning formulations and effective optimization solutions, have been proposed in the literature (Kelly 1956; Breiman 1961; Cover 1991; Ordentlich and Cover 1996; Helmbold et al. 1996; Borodin and El-Yaniv 1998; Borodin et al. 2000, 2004; Stoltz and Lugosi 2005; Hazan 2006; Györfi et al. 2006; Blum and Mansour 2007; Levina and Shafer 2008; Györfi et al. 2008). Despite being studied extensively, most approaches are limited in one aspect or another.

Our goal in this work is to investigate a new online portfolio selection strategy that employs online learning techniques to exploit the financial markets. Some existing strategies adopt the trend following approach, that is, they assume that price relatives will continue the trend of the preceding trading days. However, this philosophy fails when price relatives do not go in any particular direction, but rather actively move within a range. So in this study, we exploit another well-known principle in finance, viz., mean reversion (Jegadeesh 1990), through an online machine learning framework. To this end, we propose a novel portfolio selection strategy named “Passive Aggressive Mean Reversion” (PAMR), which exploits the mean reversion property of financial markets by online passive aggressive learning (Crammer et al. 2006). PAMR’s key idea is to formulate a new loss function that can effectively exploit the mean reversion property, and then adopt passive aggressive online learning to search for the optimal portfolio among the asset pool to maximize the cumulative return.

Under different scenarios, the proposed PAMR strategy either passively keeps the last portfolio or aggressively approaches a new portfolio by following the mean reversion principle. By solving three well-formulated optimization problems, we arrive at three simple portfolio update rules. It is interesting to find that the final portfolio update scheme reaches certain trade-offs between portfolio return and volatility risk, and explicitly reflects the mean reversion trading rule. Moreover, we propose a mixture algorithm, which mixes PAMR and other strategies, and show that the mixture can be universal if one universal strategy is included. The key advantages of PAMR are its highly competitive performance and its attractive computational efficiency. Our extensive numerical experiments on various real datasets show that in most cases the proposed PAMR strategy compares favorably with a number of state-of-the-art portfolio selection strategies under a variety of performance metrics. At the same time, the proposed strategy runs in time linear in the product of the number of stocks and the number of trading days, and its computational time in back tests is orders of magnitude less than that of its competitors, showing its applicability to real-world large-scale online applications.

In summary, our contributions in this article include:

  1. We propose a new algorithm for online portfolio selection, named “Passive Aggressive Mean Reversion” (PAMR). To the best of our knowledge, it is the first portfolio selection strategy that exploits both the mean reversion property in finance and the powerful online passive aggressive learning technique in machine learning.

  2. We propose a mixture algorithm to mix the proposed PAMR algorithms and other universal strategies, resulting in a theoretically guaranteed universal mixture strategy.

  3. We analyze the final portfolio update scheme of PAMR and show that it is essentially related to certain trade-offs between portfolio return and volatility risk.

  4. We conduct an extensive set of numerical experiments on a number of up-to-date datasets from various markets. The results show that in most cases the proposed PAMR strategy not only outperforms the benchmarks (including the market index, the best stock and the challenging best constant rebalanced portfolio (Cover 1991) in hindsight), but also outperforms various state-of-the-art strategies under the performance metrics tested.

  5. We also extend the proposed strategy to handle some practical issues for a real-life portfolio selection task, viz., transaction cost and margin buying, and show its practical viability through the extensive empirical study.

  6. We show that the time complexity of the proposed algorithm is linear with respect to the number of stocks per trading day, and its empirical computational time in the back tests is quite competitive compared with the state of the art, indicating that the proposed strategy is suitable for online large-scale real applications.

The rest of the article is organized as follows. Section 2 formally states the online portfolio selection problem. Section 3 reviews existing state-of-the-art approaches tackling this problem, and highlights their limitations. Section 4 presents our proposed PAMR strategy and analyzes the algorithm. Section 5 validates the effectiveness of PAMR by extensive empirical studies on historical financial markets. Finally, Sect. 6 summarizes this article and indicates future directions.

2 Problem setting

Let us consider a financial market with m assets, over which we wish to invest. The changes of asset prices for n trading periods are represented by a sequence of non-negative, non-zero price relative vectors \({\mathbf{x}}_{1}, \ldots, {\mathbf{x}}_{n}\in{\mathbb{R}}_{+}^{m}\). Let us use \({\mathbf{x}}^{n}\) to denote such a sequence of vectors. The ith component of the tth vector, \(x_{ti}\), denotes the ratio of the closing price to the previous closing price of the ith asset on the tth trading day; thus an investment in asset i on the tth trading day increases by a factor of \(x_{ti}\).

An investment in the market is specified by a portfolio vector \({\mathbf{b}}_{t}=(b_{t1},\ldots,b_{tm})\), where \(b_{ti}\) represents the proportion of wealth invested in the ith asset. Typically, we assume the portfolio is self-financed and no margin/short is allowed, therefore each entry of a portfolio is non-negative and the entries add up to one, that is, \({\mathbf{b}}_{t}\in\Delta_{m}\), where \({\Delta}_{m}= \{{\mathbf{b}} : {\mathbf {b}}\in {\mathbb{R}}_{+}^{m}, \sum_{i=1}^{m} b_{i} = 1 \}\). The investment procedure is represented by a portfolio strategy, that is, a sequence of mappings \(\mathbf{b}_{1}= (\frac{1}{m}, \dots, \frac{1}{m} ), {\mathbf {b}}_{t}: {\mathbb{R}}_{+}^{m(t-1)}\rightarrow\Delta_{m}, t=2, 3, \ldots\), where \({\mathbf{b}}_{t}={\mathbf{b}}_{t}({\mathbf{x}}_{1},\ldots,{\mathbf{x}}_{t-1})\) is the portfolio used on the tth trading period given past market price relatives \({\mathbf{x}}^{t-1}=\{{\mathbf{x}}_{1},\ldots,{\mathbf{x}}_{t-1}\}\). Let us denote by \({\mathbf{b}}^{n}\) the portfolio strategy for n trading periods.

For the tth trading day, an investment according to portfolio \({\mathbf{b}}_{t}\) results in a portfolio daily return \({\mathbf{s}}_{t}\), that is, the wealth increases by a factor of \({\mathbf{s}}_{t}={\mathbf{b}}_{t}^{\top}{\mathbf{x}}_{t}=\sum_{i=1}^{m}b_{ti}x_{ti}\). Since we use price relatives, the investment results in a multiplicative cumulative return. Thus, after n trading days, the investment according to a portfolio strategy \({\mathbf{b}}^{n}\) results in portfolio cumulative wealth \({\mathbf{S}}_{n}\), which is increased by a factor of \(\prod_{t=1}^{n}{\mathbf{b}}_{t}^{\top} {\mathbf{x}}_{t}\), that is,

$${\mathbf{S}}_{n} \bigl({\mathbf{b}}^{n}, {\mathbf{x}}^{n} \bigr) = {\mathbf{S}}_{0} \prod _{t=1}^n {\mathbf{b}}_{t}^{\top} {\mathbf{x}}_{t},$$

where \({\mathbf{S}}_{0}\) denotes the initial wealth, and is set to $1 in this article for convenience.
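As a minimal sketch of the definitions above, the following Python snippet derives price relatives from closing prices and evaluates the cumulative wealth formula. The function names `price_relatives` and `cumulative_wealth` and the closing prices are made up for illustration; they are not part of the article.

```python
from math import prod

def price_relatives(closing_prices):
    """x_t[i] = price[t][i] / price[t-1][i] for consecutive trading days."""
    return [
        [today / prev for today, prev in zip(p_t, p_prev)]
        for p_prev, p_t in zip(closing_prices, closing_prices[1:])
    ]

def cumulative_wealth(portfolios, relatives, initial_wealth=1.0):
    """S_n = S_0 * prod_t (b_t . x_t)."""
    return initial_wealth * prod(
        sum(b_i * x_i for b_i, x_i in zip(b, x))
        for b, x in zip(portfolios, relatives)
    )

# Made-up closing prices for two assets over three days (illustration only).
prices = [[10.0, 20.0], [5.0, 40.0], [10.0, 20.0]]
x_seq = price_relatives(prices)            # [[0.5, 2.0], [2.0, 0.5]]
b_seq = [[0.5, 0.5], [0.5, 0.5]]           # uniform portfolio on both days
wealth = cumulative_wealth(b_seq, x_seq)   # 1.25 * 1.25 = 1.5625
```

Because the daily returns multiply, a single day with a return close to zero drags down the whole product, which is why the choice of each daily portfolio matters.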

Finally, we formulate the online portfolio selection problem as a sequential decision problem. The portfolio manager is a decision maker whose goal is to make a portfolio strategy on financial markets to satisfy certain requirements. In this study, his target is to maximize the portfolio cumulative wealth. He computes his portfolios in a sequential fashion. On each trading day t, the portfolio manager has access to all previous price relative vectors \({\mathbf{x}}^{t-1}=\{{\mathbf{x}}_{1},\ldots,{\mathbf{x}}_{t-1}\}\) and all previous portfolio vectors \({\mathbf{b}}^{t-1}=\{{\mathbf{b}}_{1},\ldots,{\mathbf{b}}_{t-1}\}\). On the basis of this historical information, the portfolio manager computes a new portfolio vector \({\mathbf{b}}_{t}\) for the coming price relative vector \({\mathbf{x}}_{t}\). Note that without historical information, the initial portfolio is set to uniform. The resulting portfolio is evaluated by its portfolio daily return. This procedure is repeated until the end of the trading periods, and the portfolio is finally evaluated according to the portfolio cumulative wealth achieved. Figure 1 models the portfolio selection problem as a sequential decision problem.
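The sequential protocol described above can be sketched as a short loop. This is only an illustrative skeleton: `run_online` and the placeholder strategy `keep_last_portfolio` are hypothetical names, and the price relatives are made up.

```python
def run_online(strategy, price_relatives):
    """Play the sequential protocol: choose b_t from the past, then observe x_t."""
    m = len(price_relatives[0])
    past_x, past_b = [], []
    wealth = 1.0  # S_0 = 1
    for x_t in price_relatives:
        # No history on day 1, so start from the uniform portfolio.
        b_t = [1.0 / m] * m if not past_x else strategy(past_x, past_b)
        wealth *= sum(b * x for b, x in zip(b_t, x_t))  # daily return s_t
        past_x.append(x_t)  # x_t is revealed only after b_t is fixed
        past_b.append(b_t)
    return wealth, past_b

def keep_last_portfolio(past_x, past_b):
    # Hypothetical placeholder strategy: reuse the previous day's portfolio.
    return list(past_b[-1])

final_wealth, used = run_online(
    keep_last_portfolio, [(0.5, 2.0), (2.0, 0.5), (0.5, 2.0)]
)
```

The important detail is the ordering inside the loop: the portfolio for day t is fixed before the price relative of day t is appended to the history, so a strategy can never peek at the outcome it is being evaluated on.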

Fig. 1 Portfolio selection as a sequential decision problem

In the above portfolio selection model, we make several general assumptions:

  1. Transaction cost: we assume that no transaction costs or taxes exist in this portfolio selection model;

  2. Market liquidity: we assume that one can buy and sell the required quantities at the last closing price of any given trading period;

  3. Impact cost: we assume that market behavior is not affected by a portfolio selection strategy in our study.

3 Related work

In this section, we review some popular portfolio selection approaches, and some machine learning and trading philosophies that inspire the proposed approach.

3.1 Benchmark approaches

The most common baseline is the Buy-And-Hold (BAH) strategy, that is, one invests his/her wealth among a pool of assets with an initial portfolio and holds the portfolio all the time. The BAH strategy with a uniform initial portfolio is referred to as the uniform BAH strategy, which is adopted as the market strategy producing the market index in our study. Contrary to the static BAH strategy, active trading strategies usually change portfolios regularly during the entire trading periods. A classical active strategy is Constant Rebalanced Portfolios (CRP) (Cover and Gluss 1986), which keeps a fixed fraction of an investor’s wealth in each underlying asset every trading day. The best possible CRP strategy is often called Best CRP (BCRP), which evidently is only a hindsight strategy. The CRP strategy can take advantage of market fluctuations for active trading, and its underlying idea is based on the mean reversion principle, also known as “Buy Low, Sell High”. To handle the transaction cost issue for the CRP strategy, Blum and Kalai (1999) proposed semi-CRP, which partially balances between potential return and potential transaction cost and rebalances to the initial portfolio at the end of any subset of the trading periods rather than every trading day.
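The difference between the two benchmarks can be made concrete with a small sketch. The helper names `buy_and_hold` and `constant_rebalanced` and the oscillating two-asset market are illustrative assumptions, not data from the experiments.

```python
def buy_and_hold(initial_portfolio, price_relatives):
    """Wealth of BAH: each asset's holding compounds on its own, no trading."""
    holdings = list(initial_portfolio)
    for x in price_relatives:
        holdings = [h * x_i for h, x_i in zip(holdings, x)]
    return sum(holdings)

def constant_rebalanced(portfolio, price_relatives):
    """Wealth of CRP: rebalance to the same fractions before every trading day."""
    wealth = 1.0
    for x in price_relatives:
        wealth *= sum(b_i * x_i for b_i, x_i in zip(portfolio, x))
    return wealth

# Illustrative oscillating market over four days.
market = [(0.5, 2.0), (2.0, 0.5)] * 2
uniform = (0.5, 0.5)
bah_wealth = buy_and_hold(uniform, market)         # each asset ends where it began
crp_wealth = constant_rebalanced(uniform, market)  # 1.25 ** 4
```

In this toy market uniform BAH ends with exactly its initial wealth, while the uniform CRP gains a factor of 1.25 every day, because rebalancing repeatedly sells the asset that just rose and buys the one that just fell.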

3.2 Online learning

In this section, we briefly introduce the related work on online machine learning (Rosenblatt 1958; Crammer and Singer 2003; Cesa-Bianchi et al. 2004; Crammer et al. 2006; Fink et al. 2006), which provides the learning inspiration for our work. The Perceptron algorithm (Rosenblatt 1958; Freund and Schapire 1999) is one important online approach, which updates the learning function by adding a new example with a constant weight when it is misclassified. Recently a number of online learning algorithms have been proposed based on the criterion of maximum margin (Li and Long 1999; Gentile 2001; Kivinen et al. 2001; Crammer and Singer 2003; Crammer et al. 2006; Zhao et al. 2011). For example, the Relaxed Online Maximum Margin (ROMMA) algorithm (Li and Long 1999) repeatedly chooses the hyper-planes that correctly classify the existing training examples with the maximum margin. The Passive Aggressive (PA) algorithm (Crammer et al. 2006) updates the classification function when a new example is misclassified or its classification score does not exceed some predefined threshold. As empirical studies show, the maximum margin based online learning algorithms are generally more effective than the Perceptron algorithm. In this article, we mainly adopt the idea of Passive Aggressive learning since it is suitable for our motivations, as further illustrated in Sect. 4.1.

3.3 Learning to select portfolio

Learning to select portfolio has been extensively studied in information theory and machine learning. Generally, a strategy selects one optimal strategy (it can be market strategy, challenging BCRP strategy, or even Oracle strategy which chooses the best stock every trading day) and tries to obtain the same cumulative return. The regret of a strategy is defined as the gap between its logarithmic cumulative wealth achieved and that of the optimal strategy.

One important type of learning to select portfolio is the regret minimization approach, which chooses the BCRP strategy as the optimal strategy. Cover (1991) proposed the Universal Portfolios (UP) strategy, where the portfolio is the historical-performance-weighted average of all constant rebalanced portfolio experts. The regret achieved by Cover’s UP is \(O(m\log n)\), and its run time complexity is \(O(n^{m})\), where m denotes the number of stocks and n denotes the number of trading days. The implementation is exponential in the number of stocks and thus restricts the number of assets used in experiments and real applications. Kalai and Vempala (2002) presented a time-efficient implementation of Cover’s UP based on non-uniform random walks that are rapidly mixing, which requires polynomial running time \(O(m^{7}n^{8})\). Following their work, Cover and Ordentlich (1996) developed universal procedures when side information is taken into account as a finite number of values. Cross and Barron (2003) proposed a new universal portfolio strategy tracking the best in-hindsight wealth achievable within target classes of linearly parameterized portfolio sequences, which are more general than the standard CRP class and permit the portfolio to display a continuous form of dependence on past prices or other side information. Belentepe (2005) presented a statistical view of Cover’s UP, showing that it is approximately equivalent to a constrained sequential portfolio optimization, which connects Cover’s UP with traditional mean-variance portfolio theory.

Another famous strategy is the Exponential Gradient (EG) strategy (Helmbold et al. 1997, 1996) for online portfolio selection using multiplicative updates. In general, the EG strategy tries to maximize the expected logarithmic portfolio daily return (approximated using the last price relative), and minimize the deviation between the next portfolio and the last portfolio. The regret achieved by EG is \(O(\sqrt{n \log m})\) with \(O(mn)\) running time. While its regret is not as tight as Cover’s UP, its linear time complexity is substantially less than the latter.
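For orientation, the multiplicative update commonly associated with EG can be sketched as follows. This is a minimal illustration assuming the standard form of the update from Helmbold et al.; the function name `eg_update` and the learning rate value `eta=0.05` are assumptions made here, not details given in this section.

```python
import math

def eg_update(b_t, x_t, eta=0.05):
    """One multiplicative EG step; eta is the learning rate (value assumed)."""
    daily_return = sum(b * x for b, x in zip(b_t, x_t))
    # Each weight is scaled by exp(eta * x_i / (b_t . x_t)), then renormalized.
    unnormalized = [b * math.exp(eta * x / daily_return) for b, x in zip(b_t, x_t)]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]

b_next = eg_update([0.5, 0.5], [0.5, 2.0])
```

With a uniform starting portfolio, the update shifts weight toward the asset with the larger last price relative, while the small learning rate keeps the new portfolio close to the previous one.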

Recently, convex optimization has been applied to resolve the portfolio selection problem (Agarwal et al. 2006; Agarwal and Hazan 2005; Hazan 2006; Hazan et al. 2007). Examples include the Online Newton Step (ONS) strategy (Agarwal et al. 2006), which aims to maximize the expected logarithmic cumulative wealth (approximated using historical price relatives) and to minimize the variation of the expected portfolio. ONS exploits the second order information of the log wealth function and applies it to the online scenario. It theoretically achieves a regret of \(O(m\log n)\), which is the same as Cover’s UP, and has running time complexity of \(O(m^{3}n)\). Following ONS, Hazan and Seshadhri (2009) recently proposed a new adaptive-regret approach with improved theoretical results, which essentially is an ONS based strategy.

Another promising direction for portfolio selection is wealth maximization approach, which is based on the notion of approaching the Oracle as the optimal strategy. This idea was followed by Borodin et al. (2004) in their proposal of a non-universal portfolio strategy named Anti-Correlation (Anticor). Unlike the regret minimization approaches, Anticor strategy takes advantage of the statistical properties of financial market. The underlying motivation is to bet on the consistency of positive lagged cross-correlation and negative autocorrelation. It exploits the statistical information from the historical stock price relatives and adopts the classical mean reversion trading idea to transfer the wealth in the portfolio. Although it does not provide any theoretical guarantee, its empirical results (Borodin et al. 2004) showed that Anticor can outperform all existing strategies in most cases. Unlike the greedy algorithm by the Anticor strategy, Li et al. (2011b) very recently proposed Confidence Weighted Mean Reversion (CWMR) strategy to actively exploit the mean reversion property and the second order information of a portfolio, which produces better performance than Anticor.

In addition, Györfi et al. (2006) recently introduced a framework of Nonparametric Kernel-based Moving Window (BK) learning strategies for portfolio selection based on nonparametric prediction techniques (Györfi and Schäfer 2003). Their algorithm first identifies a list of similar historical price relative sequences whose Euclidean distances with recent market windows are smaller than a threshold, then optimizes the portfolio with respect to the list of similar sequences. Under the same framework, Györfi et al. (2007) proposed another variant called Nonparametric Kernel-based Semi-log-optimal strategy, which is actually an approximation of the BK strategy, mainly to improve the computational efficiency. Replacing log utility function by Markowitz-type utility function, Ottucsák and Vajda (2007) proposed Nonparametric Kernel-based Markowitz-type strategy, which connects return and risk (or mean and variance) with the online portfolio selection strategy. Following the same framework as BK strategy, Nonparametric Nearest Neighbor learning (BNN) strategy proposed in Györfi et al. (2008) aims to search for the nearest neighbors in historical price relative sequences rather than search price relatives within a specified Euclidean ball. This method has been empirically shown to be a robust trading strategy. Along this direction, Li et al. (2011a) recently proposed Correlation-driven Nonparametric learning (CORN) strategy to search for similar price relatives via correlation coefficient and considerably boosted the empirical performance of nonparametric learning approach.

Besides the main stream of learning to select portfolio, another type of trading strategy is based on switching between various strategies, that is, maintaining a probability distribution among the strategies. Singer (1997) proposed Switching Portfolios (SP), which aims to deal with changing markets by taking into account the possibility that the market changes its behavior after each trading day. It switches among a set of basic investment strategies and assumes the a priori duration of using one basic strategy is geometrically distributed. Levina and Shafer (2008) proposed Gaussian Random Walk (GRW) strategy, which is a Markov switching strategy. GRW switches among the basic investment strategies as a Gaussian random walk in the simplex of portfolios.

Last, we note that our work is very different from another large body of existing work in the literature (Kimoto et al. 1993; Tay and Cao 2001; Cao and Tay 2003; Tsang et al. 2004; Lu et al. 2009), which attempts financial time series forecasting and stock price prediction by applying machine learning techniques such as neural networks (Kimoto et al. 1993), decision trees (Tsang et al. 2004), and support vector machines (SVM) (Tay and Cao 2001; Cao and Tay 2003; Lu et al. 2009). The key difference between that work and ours is that its learning goal is to make explicit predictions of future prices/trends, while our learning goal is to directly optimize the portfolio without predicting prices explicitly.

3.4 Analysis of existing work

One popular trading idea in practice is the trend following or momentum strategy, which assumes that historically better-performing stocks will continue to perform better than others in the future. Some existing algorithms, such as EG and ONS, approximate the expected logarithmic daily return and logarithmic cumulative return respectively using historical price relatives. Though this idea is easy to understand and has made fortunes for many of the best traders and investors in the world, trend following is very hard to implement effectively. In addition, in the short term, the stock price relatives may not follow previous trends, as empirically evidenced by Jegadeesh (1990) and Lo and MacKinlay (1990).

Besides the trend following approach, another widely adopted approach in the learning community is mean reversion (Cover and Gluss 1986; Cover 1991; Borodin et al. 2004), which is also termed as contrarian approach. This approach stems from the CRP strategy (Cover and Gluss 1986), which rebalances to the initial portfolio every trading day. The idea behind this approach is that if one stock performs worse than others, it tends to perform better than others in the next trading day. As a result, the defining characteristic of a contrarian strategy is the purchase of securities that have performed poorly in the past and the sale of securities that have performed well, or quite simply, “Sell the Winner, Buy the Loser”. According to Lo and MacKinlay (1990), the effectiveness of mean reversion is a consequence of positive cross-autocovariances across securities. Among existing algorithms, CRP, UP, and Anticor adopt this trading idea. However, CRP and UP passively revert to the mean, while empirical evidence from Anticor algorithm (Borodin et al. 2004) shows that active reversion to the mean may better exploit the fluctuation of financial markets and is likely to obtain a much higher profit. On the other hand, although Anticor actively reverts to the mean, it is a heuristic method based on statistical correlations to transfer the wealth within the portfolio. In other words, it may not effectively exploit the mean reversion property.

In between, pattern matching based nonparametric learning algorithms (BK and BNN, etc.) can identify many market conditions including both mean reversion and trend following. However, when locating similar price relatives, the nonparametric learning approaches may locate both mean reversion and trend following price relatives, whose patterns are essentially opposite, thus weakening the maximization of the expected cumulative wealth.

In short, both trend following and mean reversion can generate profit in the financial markets, if appropriately used. In the following, we will propose an active mean reversion based portfolio selection method. Though simple in its update rules, it empirically outperforms the above existing portfolio selection strategies in most cases. The success of the proposed portfolio selection strategy indicates that it appropriately takes advantage of the mean reversion trading idea and generates significantly high profits in the back tests with real market data.

4 Passive aggressive mean reversion approach for portfolio selection

4.1 Intuition and overview

The proposed approach is motivated by Constant Rebalanced Portfolios (Cover and Gluss 1986), which adopts the mean reversion trading idea. A simple but convincing example showing the mean reversion idea is illustrated in Table 1. Consider a fluctuating market with two stocks (A, B), and the stock price relative sequence is \((\frac{1}{2}, 2 ), (2, \frac{1}{2} ), \ldots\) , where each stock is not going anywhere but actively moving within a range. Obviously, in a long-term period, the market strategy cannot achieve any abnormal return from this sequence since the cumulative wealth of each stock remains the same after 2n trading days. However, Best CRP in hindsight can achieve a growth rate of \((\frac{5}{4})^{n}\) for an n-trading period. Now let us analyze the BCRP strategy on the stock price relative sequence to show the underlying mean reversion trading idea. Suppose the initial portfolio is \((\frac{1}{2},\frac{1}{2} )\) and at the end of the 1st trading day, the closing price adjusted wealth distribution becomes \((\frac{1}{5}, \frac{4}{5} )\) with corresponding cumulative wealth increasing by a factor of \(\frac{5}{4}\). At the beginning of the 2nd trading day, the portfolio manager rebalances the portfolio to the initial portfolio \((\frac{1}{2},\frac{1}{2} )\) by transferring wealth from the stock that performed better in the previous trading day (B) to the stock that performed worse (A). At the beginning of the 3rd trading day, the wealth transfer with the mean reversion trading idea continues. Although the market strategy gains nothing, BCRP can achieve a growth rate of \(\frac{5}{4}\) per trading day using the mean reversion trading idea, which assumes that if one stock price performs worse, it tends to perform better in the subsequent trading day.

Table 1 Motivating example of CRP to show the mean reversion trading idea
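The first rebalancing step of this example can be written out explicitly. The following is only a worked restatement of the numbers quoted in the text, using the notation of Sect. 2 with \({\mathbf{b}}_{1}=(\frac{1}{2},\frac{1}{2})\) and \({\mathbf{x}}_{1}=(\frac{1}{2},2)\):

$$ {\mathbf{b}}_{1}^{\top}{\mathbf{x}}_{1} = \tfrac{1}{2}\cdot\tfrac{1}{2} + \tfrac{1}{2}\cdot 2 = \tfrac{5}{4}, \qquad \biggl(\frac{\frac{1}{2}\cdot\frac{1}{2}}{\frac{5}{4}}, \frac{\frac{1}{2}\cdot 2}{\frac{5}{4}} \biggr) = \bigl(\tfrac{1}{5}, \tfrac{4}{5} \bigr). $$

Rebalancing back to \((\frac{1}{2},\frac{1}{2})\) therefore moves a fraction \(\frac{4}{5}-\frac{1}{2}=\frac{3}{10}\) of the current wealth from stock B to stock A. On the 2nd trading day, with \({\mathbf{x}}_{2}=(2,\frac{1}{2})\), the same computation again gives a daily return of \(\frac{5}{4}\).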

Another motivation of the proposed PAMR algorithm is the fact that in a financial crisis, all stocks drop synchronously or certain stocks drop significantly. Under these situations, actively rebalancing may not be appropriate since it puts too much wealth on “mine” stocks, such as Bear Stearns during the recent financial crisis. To avoid the potential risk concerning such “mine” stocks, it is a good choice to stick to the previous portfolio, which constitutes the CRP strategy. Here, the reason to choose the passive CRP strategy is that identifying these “mine” stocks a priori is almost impossible; they are usually known only in hindsight. Thus, to avoid suffering too much from such situations, PAMR alternates the strategy between “aggressive” and “passive” reversion depending on the market conditions. The passive mean reversion strategy avoids the high risk of the aggressive approach that would put almost all wealth on these “mine” stocks when they drop significantly.

In this article, we propose a novel trading strategy named “Passive Aggressive Mean Reversion”, or PAMR for short. On the one hand, the underlying assumption of our approach is that better-performing stocks would perform worse than others in the next trading day. On the other hand, if the market drops too much, we would stop actively rebalancing the portfolio to avoid certain “mine” stocks and their associated risk. In order to exploit these intuitions, we suggest adopting Passive Aggressive (PA) online learning (Crammer et al. 2006), which was originally proposed for classification tasks. Loosely speaking, the basic idea of PA for classification is that it passively keeps the previous solution if the loss is zero, while it aggressively updates the solution whenever the suffered loss is nonzero.

Let us now describe the basic idea of the proposed strategy in detail. Firstly, if the portfolio daily return is below a certain threshold, we will try to keep the previous portfolio such that it passively reverts to the mean to avoid the potential “mine” stocks. Secondly, if the portfolio daily return is above the threshold, we will actively rebalance the portfolio to ensure that the expected portfolio daily return is below the threshold in the belief that the stock price relatives will revert in the next trading day. This sounds a bit counter-intuitive, but it is indeed reasonable, because if the stock price relative reverts, keeping the expected portfolio daily return below the threshold is able to maintain a high portfolio daily return in the next trading day. Here, the expected portfolio return is calculated with respect to the historical price relatives, for example, in our study, the last price relative, which is consistent with EG algorithm (Helmbold et al. 1997, 1996).

To further illustrate why aggressive reversion to the mean can be more effective than a passive one, let us continue the example in Table 1, which has a market going nowhere but actively fluctuating. We show that in such markets, the proposed strategy is much more powerful than BCRP in hindsight, a passive mean reversion trading strategy. Table 2 compares the two trading strategies. As the motivating example shows, the growth rate of BCRP is \((\frac{5}{4})^{n}\) for an n-trading period, while at the same time, the growth rate of the proposed PAMR strategy is \(\frac{5}{4}\times(\frac{3}{2})^{n-1}\) (the details of the calculation/algorithm will be presented later). We intuitively explain the success of PAMR below.

Table 2 Motivating example of comparison between BCRP and PAMR strategy

Assume the threshold for PAMR update is set to 1, that is, if the portfolio daily return is below 1, we do nothing but keep the existing portfolio. Our strategy begins with a portfolio \((\frac{1}{2}, \frac {1}{2} )\). For the 1st trading day, the return is \(\frac{5}{4}>1\). Then at the beginning of the 2nd trading day, we rebalance the portfolio to satisfy the condition that the approximate portfolio daily return based on the last price relatives does not exceed the threshold 1, and the resulting portfolio is \((\frac {2}{3}, \frac{1}{3} )\). Although it seems that we build a portfolio such that the approximate portfolio return does not exceed the threshold, in practice, as the reversion to the mean suggests, we are maximizing the portfolio return in the next trading day. As we can observe, the return for the 2nd trading day is \(\frac{3}{2}>1\). Then following the same rule, we will rebalance the portfolio to \((\frac{1}{3}, \frac{2}{3} )\). As a result, in such a market, the growth rate of the proposed strategy is \(\frac{5}{4}\times (\frac{3}{2} )^{n-1}\) for an n-trading period, which is far superior to that of BCRP, that is, \((\frac{5}{4} )^{n}\).
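The arithmetic behind these numbers can be checked directly from the definition of the portfolio daily return in Sect. 2. The following lines only verify the quoted portfolios against the threshold 1; they do not derive the update rule itself:

$$ \bigl(\tfrac{2}{3}, \tfrac{1}{3} \bigr)\cdot \bigl(\tfrac{1}{2}, 2 \bigr) = \tfrac{1}{3}+\tfrac{2}{3} = 1, \qquad \bigl(\tfrac{2}{3}, \tfrac{1}{3} \bigr)\cdot \bigl(2, \tfrac{1}{2} \bigr) = \tfrac{4}{3}+\tfrac{1}{6} = \tfrac{3}{2}, $$

$$ \bigl(\tfrac{1}{3}, \tfrac{2}{3} \bigr)\cdot \bigl(2, \tfrac{1}{2} \bigr) = \tfrac{2}{3}+\tfrac{1}{3} = 1, \qquad \bigl(\tfrac{1}{3}, \tfrac{2}{3} \bigr)\cdot \bigl(\tfrac{1}{2}, 2 \bigr) = \tfrac{1}{6}+\tfrac{4}{3} = \tfrac{3}{2}. $$

In each row, the left-hand identity shows that the rebalanced portfolio brings the return on the last price relative down to exactly the threshold, and the right-hand identity shows that the same portfolio earns \(\frac{3}{2}\) once the price relative reverts. Every trading day after the first thus gains a factor of \(\frac{3}{2}\) instead of \(\frac{5}{4}\), an improvement of \(\frac{6}{5}\) per day over BCRP.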

4.2 Formulations

Now we shall formally devise the proposed Passive Aggressive Mean Reversion (PAMR) strategy for portfolio selection problem. The PAMR strategy is based on the mean reversion idea as described in Sect. 4.1, and is equipped with Passive Aggressive (PA) online learning technique (Crammer et al. 2006).

First of all, given a portfolio vector \({\mathbf{b}}\) and a price relative vector \({\mathbf{x}}_{t}\), we define an ϵ-insensitive loss function for the tth trading day as

$$ \ell_{\epsilon} ({\mathbf{b}}; {\mathbf{x}_{t}} )= \begin{cases}0 & {\mathbf{b}}\cdot{\mathbf{x}_{t}} \leq\epsilon\\{\mathbf{b}}\cdot{\mathbf{x}_{t}} - \epsilon & \mathrm{otherwise},\end{cases} $$
(1)

where ϵ≥0 is the sensitivity parameter which controls the mean reversion threshold. Since typically the portfolio daily return fluctuates around 1, we often empirically choose ϵ≤1 in order to buy worse performing stocks. The ϵ-insensitive loss is zero when the return does not exceed the reversion threshold ϵ, and otherwise grows linearly with respect to the daily return. For conciseness, let us use \(\ell_{\epsilon}^{t}\) to denote \(\ell_{\epsilon}({\mathbf{b}};{\mathbf{x}}_{t})\), that is, the ϵ-insensitive loss of the tth trading day. By defining this loss function, we can distinguish the two motivating cases described in Sect. 4.1.
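The loss in Eq. (1) is a one-line computation. The sketch below is illustrative; the function name `eps_insensitive_loss` and the example price relatives are made up here.

```python
def eps_insensitive_loss(b, x_t, eps):
    """Eq. (1): zero if b . x_t <= eps, otherwise the excess b . x_t - eps."""
    daily_return = sum(b_i * x_i for b_i, x_i in zip(b, x_t))
    return max(0.0, daily_return - eps)

# Daily return 1.25 exceeds the threshold, so the loss is the excess.
loss_high = eps_insensitive_loss([0.5, 0.5], [0.5, 2.0], eps=1.0)
# Daily return 0.75 is below the threshold, so the loss is zero.
loss_low = eps_insensitive_loss([0.5, 0.5], [0.5, 1.0], eps=1.0)
```

Note the direction of the loss: unlike a classification margin, a high return on the last price relative is penalized, because under mean reversion it signals that the portfolio is tilted toward stocks expected to fall back.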

In the following parts, we will formulate three variants of the proposed strategy, and will propose specific algorithms to solve them in the subsequent section. Recalling that \({\mathbf{b}}_{t}\) denotes the portfolio vector on the tth trading day, the first proposed method for Passive Aggressive Mean Reversion (PAMR) is formulated as the constrained optimization below:

Optimization Problem 1

(PAMR)

$$ {\mathbf{b}}_{t+1}=\mathop{\arg\min }_{{\mathbf{b}}\in{\Delta}_{m}}\frac{1}{2}\Vert {\mathbf{b}} - {\mathbf{b}}_{t}\Vert ^{2}\quad\mathrm{s.t.}\quad \ell_{\epsilon} ({\mathbf{b}};{\mathbf{x}}_{t} ) =0.$$
(2)

The above formulation attempts to find an optimal portfolio by minimizing the deviation from the last portfolio \(\mathbf{b}_{t}\) under the constraint of zero loss. On the one hand, the approach passively keeps the last portfolio, that is, \(\mathbf{b}_{t+1}=\mathbf{b}_{t}\), whenever \({\ell}_{\epsilon}^{t} = 0\), which means that the portfolio daily return does not exceed the threshold ϵ. On the other hand, whenever the loss is nonzero, it aggressively updates the solution by forcing it to strictly satisfy the constraint \(\ell_{\epsilon} (\mathbf{b}_{t+1};\mathbf{x}_{t} )=0\). It is clear that this formulation is able to address the two motivations.

Although the above formulation is reasonable to address our concerns, it may have some undesirable properties in situations with noisy price relatives, which are common in real-world financial markets. For example, a noisy price relative appearing in some trending sequences may suddenly change the portfolio in a wrong direction due to the aggressive update. To avoid such problems, we propose two variants of PAMR that are able to trade off between aggressiveness and passiveness. The idea behind the two PAMR variants is similar to soft margin support vector machines: we introduce a non-negative slack variable ξ into the optimization. Specifically, for the first variant, we modify the objective function by introducing a term that scales linearly with respect to ξ, which results in the following optimization:

Optimization Problem 2

(PAMR-1)

$$ {\mathbf{b}}_{t+1}=\mathop{\arg\min }_{{\mathbf{b}}\in\Delta_{m}}\frac{1}{2}\Vert {\mathbf{b}} - {\mathbf{b}}_{t}\Vert ^{2}+C\xi\quad\mathrm{s.t.} \quad \ell_{\epsilon} ({\mathbf{b}};{\mathbf{x}}_{t} ) \leq\xi\ \mathrm{and} \ \xi\geq0,$$
(3)

where C is a positive parameter to control the influence of the slack variable term on the objective function. We refer to this parameter as the aggressiveness parameter similar to PA learning (Crammer et al. 2006) and call this variant “PAMR-1”.

Instead of using a linear term of slack variable, in the second variant, we modify the objective function by introducing a slack variable term that scales quadratically with respect to ξ, which results in the following optimization problem:

Optimization Problem 3

(PAMR-2)

$$ {\mathbf{b}}_{t+1}=\mathop{\arg\min }_{{\mathbf{b}}\in\Delta_{m}}\frac{1}{2}\Vert {\mathbf{b}} - {\mathbf{b}}_{t}\Vert ^{2}+C\xi^{2}\quad\mathrm{s.t.} \quad \ell_{\epsilon} ({\mathbf{b}};{\mathbf{x}}_{t} ) \leq\xi.$$
(4)

Note that in the above formulation we do not need to enforce the constraint ξ≥0 as \(\xi^{2}\) is always non-negative. We refer to this variant as “PAMR-2”.

4.3 Algorithms

We now derive approximate solutions for the above three PAMR formulations using standard techniques from convex analysis (Boyd and Vandenberghe 2004), and present the proposed PAMR algorithms for the portfolio selection task. Specifically, the following three propositions summarize the solutions to the PAMR methods.

Proposition 1

The solution to the Optimization Problem 1 (PAMR) without considering the non-negativity constraint (b⪰0) is expressed as:

$$ {\mathbf{b}}={\mathbf{b}}_{t}-\tau_{t} ({{\mathbf{x}}_{t}-\bar {{{x}}}_{t}{\mathbf{1}}} ),$$
(5)

where \(\bar{x}_{t}=\frac{\mathbf{x}_{t}\cdot\mathbf{1}}{m}\) denotes the market return, and \(\tau_{t}\) is computed as:

$$ \tau_{t}=\max \biggl\{ 0, \frac{{\mathbf{b}}_{t} \cdot {\mathbf{x}}_{t}-\epsilon}{\Vert {\mathbf{x}}_{t} - \bar{{x}}_{t} {\mathbf{1}} \Vert ^{2}} \biggr\}.$$
(6)

Proof

The proof can be found in Appendix A. □

Proposition 2

The solution to the Optimization Problem 2 (PAMR-1) without considering the non-negativity constraint (b⪰0) is expressed as:

$${\mathbf{b}}={\mathbf{b}}_{t}-\tau_{t} ({\mathbf{x}_{t}-\bar {x}_{t}{\mathbf{1}}} ),$$

where \(\bar{x}_{t}=\frac{\mathbf{x}_{t}\cdot\mathbf{1}}{m}\) denotes the market return, and \(\tau_{t}\) is computed as:

$$ \tau_{t}=\max \biggl\{0, \min \biggl\{C, \frac{{\mathbf{b}}_{t} \cdot {\mathbf{x}}_{t}-\epsilon}{\Vert {\mathbf{x}}_{t} - \bar{{{x}}}_{t} {\mathbf{1}}\Vert ^{2}} \biggr\}\biggr\}.$$
(7)

Proof

The proof can be found in Appendix B. □

Proposition 3

The solution to the Optimization Problem 3 (PAMR-2) without considering the non-negativity constraint (b⪰0) is expressed as:

$${\mathbf{b}}={\mathbf{b}}_{t}-\tau_{t} ({\mathbf{x}}_{t}-\bar {x}_{t}{\mathbf{1}} ),$$

where \(\bar{{x}}_{t}=\frac{\mathbf{x}_{t}\cdot{\mathbf{1}}}{m}\) denotes the market return, and \(\tau_{t}\) is computed as:

$$ \tau_{t}=\max \biggl\{0, \frac{{\mathbf{b}}_{t} \cdot {\mathbf{x}}_{t}-\epsilon}{{ \Vert {\mathbf{x}_{t}} -\bar{{{x}}}_{t} {\mathbf{1}}\Vert ^{2}} + \frac{1}{2C}} \biggr\}.$$
(8)

Proof

The proof can be found in Appendix C. □

Figure 2 summarizes the details of the proposed PAMR algorithms. Firstly, with no historical information, the initial portfolio is set to the uniform portfolio \({\mathbf{b}}_{1}= (\frac{1}{m}, \ldots, \frac{1}{m} )\). At the beginning of the tth trading day, we rebalance according to the portfolio determined at the end of the last trading day. At the end of the tth trading day, the market reveals a stock price relative vector, which represents the stock price movements. Since both the portfolio and the stock price relatives are already known, the portfolio manager is able to measure the portfolio daily return \(\mathbf{b}_{t}\cdot\mathbf{x}_{t}\) and the suffered loss \(\ell_{\epsilon} (\mathbf{b}_{t};\mathbf{x}_{t} )\) as defined in (1). Then, we calculate an optimal step size \(\tau_{t}\) based on the last portfolio and stock price relatives. Given the optimal step size \(\tau_{t}\), we can update the portfolio for the next trading day. Finally, we perform a normalization step to obtain the final portfolio by projecting the updated portfolio onto the simplex domain.

Fig. 2 The proposed Passive Aggressive Mean Reversion (PAMR) strategies
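The steps summarized in Fig. 2 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the guard for a zero denominator is an added assumption, and the simplex projection uses the simple sort-based variant, which costs O(m log m) per day rather than linear time.

```python
def project_to_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based variant)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:          # holds for j = 1, ..., rho
            theta = candidate
    return [max(v_i - theta, 0.0) for v_i in v]


def pamr_step_size(b, x, eps, variant="PAMR", C=500.0):
    """Step size tau_t following (6), (7), or (8)."""
    x_bar = sum(x) / len(x)
    loss = max(0.0, sum(b_i * x_i for b_i, x_i in zip(b, x)) - eps)
    sq_norm = sum((x_i - x_bar) ** 2 for x_i in x)
    if variant == "PAMR-2":
        return loss / (sq_norm + 1.0 / (2.0 * C))
    if sq_norm == 0.0:                   # assumption: no update if all assets move alike
        return 0.0
    tau = loss / sq_norm
    return min(C, tau) if variant == "PAMR-1" else tau


def pamr_run(price_relatives, eps=0.5, variant="PAMR", C=500.0):
    """Run PAMR over a sequence of price relative vectors; return (wealth, last portfolio)."""
    m = len(price_relatives[0])
    b = [1.0 / m] * m                    # uniform initial portfolio b_1
    wealth = 1.0
    for x in price_relatives:
        wealth *= sum(b_i * x_i for b_i, x_i in zip(b, x))
        tau = pamr_step_size(b, x, eps, variant, C)
        x_bar = sum(x) / m
        updated = [b_i - tau * (x_i - x_bar) for b_i, x_i in zip(b, x)]
        b = project_to_simplex(updated)  # normalization step
    return wealth, b


# Motivating market of Sect. 4.1 with threshold 1, six trading days.
market = [(0.5, 2.0), (2.0, 0.5)] * 3
wealth, _ = pamr_run(market, eps=1.0)
print(wealth, 1.25 * 1.5 ** 5)           # the two values agree up to rounding
```

On this toy market the sketch reproduces the growth \(\frac{5}{4}\times (\frac{3}{2} )^{n-1}\) of the motivating example.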

4.4 Analysis and interpretation

To reflect the mean reversion trading idea, we are interested in analyzing the resulting update rules of the proposed PAMR algorithms, which mainly involve the portfolio \(\mathbf{b}_{t+1}\) and the step size \(\tau_{t}\). In particular, we want to examine how the update rules are related to return and risk, the two most important concerns in a portfolio selection task.

First of all, we analyze the resulting portfolio update rule in (5) for the three PAMR algorithms, that is, \(\mathbf{b}_{t+1} = \mathbf{b}_{t}-\tau_{t} ({\mathbf {x}_{t}-\bar{x}_{t}\mathbf{1}} )\). In the update rule, the step size \(\tau_{t}\) is non-negative, and \(\bar{{x}}_{t}\) is the mean return or market return. The term \({\mathbf{x}_{t}-\bar{x}_{t}\mathbf{1}}\) represents the stock abnormal returns with respect to the market on the tth trading day. More precisely, we can interpret it as the directional vector for the weight transfer. The negative sign before the term indicates that the resulting update scheme is consistent with the motivation, that is, the weights shall be transferred from better performing stocks (with positive abnormal returns) to worse performing stocks (with negative abnormal returns) at the beginning of the next day.

Besides, another important update is the step size \(\tau_{t}\), calculated as (6), (7), and (8) for the three PAMR methods, respectively. The step size \(\tau_{t}\) adaptively controls the weights to be transferred by scaling the directional vector. One interesting term common to the three updates of \(\tau_{t}\) is \(\frac{\ell_{\epsilon}^{t}}{\Vert \mathbf{x}_{t}-\bar{x}_{t}\mathbf {1}\Vert ^{2}}\). The numerator of the term equals the tth portfolio daily return minus the mean reversion threshold. Assuming other variables are constant, if the return is high (low), it leads to a large (small) value of \(\tau_{t}\), which would more (less) aggressively transfer the wealth from better performing stocks to worse performing stocks. The denominator is essentially the market quadratic variability, that is, the number of stocks times the market variance of the tth trading day. In modern portfolio theory, variance of stock return is typically regarded as a volatility risk term for a portfolio (Markowitz 1952). As indicated by the denominator, if the risk is high (low), the step size \(\tau_{t}\) would become small (large). As a result of a small (large) step size, the weight transfer made by the update scheme will be weakened (strengthened), which is consistent with our intuition that prediction would not be accurate in drastically dropping markets, and we opt to make relatively less transfer in order to reduce risk. Moreover, PAMR-1 caps the step size by a constant C, while PAMR-2 decreases the step size by adding a constant \(\frac{1}{2C}\) to its denominator. Both measures can prevent drastic weight transfer in case of noisy price relatives, which is consistent with their motivations.
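The reading of the denominator as "number of stocks times market variance" can be checked numerically. The short sketch below uses made-up price relatives and the population variance from Python's standard library; the data are purely illustrative.

```python
from statistics import pvariance

x_t = [1.05, 0.98, 1.01, 0.96]            # hypothetical price relatives
m = len(x_t)
x_bar = sum(x_t) / m                       # market return
quadratic_variability = sum((x_i - x_bar) ** 2 for x_i in x_t)

# The denominator of tau_t equals m times the cross-sectional
# (population) variance of the price relatives on day t.
print(quadratic_variability, m * pvariance(x_t))
```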

From the above analysis on the updates of direction and step size, we can conclude that PAMR nicely balances between return and risk and clearly reflects the mean reversion trading idea. To the best of our knowledge, this important trade-off between return and risk has been considered by only one existing approach, that is, nonparametric kernel-based Markowitz-type strategy (Ottucsák and Vajda 2007). While the kernel-based Markowitz-type strategy trades off the return and risk with respect to similar historical price relatives, the proposed PAMR explicitly trades off the return and risk with respect to last price relatives. This nice property distinguishes the proposed approach from most existing approaches that often cater to return, but ignore the risk concern, and are therefore undesirable according to modern portfolio theory (Markowitz 1952).

Now let us briefly analyze the time complexity of the proposed PAMR algorithms. From Fig. 2, we can see that besides the normalization step, the PAMR strategy takes O(m) per trading day, where m denotes the number of assets. Moreover, the normalization or projection step (Step 7 in Fig. 2) can be efficiently implemented (Michelot 1986; Duchi et al. 2008). In our implementation, we adopt the projection (Footnote 2) according to Duchi et al. (2008), which takes linear time with respect to m. Thus, the total time complexity is O(mn), where n is the total number of trading days. Such time complexity is the same as that of the EG algorithm and is much superior to other existing methods. Linear time complexity enables the proposed algorithm to handle transactions in certain scenarios where low latency is of crucial importance, such as high frequency trading (Aldridge 2009).

4.5 Discussions

4.5.1 Discussion on intuitions

Although the motivating example in Sect. 4.1 demonstrates the effectiveness of PAMR over BCRP strategy, PAMR may not always outperform BCRP. In general, PAMR is an online algorithm while BCRP is offline optimal for an i.i.d. market (see Cover and Thomas 1991, Theorem 15.3.1). Next, we discuss some possible situations where PAMR may fail to outperform BCRP.

Consider a special case where one stock crashes and the other explodes, e.g., a market sequence of two stocks as \((\frac{1}{2}, 2 ),(\frac{1}{2}, 2 ), \ldots\) . Assuming the same parameter settings as the motivating example, BCRP will grow at an exponential rate \(2^{n}\) as it wholly invests in the 2nd asset, while PAMR will keep a fixed wealth of \(\frac{5}{4}\) over the trading period. Obviously, in such a situation, PAMR performs much worse than BCRP does, i.e., PAMR produces a cumulative wealth of \(\frac{5}{4}\) against \(2^{n}\) achieved by BCRP over an n-day trading period. Though not impressive in such situations, PAMR still bounds its losses. Moreover, such a market, which violates the mean reversion assumption, is rare, at least from the viewpoint of our empirical studies.

4.5.2 Discussion on loss function

In our definition of the loss function, that is, (1), we use the original portfolio expected return \(\mathbf{b}\cdot\mathbf{x}_{t}\), while it is possible to use log utility (Latané 1959) on the return, that is, \(\log (\mathbf{b}\cdot\mathbf{x}_{t})\). With this log utility, the optimization problems (2), (3), and (4) are all non-convex and nonlinear, and thus difficult to solve. One way to solve these non-convex optimization problems is to use the first-order Taylor expansion of the log at the last portfolio and ignore the higher order terms, that is, \(\log (\mathbf{b}\cdot\mathbf{x}_{t}) \approx\log (\mathbf{b}_{t}\cdot\mathbf{x}_{t})+\frac{\mathbf{x}_{t}}{\mathbf{b}_{t}\cdot\mathbf{x}_{t}}\cdot (\mathbf {b}-\mathbf{b}_{t} )\). After linear approximation, the optimization problems can be solved using the same techniques used in our derivation. However, such linear approximation of the loss function may have some drawbacks. First of all, linear approximation yields an upper bound on regret in terms of a log utility loss function. There is no way to justify the goodness of the linear approximation. Moreover, if we use log utility, then the loss function is flat, then sharply rises and finally flattens out. While linear approximation is good in the two flat regimes, it is typically terrible at the point of non-differentiability and sub-par in the sharply rising region.

On the other hand, for the loss function in the form of (1) without log utility or with linear approximation of log utility, the best possible regret in a minimax sense is at most O(\(\sqrt{n}\)) (Abernethy et al. 2009), while a true log loss minimization algorithm can routinely achieve O(\(\log n\)). However, although our loss function is non-differentiable and would achieve a potential regret of O(\(\sqrt{n}\)), it is not a traditional loss function maximizing return (like the traditional loss function \(-\log (\mathbf{b}\cdot\mathbf{x}_{t})\)), but only a tool to realize mean reversion. Thus the regret achieved using our loss function does not represent a regret about return, and may not be as meaningful as a traditional regret bound is.

In any case, the potentially worse bound may hide unknown weaknesses, which may not be revealed by the following empirical evaluations. Though PAMR works well in our experiments, anyone who cares about its theoretical aspects should be aware of the possibly worse bound.

4.5.3 Discussion on formulation

Although our formulations mainly focus on the portfolio daily return without explicitly dealing with risk (e.g., volatility of daily returns), the final derived algorithms can be nicely interpreted as certain trade-offs between risk and return, as discussed in Sect. 4.4. Such interesting observation is further verified by our empirical evaluation in Sect. 5.4.2, which shows that the proposed PAMR algorithms achieve good risk-adjusted return in terms of two risk-related metrics (i.e., volatility risk and drawdown risk, respectively).

Similar to previous studies, we avoid incorporating transaction cost in the original formulations, which simplifies the formulations and clearly highlights PAMR’s key ingredients. Nevertheless, the effect of transaction costs is not difficult to evaluate, as shown in Sect. 5.2.2. In the following empirical study, we present results on both cases: with and without transaction costs. From the empirical results in Sect. 5.4.5, we find that in most markets, the proposed PAMR algorithms work well without or even with moderate transaction costs.

Besides, it is important to note that there are two key parameters in the proposed PAMR algorithm and its variants, viz., the sensitivity parameter ϵ and the aggressiveness parameter C. In practice, the choice of these parameters could affect the performance of the proposed algorithms. To achieve a good performance in a specific market, these parameters have to be finely tuned. We will thoroughly examine the effects of the two parameters on real-life datasets in Sect. 5.4.4, and make suggestions for the empirical selection of their values.

4.5.4 Discussion on PAMR variants

In this section, we give an example to illustrate the different behaviors of the three update rules, viz., PAMR, PAMR-1, and PAMR-2. As discussed in Sect. 4.2, one objective of PAMR-1 and PAMR-2 is to prevent the portfolio from being affected too much by noisy price relatives, which might drastically change the portfolio. Let us assume the following environment and parameter settings. Let the tth price relative be \(\mathbf{x}_{t}=(1.00,0.01)\), in which the 2nd price relative is noise, and let the tth portfolio be \(\mathbf{b}_{t}=(1,0)\). Setting the parameters ϵ=0.30 and C=1.00, let us calculate the next portfolio \(\mathbf{b}_{t+1}\). This market environment describes situations where certain price relatives drop significantly, similar to some stocks during the recent financial crisis. Without tuning, the original PAMR algorithm would transfer a large proportion of wealth to the 2nd asset on the next trading day. This can be verified by examining the portfolio calculated by PAMR, viz., PAMR calculates the update step size \(\tau_{t}=1.43\) and obtains the subsequent portfolio \(\mathbf{b}_{t+1}=(0.29,0.71)\). However, a natural way to guard against such noisy price relatives is to put a smaller proportion of wealth into the second asset. Now, when calculating the next portfolios by PAMR-1 and PAMR-2, we obtain the update step sizes \(\tau_{t}=1.00\) and \(\tau_{t}=0.71\), respectively, which are smaller than the update step size of the original PAMR, that is, \(\tau_{t}=1.43\). Accordingly, we obtain the next portfolios \(\mathbf{b}_{t+1}=(0.50,0.50)\) and \(\mathbf{b}_{t+1}=(0.65,0.35)\) for PAMR-1 and PAMR-2, respectively. Clearly, PAMR-1 and PAMR-2 transfer less wealth to the 2nd asset than the original PAMR does. Thus, PAMR-1 and PAMR-2 in general suffer relatively less from noisy price relatives, though we cannot completely avoid such situations.
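The numbers in this example follow directly from (6), (7), and (8). The short Python sketch below recomputes them; variable names are illustrative, and no simplex projection is applied because all three updated portfolios already lie in the simplex.

```python
x_t = (1.00, 0.01)
b_t = (1.0, 0.0)
eps, C = 0.30, 1.00

x_bar = sum(x_t) / len(x_t)
deviation = [x_i - x_bar for x_i in x_t]             # x_t - x_bar * 1
loss = max(0.0, sum(b_i * x_i for b_i, x_i in zip(b_t, x_t)) - eps)
sq_norm = sum(d * d for d in deviation)

step_sizes = {
    "PAMR": loss / sq_norm,                          # (6)
    "PAMR-1": min(C, loss / sq_norm),                # (7)
    "PAMR-2": loss / (sq_norm + 1.0 / (2.0 * C)),    # (8)
}
for name, tau in step_sizes.items():
    b_next = [b_i - tau * d for b_i, d in zip(b_t, deviation)]
    print(name, round(tau, 2), [round(w, 2) for w in b_next])
```

Up to rounding in the last digit, the printed step sizes and portfolios match the values quoted in the text.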

4.6 Mixture algorithm

One theoretical result desired by existing online portfolio selection algorithms is the universal property (Cover 1991). Since the mean reversion trading idea is counter-intuitive (Borodin et al. 2004), we find it hard to prove the universality of PAMR. Alternatively, we present a general mixture algorithm, which guarantees worst-case performance, not for PAMR itself but for the mixture algorithm.

Briefly speaking, the proposed mixture algorithm frames PAMR as one “expert” in a mixture-of-experts setting, while at least one universal algorithm serves as one of the other “experts”. Then, the proposed mixture adopts no-regret expert learning (Cesa-Bianchi and Lugosi 2006) to bound the regret of the overall system with respect to the best of these experts. If the mixture algorithm contains at least one universal algorithm (Footnote 3), then the universality of the mixture algorithm can be straightforwardly proved according to Cesa-Bianchi and Lugosi (2006) (see Example 10.3 and Theorem 10.3 for rigorous proofs). In our implementation, we adopt the uniform buy and hold (BAH) mixture strategy, that is, we give an equal proportion of portfolio wealth to each expert, let them run, and finally pool them again. We denote the BAH mixture algorithm as “MIX”. Other expert learning methods, such as exponential weighting, can also replace the buy and hold strategy, and they can also provide provable guarantees and potentially stronger empirical performance. Though MIX seems trivial since it has a more involved mixing rule, one can make it nontrivial by extending it to a more general setting, such as the framework proposed by Akcoglu et al. (2002) and Das and Banerjee (2011). Obviously, such a mixture algorithm can be applied to any portfolio selection algorithm, either universal or not.
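The uniform BAH mixing rule described above is simple enough to sketch in a few lines of Python. The function name and the input layout (one list of daily returns per expert) are assumptions made for illustration.

```python
def bah_mixture_wealth(expert_daily_returns):
    """Cumulative wealth of a uniform buy-and-hold mixture of experts.

    expert_daily_returns[k][t] is the daily return (wealth ratio) of
    expert k on trading day t; the initial wealth is 1.
    """
    k = len(expert_daily_returns)
    total = 0.0
    for returns in expert_daily_returns:
        wealth = 1.0 / k              # equal share of the initial wealth
        for r in returns:
            wealth *= r               # each expert runs untouched
        total += wealth               # pool the experts at the end
    return total


# Two hypothetical experts: one doubles every day, one stays flat.
print(bah_mixture_wealth([[2.0, 2.0], [1.0, 1.0]]))  # 2.5
```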

Though it is convenient to propose a mixture model consisting of PAMR such that the mixture model can achieve universality, PAMR’s universal consistency is still an open question and deserves further exploration.

5 Numerical experiments

To examine the empirical efficacy of the proposed PAMR strategy, we conduct an extensive set of numerical experiments on a variety of real datasets. In our experiments, we adopt six real datasets, which were collected from several diverse financial markets. The performance metrics include cumulative wealth and risk-adjusted returns (volatility risk and drawdown risk). We also compare the proposed PAMR algorithms with all existing algorithms stated in the related work section.

5.1 Experimental testbed on real data

In this study, we focus on historical daily prices in stock markets which are easy to obtain from public domains (such as Yahoo Finance and Google Finance), and thus publicly available to other researchers. Data from other types of markets, such as high frequency intra-day quotes and Forex markets, are either too expensive or hard to obtain and process, and thus may reduce the experimental reproducibility. In general, we employ six real and diverse datasets from several types of financial markets (Footnote 4), which are summarized in Table 3.

Table 3 Summary of the six real datasets in our numerical experiments

The first one is the NYSE dataset, one “standard” dataset pioneered by Cover (1991) and followed by several other researchers (Singer 1997; Helmbold et al. 1996; Borodin et al. 2004; Agarwal et al. 2006; Györfi et al. 2006, 2008). This dataset contains 5651 daily price relatives of 36 stocks (Footnote 5) in the New York Stock Exchange (NYSE) for a 22-year period from Jul. 3rd 1962 to Dec. 31st 1984. We denote this dataset by “NYSE (O)” for short.

The second dataset is the extended version of the above NYSE dataset. For consistency, we collected the latest data in the New York Stock Exchange (NYSE) from Jan. 1st 1985 to Jun. 30th 2010, which consists of 6431 trading days. We denote this new dataset as “NYSE (N)” (Footnote 6). It is worth noting that this new dataset consists of 23 stocks rather than the previous 36 stocks owing to amalgamations and bankruptcies. All self-collected price relatives are adjusted for splits and dividends, which is consistent with the previous “NYSE (O)” dataset.

The third dataset “TSE” is collected by Borodin et al. (2004), which consists of 88 stocks from the Toronto Stock Exchange (TSE) containing price relatives of 1259 trading days, ranging from Jan. 4th 1994 to Dec. 31st 1998. The fourth dataset “SP500” is collected by Borodin et al. (2004), which consists of the 25 stocks with the largest market capitalizations among the 500 SP500 components. It ranges from Jan. 2nd 1998 to Jan. 31st 2003, containing 1276 trading days.

The fifth dataset is “MSCI”, a collection of global equity indices which are the constituents of the MSCI World Index (Footnote 7). It contains 24 indices which represent the equity markets of 24 countries around the world, and consists of a total of 1043 trading days, ranging from Apr. 1st 2006 to Mar. 31st 2010. The final dataset is the “DJIA” dataset collected by Borodin et al. (2004), which consists of Dow Jones 30 composite stocks. DJIA contains 507 trading days, ranging from Jan. 14th 2001 to Jan. 14th 2003.

Besides the above six real market datasets, in the experiments, we also ran each dataset in reverse (Borodin et al. 2004). For each dataset, we created a reversed dataset, which reverses the original order and inverts the price relatives. We denote these reverse datasets using a ‘−1’ superscript on the original dataset names. By nature, these reverse datasets are quite different from the original datasets, and we are interested in the behaviors of the proposed algorithm on these artificial datasets.
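Constructing a reversed dataset amounts to two operations, which the following Python sketch makes explicit. The function name and the list-of-lists layout (one row of price relatives per trading day) are illustrative assumptions.

```python
def reverse_dataset(price_relatives):
    """Reverse the order of trading days and invert every price relative."""
    return [[1.0 / x_i for x_i in day] for day in reversed(price_relatives)]


original = [[0.5, 2.0], [4.0, 0.25]]      # two hypothetical trading days
print(reverse_dataset(original))          # [[0.25, 4.0], [2.0, 0.5]]
```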

Unlike the previous studies, the above testbed covers much longer trading periods from 1962 to 2010 and much more diversified markets, which enables us to examine how the proposed PAMR strategy performs under different events and crises. For example, it covers several well-known events in the stock markets, such as the dot-com bubble from 1995 to 2000 and the subprime mortgage crisis from 2007 to 2009. The five stock datasets are mainly chosen to test the capability of the proposed PAMR on regional stock markets, while the “MSCI” dataset aims to test PAMR’s capability on global indices, which may be potentially applicable to “Fund on Fund” (FOF; see Footnote 8). As a remark, although we numerically test the PAMR algorithm on stock markets, we note that the proposed strategy could be generally applied to any type of financial market.

5.2 Experimental setup and metrics

Regarding the parameter settings, there are two key parameters in the proposed PAMR algorithms. One is the sensitivity parameter ϵ and the other is the aggressiveness parameter C. Roughly speaking, the best values for these parameters are often dataset dependent. In the experiments, we simply set these parameters empirically without tuning for each dataset separately. Specifically, for all datasets and experiments, we set the sensitivity parameter ϵ to 0.5 in the three algorithms, and set the aggressiveness parameter C to 500 in both PAMR-1 and PAMR-2, with which the cumulative wealth achieved tends to be stable for the proposed PAMR on most datasets. It is worth noting that these choices for parameters are not always the best. Our experiments on the parameter sensitivity in Sect. 5.4.4 show that the proposed PAMR algorithms are quite robust with respect to different parameter settings.

For the proposed mixture algorithm (MIX), we set the expert pool (Footnote 9) as an initial uniform combination of PAMR, ONS, Anticor, and BNN, and individual experts are set according to their respective studies.

We adopt the most common metric, cumulative wealth, to primarily compare different trading strategies. In addition to the cumulative wealth, we also adopt annualized Sharpe Ratio (SR) to compare the performance of different trading algorithms. In general, the higher the values of the cumulative wealth, and the annualized Sharpe Ratio, the better the performance of the compared algorithm. Besides, we also adopt Maximum Drawdown (MDD) and Calmar Ratio (CR) for analyzing the downside risk of the PAMR strategy. The lower the MDD value, the more preferable the trading algorithm concerning the downside risk. The higher the CR value, the more performance efficient the trading algorithm concerning the downside risk. The performance criteria are detailed in the following section.

5.2.1 Performance criteria

One of the standard criteria to evaluate the performance of a strategy is the portfolio cumulative wealth achieved by the strategy until the end of the whole trading period. In our study, we simply set the initial wealth \(\mathbf{S}_{0}=1\) and thus the notation \(\mathbf{S}_{n}\) also denotes the portfolio cumulative return at the end of the nth trading day, which is the ratio of the portfolio cumulative wealth to the initial wealth. Another equivalent criterion is annualized percentage yield (APY) which takes the compounding effect into account, that is, \({\mathrm{APY}}=\sqrt[y]{\mathbf{S}_{n}}-1\), where y is the number of years corresponding to n trading days. APY measures the average wealth increment that one strategy could achieve compounded in a year. Typically, the higher the value of portfolio cumulative wealth or annualized percentage yield, the more preferable the trading strategy is.

For some process-dependent investors (Moody et al. 1998), it is important to evaluate risk and risk-adjusted return of portfolios (Sharpe 1963, 1994). One common way to achieve this is to use the annualized standard deviation of daily returns to measure the volatility risk and the annualized Sharpe Ratio (SR) to evaluate the risk-adjusted return. For portfolio risk, we calculate the standard deviation of daily returns, and multiply by \(\sqrt{252}\) (here 252 is the average number of annual trading days) to obtain the annualized standard deviation. For risk-adjusted return, we calculate the annualized Sharpe Ratio according to \({\mathrm{SR}}=\frac{{\mathrm{APY}}-R_{f}}{\sigma_{p}}\), where \(R_{f}\) is the risk-free return (typically the return of Treasury bills, fixed at 4% in this work), and \(\sigma_{p}\) is the annualized standard deviation of daily returns. Basically, higher annualized Sharpe Ratios indicate better performance of a trading strategy concerning the volatility risk.
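The following Python sketch computes APY, annualized volatility, and the annualized Sharpe Ratio from a list of portfolio daily returns, following the formulas above. The function name is hypothetical, and the use of the sample (n−1) standard deviation is an assumption, since the text does not specify the estimator.

```python
import math


def annualized_metrics(daily_returns, risk_free=0.04, days_per_year=252):
    """Return (APY, annualized volatility, annualized Sharpe Ratio)."""
    n = len(daily_returns)
    cumulative_wealth = math.prod(daily_returns)     # S_n with S_0 = 1
    years = n / days_per_year
    apy = cumulative_wealth ** (1.0 / years) - 1.0   # y-th root of S_n, minus 1
    mean = sum(daily_returns) / n
    # Assumption: sample standard deviation of daily returns.
    daily_std = math.sqrt(sum((r - mean) ** 2 for r in daily_returns) / (n - 1))
    sigma_p = daily_std * math.sqrt(days_per_year)
    return apy, sigma_p, (apy - risk_free) / sigma_p


# One hypothetical year alternating +1% and -1% days.
apy, sigma_p, sharpe = annualized_metrics([1.01, 0.99] * 126)
print(apy, sigma_p, sharpe)
```

In this toy example the strategy loses slightly over the year (each pair of days multiplies wealth by 0.9999), so the Sharpe Ratio is negative.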

The investment community often analyzes DrawDown (DD) (Magdon-Ismail and Atiya 2004) to measure the decline from a historical peak in the cumulative wealth achieved by a financial trading strategy. Formally, let \(\mathbf{S}(\cdot)\) denote the process of cumulative wealth achieved by a trading strategy, that is, \(\{\mathbf{S}_{1},\ldots,\mathbf{S}_{t},\ldots,\mathbf{S}_{n}\}\). The DrawDown at any time t is defined as \(\mathrm{DD}(t)=\max [0,\max_{i\in(0,t)}\mathbf{S}(i)-\mathbf{S}(t) ]\). The Maximum DrawDown for a horizon n, MDD(n), is defined as \(\mathrm{MDD}(n)=\max_{t\in(0,n)} [\mathrm{DD}(t) ]\), which is an excellent way to measure the downside risk of different strategies. Moreover, we also adopt the Calmar Ratio (CR) to measure the return relative to the drawdown risk of a portfolio, calculated as \({\mathrm{CR}} = \frac{\mathrm{APY}}{\mathrm{MDD}}\). Generally speaking, the smaller the Maximum DrawDown, the more tolerable the downside risk of the financial trading strategy. Higher Calmar Ratios indicate better performance of a trading strategy concerning the drawdown risk.
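A minimal Python sketch of the two drawdown-related quantities is given below. It implements DD exactly as written above, that is, as an absolute decline in cumulative wealth; a relative (percentage) drawdown is a common alternative but is not what the formula states. Function names are illustrative.

```python
def max_drawdown(wealth):
    """MDD(n): largest decline of cumulative wealth from a running peak."""
    peak, mdd = wealth[0], 0.0
    for s in wealth:
        peak = max(peak, s)           # historical peak up to the current day
        mdd = max(mdd, peak - s)      # DD(t) = max[0, peak - S(t)]
    return mdd


def calmar_ratio(apy, mdd):
    """CR = APY / MDD (undefined when the wealth never declines, i.e. MDD = 0)."""
    return apy / mdd


wealth_path = [1.0, 1.5, 1.25, 2.0, 1.0, 2.5]   # hypothetical S_1, ..., S_n
print(max_drawdown(wealth_path))                 # 1.0 (from the peak 2.0 down to 1.0)
```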

To test whether simple luck can generate the return of the proposed strategy, we can also conduct a statistical test to measure the probability of this situation, as is popularly done in the fund management industry (Grinold and Kahn 1999). First, we separate the portfolio daily returns into two components: one benchmark-related and the other non-benchmark-related, by regressing the portfolio excess returns (Footnote 10) against the benchmark excess returns. Formally, \(s_{t}-s_{t}(F)=\alpha+\beta (s_{t}(B)-s_{t}(F) )+\epsilon(t)\), where \(s_{t}\) stands for the portfolio daily returns, \(s_{t}(B)\) denotes the daily returns of the benchmark (market index) and \(s_{t}(F)\) is the daily returns of the risk-free assets (here we simply choose Treasury bill and set it to 1.000156, or equivalently, annual interest of 4%). This regression estimates the portfolio’s alpha (α), which indicates the performance of the investment after accounting for the involved risk. Then we conduct a statistical t-test to evaluate whether alpha is significantly different from zero, by using the t statistic \(\frac{\alpha}{\mathrm{SE} (\alpha )}\), where SE(α) is the standard error for the estimated alpha. Thus, by assuming the alpha is normally distributed, we can obtain the probability that the returns of the proposed strategy are generated by simple luck. Generally speaking, the smaller the probability, the higher the confidence in the trading strategy.
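The regression and t statistic described above can be sketched with ordinary least squares in plain Python. This is an illustrative sketch under stated assumptions: the function name is hypothetical, the standard error uses the textbook simple-regression formula, and the probability is taken as a one-sided tail of the normal distribution, which is one possible reading of "generated by simple luck".

```python
import math


def alpha_t_test(portfolio, benchmark, risk_free=1.000156):
    """Regress portfolio excess returns on benchmark excess returns.

    Returns (alpha, beta, t_statistic, one_sided_p_value).
    """
    y = [s - risk_free for s in portfolio]     # s_t - s_t(F)
    x = [s - risk_free for s in benchmark]     # s_t(B) - s_t(F)
    n = len(y)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    s_xx = sum((x_i - x_mean) ** 2 for x_i in x)
    s_xy = sum((x_i - x_mean) * (y_i - y_mean) for x_i, y_i in zip(x, y))
    beta = s_xy / s_xx
    alpha = y_mean - beta * x_mean
    sse = sum((y_i - alpha - beta * x_i) ** 2 for x_i, y_i in zip(x, y))
    residual_var = sse / (n - 2)
    se_alpha = math.sqrt(residual_var * (1.0 / n + x_mean ** 2 / s_xx))
    t_stat = alpha / se_alpha
    # Assumption: normal approximation, one-sided tail probability.
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))
    return alpha, beta, t_stat, p_value
```

For short return series a Student t distribution with n−2 degrees of freedom would be the more conservative choice; the normal tail is used here only because the text assumes a normally distributed alpha.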

5.2.2 Practical issues

While our model described in Sect. 2 is concise and not complicated to understand, it omits some practical issues in the portfolio management industry. We shall now relax some constraints in our model to address these issues.

In reality, an important and unavoidable issue is transaction cost. Generally, there are two ways to handle transaction costs. The first, commonly adopted by learning to select portfolio strategies, is that the portfolio selection process does not take the transaction cost into account while the subsequent rebalancing incurs transaction costs. The second is that the transaction cost is directly involved in the portfolio selection process (Györfi and Vajda 2008). In this work, we take the first way and adopt the proportional transaction cost model proposed in Blum and Kalai (1999) and Borodin et al. (2004). To be specific, rebalancing the portfolio incurs a transaction cost on every buy and sell operation, based upon a transaction cost rate γ∈(0,1). At the beginning of the tth trading day, the portfolio manager rebalances the portfolio from the previous closing price adjusted portfolio \({\hat{\mathbf{b}}}_{t-1}\) to a new portfolio \(\mathbf{b}_{t}\), incurring a transaction cost of \(\frac{\gamma}{2} \times \sum_{i}{\vert b_{(t,i)}-\hat{b}_{(t-1, i)}\vert }\), where the initial portfolio is set to (0,…,0). Thus, the cumulative wealth achieved by the end of the nth trading day can be expressed as:

$${\mathbf{S}}_{n}^{c (\gamma )}={\mathbf{S}}_0\prod_{t=1}^{n} \biggl[ ({\mathbf{b}}_t\cdot{\mathbf{x}}_t )\times \biggl(1-\frac{\gamma}{2} \times\sum_{i}{\big \vert b_{(t, i)}-\hat{b}_{(t-1,i)}\big \vert } \biggr) \biggr].$$
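The wealth formula above can be evaluated with a short loop, sketched here in Python. The function name is hypothetical, and the closing price adjusted portfolio is assumed to follow the standard definition \(\hat{b}_{(t,i)}=b_{(t,i)}x_{(t,i)}/(\mathbf{b}_{t}\cdot\mathbf{x}_{t})\), which this section does not restate.

```python
def wealth_with_costs(portfolios, price_relatives, gamma, initial_wealth=1.0):
    """Cumulative wealth under a proportional transaction cost rate gamma."""
    m = len(price_relatives[0])
    adjusted = [0.0] * m                      # initial portfolio (0, ..., 0)
    wealth = initial_wealth
    for b, x in zip(portfolios, price_relatives):
        turnover = sum(abs(b_i - a_i) for b_i, a_i in zip(b, adjusted))
        daily_return = sum(b_i * x_i for b_i, x_i in zip(b, x))
        wealth *= daily_return * (1.0 - gamma / 2.0 * turnover)
        # Assumed definition: weights drift with the day's price relatives.
        adjusted = [b_i * x_i / daily_return for b_i, x_i in zip(b, x)]
    return wealth


# Flat market, uniform portfolio, 1% cost rate: only the initial purchase is charged.
print(wealth_with_costs([[0.5, 0.5]] * 2, [[1.0, 1.0]] * 2, gamma=0.01))
```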

Another practical issue in portfolio selection is margin buying, which allows the portfolio managers to buy securities with cash borrowed from security brokers. Following previous studies (Cover 1991; Helmbold et al. 1996; Agarwal et al. 2006), we relax this constraint in the model and evaluate it empirically in Sect. 5.4.5. In this study, the margin setting is assumed to be 50% down and 50% loan, at an annual interest rate of 6%, so the daily interest rate of the borrowed money, c, is set to 0.000238. Thus, for each security in the asset pool, a new asset named “Margin Component” is generated. Following the down and loan percentage, the price relative for the “Margin Component” of asset i would be \(2x_{ti}-1-c\), where \(x_{ti}\) is the price relative of the ith asset for the tth trading day. In cases of \(x_{ti}\leq\frac {1+c}{2}\), that is, when a stock drops by more than half, we simply set the “Margin Component” to 0. By adding this “Margin Component”, we magnify both the potential profit and loss of the trading strategy on the ith asset.
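The construction of the “Margin Component” assets can be sketched in a few lines. The helper name `add_margin_components` is hypothetical; the formula and the floor at 0 follow the description above.

```python
import numpy as np

def add_margin_components(price_relatives, c=0.000238):
    """Append one margin asset per original asset.

    price_relatives has shape (n_days, n_assets); the result has
    shape (n_days, 2 * n_assets).
    """
    x = np.asarray(price_relatives, dtype=float)
    # 2*x - 1 - c is non-positive exactly when x <= (1 + c) / 2,
    # in which case the margin component is set to 0.
    margin = np.maximum(2.0 * x - 1.0 - c, 0.0)
    return np.hstack([x, margin])
```

For instance, a price relative of 1.1 yields a margin price relative of 1.199762, while 0.4 yields 0.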

5.3 Comparison approaches

In our experiments, we implement the proposed PAMR strategy and its two variants, viz., PAMR-1 and PAMR-2. We compare them with a number of benchmarks and existing strategies as described in Sect. 3. Below we summarize the list of compared algorithms, whose parameters are set according to the recommendations from their respective studies.

  1. Market: Market strategy, that is, uniform Buy-And-Hold (BAH) strategy;
  2. Best-Stock: Best stock in the market, which is a strategy in hindsight;
  3. BCRP: Best Constant Rebalanced Portfolios strategy in hindsight;
  4. UP: Cover’s Universal Portfolios implemented according to Kalai and Vempala (2002), where the parameters are set as \(\delta_0=0.004\), δ=0.005, m=100, and S=500;
  5. EG: Exponential Gradient (EG) algorithm with the best parameter η=0.05 as suggested by Helmbold et al. (1996);
  6. ONS: Online Newton Step (ONS) with the parameters suggested by Agarwal et al. (2006), that is, η=0, β=1, \(\gamma=\frac{1}{8}\);
  7. SP: Switching Portfolios with parameter \(\gamma=\frac{1}{4}\) as suggested by Singer (1997);
  8. GRW: Gaussian Random Walk strategy with parameter σ=0.00005 recommended by Levina and Shafer (2008);
  9. M0: Prediction based algorithm M0 with parameter β=0.5 as suggested by Borodin et al. (2000);
  10. Anticor: BAH30(Anticor(Anticor)) as a variant of Anticor to smooth the performance, which achieves the best performance among the three solutions proposed by Borodin et al. (2004);
  11. BK: Nonparametric kernel-based moving window (BK) strategy with W=5, L=10 and threshold c=1.0, which has the best empirical performance according to Györfi et al. (2006);
  12. BNN: Nonparametric nearest neighbor based strategy (BNN) with parameters W=5, L=10 and \(p_{\ell}=0.02+0.5\frac{\ell-1}{L-1}\) as the authors suggested (Györfi et al. 2008).
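The first two baselines in the list can be computed directly from a matrix of price relatives. The following is a minimal sketch with hypothetical function names, assuming an initial wealth of 1 and no transaction costs.

```python
import numpy as np

def market_wealth(price_relatives):
    """Uniform Buy-And-Hold: split wealth equally once, then never trade."""
    x = np.asarray(price_relatives, dtype=float)   # shape (n_days, n_assets)
    return float(np.prod(x, axis=0).mean())

def best_stock_wealth(price_relatives):
    """Wealth of the single best asset, known only in hindsight."""
    x = np.asarray(price_relatives, dtype=float)
    return float(np.prod(x, axis=0).max())
```

The remaining strategies need their own update rules or a hindsight optimization (BCRP) and are not sketched here.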

5.4 Experimental results

5.4.1 Experiment 1: evaluation of cumulative wealth

We first compare the performance of the competing approaches based on their cumulative wealth. From the experimental results shown in Table 4, we can draw several observations below.

Table 4 Cumulative wealth achieved by various trading strategies on the six datasets and their reversed datasets. The top two best results in each dataset are highlighted in bold font

First of all, we observe that learning to select portfolio strategies generally perform better than the three common benchmarks, which shows that it is promising to investigate learning algorithms for portfolio selection. Second, we find that although the cumulative wealth achieved by the regret minimization approaches (UP, EG and ONS) is higher than that of the market strategy, their performance is significantly lower than that achieved by the wealth maximization approaches (Anticor, BK and BNN). This shows that to achieve better investment return, it is more promising to exploit the wealth maximization approaches for portfolio selection. Third, from the top two results indicated on each original dataset, it is clear that the proposed PAMR strategy (PAMR, PAMR-1, and PAMR-2) significantly outperforms most competitors (except on the DJIA dataset), including Anticor, BK and BNN, which are the state of the art. The encouraging results in cumulative wealth validate the importance of exploiting the mean reversion property in the financial markets by an effective online learning strategy. On the other hand, though MIX beats the benchmarks on the DJIA dataset, the PAMR algorithms perform poorly on it. This may be because the motivating mean reversion does not exist in this dataset. This raises an important question: “How to select the portfolio pool such that the motivating mean reversion exists on the target portfolio?” Sect. 5.5.2 provides some discussion on this question.

Further examining the details, we find that the most impressive performance is achieved by PAMR on the standard NYSE (O) dataset, where its initial wealth grows by a factor of more than 5 quadrillion at the end of the 22-year period. We note that the main reason PAMR achieved such exceptional results is that it is effective at exploiting highly volatile price relatives. To verify this, we examine the detailed performance of PAMR in Table 4 by looking into individual stocks, and we find that it relies considerably on one single stock (“Kin Ark”), which has the highest volatility in terms of standard deviation. After removing this stock from the portfolio, we find that the cumulative wealth drops significantly to 1.27E+08. We will investigate the volatility issue in more detail in another experiment on dataset sensitivity in Sect. 5.4.3.

On the reverse datasets, though its results are not as striking as on the original datasets, PAMR also performs well. Though some algorithms fail badly, in all cases PAMR beats the benchmarks, including the market and BCRP strategies. In certain cases, it beats all competitors. It is worth noting that these reverse datasets are artificial datasets, which never exist in real markets. PAMR’s performance on these datasets provides strong evidence that mean reversion does exist even in reverse market datasets and that PAMR can successfully exploit it.

In addition to the final cumulative wealth, we are also interested in examining how the cumulative wealth changes over different trading periods. Figure 3 shows the trends of the cumulative wealth by the proposed PAMR algorithm and four algorithms (two benchmarks and two state-of-the-art algorithms). From the results, we can see that the proposed PAMR strategy consistently surpasses the benchmarks and the competing strategies over the entire trading period on most datasets (except DJIA dataset), which again validates the efficacy of the proposed technique.

Fig. 3 Trends of cumulative wealth achieved by various strategies during the entire trading periods on the stock datasets

Finally, to measure whether the excess return can be simply obtained by luck, we conduct a statistical t-test as described in Sect. 5.2.1. Table 5 shows the statistical results, which clearly show that the observed excess return is very unlikely to be obtained by simple luck on most datasets. To be specific, the probabilities of achieving the excess returns by luck are almost 0 on all datasets except DJIA. However, the statistics on the DJIA dataset show that the assumption of mean reversion may not hold in this dataset. Nevertheless, the results show that the PAMR strategy is a promising and reliable portfolio selection technique to achieve high return with high confidence.

Table 5 Statistical t-test of the performance of the PAMR on the stock datasets

5.4.2 Experiment 2: evaluation of risk and risk-adjusted return

We now evaluate the risk in terms of volatility risk and drawdown risk, and the risk-adjusted return in terms of annualized Sharpe ratio and Calmar ratio. Figure 4 shows the evaluation results on the six datasets. In addition to the proposed PAMR, we also plot two benchmarks (Market and BCRP) and two state-of-the-art algorithms (Anticor and BNN) for comparison. Figures 4(a) and 4(b) depict the volatility risk (standard deviation of daily returns) and the drawdown risk (maximum drawdown) on the six stock datasets. Figures 4(c) and 4(d) compare the corresponding Sharpe ratio and Calmar ratio.
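The four quantities plotted in Fig. 4 can be computed from a series of daily gross returns. The sketch below rests on several assumptions that this section does not restate: 252 trading days per year, a 4% annual risk-free rate, annualized volatility as daily standard deviation times \(\sqrt{252}\), Sharpe ratio as (annualized return − risk-free rate) / annualized volatility, and Calmar ratio as annualized return / maximum drawdown. The function name `risk_metrics` is hypothetical.

```python
import numpy as np

def risk_metrics(daily_returns, rf_annual=0.04, days_per_year=252):
    """Volatility, maximum drawdown, Sharpe and Calmar ratios (sketch)."""
    r = np.asarray(daily_returns, dtype=float)     # gross returns, e.g. 1.01
    curve = np.concatenate([[1.0], np.cumprod(r)]) # wealth curve starting at 1
    years = r.size / days_per_year
    apy = curve[-1] ** (1.0 / years) - 1.0         # annualized return
    vol = r.std(ddof=1) * np.sqrt(days_per_year)   # annualized volatility
    peak = np.maximum.accumulate(curve)            # running maximum of wealth
    mdd = float(np.max(1.0 - curve / peak))        # maximum drawdown
    sharpe = (apy - rf_annual) / vol
    calmar = apy / mdd if mdd > 0 else float("inf")
    return {"volatility": float(vol), "mdd": mdd,
            "sharpe": float(sharpe), "calmar": float(calmar)}
```

For the toy series 1.1, 0.5, 2.0 the wealth curve is 1, 1.1, 0.55, 1.1, so the maximum drawdown is 0.5.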

Fig. 4 Risk and risk-adjusted performance of various strategies on the six different datasets. In each diagram, the rightmost bars represent the results achieved by PAMR

In the previous cumulative wealth results, we find that PAMR achieved the highest cumulative return on most original datasets. Of course, high return is associated with high risk, which is commonly accepted in finance, as no real financial instrument can guarantee a high return without risk. The volatility risk in Fig. 4(a) shows that PAMR has nearly the highest volatility risk. On the other hand, the drawdown risk in Fig. 4(b) shows that PAMR has modest drawdown risk on most datasets. These results validate the above notion that high return is often associated with high risk.

To further evaluate the return and risk, we examine the risk-adjusted return in terms of annualized Sharpe ratio and Calmar ratio. The results shown in Figs. 4(c) and 4(d) clearly show that PAMR achieves excellent performance in most cases, except DJIA dataset. These encouraging results show that PAMR is able to reach a good trade-off between return and risk, even though we do not explicitly consider risk in our problem formulation.

5.4.3 Experiment 3: dataset sensitivity

As observed in Sect. 5.4.1, PAMR gained excess return from the stock markets. In this section, we examine how sensitive the proposed PAMR strategy is to the dataset by evaluating its performance on datasets of different volatilities.

To examine the effect of the dataset volatility, we create two datasets each consisting of 5 stocks, chosen from NYSE (N) dataset according to their volatility values. To be specific, we ranked the 23 stocks based on their daily volatility values measured by standard deviation of the logarithm of the price relatives (Hull 2008). Then we created two datasets of different volatility: NYSE (H) and NYSE (L), each consisting of 5 stocks of the highest and lowest volatility values, respectively. Table 6 shows the results achieved by various strategies on these two datasets.
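The selection of the high- and low-volatility subsets can be sketched as below. The function name `split_by_volatility` is hypothetical; volatility is measured as the sample standard deviation of the logarithm of the price relatives, as described above.

```python
import numpy as np

def split_by_volatility(price_relatives, k=5):
    """Return column indices of the k most and k least volatile assets."""
    x = np.asarray(price_relatives, dtype=float)   # shape (n_days, n_assets)
    vol = np.log(x).std(axis=0, ddof=1)            # daily volatility per asset
    order = np.argsort(vol)                        # ascending volatility
    low = np.sort(order[:k])
    high = np.sort(order[-k:])
    return high, low, vol
```

Applied to the 23 NYSE (N) stocks with k=5, the two index sets would correspond to NYSE (H) and NYSE (L).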

Table 6 Cumulative wealth achieved by various strategies on portfolios of extreme volatilities. The “H/L ratio” column shows the ratio between the cumulative wealth achieved on the high-volatility dataset and that achieved on the low-volatility dataset

From the results, we find that different strategies perform differently on these two datasets. The regret minimization approaches (UP, EG and ONS) perform well regardless of the market volatilities, as the theoretical universal property suggests, while the wealth maximization approaches (Anticor, BK and BNN) and the proposed PAMR strategy achieved significantly higher cumulative wealth on NYSE (H), the high-volatility dataset. These results show that the volatility of datasets does considerably affect some algorithms, including the wealth maximization approaches and the proposed PAMR strategy. Specifically, we find that the proposed PAMR strategy could benefit much from a high-volatility dataset. For example, on the NYSE (L) dataset, the cumulative wealth achieved by the PAMR algorithm is about 132, which is significantly boosted to 1.35E+05 on the NYSE (H) dataset. To further examine which algorithm can benefit most from a high-volatility dataset, we calculate the “H/L ratio” value, which is the ratio of cumulative wealth achieved on the high-volatility dataset over that achieved on the low-volatility dataset. From the ratios, we can observe that the PAMR strategy obtained the highest H/L ratio, indicating that PAMR can benefit most from the high-volatility dataset among all the competing methods.
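As a worked instance of the H/L ratio, using the two PAMR figures quoted above (the symbols \(S^{\mathrm{H}}\) and \(S^{\mathrm{L}}\) are introduced here only for illustration, and the result is approximate because the low-volatility wealth is given as "about 132"):

$$\text{H/L ratio}=\frac{S^{\mathrm{H}}}{S^{\mathrm{L}}}\approx\frac{1.35\times10^{5}}{132}\approx1.02\times10^{3}.$$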

5.4.4 Experiment 4: parameter sensitivity

We now evaluate how different choices of parameters affect the performance of the proposed PAMR strategy. All three PAMR algorithms require setting the sensitivity parameter ϵ, while the aggressiveness parameter C is needed for PAMR-1 and PAMR-2.

First, we examine the effect of the sensitivity parameter ϵ on the cumulative wealth achieved by PAMR. As ϵ becomes greater than 1, PAMR degrades to the uniform CRP strategy and the wealth stabilizes at the level achieved by uniform CRP. Thus, we evaluate the effect of ϵ in the range of [0,1.5]. Figure 5 shows the cumulative wealth achieved by PAMR with varying ϵ and those of the two benchmarks, that is, the Market and BCRP strategies. Most results, except on the DJIA dataset, show that the cumulative wealth achieved by PAMR grows as ϵ approaches 0, that is, the more sensitive the strategy, the higher the wealth, which validates that the motivating mean reversion does exist in the stock markets. Moreover, in most cases, the cumulative wealth achieved by PAMR tends to stabilize as ϵ crosses certain dataset-dependent thresholds. As stated before, we choose ϵ=0.5 in the experiments, with which the cumulative wealth becomes stabilized in most cases. We also note that on some datasets PAMR with ϵ=0 achieves the best result. Though ϵ=0 means moving more weight to the worse performing stocks, it may not mean moving everything to the worst stock. On the one hand, the objectives in the formulations prevent the next portfolio from moving far from the last portfolio. On the other hand, PAMR-1 and PAMR-2 are designed to alleviate huge changes. In a word, these experimental results clearly show that the proposed algorithm is robust with respect to the mean reversion sensitivity parameter. For the failing case, DJIA, the mean reversion effect is different: as ϵ approaches 0, the cumulative wealth achieved by PAMR drops. This phenomenon can be interpreted as meaning that the motivating mean reversion does not exist in the DJIA dataset, at least in the sense of our motivation.

Fig. 5 Parameter sensitivity of the cumulative wealth achieved by PAMR with respect to sensitivity parameter ϵ

Second, we evaluate the other important parameter for both the PAMR-1 and PAMR-2 algorithms, that is, the aggressiveness parameter C. Figures 6 and 7 show the effects on the cumulative wealth of varying the sensitivity parameter ϵ from 0 to 1.5 and the aggressiveness parameter C from 50 to 5000, on PAMR-1 and PAMR-2, respectively. Each heat map indicates the cumulative wealth achieved by PAMR with a different combination of C and ϵ. The color bar on the right side of each heat map maps each color to a level of cumulative wealth. In most cases, except DJIA, we observe that as ϵ decreases and C increases, the cumulative wealth increases and then stabilizes as ϵ and C cross certain data-dependent thresholds. Moreover, we find that C does not have a significant effect on the cumulative wealth achieved. We also find that the proposed PAMR algorithms are not very sensitive to their parameters, since a wide range of values corresponds to the highest cumulative wealth. This again shows that the proposed PAMR strategy is robust with respect to its parameters. Similarly, the heat map on DJIA again shows that the mean reversion effect does not exist in that dataset, in the sense of our motivation.
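To make the roles of ϵ and C concrete, the following sketch shows one update step for the three variants. It assumes the update rules of Sect. 4, which are not restated in this section: loss \(\max(0, {\mathbf{b}}_t\cdot{\mathbf{x}}_t-\epsilon)\), step size τ equal to loss/‖x−x̄1‖² for PAMR, capped at C for PAMR-1, and loss/(‖x−x̄1‖²+1/(2C)) for PAMR-2, followed by projection onto the simplex. The function names and the default C=500 (an illustrative value within the tested range) are assumptions.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    rho = ks[u - (css - 1.0) / ks > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)

def pamr_update(b, x, eps=0.5, C=500.0, variant="PAMR"):
    """One passive aggressive mean reversion step (sketch)."""
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float)
    loss = max(0.0, float(b @ x) - eps)   # zero loss means a passive step
    dev = x - x.mean()
    sq = float(dev @ dev)
    if variant == "PAMR":
        tau = loss / sq if sq > 0 else 0.0
    elif variant == "PAMR-1":
        tau = min(C, loss / sq) if sq > 0 else 0.0
    elif variant == "PAMR-2":
        tau = loss / (sq + 1.0 / (2.0 * C))
    else:
        raise ValueError("unknown variant: " + variant)
    # Shift weight away from assets that did better than average.
    return project_to_simplex(b - tau * dev)
```

With a large ϵ the loss is zero and the portfolio is left unchanged, while a small C in PAMR-1 limits how far a single step can move the weights.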

Fig. 6 Parameter sensitivity of the cumulative wealth achieved by PAMR-1 with respect to sensitivity parameter ϵ and aggressiveness parameter C

Fig. 7 Parameter sensitivity of the cumulative wealth achieved by PAMR-2 with respect to sensitivity parameter ϵ and aggressiveness parameter C

5.4.5 Experiment 5: evaluation of practical issues

For a real-world application, there are some important practical issues for portfolio selection, including the issues of transaction cost and margin buying. This experiment aims to examine how these practical issues affect the proposed PAMR strategy.

First, transaction cost is an important and unavoidable issue that should be addressed in practice. In our experiment, we adopt proportional transaction cost model stated in Sect. 5.2.2 to test the effect of the transaction cost on the proposed PAMR strategy. Figure 8 depicts the effect of proportional transaction cost when PAMR is applied on the six datasets, where the transaction cost rate γ varies from 0 to 1%. We only present the results achieved by PAMR since the effect of its variants, that is, PAMR-1 and PAMR-2, is quite similar to that of PAMR. For comparison, we also plot the results achieved by two state-of-the-art strategies (Anticor and BNN) and the cumulative wealth achieved by the two benchmarks (BCRP and Market). Since BCRP is the target strategy for regret minimization approaches (UP, EG and ONS) and for consistency, we do not plot the results achieved by these approaches.

Fig. 8 Scalability of the cumulative wealth achieved by PAMR with respect to transaction cost rate (γ). The break-even transaction cost rates to the market index are about 0.7%, 0.4%, 0.1%, 0.3% and 0% on the six datasets, respectively

From the results shown in the figure, we can observe that PAMR can withstand reasonable transaction cost rates. For example, with a transaction cost rate of 0.2%, PAMR can beat the BCRP strategy on four datasets. The break-even transaction cost rates with respect to the market index range from 0.1% to 0.7% on the datasets, except DJIA. Since PAMR more actively reverts to the mean and thus results in more drastic portfolio changes, it surpasses Anticor with low or medium transaction costs while it underperforms Anticor with high transaction costs. On the other hand, it outperforms BNN in most cases. Note that the transaction cost rate in real markets is low (see footnote 11). This experiment clearly shows the practical applicability of the proposed PAMR strategy when we take transaction cost into consideration.
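A break-even rate such as those quoted above can be read off a sweep of γ values. The sketch below assumes the strategy's cumulative wealth has already been computed for an increasing grid of rates and interpolates log-wealth linearly between grid points; the function name `break_even_rate` and the interpolation scheme are illustrative choices, not taken from the experiments.

```python
import numpy as np

def break_even_rate(gammas, wealths, benchmark_wealth):
    """Smallest cost rate at which wealth falls to the benchmark's level.

    Returns 0.0 if the strategy is already at or below the benchmark
    without costs, and None if it stays above it over the whole grid.
    """
    g = np.asarray(gammas, dtype=float)
    lw = np.log(np.asarray(wealths, dtype=float))
    target = np.log(benchmark_wealth)
    if lw[0] <= target:
        return 0.0
    for i in range(1, g.size):
        if lw[i] <= target:
            # Linear interpolation of log-wealth between grid points.
            frac = (lw[i - 1] - target) / (lw[i - 1] - lw[i])
            return float(g[i - 1] + frac * (g[i] - g[i - 1]))
    return None
```

A return value of 0.0 matches the case where a strategy trails the market index even without transaction costs.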

Second, margin buying is another practical concern for a real-world portfolio selection task. In the following, we evaluate the performance of the approaches when margin buying is allowed with the model described in Sect. 5.2.2. Table 7 presents the cumulative wealth achieved by the competing approaches without/with margin loans on the six stock datasets. As we can observe, when margin buying is allowed, the profitability of PAMR increases, and in most cases, it achieves higher cumulative wealth than the other competing approaches. These results clearly demonstrate that the proposed PAMR strategy can be extended to handle margin buying and benefit from it, and thus has better practical applicability.

Table 7 Cumulative wealth achieved by various strategies on the stock datasets with/without margin loans (ML). Top two achievements on each dataset are highlighted

5.4.6 Experiment 6: evaluation of computational time cost

Our last experiment is to evaluate the computational time costs of different approaches, which is also an important issue in developing a practical online trading strategy. As stated in Sect. 4.3, the proposed PAMR algorithm enjoys linear time complexity per iteration, which is comparable to EG algorithm. Table 8 presents the computational time cost (in seconds) of the performance comparable approaches (Anticor, BK and BNN) on the six stock datasets. All the experiments were conducted on an Intel Core 2 Quad 2.66 GHz processor with 4 GB RAM, using Matlab 2009b on Windows XP.

Table 8 Computational time cost on the real datasets (in seconds)

From the results, we can clearly see that in all cases the proposed PAMR takes significantly less computational time than the three performance comparable strategies. Even though the computational time in the back tests, especially per trading day, is small, it is important in certain scenarios such as high frequency trading (Aldridge 2009), where transactions may occur in a fraction of a second. The results clearly demonstrate the computational efficiency of the proposed PAMR strategy, which is also an important concern for real-world large-scale applications.

5.5 Discussions and threats to validity

5.5.1 Discussion on model assumption

Any statement about such encouraging empirical results would be incomplete without acknowledging the simplified assumptions made in Sect. 2. To recall, we had made several assumptions regarding transaction cost, market liquidity and market impact, which would affect the practical deployment of the proposed algorithm.

The first assumption is that no transaction cost exists. In Sect. 5.4.5 we have already examined the effect of varying transaction costs, and the results show that the proposed algorithm can withstand moderate transaction costs. Currently, with the widespread adoption of electronic communication networks (ECNs) and multilateral trading facilities (MTFs) on financial markets, various online trading brokers charge very small transaction cost rates, especially for large institutional investors. They also use a flat rate (see footnote 12), based on the volume threshold one reaches. Such measures can help portfolio managers lower their transaction cost rates.

The second assumption is that the market is liquid and one can buy and sell any quantity at the quoted price. In practice, low market liquidity results in a large bid-ask spread, that is, the gap between prices quoted for an immediate bid and an immediate ask. As a result, the execution of orders may incur a discrepancy between the prices sent by the algorithm and the prices actually executed. Moreover, stocks are often traded in multiples of a lot, which is the standard trading unit containing a certain number of stock shares. In this situation, the quantity of the stocks may not be arbitrarily divisible. In the experiments, we have tried to minimize the effect of market liquidity by choosing stocks that have large market capitalization, which usually have small bid-ask spreads and discrepancies, and thus high market liquidity.

The third assumption is that the portfolio strategy would have no impact on the market, that is, the stock market will not be affected by the trading algorithm. In practice, the impact can be neglected if the market capitalization of the portfolio is not too large. However, as the experimental results show, the portfolio wealth generated by PAMR increases astronomically, which would inevitably impact the market. One simple way to handle this issue is to scale down the portfolio, as done by many quantitative funds. Moreover, the development of algorithmic trading, which slices a big order into multiple smaller orders and schedules these orders to minimize the market impact, can significantly decrease the potential market impact of the proposed algorithm.

Here, we emphasize again that this study assumes a “perfect market”, which is consistent with previous studies in the literature. It is important to note that even in such a perfect financial market, no algorithm has ever claimed such high performance, especially on the standard NYSE (O) dataset. Though it is common investment knowledge that past performance may not be a reliable indicator of future performance, such high performance does give us some confidence that the proposed PAMR algorithm may work well in future unseen markets.

5.5.2 Discussion on PAMR assumption

Though the proposed algorithm performs well on most datasets, we cannot claim that PAMR performs well on arbitrary portfolio pools. It is worth noting that PAMR relies on the assumption that mean reversion exists in a portfolio pool, that is, buying worse performing stocks is profitable. The preceding experiments seem to show that in most cases mean reversion does exist in the market. However, it is still possible that this assumption fails to hold in certain cases, especially when portfolio components are wrongly selected. PAMR’s performance on the DJIA dataset indicates that mean reversion may not exist in its portfolio components. Though both are based on mean reversion, PAMR and Anticor are formulated with different time periods of mean reversion, which may explain why Anticor achieves good performance on DJIA. Thus, before investing in a real market, it is of crucial importance to ensure that the motivating mean reversion does exist in the portfolio pool. In academia, the mean reversion property of a single stock has been extensively studied (Poterba and Summers 1988; Hillebrand 2003; Exley et al. 2004); one natural way to test it is to calculate the sign of the auto-correlation (Poterba and Summers 1988). In contrast, the mean reversion property of a portfolio has received little academic attention. For a portfolio, not only the mean reversion of each single stock matters, but also the interaction among stocks.
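For the single-stock case, the sign of the auto-correlation mentioned above can be checked with a simple lag-1 estimate on log returns. This is only an illustrative sketch: the function name `lag1_autocorrelation` is hypothetical, and the lag and return horizon used in the cited studies may differ.

```python
import numpy as np

def lag1_autocorrelation(prices):
    """Lag-1 autocorrelation of log returns.

    A negative value is consistent with mean reversion at this horizon.
    Assumes the returns are not all identical (non-zero variance).
    """
    r = np.diff(np.log(np.asarray(prices, dtype=float)))
    r = r - r.mean()
    return float((r[1:] @ r[:-1]) / (r @ r))
```

A price series that alternates between two levels gives a negative value, whereas the portfolio-level question raised in the text needs a test that also accounts for interactions among stocks.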

On the other hand, the mixture algorithm, that is, MIX, performs well on the DJIA dataset, beating the three benchmarks. As we discussed in Sect. 4.6, the mixture algorithm can provide a worst-case guarantee, which the original PAMR algorithms lack. This partly addresses the problem that PAMR itself does not have a worst-case guarantee. Moreover, it is worth noting that even with a worst-case guarantee, some existing universal algorithms also perform poorly on the dataset.

Now let us briefly analyze the reason that PAMR failed on DJIA. To test whether mean reversion exists in the DJIA dataset, we propose a naïve trading strategy. The test strategy sets the weights proportional to the differences between the assets’ returns and that of the last best stock, that is, the last best stock is given zero weight, while the worst performing stock is given the maximum weight. We are interested in whether this simple algorithm produces a positive return on the existing datasets. If it produces a positive daily return, then the assumption that buying worse performing stocks is profitable may hold. Otherwise, our motivating assumption fails. The test is conducted on all six datasets. We calculated the arithmetic average daily returns and the standard deviations of daily returns. Since we are interested in absolute return, we compare the average values with 1. From the statistics in Table 9, we find that the five successful datasets yield an average profit (>1.0), while DJIA yields an average loss (<1.0). Thus, on the DJIA dataset, purchasing worse performing stocks in the portfolio is expected to produce losses. Though the expected daily loss is small, it would produce a huge cumulative loss over a long trading period.
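The naïve test strategy described above can be sketched as follows. Two details are assumptions, since the text does not specify them: the first day uses a uniform portfolio, and a day on which all price relatives are equal is followed by a uniform portfolio. The function name `naive_mean_reversion_test` is hypothetical.

```python
import numpy as np

def naive_mean_reversion_test(price_relatives):
    """Average and standard deviation of the test strategy's daily returns.

    Weights for the next day are proportional to the gap between the
    last best price relative and each asset's price relative.
    """
    x = np.asarray(price_relatives, dtype=float)   # shape (n_days, n_assets)
    n, m = x.shape
    b = np.full(m, 1.0 / m)                        # assumption: uniform start
    daily = np.empty(n)
    for t in range(n):
        daily[t] = b @ x[t]
        gap = x[t].max() - x[t]                    # best stock gets zero weight
        total = gap.sum()
        b = gap / total if total > 0 else np.full(m, 1.0 / m)
    return float(daily.mean()), float(daily.std(ddof=1))
```

An average above 1 is consistent with the motivating mean reversion, and an average below 1 suggests that buying the recent losers loses money on that dataset.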

Table 9 Average daily return and standard deviation of the test strategy

Although the above results are interesting, we cannot claim that this method can definitely identify successful portfolio pools. Analyzing the mean reversion property in the portfolio scenario and selecting portfolio components such that the portfolio satisfies mean reversion deserve further attention.

5.5.3 Discussion on back tests

Back tests in historical markets may suffer from the “data-snooping bias” issue. One common data-snooping bias is the dataset selection issue. On the one hand, we selected four datasets, that is, the NYSE (O), TSE, SP500, and DJIA datasets, based on previous studies without consideration of the proposed approach. On the other hand, we developed the PAMR algorithm based solely on the NYSE (O) dataset, while the other five datasets (NYSE (N), TSE, SP500, MSCI and DJIA) were obtained after the algorithm was fully developed. However, even though we are cautious about the dataset selection issue, it may still appear in the experiments, especially for the datasets with relatively long histories, that is, NYSE (O) and NYSE (N). The NYSE (O) dataset, pioneered by Cover (1991) and followed by other researchers, has become a “standard” dataset in the learning community. Since it contains 36 large cap NYSE stocks that survived in hindsight for 22 years, it suffers from extreme survival bias. Nevertheless, it still has merit for comparing performance among algorithms, as done in all previous work. The NYSE (N) dataset, as a continuation of NYSE (O), contains 23 assets that survived from the previous 36 stocks for another 25 years. Therefore, it is even worse than the NYSE (O) dataset in terms of survival bias. In a word, even though the experimental results on these datasets clearly show the effectiveness of the proposed PAMR algorithm, one cannot make claims without noting the deficiencies of these datasets.

Another common bias is the asset selection issue. Four of the six datasets (NYSE (O), TSE, SP500, and DJIA) were collected by others, and to the best of our knowledge, their assets are mainly the largest blue chip stocks in their respective markets. As a continuation of the NYSE (O) dataset, we collected NYSE (N) ourselves, which again contains several of the largest surviving stocks in NYSE (O). The remaining dataset (MSCI) is chosen according to the world indices. In a word, we try to avoid the asset selection bias by arbitrarily choosing representative stocks in their respective markets, which usually have large capitalization and thus high liquidity. Moreover, investing in these largest assets may reduce the market impact caused by the proposed portfolio strategy. Finally, following existing model assumptions and experimental settings, we do not consider assets of low quality, such as bankrupt stocks and penny stocks. On the one hand, bankrupt stock data is difficult to acquire, thus we cannot observe their behaviors and predict the behaviors of PAMR on datasets with bankrupt stocks. In reality, bankruptcy happens rarely for blue chip stocks, as typically a stock would be removed from the list of blue chip stocks before it actually goes bankrupt. On the other hand, penny stocks lack the liquidity required to support the trading frequency in the current research. Besides, one could also explore many practical strategies to exclude low quality stocks from the asset pool at some early stage, such as financial methods based on either technical or fundamental analysis.

6 Conclusion

In this article, we proposed a novel portfolio selection strategy, “Passive Aggressive Mean Reversion” (PAMR). Motivated by the idea of mean reversion and passive aggressive learning, PAMR outperforms all benchmarks and various existing strategies on a number of real datasets from different markets. PAMR can also be easily extended to handle certain practical issues, e.g., transaction cost and margin buying. At the same time, PAMR executes in much less time than existing approaches, making it suitable for online applications. We also find that the update scheme of PAMR is based on the trade-off between the return and volatility risk, which is ignored by most existing learning strategies. This interesting property connects the PAMR strategy with modern portfolio theory, which may provide further explanation from the aspect of finance.

Although in most cases the proposed PAMR strategy achieves encouraging empirical results, it is still far from perfect for a real investment task, and may be improved in the following aspects. First of all, though universality may not be required in real investment, PAMR’s universality is still an open question. Second, none of the existing algorithms considers bankrupt assets, which may occur in real investment. It is thus interesting to study the behaviors of bankrupt assets and design strategies to exploit them. Besides, we note that PAMR sometimes fails when the mean reversion property does not exist in the portfolio components. It is therefore crucial to propose efficient methods to test for mean reversion. Finally, though PAMR handles the issue of transaction costs well, it is not formally addressed in our problem formulation. It would be interesting to incorporate the transaction cost issue when formulating the problem in order to improve the performance in case of high transaction costs and gain higher break-even rates with respect to the market index.