To examine the empirical efficacy of the proposed PAMR strategy, we conduct an extensive set of numerical experiments on a variety of real datasets. In our experiments, we adopt six real datasets, which were collected from several diverse financial markets. The performance metrics include cumulative wealth and risk-adjusted returns (volatility risk and drawdown risk). We also compare the proposed PAMR algorithms with all existing algorithms stated in the related work section.
Experimental testbed on real data
In this study, we focus on historical daily prices in stock markets, which are easy to obtain from public domains (such as Yahoo Finance and Google Finance) and thus publicly available to other researchers. Data from other types of markets, such as high-frequency intra-day quotes and Forex markets, are either too expensive or too hard to obtain and process, and thus would reduce the experimental reproducibility. In total, we employ six real and diverse datasets from several types of financial markets,Footnote 4 which are summarized in Table 3.
Table 3 Summary of the six real datasets in our numerical experiments
The first one is the NYSE dataset, a "standard" dataset pioneered by Cover (1991) and followed by several other researchers (Singer 1997; Helmbold et al. 1996; Borodin et al. 2004; Agarwal et al. 2006; Györfi et al. 2006, 2008). This dataset contains 5651 daily price relatives of 36 stocksFootnote 5 on the New York Stock Exchange (NYSE) over a 22-year period, from Jul. 3rd 1962 to Dec. 31st 1984. We denote this dataset by "NYSE (O)" for short.
The second dataset is an extended version of the above NYSE dataset. For consistency, we collected the latest data from the New York Stock Exchange (NYSE), from Jan. 1st 1985 to Jun. 30th 2010, consisting of 6431 trading days. We denote this new dataset as "NYSE (N)".Footnote 6 It is worth noting that this new dataset contains only 23 of the previous 36 stocks, owing to amalgamations and bankruptcies. All self-collected price relatives are adjusted for splits and dividends, consistent with the previous "NYSE (O)" dataset.
The third dataset, "TSE", was collected by Borodin et al. (2004); it consists of 88 stocks from the Toronto Stock Exchange (TSE) and contains price relatives for 1259 trading days, from Jan. 4th 1994 to Dec. 31st 1998. The fourth dataset, "SP500", also collected by Borodin et al. (2004), consists of the 25 stocks with the largest market capitalizations among the S&P 500 components. It ranges from Jan. 2nd 1998 to Jan. 31st 2003, containing 1276 trading days.
The fifth dataset, "MSCI", is a collection of global equity indices that are constituents of the MSCI World Index.Footnote 7 It contains 24 indices representing the equity markets of 24 countries around the world, and consists of a total of 1043 trading days, from Apr. 1st 2006 to Mar. 31st 2010. The final dataset, "DJIA", was collected by Borodin et al. (2004) and consists of the 30 Dow Jones composite stocks. DJIA contains 507 trading days, from Jan. 14th 2001 to Jan. 14th 2003.
Besides the above six real market datasets, in the experiments we also run each algorithm on a reversed version of each dataset (Borodin et al. 2004). For each dataset, we created a reversed dataset, which reverses the original order and inverts the price relatives. We denote these reversed datasets using a '−1' superscript on the original dataset names. By nature, these reversed datasets are quite different from the original datasets, and we are interested in the behaviors of the proposed algorithm on these artificial datasets.
Unlike previous studies, the above testbed covers much longer trading periods, from 1962 to 2010, and much more diversified markets, which enables us to examine how the proposed PAMR strategy performs under different events and crises. For example, it covers several well-known events in the stock markets, such as the dot-com bubble from 1995 to 2000 and the subprime mortgage crisis from 2007 to 2009. The five stock datasets are mainly chosen to test the capability of the proposed PAMR on regional stock markets, while the "MSCI" dataset aims to test PAMR's capability on global indices, which may be potentially applicable to a "Fund on Fund" (FOF).Footnote 8 As a remark, although we numerically test the PAMR algorithm on stock markets, the proposed strategy could in principle be applied to any type of financial market.
Experimental setup and metrics
Regarding the parameter settings, there are two key parameters in the proposed PAMR algorithms: the sensitivity parameter ϵ and the aggressiveness parameter C. Roughly speaking, the best values for these parameters are dataset dependent. In the experiments, we simply set them empirically, without tuning for each dataset separately. Specifically, for all datasets and experiments, we set the sensitivity parameter ϵ to 0.5 for all three algorithms and the aggressiveness parameter C to 500 for both PAMR-1 and PAMR-2, values with which the cumulative wealth achieved by PAMR tends to be stable on most datasets. It is worth noting that these parameter choices are not always the best. Our experiments on parameter sensitivity in Sect. 5.4.4 show that the proposed PAMR algorithms are quite robust with respect to different parameter settings.
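To make the setup concrete, below is a minimal Python sketch of a single update step for the three variants, following the closed-form updates derived in Sect. 4 and using the defaults adopted here (ϵ=0.5, C=500). The function names and the generic Euclidean simplex projection are our own illustrative choices, not part of the original formulation.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def pamr_update(b, x, eps=0.5, C=500.0, variant=2):
    """One PAMR step: b is the current portfolio, x the day's price relatives."""
    x_bar = x.mean()
    loss = max(0.0, np.dot(b, x) - eps)          # eps-insensitive loss
    denom = np.linalg.norm(x - x_bar) ** 2
    if variant == 0:    # PAMR
        tau = loss / denom if denom > 0 else 0.0
    elif variant == 1:  # PAMR-1: cap the step size at C
        tau = min(C, loss / denom) if denom > 0 else 0.0
    else:               # PAMR-2: soft regularization by C
        tau = loss / (denom + 1.0 / (2.0 * C))
    b_new = b - tau * (x - x_bar)                # move weight toward worse performers
    return project_simplex(b_new)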
For the proposed mixture algorithm (MIX), we set the expert poolFootnote 9 as an initially uniform combination of PAMR, ONS, Anticor, and BNN, with the individual experts configured according to their respective studies.
We adopt the most common metric, cumulative wealth, as the primary means of comparing different trading strategies. In addition to the cumulative wealth, we also adopt the annualized Sharpe Ratio (SR) to compare the performance of different trading algorithms. In general, the higher the cumulative wealth and the annualized Sharpe Ratio, the better the compared algorithm performs. Besides, we adopt Maximum Drawdown (MDD) and Calmar Ratio (CR) to analyze the downside risk of the PAMR strategy. The lower the MDD value, the more preferable the trading algorithm with respect to downside risk; the higher the CR value, the more efficient the trading algorithm's return relative to its downside risk. The performance criteria are detailed in the following section.
Performance criteria
One of the standard criteria to evaluate the performance of a strategy is the portfolio cumulative wealth achieved by the strategy by the end of the whole trading period. In our study, we simply set the initial wealth \(\mathbf{S}_0=1\), and thus the notation \(\mathbf{S}_n\) also denotes the portfolio cumulative return at the end of the nth trading day, which is the ratio of the portfolio cumulative wealth to the initial wealth. Another equivalent criterion is the annualized percentage yield (APY), which takes the compounding effect into account, that is, \({\mathrm{APY}}=\sqrt[y]{\mathbf{S}_{n}}-1\), where y is the number of years corresponding to n trading days. APY measures the average wealth increment that a strategy could achieve, compounded, in a year. Typically, the higher the portfolio cumulative wealth or annualized percentage yield, the more preferable the trading strategy.
For some process-dependent investors (Moody et al. 1998), it is important to evaluate the risk and risk-adjusted return of portfolios (Sharpe 1963, 1994). One common way to achieve this is to use the annualized standard deviation of daily returns to measure the volatility risk and the annualized Sharpe Ratio (SR) to evaluate the risk-adjusted return. For portfolio risk, we calculate the standard deviation of daily returns and multiply it by \(\sqrt{252}\) (here 252 is the average number of annual trading days) to obtain the annualized standard deviation. For risk-adjusted return, we calculate the annualized Sharpe Ratio as \({\mathrm{SR}}=\frac{{\mathrm{APY}}-R_{f}}{\sigma_{p}}\), where \(R_f\) is the risk-free return (typically the return of Treasury bills, fixed at 4% in this work) and \(\sigma_p\) is the annualized standard deviation of daily returns. Basically, a higher annualized Sharpe Ratio indicates better performance of a trading strategy with respect to volatility risk.
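As a concrete reference, the following sketch computes APY, the annualized standard deviation, and the Sharpe Ratio exactly as defined above from a sequence of daily price relatives; 252 trading days per year and a 4% risk-free return are the values used in this work, while the function and variable names are our own.

```python
import numpy as np

def annualized_sharpe(daily_price_relatives, risk_free=0.04, days_per_year=252):
    """Compute APY, annualized volatility, and the annualized Sharpe Ratio."""
    r = np.asarray(daily_price_relatives)
    S_n = r.prod()                                    # cumulative wealth, with S_0 = 1
    years = len(r) / days_per_year
    apy = S_n ** (1.0 / years) - 1.0                  # APY: y-th root of S_n, minus 1
    # std of daily returns (shifting price relatives by 1 leaves the std unchanged)
    sigma_p = r.std(ddof=1) * np.sqrt(days_per_year)
    return apy, sigma_p, (apy - risk_free) / sigma_p  # SR = (APY - R_f) / sigma_p
```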
The investment community often analyzes DrawDown (DD) (Magdon-Ismail and Atiya 2004) to measure the decline from a historical peak in the cumulative wealth achieved by a financial trading strategy. Formally, let \(\mathbf{S}(\cdot)\) denote the process of cumulative wealth achieved by a trading strategy, that is, \(\{\mathbf{S}_1,\ldots,\mathbf{S}_t,\ldots,\mathbf{S}_n\}\). The DrawDown at any time t is defined as \({\mathrm{DD}}(t)=\max [0,\ \max_{i\in(0,t)}\mathbf{S}(i)-\mathbf{S}(t) ]\). The Maximum DrawDown for a horizon n is defined as \({\mathrm{MDD}}(n)=\max_{t\in(0,n)} [{\mathrm{DD}}(t) ]\), which is an excellent way to measure the downside risk of different strategies. Moreover, we also adopt the Calmar Ratio (CR) to measure the return relative to the drawdown risk of a portfolio, calculated as \({\mathrm{CR}} = \frac{\mathrm{APY}}{\mathrm{MDD}}\). Generally speaking, the smaller the Maximum DrawDown, the more tolerable the downside risk of the financial trading strategy; higher Calmar Ratios indicate better performance of a trading strategy with respect to drawdown risk.
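These drawdown quantities translate directly into code. The sketch below follows the absolute-wealth definition of DD(t) given above; the function names are ours.

```python
import numpy as np

def max_drawdown(wealth):
    """wealth: array {S_1, ..., S_n}; returns MDD(n) = max_t DD(t)."""
    peak = np.maximum.accumulate(wealth)   # running peak: max_{i <= t} S(i)
    return (peak - wealth).max()           # DD(t) = max[0, peak - S(t)], maximized over t

def calmar_ratio(apy, wealth):
    """CR = APY / MDD."""
    return apy / max_drawdown(wealth)
```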
To test whether simple luck can generate the return of the proposed strategy, we also conduct a statistical test to measure the probability of this situation, as is popularly done in the fund management industry (Grinold and Kahn 1999). First, we separate the portfolio daily returns into two components, one benchmark-related and the other non-benchmark-related, by regressing the portfolio excess returnsFootnote 10 against the benchmark excess returns. Formally, \(s_t - s_t^{(F)} = \alpha + \beta (s_t^{(B)} - s_t^{(F)} ) + \epsilon(t)\), where \(s_t\) stands for the portfolio daily returns, \(s_t^{(B)}\) denotes the daily returns of the benchmark (market index), and \(s_t^{(F)}\) is the daily return of the risk-free asset (here we simply choose the Treasury bill and set it to 1.000156, or equivalently, an annual interest rate of 4%). This regression estimates the portfolio's alpha (α), which indicates the performance of the investment after accounting for the risk involved. Then we conduct a statistical t-test to evaluate whether the alpha is significantly different from zero, using the t-statistic \(\frac{\alpha}{\mathrm{SE} (\alpha )}\), where SE(α) is the standard error of the estimated alpha. Thus, assuming the alpha is normally distributed, we can obtain the probability that the returns of the proposed strategy are generated by simple luck. Generally speaking, the smaller this probability, the higher the confidence in the trading strategy.
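A hedged sketch of this test: we estimate α and β by ordinary least squares and form the t-statistic α/SE(α), reading the one-sided p-value as the probability of achieving the return by luck. The daily risk-free value 1.000156 matches the text; everything else (names, the standard OLS standard-error computation) is our own illustration.

```python
import numpy as np
from scipy import stats

def alpha_t_test(s, s_B, s_F=1.000156):
    """s, s_B: daily returns (price relatives) of the portfolio and the benchmark."""
    y = np.asarray(s) - s_F                 # portfolio excess returns
    z = np.asarray(s_B) - s_F               # benchmark excess returns
    n = len(y)
    X = np.column_stack([np.ones(n), z])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ np.array([alpha, beta])
    se_alpha = np.sqrt(resid @ resid / (n - 2) * np.linalg.inv(X.T @ X)[0, 0])
    t_stat = alpha / se_alpha               # t-statistic for H0: alpha = 0
    p_luck = stats.t.sf(t_stat, df=n - 2)   # one-sided probability of "simple luck"
    return alpha, beta, t_stat, p_luck
```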
Practical issues
While our model described in Sect. 2 is concise and not complicated to understand, it omits some practical issues in the portfolio management industry. We now relax some constraints in our model to address these issues.
In reality, an important and unavoidable issue is transaction cost. Generally, there are two ways to handle transaction costs. The first, commonly adopted by learning-to-select-portfolio strategies, is that the portfolio selection process does not take transaction costs into account, while the subsequent rebalancing incurs them. The second is that the transaction cost is directly involved in the portfolio selection process (Györfi and Vajda 2008). In this work, we adopt the first approach, with the proportional transaction cost model proposed by Blum and Kalai (1999) and Borodin et al. (2004). To be specific, rebalancing the portfolio incurs a transaction cost on every buy and sell operation, based on a transaction cost rate γ∈(0,1). At the beginning of the tth trading day, the portfolio manager rebalances the portfolio from the previous closing-price-adjusted portfolio \({\hat{\mathbf{b}}}_{t-1}\) to a new portfolio \(\mathbf{b}_t\), incurring a transaction cost of \(\frac{\gamma}{2} \times \sum_{i}{\vert b_{(t,i)}-\hat{b}_{(t-1, i)}\vert }\), where the initial portfolio is set to (0,…,0). Thus, the cumulative wealth achieved by the end of the nth trading day can be expressed as:
$${\mathbf{S}}_{n}^{c (\gamma )}={\mathbf{S}}_0\prod_{t=1}^{n} \biggl[ ({\mathbf{b}}_t\cdot{\mathbf{x}}_t )\times \biggl(1-\frac{\gamma}{2} \times\sum_{i}{\big \vert b_{(t, i)}-\hat{b}_{(t-1,i)}\big \vert } \biggr) \biggr].$$
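This cumulative wealth formula can be implemented directly. In the following sketch, the closing-price-adjusted portfolio \(\hat{\mathbf{b}}_{t-1}\) is taken as the previous portfolio after the day's price movement, and the initial \(\hat{\mathbf{b}}_0=(0,\ldots,0)\) as stated above; the function and variable names are ours.

```python
import numpy as np

def wealth_with_costs(B, X, gamma, S0=1.0):
    """B, X: (n, m) arrays of portfolios b_t and price relatives x_t; gamma: cost rate."""
    wealth = S0
    b_hat = np.zeros(B.shape[1])                          # initial portfolio (0, ..., 0)
    for b_t, x_t in zip(B, X):
        turnover = np.abs(b_t - b_hat).sum()              # total amount bought and sold
        wealth *= np.dot(b_t, x_t) * (1.0 - gamma / 2.0 * turnover)
        b_hat = b_t * x_t / np.dot(b_t, x_t)              # price-adjusted portfolio
    return wealth
```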
Another practical issue in portfolio selection is margin buying, which allows portfolio managers to buy securities with cash borrowed from security brokers. Following previous studies (Cover 1991; Helmbold et al. 1996; Agarwal et al. 2006), we relax this constraint in the model and evaluate it empirically in Sect. 5.4.5. In this study, the margin setting is assumed to be 50% down and 50% loan, at an annual interest rate of 6%, so the daily interest rate of the borrowed money, c, is set to 0.000238. Thus, for each security in the asset pool, a new asset named "Margin Component" is generated. Following the down and loan percentages, the price relative for the "Margin Component" of asset i is \(2x_{t,i} - 1 - c\), where \(x_{t,i}\) is the price relative of the ith asset on the tth trading day. In case \(x_{t,i}\leq\frac{1+c}{2}\), that is, when a stock drops by more than half, we simply set the "Margin Component" to 0. By adding this "Margin Component", we magnify both the potential profit and loss of the trading strategy on the ith asset.
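A one-line sketch of this construction; note that flooring at 0 is equivalent to the condition \(x_{t,i}\leq\frac{1+c}{2}\) above (the function name is ours).

```python
import numpy as np

def margin_component(x, c=0.000238):
    """Price relative of the 'Margin Component': 50% down, 50% loan at daily rate c."""
    return np.maximum(2.0 * np.asarray(x) - 1.0 - c, 0.0)  # 0 if the stock drops > half
```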
Comparison approaches
In our experiments, we implement the proposed PAMR strategy and its two variants, viz., PAMR-1 and PAMR-2. We compare them with a number of benchmarks and existing strategies as described in Sect. 3. Below we summarize the list of compared algorithms, whose parameters are set according to the recommendations from their respective studies.
1. Market: the market strategy, that is, the uniform Buy-And-Hold (BAH) strategy;
2. Best-Stock: the best stock in the market, a strategy chosen in hindsight;
3. BCRP: the Best Constant Rebalanced Portfolios strategy, chosen in hindsight;
4. UP: Cover's Universal Portfolios, implemented according to Kalai and Vempala (2002), with parameters \(\delta_0=0.004\), \(\delta=0.005\), \(m=100\), and \(S=500\);
5. EG: the Exponential Gradient (EG) algorithm with the best parameter \(\eta=0.05\), as suggested by Helmbold et al. (1996);
6. ONS: Online Newton Step (ONS) with the parameters suggested by Agarwal et al. (2006), that is, \(\eta=0\), \(\beta=1\), \(\gamma=\frac{1}{8}\);
7. SP: Switching Portfolios with parameter \(\gamma=\frac{1}{4}\), as suggested by Singer (1997);
8. GRW: the Gaussian Random Walk strategy with parameter \(\sigma=0.00005\), as recommended by Levina and Shafer (2008);
9. M0: the prediction-based algorithm M0 with parameter \(\beta=0.5\), as suggested by Borodin et al. (2000);
10. Anticor: BAH30(Anticor(Anticor)), a variant of Anticor that smooths performance and achieves the best results among the three solutions proposed by Borodin et al. (2004);
11. BK: the nonparametric kernel-based moving-window (BK) strategy with \(W=5\), \(L=10\), and threshold \(c=1.0\), which has the best empirical performance according to Györfi et al. (2006);
12. BNN: the nonparametric nearest-neighbor-based strategy (BNN) with parameters \(W=5\), \(L=10\), and \(p_{\ell}=0.02+0.5\frac{\ell-1}{L-1}\), as suggested by the authors (Györfi et al. 2008).
Experimental results
Experiment 1: evaluation of cumulative wealth
We first compare the performance of the competing approaches based on their cumulative wealth. From the experimental results shown in Table 4, we can draw several observations below.
Table 4 Cumulative wealth achieved by various trading strategies on the six datasets and their reversed datasets. The top two best results in each dataset are highlighted in bold font
First of all, we observe that the learning-to-select-portfolio strategies generally perform better than the three common benchmarks, which shows that it is promising to investigate learning algorithms for portfolio selection. Second, we find that although the cumulative wealth achieved by the regret minimization approaches (UP, EG, and ONS) is higher than that of the market strategy, their performance is significantly lower than that of the wealth maximization approaches (Anticor, BK, and BNN). This shows that, to achieve better investment returns, it is more powerful and promising to exploit wealth maximization approaches for portfolio selection. Third, from the top two results indicated on each original dataset, it is clear that the proposed PAMR strategy (PAMR, PAMR-1, and PAMR-2) significantly outperforms most competitors (except on the DJIA dataset), including Anticor, BK, and BNN, which are the state-of-the-art approaches. The encouraging results in cumulative wealth validate the importance of exploiting the mean reversion property of financial markets with an effective online learning strategy. On the other hand, though MIX beats the benchmarks on the DJIA dataset, the PAMR algorithms perform badly on it. This may be attributed to the motivating mean reversion not holding on this dataset, which raises an important question: "How should one select the portfolio pool such that the motivating mean reversion holds on the target portfolio?" Sect. 5.5.2 provides some discussion of this question.
Examining the details further, we find that the most impressive performance is achieved by PAMR on the standard NYSE (O) dataset, where its initial wealth grows by a factor of more than 5 quadrillion by the end of the 22-year period. We note that the main reason PAMR achieves such exceptional results is its effectiveness at exploiting highly volatile price relatives. To verify this, we examine the detailed performance of PAMR in Table 4 by looking into individual stocks, and we find that it relies considerably on a single stock ("Kin Ark"), which has the highest volatility in terms of standard deviation. After removing this stock from the portfolio, the cumulative wealth drops significantly, to 1.27E+08. We investigate the volatility issue in more detail in another experiment on dataset sensitivity in Sect. 5.4.3.
On the reversed datasets, though not performing as impressively as on the original datasets, PAMR still performs well. While some algorithms fail badly, PAMR beats the benchmarks, including the market and BCRP strategies, in all cases, and in certain cases it beats all competitors. It is worth noting that these reversed datasets are artificial and never exist in real markets. PAMR's performance on them provides strong evidence that mean reversion exists even in reversed market datasets and that PAMR can successfully exploit it.
In addition to the final cumulative wealth, we are also interested in examining how the cumulative wealth changes over the trading period. Figure 3 shows the trends of the cumulative wealth achieved by the proposed PAMR algorithm and four other algorithms (two benchmarks and two state-of-the-art algorithms). From the results, we can see that the proposed PAMR strategy consistently surpasses the benchmarks and the competing strategies over the entire trading period on most datasets (except the DJIA dataset), which again validates the efficacy of the proposed technique.
Finally, to measure whether the excess return could be obtained by simple luck, we conduct the statistical t-test described in Sect. 5.2.1. Table 5 shows the results, which indicate that the observed excess return is highly unlikely to be obtained by simple luck on most datasets. To be specific, the probabilities of achieving the excess returns by luck are almost 0 on all datasets except DJIA. The statistics on the DJIA dataset, however, suggest that the mean reversion assumption may not hold there. Nevertheless, the results show that the PAMR strategy is a promising and reliable portfolio selection technique for achieving high returns with high confidence.
Table 5 Statistical t-test of the performance of the PAMR on the stock datasets
Experiment 2: evaluation of risk and risk-adjusted return
We now evaluate the risk, in terms of volatility risk and drawdown risk, and the risk-adjusted return, in terms of annualized Sharpe ratio and Calmar ratio. Figure 4 shows the evaluation results on the six datasets. In addition to the proposed PAMR, we also plot two benchmarks (Market and BCRP) and two state-of-the-art algorithms (Anticor and BNN) for comparison. Figures 4(a) and 4(b) depict the volatility risk (standard deviation of daily returns) and the drawdown risk (maximum drawdown) on the six stock datasets, while Figs. 4(c) and 4(d) compare the corresponding Sharpe and Calmar ratios.
In the preceding cumulative wealth results, we found that PAMR achieved the highest cumulative return on most original datasets. Of course, high return is associated with high risk, which is commonly accepted in finance, as no real financial instrument can guarantee a high return without risk. The volatility risk in Fig. 4(a) shows that PAMR incurs nearly the highest volatility risk. On the other hand, the drawdown risk in Fig. 4(b) shows that PAMR achieves modest drawdown risk on most datasets. These results validate the notion that high return is often associated with high risk.
To further evaluate the return and risk together, we examine the risk-adjusted return in terms of annualized Sharpe ratio and Calmar ratio. The results in Figs. 4(c) and 4(d) clearly show that PAMR achieves excellent performance in most cases, except on the DJIA dataset. These encouraging results show that PAMR reaches a good trade-off between return and risk, even though we do not explicitly consider risk in our problem formulation.
Experiment 3: dataset sensitivity
As observed in Sect. 5.4.1, PAMR gains substantial excess return from the stock markets. In this section, we examine how dataset volatility affects the proposed PAMR strategy by evaluating its performance on datasets of different volatilities.
To examine the effect of dataset volatility, we create two datasets, each consisting of 5 stocks chosen from the NYSE (N) dataset according to their volatility values. To be specific, we rank the 23 stocks by their daily volatility values, measured by the standard deviation of the logarithm of the price relatives (Hull 2008). We then create two datasets of different volatility, NYSE (H) and NYSE (L), consisting of the 5 stocks with the highest and lowest volatility values, respectively. Table 6 shows the results achieved by various strategies on these two datasets.
Table 6 Cumulative wealth achieved by various strategies on portfolios of extreme volatilities. The “H/L ratio” column shows the ratio between the cumulative wealth achieved on the high-volatility dataset and that achieved on the low-volatility dataset
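As a sketch of this construction, one can rank the stocks by the standard deviation of the logarithm of their price relatives and take the extremes; the function name and exact selection code are our own illustration.

```python
import numpy as np

def split_by_volatility(X, k=5):
    """X: (n_days, n_stocks) price relatives; returns high- and low-volatility indices."""
    vol = np.log(X).std(axis=0, ddof=1)   # daily volatility per stock (Hull 2008)
    order = np.argsort(vol)
    return order[-k:], order[:k]          # k highest (NYSE (H)) and k lowest (NYSE (L))
```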
From the results, we find that different strategies perform quite differently on these two datasets. The regret minimization approaches (UP, EG, and ONS) perform well regardless of market volatility, as their universality property suggests, while the wealth maximization approaches (Anticor, BK, and BNN) and the proposed PAMR strategy achieve significantly higher cumulative wealth on NYSE (H), the high-volatility dataset. These results show that dataset volatility considerably affects some algorithms, including the wealth maximization approaches and the proposed PAMR strategy. Specifically, the proposed PAMR strategy benefits greatly from a high-volatility dataset. For example, on the NYSE (L) dataset the cumulative wealth achieved by the PAMR algorithm is about 132, which is boosted to 1.35E+05 on the NYSE (H) dataset. To further examine which algorithm benefits most from high volatility, we calculate the "H/L ratio", the ratio of the cumulative wealth achieved on the high-volatility dataset to that achieved on the low-volatility dataset. From the ratios, we observe that the PAMR strategy obtains the highest H/L ratio, indicating that, among all the competing methods, PAMR benefits most from the high-volatility dataset.
Experiment 4: parameter sensitivity
We now evaluate how different choices of parameters affect the performance of the proposed PAMR strategy. All three PAMR algorithms require setting the sensitivity parameter ϵ, while the aggressiveness parameter C is needed only for PAMR-1 and PAMR-2.
First, we examine the effect of the sensitivity parameter ϵ on the cumulative wealth achieved by PAMR. When ϵ becomes greater than 1, PAMR degrades to the uniform CRP strategy and the wealth stabilizes at the wealth achieved by uniform CRP; we therefore evaluate ϵ in the range [0, 1.5]. Figure 5 shows the cumulative wealth achieved by PAMR with varying ϵ, together with that of the two benchmarks, the Market and BCRP strategies. Most results, aside from the DJIA dataset, show that the cumulative wealth achieved by PAMR grows as ϵ approaches 0, that is, the more sensitive the algorithm, the higher the wealth, which supports the claim that the motivating mean reversion exists in the stock markets. Moreover, in most cases the cumulative wealth achieved by PAMR tends to stabilize once ϵ crosses a certain dataset-dependent threshold. As stated before, we choose ϵ=0.5 in the experiments, with which the cumulative wealth is stable in most cases. We also note that on some datasets PAMR with ϵ=0 achieves the best results. Although ϵ=0 means moving more weight to the worse performing stocks, it does not necessarily mean moving everything to the worst stock: on the one hand, the objectives in the formulations prevent the next portfolio from drifting far from the last portfolio; on the other hand, PAMR-1 and PAMR-2 are designed to alleviate drastic changes. In a word, these experimental results clearly show that the proposed algorithm is robust with respect to the mean reversion sensitivity parameter. As for the failing case, DJIA, the mean reversion effect behaves differently: as ϵ approaches 0, the cumulative wealth achieved by PAMR drops. This phenomenon can be interpreted as the motivating mean reversion not existing in the DJIA dataset, at least in the sense of our motivation.
Second, we evaluate the other important parameter for the PAMR-1 and PAMR-2 algorithms, the aggressiveness parameter C. Figures 6 and 7 show the effects on the cumulative wealth of varying the sensitivity parameter ϵ from 0 to 1.5 and the aggressiveness parameter C from 50 to 5000, for PAMR-1 and PAMR-2, respectively. Each heat map indicates the cumulative wealth achieved by PAMR for each combination of C and ϵ; the color bar on the right of each heat map maps each color to a level of cumulative wealth. In most cases, except DJIA, we observe that as ϵ decreases and C increases, the cumulative wealth increases and then stabilizes once ϵ and C cross certain data-dependent thresholds. Moreover, we find that C does not have a significant effect on the cumulative wealth achieved. Since a wide range of parameter values yields the highest cumulative wealth, the proposed PAMR algorithms are not particularly parameter sensitive, which again exhibits the robustness of the proposed PAMR strategy with respect to its parameters. Similarly, the heat map on DJIA again suggests that the mean reversion effect does not exist on that dataset, in the sense of our motivation.
Experiment 5: evaluation of practical issues
For a real-world application, there are some important practical issues in portfolio selection, including transaction costs and margin buying. This experiment examines how these practical issues affect the proposed PAMR strategy.
First, transaction cost is an important and unavoidable issue that must be addressed in practice. In our experiment, we adopt the proportional transaction cost model stated in Sect. 5.2.2 to test the effect of transaction costs on the proposed PAMR strategy. Figure 8 depicts the effect of proportional transaction costs when PAMR is applied to the six datasets, with the transaction cost rate γ varying from 0 to 1%. We only present the results achieved by PAMR, since the effect on its variants, PAMR-1 and PAMR-2, is quite similar. For comparison, we also plot the results achieved by two state-of-the-art strategies (Anticor and BNN) and the cumulative wealth achieved by the two benchmarks (BCRP and Market). Since BCRP is the target strategy for the regret minimization approaches (UP, EG, and ONS), for consistency we do not plot the results achieved by those approaches.
From the results shown in the figure, we can observe that PAMR withstands reasonable transaction cost rates. For example, with a transaction cost rate of 0.2%, PAMR beats the BCRP strategy on four of the datasets. The break-even transaction cost rate with respect to the market index ranges from 0.1% to 0.7% across the datasets, except DJIA. Since PAMR reverts to the mean more actively and thus makes more drastic portfolio changes, it surpasses Anticor at low or medium transaction costs while underperforming Anticor at high transaction costs. On the other hand, it outperforms BNN in most cases. Note that transaction cost rates in real markets are low.Footnote 11 This experiment clearly shows the practical applicability of the proposed PAMR strategy when transaction costs are taken into consideration.
Second, margin buying is another practical concern in a real-world portfolio selection task. In the following, we evaluate the performance of the approaches when margin buying is allowed, using the model described in Sect. 5.2.2. Table 7 presents the cumulative wealth achieved by the competing approaches with and without margin loans on the six stock datasets. As we can observe, when margin buying is allowed, the profitability of PAMR increases, and in most cases it achieves higher cumulative wealth than the other competing approaches. These results clearly demonstrate that the proposed PAMR strategy can be extended to handle margin buying and benefit from it, and thus has better practical applicability.
Table 7 Cumulative wealth achieved by various strategies on the stock datasets with/without margin loans (ML). Top two achievements on each dataset are highlighted
Experiment 6: evaluation of computational time cost
Our last experiment evaluates the computational time costs of different approaches, which is also an important issue in developing a practical online trading strategy. As stated in Sect. 4.3, the proposed PAMR algorithm enjoys linear time complexity per iteration, comparable to the EG algorithm. Table 8 presents the computational time costs (in seconds) of the approaches with comparable performance (Anticor, BK, and BNN) on the six stock datasets. All experiments were conducted on an Intel Core 2 Quad 2.66 GHz processor with 4 GB RAM, using Matlab 2009b on Windows XP.
Table 8 Computational time cost on the real datasets (in seconds)
From the results, we can clearly see that in all cases the proposed PAMR takes significantly less computational time than the three strategies with comparable performance. Even though the computational time in the back tests, especially per trading day, is small, it matters in certain scenarios such as high-frequency trading (Aldridge 2009), where transactions may occur in fractions of a second. The results clearly demonstrate the computational efficiency of the proposed PAMR strategy, which is an important concern for real-world large-scale applications.
Discussions and threats to validity
Discussion on model assumption
Any statement about such encouraging empirical results would be incomplete without acknowledging the simplifying assumptions made in Sect. 2. To recall, we made several assumptions regarding transaction costs, market liquidity, and market impact, which would affect the practical deployment of the proposed algorithm.
The first assumption is that no transaction costs exist. In Sect. 5.4.5 we examined the effect of varying transaction costs, and the results show that the proposed algorithm can withstand moderate transaction costs. Currently, with the widespread adoption of electronic communication networks (ECNs) and multilateral trading facilities (MTFs) in financial markets, various online trading brokers charge very small transaction cost rates, especially for large institutional investors. Some also use a flat rate,Footnote 12 based on the volume threshold one reaches. Such measures help portfolio managers lower their transaction cost rates.
The second assumption is that the market is liquid and one can buy and sell any quantity at the quoted price. In practice, low market liquidity results in a large bid-ask spread, the gap between the prices quoted for an immediate bid and an immediate ask. As a result, the execution of orders may incur a discrepancy between the prices sent by the algorithm and the prices actually executed. Moreover, stocks are often traded in multiples of a lot, the standard trading unit containing a certain number of shares, so the quantity of stock may not be arbitrarily divisible. In the experiments, we have tried to minimize the effect of market liquidity by choosing stocks with large market capitalizations, which usually have small bid-ask spreads and discrepancies and thus high market liquidity.
The other assumption is that the portfolio strategy has no impact on the market, that is, that the stock market is not affected by the trading algorithm. In practice, the impact can be neglected if the market capitalization of the portfolio is not too large. However, as the experimental results show, the portfolio wealth generated by PAMR increases astronomically, which would inevitably impact the market. One simple way to handle this issue is to scale down the portfolio, as done by many quantitative funds. Moreover, the development of algorithmic trading, which slices a big order into multiple smaller orders and schedules them so as to minimize the market impact, can significantly decrease the potential impact of the proposed algorithm.
Here, we emphasize again that this study assumes a "perfect market", consistent with previous studies in the literature. It is important to note that even in such a perfect financial market, no algorithm has ever claimed such high performance, especially on the standard NYSE (O) dataset. Though it is common investment knowledge that past performance may not be a reliable indicator of future performance, such high performance does give us confidence that the proposed PAMR algorithm may work well in future, unseen markets.
Discussion on PAMR assumption
Though the proposed algorithm performs well on most datasets, we cannot claim that PAMR performs well on arbitrary portfolio pools. It is worth noting that PAMR relies on the assumption that mean reversion exists in the portfolio pool, that is, that buying worse performing stocks is profitable. The preceding experiments seem to show that in most cases mean reversion does exist in the market. However, this assumption may still fail in certain cases, especially when portfolio components are poorly selected. PAMR's performance on the DJIA dataset indicates that mean reversion may not exist among its portfolio components. Though both are based on mean reversion, PAMR and Anticor are formulated over different time periods of mean reversion, which may explain why Anticor achieves good performance on DJIA. Thus, before investing in a real market, it is crucial to ensure that the motivating mean reversion does exist within the portfolio pool. In academia, the mean reversion property of single stocks has been extensively studied (Poterba and Summers 1988; Hillebrand 2003; Exley et al. 2004); one natural test is to calculate the sign of the auto-correlation (Poterba and Summers 1988). In contrast, the mean reversion property within a portfolio has received little academic attention. Compared with mean reversion in a single stock, for a portfolio not only does the mean reversion of each stock matter, but also the interaction among stocks.
On the other hand, the mixture algorithm, MIX, performs well on the DJIA dataset, beating the three benchmarks. As discussed in Sect. 4.6, the mixture algorithm provides a worst-case guarantee, which the original PAMR algorithms lack, and thus partially addresses the problem that PAMR itself has no worst-case guarantee. Moreover, it is worth noting that even with a worst-case guarantee, some existing universal algorithms also perform poorly on this dataset.
Now let us briefly analyze the reason PAMR fails on DJIA. To test whether our motivating mean reversion exists in the DJIA dataset, we propose a naïve test strategy. It sets the weights proportional to the differences between the last best stock's return and each asset's return; that is, the last best stock is given zero weight, while the worst performing stock is given the maximum weight. We are interested in whether this simple strategy produces positive returns on the existing datasets. If it produces a positive expected daily return, then the assumption that buying worse stocks works may hold; otherwise, our motivating assumption fails. We conduct the test on all six datasets, calculating the arithmetic average daily return and the standard deviation of daily returns. Since we are interested in absolute return, we compare the average values with 1. From the statistics in Table 9, we find that the five successful datasets yield an average daily profit (return > 1.0), while DJIA yields an average daily loss (return < 1.0). Thus, on the DJIA dataset, purchasing worse performing stocks in the portfolio is expected to produce losses. Though the expected daily loss is small, it compounds into a huge cumulative loss over a long trading period.
Table 9 Average daily return and standard deviation of the test strategy
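A minimal sketch of this test strategy, assuming weights proportional to how far each asset's previous-day price relative falls below the previous day's best; the uniform fallback on degenerate days (all assets equal) is our own choice.

```python
import numpy as np

def test_strategy_stats(X):
    """X: (n_days, m) price relatives; returns mean and std of the strategy's returns."""
    returns = []
    for t in range(1, len(X)):
        diff = X[t - 1].max() - X[t - 1]      # last best stock gets zero weight
        if diff.sum() > 0:
            w = diff / diff.sum()             # worst performer gets the maximum weight
        else:
            w = np.full(X.shape[1], 1.0 / X.shape[1])
        returns.append(np.dot(w, X[t]))       # daily return of the test strategy
    r = np.asarray(returns)
    return r.mean(), r.std(ddof=1)            # compare the mean with 1
```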
While the above results are interesting, we cannot claim that this method can definitely identify successful portfolio pools. Analyzing the mean reversion property in the portfolio scenario and selecting portfolio components such that the portfolio satisfies mean reversion deserve further attention.
Discussion on back tests
Back tests on historical markets may suffer from the "data-snooping bias" issue, a common form of which is the dataset selection issue. On the one hand, we selected four datasets, that is, the NYSE (O), TSE, SP500, and DJIA datasets, based on previous studies and without regard to the proposed approach. On the other hand, we developed the PAMR algorithm based solely on the NYSE (O) dataset, while the other five datasets (NYSE (N), TSE, SP500, MSCI, and DJIA) were obtained after the algorithm was fully developed. However, even though we are cautious about the dataset selection issue, it may still appear in the experiments, especially for the datasets with relatively long histories, that is, NYSE (O) and NYSE (N). The NYSE (O) dataset, pioneered by Cover (1991) and followed by other researchers, has become a "standard" dataset in the learning community. Since it contains 36 large-cap NYSE stocks that survived, in hindsight, for 22 years, it suffers from extreme survivorship bias. Nevertheless, it still has merit for comparing performance among algorithms, as done in all previous work. The NYSE (N) dataset, as a continuation of NYSE (O), contains the 23 assets that survived from the previous 36 stocks for another 25 years; it is therefore even worse than NYSE (O) in terms of survivorship bias. In a word, even though the experimental results on these datasets clearly show the effectiveness of the proposed PAMR algorithm, one cannot make claims without noting the deficiencies of these datasets.
Another common bias is the asset selection issue. Four of the six datasets (NYSE (O), TSE, SP500, and DJIA) were collected by others and, to the best of our knowledge, their assets are mainly the largest blue-chip stocks in their respective markets. As a continuation of the NYSE (O) dataset, we self-collected NYSE (N), which again contains several of the largest surviving stocks in NYSE (O). The remaining dataset (MSCI) is chosen according to the world indices. In a word, we try to avoid the asset selection bias by choosing representative stocks in their respective markets, which usually have large capitalization and thus high liquidity. Moreover, investing in these largest assets may reduce the market impact caused by the proposed portfolio strategy. Finally, following existing model assumptions and experimental settings, we do not consider assets of low quality, such as bankrupt stocks and penny stocks. On the one hand, data on bankrupt stocks is difficult to acquire, so we cannot observe their behaviors or predict how PAMR would behave on datasets containing bankrupt stocks. In reality, bankruptcy rarely happens to blue-chip stocks, as a failing stock would typically be removed from the list of blue chips before it actually goes bankrupt. On the other hand, penny stocks lack the liquidity required to support the trading frequency in the current research. Besides, one could also explore various practical strategies to exclude low-quality stocks from the asset pool at an early stage, using either technical or fundamental analysis.