1 Introduction

Whether technical analysis is capable of generating consistent profits is a matter of intense debate, both in research and in practice. According to several academic studies, technical trading rules,Footnote 1 which generate trading signals based solely on past price and volume information, are popular among professional and retail investors.Footnote 2 However, proponents of the efficient market hypothesis (Fama 1965, 1970) argue that technical analysis is meaningless. Malkiel (1999), for instance, states that “technical analysts build their strategies on dreams of castles in the air” (p. 138) and that these investors follow an “abnormally dedicated cult” (p. 119).

The empirical literature is divided on whether technical analysis adds value for investors.Footnote 3 Most of these studies, however, focus exclusively on the leading US stock market indices and investigate narrow sets of technical trading heuristics. In addition, research on technical trading rules faces the challenge of conducting a statistical analysis that is rigorous enough to produce reliable results. Data snooping bias is a major concern when multiple models are assessed at the same time, which is often the case in research on technical trading rules.Footnote 4 Since the risk of data snooping bias is immense when large sets of trading rules are analyzed simultaneously, a statistical procedure that accounts for data snooping is indispensable. However, adequately dealing with data snooping is technically challenging, and statistical tests that address this problem were not developed until White (2000) introduced a “Reality Check” for data snooping. Irwin and Park (2007) argue that the technical rigor of previous studies is generally low due to outdated statistical procedures and that many of the positive contributions to the literature may be significantly flawed by data snooping.

This paper revisits the mixed results found by the previous literature, whose contributions are mainly limited in at least one of three dimensions: (i) the scope of investigated market data, (ii) the range of considered technical trading rules, and (iii) the adequacy of statistical tests to address concerns of data snooping bias. Our goal is to provide a comprehensive study that does not fall short on any of these points. We investigate the predictability of 23 developed and 18 emerging market indices with a set of 6406 common technical trading rules.Footnote 5 The sample includes daily close prices spanning up to 66 years.Footnote 6 We employ the “Stepwise Superior Predictive Ability Test” proposed by Hsu et al. (2010), which is a state-of-the-art multiple hypothesis test designed to assess the statistical significance of the performance of large sets of trading rules while controlling for data snooping bias. The test combines features of related multiple hypothesis tests of Hansen (2005) and Romano and Wolf (2005), and is able to identify as many predictive technical trading rules as possible in a stepwise procedure.Footnote 7

We apply the Stepwise Superior Predictive Ability Test to all stock indices in our sample and, based on a Sharpe ratio (Sharpe 1966, 1994) performance measure, find statistically significant outperformance of technical trading rules over simple buy-and-hold strategies in 13 of the 23 developed countries and in 14 of the 18 emerging markets.Footnote 8 Particularly for developed market indices, the number of rules that outperform the respective index is generally low. We then divide the sample periods into subsamples of 7 years in length to examine the evolution of predictive ability over time. We find a significant time trend in the predictive power of technical rules, which is highest during the first subperiods and declines sharply thereafter. Almost no developed market is predictable with the considered technical trading rules during the last two subperiods starting in 2002. In contrast, we still find that half of the emerging markets are predictable with at least one rule during the period between 2002 and 2008. However, in the last subperiod between 2009 and 2016, almost all emerging market indices are unpredictable. These in-sample results are consistent with the predictions of the adaptive market hypothesis of Lo (2004). In contrast to the efficient market hypothesis, the adaptive market hypothesis assumes that markets evolve over time and that efficiency gradually increases as market participants are subject to a steady learning process. The degree of efficiency and the speed of adjustment depend on the competition among traders and their ability to learn. Consequently, initially superior (proprietary) trading strategies may eventually turn unprofitable due to either changing market environments or excessive competition as more participants adopt these strategies.
In line with that, for all markets in our sample, the predictability through technical trading rules vanishes over time, and markets that are assumed to be less competitive generally exhibit higher predictability through technical trading rules.

Next, we test the sensitivity of technical trading performance to transaction costs. In contrast to the previous literature, we refrain from calculating simple break-even transaction costs for technical rules. This approach treats transaction costs as exogenous and thereby neglects the impact of these costs on the performance of trading rules relative to each other.Footnote 9 We circumvent this issue by running the Stepwise Superior Predictive Ability Test multiple times with steadily increasing single-trip transaction costs. The analysis reveals that the predictive power of many rules is quickly offset even by low transaction costs. Only 5 of the 23 developed markets and 4 of the 18 emerging markets exhibit significantly outperforming trading rules for single-trip transaction costs of at least 20 basis points. The high sensitivity of technical trading performance to transaction costs is in line with prior findings in the literature (e.g., Allen and Karjalainen 1999; Bajgrowicz and Scaillet 2012; Ready 2002).
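To illustrate how costs can enter the returns before the test is rerun (rather than being applied to a finished ranking), the following Python sketch deducts a proportional single-trip cost from a rule's double-or-out returns, defined in equation (1) of Section 4, whenever the exposure changes. The charging convention is an assumption of this sketch, not necessarily the one used in the paper: exposures of −1, 0, and 1 correspond to holding 0, 1, and 2 units of the index, so moving from exposure a to b is charged |b − a| single trips. The function name `net_returns` and the cost grid are hypothetical.

```python
import numpy as np

def net_returns(gross, exposure, cost):
    """Deduct a proportional single-trip cost from per-period double-or-out returns.

    gross    -- per-period gross returns of one rule
    exposure -- exposure in {-1, 0, 1} held in each period (0, 1 or 2 units of the index)
    cost     -- single-trip cost as a decimal, e.g. 0.002 for 20 basis points
    Assumed convention (not taken from the paper): switching from exposure a to b
    trades |b - a| units, each unit traded costs `cost` in the period of the switch,
    and nothing is charged in the first period.
    """
    e = np.asarray(exposure, dtype=float)
    units_traded = np.abs(np.diff(e, prepend=e[0]))
    return np.asarray(gross, dtype=float) - cost * units_traded

# Hypothetical grid of single-trip costs (0 to 20 basis points) for repeated testing
COST_GRID = [0.0, 0.0005, 0.001, 0.0015, 0.002]
```

Each cost level in such a grid would produce a new matrix of net rule returns, on which the multiple hypothesis test is run again, so that costs can change which rules appear superior.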

A significant part of this paper is devoted to an out-of-sample analysis of the applicability of technical heuristics that aims to mimic the trading behavior of a real trader. Most of the previous literature on technical analysis examines only the ex-post performance of technical heuristics and neglects whether these heuristics actually exhibit persistent out-of-sample performance. We identify the best trading rules in 3-year subperiods and make them eligible for trading in each subsequent 3-year period. While the in-sample performance of the best trading rules in terms of Sharpe ratios is highly economically and statistically significant, the corresponding out-of-sample performance relative to buying and holding the respective index is either insignificant or significantly negative.Footnote 10 We also construct three equally weighted portfolios based on the out-of-sample technical trading algorithm applied to all 41 markets, to all developed markets, and to all emerging markets, respectively. Assuming zero transaction costs, we find that none of the three portfolios generates a Sharpe ratio that is significantly different from those of the corresponding equally weighted buy-and-hold portfolios. For positive transaction costs, the technical trading portfolios significantly underperform the buy-and-hold portfolios most of the time. These findings suggest that the performance of the best rules is not persistent and that technical trading signals can be considered noise.
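The rolling selection can be sketched as follows. This Python snippet is a simplified illustration under stated assumptions: it picks only the single rule with the highest in-sample Sharpe ratio in each window (the paper selects the best rules, and its selection criterion may differ), ignores transaction costs, and leaves the window length as a parameter, for instance 3 × 252 = 756 trading days if one assumes 252 trading days per year. The function names are our own.

```python
import numpy as np

def sharpe(x):
    """Sample Sharpe ratio with a zero risk-free rate; -inf for a constant series."""
    sd = x.std(ddof=1)
    return x.mean() / sd if sd > 0 else -np.inf

def rolling_best_rule(returns, window):
    """Select the best rule in each in-sample window and trade it in the next window.

    returns -- (n, m) array of per-period returns of m rules
    window  -- length of the in-sample and out-of-sample windows in periods
    Simplification: one rule per window, chosen by in-sample Sharpe ratio only.
    Returns the list of selected rule indices and the concatenated out-of-sample returns.
    """
    n, m = returns.shape
    chosen, oos = [], []
    for start in range(0, n - window, window):
        in_sample = returns[start:start + window]
        best = max(range(m), key=lambda k: sharpe(in_sample[:, k]))
        chosen.append(best)
        # only data after the selection window are used to evaluate the chosen rule
        oos.append(returns[start + window:start + 2 * window, best])
    return chosen, (np.concatenate(oos) if oos else np.empty(0))
```

For example, with `window=3` and a return matrix in which rule 0 rises steadily over the first three periods and is flat afterwards, the function selects rule 0 and its out-of-sample returns are all zero, which is the kind of non-persistence the paragraph describes.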

We contribute to the empirical literature with a clear argument against the usefulness of technical trading rules in today’s markets. We consider this contribution significant for several reasons. To the best of our knowledge, this paper is the first broad-based study to apply the powerful and rigorous Stepwise Superior Predictive Ability Test to technical trading rules in a comparative analysis of equity markets.Footnote 11 In addition, we leverage the test procedure to determine the distribution of rules with superior performance at different transaction cost levels. Unlike existing transaction cost analyses in the literature, we do not perform a pure break-even analysis, which may be biased since transaction costs are treated as exogenous. Another contribution is the out-of-sample test of technical trading performance. A similar persistence analysis has only been conducted by Bajgrowicz and Scaillet (2012), who apply it to a single time series of daily index data. To the best of our knowledge, our study provides the most comprehensive analysis of a large set of technical trading rules in terms of the number of investigated stock markets, the power of the applied testing procedure, and the rigor of the transaction cost and out-of-sample analyses. Given the general disagreement in the academic literature on the predictive ability of technical trading rules, our work aims to shed light on this ongoing debate.

The remainder of this paper is organized as follows. Section 2 discusses data snooping bias and multiple hypothesis tests in research on technical trading rules. Section 3 introduces the technical trading rules used in the empirical analysis of this paper. Section 4 deals with the performance measurement. Section 5 describes the data. Sections 6 and 7 present the in-sample and out-of-sample results, respectively. Section 8 concludes.

2 Data snooping bias and research on technical analysis

In the following, we provide a discussion of data snooping and appropriate testing procedures to cope with it in the context of technical analysis. While this section is mainly nontechnical, Appendix 1 introduces the basic steps required to implement the Stepwise Superior Predictive Ability Test by Hsu et al. (2010) used in the empirical part of this paper.

Data snooping becomes a problem if data are used more than once for statistical inference or to assess the applicability of multiple models (White 2000). This occurs frequently in empirical research, as prior empirical findings likely motivate researchers to reexamine already investigated data (Lo and MacKinlay 1990b). Thus, datasets that happen to yield “desired” results are likely studied more intensively, while other datasets may be studied less frequently and the corresponding results are rarely published. In this regard, Merton (1987) stresses the potential for biased results due to an overuse of certain data. He argues that standard statistical tests are inappropriate for these applications, as they do not account for potential biases caused by data snooping. In addition, data snooping bias is a concern when multiple hypotheses are evaluated based on one dataset. One may (unintentionally) over-adjust and fine-tune the parametrization of technical heuristics to past information and succumb to the impression of superior performance. Technical analysis research is therefore highly susceptible to data snooping bias due to the large number of potentially testable trading rules. Bajgrowicz and Scaillet (2012) illustrate the danger of data snooping with the following exaggerated example: “[I]magine you put enough monkeys on type writers and that one of the monkeys writes ‘The Iliad’ in ancient Greek. Because of the sheer size of the sample, you are likely to find a lucky monkey once in a while. Would you bet any money that he is going to write ‘The Odyssey’ next?” (p. 474). The essence of data snooping bias is, therefore, that an extensive search for successful trading rules (or, more generally, for some models) in past data is likely to yield superior but spurious results.

Classical hypothesis testing procedures disregard the impact of search on statistical significance in problems involving a high number of models to be evaluated. Following the argument of Hsu et al. (2016), if one aims to test for statistical significance of the maximum element from the vector \(\theta =\left( \theta _1,\theta _2,\dots ,\theta _m\right)\) (e.g., the highest performance measure from a set of m trading rules), a traditional test setup may simply formulate a null hypothesis based on the maximum element of \(\theta\), \(\theta ^{\max }= \max \left\{ \theta _1,\theta _2,\dots ,\theta _m\right\}\). However, according to White (2000), if an extensive specification search is required to identify the best-performing forecasting model, there is a high risk that the performance of this model is spurious. Even if one is only interested in the significance of the best model, the search process involved in identifying such a model implicitly turns the setup into a multiple hypothesis problem, because weaker models are also evaluated during the search. As a result, neglecting the specification search, and thus disregarding worse models when testing the best model, increases the probability of committing a type I error, that is, of incorrectly rejecting the null hypothesis. Hence, for large m, one likely overstates the statistical significance of the highest performance in \(\theta\) and understates the actual probability of a type I error if the test procedure does not account for data snooping. This concern is particularly pronounced when testing technical trading rules, as the free choice of parameter values typically leads to many different (albeit correlated) models.
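The effect of ignoring the search can be made concrete with a small Monte Carlo sketch in Python. It assumes i.i.d. normal daily excess returns with a true mean of zero for every rule, so every null hypothesis holds; it picks the rule with the highest sample mean and then applies a naive one-sided test (normal approximation to the t-statistic) to that rule alone. The sample size, volatility, number of replications, and function names are arbitrary illustrative choices.

```python
import math
import numpy as np

def naive_p_value(x):
    """One-sided p-value for H0: mean <= 0 using a normal approximation to the t-statistic."""
    t = x.mean() / (x.std(ddof=1) / math.sqrt(len(x)))
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def snooping_rejection_rate(m, n=250, reps=200, alpha=0.05, seed=0):
    """Share of simulations in which the best of m worthless rules looks 'significant'."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        # Assumption: m rules with i.i.d. normal excess returns and a true mean of zero
        excess = rng.normal(0.0, 0.01, size=(n, m))
        best = int(np.argmax(excess.mean(axis=0)))   # the specification search
        if naive_p_value(excess[:, best]) < alpha:   # test that ignores the search
            hits += 1
    return hits / reps
```

With m = 1 the naive test should reject in roughly 5% of the replications, as intended; with m = 100 independent rules one would expect a rejection rate close to 1 − 0.95^100 ≈ 0.994, although no rule has any predictive ability.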

An early and simple approach to account for multiple hypotheses in statistical tests is provided by the popular Bonferroni correction. If one targets an overall significance level (often referred to as familywise error rate) of \(\alpha\) while testing m models simultaneously, the Bonferroni correction requires that each model is tested with a significance level of \(\alpha /m\). This approach might be suitable for applications where m is small. However, the test becomes very conservative for large m.Footnote 12 Also, the Bonferroni correction disregards a potentially complex correlation structure of trading signals of “related” trading rules.Footnote 13 The higher the correlation, the more conservative a Bonferroni test is. In the extreme case of a perfect correlation between all considered trading rules, a multiple hypothesis test is equivalent to a single hypothesis test. An appropriate test for evaluating the performance of technical trading rules should take this correlation into account.
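The conservativeness of the Bonferroni correction under correlation can be checked with a short simulation. With \(\alpha = 0.1\) and m = 6406, for instance, each rule would have to be tested at about \(1.56\times 10^{-5}\). The Python sketch below assumes one-sided z-tests whose statistics follow a one-factor model with common correlation `rho`, with all null hypotheses true, and estimates the familywise error rate actually achieved by the Bonferroni threshold. The factor model and all names are illustrative assumptions.

```python
from statistics import NormalDist
import numpy as np

def bonferroni_level(alpha, m):
    """Per-test significance level that bounds the familywise error rate by alpha."""
    return alpha / m

def bonferroni_fwer(m, rho, alpha=0.1, reps=20000, seed=0):
    """Simulated familywise error rate of one-sided Bonferroni z-tests when all nulls hold.

    Assumption: z_k = sqrt(rho) * common + sqrt(1 - rho) * idiosyncratic_k (one-factor model).
    """
    rng = np.random.default_rng(seed)
    crit = NormalDist().inv_cdf(1.0 - bonferroni_level(alpha, m))
    common = rng.standard_normal((reps, 1))
    idio = rng.standard_normal((reps, m))
    z = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * idio
    return float(np.mean((z > crit).any(axis=1)))   # at least one false rejection
```

For independent statistics the estimate should be close to \(1-(1-\alpha /m)^m\), just under \(\alpha\); for strongly correlated statistics it falls well below \(\alpha\), which is the sense in which the test becomes more conservative as correlation rises.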

White (2000) is the first to propose a joint testing method that takes the dependence structure of the individual time series into account. His Reality Check tests whether the best trading rule from a potentially large set of trading rules significantly outperforms a benchmark such as a buy-and-hold strategy. White (2000) faces the problem that the distribution of \(\theta ^{\max }\) from the vector \(\theta\) with correlated elements is highly complex or unknown (e.g., if the elements of \(\theta\) are correlated normal random variables, the distribution of \(\theta ^{\max }\) is generally not available in closed form). White (2000) shows that a null hypothesis of the form \({\text{H}}_{0} :\, \max _{{k = 1, \ldots ,m}} \theta _{k} \le 0\) can be tested using the stationary bootstrap of Politis and Romano (1994). In this procedure, the original data are resampled in blocks of random length to obtain pseudo time series, which are used to compute an empirical bootstrap distribution for \(\theta ^{\max }\). The p-value for the best model is then calculated as the proportion of bootstrapped maximum performance measures that are larger than the actual sample maximum.
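A minimal Python sketch of the stationary bootstrap and the resulting Reality Check p-value is given below. It simplifies the paper's setting in two labeled ways: the performance measure is the mean of a per-period performance differential against the benchmark rather than the Sharpe ratio difference of Section 4, and the mean block length of 10 is an arbitrary choice. As in White (2000), the bootstrap maxima are recentered at the sample means so that they approximate the distribution of \(\theta ^{\max }\) under the null.

```python
import numpy as np

def stationary_bootstrap_indices(n, mean_block, rng):
    """Politis-Romano (1994) stationary bootstrap: blocks start at random positions,
    have geometrically distributed lengths with mean `mean_block`, and wrap around."""
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < 1.0 / mean_block:
            idx[t] = rng.integers(n)            # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % n       # continue the current block
    return idx

def reality_check_pvalue(d, n_boot=500, mean_block=10, seed=0):
    """Bootstrap p-value for H0: max_k E[d_k] <= 0.

    d: (n, m) array of per-period performance differentials of m rules versus the
    benchmark (simplification: a mean differential, not the Sharpe ratio difference).
    """
    rng = np.random.default_rng(seed)
    n = d.shape[0]
    theta = d.mean(axis=0)
    v = np.sqrt(n) * theta.max()                # statistic of the best rule
    v_boot = np.empty(n_boot)
    for b in range(n_boot):
        theta_b = d[stationary_bootstrap_indices(n, mean_block, rng)].mean(axis=0)
        v_boot[b] = np.sqrt(n) * (theta_b - theta).max()   # recentered at sample means
    return float(np.mean(v_boot > v))
```

Because all rules are resampled with the same indices, the cross-sectional correlation of the rules is preserved in every bootstrap replication, which is what distinguishes this approach from a Bonferroni-type correction.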

The Reality Check has two limitations, one statistical and one practical. First, as Hansen (2005) notes, the Reality Check has low statistical power if many poorly performing trading rules are tested, because an increasing number of poor models disguises the good ones.Footnote 14 Second, the Reality Check only tests for the significance of \(\theta ^{\max }\), such that the strategies inferior to the best strategy are not evaluated statistically. In practice, however, assessing the statistical significance of the best trading rule alone may not be sufficient, since one may be interested in all rules that exhibit genuine outperformance.

Both drawbacks of White’s (2000) Reality Check are addressed in advanced versions of the test. Hansen (2005) proposes the “Superior Predictive Ability Test,” an improved version of the Reality Check in which the bootstrap distribution is recentered such that particularly poorly performing models receive less weight. Still, the Superior Predictive Ability Test only tests the single best model, regardless of the total number of considered models. Romano and Wolf (2005) and Hsu et al. (2010) introduce stepwise versions of the Reality Check and the Superior Predictive Ability Test, respectively. These tests extend their single-step counterparts by an algorithm that identifies all models with statistically significant outperformance in a stepwise procedure that asymptotically controls the probability of committing a type I error. If the full set of models contains models with superior performance, these are removed from the full set and the procedure is repeated on the remaining subset. This stepwise procedure terminates as soon as the test no longer identifies a model that exhibits superior performance at the prespecified significance level. For our empirical study, we employ the Stepwise Superior Predictive Ability Test of Hsu et al. (2010), as it is the most advanced of these procedures, combining the Superior Predictive Ability Test of Hansen (2005) with a stepwise algorithm.Footnote 15
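The stepwise logic can be sketched in Python as follows. This is a simplified illustration, not the implementation described in Appendix 1: it uses mean performance differentials instead of Sharpe ratio differences, a sample standard deviation in place of a long-run variance estimate, and Hansen-type recentering in which a rule counts as very poor if its scaled mean falls below \(-\sqrt{2\log \log n}\) standard deviations. In each step, the critical value is the bootstrap quantile of the maximum over the rules not yet rejected; newly rejected rules are removed and the step is repeated until no further rule is rejected.

```python
import numpy as np

def _stationary_bootstrap(n, mean_block, rng):
    """Resampling indices of the Politis-Romano stationary bootstrap (wrapping around)."""
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < 1.0 / mean_block:
            idx[t] = rng.integers(n)          # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % n     # continue the current block
    return idx

def step_spa(d, alpha=0.1, n_boot=500, mean_block=10, seed=0):
    """Simplified stepwise SPA test; returns indices of rules judged to beat the benchmark.

    d: (n, m) array of performance differentials of m rules versus the benchmark.
    Simplifications (assumptions): mean differentials and sample standard deviations.
    """
    rng = np.random.default_rng(seed)
    n, m = d.shape
    theta = d.mean(axis=0)
    sd = d.std(axis=0, ddof=1)
    stat = np.sqrt(n) * theta
    # SPA-style recentering: very poor rules are not recentered, so they keep their
    # negative mean in the bootstrap world and barely affect the critical value.
    poor = stat <= -sd * np.sqrt(2.0 * np.log(np.log(n)))
    shift = np.where(poor, 0.0, theta)
    boot_means = np.array([d[_stationary_bootstrap(n, mean_block, rng)].mean(axis=0)
                           for _ in range(n_boot)])
    boot_stats = np.sqrt(n) * (boot_means - shift)            # shape (n_boot, m)
    remaining = np.ones(m, dtype=bool)
    rejected = np.zeros(m, dtype=bool)
    while remaining.any():
        # critical value from the maximum over the rules not yet rejected (at least 0)
        crit = max(np.quantile(boot_stats[:, remaining].max(axis=1), 1.0 - alpha), 0.0)
        newly = remaining & (stat > crit)
        if not newly.any():
            break                                              # no further rejections
        rejected |= newly
        remaining &= ~newly
    return np.flatnonzero(rejected)
```

Drawing the bootstrap means once and reusing them in every step keeps the sketch cheap; each step only changes the set of columns over which the maximum is taken.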

3 Technical trading rules

We use a sample of 6406 technical trading rules that are categorized into five major strategy classes. For some rules, we additionally use up to three trading filters. While we briefly introduce these five strategies and three trading filters here, Appendix 2 provides the technical definitions required to implement them for computer-based automated trading.

The relative strength index is a so-called oscillator, which was first introduced and studied by Levy (1967a, 1967b) and Wilder (1978). It is defined as the ratio of the sum of positive price changes to the sum of all absolute price changes over a prespecified time period, expressed as a percentage. Both the frequency and the magnitude of positive price changes raise the relative strength index; analogously, negative price changes lower it. By definition, the index ranges from 0 to 100. Technical analysts interpret a high relative strength index (values of 70 and above are usually considered high) as an indication of a recently overbought stock that is expected to undergo a downward correction in the near future. Conversely, a value of 30 or below is usually interpreted as an indication of an oversold stock that is expected to rebound.
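Following the verbal definition above, the index can be computed as in the Python sketch below. The paper's exact definition is given in Appendix 2; Wilder's (1978) original formulation uses smoothed averages of gains and losses instead of plain sums, so values may differ. The thresholds of 70 and 30 and the mapping to signals reflect the conventional reading described above and are not necessarily the parameterization in Table 1; the function names are hypothetical.

```python
import numpy as np

def rsi(prices, window):
    """RSI in percent: 100 * (sum of positive changes) / (sum of absolute changes)
    over the last `window` price changes; NaN where undefined."""
    p = np.asarray(prices, dtype=float)
    changes = np.diff(p)                       # changes[i] = p[i + 1] - p[i]
    out = np.full(p.shape, np.nan)
    for t in range(window, len(p)):
        recent = changes[t - window:t]         # the `window` changes ending at day t
        total = np.abs(recent).sum()
        if total > 0:                          # undefined for a flat price path
            out[t] = 100.0 * recent[recent > 0].sum() / total
    return out

def rsi_signal(value, overbought=70.0, oversold=30.0):
    """Conventional reading (assumed): -1 (sell) if overbought, +1 (buy) if oversold, else 0."""
    if value >= overbought:
        return -1
    if value <= oversold:
        return 1
    return 0
```

For example, the price path 1, 2, 1, 2 with a window of three changes has two positive unit changes out of three unit changes in absolute terms, giving an index value of about 66.7.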

Filter rules were studied by Alexander (1961, 1964) as well as Blume and Fama (1966). These rules assume that recent trends in stock prices are likely to continue in the future. Trading signals are generated once a stock shows a trend (i.e., positive or negative cumulative price changes of a certain magnitude) during a specific period of time. Adverse price movements indicate the reversal of trends.
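A simplified x% filter rule in the spirit of Alexander (1961) can be sketched in Python as follows, assuming that a long position is opened after the price rises by a fraction x from its most recent low and is reversed into a short position after the price falls by x from its most recent high. The rules defined in Appendix 2 may specify the reference highs and lows differently, so this sketch is not necessarily the paper's rule, and the function name is hypothetical.

```python
def filter_rule_positions(prices, x):
    """Simplified x filter rule (assumed form): go long after a rise of x from the most
    recent low, go short after a fall of x from the most recent high; neutral until the
    first signal."""
    pos = 0
    low = high = prices[0]
    positions = []
    for p in prices:
        if pos == 1:
            high = max(high, p)
            if p <= high * (1.0 - x):      # adverse move of x from the peak: trend reversal
                pos, low = -1, p
        elif pos == -1:
            low = min(low, p)
            if p >= low * (1.0 + x):       # rebound of x from the trough: trend reversal
                pos, high = 1, p
        else:
            low, high = min(low, p), max(high, p)
            if p >= low * (1.0 + x):
                pos, high = 1, p
            elif p <= high * (1.0 - x):
                pos, low = -1, p
        positions.append(pos)
    return positions
```

For example, with x = 0.05 the price path 100, 100, 106, 110, 103, 100 produces the positions 0, 0, 1, 1, −1, −1: the rise to 106 establishes an upward trend, and the fall from 110 to 103 is read as its reversal.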

Moving average rules attempt to identify price trends by smoothing the time series of past prices. Trading signals are triggered when two moving averages cross, which is perceived as a reversal of a trend. Specifically, once the “slow” moving average (i.e., the smoother moving average due to an inclusion of more historical data) penetrates the “fast” moving average from above, that is, once the fast moving average rises above the slow one, the crossover event is interpreted as a bullish signal. Likewise, a penetration from below, when the fast moving average falls below the slow one, is interpreted as a bearish signal.
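In position terms, the rule is long while the fast average is above the slow one and short while it is below. The Python sketch below uses simple (equally weighted) moving averages and maps each day to such a position; the window lengths are free parameters, the function names are hypothetical, and the band filters and other refinements of the paper's rules are omitted.

```python
import numpy as np

def moving_average(prices, window):
    """Simple moving average of the last `window` prices; NaN for the first window - 1 days."""
    p = np.asarray(prices, dtype=float)
    out = np.full(p.shape, np.nan)
    csum = np.cumsum(np.insert(p, 0, 0.0))     # csum[i] = p[0] + ... + p[i - 1]
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def ma_crossover_positions(prices, fast, slow):
    """+1 while the fast average is above the slow one, -1 while below, 0 otherwise."""
    f = moving_average(prices, fast)
    s = moving_average(prices, slow)
    pos = np.zeros(len(f), dtype=int)
    pos[f > s] = 1                             # comparisons with NaN are False
    pos[f < s] = -1
    return pos
```

The crossover events themselves are the days on which this position changes, for example `np.flatnonzero(np.diff(pos) != 0) + 1` for a position array `pos`.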

Support and resistance rules rely on the observation that the security price has repeatedly failed to move below or above a certain price level in the recent past (i.e., the price has formed a “support level” from above or a “resistance level” from below). Once the price breaks through the resistance level or falls below the support level, it is assumed to continue moving in the respective direction.
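A minimal Python sketch, assuming that the resistance (support) level is the maximum (minimum) closing price of the previous n days and that a position is held until the opposite break occurs. The paper's exact definitions are in Appendix 2, and the function name is hypothetical.

```python
import numpy as np

def support_resistance_positions(prices, n):
    """+1 after a close above the previous n-day high (resistance),
    -1 after a close below the previous n-day low (support); held until the opposite break.
    Assumption: levels are the plain n-day extremes of closing prices."""
    p = np.asarray(prices, dtype=float)
    pos = np.zeros(len(p), dtype=int)
    state = 0
    for t in range(n, len(p)):
        previous = p[t - n:t]              # the n closes before day t
        if p[t] > previous.max():
            state = 1                      # breakout through resistance
        elif p[t] < previous.min():
            state = -1                     # breakdown through support
        pos[t] = state
    return pos
```

For example, with n = 3 the closes 10, 11, 10, 11, 12, 9 produce a long position on the fifth day (12 exceeds the resistance level of 11) and a short position on the sixth day (9 falls below the support level of 10).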

Channel breakout rules are based on the idea that a security price moves within the boundaries of a certain price range. More specifically, a channel is defined by two parallel lines between which the price moves back and forth over a specific period of time. A break through the upper (lower) boundary of the channel is interpreted as the beginning of a positive (negative) trend.
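One way to operationalize the channel idea, assumed here for illustration only, is to say that a channel exists when the highest and lowest closing prices of the previous n days lie within a relative distance x of each other, and to take a position when today's close leaves that range. The Python sketch below follows this assumption; the paper's definitions are given in Appendix 2, and the function name is hypothetical.

```python
import numpy as np

def channel_breakout_positions(prices, n, x):
    """+1 (-1) when the close breaks above (below) a channel formed over the previous n days.
    Assumption: a channel exists if max <= (1 + x) * min over those n closes."""
    p = np.asarray(prices, dtype=float)
    pos = np.zeros(len(p), dtype=int)
    state = 0
    for t in range(n, len(p)):
        hi, lo = p[t - n:t].max(), p[t - n:t].min()
        if hi <= (1.0 + x) * lo:           # prices moved back and forth in a narrow range
            if p[t] > hi:
                state = 1                  # break through the upper boundary
            elif p[t] < lo:
                state = -1                 # break through the lower boundary
        pos[t] = state
    return pos
```

With n = 4 and x = 0.02, the closes 100, 101, 100, 101 form a channel, so a close of 104 on the next day opens a long position; a range of 100 to 120 is too wide to count as a channel, and no position is taken.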

We supplement the trading rules with up to three common trading filters.Footnote 16 These filters are designed to improve trading results by filtering out “unfavorable” trading signals ex ante.Footnote 17 First, a band filter sets an additional return by which the price of an asset must move in the desired direction to open a trading position after a trading signal is generated. It is, therefore, designed to prevent the entry into a position if the price movement is not strong enough. Second, a time-delay filter simply delays a trade by a specified time interval after a signal is generated. Third, a fixed-length filter defines a fixed holding period after which a position is automatically liquidated. Any intermediate trading signals are ignored. This filter aims to close a position before trends flatten out and the security potentially generates negative returns.
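The three filters can be sketched as small Python functions that act on a rule's signals, following the verbal descriptions above: a band check on the price move after a signal, a plain delay of d days, and a fixed holding period of k days during which new signals are ignored. All names are hypothetical, and the paper's precise definitions (Appendix 2) may differ in detail.

```python
import numpy as np

def passes_band(signal_price, price, direction, b):
    """Band filter: open the position only once the price has moved a further fraction b
    in the signalled direction (+1 buy, -1 sell) relative to the price at the signal."""
    if direction > 0:
        return price >= signal_price * (1.0 + b)
    return price <= signal_price * (1.0 - b)

def time_delay_positions(positions, d):
    """Time-delay filter as described above: act on every position d days later."""
    p = np.asarray(positions, dtype=int)
    out = np.zeros_like(p)
    if d < len(p):
        out[d:] = p[:len(p) - d]
    return out

def fixed_length_positions(events, k):
    """Fixed-length filter: events[t] is +1/-1 on days a new signal fires and 0 otherwise.
    Each position is held for k days; signals arriving during a holding period are ignored."""
    if k < 1:
        raise ValueError("holding period k must be at least one day")
    out = np.zeros(len(events), dtype=int)
    t = 0
    while t < len(events):
        if events[t] != 0:
            out[t:t + k] = events[t]       # liquidated automatically after k days
            t += k
        else:
            t += 1
    return out
```

For example, with k = 3 the signal sequence 1, 0, −1, 0, 0, −1, 0 yields the positions 1, 1, 1, 0, 0, −1, −1: the sell signal on the third day falls inside the holding period and is ignored.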

The filter, moving average, support and resistance, and channel breakout rules as well as their parameterizations are adopted from the seminal paper of Sullivan et al. (1999). In contrast to their study, this paper does not use on-balance volume rules (which require daily volume data) because volume data are not available for several indices in our sample. Furthermore, we add the relative strength index rules used by Hsu et al. (2016) to our sample of rules. The parameters of the trading rules are presented in Table 1, and Table 2 lists the considered parameter combinations. Overall, we study a total of 6406 trading rules, comprising 600 relative strength index rules, 497 filter rules, 2049 moving average rules, 1220 support and resistance rules, and 2040 channel breakout rules.

Table 1 Parameters of technical trading rules
Table 2 Parameter combinations of technical trading rules

4 Performance measurement

Similar to the prior literature (e.g., Bajgrowicz and Scaillet 2012; Hsu et al. 2010, 2016; Sullivan et al. 1999), we use a Sharpe ratio criterion in all applications of the Stepwise Superior Predictive Ability Test to evaluate the performance of technical trading rules (see Appendix 1 for the implementation of the test procedure based on the Sharpe ratio criterion, which is introduced in this section). To formalize the underlying performance measurement, let there be a total of m trading heuristics to be tested and let \(r_{t}\) denote the one-period return of a certain risky asset and \(r_t^f\) denote the risk-free rate in period t, \(t = 1,\dots ,N\). Furthermore, \(\delta _{t-1,k}\) is a signal indicator for heuristic k, \(k = 1,\dots ,m\), which indicates the exposure to the security in period t (i.e., \(\delta _{t-1,k}= 1\) for a long, \(\delta _{t-1,k}= 0\) for a neutral, and \(\delta _{t-1,k}=-1\) for a short position).

Trading according to technical sell signals requires closing long positions and/or entering short positions, where the latter may be associated with several constraints such as lending fees, a lack of shares available for shorting, or governmental or institutional restrictions (see, e.g., Almazan et al. 2004; D’Avolio 2002; Nagel 2005). Short sale constraints are likely severe for certain markets and time periods in our sample. First, less-developed equity markets may not be endowed with the institutional framework for short selling (at reasonable costs). Second, during earlier time periods, short selling was much more expensive than it has been more recently (D’Avolio 2002; Duffie et al. 2002; Jones and Lamont 2002). Moreover, short selling costs are time varying and tend to increase substantially during periods in which many investors want to short a stock.

We circumvent the problems associated with true short positions by using the “double-or-out” strategy proposed by Bessembinder and Chan (1995). Under this strategy, whenever \(\delta _{t-1,k}= 0\), an investor holds one unit of the asset and earns its return \(r_t\). If \(\delta _{t-1,k}= 1\), the investor doubles his or her exposure by borrowing money at the risk-free rate to finance an additional unit of the asset, which results in a payoff of \(2r_t-r_{t}^f\). Finally, whenever \(\delta _{t-1,k}= -1\), the asset is sold and the proceeds are invested in the risk-free security such that the payoff is equal to \(r_{t}^f\). In summary, the return of trading rule k in period t under a double-or-out strategy reads as

$$\begin{aligned} \pi _{t,k}(r_t,r_t^f,\delta _{t-1,k})= \left\{ \begin{array}{ll} 2r_{t}-r_{t}^f, &{} \delta _{t-1,k} = 1, \\ r_t, &{} \delta _{t-1,k} = 0, \\ r_t^f, &{} \delta _{t-1,k} = -1, \end{array} \right. \end{aligned}$$
(1)
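Equation (1) translates directly into a vectorized Python function. The sketch assumes that the exposures have already been aligned so that element t holds \(\delta _{t-1,k}\), the position decided at the end of period t − 1 and held during period t; the function name is hypothetical.

```python
import numpy as np

def double_or_out_returns(r, rf, delta_prev):
    """Equation (1): per-period return of a rule given exposures delta_{t-1,k} in {-1, 0, 1}.

    r          -- asset returns r_t
    rf         -- risk-free rates r_t^f (scalar or series)
    delta_prev -- exposures aligned with r (element t holds delta_{t-1,k})
    """
    r = np.asarray(r, dtype=float)
    rf = np.broadcast_to(np.asarray(rf, dtype=float), r.shape)
    d = np.asarray(delta_prev)
    # double the position (borrowing at rf), hold one unit, or sit in the risk-free asset
    return np.select([d == 1, d == 0, d == -1], [2.0 * r - rf, r, rf], default=np.nan)
```

For example, with \(r_t = 0.01\) and \(r_t^f = 0.001\), the three exposures 1, 0, and −1 yield returns of 0.019, 0.01, and 0.001, respectively.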

and the associated Sharpe ratio is given by

$$\begin{aligned} S_{k}= \frac{ {\bar{\pi }}_k-{\bar{r}}^f}{{\hat{\sigma }}_k}, \end{aligned}$$
(2)

where \({\bar{\pi }}_k-{\bar{r}}^f\) is the sample average of excess returns and \({\hat{\sigma }}_k\) is the corresponding standard deviation.

We evaluate the performance of technical trading rules against a benchmark. A natural choice for a benchmark model is the null model of staying out of the market (e.g., Brock et al. 1992). However, since active traders usually aim to evaluate their trading performance against a benchmark that is related to the risk–return profile of the traded asset, we choose a simple buy-and-hold strategy as the benchmark model. Therefore, we set \(\pi _{t,0} = r_t \delta _{t,0}\), where \(\delta _{t,0}=1\) for all t such that \(\pi _{t,0} =r_t\).

Following Sullivan et al. (1999), our final performance measure is the difference between the Sharpe ratios generated by trading rule k and the benchmark,

$$\begin{aligned} \theta _{k}= \frac{{\bar{\pi }}_k-{\bar{r}}^f}{{\hat{\sigma }}_k} - \frac{{\bar{r}}-{\bar{r}}^f}{{\hat{\sigma }}_0}, \end{aligned}$$
(3)

where \({\hat{\sigma }}_0\) is the sample standard deviation of the return series of the benchmark.
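Equations (2) and (3) can be sketched in Python as follows. Two choices are our assumptions: the standard deviation is computed from excess returns with a degrees-of-freedom correction of one, and the risk-free rate can be passed as a scalar or a series. With \(r_t^f = 0\), as used below in this section, the standard deviation of excess returns equals that of the raw returns. The function names are hypothetical.

```python
import numpy as np

def sharpe_ratio(returns, rf):
    """Equation (2): mean excess return divided by the sample standard deviation
    (assumption: computed from excess returns with ddof=1)."""
    excess = np.asarray(returns, dtype=float) - np.asarray(rf, dtype=float)
    return excess.mean() / excess.std(ddof=1)

def sharpe_difference(rule_returns, index_returns, rf):
    """Equation (3): Sharpe ratio of rule k minus that of the buy-and-hold benchmark."""
    return sharpe_ratio(rule_returns, rf) - sharpe_ratio(index_returns, rf)
```

Doubling a return series leaves its Sharpe ratio unchanged when \(r_t^f = 0\) (for a non-constant NumPy array `r`, `sharpe_difference(2 * r, r, 0.0)` is zero up to rounding), so the doubled long leg of the double-or-out strategy is not rewarded mechanically by this criterion.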

Since we do not have interest rate data for most of the investigated markets, we set \(r_{t}^f = 0\) for all t. Even though this simplification may bias our results, we assume that the impact on the overall outcomes is negligible. First, if a trading rule exhibits more days with long than short exposure, the trading returns may be biased upwards as we disregard longer periods when the investor has to pay \(r_{t}^f\) for borrowing funds to invest in a second unit of the asset. In turn, the performance could be biased downwards in the opposite case. Second, if costs of borrowing vary systematically between periods with long and short exposure, the performance measure may be biased as well. Third, expression (3) shows that the impact of \({\bar{r}}^f\) is approximately offset if standard deviations are similar. However, the performance of trading rule k tends to be overestimated if \({\hat{\sigma }}_0 > {\hat{\sigma }}_k\) when neglecting \({\bar{r}}^f\). For those reasons, we examine whether the simplification could have a significant impact on the performance measure. We find that the vast majority of investigated trading rules are relatively balanced in terms of the time invested in short and long positions. Moreover, the returns from technical trading tend to be more volatile than those of the respective benchmarks (which is mainly due to the characteristics of the double-or-out strategy), suggesting that setting \(r_t^f=0\) may keep our results rather on the conservative side. Based on these findings, we expect that the likelihood of a significant upward bias should be low.
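To make the third point explicit, expression (3) can be rearranged as follows, holding \({\bar{\pi }}_k\) fixed (although \({\bar{\pi }}_k\) itself depends on \(r_t^f\) through (1)):

$$\begin{aligned} \theta _{k} = \left( \frac{{\bar{\pi }}_k}{{\hat{\sigma }}_k} - \frac{{\bar{r}}}{{\hat{\sigma }}_0}\right) + {\bar{r}}^f \left( \frac{1}{{\hat{\sigma }}_0} - \frac{1}{{\hat{\sigma }}_k}\right) . \end{aligned}$$

For \({\bar{r}}^f>0\), the second term vanishes when \({\hat{\sigma }}_0={\hat{\sigma }}_k\), is negative when \({\hat{\sigma }}_0>{\hat{\sigma }}_k\) (so that dropping it overstates \(\theta _k\)), and is positive when \({\hat{\sigma }}_k>{\hat{\sigma }}_0\) (so that dropping it understates \(\theta _k\)).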

5 Data

The empirical study is based on the daily close prices of the leading stock market indices of 23 developed countries and 18 emerging markets. An overview of these markets and respective sample periods is provided in Table 3. The longest available time series is for the US S&P 500, which covers the period between January 1950 and May 2016, while the shortest is for the Colombian IGBC, which covers the period between July 2001 and May 2016. The index data are retrieved from Datastream and cover the maximum available sample periods ending in May 2016. Markets for which less than 14 years of data are available are excluded to ensure appropriately long time series and at least two 7-year subsamples in the subperiod analysis. The classification of countries as either “developed” or “emerging” is based on the categorization of the International Monetary Fund (IMF 2016) that applied at the end of the sample period. The 23 developed countries are considered “advanced economies” by the IMF. Likewise, the 18 emerging markets are categorized as either emerging or developing markets.Footnote 18 Some of the markets underwent significant economic development during the sample period, in the sense that some countries were not considered developed at the beginning of the sample period but are considered so at the end.Footnote 19 Clearly, a static classification of markets over long time periods cannot be perfect.

Table 3 Stock indices and sample periods
Table 4 Descriptive statistics

Given that many emerging economies in the sample experienced high levels of inflation in the past, we adjust stock market returns for inflation rates.Footnote 20 The annual inflation rates are obtained from World Bank Open Data. We compute inflation-adjusted daily returns as \(r_t^{\textit{adj}} = r_t^{\textit{unadj}} - \log (1+i_y)/252\), where \(r_t^{\textit{unadj}}\) is the unadjusted daily index return on day t, and \(i_y\) is the corresponding discrete inflation rate in year y.
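The adjustment can be written as a short Python function. The data layout (a list of `(year, return)` pairs and a dictionary of annual inflation rates) and the function name are hypothetical; the 252 trading days per year are taken from the formula above.

```python
import math

def inflation_adjusted_returns(daily_returns, annual_inflation, days_per_year=252):
    """r_adj = r_unadj - log(1 + i_y) / 252 for each (year, r_unadj) pair.

    daily_returns    -- iterable of (year, unadjusted daily return) pairs
    annual_inflation -- dict mapping year y to the discrete inflation rate i_y
    """
    return [r - math.log(1.0 + annual_inflation[y]) / days_per_year
            for y, r in daily_returns]
```

For example, an annual inflation rate of about 28.7% (so that \(\log (1+i_y) \approx 0.252\)) lowers every daily return of that year by about 0.1 percentage points.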

Table 4 presents basic descriptive statistics for the 41 stock market indices as well as annual inflation rates of the corresponding countries. Not surprisingly, emerging market indices in panel B tend to have higher average annualized returns and are more volatile than most developed market indices in panel A. Similarly, with a few exceptions such as Croatia, these countries have much higher inflation rates on average.

The use of index data is widespread in studies of technical trading rules, mainly due to the availability of long time series. A major drawback of such analyses is that stock indices cannot be traded directly. Investments that closely track the risk and return profile of stock indices are usually only possible through certain derivative instruments or exchange-traded funds. An obvious alternative would be to conduct our analysis with actual market data for the constituent stocks of the considered indices; such data, however, are not available for many markets in our sample. In the absence of reasonable alternatives, we therefore also rely on index data, but note that a broad analysis of actual stock data could enrich the technical analysis literature.

The analysis of index data raises additional issues that may lead to an upward bias of results. As shown by Scholes and Williams (1977), nonsynchronous trading of index constituents can lead to measurement error due to spurious serial dependence of index returns. As Ready (2002) notes, especially on days with technical trading signals, there may be systematic differences between the close price and the opening price of a stock index on the next trading day. First, imbalances in buy/sell orders caused by excess buy/sell interest at the close could reappear at the opening of the following trading day and lead to a systematic price movement. Typically, technical signals tend to exploit positive serial dependence since buy (sell) signals are often observed on days with large positive (negative) price movements (Bessembinder and Chan 1995). Moreover, if at least some close prices of the stocks included in the index are stale, partial adjustments on subsequent trading days may result in price changes biased in the same direction as those on previous days. The concern of stale prices is relatively low for very liquid stocks, but it tends to be higher if stocks trade at low volume (Campbell et al. 1993). Hence, biases induced by nonsynchronous trading could be problematic for earlier time periods when trading volume used to be lower (e.g., Lo and Wang 2000) or for less liquid markets in the sample.Footnote 21

To mitigate concerns of overestimated trading performance, we use the common approach of delaying the actual entry into a position by one day after a signal is triggered (see, e.g., Bessembinder and Chan 1995; Ready 2002; Sullivan et al. 1999). More specifically, if a technical signal is generated on day \(t\), the position is entered at the close price of day \(t+1\). Since order imbalances and partial adjustments of stale prices tend to normalize after one trading day, we assume that the close of the following day is the first trading opportunity for investors. Also, assuming that stock prices are equally likely to trade at the bid or the ask price one day after a trade signal occurs, delayed trading corrects for spread-induced trading costs.Footnote 22
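The one-day delay can be made concrete with a short sketch. It assumes a hypothetical encoding in which `signals[t]` is the position a rule calls for after the close of day \(t\) (+1 long, −1 short, 0 neutral) and `index_returns[t]` is the close-to-close index return of day \(t\). Without the delay, a signal from day \(t\) would first earn the return of day \(t+1\); with the delay, the position is entered at the close of day \(t+1\) and first earns the return of day \(t+2\). The helper name `delayed_strategy_returns` is ours, not the paper's.

```python
import numpy as np


def delayed_strategy_returns(signals, index_returns):
    """Daily strategy returns when a signal observed at the close of day t
    is only acted on at the close of day t+1 (hypothetical helper).

    signals[t]       : position called for after the close of day t
                       (+1 long, -1 short, 0 neutral)
    index_returns[t] : index return from the close of day t-1 to the close of day t
    """
    signals = np.asarray(signals, dtype=float)
    index_returns = np.asarray(index_returns, dtype=float)
    # Position held over day s (close s-1 -> close s). The position entered at
    # the close of day s-1 reflects the signal of day s-2 because of the delay.
    position = np.zeros_like(signals)
    position[2:] = signals[:-2]
    return position * index_returns
```

For example, `delayed_strategy_returns([1, 1, -1, 0, 0], [0.01, 0.02, 0.03, 0.04, 0.05])` holds positions `[0, 0, 1, 1, -1]` and therefore returns `[0, 0, 0.03, 0.04, -0.05]`.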

6 In-sample results

6.1 Performance of technical trading rules

In this section, we present the results of the application of the Stepwise Superior Predictive Ability Test. While we examine subperiods and the impact of transaction costs in later sections, the following analysis addresses full sample periods of all 41 market indices under the assumption that trading is free of any costs.

Table 5 Outperforming technical trading rules

The results are reported in Table 5, where panel A shows the results for developed countries and panel B those for emerging markets. The column “Full sam.” lists the number of technical trading rules with superior performance relative to buying and holding the index (i.e., rules with a p-value of less than 0.1 according to the Stepwise Superior Predictive Ability Test based on the Sharpe ratio criterion). We find that 13 of the 23 developed market indices and 14 of the 18 emerging market indices are predictable with at least one technical trading rule. The highest degree of predictability among developed country indices in terms of the total number of rules with superior performance is observed for Hong Kong (290 or about 4.5% of all considered heuristics), Portugal (153 or about 2.4%), and Great Britain (101 or about 1.6%). Among the emerging market indices, predictability is highest for Peru (1700 or about 26.5%), Pakistan (805 or about 12.6%), and Bulgaria (556 or about 8.7%). Thus, not only is the share of predictable markets higher among emerging market indices, but these markets also exhibit a higher degree of predictability in terms of the proportion of rules with superior performance. For instance, 8 of the 13 predictable developed market indices have at most 14 outperforming rules, which corresponds to only about 0.2% of the full universe of considered technical trading rules.
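The stepwise logic behind these counts can be illustrated with a simplified sketch in the spirit of Hsu et al. (2010); it is not the implementation behind Table 5. Assumptions, all ours: performance is measured by mean daily return differentials against buy-and-hold rather than by the Sharpe ratio criterion; standard deviations are plain sample estimates that ignore autocorrelation; resampling uses the stationary bootstrap of Politis and Romano (1994) with a hypothetical mean block length of 10 days; clearly inferior rules are not recentered, following Hansen's (2005) threshold \(-\sqrt{2\log\log n}\). Rules whose studentized statistic exceeds the bootstrapped critical value at level \(\alpha = 0.1\) are declared superior, the critical value is recomputed on the remaining rules, and the procedure stops once no further rule is rejected. The function names are hypothetical.

```python
import numpy as np


def stationary_bootstrap_indices(n, mean_block, rng):
    """Resampling indices for the stationary bootstrap (Politis and Romano 1994)."""
    starts = rng.integers(n, size=n)                 # candidate block starts
    new_block = rng.random(n) < 1.0 / mean_block     # geometric block lengths
    idx = np.empty(n, dtype=int)
    idx[0] = starts[0]
    for t in range(1, n):
        # either start a new block or continue the current one (wrapping around)
        idx[t] = starts[t] if new_block[t] else (idx[t - 1] + 1) % n
    return idx


def step_spa(d, alpha=0.1, n_boot=500, mean_block=10, seed=0):
    """Simplified Step-SPA sketch.

    d: (n_days, n_rules) daily performance differentials (rule minus benchmark).
    Returns a boolean mask of rules declared superior at level alpha.
    """
    d = np.asarray(d, dtype=float)
    n, m = d.shape
    rng = np.random.default_rng(seed)
    d_bar = d.mean(axis=0)
    omega = d.std(axis=0, ddof=1)                    # sample std (simplification)
    stat = np.sqrt(n) * d_bar / omega                # studentized statistics
    # Hansen-type recentering: clearly inferior rules keep their negative mean,
    # so they do not inflate the bootstrapped maximum.
    keep = stat >= -np.sqrt(2.0 * np.log(np.log(n)))
    recenter = d_bar * keep
    boot = np.empty((n_boot, m))
    for b in range(n_boot):
        d_star = d[stationary_bootstrap_indices(n, mean_block, rng)].mean(axis=0)
        boot[b] = np.sqrt(n) * (d_star - recenter) / omega
    superior = np.zeros(m, dtype=bool)
    while True:
        remaining = ~superior
        if not remaining.any():
            break
        # critical value from the bootstrapped maximum over rules not yet rejected
        max_stat = np.maximum(boot[:, remaining].max(axis=1), 0.0)
        crit = np.quantile(max_stat, 1.0 - alpha)
        newly = remaining & (stat > crit)
        if not newly.any():
            break
        superior |= newly
    return superior
```

Counting `step_spa(d).sum()` for each market would then give a number analogous to the “Full sam.” column, under the simplifications listed above.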

This first analysis suggests that the majority of markets are predictable with technical rules and, in particular, that several emerging markets have a significant proportion of rules with superior performance. On the one hand, the high number of predictive rules in emerging markets may be due to lower market efficiency, as the stock markets of less developed countries tend to incorporate new information into prices more slowly and lack an institutional framework to enhance efficiency (see, e.g., Araújo Lima and Tabak 2004; Bhuyan et al. 2008; Poshakwale 2002; Sharma and Thaker 2015). On the other hand, trading in these markets tends to incur comparatively high transaction costs (Lesmond 2005), which active trading strategies may then pick up as apparent profits under a zero-transaction-cost scheme.Footnote 23 In fact, most technical trading rules generate trading signals very frequently, such that these rules would incur substantial transaction costs if applied in real markets. Later robustness checks assess the sensitivity of performance to transaction costs.

Since investors usually strive for maximum profits, the best-performing technical trading rules play an important role among the whole set of examined rules. Therefore, Table 5 also reports basic characteristics of these rules for all markets.Footnote 24 For 23 (15) markets, support and resistance rules (moving average rules) have the best performance in terms of the Sharpe ratio criterion.Footnote 25 Moreover, the best-performing rules have significant excess Sharpe ratios of up to 1.29 (Portugal) for developed markets and 1.12 (Saudi Arabia) for emerging markets. These rules also produce economically large excess returns, mostly ranging from more than 10% to around 40% per year.
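The excess Sharpe ratio used as the performance measure is described in the caption of Fig. 1 as the difference between a rule's Sharpe ratio and the Sharpe ratio of the buy-and-hold benchmark, annualized with a factor of \(\sqrt{252}\). Eq. (3) is not reproduced in this section, so the sketch below assumes daily returns and computes each Sharpe ratio as the sample mean over the sample standard deviation, leaving any risk-free adjustment to the caller. The helper `excess_sharpe` is hypothetical.

```python
import numpy as np


def excess_sharpe(rule_returns, bh_returns, periods_per_year=252):
    """Annualized Sharpe ratio of a rule minus that of buy-and-hold.

    Both inputs are daily return series of equal length (assumption:
    returns are already in excess of the risk-free rate if desired).
    """
    rule = np.asarray(rule_returns, dtype=float)
    bh = np.asarray(bh_returns, dtype=float)
    sr_rule = rule.mean() / rule.std(ddof=1)   # daily Sharpe ratio of the rule
    sr_bh = bh.mean() / bh.std(ddof=1)         # daily Sharpe ratio of buy-and-hold
    return np.sqrt(periods_per_year) * (sr_rule - sr_bh)
```

Because the Sharpe ratio is scale invariant, a rule that simply holds a leveraged copy of the index (e.g., twice the index return) has an excess Sharpe ratio of zero under this definition.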

Table 5 also shows that best-performing rules predominantly exhibit long or short exposure to the market [i.e., 28 (39) of the 41 best rules are either long or short at least 80% (50%) of the time]. In several cases, the best rules are never market-neutral once an initial trading signal is triggered, as a signal converts a long position into a short position and vice versa (e.g., this is the case for all moving average rules without trading filters). For both developed and emerging market indices, best-performing trading rules tend to generate higher average returns per trade on long positions than on short positions, and long signals are more likely to generate winning trades. This is in line with previous research by, for instance, Brock et al. (1992), who report that gains of long–short trading strategies are mainly driven by long positions, which is consistent with generally upward-trending equity markets in the long run.

The average number of transactions per year varies considerably across the best rules, but many of them generate trading signals very frequently. For example, the best rule for the Colombian market is a 5-day moving average rule that emits more than 70 trades per year on average, with a mean holding period of just 4.3 days on the long side and 2.7 days on the short side. Accordingly, trading such rules may accumulate high transaction costs.
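The rule characteristics discussed in the two preceding paragraphs (time spent long or short, trades per year, mean holding periods, and per-trade returns and win rates by side) can all be derived from a daily position series. The sketch below makes assumptions that the text does not spell out: a "trade" is one uninterrupted run of a nonzero position, `position[t]` is the position held over day \(t\), and short returns are approximated by a daily rebalanced short, i.e., compounding \(1 - r_t\). The helper `trade_statistics` is hypothetical.

```python
import numpy as np


def trade_statistics(position, returns, days_per_year=252):
    """Exposure, trade frequency, holding periods, and per-trade results.

    position[t]: position held over day t (+1 long, -1 short, 0 neutral)
    returns[t]:  simple index return over day t
    """
    pos = np.asarray(position, dtype=float)
    ret = np.asarray(returns, dtype=float)
    n = pos.size
    trades = []  # (side, holding_days, compounded_return)
    start = 0
    for t in range(1, n + 1):
        # a run ends at the end of the sample or when the position changes
        if t == n or pos[t] != pos[start]:
            side = pos[start]
            if side != 0:
                # compound side * r over the run (daily rebalanced for shorts)
                growth = np.prod(1.0 + side * ret[start:t])
                trades.append((side, t - start, growth - 1.0))
            start = t
    stats = {
        "share_long": float(np.mean(pos > 0)),
        "share_short": float(np.mean(pos < 0)),
        "share_neutral": float(np.mean(pos == 0)),
        "trades_per_year": len(trades) / (n / days_per_year),
    }
    for label, sign in (("long", 1.0), ("short", -1.0)):
        sel = [(h, r) for s, h, r in trades if np.sign(s) == sign]
        stats[label + "_trades"] = len(sel)
        if sel:
            stats[label + "_mean_holding_days"] = float(np.mean([h for h, _ in sel]))
            stats[label + "_mean_return"] = float(np.mean([r for _, r in sel]))
            stats[label + "_win_rate"] = float(np.mean([r > 0 for _, r in sel]))
    return stats
```

Under this definition, a rule that alternates between long and short with holding periods of about 4.3 and 2.7 days completes roughly \(252 / 3.5 \approx 72\) trades per year, which is consistent in magnitude with the Colombian example above.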

In the next two sections, we check the robustness of our in-sample results by examining how the predictive ability of technical trading rules evolves over time, using subperiods of the full sample periods. Thereafter, we explore the impact of transaction costs.

6.2 Subperiod analysis

In a further step, we test whether the results for the full sample periods also hold during subperiods of 7 years.Footnote 26 Table 6 presents the results for developed country indices (panel A) and emerging market indices (panel B). Again, outperforming rules are defined as those rules that have a p-value of less than 0.1 according to the Stepwise Superior Predictive Ability Test using the Sharpe ratio criterion. For comparison purposes, we report the number of rules with superior performance over the entire sample periods, as already presented in Table 5, in the last column of the table.
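The subperiod boundaries quoted in the following discussion (1988–1994, 1995–2001, 2002–2008, and 2009–2016) are consistent with a common 7-year calendar grid whose final block absorbs the leftover year. Since Footnote 26 is not reproduced here, the sketch below only illustrates one way to build such a grid: the anchor year, the rule of keeping only blocks that start within the sample, and the merging of a trailing remainder into the last block are all our assumptions, and `subperiods` is a hypothetical helper.

```python
def subperiods(first_year, last_year, anchor=1988, length=7):
    """Calendar-year blocks on a common grid (hypothetical anchor year).

    Only blocks that fit fully within the sample are kept; a trailing
    remainder shorter than `length` years is merged into the last block.
    """
    # first grid start on or after first_year
    offset = (first_year - anchor) % length
    start = first_year if offset == 0 else first_year + (length - offset)
    blocks = []
    while start + length - 1 <= last_year:
        blocks.append([start, start + length - 1])
        start += length
    if blocks and start <= last_year:
        blocks[-1][1] = last_year  # merge the leftover years into the last block
    return [tuple(block) for block in blocks]
```

With a sample from 1988 to 2016, this yields `[(1988, 1994), (1995, 2001), (2002, 2008), (2009, 2016)]`; a market whose data start in 2000 would enter the grid in 2002.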

Table 6 Outperforming technical trading rules during subperiods

For 20 of the 23 indices for which we find outperforming rules in at least one subperiod, the highest number of outperforming rules is observed in the first subperiod. In these cases, the proportion of predictive rules tends to decrease sharply after the first subperiod. Most notably, there are 350 outperforming rules for the Finnish stock market index between 1988 and 1994, 305 outperforming rules for the Bulgarian stock market index between 2002 and 2008, and 103 outperforming rules for the Brazilian stock market index between 1995 and 2001. In the subsequent subperiods, however, none of the heuristics in the sample has superior performance for any of these three indices. A decline in technical trading performance over time is also found by Kish and Kwon (2002) and LeBaron (2000) for major US stock indices and by Hsu et al. (2016), Neely et al. (2009), and Olsen (2004) for several foreign exchange pairs. Our results further show that predictive power drops much earlier in developed markets than in emerging markets. In developed markets, the predictive ability of technical trading rules disappears almost completely by the early 2000s. For emerging markets, the sharpest decline in predictability, in terms of the share of predictable markets, is observed between the penultimate (2002–2008) and the most recent (2009–2016) subperiods (see also the last row of the panel). In the last subperiod, no developed market index is predictable anymore, and only two emerging market indices remain predictable, each with a single rule.

There are several potential reasons for this negative trend in predictability. As discussed in the previous section, trade-intensive rules may perform particularly well before transaction costs. As trading costs have steadily declined in recent decades (e.g., Jones 2002), rules that earn high profits by exploiting a zero-cost system should, all else equal, be more successful during the early subperiods. Another explanation for our results is varying levels of market efficiency over time, which is in line with the adaptive market hypothesis of Lo (2004). According to this view, the first subperiods can be interpreted as “early evolutionary stages” that are followed by greater market efficiency, for instance, through gradual learning by market participants. Thus, technical trading strategies can generate superior performance over certain periods, and the more proprietary these strategies are, the longer they may yield attractive results. However, even very sophisticated trading strategies may eventually “die out” if market conditions change significantly, for example, through more intense competition among traders for strategy returns (Timmermann 2008; Timmermann and Granger 2004).Footnote 27 Closely related to the notion of growing market efficiency is the rise of institutional investors and professional algorithmic traders.Footnote 28 While institutional investors are not necessarily better investors, greater competition among them can increase know-how about trading techniques and raise the speed at which market data are processed.

In summary, the declining performance of technical trading rules across markets may not be surprising, as the rules studied in this paper are considered quite simple and were not ahead of their time at the beginning of the sample periods. The ease with which these naive technical trading rules can be copied suggests that initially predictive rules may lose their edge relatively quickly. While at least some of the trading rules performed significantly better than buy-and-hold strategies under a zero-transaction-cost scheme in early subperiods, they all appear to be useless in current markets.

6.3 Transaction cost analysis

In the previous analyses, transaction costs were not taken into account. The superior performance found for several technical trading heuristics does not necessarily imply superior returns after transaction costs, since a zero-cost scheme is prone to overestimate the performance of highly trade-intensive heuristics. Transaction costs may also vary substantially across the examined markets (Lesmond 2005). In the following, we relax the strong simplification of trading without costs.

A proper transaction cost analysis is not trivial since reliable cost estimates for certain markets and time periods are usually not available. Therefore, a common approach in the literature is to calculate break-even transaction costs of the rules with superior performance at zero transaction costs (see, e.g., Bessembinder and Chan 1995, 1998; Hsu et al. 2016). For example, Hsu et al. (2016) simply calculate the break-even costs for the best trading rules they identified in a previous test for superior predictive ability. The drawback of this approach is that it does not reveal the maximum transaction cost per trade up to which a trading rule still exhibits statistically significant outperformance. In addition, technical trading rules that emit frequent signals are likely to be more sensitive to transaction costs compared with other rules. Thus, the distribution of rules with significant performance is most likely affected by the introduction or increase of transaction costs (see also Bajgrowicz and Scaillet 2012). To address these issues, we consider transaction costs before testing for statistical significance.
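For reference, the conventional break-even cost mentioned above is the one-way cost per transaction at which a rule's return, net of costs, just equals the buy-and-hold return. The sketch below assumes log returns, charges costs additively per unit of position change (so a switch from long to short counts as two one-way transactions), ignores the cost of the initial buy-and-hold purchase, and approximates short positions by \(-r_t\). The helper `break_even_cost` is hypothetical and is not the procedure of Hsu et al. (2016).

```python
import numpy as np


def break_even_cost(position, index_log_returns):
    """One-way cost per unit of turnover at which a rule's cumulative log
    return equals buy-and-hold (additive cost approximation).

    position[t]: position held over day t; index_log_returns[t]: log return of day t
    """
    pos = np.asarray(position, dtype=float)
    r = np.asarray(index_log_returns, dtype=float)
    gross_excess = np.sum(pos * r) - np.sum(r)       # rule minus buy-and-hold
    # turnover: every unit change in position counts as one one-way transaction
    turnover = np.abs(np.diff(pos, prepend=0.0)).sum()
    return gross_excess / turnover if turnover > 0 else np.nan
```

As the text notes, this number says nothing about statistical significance: a rule can have a positive break-even cost while its outperformance is already insignificant at a much lower cost level.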

Our approach is simple, but to the best of our knowledge has not been used in the literature before. We increase the transaction costs in steps of five basis points and use the Stepwise Superior Predictive Ability Test at each transaction cost level to test whether technical trading rules have superior performance. As a result, we obtain the number of outperforming rules for each transaction cost level. Once the transaction costs have reached a level where no significant performance can be detected, the algorithm stops.
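The cost-stepping procedure can be sketched as a loop that charges one-way costs on every position change, recomputes the performance differentials against the cost-free buy-and-hold benchmark, and calls a multiple testing routine until no rule survives. In the sketch below, `test_fn` is a placeholder for any function that returns a boolean mask of superior rules (for instance, a Step-SPA implementation); costs are charged additively per unit of turnover, and the cap `max_bp` is a safeguard we added. The function names are hypothetical, and the differentials are raw return differences rather than the Sharpe ratio criterion used in the paper.

```python
import numpy as np


def net_returns(positions, index_returns, cost):
    """Daily rule returns net of one-way costs.

    positions: (n_days, n_rules) positions held over each day
    cost: one-way cost as a fraction (e.g. 0.0005 for five basis points)
    """
    pos = np.asarray(positions, dtype=float)
    turnover = np.abs(np.diff(pos, axis=0, prepend=0.0))
    return pos * np.asarray(index_returns, dtype=float)[:, None] - cost * turnover


def outperformers_by_cost(positions, index_returns, test_fn, step_bp=5, max_bp=10_000):
    """Number of superior rules per cost level, raising costs in steps of
    `step_bp` basis points until `test_fn` finds no superior rule."""
    bench = np.asarray(index_returns, dtype=float)
    results = {}
    bp = 0
    while bp <= max_bp:
        net = net_returns(positions, bench, bp / 10_000)
        n_superior = int(np.count_nonzero(test_fn(net - bench[:, None])))
        results[bp] = n_superior
        if n_superior == 0:
            break  # no significant performance left at this cost level
        bp += step_bp
    return results
```

Testing at each cost level, rather than computing break-even costs only for rules selected at zero cost, lets the set of significant rules shift toward less trade-intensive rules as costs rise.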

Table 7 Outperforming technical trading rules and transaction costs

Table 7 reports the results of the transaction cost study (the results for zero transaction costs from Table 6 are also reported for ease of comparison). Regarding panel A for developed countries, for 10 of the 23 market indices (i.e., Canada, Finland, Great Britain, Greece, Hong Kong, Israel, Japan, Norway, Portugal, and the USA) at least one trading rule with significant outperformance at positive transaction costs exists. However, for these markets, the number of outperforming rules and the maximum level of transaction costs are mostly small. Trading rules with superior performance at transaction costs of more than ten basis points are only found in five markets. The highest single-trip transaction cost level, 200 basis points, is reached by one rule for the Japanese stock market. This is followed by the stock indices from Hong Kong and Portugal, both of which are predictable with one rule for costs of up to 40 basis points per transaction. Of the 18 emerging markets, 11 markets (i.e., Bulgaria, Chile, China, Colombia, Croatia, Malaysia, Pakistan, Peru, the Philippines, Russia, and Saudi Arabia) are predictable for positive one-way transaction costs of at least five basis points (see panel B). The Bulgarian, Malaysian, Pakistani, and Peruvian stock market indices remain predictable at costs of at least 50 basis points per trade. With 112 outperforming rules at the 500 basis point level and four outperforming rules at the 1000 basis point level, the Bulgarian SOFIX exhibits the highest predictability. With a few exceptions, our results are consistent with previous findings in the literature that technical trading performance is quickly offset by moderate transaction costs (e.g., Allen and Karjalainen 1999; Bajgrowicz and Scaillet 2012; Ready 2002).

Fig. 1

Trading costs and the performance of technical trading rules. This figure shows the performance of all 6406 technical trading rules for the USA (panel A) and Peru (panel B) as representatives of the samples of developed countries and emerging markets, respectively. A heuristic’s performance measure is the difference between its Sharpe ratio and the Sharpe ratio of a buy-and-hold benchmark, annualized with a factor of \(\sqrt{252}\) [cf. Eq. (3)]. Vertical axes represent the indices of technical heuristics. The five heuristic classes are highlighted with alternating gray and white areas, where RSI are relative strength index rules, FR are filter rules, MA are moving average rules, SR are support and resistance rules, and CB are channel breakout rules. Black (gray) dots indicate the performance measures of technical trading rules based on single-trip transaction costs of 0 (25) basis points.

The impact of transaction costs for the full sample of technical trading rules is also shown in Fig. 1 for the US S&P 500 (panel A) and the Peruvian S&P/PVL Peru General (panel B) as examples of a developed and an emerging market, respectively.Footnote 29 Both panels plot the Sharpe ratio performance measure for all 6406 technical heuristics at transaction costs of 0 and 25 basis points per transaction. Generally, the impact of transaction costs is substantial for rules that favor very frequent trading (these are often rules with lower parameter values, which are mostly shown further to the left within each of the five classes of heuristics). For example, while the performance measures of all moving average rules before transaction costs lie within the interval \([-0.5,0.3]\) in the US market, performance deteriorates markedly when trading costs are added. With single-trip costs of 25 basis points, excess Sharpe ratios of moving average rules are as low as \(-3.5\). Despite the substantial impact of trading costs on performance, the example of the Peruvian market shows that a large fraction of rules generate excess Sharpe ratios well above zero, which is not the case in the US market. Again, the higher degree of predictability of the Peruvian market could be due to relatively lower market efficiency and the generally higher transaction costs required to trade in this market in reality.

In most markets, the estimated maximum transaction costs for a single transaction are small compared with what traders would likely have had to pay to trade these markets during the sample periods. For example, Chan and Lakonishok (1993) estimate single-trip trading costs of 14 basis points incurred by institutional traders for large US stocks between 1986 and 1988. According to Stoll and Whaley (1983), average trading costs were much higher in the preceding decades. Moreover, the emerging markets in our sample are likely to have much higher transaction costs. For the period from 1991 to 2000, Lesmond (2005) estimates variable trade costs of 0.5–1.0% for Malaysia, 0.6–0.9% for Poland (both depending on order volume), 0.8–1.05% for China, 1.5% for the Philippines (both for foreign investors), and 1.0% for Indonesia. Given these figures, our analysis casts doubt on whether markets could have been traded profitably with any of the investigated trading rules.

Table 8 Development of the average number of trades
Table 9 Development of the average holding periods

Next, we investigate the evolution of the average number of trades and average holding periods of outperforming rules at different transaction cost levels. Table 8 presents the relative change in the average number of trades for positive transaction costs.Footnote 30 For each market and transaction cost level, we average the number of trades over all rules with superior performance. To ensure comparability across markets, we normalize the average values for each transaction cost level of a market by the corresponding value at zero transaction costs. Thus, the table shows changes in the average number of trades relative to the zero-transaction-cost scenario (e.g., a value of 0.9 implies a 10% decrease in the average number of trades among all outperforming rules at a given transaction cost level compared with the corresponding average among all outperforming rules at zero transaction costs). The results indicate that the average number of trades decreases with transaction costs for most developed and emerging markets. Thus, as conjectured above, the share of outperforming rules with frequent trading signals declines as transaction costs increase. Table 9 presents the results for the average holding period conditional on transaction costs. The reported figures are calculated analogously to those presented in Table 8. Accordingly, in almost all markets, average holding periods increase monotonically with transaction costs relative to the case of zero transaction costs.
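The normalization behind Tables 8 and 9 can be written compactly. The sketch assumes a per-rule statistic (e.g., the number of trades or the average holding period) that does not depend on the cost level, and a hypothetical dictionary mapping each cost level in basis points to the boolean mask of outperforming rules at that level; cost levels without any outperforming rule are skipped. The helper `normalized_averages` is ours.

```python
import numpy as np


def normalized_averages(per_rule_metric, superior_by_cost):
    """Average a per-rule statistic over outperforming rules at each cost level
    and divide by the zero-cost average (e.g. 0.9 = 10% lower than at zero cost).

    per_rule_metric: 1-D array, one value per rule
    superior_by_cost: dict {cost_bp: boolean mask of outperforming rules}
    """
    metric = np.asarray(per_rule_metric, dtype=float)
    averages = {
        bp: metric[np.asarray(mask, dtype=bool)].mean()
        for bp, mask in superior_by_cost.items()
        if np.any(mask)  # skip cost levels without outperforming rules
    }
    base = averages[0]  # zero transaction costs as the reference level
    return {bp: avg / base for bp, avg in averages.items()}
```

For example, if three rules trade 100, 50, and 10 times and the two most active rules drop out as costs rise, the normalized average falls from 1 to 0.5625 and then to 0.1875.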

The results from Tables 8 and 9 suggest that the share of outperforming rules with infrequent trading signals and longer average holding periods tends to increase as transaction costs rise. Based on that, one may expect relatively more outperforming rules that mimic a simple buy-and-hold strategy when transaction costs are high. However, this is not the case. Despite the longer average holding periods, we observe a decrease in the average time invested in 17 of the 21 markets as transaction costs increase. Thus, the outperforming rules tend to exploit very specific price patterns and have longer periods with no exposure to the market when transaction costs are high.

7 Out-of-sample results

In this section, we address two fundamental problems that arise in analyses of technical trading rules over long sample periods. First, using technical trading rules requires active trading, and technical analysis has been shown to be used by amateur and professional investors who deliberately engage in active portfolio management (e.g., Faugère et al. 2013; Lease et al. 1980). Thus, the common convention in the literature of testing trading rules over periods of several decades usually diverges from the short-term oriented trading behavior of individuals who may aim to apply the heuristics they find most valuable at a given point in time. Second, backtests (such as those conducted in this paper so far) only provide ex-post information on whether trading rules could have been traded profitably. However, in-sample outperformance does not imply out-of-sample outperformance, nor does it give any indication of how to select the best rules to trade in the future. So far, we cannot answer whether a trader could have exploited the predictability of some of the investigated markets (at least if trading were free or very cheap) by implementing a profitable trading system based on the studied technical rules. We approach this task by performing an out-of-sample persistence analysis of technical trading performance in the following.

To test out-of-sample performance, we mimic the trading activities of a technical trader, similar to Bajgrowicz and Scaillet (2012). For each market, we divide the sample periods into 36-month subperiods and select the trading rules with the best performance in terms of the highest excess Sharpe ratio (we refer to these 36-month intervals, in which the in-sample performance of technical trading rules is evaluated, as “testing periods”).Footnote 31 The best-performing rule during a testing period is used for trading during the subsequent 36-month period (which we refer to as the “trading period”). When a trading signal is generated by the selected rule during a trading period, the index is purchased. A potentially open position is automatically closed at the end of a trading period. As in the previous analyses, we apply the double-or-out strategy to avoid actual short positions.
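The testing-period/trading-period scheme can be sketched as a rolling selection over consecutive, non-overlapping windows. Several simplifications are ours: windows are measured in trading days as a stand-in for 36 calendar months; rule returns are taken from a precomputed full-sample series, so positions are not reset at the start and end of each trading period as described above; and the selection criterion is passed in as `score_fn` (in the paper, the excess Sharpe ratio). The function `rolling_selection` is hypothetical.

```python
import numpy as np


def rolling_selection(rule_returns, bench_returns, window, score_fn):
    """Select the best rule in each testing period and trade it in the next.

    rule_returns: (n_days, n_rules) daily returns of each rule
    bench_returns: (n_days,) buy-and-hold returns
    window: days per testing/trading period (stand-in for 36 months)
    score_fn: in-sample criterion score_fn(rule_window, bench_window) -> float
    Returns the concatenated out-of-sample returns and the chosen rule indices.
    """
    R = np.asarray(rule_returns, dtype=float)
    b = np.asarray(bench_returns, dtype=float)
    n, m = R.shape
    oos, chosen = [], []
    for start in range(0, n - 2 * window + 1, window):
        test = slice(start, start + window)               # testing period
        trade = slice(start + window, start + 2 * window)  # subsequent trading period
        scores = [score_fn(R[test, k], b[test]) for k in range(m)]
        best = int(np.argmax(scores))
        chosen.append(best)
        oos.append(R[trade, best])
    return (np.concatenate(oos) if oos else np.empty(0)), chosen
```

In a toy example where rule 0 does well in the first window and poorly in the second, the sketch picks rule 0 for the second window and earns its losses there, which is exactly the kind of non-persistence the following results document.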

Table 10 Persistence of performance

Table 10 presents the results based on single-trip transaction costs of 0, 10, 25, and 50 basis points.Footnote 32 The reported performance measure is the difference between annualized Sharpe ratios for technical trading returns and the corresponding buy-and-hold benchmark returns. In addition to the out-of-sample performance, the table also reports the corresponding in-sample performance for each transaction cost level (i.e., the in-sample results are based on the best rules identified during the testing periods, which are used to evaluate the out-of-sample performance during subsequent trading periods). The statistical significance of the performance is evaluated using a hypothesis test proposed by Bailey and López de Prado (2012), which assesses the null hypothesis that two Sharpe ratios (computed from potentially nonnormal return distributions) are equal.
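The test is only named here, so the following sketch shows one way to operationalize it, which we assume rather than take from the paper: the probabilistic Sharpe ratio (PSR) of Bailey and López de Prado (2012), evaluated with the buy-and-hold Sharpe ratio as the threshold \(SR^*\). For \(n\) daily returns with non-annualized Sharpe ratio \(\widehat{SR}\), skewness \(\hat\gamma_3\), and (non-excess) kurtosis \(\hat\gamma_4\),
\[
\widehat{PSR}(SR^*) = \Phi\!\left(\frac{(\widehat{SR} - SR^*)\sqrt{n-1}}{\sqrt{1 - \hat\gamma_3 \widehat{SR} + \frac{\hat\gamma_4 - 1}{4}\widehat{SR}^2}}\right),
\]
and a one-sided p-value for outperformance is \(1 - \widehat{PSR}\). The helper `probabilistic_sharpe_ratio` is hypothetical; treating the benchmark Sharpe ratio as fixed ignores its own estimation error.

```python
import math

import numpy as np


def probabilistic_sharpe_ratio(returns, sr_benchmark):
    """Probability that the true (non-annualized) Sharpe ratio of `returns`
    exceeds `sr_benchmark`, adjusting for skewness and kurtosis."""
    x = np.asarray(returns, dtype=float)
    n = x.size
    sr = x.mean() / x.std(ddof=1)
    z = (x - x.mean()) / x.std(ddof=0)
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4)  # non-excess kurtosis (3 for a normal distribution)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    stat = (sr - sr_benchmark) * math.sqrt(n - 1) / denom
    return 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0)))  # standard normal CDF
```

Applied to out-of-sample returns, a PSR close to zero against the buy-and-hold Sharpe ratio would correspond to significant underperformance.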

The analysis shows that the in-sample performance is sizable (with excess Sharpe ratios mostly above 1) and highly statistically significant even at single-trip transaction costs of 50 basis points. This may not be surprising, since in-sample performance is by construction the result of extensive data snooping. More importantly, the out-of-sample performance is mostly either insignificant or significantly negative. For higher single-trip transaction costs of 25 and 50 basis points, we observe significant out-of-sample underperformance in 14 and 18 markets, respectively, at least at the 10% significance level. The findings imply that best-performing technical trading rules do not generate persistent performance over relatively short time horizons, and that such a trading system is detrimental compared with simply buying and holding the respective market index. This impression is confirmed in panel C of Table 10, which reports results for equally weighted portfolios of developed market indices, emerging market indices, and all market indices, respectively. All in-sample performance measures are positive and statistically significant at the 1% level for the four transaction cost scenarios. In contrast, the out-of-sample performance measures are insignificant for single-trip costs of 0 basis points and turn negative and significant at positive transaction costs in most of the considered cases [e.g., for single-trip costs of 50 basis points, the performance measure equals −0.315 (significant at the 5% level) for the developed markets portfolio as well as −0.507 and −0.385 for the emerging markets and all-markets portfolios (both significant at the 1% level)]. We reestimate the selection algorithm using excess raw returns as an alternative performance measure. The results presented in Table 11 in the appendix confirm our baseline findings obtained with the Sharpe ratio criterion.

Fig. 2

In-sample and out-of-sample performance of technical trading rules. This figure displays the development of the in-sample and out-of-sample performance of technical trading rules for equally weighted portfolios of developed country indices (panel A), emerging market indices (panel B), and all market indices (panel C), respectively. Performance is measured as the returns generated from technical trading in excess of buy-and-hold returns from the corresponding stock index in percent. Solid black (solid gray) lines show the development of in-sample excess returns based on single-trip transaction costs of 0 (25) basis points. Dashed black (dashed gray) lines show the development of out-of-sample excess returns based on single-trip transaction costs of 0 (25) basis points.

Panels A–C in Fig. 2 plot the performance of the three portfolios over the entire sample period for single-trip transaction costs of 0 and 25 basis points. Here, performance is measured as the in-sample and out-of-sample excess returns (i.e., the daily differences between the equally weighted returns from technical trading and the equally weighted returns from the buy-and-hold benchmark strategy). All three panels show that the in-sample excess returns are substantial and comparatively unaffected by transaction costs. In contrast, out-of-sample performance deteriorates most of the time, and all portfolios generate negative excess returns over the full sample period, even in the absence of transaction costs.
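The portfolio excess-return paths in Fig. 2 can be sketched as follows. The assumptions are ours: markets without data on a given day are marked with `NaN` and excluded from that day's equal-weighted average (so at least one market must have data each day); daily excess returns are accumulated by summation rather than compounding; and the result is expressed in percent. The helper `equal_weight_excess_path` is hypothetical.

```python
import numpy as np


def equal_weight_excess_path(rule_returns, bench_returns):
    """Cumulative excess return (in percent) of an equally weighted portfolio.

    rule_returns, bench_returns: (n_days, n_markets) daily returns, with NaN
    where a market has no data on that day.
    """
    R = np.asarray(rule_returns, dtype=float)
    B = np.asarray(bench_returns, dtype=float)
    # daily difference between equally weighted technical and buy-and-hold returns
    daily_excess = np.nanmean(R, axis=1) - np.nanmean(B, axis=1)
    return 100.0 * np.cumsum(daily_excess)  # summed, not compounded
```

A path that drifts downward, as the out-of-sample lines in Fig. 2 do, indicates that the selected rules lose ground to buy-and-hold on average.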

In summary, the results presented in this section demonstrate that past superior performance of technical trading rules does not persist in the near future. This seriously calls into question whether the studied rules could have been traded at any profit. Based on our simple selection algorithm for trading rules, technical trading is harmful to investment performance across the broad range of considered markets. Our results corroborate those of Bajgrowicz and Scaillet (2012), who also measure very poor out-of-sample performance of simple technical trading rules applied to the DJIA.

8 Conclusions

This paper provides a comprehensive analysis of a broad universe of simple technical trading rules applied to a total of 23 developed market indices and 18 emerging market indices using the Stepwise Superior Predictive Ability Test. The novelty of this test is its ability to identify the whole subset of trading rules with superior performance relative to a buy-and-hold benchmark, while accounting for data snooping bias. To the best of our knowledge, we are the first to apply this powerful statistical test to a large set of trading rules in a comparative analysis of multiple stock markets.

Our in-sample results show that technical trading rules have predictive power in some markets during relatively early periods when transaction costs are ignored. However, in recent years, the investigated technical rules do not have predictive power anymore. To evaluate the impact of transaction costs, we run a stepwise algorithm to determine the number of outperforming trading rules for different transaction cost levels. This analysis reveals a high sensitivity of trading performance to moderate single-trip transaction costs. Moreover, an out-of-sample analysis suggests that the performance of the best technical heuristics is generally not persistent over shorter time periods in the future. These in-sample best trading rules tend to significantly underperform out-of-sample even when transaction costs are low.

The existing literature on the profitability of technical trading rules is relatively comprehensive, but it shows inconclusive results and relies mainly on limited data or outdated statistical tests. We provide extensive empirical evidence that simple technical rules do not achieve data snooping-free outperformance of various stock indices. This is true even for markets that are considered far less information efficient than the extensively studied US stock market. Overall, our results cast doubt on the economic value of technical trading rules that have been found to generate superior performance by several previous studies based on tests with less statistical rigor. The results strongly suggest that the investigated trading signals are noise and that a trading strategy which follows these signals ultimately underperforms the market due to an accumulation of transaction costs.

Our analysis is limited with respect to several dimensions, so the results of this study should not be generalized beyond the scope of the trading rules and data examined in this paper. Clearly, the examined rules represent only a subset of the potentially testable technical trading rules and are among the simplest used in practice. There are several more subtle technical strategies that are not part of the universe tested in this paper. Moreover, the analysis is limited to daily close prices of national stock market indices and does not provide evidence for more granular data such as intraday stock prices. The boom in high-frequency algorithmic trading in recent years, driven by increasing computational capabilities, may have created highly specialized investors who generate excess returns through technical trading. More recent empirical evidence related to this subject by Batten et al. (2015) and Gebka et al. (2014) points in this direction. However, the nature, complexity, and trading horizon of these technical approaches differ drastically from the naive heuristics typically employed by individual investors.

Despite the poor performance of simple technical rules, they are still widely used by market participants. Future research should investigate whether there may be other, nonmonetary factors that motivate these investors to apply technical analysis. Potential preference-based reasons for trading technical signals and the behavior of technical traders have hardly been addressed in the finance literature.