1 Introduction

Firstly introduced in the ‘80s by Gerry Bamberger and the quantitative group led by Nunzio Tartaglia's at Morgan Stanley (Bookstaber 2007), the pairs trading strategy has since then became very popular. The idea behind pairs trading strategy is rather simple and consists of identifying pairs of assets whose prices share a common stochastic trend, and of profiting by exploiting any deviation from this long-term relationship, which, by construction, is temporary. More specifically, whenever the prices of these assets diverge from their long-run equilibrium, the relatively overpriced asset is sold and the other asset is bought so that when prices converge again, by unwinding the positions, the profit is realized. Pairs trading is a statistical arbitrage strategy (Bondarenko 2003) because, although a positive payoff may not be guaranteed in each state, its expected value is positive. Moreover, it is expected to be a market-neutral strategy (Ehrman 2006) because its results depend on the relative pricing of the selected assets only, so that the potential profits are independent from market performance.

The first step to take for pairs trading implementation is to identify pairs of assets. Among the different methodologies developed, the one based on cointegration tests has certainly attracted great attention due to its superior results in terms of profitability (Huck and Afawubo 2015; Rad et al. 2016; Blázquez et al. 2018). However, as confirmed e.g. in Huck and Afawubo (2015), its application to large datasets (and ideally to the entire market) comes with a remarkable computational burden. For example, a dataset of 500 assets would require 124.750 cointegration tests to identify all pairs that are potentially eligible for trading. To overcome this drawback, some empirical contributions narrow down the analysis to a subset of assets in the market, previously selected based on some measures of distance or association. If, on the one hand, this allows to reduce considerably the computational burden of carrying out the first step of cointegration-based pairs trading, on the other hand, the effect of this pre-selection of pairs on the final profitability of the strategy is not clear and has so far never been assessed. The first contribution of this paper is thus to fill this gap, by investigating and comparing the profitability of a cointegration-based pairs trading strategy when pairs are pre-selected based on seven different pre-selection measures.

Some of the measures considered for pre-selection have been extensively used in the pairs trading literature, either to pre-select or to identify assets’ couples (rather than via cointegration tests). This is the case for the correlation between the log-prices, the correlation between the returns, the sum of squared deviations, and the price-ratio between the normalized log-prices. These measures are of easy interpretation and fast computation, and hence particularly appreciated by practitioners for whom speed and efficiency of computation is a vital consideration (Clark 2012; Brogaard et al. 2014; Angel 2014). However, they all present some drawbacks. These include: (i) the potential risk of pre-selecting pairs of assets whose prices (or returns) display low volatilities, and (ii) they are not able to detect the common trend between the paired assets. The set of metrics considered for pre-selection is thus augmented with the covariance between the log-prices and the covariance between the returns, which overcome the first weakness, and with the spectral coherence at frequency zero, which specifically addresses the latter issue. These additional measures used for pre-selection represent a novelty in this type of application and thus constitute the second contribution of this paper.

The rest of paper is structured as follows: Sect. 2 briefly summarizes the empirical contributions dealing with distance-based and cointegration-based pairs trading; Sect. 3 presents the dataset and the methodology used, illustrating the pre-selection measures considered, the trading strategy, and its final evaluation; Sect. 4 presents the main results, while Sect. 5 extends the analysis by investigating the risk-profiles of the excess returns and tests the robustness of the main results. Finally, last Section concludes.

2 Literature

Pairs trading strategies have been implemented using different approaches, which Krauss (2017) broadly classifies into the following categories: (i) distance approach; (ii) cointegration approach; (iii) time series (or stochastic spread) approach; and (iv) a residual category, gathering all the applications not belonging to one of the above. All the approaches require to first identify pairs of assets (during the so-called formation period) and then to implement an investment strategy (during the so-called trading period). The different approaches mainly differ for how pairs are selected during the formation period. For instance, in the distance approach pairs of assets are identified using nonparametric distance metrics, while in the cointegration approach Engle and Granger (1987) and/or Johansen (1988) cointegration tests are employed to spot a long-term equilibrium between asset prices time series.Footnote 1

Gatev et al. (2006) are among the first applying pairs trading using the distance approach. As the authors outline, the implementation of such strategy is structured in two periods: (1) in the formation period, pairs are selected by minimizing the sum of squared deviations between their normalized pricesFootnote 2; and (2) in the trading period, a position is opened whenever the distance between the normalized prices diverges more than a given threshold. The investment strategy consists in selling one dollar of the relatively overpriced asset and buying one dollar of the underpriced asset, thereby ensuring that the strategy is self-financing. The position is then closed if and when the normalized prices difference reaches zero, or at the end of the trading period. Since the strategy is self-financing, all the final payoffs can be interpreted as excess returns.

In an analysis on the liquid US stocks in the CRSP, over the period 1962 to 2002, Gatev et al. (2006) investigate the profitability of the distance-based pairs trading strategy and prove that it produces significant excess returns, which also survive to the inclusion of trading costs. The very same approach is implemented in many other empirical contributions, including e.g. Do and Faff (2010) and Huck (2013). The former show that the profitability found by Gatev et al. (2006) reduces if the analysis is extended to 2009, while the latter investigates the sensitivity of the profitability found by Gatev et al. (2006) under different parameterizations in terms of the length of the formation period and of the opening threshold. Other contributions rely on pairs trading implemented with the distance approach to investigate the potential sources of profitability of this strategy. Examples include Andrade et al. (2005), who prove the relevance of uninformed demand shocks, or Papadakis and Wysocki (2007) and Engelberg et al. (2009), both focusing on the impact of news disclosure or information events on the strategy performance. Jacobs and Weber (2016) explore the implications of time-varying awareness of firm-level information on asset pricing, proving that pairs opening in high distraction days are more likely to generate higher returns. Regardless of the aim of the study, all these contributions identify pairs by minimizing the sum of squared deviations. The only one who, to the best of our knowledge, uses a different metrics is Chen et al. (2017). They use the Pearson correlation coefficient between returns, in univariate and quasi-multivariate settings, and select, for each asset, the 50 most correlated stocks to form an equally weighted portfolio. As reported by Krauss (2017), their empirical application, conducted on CRSP data between 1962 and 2002, shows that the quasi-multivariate setting is more profitable with respect to the univariate case, and that the profits of the latter are slightly lower if compared to the Gatev et al. (2006) strategy.

As Do et al. (2006) highlight, identifying pairs of assets by minimizing the sum of squared deviations between their prices is proven profitable in different markets, is easy to implement and is model-free. However, as suggested by Krauss (2017), this approach appears to be suboptimal because it can lead to the selection of pairs with a low level of variance in their price distance, thus, de facto, reducing the number of potential trade opportunities. Moreover, this approach is not able to capture the long-run equilibrium relationship between prices and may not guarantee the mean reversion of their differences—a limitation somewhat overcome using the cointegration approach.

In the cointegration-based pairs trading (Vidyamurthy 2004), pairs are identified based on cointegration tests, so as to exploit the mean-reversion property of a stationary process. The underlying idea is that if the prices series of two stocks are cointegrated, there exists a stationary linear combination of the two series that is mean-reverting, meaning that any short-term deviation from this equilibrium is temporary by construction. Similar to the distance approach, the implementation of cointegration-based pairs trading requires two stages: (1) during the formation period, the pairs of assets whose prices are cointegrated are selected; (2) during the trading period, a self-financing strategy is implemented on the identified pairs, buying 1$ of the relatively low-priced asset and selling 1$ of the relatively over-priced asset every time the stationary linear combination of the two series, named Spread, sufficiently deviates from its long-run mean.

The first empirical applications of cointegration-based pairs trading focused on commodities futures and spot prices. For instance, Wahab and Cohn (1994) applied it to gold and silver cash and future prices and Girma and Paulson (1999) on the crack spread, which is the prices’ difference between petroleum and refined products futures. Similarly, Simon (1999) focused on the crush spread, that is the soybean futures and its end products prices’ difference, and Emery and Liu (2002) on the spark spread, i.e. difference between natural gas and electricity futures prices. The same approach has been more recently used e.g. by Gutierrez and Tse (2011), who use CSRP data on three water utility stocks and prove that most of the pairs trading profits are obtained from the Granger-follower position. Applications to the stock market include the contributions by Dunis et al. (2010), who use (daily and intra-daily) data of EuroStoxx 50 index constituents, and by Caldeira and Moura (2013), who apply it to the 50 most liquid stocks of the Brazilian Ibovesoa index. Hence, the application of this approach to larger datasets (and ideally to the entire market) is actually scant. Indeed, the high computational cost of the cointegration-based pairs trading makes its application to large datasets very difficult and explains the typical focus in the empirical literature on small sets of assets.

Some recent contributions have attempted to reduce the computational burden entailed by cointegration tests by pre-selecting assets before testing for cointegration. The only examples in this direction we are aware of are Miao (2014), Huck and Afawubo (2015) and Rad et al. (2016). Miao (2014) proposes to rank pairs of stocks based on the prices’ Pearson correlation coefficient, and to test for cointegration only those with correlation at least equal to 0.9. In this way, despite the empirical application uses data on 177 energy companies stocks traded in NTSE and NASDAQ markets, the actual number of cointegration tests required to implement the pairs trading is sensibly reduced, from 15,576 potential pairs to (an average of) 1,378 actually tested pairs. Of those, the first 10 pairs with smallest residuals ADF test statistic are considered eligible for trading, and their final performance is evaluated using the Sharpe ratio. The empirical application in Huck and Afawubo (2015) relies on a sub-sample of the S&P 500 index constituents. Among the 500 stocks, only the pairs of assets whose returns differ no more than 10% are included in the sample and then tested for cointegration. This allows to sensibly reduce the actual number of pairs tested since approximately 80% of the pairs are dropped before testing for cointegration. Finally, Rad et al. (2016) apply a cointegration-based pairs trading to a large dataset composed by 23.616 stocks in CRSP, from 1962 to 2014. In the empirical work, pairs are first sorted based on the sum of squared deviations between prices and, then, cointegration tests are performed until 20 cointegrated pairs are identified.

To be noticed that each of the above cited contribution uses a different pre-selection measure. However, no evidence has so far been provided on the differences, if any, in terms of final profitability and risk-exposure of the cointegration-based pairs trading strategy when assets are pre-selected with different metrics. The first contribution of this paper is thus to fill this gap. Besides, we provide evidence related to three supplementary metrics that, to the best of our knowledge, have never been used in this type of applications, but whose characteristics might help to overcome some of the drawbacks entailed by the measures used so far in the literature. In doing so, we will rely on data referred to the US stock market, as detailed in the following Section.

3 Data and methodology

The empirical analysis relies on the dividend adjusted daily closing prices of the S&P 500 index constituents, which are extremely liquid assets, characterized by high market capitalization, and relatively low transactions costs.The data, retrieved from Thomson Reuters DataStream, cover the period from 1st January 1998 until 30th October 2018, and include all the stocks belonging to the S&P 500 on the last day of our sample (since some stocks are not included in the index all along the sample, the total number of stocks varies between 373 and 505).

Based on this dataset, our empirical application proceeds as follows:

  1. 1.

    Pairs pre-selection: we consider a one-year formation period during which we order pairs of stocks according to seven different metrics, as described in Sect. 3.1;

  2. 2.

    Cointegration-based pairs identification: using the formation period data and following the ranking as from step 1, we run the cointegration tests required to find the first 20 cointegrated pairs of stocks. This allows us to estimate the parameters and assess the stationarity of the cointegration relationship, both required for the subsequent implementation of the trading strategy. This step is described in greater detail in Sect. 3.2;

  3. 3.

    Pairs trading: using data from a six-month trading period, we implement the trading strategy described in Sect. 3.3;

  4. 4.

    Profits evaluation: we compute the monthly excess profits on the six-months trading period and repeat this procedure every month in a rolling window setting. We then assess the profitability of the pairs trading strategy, as described in Sect. 3.4.

3.1 Pre-selection measures

We consider seven different measures to pre-select assets. The first four have been extensively used in the pairs trading literature, either to pre-select or to identify assets to be traded. The last three, which conversely represent a novelty in this type of application, are examined in the light of their features, that are potentially able to overcome some of the drawbacks typical of the first set of measures considered. The importance of considering such a disparate set of metrics to pre-select pairs is confirmed by the results reported in Sect. 3.2 below.

The time-varying volatility characterizing the returns time-series might represent an issue at this stage. In order to overcome this potential drawback, pairs pre-selection is performed on the returns (and the associated log-prices) only after having “cleaned out” their heteroskedasticity, modelled through an exponentially weighted moving average.Footnote 3 Specifically, the “homoskedastic” returns, indicated in what follows with \({}_{h}{r}_{t}\), are obtained as:

$${}_{h}{r}_{t}={r}_{t}/{\widehat{\sigma }}_{t}$$

with \(t=1,\dots ,T\), where T is the number of trading days comprising the formation period, and \({\widehat{\sigma }}_{t}\) is the standard deviation of the raw log-returns \({r}_{t}\). In turn, \({\widehat{\sigma }}_{t}\) is modelled as:

$${\widehat{\sigma }}_{t}=\sqrt{k {\widehat{\sigma }}_{t-1}^{2}+(1-k) {\widehat{u}}_{t}^{2}}$$

where \({\widehat{u}}_{t}={r}_{t}-\overline{r }\) is the demeaned return at time t, the parameter \(k\) is comprised between 0 and 1, and \({\widehat{\sigma }}_{0}^{2}\) is a required initial condition.Footnote 4 The “homoskedastic” log-prices are derived cumulating the above defined \({}_{h}{r}_{t}\).

The first measure considered for pre-selection is by far the most widely applied in the pairs trading literature. The Sum of Squared Deviations between the normalized “homoskedastic” log-prices was firstly proposed as a selection criterion by Gatev et al. (2006) and then employed for pre-selection of pairs by Rad et al. (2016). It is computed as:

$$\widehat{SSD}=\sum_{t=1}^{T}{\left({}_{h}{\widetilde{p}}_{1,t}-{}_{h}{\widetilde{p}}_{2,t}\right)}^{2}$$
(1)

where \({}_{h}{\widetilde{p}}_{1,t}\) and \({}_{h}{\widetilde{p}}_{2,t}\) are the normalized “homoskedastic” log-prices of stock 1 and 2 on day \(t\), respectively, i.e. \({}_{h}{\widetilde{p}}_{1,t}={}_{h}{p}_{1,t}/{}_{h}{p}_{1,1}\) and \({}_{h}{\widetilde{p}}_{2,t}={}_{h}{p}_{2,t}/{}_{h}{p}_{2,1}\).

The second measure considered is the Price Ratio, that is:

$$\widehat{PR}=\frac{1}{T}\sum_{t=1}^{T}\frac{{}_{h}{\widetilde{p}}_{1,t}}{{}_{h}{\widetilde{p}}_{2,t}}$$
(2)

where \({}_{h}{\widetilde{p}}_{1,t}\) and \({}_{h}{\widetilde{p}}_{2,t}\) are the above-defined normalized “homoskedastic” log-prices of stock 1 and 2 on day \(t\), respectively. This measure was first proposed for pairs identification by Baronyan et al. (2010), who use the test of price ratio stationarity as an alternative to the test for cointegration.

The third measure considered is the absolute value of the Pearson correlation between the “homoskedastic” log-prices time series, employed in Miao (2014), that is:

$${\widehat{\rho }}^{p}=\left|\frac{\sum_{t=1}^{T}\left[{}_{h}{p}_{1,t}-{}_{h}{\overline{p} }_{1}\right]\left[{}_{h}{p}_{2,t}-{}_{h}{\overline{p} }_{2}\right]}{\sqrt{{\sum }_{t=1}^{T}{\left[{}_{h}{p}_{1,t}-{}_{h}{\overline{p} }_{1}\right]}^{2}{\sum }_{t=1}^{T}{\left[{}_{h}{p}_{2,t}-{}_{h}{\overline{p} }_{2}\right]}^{2}}}\right|$$
(3)

where \({}_{h}{p}_{1,t}\) and \({}_{h}{p}_{2,t}\) are the “homoskedastic” log-prices of stock 1 and 2 on day \(t\), \({}_{h}{\overline{p} }_{1}\) and \({}_{h}{\overline{p} }_{2}\) are their corresponding sample means over the formation period. The impact of using this measure to pre-select assets on the final profitability of pairs trading is not clear a priori. On the one hand, it is true that this measure is not directly linked to cointegration, as high correlation might be observed even when cointegration is absent.Footnote 5 On the other hand, Miao (2014) argues that coupling cointegration-based pairs trading with pre-selection based on correlation might be beneficial to trading as they provide different and potentially complementary information. Indeed, while correlation captures co-movements which may be unstable and vary over time, cointegration measures long-term co-movements, being there even through sub-periods where correlation appears low.

The fourth measure used to pre-select pairs of stocks is the absolute value of the Pearson correlation coefficient between the “homoskedastic” returns, used as criterion of pairs formation by Chen et al. (2017), that is:

$${\widehat{\rho }}^{r}=\left|\frac{\sum_{t=1}^{T}\left({}_{h}{r}_{1,t}-{}_{h}{\overline{r} }_{1}\right)\left({}_{h}{r}_{2,t}-{}_{h}{\overline{r} }_{2}\right)}{\sqrt{{\sum }_{t=1}^{T}{\left({}_{h}{r}_{1,t}-{}_{h}{\overline{r} }_{1}\right)}^{2}{\sum }_{t=1}^{T}{\left({}_{h}{r}_{2,t}-{}_{h}{\overline{r} }_{2}\right)}^{2}}}\right|$$
(4)

where \({}_{h}{r}_{1,t}\) and \({}_{h}{r}_{2,t}\) are the “homoskedastic” returns on day \(t\) of stock 1 and 2, respectively obtained as difference of the stock “homoskedastic” log-prices (i.e., \({}_{h}{r}_{1,t}={}_{h}{p}_{1,t}-{}_{h}{p}_{1,t-1}\) and \({}_{h}{r}_{2,t}={}_{h}{p}_{2,t}-{}_{h}{p}_{2,t-1}\)), \({}_{h}{\overline{r} }_{1}\) and \({}_{h}{\overline{r} }_{2}\) are their corresponding sample means over the formation period.

Both \({\widehat{\rho }}_{\mathrm{1,2}}^{P}\) and \({\widehat{\rho }}_{\mathrm{1,2}}^{r},\) i.e. the correlation between “homoskedastic” log-prices and returns, tend to their maximum value as the standard deviation of the underlying series tend to their minimum. This means that selecting pairs by maximizing the absolute value of the correlation may result in the selection of stocks whose prices or returns display low volatilities. In order to overcome this drawback, as our fifth and sixth measures of pre-selection we consider the corresponding covariances between both “homoskedastic” log-prices and returns, that is:

$${\widehat{COV}}^{p}=\sum_{t=1}^{T}\left[{}_{h}{p}_{1,t}-{}_{h}{\overline{p} }_{1}\right]\left[{}_{h}{p}_{2,t}-{}_{h}{\overline{p} }_{2}\right]$$
(5)
$${\widehat{COV}}^{r}=\sum_{t=1}^{T}\left({}_{h}{r}_{1,t}-{}_{h}{\overline{r} }_{1}\right)\left({}_{h}{r}_{2,t}-{}_{h}{\overline{r} }_{2}\right)$$
(6)

All the measures considered so far are intuitive and computationally non-demanding, making them particularly suitable for practitioners undertaking high-frequency trading, where speed and computational efficiency are pivotal (Clark 2012; Brogaard et al. 2014; Angel 2014). However, they are not necessarily connected with the existence of a common trend between the paired assets. The seventh measure we consider aims to overcome this concern. The magnitude-squared coherence is a signal processing tool that indicates how well two signals match at each frequency and its estimate is a function with values between 0 and 1. It measures the linear dependence in the spectral decomposition of \(\Delta {}_{h}{p}_{1,t}\) and \(\Delta {}_{h}{p}_{2,t}\) by computing \({\widehat{C}}_{\Delta {}_{h}{p}_{1,t}\Delta {}_{h}{p}_{2,t}}(f)\) values at different frequencies \(f\) as:

$${\widehat{C}}_{\Delta {}_{h}{p}_{1,t}\Delta {}_{h}{p}_{2,t}}(f)=\frac{{\left|{\widehat{S}}_{\Delta {}_{h}{p}_{1,t}\Delta {}_{h}{p}_{2,t}}(f)\right|}^{2}}{{\widehat{S}}_{\Delta {}_{h}{p}_{1,t}\Delta {}_{h}{p}_{1,t}}\left(f\right){\widehat{S}}_{\Delta {}_{h}{p}_{2,t}\Delta {}_{h}{p}_{2,t}}\left(f\right)}$$
(7)

where \({\widehat{S}}_{\Delta {}_{h}{p}_{1,t}\Delta {}_{h}{p}_{2,t}}(f)\) is the cross-power spectral density of \(\Delta {}_{h}{p}_{1,t}\) and \(\Delta {}_{h}{p}_{2,t}\), and \({\widehat{S}}_{\Delta {}_{h}{p}_{1,t}\Delta {}_{h}{p}_{1,t}}\left(f\right)\) and \({\widehat{S}}_{\Delta {}_{h}{p}_{2,t}\Delta {}_{h}{p}_{2,t}}\left(f\right)\) are the related power spectral densities.Footnote 6 In particular, we consider the magnitude-squared coherence at frequency zero, \({\widehat{C}}_{\mathrm{\Delta p}}(0)\) which is equal to 1 if the two series are cointegrated (Cubadda 1994; Levy 2002).

Each metrics is computed monthly using data from the previous one-year formation period. Stock pairs are then sorted in ascending order according to SSD, by PR closest to one, and in descending order according to measures (3) to (7). This first step thus closes with the pairs ranked on the basis of the metrics illustrated.

3.2 Cointegration-based pairs identification

In the second step, the ranked pre-selected pairs are progressively tested for cointegration until 20 pairs of stocks whose prices are actually cointegrated are found. Those “top 20” pairs are thus eligible for trading during the following six months.Footnote 7

The most widely used procedure in the empirical literature on cointegration-based pairs trading is the two-step approach proposed by Engle and Granger (1987). Along with its simplicity, which makes it highly appreciated by practitioners, this procedure comes with several limitations, which include the impossibility to retrieve more than one cointegration relationship as well as the sensitivity of the results to the choice of the asset used as dependent variable. While the former is not an issue in pairs trading, as dealing with two assets at a time entails that at most one cointegration vector can be found, the second limitation might indeed be a serious issue. We thus opt for the Johansen (1988) procedure, where the test for the existence of cointegration and the estimates of the coefficient rely on a vector autoregression (VAR) model.Footnote 8 More specifically, consider the following VAR of order \(q\):

$${Y}_{t}=\mu +{A}_{1}{Y}_{t-1}+\dots +{A}_{q}{Y}_{t-q}+{\varepsilon }_{t}$$
(8)

where \({Y}_{t}\) is the \(T\times 2\) vector of the non-stationary, or \(I\left(1\right)\), time-series of the log-prices of the two stocks, denoted with \({p}_{1,t}\) and \({p}_{2,t}\), \({\varepsilon }_{t}\) is the \(T\times 1\) vector of innovations, and \(q\), the lag order, is defined optimally through the Bayesian Information Criterion (BIC). This VAR can be rewritten as:

$$\Delta {Y}_{t}=\mu +\Pi {Y}_{t-1}+\sum_{i=0}^{q-1}{\Gamma }_{i}{\Delta Y}_{t-i}+{\varepsilon }_{t}$$
(9)

with

$$\Pi =\sum_{i=0}^{q}{A}_{i}-I \quad {\text{and}} \quad {\Gamma }_{i} = -\sum_{j=i+1}^{q}{A}_{j}$$
(10)

The matrix \(\Pi\) can be then expressed in terms of the loadings vector \(\alpha\) and the cointegration vector \(\mathrm{\rm B}\) as:

$$\Pi = \alpha {\rm B}^{\prime}$$
(11)

The Johansen test for cointegration amounts to test for the rank of matrix \(\Pi\), by testing whether (and how many of) its two eigenvalues are significantly different from zero. If they are both different from (or equal to) 0, then \(\Pi\) has full (zero) rank and there is no cointegration between the series. If instead \(\Pi\) has rank equal to 1, i.e. only one eigenvalue is statistically different from zero, then \({\mathrm{\rm B}}^{^{\prime}}{Y}_{t}\) is stationary, implying that the time-series of the log-prices are cointegrated with cointegration vector \(\mathrm{\rm B}\), and the two assets are thus considered eligible for trading.Footnote 9

This concludes the second step of the procedure, which—as the first one—is repeated monthly in a rolling-window setting using data from the previous one-year formation period. This length for the formation period is almost a standard in the relevant literature, as it allows consistent estimates of the cointegration vector and takes up the potential calendar effects, pretty pronounced on the stock market (see e.g. Sharma and Narayan 2014, McConnell and Xu 2008, and Kunkel et al. 2003 and references therein).

We are now able to evaluate the consequences of ordering and pre-selecting pairs of assets before testing for cointegration in a pairs trading setting. The fewer the number of tests needed, the less time will be necessary to select pairs. As reported in the top panel of Table 1, pre-selecting assets allows to substantially reduce the total number of cointegration tests actually required to find the top 20 pairs of assets eligible for trading. The computational gain is thus remarkable, and this applies to all the pre-selection measures considered. On the other hand, the top 20 pairs eventually selected for trading are in most of the cases different depending on the pre-selection measure used (see the bottom panel of Table 1). The maximum degree of overlap is indeed found between returns correlation and returns covariance, where the pre-selected pairs coincide in the 41% of the cases. For all the other combinations, the pairs selected overlap by at most 13%, with several cases in which the overlap is negligible or even non-existent. This confirms the potentially crucial impact of the pre-selection metrics used on the pairs that are actually traded and, hence, on the final profitability of the pairs trading strategy.

Table 1 Tests required to spot the top 20 pairs and share of overlapping top 20 pairs, by pre-selection measure

3.3 Pairs trading strategy implementation

Next, we move to the following six-month period, the so-called trading period, as it is where we implement the pairs trading strategy. As argued in Huck (2013), a six-month length represents a good choice as it ensures the trades to be performed on pairs selected based on the most recent information, on the one hand, and allows the reversion of the spread, and hence the trades to close “naturally”, on the other. Thus, using data on the six-month following the trading period, we implement the self-financing pairs trading strategy whenever pricing anomalies are signaled by deviations from the long-run equilibrium relationship.

In order to do so, we compute the Spread between log-prices, which is stationary and mean-reverting, as:

$$Sprea{d}_{t}={p}_{1,t}-\left(\widehat{\mu }+\widehat{\beta }{p}_{2,t}\right)$$
(12)

using the \(\widehat{\mu }\) and \(\widehat{\beta }\) estimates obtained in the first step of the procedure.Footnote 10 Any significant deviation of the Spread from its historical mean (equal to zero by construction) is interpreted as a mispricing and thus signals a trading opportunity. Therefore, we can then define a trading rule that will trigger a trade whenever the following relationship is violated:

$$-2\widehat{\sigma } \le Sprea{d}_{t}\le 2\widehat{\sigma }$$
(13)

where \(\widehat{\sigma }\) is the historical standard deviation of the Spread (computed during the formation period).

In more detail, if \(Sprea{d}_{t}>2\widehat{\sigma }\) the first stock is suspected to be relatively overpriced with respect to the second. Then, the self-financing strategy consists in selling (going short on) 1$ of the first stock and buying (going long on) 1$ of the second stock. On the other hand, if \(Sprea{d}_{t}<-2\widehat{\sigma }\) the first stock is suspected to be underpriced relatively to the second stock. Therefore, the trading strategy requires to sell 1$ of the second stock and to buy 1$ of the first stock. Notice that the strategy is self-financing, since it requires no initial capital, and consequently all payoffs can be interpreted as excess returns. The position is unwound when the long-term equilibrium is re-established, that is, when the Spread returns to within the estimated boundariesFootnote 11 (or at the end of the trading period). An example of when a trading opportunity arises (green arrow) and when the position is closed (red arrow) is provided in Fig. 1.

Fig. 1
figure 1

- Example of trading strategy triggering signals. The top panel plots the log-prices time series of two cointegrated stocks, during the trading period going from 9th June to 9th December 1999. The bottom panel plots the standardized Spread along with its \(\pm 2\widehat{\sigma }\) boundaries. The green arrow spots the opening of the trading position, while the red one spots the closing of the same trade

This whole procedure, from step 1 to step 3, is then repeated in a rolling window fashion, by shifting the formation and trading periods by one month. As a result, every month (starting from the 6th in the sample) six overlapping portfolios of 20 pairs are generated (Fig. 2). This approach mimics the payoffs that a proprietary trading desk would get delegating the management of these portfolios to six different traders whose formation and trading periods are staggered by one month (see, e.g. Gatev et al. 2006 and Huck and Afawubo 2015).

Fig. 2
figure 2

- Rolling scheme representation. Graphical representation of the rolling scheme employed in the pairs trading methodology. The blue arrows represent the length of the formation periods, the brown arrows represent the length of the trading periods and the green areas represent the overlapping months across different trading periods. The time indicator “t” refers to the t-th month (composed of 22 trading days)

Our dataset comprises 235 trading months.Footnote 12 Excluding the first and the last 5 months, which by construction do not include the full set of six overlapping portfolios, leaves us with 225 trading months, which are the ones used to evaluate the performance of the strategy.

3.4 Performance evaluation

The profitability of the pairs trading strategy is evaluated in terms of excess returns (profits), Sharpe ratio and percentage of positive monthly excess returns.

The value-weighted mark-to-market daily excess return of the portfolio constituted by the 20 traded pairs is computed as:

$${R}_{P,t}=\frac{{\sum }_{i=1}^{20}{w}_{i,t}{R}_{i,t}}{{\sum }_{i=1}^{20}{w}_{i,t}}$$
(14)

where:

  • \({w}_{i,t}\) is the weight associated to each pair \(i\). It is equal to 1 whenever a new position is opened on the pair \(i\) and, for each subsequent period, is computed as: \({w}_{i,t}={w}_{i,t-1}\left(1+{R}_{i,t-1}\right)=\left(1+{R}_{i,1}\right)\dots \left(1+{R}_{i,t-1}\right)\)

  • \({R}_{i,t}\) is the daily mark-to-market excess return obtained from trading pair \(i\), computed as:

    $${R}_{i,t}=\sum_{j=1}^{2}{I}_{j,t}{\omega }_{j,t}{R}_{j,t}$$
    (15)

    with:

  • \({I}_{j,t}\) denoting the position on stock \(j\) in day \(t\). It is set equal to 1 if a long position is open, -1 if a short position is opened, and 0 otherwise

  • \({R}_{j,t}\) being the daily return of stock \(j\) in day \(t\)

  • \({\omega }_{j,t}\) representing the weight associated to stock \(j\) in day \(t\), computed as \({\omega }_{j,t}={\omega }_{j,t-1}\left(1+{R}_{j,t-1}\right)=\left(1+{R}_{j,1}\right)\dots \left(1+{R}_{j,t-1}\right)\)

The daily portfolio excess returns are then compounded to obtain monthly excess returns, which, in turn, are averaged across the six overlapping portfolios so as to generate a single summarizing monthly measure of profitability.

Finally, the null that the average monthly excess returns are not significantly positive is tested using Newey-West (1987) heteroskedasticity and autocorrelation robust standard errors, using six lags. Following Huck and Afawubo (2015), we handle the potential data-snooping issue deriving from testing multiple strategies on the same data set by using the Hansen (2005) test for Superior Predictive Ability, which takes into account the dependence between the statistics derived from the different implementations.

Besides the excess returns, we also evaluate the profitability of the pairs trading strategy based on the Sharpe ratio and on the percentage of positive monthly excess returns. The first one accounts also for the volatility of profits, as it rescales the average monthly excess return by the monthly excess returns’ standard deviation. The second is particularly suitable considered that pairs trading is a statistical—rather than a pure—arbitrage strategy, which thus allows negative returns. An account of the frequency of positive monthly excess returns is thus certainly relevant.

Transaction costs might have a critical impact on the profitability of a trading strategy and have thus to be carefully evaluated. The most well-known components of direct trading costs include commissions, short selling fees and bid and ask spread. In Gatev, et al. (2006) the bid-ask spread is handled by waiting one day after the divergence (convergence) to open (close) a position. Yet, the data used for the empirical application in this paper refer to assets with an extremely high level of liquidity, which are generally traded over a relatively short period, and have high dollar value and high market capitalization. Hence, bid and ask spread is likely not to be an issue. The same applies to short-selling, as reported in D’Avolio’s (2002) and referred to also in Do and Faff (2012). Based on this claims, in this study the final profits from pairs trading are not weighted up with short-selling fees and bid-ask spread. On the other hand, commissions are likely to be particularly influential for pairs trading, where two roundtrip transactions are involved. Following Do and Faff (2012), we take commissions estimates from reports of the Investment Technology Group (ITG), a brokerage firm specialized in trade execution. The values of the commissions used are reported, in basis points, in Table 2.Footnote 13

Table 2 Commissions' estimates from the quarterly reports of the ITG

One of the limitations posed by including commissions is that the trading strategy can no longer be considered self-financing. Subtracting the commissions amount from the cash flows generated by the position opening, would indeed require an initial capital equal to the total commissions for the two stocks. To overcome this limitation and to obtain results that can still be interpreted as excess returns, we adapt the amounts of each stock that are bought or sold. More specifically, when opening a position the amount (1-commission)$ is bought and the amount (1 + commission)$ is sold, so that the total initial cash flow remains zero and the strategy remains self-financing. When closing the position, the commissions paid are included by adjusting the daily excess returns for the pairs, as follows:

$${R}_{i,t}=\sum_{j=1}^{2}\left({I}_{j,t}{\omega }_{j,t}{R}_{j,t}-c{\omega }_{j,t}\left(1+{R}_{j,t}\right)\right)$$
(16)

where \(c\) is the amount of the commission as a percentage.

Besides, following Caldeira and Moura (2013), we complete our trading strategy with a stop-loss rule, which prescribes to close a position whenever the realized excess return of the operation reaches -7%, and a “duration rule”, which specifies to forcibly close a position after 50 trading days. These cut rules handle the risks of extreme losses, as well as the possibility that the Spread does not revert to its equilibrium. While academic research might disregard these aspects, practitioners are strongly aimed at preventing extremely negative results and loosing time value.

In this work, we thus evaluate the profitability both with and without taking transaction commissions and cut rules into account.Footnote 14

4 Main results

The top panel of Table 3 reports the descriptive statistics of the monthly excess returns obtained from the pairs trading strategy when neither commissions nor cut rules are considered. The profitability of the strategy remarkably differs depending on the pre-selection metrics used. When pre-selection is carried out by means of the log-prices covariance or by the coherence metrics, the pairs trading strategy is found to produce non-significant average excess returns. All other pre-selection metrics instead lead to significant profitability. Moreover, the average excess returns display quite a large degree of variability also in terms of level: the (significant) 0.73% obtained when pairs are pre-selected based on the log-prices correlation, is more than 4 times the (non-significant) 0.19% average net profit obtained carrying out the very same trading strategy but using the coherence metrics to pre-select instead.

Table 3 Pairs trading monthly excess returns statistics, by pre-selection metrics

The picture does not change if the median—rather than the mean—excess return is considered: the highest median excess return, obtained pre-selecting pairs based on correlation between log-prices (0.58%), is almost 6 times the lowest one (− 0.15%), obtained using the covariance between returns as pre-selection measure.

The profitability observed for the different pre-selection measures is highly heterogeneous also in terms of variability, with SSD and covariance being at the extremes. In fact, the excess returns obtained with pairs pre-selected using SSD have a standard deviation (range) equal to 0.0235 (0.15), much smaller than the one observed when pre-selection occurs based on covariance of log-prices, equal to 0.1082 (1.27).

Not surprisingly, the profitability per unit of risk, as measured by the Sharpe Ratio, is not uniform across the different pre-selection metrics considered: the Sharpe Ratio is around 0.06 when pre-selection is based on the covariance of log-prices or on spectral coherence, it almost doubles when returns correlation or SSD are used to pre-select, and it further increases, reaching as much as 0.25, when pre-selection is done via PR.

The effect of using different pre-selection measures is also evident looking at the share of positive monthly excess returns. When pairs are pre-selected using the log-prices covariance and coherence this share is less than half, while it increases to around 54% if pre-selection uses log-prices correlation or Price Ratio and to 56.17% when pairs are pre-selected based on returns correlation.

To sum up, we find that when the same cointegration-based pairs trading strategy is implemented pre-selecting pairs with different metrics, the final profitability obtained is highly heterogenous in terms of average, variability and significance. We thus conclude that pre-selection matters, as it might lead to remarkably different final excess returns.

The impact of pre-selecting is even more apparent when the profitability is evaluated taking commissions and cut rules into account (see Panel B, Table 3). Indeed, the average monthly excess returns are not statistically significant in 4 cases out of 7 (namely when pre-selection occurs via SSD, returns correlation, covariance of log-prices and coherence). Besides, when statistically significant, the excess returns appear to be remarkably different, ranging from 0.31%, obtained when pairs are pre-selected based on the returns covariance, to 1.08%, as much as 3 times more, when pre-selection is performed using the log-prices correlation. Additionally, the Sharpe Ratios vary by a factor of 25 among all the metrics, ranging from 0.01, when pre-selection is based on SSD, to as much as 0.25, when pairs are pre-selected via log-prices correlation. The impact of pre-selection is also confirmed looking at the frequency of positive monthly excess returns: correlation between log-prices and PR guarantee the highest frequencies (51.49% and 52.34%, respectively), while all others pre-selection metrics reduce this chance around or well below 50%.

Finally, a comparison between the top and bottom panels of Table 3 allows to assess the differential impact of commissions and cut rules across the different pre-selection metrics. Recall that the commissions and the cut rules have opposite effects on profitability, so that the overall final effect might be either positive, null or negative, depending on which one prevails, if any. Again, we find disparate results across the different metrics considered: e.g., for SSD the introduction of commissions and cut rules translates into remarkably lower average excess returns, for log-prices correlation it leads to better results, while for others, such as coherence, the overall impact is found to be negligible. By the same token, the rankings based e.g. on Sharpe Ratio or frequency of positive excess returns—before and after commissions and cut rules are taken into account—change for some pre-selection metrics more than others, further confirming that the pre-selection measure used have an impact also in this direction.

All in all, the evidence reported proves that pre-selection matters, since it strongly impacts the final profitability of the pairs trading strategy. Remarkably, none among the three additional pre-selection investigated seem to produce systematically better results compared to ones traditionally considered in the pairs trading literature.

5 Further results

In this section, we first investigate whether the monthly excess returns observed represent a compensation for traditional risk factors. Then, we check the robustness of our main conclusions to a stricter definition of the Spread reversion to the equilibrium, to the inclusion of market impact on the final evaluation of the strategy profitability, and to the use of Engle and Granger (1987) procedure to assess cointegration.

5.1 Excess returns and risk factors

We investigate if and how the pairs trading profitability obtained applying each of the considered pre-selection metrics correlates with the systematic stock-market risk factors conventionally acknowledged in the asset pricing literature. To this end, we regress the monthly excess returns (after commision and cut rules are included) on the following factorsFootnote 15:

  1. 1.

    Market excess return (MKT): difference between the market (S&P500) and 30-day Treasury bill returns. This factor is key to assess market-neutrality: to the extent that pairs trading is a market-neutral strategy, the correlation between its profitability and the market excess returns is expected to be small, if any.

  2. 2.

    Size factor (SMB): difference between small and big stock portfolios;

  3. 3.

    Book-to-market factor (HML): difference between value and growth stock portfolios;

These first three are the stock-market risk factors established in the seminal paper by Fama and French (1993).

  1. 4.

    Investment factor (CMA): difference between conservative and aggressive portfolios;

  2. 5.

    Profitability factor (RMW): difference between robust and weak profitability portfolios.

This two additional factors lead to the five-factor model proposed in Fama and French (2015). Since pairs trading is a contrarian strategy, we expect a mild correlation, if any, of its profitability to these factors.

  1. 6.

    Short-term reversal factor (STR): difference between last month winner and loser portfolios. Recalling that trading on assets which performed particularly well in the past month has been proven to lead to significant abnormal returns (see, e.g. Jegadeesh 1990, Lehmann 1990, or Jegadeesh and Titman 1993) and that pairs trading sells overpriced (i.e. winners) and buys underpriced (losers) assets, the excess returns of pairs trading are expected to be positively exposed to this factor;

  2. 7.

    Momentum factor (MOM): difference between last year winner and loser portfolios; to the extent that pairs trading is a short-term strategy, we expect its excess returns to correlate negatively with the momentum factor.

These last two factors are included in the light of the fact that pairs trading is, by construction, a trading strategy relying on the past (patterns of) prices/returns of the assets involved.

Various combinations of the above-mentioned factors have been extensively used in the literature to assess whether the observed price anomalies, and hence the associated arbitrage opportunities, are priced by the common risk-factors (see e.g. Heiko 2015, and reference therein). The pairs trading literature we refer to makes no exception. For instance, Gatev et al. (2006) and Engelberg et al. (2009) employ the standard Fama and French (1993) three-factors model augmented by momentum and short term reversal factors, while Rad et al. (2016) use both the Fama and French (1993) three-factors model augmented with momentum and liquidity factors and the Fama and French (2015) five-factors model. More recently, both Clegg and Krauss (2018) and Han et al. (2022) regress the excess returns of the pairs trading strategy on the Fama and French (1993) three factors, Fama and French (2015) five factors, and Fama and French (1993) three factors augmented with momentum and short-term reversal factors.

Following Rad et al. (2016), we opt for the most comprehensive model specification, including all the factors at a time, because we also want to evaluate the intercept of the estimated model. Indeed, the intercept of this model can be interpreted as the alpha in a CAPM, whose value captures the average excess return of this strategy with respect to the one achieved by the market based on the risk factors included in the regression. A significant and positive alpha, even after controlling for all the possible common risk-factors, would imply that carrying out such an active strategy is able to beat the market.

As reported in Table 4, the results are highly disparate with reference to all the risk factors considered, thus proving that pre-selection impacts also on the risk-profile of the pairs trading eventual profitability. More specifically, the trading excess returns correlate with the market excess returns only if pairs are pre-selected via SSD or correlation between log-prices: hence, in these cases only the market-neutrality of the pairs trading strategy would be disproved. By constrast, when pre-selction is based on any other of the remaining metrics considered, the evidence would be in support of market neutrality. Likewise, we find that excess returns of pairs trading are positively correlated with size factor only when pre-selection occurs via Price Ratio, while in all other cases no significant relationship is retreived. Different pre-selection metrics also translates into different degrees of correlation with the Book-to-market risk-factor, which is strongly related to excess returns only when pairs are pre-selected via correlation or covariance measures. Besides, we observe that covariance between returns is the only pre-selection metrics leading to profits positively correlated to both the CMA and RMW factors. In all other cases, we observe no correlation with CMA and in the majority of the cases a negative one, if any, with RMW. In the latter case, the estimates also show a remarkable variabilty in terms of magnitude, ranging from − 0.861 to 0.168. Finally, no significant correlation is found with the short-term reversal (STR), while the pairs trading excess returns often negatively correlate with the momentum factor (MOM). This result is consistent with the expectations and with the previous literature but again is far from being homogeous across all the metrics used for pre-selection, being observed in only 4 cases out of 7 (i.e. when pre-selection is carried out via SSD, PR and covariances measures).

Table 4 Pairs trading monthly excess returns risk-profile, by pre-selection metrics

Last, we find non-homogeneous results across the metrics also in terms of the model intercept. The set of risk factors considered is generally able to sweep away the significance of the intercept, leading to the conclusion that pairs trading do not sistematically produces extra-profits with respect to the market. However, this does not happen when pre-selection runs based on Price Ratio, where the positive and significant alpha suggests instead that implementing a pairs trading strategy pre-selecting asssets with this measure might actually beat the market.

All in all, this evidence proves that pre-selection matters also in terms of if and how the pairs trading final profitability relates to the conventional risk-factors.

5.2 Spread reverting to zero

We now evaluate the robustness of our results adopting an alternative definition of the reversion to the equilibrium that is stricter than the one used in the baseline approach. More specifically, we now close the positions whenever the Spread reaches zero (or at the end of the trading period), rather than just reentering within the \(\pm 2\widehat{\sigma }\) boundaries. The results, reported in Table 5 confirm the high degree of variation across the pre-selection metrics. For instance, after the inclusion of commissions and cut rules, the average excess returns range from values not statistically distinguishable from 0 (when using SSD, PR and coherence as pre-selection metrics) to as much as 1.26%, when pre-selection occurs with log-prices covariance. Similarly, the Sharpe Ratios range from almost 0, when using the SSD, up to 0.25, when pairs are pre-selected based again on log-prices correlation. Finally, the frequency of positive monthly excess returns varies of almost 12 percentage points, moving from above 56% when pre-selection is based on returns correlation, to 44% when pairs are pre-selected based on spectral coherence.

In general, imposing this stricter condition to unwound the positions on the assets implies a longer duration of each trade (from an average of around 12 days to an average of around 30 days, see bottom part of Tables 3 and 5), coupled with a higher chance of the 50-days cut rule kicking in. This means that more trades are forcibly closed when the prices of the paired assets are still largely divergent, leading to more extreme outcomes in terms of final profitability. This might explain why, in some cases (e.g. when pre-selection is based on returns correlation and on covariance metrics) the average excess returns after commissions and cut rules are even higher than in the baseline case, and why the variation across the pre-selection measures is in this case generally higher compared to the baseline case.

Table 5 Pairs trading monthly excess returns when spread reverts to 0, by pre-selection metrics

5.3 Market impact

When big investors trade, market impact, i.e. the implicit costs entailed by the movement of (huge amount of) assets, has also to be taken into account as an additional contribution to transaction costs. Do and Faff (2012) estimate the average market impact for the US stock market equal to 26 basis points, if the sample period considered spans from 1963 to 2009, which reduces to 20 basis points over the sub-period going from 1989 to 2009. Since our sample covers the 1998–2008 period, we set the cost associated to market impact to 20 basis points of the traded amounts in dollars. For each transaction, we thus compute the market impact for the opening and closing days only in dollars. Then, for each day we compute the average amount of market impact across the traded pairs, and obtain the daily net profits as difference between the average daily excess returns minus the average amount of market impact as computed above. Daily net profits are, as usual, compounded to obtain monthly net profits, and then averaged across the six overlapping portfolios so as to generate the single summarizing monthly measure. Table 6 reports the main descriptive statistics of these monthly measures. For convenience, in the first row we also report the average monthly net profit before market impact is accounted for (and after the inclusion of commissions and cut rules, as previously reported in the bottom panel of Table 3). Notice that once market impact is considered, the results can no longer be interpreted as excess returns, since the strategy has now an initial cost of $0.004 (that is the value of the market impact, fixed to 20 basis points, multiplied by $1 for each side of the trade).

Table 6 Pairs trading profitability after commission and cut rules, by pre-selection metrics: the effects of market impact

Once market impact is considered, monthly net profits suffer an average reduction of about $0.005, and the pairs trading profits retaining statistical significance (albeit at 10% level only) are the ones obtained when pairs pre-selection is based on log-prices correlation only. In all the remaining cases final profits are not statistically distinguishable from 0, further confirming the relevant impact of using different metrics for pairs pre-selection on the eventual pairs trading profitability.

5.4 Engle and Granger (1987) cointegration test

In our main analysis, the potential cointegration relationship between pre-selected assets is estimated and tested via the Johansen (1988) procedure. In this section we replicate the analysis using the two-step Engle and Granger (1987) test to find the 20 pairs whose prices are actually cointegrated. Despite its limitations as a statistical tool, this procedure is computationally simple, and hence particularly suitable for practitioners, and not surprisingly the most widely used in the empirical literature on cointegration-based pairs trading.

Provided that \({p}_{1,t}\) and \({p}_{2,t}\), i.e. the time-series of the log-prices of the two stocks, are \(I\left(1\right)\), the first step of the Engle-Granger procedure consists in estimating the following regression:

$${p}_{1,t}=\mu +\beta {p}_{2,t}+{\epsilon }_{t}$$
(17)

so as to obtain \(OLS\) estimates of \(\beta\) (and of a constant \(\mu\)) and the estimated in-sample residuals \(\widehat{{\epsilon }_{t}}\). In the second step of the procedure, the stationarity of \({\widehat{\epsilon }}_{t}\) is tested by means of the \(ADF\) test (Dickey and Fuller 1979). If \({\widehat{\epsilon }}_{t}\), that represent the deviations from the long-run equilibrium, are found to be stationary, the two series are said to be cointegrated and are thus considered eligible for trading. As above, the trading is triggered whenever condition (13) is violated, where the Spread is computed using the estimates \(\widehat{\mu }\) and \(\widehat{\beta }\) obtained in the first step of the procedure, i.e. as:

$$Sprea{d}_{t}={p}_{1,t}-\left(\widehat{\mu }+\widehat{\beta }{p}_{2,t}\right)$$
(18)

Panel A of Table 7 reports the results obtained using this procedure. The huge variability of pairs trading profitability across different pre-selection metrics is once again confirmed. The average excess returns range between 0.37% (0.27%), when pre-selection runs via the correlation between returns, up to 1% (1.31%), almost 3 (4) times larger, when the very same trading strategy is carried out on assets pre-selected using the log-prices correlation instead before (after) including commissions and cut rules. The same picture arises when the median, rather than the mean, is considered or looking at the variability of the excess returns. For instance, before commissions and cut rules are taken into account, the excess returns obtained with pairs pre-selected using returns correlation have a standard deviation (range) equal to 0.0274 (0.26), which is 3 times smaller than the one observed when pre-selection occurs based on covariance of log-prices, equal to 0.0724 (0.88).

Table 7 Pairs trading monthly excess returns using Engle-Granger procedure, by pre-selection metrics 

A possible limitation of the Engle-Granger procedure is that the estimates \(\widehat{\mu }\) and \(\widehat{\beta }\), and—as a consequence—the residuals \(\widehat{{\epsilon }_{t}}\) of which stationarity is tested to assess cointegration, and the Spread used to trigger the trade, may vary based on which asset is chosen as dependent variable in the OLS regression. We thus identify a pair as eligible for trading only if the stocks are cointegrated in both directions.Footnote 16 Moreover, we repeat the analysis considering the reversed regression as the true one, i.e.:

$${p}_{2,t}=\widetilde{\mu }+\widetilde{\beta }{p}_{1,t}+{\varepsilon }_{t}$$
(19)

and, thus, computing the Spread used to trigger the trades as follows:

$$Sprea{d}_{t}={p}_{2,t}-\left(\widetilde{\mu }+\widetilde{\beta }{p}_{1,t}\right)$$
(20)

Results, reported in Panel B of Table 7, are largely consistent with the one reported above and once again show a huge degree of variability across the different pre-selection measures considered, thus confirming that our main conclusion is robust also to the procedure used to assess the cointegration relationship.

6 Conclusions

This study compares the profitability of a cointegration-based pairs trading strategy when pairs of US stocks are pre-selected based on seven different metrics, with the aim to reduce the computational burden entailed by cointegration tests. Although some of these metrics have already been employed in this steam of literature, to the best of our knowledge, the effect of this pre-selection on the final profitability of the pairs trading strategy has never been assessed. We also enlarge the set of pre-selection metrics considered by investigating three supplementary metrics with desirable features that, as far as we know, have never been used in this type of application.

The first take-away from our investigation is that pre-selection matters, since the profitability of the pairs trading strategy remarkably changes depending on the pre-selection metrics considered. For instance, when neither commissions nor cut rules are considered, average excess returns are not statistically significant when pre-selection is carried out by means of the covariance between log-prices or coherence, while significant in all other cases. Moreover, the average excess returns vary by a factor of 4, ranging from (a non-significant) 0.19%, generated when pairs are selected based on the coherence, up to (a significant) 0.73% when pairs are pre-selected based on the log-prices correlation. The excess returns also differ in terms of variability, whereby pre-selecting again via returns correlation as well as via SSD and PR seems to generate excess returns that are less volatile compared to those obtained when other metrics are used. Among the seven pre-selection metrics used, the ones providing the best profitability per unit of risk, as measured by the Sharpe Ratio, and the highest incidence of actually positive excess returns, are the Price Ratio and the returns correlation, respectively.

The same conclusion is achieved even after commission costs and cut rules are considered, and using a stricter definition of reversion of the Spread to the equilibrium. These differences are even more striking once the implicit costs entailed by the market impact are considered, as the pairs trading profits retain statistical significance when pre-selection runs based on log-prices correlation only, while in all the remaining cases final profits are not statistically distinguishable from zero. This is consistent with Miao (2014), who argues that pre-selection based on correlation might be beneficial to cointegration-based pairs trading in the light of the potentially complementary information it captures.

Pre-selection also impacts on the risk-profile of the observed excess returns. We find that, for all the conventional risk factors considered, results are highly disparate across the pre-selection metrics analysed. As an example, market neutrality of the pairs trading strategy is disproved if pairs are pre-selected via SSD or log-prices correlation, while supported if pre-selection is done with any other metrics considered. Similarly, the expected negative association with the momentum factor is confirmed for some metrics only. Moroever, pairs trading produce significant alphas independently of the pre-selection metric used, with the only exception of Price Ratio.

The second take-away of this analysis is that none among the three additional pre-selection measures investigated seem to produce systematically better results in terms of profitability. In other words, their features, potentially able to overcome some of the limitations of the metrics traditionally considered in the pairs trading literature, do not translate into substantial gains in terms of final profitability and, in the case of spectral coherence, also come with a higher computational burden.

Potentially interesting extensions of the analysis proposed, such as the investigation across non-US and/or non-stock markets, or a sensitivity analysis of the final profitability to arbitrarily fixed parameters (such as the opening trigger or the length of the formation period), are left for further research.