1 Introduction

The concept of cointegration derives its popularity from the fact that it allows one to model equilibrium relationships between non-stationary time series. The most popular tests in the standard I(1)/I(0) setting include the two-step procedure by Engle and Granger (1987), the trace test by Johansen (1988), and the principal component test by Phillips and Ouliaris (1988), which are the subject of several comparative studies such as Reimers (1992) and Höglund and Östermark (2003). In practice, however, standard cointegration analysis often cannot be applied, since the I(1)/I(0) framework is too restrictive. For example, the series of interest may be persistent but not have a unit root, or the deviations from the equilibrium may be more persistent than the I(0) model allows.

Fractional cointegration overcomes these shortcomings by allowing for non-integer integration orders of the variables in the system and any (possibly non-zero) memory order in the cointegrating residuals, as long as it is reduced compared to that of the original system. Consequently, fractional cointegration promises to facilitate the modeling of a larger number of equilibrium relationships than standard cointegration. This has led to the development of various testing and rank estimation procedures to determine whether fractional cointegration is present in a multivariate time series.

Parametric approaches include Johansen (2008), Łasak (2010), Johansen and Nielsen (2012), Łasak and Velasco (2015), and Johansen and Nielsen (2019), among others, who consider fractional extensions of the cointegrated VAR model of Johansen (1988). Furthermore, Breitung and Hassler (2002) introduce a trace test to determine the cointegrating rank, Avarucci and Velasco (2009) suggest rank estimation in a regression framework, and Hassler and Breitung (2006) develop a time domain residual-based test.

Semiparametric approaches, on the other hand, have the advantage that they allow the researcher to focus on the long-run relationship between the series and do not require the specification of short-run dynamics. This literature encompasses the spectral-based rank estimation procedure of Robinson and Yajima (2002) and its extension by Nielsen and Shimotsu (2007), a Hausman-type test based on the multivariate local Whittle estimator introduced by Robinson (2008a), a number of residual-based tests for the null hypothesis of no fractional cointegration developed by Marmol and Velasco (2004), Chen and Hurvich (2006), Hualde and Velasco (2008), and Wang et al. (2015), a variance-ratio test proposed by Nielsen (2010), a test based on a GPH-type estimate of the cointegration strength introduced by Souza et al. (2018), and a rank estimation procedure based on an eigenanalysis of the autocovariance function from Zhang et al. (2019).

Unfortunately, the domain of applicability of most of these procedures is much more restrictive than the definition of fractional cointegration. Some are only applicable in stationary systems, others only in non-stationary ones. Some procedures require the reduction in memory to be more than 1/2, while others require the memory of the cointegrating residuals to be less than 1/2.

Furthermore, there are different assumptions about the form of the fractionally cointegrated system. Some approaches assume that one of the observed series itself is an observation of the common underlying trend, whereas others assume an unobserved common underlying trend. We refer to these models as the triangular system and the common-components model, respectively. Which of these assumptions is more suitable in practice depends on the specific application. On the one hand, it may be appropriate to think of the risk-free interest rate as an observed common component that is perturbed by risk premia in risky bonds, so that a triangular model can be used. For cointegrated pairs of stocks, on the other hand, it is unclear why the price of one stock should be interpreted as a perturbed version of another stock price, so that a common-components model is more appropriate. Finally, even though the development of each of these procedures to determine whether fractional cointegration is present is a major theoretical contribution, relatively little effort has been devoted to analyzing how they perform compared to each other.

Here, we try to address these issues by providing a survey of all the rank estimation and testing procedures discussed above. To study the relative performance of the competing approaches, we conduct an extensive Monte Carlo analysis of their size and power properties. It is found that several procedures, namely those of Nielsen and Shimotsu (2007) and Robinson and Yajima (2002), Marmol and Velasco (2004), and Hualde and Velasco (2008), show severe finite-sample size distortions in systems with correlated short-run components. The relative performance in terms of power depends on the form of the system. For triangular systems and non-stationary common-components models the test of Souza et al. (2018) performs best overall, whereas the test of Chen and Hurvich (2006) is preferable for stationary common-components models.

The rest of the paper is structured as follows. The next section gives the definition and model of fractional cointegration we adopt and briefly reviews the basic estimation methods required by the tests. Section 3 is divided into two subsections describing the two types of tests: Sect. 3.1 contains the tests based on the spectral matrix and Sect. 3.2 summarizes the tests based on cointegrating residuals. Section 4 presents finite sample results, and Sect. 5 concludes.

2 Fractional cointegration: models and definitions

A p-dimensional vector-valued time series \(X_t\) has long memory if its spectral density fulfills

$$\begin{aligned} f_X(\lambda )\sim {\varLambda }_j(d)\, G \, \overline{{\varLambda }_j(d)},\quad \text {as } \lambda \rightarrow 0^+, \end{aligned}$$
(1)

where G is a real, symmetric, and non-negative definite matrix, \({\varLambda }_j(d)=\text {diag}\left( \lambda ^{-d_1}e^{i\pi d_1/2},\ldots ,\lambda ^{-d_p}e^{i\pi d_p/2}\right) \) is a \(p\times p\) diagonal matrix, \(\overline{{\varLambda }_j(d)}\) is its complex conjugate transpose and ‘\(\sim \)’ implies that for each element the ratio of real and imaginary parts on the left- and right-hand side tends to one. The element in the a-th row and b-th column of the spectral matrix \(f_X(\lambda )\) is denoted by \(f_{ab}(\lambda )\), so that in the case of a common memory parameter \(d_1=\cdots =d_p=d\) considered below we have \(f_{ab}(\lambda ) \sim g_{ab}\lambda ^{-2d}\) for \(a,b\in \{1,\ldots ,p\}\), where \(g_{ab}\) denotes the respective element of G. The periodogram of \(X_t\) at the Fourier frequencies is given by

$$\begin{aligned} I_{X}(\lambda _j)= w_{X}(\lambda _j) \overline{w_{X}(\lambda _j) }, \end{aligned}$$
(2)

with \(w_{X}(\lambda )=\frac{1}{\sqrt{2\pi T}}\sum _{t=1}^T X_t e^{i\lambda t}\), and \(\lambda _j=2\pi j/T\), for \(j=1,\ldots ,\lfloor T/2 \rfloor \), where \(\lfloor \cdot \rfloor \) denotes the greatest integer less than or equal to its argument.
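
For concreteness, the DFT and periodogram matrix from Eq. (2) can be computed in a few lines; the following numpy sketch uses illustrative names that are not taken from any particular package.

```python
import numpy as np

def periodogram_matrix(X):
    """DFT w_X(lambda_j) and periodogram I_X(lambda_j) of a (T x p) array
    at the Fourier frequencies lambda_j = 2*pi*j/T, j = 1, ..., floor(T/2)."""
    T, p = X.shape
    lam = 2 * np.pi * np.arange(1, T // 2 + 1) / T
    t = np.arange(1, T + 1)
    # w_X(lambda_j) = (2*pi*T)^(-1/2) sum_t X_t exp(i*lambda_j*t), one row per j
    w = np.exp(1j * np.outer(lam, t)) @ X / np.sqrt(2 * np.pi * T)
    # I_X(lambda_j) = w w-bar', one (p x p) Hermitian matrix per frequency
    I = np.einsum('ja,jb->jab', w, w.conj())
    return lam, w, I
```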

There are a number of different definitions of fractional cointegration in the literature. The most common one goes back to Engle and Granger (1987). According to this definition the p-dimensional time series \(X_t\) is cointegrated of rank r, if all components of \(X_t\) are integrated of order d (denoted by I(d)), and there exists a \(p\times r\) matrix \(\beta \) of full column rank so that the r linear combinations \(v_t=\beta ' X_t\) are \(I(d-b_a)=I(d_{v_a})\) with \(d>b_a>0\) for all \(a=1,\ldots ,r\). The matrix \(\beta \) is called the cointegrating matrix and each of its columns is a cointegrating vector. The elements of the vector \(v_t\) are the cointegrating residuals. Other definitions are given by Johansen (1995), Flôres Jr and Szafarz (1996), Marinucci and Robinson (2001), and Robinson and Yajima (2002), who also provide a discussion of the implications of the different definitions.

Standard cointegration is a special case of the definition above where \(d=1\) and \(d_{v_a}=0\) for all a. In this setup the system is non-stationary, whereas the cointegrating residuals are stationary. In contrast to that, fractional cointegration allows for a more flexible model so that several cases can be distinguished: weak cointegration (\(b<0.5\)), strong cointegration (\(b>0.5\)), stationary cointegration (\(0<d_v<d<0.5\)), or non-stationary cointegration (\(0.5<d_v<d\)).

In general, (fractional) cointegration is an equilibrium concept where the persistence of the cointegrating residual \(d_v\) determines the speed of adjustment towards the cointegration equilibrium \(\beta 'X_t\), and shocks have no permanent influence on the equilibrium as long as \(d_v<1\) holds.

As an example, consider the fractionally (co-)integrated bivariate model with \(X_t=(X_{1t}, X_{2t})'\), where

$$\begin{aligned} X_{1t}&=c_1+\xi _1 Y_t + {\varDelta }^{-(d-b_1)} u_{1t}\mathbb {1}(t>0) \end{aligned}$$
(3)
$$\begin{aligned} X_{2t}&=c_2+\xi _2 Y_t+ {\varDelta }^{-(d-b_2)} u_{2t}\mathbb {1}(t>0) \end{aligned}$$
(4)
$$\begin{aligned} \text {and} \qquad Y_t&={\varDelta }^{-d} e_t\mathbb {1}(t>0). \end{aligned}$$
(5)

Here, \(u_t=(u_{1t},u_{2t})'\) is a weakly-dependent zero-mean process with constant covariance matrix \({\varOmega }_u\) and spectral density matrix \(f_u(\lambda )\), \(e_t\) (with variance \(\sigma _e^2\) and spectral density \(f_e(\lambda )\)) is a univariate weakly-dependent zero-mean process that is allowed to be correlated with \(u_{t}\), and L denotes the lag operator so that \(LY_t=Y_{t-1}\). The fractional difference operator \({\varDelta }^{d}=(1-L)^{d}\) is defined in terms of the binomial expansion so that \((1-L)^d=\sum _{k=0}^\infty \left( {\begin{array}{c}d\\ k\end{array}}\right) (-1)^k L^{k}\), with \(\left( {\begin{array}{c}d\\ k\end{array}}\right) =\frac{d(d-1)(d-2) \ldots (d-(k-1))}{k!}\). Furthermore, \(\mathbb {1}(\cdot )\) denotes the indicator function that takes the value one if its argument is true and zero otherwise. Finally, it is assumed that \(d\ge b_1, b_2\ge 0\).

The truncated processes \({\varDelta }^{-(d-b_a)} u_{at}\mathbb {1}(t>0)\) are type-II fractionally integrated processes, which means they are only asymptotically stationary for \(d<1/2\), but in contrast to type-I processes they are still well-defined for \(d>1/2\). For a detailed discussion cf. Marinucci and Robinson (1999).
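
As an illustration, the following sketch simulates the truncated process \({\varDelta }^{-d} e_t\mathbb {1}(t>0)\) from the binomial expansion above; the helper names are ours and Gaussian white noise is assumed for \(e_t\).

```python
import numpy as np

def fi_weights(d, T):
    """First T coefficients psi_k of (1-L)^(-d), computed with the
    recursion psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k."""
    psi = np.ones(T)
    for k in range(1, T):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi

def simulate_type2(d, T, rng):
    """Type-II process Y_t = sum_{k=0}^{t-1} psi_k e_{t-k} with Gaussian e_t."""
    e = rng.standard_normal(T)
    return np.convolve(fi_weights(d, T), e)[:T]   # truncation keeps t > 0 only

Y = simulate_type2(d=0.7, T=1000, rng=np.random.default_rng(1))  # d > 1/2 is fine
```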

In this bivariate model there can be at most one cointegrating relationship. In this case \(r=1\) and \(\beta \) itself is a cointegrating vector. Obviously, if the linear combination \(\beta ' X_t= v_t\) has reduced memory, the same is true for every scalar multiple of it. To identify the cointegrating vector, it is therefore customary to apply some kind of normalization such as setting the first element of the vector to unity. In Eqs. (3) to (5), fractional cointegration arises if \(\xi _1,\xi _2\ne 0\), and \(b_1,b_2>0\). In this case the normalized cointegrating vector is \(\beta =\left( 1, -\frac{\xi _1}{\xi _2}\right) '= \left( 1, - {{\tilde{\beta }}} \right) '\) and the cointegrating residual \(v_t\) is \(I(d-b) = I(d_v)\), where \(b=\min (b_1,b_2)\). Note that this model is a common-components model, but it also nests a triangular system. This is obtained as a special case if \({\varOmega }_{u,22}=0\) so that \(X_{2t}\) is a direct (rescaled) observation of the underlying common trend and only \(X_{1t}\) is perturbed with a cointegration error so that \(b=b_1\). Standard cointegration in the I(1)/I(0) framework is obtained as a special case if \(d=1\) and \(b_1=b_2=1\). It is also possible to have \(\xi _1,\xi _2\ne 0\), so that both \(X_{1t}\) and \(X_{2t}\) contain the common component \(Y_t\), but they are not cointegrated if \(b_1=b_2=0\).

3 Tests for no fractional cointegration

In the following, we provide a comprehensive review of semiparametric tests and estimation procedures that can be used to determine the order of fractional cointegration in a p-dimensional vector-valued time series \(X_t\). According to the definition discussed above, this requires that the components of \(X_t\) are integrated of the same order. In practice, this can either be assumed based on domain-specific knowledge, or it can be tested with tests for the equality of memory parameters that allow for cointegration introduced by, for example, Robinson and Yajima (2002), Nielsen and Shimotsu (2007), Hualde (2013), and Wang and Chan (2016). In particular Robinson and Yajima (2002) discuss in detail how to partition a vector-valued time series into subvectors with equal memory parameters. These can then be used for further cointegration analysis.

In the following, it will be assumed that all components of \(X_t\) are I(d), which means we abstract from these pre-testing issues to focus on the actual tests for the null of no fractional cointegration. For all tests the hypotheses are defined by

  • \(H_0\): \(X_t\) is not fractionally cointegrated (\(d=d_v\)),

  • \(H_1\): \(X_t\) is fractionally cointegrated (\(d>d_v\)).

In contrast to standard I(1)/I(0) cointegration, the memory parameter d is unknown in fractionally cointegrated systems and has to be estimated. Since multivariate memory estimation becomes inconsistent under cointegration, the memory parameters are estimated univariately and, if not stated otherwise, we employ the means of the univariate memory estimates in the tests.

The tests presented in this Section apply the most common estimators: the log-periodogram estimator \({{\widehat{d}}}_{GPH}\) of Geweke and Porter-Hudak (1983) and Robinson (1995b), the local Whittle estimator \({{\widehat{d}}}_{LW}\) of Künsch (1987) and Robinson (1995a), or the exact local Whittle estimator \({{\widehat{d}}}_{ELW}\) of Shimotsu and Phillips (2005) and Shimotsu (2010). All of these estimators are periodogram-based and employ the first m Fourier frequencies. The general requirement is that \(m<\lfloor T/2 \rfloor \) tends to infinity more slowly than T so that \(\frac{1}{m} + \frac{m}{T} \rightarrow 0 \text { as } T\rightarrow \infty \) and even the largest frequency \(2\pi m/T\) is asymptotically local to the zero frequency.
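
As a point of reference, a compact sketch of the local Whittle estimator is given below; the numerical optimizer and the admissible range passed to it are choices of this sketch rather than part of the original definition.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def local_whittle(x, m):
    """Local Whittle estimate of d: minimize the profiled objective
    R(d) = log( (1/m) sum_j I(lam_j) lam_j^(2d) ) - 2d (1/m) sum_j log(lam_j)."""
    T = len(x)
    t = np.arange(1, T + 1)
    lam = 2 * np.pi * np.arange(1, m + 1) / T
    w = np.exp(1j * np.outer(lam, t)) @ x / np.sqrt(2 * np.pi * T)
    I = np.abs(w) ** 2                              # periodogram at j = 1, ..., m
    def R(d):
        return np.log(np.mean(I * lam ** (2 * d))) - 2 * d * np.mean(np.log(lam))
    return minimize_scalar(R, bounds=(-0.49, 0.99), method='bounded').x
```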

To estimate the cointegrating relationship \(\beta ' X_t = v_t\) when \(r=1\), the vector is partitioned such that \(X_t=(y_t,x_t')'\), where \(y_t\) is a scalar and \(x_t\) is \((p-1)\times 1\). By doing so, the focus is on one possible cointegrating relation \(y_t = {\tilde{\beta }}' x_t + v_t\), where \({\tilde{\beta }}\) is \((p-1)\)-dimensional.

As in standard cointegration analysis, the vector \({\tilde{\beta }}\) can be estimated with ordinary least squares (OLS) as long as \(d>1/2\), i.e., as long as the series are non-stationary. In stationary long-memory time series, OLS is inconsistent in the presence of correlation between the stationary regressors and the innovation term \(v_t\) (cf. Robinson (1994)).

Robinson (1994) and Robinson and Marinucci (2001) introduce an alternative estimator of the cointegrating vector that is based on the periodogram local to the zero frequency. In contrast to OLS, this narrow-band frequency domain least squares (NBLS) estimator is consistent under cointegration for all values of d and has a non-normal limiting distribution in the non-stationary region. Christensen and Nielsen (2006a) extend the asymptotic results to the stationary region where the estimate follows an asymptotic normal distribution and Nielsen and Frederiksen (2011) provide a correction of the asymptotic bias under weak fractional cointegration.

Estimating the linear cointegrating relationship with NBLS requires the averaged periodogram of \(x_t\) and the averaged cross-periodogram of \(x_t\) and \(y_t\), given by \(I^{av}_{xx}(\lambda _j) = \frac{2\pi }{T} \sum _{j=1}^m w_x(\lambda _j)\overline{w_x(\lambda _j)}\) and \(I^{av}_{xy}(\lambda _j) = \frac{2\pi }{T} \sum _{j=1}^m w_x(\lambda _j)\overline{w_y(\lambda _j)}\). The NBLS estimate of \({\tilde{\beta }}\) is then defined by

$$\begin{aligned} {\widehat{\beta }}_m = I^{av}_{xx}(\lambda _j)^{-1} I^{av}_{xy} (\lambda _j). \end{aligned}$$
(6)

The bandwidth m has to fulfill the usual local-to-zero condition as \(T\rightarrow \infty \). If not specified otherwise, we employ NBLS to estimate the cointegrating vector. Other estimators suggested in the literature include estimation based on the eigenvectors of a version of \(I^{av}_X(\lambda _j)\) (cf. Chen and Hurvich (2006)) and joint estimation with the memory parameters in multivariate local Whittle approaches such as those of Nielsen (2007), Robinson (2008b) and Shimotsu (2012).
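
A minimal sketch of the NBLS estimator in Eq. (6) is given below; taking the real part of the averaged (cross-)periodograms is a simplification of this sketch (cf. the remark on \({\widehat{\beta }}^{NB}(0,0)\) in Sect. 3.2).

```python
import numpy as np

def nbls(y, x, m):
    """Narrow-band least squares of y (length T) on x (T x (p-1)) over j = 1..m."""
    T = len(y)
    t = np.arange(1, T + 1)
    lam = 2 * np.pi * np.arange(1, m + 1) / T
    E = np.exp(1j * np.outer(lam, t)) / np.sqrt(2 * np.pi * T)
    wx, wy = E @ x, E @ y                               # DFTs of x_t and y_t
    F_xx = (2 * np.pi / T) * np.real(wx.conj().T @ wx)  # averaged periodogram
    F_xy = (2 * np.pi / T) * np.real(wx.conj().T @ wy)  # averaged cross-periodogram
    return np.linalg.solve(F_xx, F_xy)                  # beta_hat_m from Eq. (6)
```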

The following review is divided into tests based on the spectral density local to the origin (Sect. 3.1) and tests based on estimates of the cointegrating residuals (Sect. 3.2). Of course, this distinction is not clear cut, since some of the residual-based approaches also use the spectral properties of the potential cointegrating residuals and, for example, the test of Nielsen (2010) is presented as a variance-ratio test. Many different categorizations would be possible. Here, we refer to those approaches as "spectral-based" that rely on the properties of the spectrum of the observed series \(X_t\) itself, and those that rely on the spectrum of the cointegrating residual are called "residual-based".

3.1 Tests based on the spectral matrix

A number of procedures to determine the fractional cointegrating rank of the p-dimensional time series \(X_t\) are based on properties of the rescaled spectral matrix local to the zero frequency. This is denoted by G in Eq. (1) and has reduced rank if and only if \(X_t\) is fractionally cointegrated. If fractional cointegration is present, the number of eigenvalues that are equal to zero corresponds to the cointegrating rank r. More details on the connection between fractional cointegration, unit coherence and singularity of G are given in Velasco (2003b) and Nielsen (2004).

Based on this property, Robinson and Yajima (2002) introduce an information criterion to determine the fractional cointegrating rank that is extended to non-stationary processes by Nielsen and Shimotsu (2007). To obtain an estimate \({{\widehat{G}}}\) of G, the first step consists of applying the univariate exact local Whittle estimator of Shimotsu and Phillips (2005) and Shimotsu (2010) to each component of \(X_t\) separately, using bandwidth m, and pooling the estimates into the arithmetic mean \({{\widehat{d}}}_{ELW}\). The estimate \({{\widehat{G}}}({{\widehat{d}}}_{ELW})\) is then defined by \({{\widehat{G}}}({{\widehat{d}}}_{ELW}) = \frac{1}{m_1} \sum _{j=1}^{m_1} \text {Re } I_{{\varDelta }^d}(\lambda _j),\) where \(I_{{\varDelta }^d}\) is the periodogram of \({\varDelta }^{{{\widehat{d}}}_{ELW}}X_t\). The bandwidths have to fulfill \(\frac{m_1}{m}\rightarrow 0\) in order to ensure faster convergence of \({{\widehat{d}}}_{ELW}\) than of \({{\widehat{G}}}({{\widehat{d}}}_{ELW})\). Denote the empirical eigenvalues calculated from \({{\widehat{G}}}({{\widehat{d}}}_{ELW})\) and sorted in descending order by \({\widehat{\delta }}_{a,G}\) for \(a=1,\ldots ,p\). The cointegrating rank can then be estimated using a model selection criterion that is based on the partial sum of the sorted eigenvalues

$$\begin{aligned} {{\widehat{r}}}_{NS} = \arg \underset{k=0,\ldots ,p-1}{\min } \left( n(T) (p-k) - \sum _{a=1}^{p-k} {\widehat{\delta }}_{a,G}\right) , \end{aligned}$$
(7)

where n(T) is a function which fulfills \(n(T) + \frac{1}{\sqrt{m_1}\, n(T)} \rightarrow 0\) as \(T\rightarrow \infty \) so that n(T) goes to zero more slowly than the estimation error in the eigenvalues that is of order \({\mathcal {O}}_P\left( m_1^{-1/2}\right) \). Asymptotically, the expression is therefore minimal if only estimates of non-zero eigenvalues are included in the sum.

To deal with situations in which the scales of the components in \(X_t\) are different, Nielsen and Shimotsu (2007) suggest basing the procedure on the correlation matrix \({\widehat{P}}({\widehat{d}}_{ELW})={{\widehat{R}}}({\widehat{d}}_{ELW})^{-1/2} {\widehat{G}}({\widehat{d}}_{ELW}){{\widehat{R}}}({\widehat{d}}_{ELW})^{-1/2}\) instead of \({{\widehat{G}}}\), where \({{\widehat{R}}}({\widehat{d}}_{ELW}) = \text {diag}({{\widehat{g}}}_{11},\ldots ,{{\widehat{g}}}_{pp})\) contains the diagonal elements of \({\widehat{G}}({\widehat{d}}_{ELW})\). This is admissible since the rank of \({{\widehat{P}}}\) is the same as that of \({\widehat{G}}\) in the limit. Nielsen and Shimotsu (2007) point out that this approach works better in simulations and also recommend the choice \(n(T)=m_1^{-0.3}\). The cointegrating rank estimate is consistent for \(r\in \{0,\ldots ,p-1\}\). It is applicable to systems of dimension \(p\ge 2\), and it does not impose restrictions on d and b. A similar rank estimation procedure based on the average of finitely many tapered periodogram ordinates local to the origin was also proposed by Chen and Hurvich (2003).
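
Once \({\widehat{G}}({\widehat{d}}_{ELW})\) has been computed, the selection rule (7) applied to \({\widehat{P}}({\widehat{d}}_{ELW})\) reduces to a few lines, as the following sketch shows; the matrix \({\widehat{G}}\) is assumed as input.

```python
import numpy as np

def rank_ns(G_hat, m1):
    """Rank estimate (7) from the eigenvalues of P_hat = R^(-1/2) G_hat R^(-1/2),
    using the recommended n(T) = m1^(-0.3)."""
    p = G_hat.shape[0]
    R_sq = np.diag(1.0 / np.sqrt(np.diag(G_hat)))
    P_hat = R_sq @ G_hat @ R_sq                        # rescale to correlations
    delta = np.sort(np.linalg.eigvalsh(P_hat))[::-1]   # descending eigenvalues
    nT = m1 ** (-0.3)
    crit = [nT * (p - k) - delta[: p - k].sum() for k in range(p)]
    return int(np.argmin(crit))                        # r_hat in {0, ..., p-1}
```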

The inconsistency of the multivariate local Whittle estimator under fractional cointegration is the basis for a test procedure originally proposed by Marinucci and Robinson (2001). They suggest a Hausman-type test that compares multivariate and univariate local Whittle estimates. Under the null hypothesis of no cointegration the multivariate estimator is efficient and both are consistent, whereas under the alternative of fractional cointegration the univariate estimator remains consistent, while the multivariate one does not.

This idea is formalized by Robinson (2008a). The test statistic is based on the objective function of the multivariate local Whittle estimator (cf. Lobato (1999), Shimotsu (2007)) \(S(d) = \log \det {{\widehat{G}}}^*(d) - \frac{2pd}{m} \sum _{j=1}^m \log \lambda _j\) with \({{\widehat{G}}}^*(d) = \frac{1}{m} \sum _{j=1}^m I_X(\lambda _j) \lambda _j^{2d}\) and its derivative

$$\begin{aligned}&s^*\left( d\right) = \text {tr}\left( {{\widehat{G}}}^*(d)^{-1} {{\widehat{H}}}^*(d)\right) \end{aligned}$$
(8)

with \({{\widehat{H}}}^*(d) = \frac{1}{m} \sum _{j=1}^m \nu _jI_X(\lambda _j) \lambda _j^{2d}\) and \(\nu _j=\log j - \frac{1}{m} \sum _{k=1}^m \log k\). Similar to the previous procedure, the memory parameter d is estimated by pooling the univariate estimates obtained by applying the local Whittle estimator to each of the component series. The equally weighted average is denoted by \({{\widehat{d}}}_{LW}\). To obtain a test statistic, the derivative \(s^*(d)\) from (8) is evaluated at this averaged univariate estimate:

$$\begin{aligned} W_{Rob}^*&= \frac{m s^*({{\widehat{d}}}_{LW})^2}{N^2\text {tr}( {{\widehat{F}}}^{*^2})-p } \end{aligned}$$
(9)

with \({{\widehat{F}}}^*={{\widehat{R}}} ^{*^{-1/2}} {{\widehat{G}}}^*({{\widehat{d}}}_{LW}) {{\widehat{R}}}^{*^{-1/2}}\) and \({{\widehat{R}}}^* =\text {diag}({{\widehat{g}}}_{11}^*,\ldots ,{{\widehat{g}}}_{pp}^*)\), where \({{\widehat{g}}}_{aa}^*\), \(a=1,\ldots ,p\), are the diagonal elements of \({{\widehat{G}}}^*({{\widehat{d}}}_{LW})\). The scaled derivative \(m^{1/2}s^*({{\widehat{d}}}_{LW})\) is asymptotically normal, so that the test statistic follows a \(\chi ^2_1\)-distribution if appropriately standardized by the term in the denominator.
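
The statistic can be assembled directly from periodogram ordinates. In the sketch below, N in the denominator of (9) is read as the system dimension p and real parts of \({{\widehat{G}}}^*\) and \({{\widehat{H}}}^*\) are taken; both are assumptions of this illustration.

```python
import numpy as np

def robinson_score_stat(X, d_hat, m):
    """Score statistic (9), evaluated at the pooled univariate LW estimate d_hat."""
    T, p = X.shape
    t = np.arange(1, T + 1)
    j = np.arange(1, m + 1)
    lam = 2 * np.pi * j / T
    w = np.exp(1j * np.outer(lam, t)) @ X / np.sqrt(2 * np.pi * T)
    I = np.einsum('ja,jb->jab', w, w.conj())            # periodogram matrices
    nu = np.log(j) - np.mean(np.log(j))
    scale = lam ** (2 * d_hat)
    G = np.real(np.mean(scale[:, None, None] * I, axis=0))         # G*(d_hat)
    H = np.real(np.mean((nu * scale)[:, None, None] * I, axis=0))  # H*(d_hat)
    s = np.trace(np.linalg.solve(G, H))                 # s*(d_hat) from (8)
    R_sq = np.diag(1.0 / np.sqrt(np.diag(G)))
    F = R_sq @ G @ R_sq
    return m * s ** 2 / (p ** 2 * np.trace(F @ F) - p)  # compare to chi^2(1)
```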

The test generates power because G(d) is singular under the alternative of fractional cointegration, so that the inverse \({{\widehat{G}}}^*({{\widehat{d}}}_{LW})^{-1}\) of the estimate and consequently the trace \(s^*\left( {{\widehat{d}}}_{LW}\right) \) become large. This is a score-type test that avoids the calculation of the multivariate local Whittle estimator, which can be numerically expensive. Since the efficiency of the multivariate estimate is obtained with a single Newton step from the univariate estimate in the direction of the multivariate one, \(s^*\left( {{\widehat{d}}}_{LW}\right) \) is directly proportional to the difference between the efficient and the inefficient estimate.

This test allows for series of dimension larger than two, but it is restricted to processes with \(d\in (-1/2,1/2)\) and focuses on the empirically relevant range \(d\in (0,1/2)\). A non-stationary extension based on a trimmed version of the local Whittle estimator is proposed, but the size and power properties of this test in simulations appear to depend heavily on the sample size.

An alternative way to allow for non-stationary processes would be to base the test on the objective function of the multivariate exact local Whittle estimator (as in Shimotsu (2012), but without allowing for fractional cointegration) and univariate ELW estimates. Since the exact local Whittle estimates have the same asymptotic properties as the local Whittle estimate for \(d\in (-1/2,1/2)\), the test would have the same limiting distribution.

For a bivariate process with known \(d\in (0,1]\), Souza et al. (2018) propose a test based on an estimate of b obtained from the determinant of the trimmed and truncated spectral matrix of the fractionally differenced process via a log-periodogram regression. Denote the fractionally differenced process by \({\varDelta }^d X_t= ({\varDelta }^d X_{1t}, {\varDelta }^d X_{2t})'\) with spectral density matrix \(f_{{\varDelta }^d}(\lambda )\), then the determinant \(D_{{\varDelta }^d}(\lambda )\) of \(f_{{\varDelta }^d}(\lambda )\) depends on the memory reduction parameter \(b\in [0,d]\) and can be approximated by

$$\begin{aligned} D_{{\varDelta }^d}(\lambda ) \sim {\tilde{g}} |1-e^{-i\lambda }|^{2b}, \quad \text { as } \quad \lambda \rightarrow 0^+, \end{aligned}$$
(10)

where \({{\tilde{g}}}\) is a constant and finite scalar. Under cointegration, \(f_{{\varDelta }^d}(\lambda )\) does not have full rank near the origin (like G in (1)) so that its determinant \(D_{{\varDelta }^d}(\lambda )\) approaches zero as \(\lambda \rightarrow 0^+\). The memory reduction b can be estimated from the logged version of Eq. (10) using a log-periodogram type regression,

$$\begin{aligned} \log D_{{\varDelta }^d}(\lambda ) \sim \log {{\tilde{g}}} + 2b \log |1-e^{-i\lambda }| +\log \frac{{\tilde{g}}^*(\lambda )}{{{\tilde{g}}}}, \quad \text { as } \quad \lambda \rightarrow 0^+, \end{aligned}$$

where \(\lim _{\lambda \rightarrow 0^+} {\tilde{g}}^*(\lambda ) = {{\tilde{g}}}\).

In order to make the estimation of b feasible, the empirical determinant \({{\widehat{D}}}_{{\varDelta }^d}(\lambda )\) has to be calculated from an estimate \({{\widehat{f}}}_{{\varDelta }^d}(\lambda )\) of the spectral density at the Fourier frequencies \(j=l,l+(2l-1),l+2(2l-1),\ldots ,m-(2l-1),m\), where \(l+1<m<T\). The latter is obtained from the locally averaged periodogram \({{\widehat{f}}}_{{\varDelta }^d}(\lambda _j)=\frac{1}{2l-1} \sum _{k=j-(l-1)}^{j+(l-1)} I_{{\varDelta }^d}(\lambda _k),\) where \(I_{{\varDelta }^d}(\lambda _k)\) is the periodogram of \({\varDelta }^dX_t\). At each j the estimate \({{\widehat{f}}}_{{\varDelta }^d}(\lambda _j)\) is thus a local average of the periodogram at frequency \(\lambda _j\) and the \(l-1\) frequencies to its left and right, and the \(\lambda _j\) are spaced so that the local averages are non-overlapping.

The resulting estimator for the cointegrating strength b is given by

$$\begin{aligned} {\widehat{b}}_{GPH}&= \left( \sum _{j=l+1}^m {{\tilde{Z}}}_j^{*2}\right) ^{-1} \sum _{j=l+1}^m {{\tilde{Z}}}_j^* \log {{\widehat{D}}}_{{\varDelta }^d}(\lambda _j), \end{aligned}$$

where \({\tilde{Z}}_j^*=Z_j^*-\bar{Z^*}, \, Z_j^*= 2\log |1-e^{i\lambda _j}| = \log (2-2\cos (\lambda _j))\), and \(\bar{Z^*}\) is the mean of the \(Z_j^*\). Under the null hypothesis of no fractional cointegration we have \(b=0\). Under this condition, and assuming that l and m fulfill the condition \(\frac{l+1}{m}+\frac{m}{T}+\frac{1}{m}+\frac{\log m}{m} \rightarrow 0 \) as \(T\rightarrow \infty \), the estimate \({{\widehat{b}}}_{GPH}\) is consistent and asymptotically normal with variance \(\sigma _{b}^2 = \frac{1}{m} ({\varPsi }^{(1)}(2l+1) + {\varPsi }^{(1)}(2l))\), where \({\varPsi }^{(1)} (x) = \frac{d^2 \log {\varGamma }(x)}{d x^2}\) is the polygamma function of order 1 and \({\varGamma }(\cdot )\) denotes the gamma function.

The null hypothesis of no fractional cointegration can thus be tested using a simple t-test:

$$\begin{aligned} W_{SRFB} = \frac{{{\widehat{b}}}_{GPH}}{\sigma _{b}}{\mathop {\rightarrow }\limits ^{d}}N(0, 1). \end{aligned}$$
(11)

The method has no restrictions regarding the range of d and b but is only applicable to bivariate processes. For practical purposes, d is usually unknown and has to be estimated, but as shown in our simulation study in Sect. 4 this has no severe implications for the quality of the test. However, a thorough theoretical examination of this aspect would be an interesting topic for further research. Note that the work by Velasco (2003a) might be helpful in this regard, as he introduces a similar estimate and distinguishes between the cases of known and estimated residuals.
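
For illustration, the determinant regression behind (11) can be sketched as follows; the input is the fractionally differenced series, and using the real part of the periodogram matrix in the local averages is an assumption of this sketch.

```python
import numpy as np
from scipy.special import polygamma

def srfb_test(dX, m, l=1):
    """t-statistic W_SRFB in (11) from the differenced bivariate series dX (T x 2)."""
    T = dX.shape[0]
    t = np.arange(1, T + 1)
    js = np.arange(l, m + 1, 2 * l - 1)          # grid j = l, l + (2l-1), ..., m
    lam = 2 * np.pi * np.arange(1, m + l) / T    # frequencies needed by the windows
    w = np.exp(1j * np.outer(lam, t)) @ dX / np.sqrt(2 * np.pi * T)
    I = np.real(np.einsum('ja,jb->jab', w, w.conj()))
    # non-overlapping local averages over 2l-1 frequencies, then the determinant
    D = np.array([np.linalg.det(I[j - l: j + l - 1].mean(axis=0)) for j in js])
    Z = np.log(2 - 2 * np.cos(2 * np.pi * js / T))
    Zt = Z - Z.mean()
    b_hat = (Zt @ np.log(D)) / (Zt @ Zt)         # log-periodogram-type regression
    sigma_b = np.sqrt((polygamma(1, 2 * l + 1) + polygamma(1, 2 * l)) / m)
    return b_hat / sigma_b                       # compare to N(0, 1) under H0
```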

3.2 Tests based on cointegrating residuals

By the definition of fractional cointegration, the memory of the linear combination \(v_t=\beta 'X_t\) is lower than that of \(X_t\) itself. Under the null hypothesis of no fractional cointegration one can still write \(v_t=\beta 'X_t=y_t-{\tilde{\beta }}' x_{t}\), since \(y_t\) can still depend on the values of the other components of \(X_t\). The difference from the cointegrated case is only that \(d_v=d\). It is therefore natural to test for fractional cointegration by testing \(d_v=d\) (or \(b=0\)) versus \(d_v<d\) (or \(b>0\)), based on an estimate \({{\widehat{v}}}_t\) of the potential cointegrating residual.

Under weak non-stationary fractional cointegration, i.e., \(d>d_v>1/2\), Marmol and Velasco (2004) suggest a Hausman (1978)-type F-test that compares the OLS estimate \({\widehat{\beta }}^{OLS}\) of the cointegrating vector with an alternative estimate \({\widehat{\beta }}^{NB}\) with opposite consistency characteristics.

The OLS estimator \({\widehat{\beta }}^{OLS}\) is consistent for \({\tilde{\beta }}\) under the alternative (as long as \(d>1/2\)) but inconsistent under the null hypothesis. Marmol and Velasco (2004) propose an alternative estimator \({\widehat{\beta }}^{NB}\) that is consistent for the vector \({\tilde{\beta }}\) under the null hypothesis but inconsistent under the alternative. The estimator is given by

$$\begin{aligned} {\widehat{\beta }}^{NB}({{\widehat{d}}}_x, {{\widehat{d}}}_v) ={{\widehat{G}}}_{xx}^{MV}({{\widehat{d}}}_x)^{-1} {{\widehat{g}}}_{xy}^{MV}({{\widehat{d}}}_{v}), \end{aligned}$$

where \({{\widehat{G}}}_{xx}^{MV}(d) = \frac{2\pi }{m_2} \sum _{j=1}^{m_2} {\tilde{{\varLambda }}}_j(d)^{-1} \text {Re} \left\{ I_{xx}(\lambda _j) \right\} {\tilde{{\varLambda }}}_j^{-1} (d)\) and \({{\widehat{g}}}_{xy}^{MV}(d) = \frac{2\pi }{m_2} \sum _{j=1}^{m_2} \text {Re } I_{xy} (\lambda _j) \lambda _j^{2(d-1)}\), with \({\tilde{{\varLambda }}}_j(d)=\text {diag}(\lambda _j^{1-d},\ldots ,\lambda _j^{1-d})\), and where \(I_{xx}(\lambda _j)\) and \(I_{xy}(\lambda _j)\) are the respective elements of the periodogram \(I_{{\varDelta } X {\varDelta } X}(\lambda _j)\) of the differenced process \({\varDelta } X_t\) and \(m_2\) is subject to the usual bandwidth conditions. The estimator is closely related to the narrow-band least squares estimator \({\widehat{\beta }}_m\) from (6) but uses a rescaled version of the periodogram. In fact, \({\widehat{\beta }}^{NB}(0, 0)\) would be equivalent to the NBLS estimate based only on the real part of the periodogram. Note that Nielsen (2005) introduced a very similar GLS-type estimate \({\widehat{\beta }}^{NB}(d,d)\).

Inconsistency under the alternative is only obtained through the choice \({\widehat{\beta }}^{NB}({{\widehat{d}}}_x, {{\widehat{d}}}_v)\), where \({{\widehat{d}}}_v\) is estimated from the OLS residuals. Since under the alternative \({{\widehat{v}}}_t^{OLS}\) is a consistent estimate of the cointegrating residual, \({{\widehat{d}}}_v\rightarrow d_v<d\), whereas \({{\widehat{d}}}_x\) is estimated from the original series and is consistent for d. Under the null hypothesis, on the other hand, \({\widehat{\beta }}^{OLS}\) is inconsistent so that \({{\widehat{v}}}_t^{OLS}\) is just some linear combination of I(d) series, \({{\widehat{d}}}_v\rightarrow d\), and \({\widehat{\beta }}^{NB}({{\widehat{d}}}_x, {{\widehat{d}}}_v)\) is consistent for \({\tilde{\beta }}\).

Since the process is non-stationary, the memory is estimated by local Whittle from the differenced process. Alternatively, d could be estimated using a tapered local Whittle estimator, or by the exact or fully extended local Whittle estimator. The test statistic compares both estimates of \({\tilde{\beta }}\) where the normalizing variance \({{\widehat{V}}}^{MV}\) is estimated from the periodogram of the OLS residuals \({{\widehat{v}}}^{OLS}_t\) and that of \(x_t\) so that

$$\begin{aligned} {{\widehat{V}}}_{MV}&= \left( \sum _{j=-m}^m I_{xx} (\lambda _j) \right) ^{-1} \sum _{j=-m}^m I_{xx} (\lambda _j) I_{{{\widehat{v}}}{{\widehat{v}}}}(\lambda _j) \left( \sum _{j=-m} ^m I_{xx} (\lambda _j) \right) ^{-1}. \end{aligned}$$

This leads to the test statistic

$$\begin{aligned} W_{MV}&=\frac{1}{p-1} \left( {\widehat{\beta }}^{OLS} - {\widehat{\beta }}^{NB} \right) ' {{\widehat{V}}}_{MV}^{-1} \left( {\widehat{\beta }}^{OLS} - {\widehat{\beta }}^{NB} \right) . \end{aligned}$$
(12)

The choices of m and \(m_2\) are not linked, but both have to satisfy the condition \((m^{d-2} + m^{\gamma -1} \log T)\log ^2T + \frac{m}{T} \rightarrow 0\) as \(T\rightarrow \infty \), with \(\gamma >0\) which is fulfilled if \(m\sim T^{\eta }\), \(\eta \in (0,1)\). The asymptotic distribution is non-standard and depends on the memory parameter d. It is given by

$$\begin{aligned} W_{MV}&{\mathop {\rightarrow }\limits ^{d}} \frac{1}{p-1} \int _0^1 W_y(d; r)W_x(d; r)'dr V^{-1}\int _0^1 W_x(d;r)W_y(d;r)dr, \end{aligned}$$

with \(V=\int _0^1\gamma _R(s)\left\{ \gamma _{xx}(s)+\gamma _{xx}'(s)+\gamma _{xx}(1-s)+\gamma _{xx}'(1-s)\right\} ds,\) \(\gamma _R(s)=\int _0^{1-s} W_y(d;r) W_y(d;r+s) dr,\) and \(\gamma _{xx}(s)=\int _0^{1-s}W_x(d;r) W_x(d;r+s)' dr,\) where \(W_y(d;r)\) is a fractional Brownian bridge, and \(W_x(d;r)\) is a \(p\times 1\) vector of independent fractional Brownian bridges.

Critical values are tabulated in Marmol and Velasco (2004) for dimensions up to \(p=5\) and different forms of detrending that affect the type of the fractional Brownian bridges. The test statistic \(W_{MV}\) diverges under the alternative since both \({\widehat{\beta }}^{NB}\) and \({{\widehat{V}}}_{MV}^{-1}\) diverge under fractional cointegration. Although the consistency of the test is derived assuming \(d>d_v>0.5\), Marmol and Velasco (2004) state that the test remains consistent if the stationarity border is crossed by the cointegrating residuals, i.e. \(d>0.5>d_v\). Our simulations in Sect. 4 confirm this.

A direct residual-based test is proposed by Chen and Hurvich (2006), who estimate the possible cointegrating subspaces using eigenvectors of the averaged periodogram local to the zero frequency. The process \(X_t\) is assumed to be stationary after taking \((q-1)\) integer differences, which allows \(d\in \left( q-1.5, \, q-0.5 \right) \). In order to account for possible over-differencing, the complex-valued taper \(h_t=0.5(1-e^{i2\pi t/T})\) of Hurvich and Chen (2000) is applied to the data. The tapered discrete Fourier transform (DFT) and periodogram of \(X_t\) are defined by

$$\begin{aligned} w_{X}^{tap}(\lambda _j)&= \frac{1}{\sqrt{2\pi \sum _t |h_t^{(q-1)}|^2}} \sum _{t=1}^T h_t^{(q-1)} X_t e^{i\lambda _jt},\\ I_X^{tap}(\lambda _j)&= w_X^{tap}(\lambda _j) \overline{w_X^{tap}(\lambda _j)}. \end{aligned}$$

Based on the tapered periodogram, define the averaged periodogram matrix of \(X_t\) by \(I_{X}^{av}(\lambda _j) = \sum _{j=1}^{m_{3}} \text {Re}\left( I_{X}^{tap} (\lambda _j)\right) ,\) where \(m_{3}\) is a fixed positive integer fulfilling \(m_{3}>p+3\). The eigenvalues of \(I_{X}^{av}(\lambda _j)\) sorted in descending order are denoted by \({\widehat{\delta }}_{a,I_{X}^{av}}\) and the corresponding eigenvectors are given by \({\widehat{\chi }}_{a,I_{X}^{av}}\), for \(a=1,\ldots ,p\). Under the alternative hypothesis, if there are \(r>0\) cointegrating relationships, the matrix consisting of the first r eigenvectors provides a consistent estimate of the cointegrating subspace.

To construct a test for the null hypothesis of no fractional cointegration the potential cointegrating residuals \(v_t\) are estimated by multiplying \(X_t\) with the eigenvectors \({\widehat{\chi }}_{a,I_{X}^{av}}\) so that \({{\widehat{v}}}^{av}_{at}={\widehat{\chi }}_{a,I_{X}^{av}}'X_t\), for \(a=1,\ldots ,p\).

The memory of the p residual processes is estimated with the local Whittle estimator using bandwidth m but calculated at the shifted Fourier frequencies \(\lambda _{{{\tilde{j}}}}\) with \({{\tilde{j}}} = j +(q-1)/2\) to account for the tapering of order q. These estimates are denoted by \({{\widehat{d}}}_{v_a,{\widetilde{LW}}}\), and they remain consistent and asymptotically normal.

Since there can be at most \(p-1\) cointegrating relationships in a p-dimensional time series, the first residual corresponding to the largest eigenvalue cannot be a cointegrating residual. Its memory must therefore equal the common memory d of \(X_t\). In contrast, the last residual \({{\widehat{v}}}^{av}_{pt}\) corresponding to the smallest eigenvalue is most likely to be a cointegrating residual if there is cointegration so that its memory is reduced by b under cointegration.

The test idea of Chen and Hurvich (2006) is therefore to compare the estimated memory orders from the residual series \({{\widehat{v}}}^{av}_{1t}\) and \({{\widehat{v}}}^{av}_{pt}\) that correspond to \({{\widehat{d}}}\) (first residual) and \({{\widehat{d}}}_v\) (last residual). Chen and Hurvich (2006) show that

$$\begin{aligned} \sqrt{m}\left( {{\widehat{d}}}_{v_a,{\widetilde{LW}}} - {{\widehat{d}}}_{v_b, {\widetilde{LW}}}\right)&{\mathop {\rightarrow }\limits ^{d}}N\left( 0, V_{CH,q}\left( 1-\frac{G_{ab}^2}{G_{aa}G_{bb}}\right) \right) \end{aligned}$$

with \(V_{CH,q}= \frac{1}{2} \, \frac{{\varGamma }(4q-3){\varGamma }^4(q)}{{\varGamma }^4(2q-1)}\). A conservative test statistic is therefore given by

$$\begin{aligned} W_{CH}&=\sqrt{m}\, \frac{\left( {{\widehat{d}}}_{v_1,{\widetilde{LW}}} - {{\widehat{d}}}_{v_p, {\widetilde{LW}}}\right) }{\sqrt{V_{CH,q}}}. \end{aligned}$$
(13)

The test rejects if \(W_{CH}\) is larger than the standard normal quantile \(z_{1-\alpha /2}\). It is very versatile, since it does not impose restrictions on the cointegration strength b and can be applied to stationary as well as non-stationary long-memory processes, but it requires a priori knowledge about the location of d in the parameter space to determine the order of differencing.
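
To fix ideas, a sketch for the stationary case \(q=1\) (where the taper reduces to one and the frequency shift vanishes) is given below; `local_whittle` refers to the sketch at the beginning of this section, and the restriction to \(q=1\) is a simplification.

```python
import numpy as np

def chen_hurvich_test(X, m, m3=25):
    """W_CH in (13) for a stationary p-variate system (q = 1, so V_CH,1 = 1/2)."""
    T, p = X.shape
    t = np.arange(1, T + 1)
    lam = 2 * np.pi * np.arange(1, m3 + 1) / T
    w = np.exp(1j * np.outer(lam, t)) @ X / np.sqrt(2 * np.pi * T)
    I_av = np.real(np.einsum('ja,jb->ab', w, w.conj()))   # averaged periodogram
    _, eigvec = np.linalg.eigh(I_av)                      # ascending eigenvalues
    v_first = X @ eigvec[:, -1]   # residual for the largest eigenvalue, ~ I(d)
    v_last = X @ eigvec[:, 0]     # residual for the smallest eigenvalue
    d_first = local_whittle(v_first, m)                   # sketch from Sect. 3
    d_last = local_whittle(v_last, m)
    return np.sqrt(m) * (d_first - d_last) / np.sqrt(0.5)  # compare to z_{1-alpha/2}
```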

Hualde and Velasco (2008) propose another testing strategy in a residual-based regression framework. As before, the series \(X_t\) is partitioned such that \(X_t=(y_t, x_t')'\), and they consider the single-equation regression \(y_t = {\tilde{\beta }}' x_t + v_t\).

The test idea is based on the observation that the fractionally differenced residual \({\varDelta }^{d_x} v_t\) is unrelated to the long-run level of \(x_t\) under the null hypothesis. This is because \({\varDelta }^{d_x} v_t\) is I(0) and \(x_t\) is I(d). The cross-spectrum of \(x_t\) and \({\varDelta }^{d_x} v_t\) should therefore be zero at frequencies local to zero. Possible dependence between the short-run components \(u_t\) and \(e_t\) in (3) would manifest itself in form of a non-zero cross-spectrum at higher frequencies.

The test statistic of Hualde and Velasco (2008) is therefore based on the quantity \({\widehat{\tau }}_m\) defined as

$$\begin{aligned} {\widehat{\tau }}_m= \sum _{j=1}^m w_{x}(-\lambda _j) \zeta (\lambda _j) w_{{\varDelta }^{d_v,d} X} (\lambda _j) \end{aligned}$$

where \({\varDelta }^{d_v,d}X_t=\left( {\varDelta }^{{{\widehat{d}}}_v} y_t, {\varDelta }^{{{\widehat{d}}}} x_t' \right) '\) and \(\zeta (\lambda _j) = (1,0_{p-1}')\, {{\widehat{f}}}_X(\lambda _j)^{-1}\). The projection vector \(\zeta (\lambda _j)\) estimates the DFT of the residual process \(v_t\) from \(w_{{\varDelta }^{d_v,d} X} (\lambda _j)\), the DFT of the fractionally differenced process \({\varDelta }^{d_v,d} X_t\). As usual for these semiparametric approaches, it is assumed that \(m\le T/2\) and \(m/T\rightarrow 0\), as \(T\rightarrow \infty \). This leads to the test statistic

$$\begin{aligned} W_{HV}&= {\widehat{\tau }}_m' {{\widehat{V}}}_{HV}^{-1} {\widehat{\tau }}_m \end{aligned}$$
(14)

with \({{\widehat{V}}}_{HV}= \sum _{j=0}^m a_j\text { Re } \kappa (\lambda _j) I_{XX}(\lambda _j ),\) and \(\kappa (\lambda _j) = \zeta (\lambda _j) \,(1,0_{p-1}')',\) where the weights are defined by \(a_j=1\) if \(j\in \{0,T/2\}\) and \(a_j=2\) otherwise. Under the null hypothesis this test statistic follows an asymptotic \(\chi _{p-1}^2\)-distribution. Under the alternative the test develops power, since \(d_v\) is estimated from the NBLS estimate of the cointegrating residuals. Since these have reduced memory under the alternative, the first component of \({\varDelta }^{d_v,d} X_t\) (\(y_t\)) is I(b) instead of I(0) and the cross spectrum of the underdifferenced estimate of \(v_t\) and \(x_t\) in \({\widehat{\tau }}_m\) becomes non-zero. As before, the memory orders are estimated using consistent estimators that account for the (possible) non-stationarity of the data—for example the exact local Whittle estimator of Shimotsu and Phillips (2005).

A modified test with more power in bivariate systems \(X_t = (X_{1t}, X_{2t})'\) is calculated with \({\tilde{\tau }}_m\) instead of \({\widehat{\tau }}_m\):

$$\begin{aligned} {\tilde{\tau }}_m&= \sum _{j=0}^m a_j\, \frac{\text {Re} \left( I_{{\varDelta }^{{{\widehat{d}}}_v}X_1, X_2}(\lambda _j) - \frac{{{\tilde{f}}}_{12}(\lambda _j)}{{{\widehat{f}}}_{22}(\lambda _j)} I_{{\varDelta }^{{{\widehat{d}}}_v}X_2,X_2}(\lambda _j) \right) }{{{\widehat{f}}}_{11}(\lambda _j) - \frac{{{\widehat{f}}}_{12}(\lambda _j) {{\widehat{f}}}_{21}(\lambda _j)}{{{\widehat{f}}}_{22}(\lambda _j)}}. \end{aligned}$$

Here, the respective elements of the spectral matrices of differenced processes \({{\widehat{f}}}_{{\varDelta }^{{{\widehat{d}}}}}(\lambda _j)= \frac{1}{2m+1} \sum _{k=j-m}^{j+m} I_{{\varDelta }^{{{\widehat{d}}}}X} (\lambda _k)\), \({{\tilde{f}}}_{{\varDelta }^{{{\widehat{d}}}_v}}(\lambda _j)=\frac{1}{2m+1} \sum _{k=j-m}^{j+m} I_{{\varDelta }^{{{\widehat{d}}}_v}X} (\lambda _k)\) are denoted by \({{\widehat{f}}}_{ab}(\lambda _j)\) and \({{\tilde{f}}}_{ab}(\lambda _j)\) with \(a,b \in \{1,2\}\). This is the same as \({\widehat{\tau }}_m\) but with \({{\widehat{f}}}_{12}(\lambda _j)\) replaced by \({{\widetilde{f}}}_{12}(\lambda _j)\) that is constructed using \({{\widehat{d}}}_v\) so that it also diverges under the alternative and constitutes an additional source of power. The asymptotic \(\chi _{p-1}^2\)-distribution is unaffected by this modification.

It is not necessary to impose any restrictions on the range of d and \(d_v\) except for those implied by fractional cointegration, and processes of dimensions higher than two are allowed. The asymptotic \(\chi _{p-1}^2\) distribution depends only on the dimension of the process. Furthermore, the memory parameters are allowed to differ as long as two components of \(X_t\) share the same memory parameter and the vector is sorted so that the component with the highest memory comes first.

Nielsen (2010) introduces a sequential testing approach to test fractional cointegration and to determine the cointegrating rank. The method is based on a variance-ratio statistic and imposes the assumption that the process \(X_t\) is non-stationary and the potential cointegrating residual process is stationary with \(d_v< 0.5 < d\).

Denote the demeaned process by \(Z_t=X_t-\overline{X_t}\), where \(\overline{X_t}\) is the vector of arithmetic means of the component series. The fractionally integrated version of \(Z_t\) is denoted by \({{\widetilde{Z}}}_t={\varDelta }^{-\epsilon }Z_t\). Then the variance ratio is given by \(K_T(\epsilon )=A_T C_T^{-1},\) with \(A_T = \sum _{t=1}^T Z_t Z_t',\) and \( C_T = \sum _{t=1}^T {{{\tilde{Z}}}}_t {{{\tilde{Z}}}}_t'.\) Taking the ratio has the advantage of eliminating the processes’ variance from the asymptotic distribution. The eigenvalues of \(K_T(\epsilon )\) sorted in ascending order are denoted by \({\widehat{\delta }}_{a,K}\) with \(a=1,\ldots ,p\).

Similar to the spectral matrix G, the rank of \(K_T(\epsilon )\) is reduced to \(p-r\) under fractional cointegration. This leads to a non-parametric trace statistic whose structure is similar to the trace statistic of Johansen (1991) in the parametric context

$$\begin{aligned} W_{Niel}(\epsilon ) = T^{2\epsilon } \sum _{k=1}^{p-r} {\widehat{\delta }}_{k,K}, \quad r=1,\ldots ,p-1, \end{aligned}$$
(15)

where r is the number of cointegrating relations under the null hypothesis. Using (15) the cointegrating rank can be determined by a sequence of tests of the null hypothesis \(H_0\): \(r=r_0\) vs. \(H_1\): \(r>r_0\).

The limiting distribution is given by

$$\begin{aligned}&W_{Niel}(\epsilon ){\mathop {\rightarrow }\limits ^{d}}\text {tr}\left\{ \int _0^1W_{p-r}(d;s)W_{p-r}(d;s)'ds \right. \\&\quad \left. \left( \int _0^1 {{\widetilde{W}}}_{p-r}(d+\epsilon ;s){{\widetilde{W}}}_{p-r}(d+\epsilon ;s)'ds\right) ^{-1}\right\} , \end{aligned}$$

where \(W_{p-r}(d;u)=B_d^{p-r}(u)-\int _0^1 B_d^{p-r}(v) dv\), \({{\widetilde{W}}}_{p-r}(d+\epsilon ;u)=B_{d+\epsilon }^{p-r}(u)-\int _0^u\frac{(u-v)^{\epsilon -1}}{{\varGamma }(\epsilon )}d v\int _0^1 B_{d+\epsilon }^{p-r}(v) dv\), \(B_d^{p-r}\) is a \(p-r\) dimensional vector of mutually independent standard fractional Brownian motions of type II, and the Brownian motions driving the fractional Brownian motions \(B_{d}^{p-r}\) and \(B_{d+\epsilon }^{p-r}\) are identical.

This asymptotic distribution is non-standard and depends on the dimension p, the cointegrating rank r, the order of fractional integration \(\epsilon \), and d. In practice d can be estimated consistently, and the other parameters are known. Critical values for \(d=1\), \(\epsilon =0.1\), and \(p-r=1,2,\ldots ,8\) are given by Nielsen (2010), who recommends using \(\epsilon =0.1\) to integrate the process, since it leads to higher power than larger values, whereas smaller values improve power slightly but lead to size distortions at the same time. For more details cf. Nielsen (2009). Note that choosing a different order of fractional summation changes the limiting distribution, which implies that the test performance is not free from user-chosen tuning parameters.

To see why this test can be considered to be residual-based, note that

$$\begin{aligned} {\widehat{\delta }}_{a,K}=\frac{\widehat{\eta _a}' A_T \widehat{\eta _a}}{\widehat{\eta _a}' C_T \widehat{\eta _a}} =\frac{\sum _{t=1}^T {{\widehat{v}}}_t^2}{\sum _{t=1}^T {{\widetilde{v}}}_t^2}, \end{aligned}$$

where \(\widehat{\eta }_a\) denotes the eigenvector corresponding to \({\widehat{\delta }}_{a,K}\). Since the first r eigenvectors are consistent estimates of the cointegrating space, the first r eigenvalues are thus given by the ratio of the sum of the squared cointegrating residuals and the sum of squares of their \(\epsilon \) times integrated version \({{\widetilde{v}}}_t\). Here the scaled sums of squares are estimators of the respective process variances and it is assumed that \(d>1/2>d_v\). Therefore, under the null hypothesis of no fractional cointegration the numerator grows with rate \({\mathcal {O}}_P(T^{2d})\) and the more persistent denominator grows with rate \({\mathcal {O}}_P(T^{2(d+\epsilon )})\), so that the eigenvalue has rate \({\mathcal {O}}_P(T^{-2\epsilon })\). Under the alternative of fractional cointegration with \(d_v<1/2\), the process \(v_t\) is stationary so that the process variance is finite and the numerator grows with rate \({\mathcal {O}}_P(T)\). The denominator, which may or may not be stationary due to the integration with \(\epsilon \), is \({\mathcal {O}}_P(T^{\max \left\{ 1, 2(d-b+\epsilon )\right\} })\). Consequently, the eigenvalue is \({\mathcal {O}}_P(T^{\min \left\{ 0,1-2(d-b+\epsilon )\right\} })\), so that it goes to zero more slowly than under the null hypothesis.

The test is restrictive in that it requires non-stationary processes and, preferably, stationary residual processes, but as shown by his Monte Carlo simulation the test still exhibits power if \(d_v>0.5\) and \(b>0\). Furthermore, it is applicable to multivariate systems and is able to estimate the number of cointegrating relations.
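
A sketch of the statistic (15) follows; \({\varDelta }^{-\epsilon }\) is implemented with the truncated expansion from Sect. 2, and the resulting value still has to be compared to the tabulated critical values, which this sketch does not reproduce.

```python
import numpy as np

def nielsen_vr_stat(X, eps=0.1, r0=0):
    """Trace statistic (15): T^(2*eps) times the sum of the p - r0 smallest
    eigenvalues of K_T(eps) = A_T C_T^(-1)."""
    T, p = X.shape
    Z = X - X.mean(axis=0)                            # demeaned series
    psi = np.ones(T)                                  # weights of (1 - L)^(-eps)
    for k in range(1, T):
        psi[k] = psi[k - 1] * (k - 1 + eps) / k
    Z_tilde = np.column_stack([np.convolve(psi, Z[:, a])[:T] for a in range(p)])
    A = Z.T @ Z                                       # A_T
    C = Z_tilde.T @ Z_tilde                           # C_T
    delta = np.sort(np.real(np.linalg.eigvals(A @ np.linalg.inv(C))))
    return T ** (2 * eps) * delta[: p - r0].sum()
```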

Wang et al. (2015) propose a simple residual-based test in a bivariate setting where \(X_t = (X_{1t}, X_{2t})'\). The test statistic is based on the partial sum of \({\varDelta }^{d_v} Z_{2t}\), the demeaned second component series fractionally differenced with the memory order of the potential cointegrating residual \(v_t\). It is given by

$$\begin{aligned} W_{WWC} = T^{-1/2} \frac{\sum _{t=1}^T {\varDelta }^{{{\widehat{d}}}_{v}}Z_{2t} }{\sqrt{2 \pi {{\widehat{f}}}_{22}(0)}}, \end{aligned}$$
(16)

where \(f_{22}\) is the spectral density of either \(u_{2t}\) or \(e_t\) in (3), depending on whether a triangular model or a common-components model is assumed.

Under the null hypothesis \(d_v=d\) so that \({\varDelta }^{d_v} Z_{2t}\) is I(0) and the appropriately rescaled sum is asymptotically standard normal. Under the alternative \({\varDelta }^{d_v} Z_{2t}\) is I(b), so that the test statistic diverges with rate \({\mathcal {O}}_P(T^b)\).

To make this test statistic feasible the spectral density \(f_{22}\) can be estimated from the periodogram of the fractionally differenced process \({\varDelta }^{{{\widehat{d}}}} Z_{2t}\) following the approach of Hualde (2013): \({\widehat{f}}_{22}(0) = \frac{1}{(2m +1)} \sum _{j=-m}^{m} I_{{\varDelta }^{{{\widehat{d}}}} Z_2}(\lambda _j),\) where \(I_{{\varDelta }^{{{\widehat{d}}}}Z_2}(\lambda _j)\) is the periodogram of \({\varDelta }^{{{\widehat{d}}}} Z_{2t}\).

While Wang et al. (2015) are agnostic about the method that is used for the estimation of the memory parameters d and \(d_v\), they assume that \(d>1/2\), so that the cointegrating vector can be estimated using ordinary least squares. The memory orders can be estimated from \({{\widehat{v}}}_t^{OLS}\) and \(Z_{2t}\) using any of the common semiparametric estimators, such as the ELW estimator with a bandwidth m (as in \({{\widehat{f}}}_{22}\)) that fulfills the usual bandwidth conditions. The method does not impose any restrictions on the fractional cointegrating strength b. As the Monte Carlo simulations below show, the non-stationarity requirement (\(d>1/2\)) can be circumvented if the cointegrating residual \(v_t\) is based on the NBLS estimate of the cointegrating vector instead of the OLS estimate.
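
A sketch of the feasible statistic (16) is given below; \({{\widehat{d}}}\) and \({{\widehat{d}}}_v\) are assumed to be estimated beforehand, and the spectral estimate exploits the symmetry of the periodogram and omits the \(j=0\) term, which are simplifications of this sketch.

```python
import numpy as np

def frac_diff(x, d):
    """Delta^d x_t (type II): truncated expansion of (1 - L)^d, with
    pi_0 = 1 and pi_k = pi_{k-1} * (k - 1 - d) / k."""
    T = len(x)
    pi_k = np.ones(T)
    for k in range(1, T):
        pi_k[k] = pi_k[k - 1] * (k - 1 - d) / k
    return np.convolve(pi_k, x)[:T]

def wwc_test(X2, d_hat, dv_hat, m):
    """W_WWC in (16) from the second component series X2 (length T)."""
    T = len(X2)
    Z2 = X2 - X2.mean()
    num = frac_diff(Z2, dv_hat).sum()          # partial sum of Delta^{d_v} Z_2t
    u = frac_diff(Z2, d_hat)                   # Delta^{d} Z_2t, ~ I(0) estimate
    t = np.arange(1, T + 1)
    lam = 2 * np.pi * np.arange(1, m + 1) / T
    w = np.exp(1j * np.outer(lam, t)) @ u / np.sqrt(2 * np.pi * T)
    f22 = np.mean(np.abs(w) ** 2)              # f_hat_22(0)
    return num / np.sqrt(T * 2 * np.pi * f22)  # compare to N(0, 1) under H0
```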

Zhang et al. (2019) propose an alternative estimator of the cointegrating space that is based on the eigenvectors of the non-negative definite matrix \({{\widehat{M}}} = \sum _{j=0}^{j_0} {\widehat{{\varOmega }}}_Z(j) {\widehat{{\varOmega }}}_Z(j)', \) where \({\widehat{{\varOmega }}}_Z(j)=\frac{1}{T} \sum _{t=1}^{T-j} Z_{t+j} Z_t'\) is the autocovariance matrix at lag j and \(j_0\) is a fixed integer. The matrix \({{\widehat{M}}}\) is thus the sum of the outer products of the first \(j_0\) autocovariance matrices with themselves. The outer product is used instead of the covariance matrices \({\widehat{{\varOmega }}}_Z(j)\) to ensure that there is no information cancellation over different lags in \({{\widehat{M}}}\). It is assumed that \(d>0.5\) and \(d_v<0.5\).

The eigenvalues of \({{\widehat{M}}}\) in descending order are denoted by \({\widehat{\delta }}_{a,M}\) for \(a=1,\ldots ,p\) and the corresponding eigenvectors are denoted by \({\widehat{\chi }}_{a,M}\). Similar to the matrix G in (1), the first \(p-r\) eigenvalues of M are non-zero, whereas the remaining r are zero. For known r the eigenvectors corresponding to the r smallest eigenvalues provide a consistent estimate of the cointegrating space.

If r is unknown, the p potential cointegrating residuals are estimated using the eigenvectors so that \({{\widehat{v}}}^M_{at} ={\widehat{\chi }}_{a,M}' X_t\). By the same argument as in the procedure of Chen and Hurvich (2006), the residual corresponding to the smallest eigenvalue is most likely a cointegrating residual with reduced memory of \(d_v=d-b\) and the residual corresponding to the largest eigenvalue is I(d).

The cointegrating rank can be estimated using a simple criterion based on the summed autocorrelations of the potential cointegrating residuals. Define

$$\begin{aligned} Q_a (k_0)&= \sum _{k=1}^{k_0} {\widehat{\rho }}_a (k), \\ \text {with } {\widehat{\rho }}_a(k)&=\frac{\frac{1}{T-k} \sum _{t=1}^{T-k} ({{\widehat{v}}}^M_{a,t+k} - \overline{{{\widehat{v}}}^M_{at}} )({{\widehat{v}}}^M_{at} - \overline{{{\widehat{v}}}^M_{at}}) }{\frac{1}{T} \sum _{t=1}^T ({{\widehat{v}}}^M_{at} - \overline{{{\widehat{v}}}^M_{at}})^2}, \end{aligned}$$

where \( \overline{{{\widehat{v}}}^M_{at}} \) is the mean of \({{\widehat{v}}}^M_{at}\). The cointegrating rank estimator counts the instances when the averaged autocorrelation is smaller than a threshold \(c_0\in (0,1)\):

$$\begin{aligned} {{\widehat{r}}}_{ZRY} = \sum _{a=1}^p \mathbb {1}\left\{ \frac{Q_a(k_0)}{k_0} < c_0 \right\} . \end{aligned}$$
(17)

If the residual \({{\widehat{v}}}^M_{at}\) is stationary (\(d_v<1/2\)), the rescaled sum of autocorrelations \(Q_a(k_0)/k_0\) converges to zero asymptotically for \(k_0\rightarrow \infty \), since the autocorrelations are asymptotically proportional to \(k^{2d_v-1}\). Under certain regularity conditions this estimate is consistent. Even though consistency is only proven for \(r\ge 1\) in Theorem 4.2 of Zhang et al. (2019), our simulations below show that the procedure also works well in discriminating between \(r=0\) and \(r=1\).

It should be noted that the authors define \(r=p\) if all components of \(X_t\) are I(0). This leads to some abuse of notation, and r cannot be interpreted as the cointegrating rank in a narrow sense. Based on their simulations, Zhang et al. (2019) recommend using \(j_0=5\), \(k_0=20\) and \(c_0=0.3\). The estimator is easy to implement and applicable to higher-dimensional processes. However, the requirement of \(d>0.5\) and \(d_v<0.5\) is restrictive.
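
The estimator (17) is straightforward to implement; the following sketch works on the demeaned series throughout and uses the recommended tuning parameters.

```python
import numpy as np

def rank_zry(X, j0=5, k0=20, c0=0.3):
    """Rank estimate (17) based on the eigenvectors of M_hat."""
    T, p = X.shape
    Z = X - X.mean(axis=0)
    M = np.zeros((p, p))
    for j in range(j0 + 1):                    # M_hat = sum_j Omega(j) Omega(j)'
        Om = (Z[j:].T @ Z[: T - j]) / T        # autocovariance matrix at lag j
        M += Om @ Om.T
    _, eigvec = np.linalg.eigh(M)              # eigenvectors, ascending eigenvalues
    r_hat = 0
    for a in range(p):
        v = Z @ eigvec[:, a]                   # potential cointegrating residual
        v = v - v.mean()
        s2 = (v @ v) / T                       # sample variance
        Q = sum((v[k:] @ v[: T - k]) / ((T - k) * s2) for k in range(1, k0 + 1))
        if Q / k0 < c0:                        # rescaled sum of autocorrelations
            r_hat += 1
    return r_hat
```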

4 Monte Carlo Study

The asymptotic properties of all tests and rank estimates presented in Sect. 3 are derived by the respective authors, and some of them also present simulations to explore the finite sample behavior of the test statistics. This, however, is not the case for all tests, and a comprehensive comparative study suited to guide the choice of appropriate methods in practical applications has been missing entirely. To close this gap, we conduct an extensive Monte Carlo study. In addition to general results, we are particularly interested in answering two empirically motivated questions.

(i) How does correlation between the underlying short-run components influence the size of the tests? This question is important, since applied researchers will generally want to test for fractional cointegration if two related series seem to be co-moving. Similar trajectories, however, can also be generated by persistent processes with highly correlated innovations. Tests for the null hypothesis of no fractional cointegration should therefore be robust to a relatively high degree of correlation between the short-run components of the series.

(ii) Is there a notable difference in the power of the tests depending on whether the data is generated from a triangular model or from a common-components model? Both models are used in the literature to motivate and construct testing procedures, but to our knowledge simulation results are typically based on the triangular representation. In practice, either model could be justified, depending on the application. For example, if one is concerned with potential fractional cointegrating relationships between stock prices, it is not clear why one of the stock prices should be seen as a perturbed version of the other one (as is the case in the triangular model that treats the series in an asymmetric way), so that the common-components model is more suitable. In contrast to that, in the case of the potential parity between implied volatility and the expected average realized volatility over the next month (the so-called implied-realized parity analyzed by Christensen and Prabhala (1998), Christensen and Nielsen (2006b), and Nielsen (2007), among others), there is theoretical reason to assume that the implied volatility is a perturbed version of the expected average future realized volatility, since it contains a variance-risk premium (cf. Chernov (2007)). Therefore, a triangular model is more suitable.

We focus on three data generating processes (DGPs) based on the general model from Eqs. (3) to (5). For simplicity we set \(c_1=c_2=0\) and \(b=b_1=b_2\) so that the processes are mean zero and have a common memory reduction parameter. A simple bivariate model without fractional cointegration is constructed by setting \(\xi _1=\xi _2=0\). This model—referred to as (size) DGP1—is given by

$$\begin{aligned} X_{1t}&= {\varDelta }^{-d} u_{1t}\mathbb {1}\{t>0\}, \qquad&(18)\\ X_{2t}&= {\varDelta }^{-d} u_{2t}\mathbb {1}\{t>0\}, \qquad&(19) \end{aligned}$$

where correlation between \(u_{1t}\) and \(u_{2t}\) is allowed. For the power simulations, we consider a triangular model and a common-components model. In both cases we set \(\xi _1=\xi _2=1\), which implies a cointegrating vector of \(\beta =(1,-1)'\). The triangular model DGP2 is given by

$$\begin{aligned} X_{1t}&= Y_t + {\varDelta }^{-(d-b)} u_{1t}\mathbb {1}\{t>0\}, \qquad&(20)\\ X_{2t}&= Y_t, \qquad&(21) \end{aligned}$$

and the common-components model DGP3 is defined by

$$\begin{aligned} X_{1t}&= Y_t + {\varDelta }^{-(d-b)}u_{1t}\mathbb {1}\{t>0\}, \qquad&(22)\\ X_{2t}&= Y_t + {\varDelta }^{-(d-b)}u_{2t}\mathbb {1}\{t>0\}. \qquad&(23) \end{aligned}$$

In both DGP2 and DGP3 we have \(Y_t={\varDelta }^{-d} e_t\mathbb {1}\{t>0\}\). The underlying short-run components \(u_{1t}\) and \(u_{2t}\), or \(u_{1t}\) and \(e_t\)—depending on the DGP—have unit variance and correlation \(\rho \).
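
For replication purposes, the following sketch generates samples from DGP1 to DGP3 via the truncated expansion of \({\varDelta }^{-d}\), with \(\pi _0=1\) and \(\pi _j=\pi _{j-1}(j-1+d)/j\). We assume Gaussian innovations for concreteness, and in DGP3 we draw \(u_{2t}\) independently of the correlated pair \((u_{1t}, e_t)\); the helper names are ours.

```python
import numpy as np

def frac_int(u, d):
    """Type-II fractional integration: x_t = Delta^{-d} u_t 1{t>0}."""
    T = len(u)
    pi = np.ones(T)
    for j in range(1, T):
        pi[j] = pi[j - 1] * (j - 1 + d) / j   # coefficients of (1-L)^{-d}
    return np.convolve(pi, u)[:T]             # truncated convolution

def simulate_dgp(T, d, b, rho, dgp=1, rng=None):
    """One draw from DGP1 (size), DGP2 (triangular) or DGP3 (common components)."""
    rng = np.random.default_rng(rng)
    cov = [[1.0, rho], [rho, 1.0]]
    w1, w2 = rng.multivariate_normal([0.0, 0.0], cov, size=T).T
    if dgp == 1:                               # (w1, w2) = (u_1, u_2), Eqs. (18)-(19)
        return frac_int(w1, d), frac_int(w2, d)
    y = frac_int(w2, d)                        # common trend Y_t, (w1, w2) = (u_1, e)
    x1 = y + frac_int(w1, d - b)
    if dgp == 2:                               # triangular model, Eqs. (20)-(21)
        return x1, y
    u2 = rng.standard_normal(T)                # independent u_2 (our assumption)
    return x1, y + frac_int(u2, d - b)         # Eqs. (22)-(23)
```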

We consider sample sizes of \(T \in \{100, \,500, \,1000, \,2500 \}\) and values of \(d \in \{ 0.4, \,0.7, \,1\}\), covering both the stationary and the non-stationary region. Under fractional cointegration, the memory reduction b is linked to the value of d so that \(b \in \{d/3, d\}\). Consequently, there is either a memory reduction to 0 if \(b=d\) or a weaker form of cointegration if \(b=d/3\). In order to examine the impact of correlation between the short-run components, we consider \(\rho \in \{0, \, 0.45, \, 0.9, \,0.99\} \). Note that the results for \(b=d/3\), \(\delta _m=0.55\) and further robustness checks (other size DGPs and \(p=3\)-dimensional processes) are available online as supplementary material.

The semiparametric nature of the tests and rank estimates requires several bandwidth choices. The memory estimation with (E)LW estimators involved in all methods is based on the bandwidth m that determines the number of frequencies included in the estimation. We use \(m=\lfloor T^{\delta _m} \rfloor \) with \(\delta _m \in \{0.55, \, 0.75\}\) to account for sensitivities regarding the bandwidth choice. With regard to the other bandwidth choices, we follow the recommendations of the authors: \(m_1 =\lfloor T^{\delta _m-0.1} \rfloor \) and \(p(T) = m_1^{-0.3}\) for Nielsen and Shimotsu (2007) or Robinson and Yajima (2002), \(l=1\) for Souza et al. (2018), \(m_3=25\) for Chen and Hurvich (2006), \(c_0=0.3\), \(j_0=5\) and \(k_0=20\) for Zhang et al. (2019), and for Marmol and Velasco (2004) we set \(m= \lfloor T^{2/3} \rfloor \) and \(m_2= \lfloor T^{\delta _m} \rfloor \). All tests are carried out allowing for a non-zero mean.
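
For convenience, these bandwidth rules can be collected in one place as follows (the formulas are exactly those stated above; the function is ours):

```python
import math

def bandwidths(T, delta_m=0.75):
    """Bandwidth choices used in the Monte Carlo study."""
    m = math.floor(T ** delta_m)              # (E)LW memory estimation
    m1 = math.floor(T ** (delta_m - 0.1))     # Nielsen-Shimotsu / Robinson-Yajima
    p_T = m1 ** (-0.3)                        # their threshold sequence p(T)
    m_mv = math.floor(T ** (2 / 3))           # Marmol-Velasco m
    m2 = math.floor(T ** delta_m)             # Marmol-Velasco m2
    return {"m": m, "m1": m1, "p_T": p_T, "m_mv": m_mv, "m2": m2}

# e.g. bandwidths(1000) yields m = 177 and m1 = 89
```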

Table 1 Size (*rank estimation) based on DGP1 with \(\delta _m=0.75\)

The results presented are based on 5000 replications and a nominal significance level of \(\alpha =0.05\). Since the tests impose different conditions on d and \(d_v\), we mark in bold the cells of the tables where the methods have well-defined asymptotic properties and are supposed to deliver good results. In some cases the methods give satisfactory results beyond these limitations. For example, we implement the method of Wang et al. (2015) using a NBLS estimate of the cointegrating vector instead of the OLS estimate. This makes the test applicable in stationary as well as non-stationary time series.
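
A minimal sketch of the narrow-band least squares (NBLS) idea we rely on here is given below: the slope is obtained from a frequency-domain regression restricted to the first m Fourier frequencies, where the long-run relationship dominates. The function name and interface are ours.

```python
import numpy as np

def nbls_slope(x2, x1, m):
    """NBLS slope of the regression of x1 on x2 (bivariate case).

    Averages the (cross-)periodogram over the first m Fourier
    frequencies 2*pi*j/T, j = 1, ..., m; the periodogram's scaling
    constant cancels in the ratio.
    """
    w1 = np.fft.fft(x1 - np.mean(x1))
    w2 = np.fft.fft(x2 - np.mean(x2))
    j = np.arange(1, m + 1)
    num = np.real(w2[j] * np.conj(w1[j])).sum()   # averaged cross-periodogram
    den = np.real(w2[j] * np.conj(w2[j])).sum()   # averaged periodogram of x2
    return num / den
```

The residual \(x_1 - \hat{\beta } x_2\) then replaces the OLS residual in the test statistic.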

Since the limiting distributions of the non-pivotal test statistics of Marmol and Velasco (2004) and Nielsen (2010) depend on d and it is assumed that \(d>1/2\), it is unclear which critical values should be used in the stationary region. The respective fields are therefore left blank.

Further, it should be noted that the methods of Nielsen and Shimotsu (2007) (or Robinson and Yajima (2002)) and Zhang et al. (2019) are not tests but rank estimates. Instead of the rejection frequency, we therefore report the ratio of correctly estimated cointegrating ranks. Consequently, the results cannot be interpreted as size or power, and in the size table and graphs these estimates should yield 0 instead of 0.05, since they do not involve any significance level.

Table 1 displays size results based on DGP1 with \(\delta _m=0.75\). The methods that have well-defined asymptotic properties across all parameter constellations covered in the table are those of Nielsen and Shimotsu (2007), Chen and Hurvich (2006), Hualde and Velasco (2008), and Souza et al. (2018). All of these methods achieve good size properties for \(\rho =0\), except for the test of Chen and Hurvich (2006) when \(d=0.4\). Among those four procedures, only the tests of Souza et al. (2018) (disregarding the smallest sample) and Chen and Hurvich (2006) do not over-reject as \(\rho \) increases. For low values of d the test of Hualde and Velasco (2008) already becomes oversized for \(\rho =0.45\), and as \(\rho \) increases it becomes oversized for higher values of d, too. The rank estimation procedure of Robinson and Yajima (2002) and Nielsen and Shimotsu (2007) is even more affected and estimates a cointegrating rank of one in nearly all cases if \(\rho \ge 0.9\).

In addition to the tests of Souza et al. (2018) and Chen and Hurvich (2006), the modified version of the test by Wang et al. (2015) that is based on the NBLS estimator instead of OLS also maintains satisfactory size properties across all values of \(\rho \) and d.

The group of procedures that is only applicable to non-stationary systems consists of Marmol and Velasco (2004), Nielsen (2010), and Zhang et al. (2019). It can be observed that the procedure of Marmol and Velasco (2004) behaves similarly to that of Hualde and Velasco (2008) in the sense that it is very liberal for higher values of \(\rho \) and lower values of d. For non-stationary series and larger sample sizes the procedure by Zhang et al. (2019) correctly estimates the cointegrating rank to be zero—independently of the degree of correlation. The variance-ratio statistic of Nielsen (2010) turns out to be slightly liberal for \(d=0.7\) in larger samples, but holds the nominal size for \(d=1\) even in small samples. In particular, its performance is independent of the degree of correlation.

Finally, the test of Robinson (2008a) is only applicable to stationary systems. Here, it can be observed that the test does not hold its size for \(\rho =0\). This is because the Hausman-testing principle requires one of the estimates of the memory parameter to be more efficient than the other one, but the multivariate estimate is not more efficient in the absence of correlation. For other values of \(\rho \), however, the test has good size properties. Interestingly, the test also has good size properties if \(d=1\), even though it assumes stationarity. The intermediate value of \(d=0.7\), on the other hand, leads to a moderately oversized test.

Fig. 1 Size (*rank estimation) based on DGP1 depending on correlation \(\rho \in \{0,0.99\}\) and bandwidth \(\delta _m\in \{0.55, 0.75\}\) with \(T=1000\)

Figure 1 analyzes the interaction between the degree of correlation \(\rho \) and the choice of the bandwidth \(\delta _m\). It shows the size of the tests in scatterplots where the results without correlation (\(\rho =0\)) are plotted against the results with high correlation (\(\rho =0.99\)). The upper panel displays the tests that allow for stationary processes (\(d=0.4\)); the lower panel (\(d=1\)) displays the tests robust to non-stationarity, i.e. all except that of Robinson (2008a). The dashed lines mark the nominal size level of 0.05, so that ideally all points would lie on the intersection of these two lines. The dotted line is the bisector; methods above the bisector do better with correlation and methods below it do better without. Black symbols give results with a bandwidth parameter of \(\delta _m=0.75\), gray symbols with \(\delta _m=0.55\).

It can be observed that the procedures by Marmol and Velasco (2004), Nielsen and Shimotsu (2007) and Hualde and Velasco (2008) lie below the bisector and are thus negatively affected by high correlation, whereas the tests by Chen and Hurvich (2006) and Robinson (2008a) lie above the bisector. The remaining tests lie on the bisector indicating robustness to correlation. Regarding bandwidth choice, the tests by Marmol and Velasco (2004), Chen and Hurvich (2006), Hualde and Velasco (2008) and Wang et al. (2015) are more liberal with a small bandwidth, whereas Nielsen and Shimotsu (2007), Robinson (2008a), Nielsen (2010), Souza et al. (2018) and Zhang et al. (2019) are relatively robust in terms of size. In general, correlation in the underlying short-run component is mistaken for cointegration more often in stationary systems than in non-stationary ones.

Overall, in terms of size for bivariate systems and taking the range of admissible parameter values into account, we find that the test of Souza et al. (2018) has the best performance, followed by those of Chen and Hurvich (2006) and Wang et al. (2015). Considering procedures only applicable to non-stationary systems, Nielsen (2010) and Zhang et al. (2019) are very reliable options as well.

Table 2 Power (*rank estimation), \(b=d\) and \(\delta _m=0.75\) for the triangular model (DGP2)
Table 3 Power (*rank estimation) with \(b=d\) and \(\delta _m=0.75\) for the common-component model (DGP3)

To analyze the power of the procedures, we focus on the triangular representation in DGP2 with \(b=d\) so that the memory reduces to zero in the cointegrating relation. Again, \(\delta _m\) is set to 0.75. The results are shown in Table 2. In the following, we focus on the results for parameter constellations for which the tests have reasonable size properties.

It can be seen that the rank estimate of Nielsen and Shimotsu (2007) correctly identifies the presence of fractional cointegration even in relatively small samples. Since the estimate works well under the null hypothesis if \(\rho \) is low, it clearly outperforms its competitors in this situation. The power of the test of Hualde and Velasco (2008) is also high, but it suffers from similar size issues in case of strongly correlated short-run components.

Among the tests that are more widely applicable, the approach of Souza et al. (2018) generates higher power than that of Wang et al. (2015) (except for \(\rho =0.99\)), which in turn outperforms the approach of Chen and Hurvich (2006). Furthermore, the test of Souza et al. (2018) outperforms more restrictive approaches such as those of Robinson (2008a) and Nielsen (2010) in small samples; in large samples this advantage vanishes. For the test of Chen and Hurvich (2006) we observe that the power is lower for \(d=0.7\) than for other values of d. Furthermore, the power becomes non-monotonic in T in some cases. This effect is likely caused by the required order of differencing being estimated incorrectly for intermediate values of d. The approach of Zhang et al. (2019) behaves similarly to that of Nielsen (2010).

With regard to the test of Robinson (2008a), it is noteworthy that the power is considerably lower for \(\rho =0.9\) than it is for \(\rho =0.45\) or \(\rho =0.99\). Further simulation results on this V-shaped dependence pattern between the power of the test and \(\rho \) (not reported here) show that the test has no power if \(\rho =0.8\) and its power is very low in a neighborhood of this point.

The test of Marmol and Velasco (2004) achieves good power for stationary values of \(d_v\), even though its theoretical properties are derived under the assumption that \(d_v>0.5\).

Overall, we find that the rank estimation of Nielsen and Shimotsu (2007) performs best in identifying the correct order of fractional cointegration if the correlation between the series is low. For non-stationary data Nielsen (2010) is a good choice, and among the more broadly applicable methods the test of Souza et al. (2018) clearly performs best in terms of size and power.

Table 2 is generated from the triangular model (DGP2), but we are also interested in the performance under the common-components model (DGP3). These results are displayed in Table 3. There are a number of striking differences in the relative performance of the tests. For low values of d, the rank estimation procedure of Nielsen and Shimotsu (2007)/Robinson and Yajima (2002) loses precision. At the same time, the test of Chen and Hurvich (2006) becomes more powerful, so that overall the two procedures become comparable in terms of their ability to identify the correct rank. Unfortunately, the non-monotonicity of the test of Chen and Hurvich (2006) for intermediate values of d becomes even more apparent.

The test of Souza et al. (2018) still performs relatively well—especially for larger values of d. The same holds true for that of Wang et al. (2015), which reaches relatively high power in smaller samples but approaches 1 only slowly.

With respect to the other tests, it can be seen that the test of Hualde and Velasco (2008) has very good power properties—also for low values of \(\rho \) where it maintains its size. The procedures of Nielsen (2010) and Zhang et al. (2019) have lower power for \(d=0.7\) in small samples but in all other constellations their power results are very good.

Fig. 2 Power (*rank estimation) depending on model specification (DGP2 or DGP3) and bandwidth \(\delta _m\in \{0.55,0.75\}\) with \(T=1000\), \(\rho =0.45\), and \(b=d/3\)

In Figure 2 we conduct an analysis similar to that for the size in order to examine the effect of the model construction and of the bandwidth choice on the power of the procedures. As before, black symbols represent results with \(\delta _m=0.75\) and gray symbols represent \(\delta _m=0.55\). The values of d and b are selected such that the power of the procedures tends to be low and changes in their behavior are easier to identify. First of all, most procedures perform better in the triangular model, except for the tests by Chen and Hurvich (2006) and Robinson (2008a). Furthermore, while an increase of the bandwidth leads to a considerable power gain for the tests of Chen and Hurvich (2006), Robinson (2008a), and Souza et al. (2018), the approaches of Marmol and Velasco (2004), Hualde and Velasco (2008) and Nielsen and Shimotsu (2007) have higher power with a smaller bandwidth—at least in the common-components model. This, however, might be due to the larger size distortions visible in Figure 1. The performance of the approaches of Nielsen (2010) and Wang et al. (2015) is relatively independent of the bandwidth choice. For the test of Nielsen (2010) this is explained by the fact that the bandwidth only influences the estimate of d that determines the correct set of critical values; the test statistic itself does not depend on the bandwidth.

Table 4 Size (*rank estimation) based on DGP1 and power (*rank estimation) based on DGP2 with \(\delta _m=0.75\), \(\rho =0.45\), and \(b=0\)/\(b=d\) including short-run components

A strong advantage of semiparametric methods is their theoretical robustness to short-run dynamics. We therefore repeat the previous experiments with \(u_{1t}, u_{2t}, e_t\) modeled as AR(1) processes with \(\phi \in \{-0.5,0.5\}\). Table 4 contains both size and power results. Most tests and rank estimates exhibit performance differences compared to the white noise case, in particular with respect to size. For Souza et al. (2018), Wang et al. (2015) and Zhang et al. (2019) we observe sensitivity to the sign of the AR parameter: they tend to become liberal (or, for Zhang et al. (2019), to have a higher error rate) with \(\phi =-0.5\) and conservative (a lower error rate) with \(\phi =0.5\). Additionally, the test by Souza et al. (2018) does not hold the nominal size if \(d=1\) and \(\phi =0.5\). Just as the estimate of Zhang et al. (2019) no longer yields 0, the one by Nielsen and Shimotsu (2007) has an error rate of 2-10% instead of 0%. The test of Chen and Hurvich (2006) becomes more conservative with stationary data and gets closer to the nominal significance level for non-stationary data, and the procedures by Hualde and Velasco (2008) and Marmol and Velasco (2004) both turn out very conservative. The test by Nielsen (2010) holds the nominal size only with \(d=1\) and a negative AR parameter and becomes very liberal in the other cases. Compared to the analogous results without short-run dynamics, we find weaker small-sample performance for the tests by Robinson (2008a) and Marmol and Velasco (2004) in terms of size, and for the methods by Nielsen and Shimotsu (2007), Souza et al. (2018) and Marmol and Velasco (2004) in terms of power. In general, we observe a tendency towards slightly lower power in some constellations, such as stationary data with the tests of Chen and Hurvich (2006) and Souza et al. (2018), but overall the power results do not differ much from the white noise case.
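
Replicating this variant only requires filtering the innovations through an AR(1) recursion before the fractional integration step; a minimal sketch compatible with the simulate_dgp helper above (our naming):

```python
import numpy as np

def ar1(eps, phi):
    """AR(1) short-run component u_t = phi * u_{t-1} + eps_t."""
    u = np.empty_like(np.asarray(eps, dtype=float))
    u[0] = eps[0]
    for t in range(1, len(u)):
        u[t] = phi * u[t - 1] + eps[t]
    return u

# e.g. frac_int(ar1(rng.standard_normal(T), phi=0.5), d)
```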

5 Conclusion

This review is written with the objective of providing guidance for the selection of methods in practical applications. We judge the methods based on (i) the range of admissible values of d and b, (ii) the ability to distinguish correctly between common trends and correlated innovations, and (iii) the performance across different DGPs—namely triangular systems as well as common-components models.

Based on our Monte Carlo studies, we find that some of the proposed approaches have weaknesses in their finite sample behavior in some empirically relevant scenarios—especially in the presence of correlated short-run components. This concerns mostly the methods of Nielsen and Shimotsu (2007) (or Robinson and Yajima (2002)), Marmol and Velasco (2004), and Hualde and Velasco (2008), which have the highest power but suffer from size issues in case of strongly correlated short-run components. With regard to (iii), we find that the size properties of the tests in the triangular case and in the common-components model are generally comparable (see online material). For the power of the tests, however, there are important differences between the two cases. In particular, the test of Chen and Hurvich (2006) has much better power for stationary systems under the common-components specification, whereas the methods of Robinson and Yajima (2002) and Hualde and Velasco (2008) become worse in their ability to detect fractional cointegration.

Although the methods of Robinson (2008a), Nielsen (2010), and Zhang et al. (2019) turn out to be robust to short-run correlation and are appealing due to their simplicity, they impose practically relevant restrictions on the permissible range of d and b. However, if there is prior knowledge about the (non-) stationarity of the data, those procedures are very good options.

Overall, we conclude that the test of Souza et al. (2018) has the best properties for bivariate systems, both theoretically and empirically, and is a good choice for the applied econometrician. It allows for the whole empirically relevant range of d and b, it is robust to correlation and to short-run dynamics with positive coefficients, and it provides comparable performance in both triangular systems and common-components models.

In higher dimensional systems, however, the test of Souza et al. (2018) is no longer applicable and that of Chen and Hurvich (2006) turns out to be liberal in finite samples from stationary processes. Here, the procedure of Robinson (2008a) can be recommended for stationary processes and the rank estimation by Nielsen (2010) and Zhang et al. (2019) should be preferred for non-stationary systems if the cointegrating residuals can be expected to be stationary.