Rank based cointegration testing for dynamic panels with fixed T
Abstract
In this paper, we show that the cointegration testing procedure of Binder et al. (Econom Theory 21:795–837, 2005) for the Panel Vector Autoregressive model of order 1, PVAR(1), is not valid due to the singularity of the hessian matrix. As an alternative, we propose a method of moments based procedure using the rank test of Kleibergen and Paap (J Econom 133:97–126, 2006) for a fixed number of time series observations. The test is shown to be applicable in situations with time-series heteroscedasticity and unbalanced data. The novelty of our approach is that in the construction of the test we exploit the “weakness” of the Anderson and Hsiao (J Econom 18:47–82, 1982) moment conditions. The finite-sample performance of the proposed test statistic is investigated using simulated data. The results indicate that for most scenarios the method has good statistical properties. The proposed test provides little statistical evidence of cointegration in the employment data of Alonso-Borrego and Arellano (J Bus Econ Stat 17:36–49, 1999).
Keywords
Dynamic panel data, Panel VAR, Cointegration, Fixed T consistency
1 Introduction
In this paper, we consider the cointegration testing problem for the Panel VAR model of order 1 with a fixed time dimension. To date, the only testing approach in this case is the likelihood ratio test based on the transformed maximum likelihood (TML) estimator of Binder et al. (2005) [hereafter BHP]. However, in the univariate setup it is known that for data with an autoregressive parameter close to unity, the likelihood approach does not have a Gaussian asymptotic limit, see e.g. Kruiniger (2013). We extend that result to the multivariate setting and argue that the cointegration testing procedure of Binder et al. (2005) is not valid due to the singularity of the corresponding expected hessian matrix.
To the best of our knowledge, in the fixed T dynamic panel data (DPD) literature no feasible method of moments (or least-squares) alternative to likelihood based cointegration testing procedures is available. The main reason for the absence of method of moments based alternatives is that the jacobian matrix of the Anderson and Hsiao (1982) moment conditions is of reduced rank when the process is cointegrated. It is natural to use this information and consider a cointegration test based on the rank of the jacobian matrix. In this paper, we propose such a test and show that it is applicable in situations with time-series heteroscedasticity and that, unlike the likelihood based tests, it does not require any numerical optimization. At the same time, this procedure cannot provide inference that is uniform over the parameter space, as the asymptotic distribution of the test depends on the properties of the initial condition.
In the Monte Carlo section of this paper, we investigate the finite sample properties of the proposed procedure. We find that the new testing procedure provides good size control as well as high power in most of the designs considered. However, in some setups this test lacks power if the data generating process for the initial condition substantially deviates from stationarity.
The paper is structured as follows. In Sect. 2, we briefly present the model, the testing problem at hand and the results for the testing procedure of Binder et al. (2005). The rank-based cointegration testing procedure is formally introduced in Sect. 3. In Sect. 4, we investigate its finite sample performance by means of a Monte Carlo analysis. In Sect. 5, we illustrate the testing procedure using the data of Alonso-Borrego and Arellano (1999). Section 6 concludes.
Here we briefly discuss notation. Bold uppercase letters are used to denote the original parameters, i.e. \(\{\varvec{\varPhi },\varvec{\varSigma },\varvec{\varPsi }\}\), while the lowercase letters \(\{\varvec{\phi },\varvec{\sigma },\varvec{\psi }\}\) denote \({\text {vec}}\,{(\cdot )}\) (\({\text {vech}}{(\cdot )}\) for symmetric matrices) of the corresponding parameters. In the univariate setup the corresponding parameters are denoted by \(\{\phi ,\sigma ^{2},\psi ^{2}\}\). We use \(\rho (\varvec{A})\) to denote the spectral radius^{1} of a matrix \(\varvec{A}\in \mathbb {R}^{n\times n}\). We define \(\bar{\varvec{y}}_{i-}\equiv (1/T)\sum _{t=1}^{T} \varvec{y}_{i,t-1}\) and similarly \(\bar{\varvec{y}}_{i}\equiv (1/T)\sum _{t=1}^{T} \varvec{y}_{i,t}\). We use \(\tilde{\varvec{x}}\) to indicate variables after the Within Group transformation (for example \(\tilde{\varvec{y}}_{i,t}\equiv \varvec{y}_{i,t}-\bar{\varvec{y}}_{i}\)), while \(\ddot{\varvec{x}}\) is used for variables after a “quasi-averaging” transformation.^{2} For further details, see Abadir and Magnus (2002). Where necessary, we use the 0 subscript to denote the true value of the parameters, e.g. \(\varvec{\varPhi }_{0}\).
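As a small illustration of this notation, the following sketch computes the spectral radius of a matrix and applies the Within Group transformation to a toy panel. It is only a sketch: the function names `spectral_radius` and `within_transform`, the array layout (individuals by time by variables, with \(\varvec{y}_{i,0}\) stored first) and the toy numbers are assumptions made for the example and are not part of the paper.

```python
import numpy as np


def spectral_radius(a):
    """Largest modulus among the (possibly complex) eigenvalues of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(a))))


def within_transform(y):
    """Subtract the individual time average over t = 1..T from each y_{i,t}.

    y has shape (N, T + 1, m); index 0 along the time axis holds y_{i,0}
    (this layout is an assumption of the example).
    """
    y_bar = y[:, 1:, :].mean(axis=1, keepdims=True)  # individual averages over t = 1..T
    return y[:, 1:, :] - y_bar                        # transformed y_{i,t}, t = 1..T


# A rotation matrix has eigenvalues +i and -i, so its spectral radius is 1
# even though it has no real eigenvalue.
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])
rho = spectral_radius(rotation)

# Toy panel with N = 2, T = 3, m = 2 (hypothetical numbers).
panel = np.arange(16, dtype=float).reshape(2, 4, 2)
panel_within = within_transform(panel)
```

By construction the transformed observations of each individual sum to zero over \(t=1,\ldots ,T\).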
2 Cointegration testing for fixed T panels
2.1 The model
2.2 The likelihood ratio test of BHP
 (TML 1)
The error terms \(\varvec{\varepsilon }_{i,t}\) are i.i.d. across i and uncorrelated over time \({\text {E}}[\varvec{\varepsilon }_{i,t}\varvec{\varepsilon }_{i,s}']=\mathbf {O}_{m}\) for \(s\ne t\), and \({\text {E}}[\varvec{\varepsilon }_{i,t}\varvec{\varepsilon }_{i,t}']=\varvec{\varSigma }\) for \(t>0\). \({\text {E}}[\Vert \varvec{\varepsilon }_{i,t}\Vert ^{4}]<\infty \) holds \(\forall t\).
 (TML 2)
The initial deviations \(\varvec{u}_{i,0}\equiv \varvec{y}_{i,0}-\varvec{\mu }_{i}\) are i.i.d. across i, with \({\text {E}}[\varvec{u}_{i,0}]=\varvec{0}_{m}\) and positive definite \({\text {E}}[\varvec{u}_{i,0}\varvec{u}_{i,0}']=\varvec{\varSigma }_{\varvec{u}_{0}}\). \({\text {E}}[\Vert \varvec{u}_{i,0}\Vert ^{4}]<\infty \) holds.
 (TML 3)
The following moment restrictions are satisfied: \({\text {E}}[\varvec{\varPi }\varvec{u}_{i,0}\varvec{\varepsilon }_{i,t}']=\mathbf {O}_{m}\) for all i and \(t=1,\ldots ,T\).
 (TML 4)
\(N \rightarrow \infty \), T is fixed.
 (TML 5)
Denote by \(\varvec{\kappa }\) a \([k \times 1]\) vector of unknown coefficients. \(\varvec{\kappa }\in \varvec{\varGamma }\), where \(\varvec{\varGamma }\) is a compact subset of \(\mathbb {R}^{k}\) and \(\varvec{\kappa }_{0}\in \mathrm {interior}(\varvec{\varGamma })\), while \(\rho (\varvec{\varPhi }_{0})\le 1\).
One of the standard regularity conditions for extremum estimators is that the asymptotic (or expected) hessian matrix, \({\varvec{\mathcal {H}}}_{\ell }\equiv {\text {E}}[\varvec{\mathcal {H}}^{N}(\varvec{\kappa }_{0})]\), is positive definite. Bond et al. (2005) showed that for the TML estimator of Hsiao et al. (2002) (which is a special case of Binder et al. 2005 for \(m=1\)) this regularity condition is violated. In the next theorem we show that the same conclusion extends to the more general case with \(m\ge 1\).
Theorem 1
Proof
In the Appendix. \(\square \)
As the TML estimator can be seen as a nonlinear MM estimator with the score vector defining the moment conditions, singularity of the \({\varvec{\mathcal {H}}}_{\ell }\) matrix can be seen as a “weak instrument” problem (using the GMM terminology). The singularity result in Theorem 1 is of special interest when inference regarding the rank of \(\varvec{I}_{m}-\varvec{\varPhi }_{0}\) is concerned.
It is important to note that despite the singularity of \({\varvec{\mathcal {H}}}_{\ell }\), the TML estimator \(\hat{\varvec{\kappa }}_{TMLE}\) remains consistent, hence the identification part of Remark 4.1 in BHP is correct. However, as a result of the singularity the limiting distribution of this estimator is nonstandard. Using the approach of Rotnitzky et al. (2000), Ahn and Thomas (2006) showed that in the univariate model (i.e. \(m=1\)) the TML estimator of \(\phi \) converges at the \(N^{1/4}\) rate to a nonstandard distribution.^{8} Additionally, they show that the LR test statistic for \(H_{0}: \phi _{0}=1\) has a mixture distribution: with equal mixing weights of 0.5 it is either a \(\chi ^{2}(1)\) random variable or a degenerate random variable that takes the value 0 with probability 1. In this paper, we do not attempt to study the distributional consequences of the singularity for the LR test and leave this for future research.^{9} Based on the results in Dovonon and Renault (2009) (for GMM), it is known that for general rank deficiencies the maximal rate of convergence is \(N^{1/4}\). However, no results regarding the behavior of the estimator (see the discussion in Dovonon and Hall 2016) and the LR test in cases like ours are available. As a result, it is not obvious that using the critical values from the \(\chi ^{2}\) distribution with \((m-r)^{2}\) degrees of freedom results in a conservative test.
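To make the practical consequence of such a mixture concrete for the univariate case, the sketch below compares the 5% critical value of a plain \(\chi ^{2}(1)\) distribution with that of an equal-weight mixture of \(\chi ^{2}(1)\) and a point mass at zero. The function names and the bisection solver are illustrative assumptions; only the mixture weights come from the result cited above.

```python
import math


def chi2_1_cdf(x):
    """CDF of a chi-squared variable with one degree of freedom."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0.0 else 0.0


def critical_value(alpha, point_mass_at_zero=0.0):
    """Solve P(stat > c) = alpha by bisection.

    The statistic equals 0 with probability `point_mass_at_zero` and is
    chi-squared(1) otherwise, so P(stat > c) = (1 - point_mass) * (1 - F(c)).
    """
    weight = 1.0 - point_mass_at_zero
    if not 0.0 < alpha < weight:
        raise ValueError("alpha must lie strictly between 0 and the chi-squared weight")
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if weight * (1.0 - chi2_1_cdf(mid)) > alpha:
            lo = mid  # tail probability still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


standard_cv = critical_value(0.05)                          # plain chi-squared(1)
mixture_cv = critical_value(0.05, point_mass_at_zero=0.5)   # 50/50 mixture
```

Under these assumptions the 5% critical value of the mixture is the 90% quantile of \(\chi ^{2}(1)\), roughly 2.71, compared with roughly 3.84 for \(\chi ^{2}(1)\) itself, so in this univariate case the standard critical value is too large.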
Although the unit root model is not of prime importance for the main topic of this paper, Theorem 1 provides a natural starting point for intuition of the next result. For the unit root case (i.e. \(\varvec{\varPhi }_{0}=\varvec{I}_{m}\)) the expression for \({\varvec{\mathcal {H}}}_{\ell }\) simplifies dramatically as \(\varvec{\varSigma }_{0}=\varvec{\varTheta }_{0}\). That allowed us in Theorem 1 to show that \({\varvec{\mathcal {H}}}_{\ell }=0\) for any value of \(\varvec{\varSigma }_{0}\) and T. Unfortunately, no result of this type is available when \(\varvec{\varPi }\) is of reduced rank \(r>0\). However, some special results can be derived for \(T=2\).
Proposition 1
Proof
In the Appendix. \(\square \)
This quantity is smaller than \(m^{2}\) for all \(m\le 4\) (note that the bivariate PVAR model is analyzed in most empirical studies with a limited number of time-series observations). It follows that for the cases of most empirical value the expected hessian matrix is singular and the corresponding estimator does not have a normal limiting distribution. Although in this paper we do not prove more general results for \(T>2\), we performed numerous numerical evaluations of \({\varvec{\mathcal {H}}}_{\ell }\) for larger values of T and different combinations of population matrices in the bivariate setup.^{10} For all setups we found that the expected hessian matrix is singular for \(r<m\) and of full rank otherwise. Given these results, the unit root and cointegration testing procedure of BHP that is based on asymptotic \(\chi ^{2}(\cdot )\) critical values is not asymptotically valid.
Remark 1
Alternatively, instead of considering the likelihood function for observations in first differences, one can consider a correlated random effects likelihood function (conditional on \(\varvec{y}_{i,0}\)) as in Arellano (2016) and Kruiniger (2013). Although we do not formally consider a possible singularity of the hessian matrix for that estimator, we conjecture that the main conclusions of this paper are also applicable to that approach (Ahn and Thomas 2006; Kruiniger 2013 proved this for \(m=1\)).
Remark 2
Note that the results of this section are derived under the assumption that \(\varvec{\varPsi }\) is estimated without any restrictions, i.e. as suggested by Binder et al. (2005). If one instead imposes some restrictions on this parameter matrix, e.g. covariance stationarity, it is possible that the expected hessian matrix has full rank. For example, Kruiniger (2008) considers the univariate case, where he shows that for \(\phi _{0}=1\) the TML estimator retains standard asymptotic properties if the stationarity assumption is used in estimation.
3 Jacobian based testing
3.1 Regularity conditions
 (A.1)
The error terms \(\varvec{\varepsilon }_{i,t}\) are i.i.d. across i and uncorrelated over time, \({\text {E}}[\varvec{\varepsilon }_{i,t}\varvec{\varepsilon }_{i,s}']=\mathbf {O}_{m}\) for \(s\ne t\), and \({\text {E}}[\varvec{\varepsilon }_{i,t}\varvec{\varepsilon }_{i,t}']=\varvec{\varSigma }_{t}\) for \(t>0\). \({\text {E}}[\Vert \varvec{\varepsilon }_{i,t}\Vert ^{4}]<\infty \) holds \(\forall t\).
 (A.2)
The \(\varvec{\mu }_{i}\) are i.i.d. across i, with \({\text {E}}[\varvec{\mu }_{i}]=\varvec{0}_{m}\) and \({\text {E}}[\varvec{\mu }_{i}\varvec{\mu }_{i}']=\varvec{\varSigma }_{\varvec{\mu }}\). Furthermore, for all i and \(t\ge 0\), \({\text {E}}[\varvec{\mu }_{i}\varvec{\varepsilon }_{i,t}']=\mathbf {O}_{m}\). \({\text {E}}[\Vert \varvec{\mu }_{i}\Vert ^{4}]<\infty \) holds.
 (DGP.1)
\(\varvec{\varepsilon }_{i,0}\sim (\varvec{0}_{m},\varvec{\varSigma }_{0})\) with \(\varvec{\varSigma }_{0}\) a positive (semi)definite matrix.
 (DGP.2)
\(\varvec{\varepsilon }_{i,0}=\sum _{l=0}^{M}\varvec{\varPhi }^{l}\varvec{\varepsilon }_{i,l}^{*}\). Here M is assumed to be finite.
 (DGP.3)
\(\varvec{\varepsilon }_{i,0}=\sum _{l=0}^{\infty }\left( \varvec{\varPhi }^{l}-\varvec{C}\right) \varvec{\varepsilon }_{i,l}^{*}+\varvec{C}\varvec{\xi }_{i}\). Here \(\varvec{\xi }_{i}\) is an \([m\times 1]\) vector of the (independent) individual-specific initialization effects.^{14}
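The matrix \(\varvec{C}\) entering (DGP.3) is built from the orthogonal complements of \(\varvec{\alpha }\) and \(\varvec{\beta }\) as \(\varvec{\beta }_{\perp }(\varvec{\alpha }_{\perp }'\varvec{\beta }_{\perp })^{-1}\varvec{\alpha }_{\perp }'\). The following sketch computes it for a bivariate example with \(r=1\). The helper names, the use of a singular value decomposition to obtain the complements, and the numerical values of \(\varvec{\alpha }\) and \(\varvec{\beta }\) are assumptions chosen only for illustration.

```python
import numpy as np


def orthogonal_complement(a):
    """Basis (m x (m - r)) of the space orthogonal to the columns of a.

    a is m x r with full column rank; the last m - r left singular vectors
    span the orthogonal complement of its column space.
    """
    _, r = a.shape
    u, _, _ = np.linalg.svd(a, full_matrices=True)
    return u[:, r:]


def long_run_matrix(alpha, beta):
    """C = beta_perp (alpha_perp' beta_perp)^{-1} alpha_perp'."""
    a_perp = orthogonal_complement(alpha)
    b_perp = orthogonal_complement(beta)
    return b_perp @ np.linalg.inv(a_perp.T @ b_perp) @ a_perp.T


# Hypothetical bivariate example with one cointegrating relation (r = 1).
alpha = np.array([[-0.5], [0.0]])
beta = np.array([[1.0], [-1.0]])
C = long_run_matrix(alpha, beta)
```

The result does not depend on how the bases of the complements are scaled or rotated, which is why a generic SVD-based basis can be used. In this example \(\varvec{C}\) has rank \(m-r=1\) and satisfies \(\varvec{\beta }'\varvec{C}=\varvec{0}\) and \(\varvec{C}\varvec{\alpha }=\varvec{0}\).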
3.2 Rank test
Theorem 2

Under which conditions can one interpret rejection/non-rejection of the \(\mathbf {rk}(r)\) test as evidence regarding the rank of \(\varvec{\varPi }\)?
 (IDN)
Matrix \(\varvec{\varPhi }^{t-1}(\varvec{\varUpsilon }-\varvec{I}_{m})\varvec{\varSigma }_{\varvec{\mu }}\) is such that \({\text {E}}[\varvec{u}_{i,t-1}\varvec{y}_{i,t-1}']\) has full rank m.
Until now we considered only the jacobian of the Anderson and Hsiao (1982) moment conditions; however, for \(T>2\) further lags \(\varvec{y}_{i,t-j}\) can be used. Nevertheless, it is not clear that the use of lags \(j>1\) still ensures that, even in the effect stationary case, \({\text {E}}[\overline{\Delta \varvec{y}_{i,t}\varvec{y}_{i,t-j}'}_{T}]\) has reduced rank r if and only if \({\text {rk}}\,{\varvec{\varPi }}=r\). Moreover, the power of the test might be substantially affected by the choice of lags, as for any alternative close to the unit circle we encounter the weak instruments problem for distant lags. On the other hand, we can expect better power against alternatives with substantially lower \(\rho (\varvec{\varPhi })\).
Remark 3
If the model contains time effects \(\lambda _{t}\), the test statistic needs to be modified using variables in deviations from their cross-sectional averages \(\check{\varvec{y}}_{i,t}\equiv \varvec{y}_{i,t}-(1/N)\sum _{i=1}^{N}\varvec{y}_{i,t}\) rather than levels.
Remark 4
One important advantage of the proposed test statistic is the additional flexibility when dealing with unbalanced panels. As long as for every individual i at least one \(\Delta {}\varvec{y}_{i,t}\varvec{y}_{i,t-1}'\) (\(t>1\)) term is available, the test statistic can be computed. The only difference compared to the balanced case is that an individual contribution to \(\overline{\Delta {}\varvec{y}_{i,t}\varvec{y}_{i,t-1}'}_{T}\) is no longer a simple average with \(T-1\) terms, but has an individual specific number of observations \(T_{i}-1\).
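The averaging described in this remark can be sketched as follows. The function name, the array layout with `NaN` rows marking missing periods, the choice to skip individuals without any usable term, and the toy data are all assumptions of the example; the sketch only forms the averaged cross moment and does not compute the test statistic itself.

```python
import numpy as np


def averaged_cross_moment(y):
    """Average of Delta y_{i,t} y_{i,t-1}' over available periods, then over individuals.

    y has shape (N, T + 1, m); time index 0 holds y_{i,0}; a missing period is a
    row of NaN. Terms t = 2..T are used, following the (t > 1) convention above.
    """
    n, t_plus_one, m = y.shape
    total = np.zeros((m, m))
    n_used = 0
    for i in range(n):
        terms = []
        for t in range(2, t_plus_one):
            pair = y[i, t - 1 : t + 1, :]      # rows y_{i,t-1} and y_{i,t}
            if np.isnan(pair).any():
                continue                        # one of the two observations is missing
            delta = pair[1] - pair[0]
            terms.append(np.outer(delta, pair[0]))
        if terms:                               # individuals without usable terms are skipped
            total += np.mean(terms, axis=0)     # individual-specific number of terms
            n_used += 1
    if n_used == 0:
        raise ValueError("no individual has a usable term")
    return total / n_used


nan = np.nan
toy = np.array([
    [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [3.0, 2.0]],   # fully observed individual
    [[nan, nan], [1.0, 1.0], [2.0, 3.0], [nan, nan]],   # observed in periods 1 and 2 only
])
moment = averaged_cross_moment(toy)
```

Here the first individual contributes an average of two terms and the second an average of a single term, so each individual receives equal weight regardless of how many periods are observed.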
Remark 5
The testing procedure remains valid if, as suggested by Kleibergen and Paap (2006), instead of \(\overline{\Delta \varvec{y}_{i,t}\varvec{y}_{i,t-1}'}_{T}\) we investigate the rank of \(\varvec{D}=\varvec{G}_{N}\overline{\Delta \varvec{y}_{i,t}\varvec{y}_{i,t-1}'}_{T}\varvec{F}_{N}\) (for any full rank matrices \({{\mathrm{plim}}}_{N\rightarrow \infty }\varvec{G}_{N}=\varvec{G}\) and \({{\mathrm{plim}}}_{N\rightarrow \infty }\varvec{F}_{N}=\varvec{F}\)). One interesting special case is obtained when we set \(\varvec{G}_{N}=\varvec{I}_{m}\) and \(\varvec{F}_{N}^{-1}=\frac{1}{N}\sum _{i=1}^{N}\frac{1}{T-1}\sum _{t=2}^{T}\varvec{y}_{i,t-1}\varvec{y}_{i,t-1}'\), as in this case we are testing the rank of the pooled OLS estimator \(\hat{\varvec{\varPi }}\). Even though the estimator itself is inconsistent (due to the presence of the unobserved heterogeneity), as we show in this paper, it can be used for estimation of \({\text {rk}}\,{\varvec{\varPi }_{0}}\).
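The special case with \(\varvec{G}_{N}=\varvec{I}_{m}\) can be sketched numerically. The code below forms the pooled OLS quantity for a balanced panel and inspects its singular values. It is an algebraic illustration only: the function name and array layout are assumptions, the toy data are generated without noise or individual effects (so the estimate reproduces a hypothetical rank-one \(\varvec{\varPi }\) exactly, using the sign convention \(\Delta \varvec{y}_{i,t}=\varvec{\varPi }\varvec{y}_{i,t-1}\)), and the Kleibergen and Paap (2006) statistic itself is not computed.

```python
import numpy as np


def pooled_ols_pi(y):
    """Pooled OLS quantity D F for a balanced panel y of shape (N, T + 1, m)."""
    lagged = y[:, 1:-1, :]                 # y_{i,t-1} for t = 2..T
    delta = y[:, 2:, :] - lagged           # Delta y_{i,t} for t = 2..T
    n_terms = lagged.shape[0] * lagged.shape[1]
    d = np.einsum("itj,itk->jk", delta, lagged) / n_terms       # mean of Delta y y'
    f_inv = np.einsum("itj,itk->jk", lagged, lagged) / n_terms  # mean of y y'
    return d @ np.linalg.inv(f_inv)


# Noise-free toy data with a hypothetical rank-one Pi.
pi_true = np.array([[-0.5, 0.5], [0.0, 0.0]])
rng = np.random.default_rng(0)
n_individuals, n_periods = 50, 4
toy = np.empty((n_individuals, n_periods + 1, 2))
toy[:, 0, :] = rng.normal(size=(n_individuals, 2))
for t in range(1, n_periods + 1):
    toy[:, t, :] = toy[:, t - 1, :] @ (np.eye(2) + pi_true).T

pi_hat = pooled_ols_pi(toy)
singular_values = np.linalg.svd(pi_hat, compute_uv=False)
```

In this artificial setting the smallest singular value is zero up to rounding error. With real data it is nonzero in finite samples, and a formal rank test is needed to judge whether it is statistically distinguishable from zero.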
3.3 Discussions
In this section we summarize some of the underlying assumptions, and related problems, for the rkJ test.
Effect nonstationarity Recall that the results in Theorem 2 are written in terms of the rank of \(\overline{\Delta {}\varvec{y}_{i,t}\varvec{y}_{i,t-1}'}_{T}\) rather than \(\varvec{\varPi }\). This suggests that if one uses this rank test in a sequential procedure for testing the rank of \(\varvec{\varPi }\), the procedure is “conservative”: for some values of \(\varvec{\varUpsilon }\) the jacobian can be of reduced rank even if \(\varvec{\varPi }\) is of full rank. However, the rank of \(\overline{\Delta {}\varvec{y}_{i,t}\varvec{y}_{i,t-1}'}_{T}\) can never be larger than the rank of \(\varvec{\varPi }\). In such situations, the rkJ procedure controls the size of the test for \(\varvec{\varPi }\), i.e. it rejects in at most \(\alpha \%\) of cases and the rejection frequency never exceeds the nominal level; thus the test controls size uniformly over (\(\varvec{\varSigma }, \varvec{\varUpsilon }\)). However, for some combinations of the nuisance parameters (\(\varvec{\varSigma }, \varvec{\varUpsilon }\)), the power of this test does not converge to 1 as \(N\rightarrow \infty \) even if \({\text {rk}}\,(\varvec{\varPi })>r_{0}\), and as a result such a testing procedure lacks power. Thus, it is difficult to draw general conclusions about the properties of the rkJ test when one does not reject the null hypothesis and it is likely that the \(\varvec{y}_{i,t}\) process is not effect stationary.
Common dynamics assumption Throughout the paper we maintain the common dynamics assumption for \(\varvec{\eta }_{i}\). In the univariate case, it is known that if this assumption is satisfied the moment conditions are not relevant at unity, see the more detailed discussion in Bun and Kleibergen (2016). On the other hand, if the common dynamics assumption is violated, it is possible to have a full rank jacobian even for \(\varvec{\varPhi }_{0}=\varvec{I}_{m}\), see the aforementioned paper and the discussion in Hayakawa and Nagata (2016). Hence, even if the \(\varvec{\varPi }\) matrix is of reduced rank \(r<m\), the jacobian matrix can be of full rank m when the common dynamics assumption is violated. In this case the rejection of the null hypothesis of the rkJ test is not informative about the underlying rank of \(\varvec{\varPi }\).^{19}
Initialization For initialization we assumed that \({\text {var}}{\varvec{\varepsilon }_{i,0}}\) is well defined irrespective of the time-series properties of the data. In the univariate setting, it is known that if e.g. \({\text {E}}[\lim _{\phi \rightarrow 1}(1-\phi )\varepsilon _{i,0}^{2}]>0\) then the Anderson and Hsiao (1982) moment conditions have a full rank jacobian matrix. Nevertheless, as discussed in Bun and Kleibergen (2016), this does not imply that the \(\phi \) parameter is identified.^{20} Note that initializations of this type would imply that the cross-sectional average of \(y_{i,t}\) is not well defined for any \(t\ge 0\), which is a rather unrealistic assumption to make.
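As a worked illustration of such an initialization, consider the univariate covariance stationary start with \({\text {var}}{\varepsilon _{i,0}}=\sigma ^{2}/(1-\phi ^{2})\), taken here purely as an example. Then
\[
(1-\phi )\,{\text {var}}{\varepsilon _{i,0}}=\frac{(1-\phi )\,\sigma ^{2}}{(1-\phi )(1+\phi )}=\frac{\sigma ^{2}}{1+\phi }\;\longrightarrow \;\frac{\sigma ^{2}}{2}>0\quad \text {as } \phi \rightarrow 1,
\]
so the scaled second moment stays strictly positive in the limit, while \({\text {var}}{\varepsilon _{i,0}}\) itself diverges at rate \(1/(1-\phi )\).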
These issues should not be underestimated in empirical work. However, at the same time we acknowledge that in order to obtain testing procedures that control size uniformly^{21} one would have to rely on procedures that are numerically challenging, i.e. subset inference using the continuously updated GMM estimator. We should also emphasize that most of the testing procedures for dynamic panel data (especially for persistent data) fail to guarantee uniform inference over the parameter space of the autoregressive parameter and/or the initialization of the initial condition.^{22}
4 Monte Carlo simulations
Comparing our setup to BHP, we can see that design 3 of BHP is obtained when \(\alpha =0.5\) and \(\lambda =0.0\) (as they consider size). In order to match our designs with the empirical application, we also considered \(N=750\); however, the results are qualitatively and quantitatively similar to \(N=500\) and are thus omitted. Other design parameters are also chosen to match some of the properties of the empirical application, as \(\varvec{\varUpsilon }^{(5)}\) is based on the estimates in Arellano (2016) obtained from the bivariate panel of Spanish firm data.
In terms of the test power, we suspect that it should be decreasing with \(\lambda \), with almost no power against alternatives with \(\lambda \approx 0\). However, it is very likely that for general \(\varvec{\varUpsilon }\) matrices the power curve might not be monotonic, because \(\lambda \) not only controls the rank of \(\varvec{\varPi }\) but also (indirectly) the eigenvalues of the \({\text {E}}[\varvec{u}_{i,t-1}\varvec{y}_{i,t-1}']\) matrix. Hence, for some specific choices of \(\varvec{\varUpsilon }\) we can observe a weak instruments problem of the Anderson and Hsiao (1982) moment conditions that is not caused by the reduced rank of the \(\varvec{\varPi }\) matrix.
4.1 Results
The results for all designs are summarized in the top part of Tables 6, 7, and 8 (\(\theta =0\)). All rejection frequencies are rounded to two digits. Empty entries indicate maximal power of 1.00.
General patterns First of all, we can observe that rejection frequencies are monotonically decreasing in \(\lambda \) for the vast majority of designs without spatial dependence. As we discussed in Sect. 3.2, this property should not be taken for granted for the rkJ test (as the dependence on \(\varvec{\varPhi }\) is nonlinear). For lower values of N the test tends to be undersized for \(T=3\) and oversized for \(T=7\).^{27} In the effect stationary case \(\tau \) does not play a substantial role and only affects the \(\varvec{V}\) matrix, but we can still observe that a higher value of \(\tau \) is associated with slightly lower power. For \(N=500\), the rkJ test has notable power even when \(\lambda \) is very close to 0. For instance, all rejection frequencies in the effect stationary designs at \(\lambda =0.005\) are above 30% and 25% for \(\alpha =0.5\) and \(\alpha =0.1\) respectively. In the vast majority of cases with size distortions of similar magnitude, the test power for \(\alpha = 0.5\) tends to be higher than for \(\alpha = 0.1\).
Effect nonstationarity and nonmonotonic power curves First, we consider rejection frequencies for \(\varvec{\varUpsilon }=0.5\times \varvec{I}_{m}\), as this case is the most exceptional in terms of observed patterns. In this case we observe power curves that are not monotonic for \(\alpha =0.1\) (especially for \(N=250\)) and sharply decreasing for \(\alpha =0.5\) if \(\tau =5\) and \(T=3\). This can be explained intuitively: in this case the effect nonstationarity term in \({\text {E}}[\Delta {}\varvec{y}_{i,t}\varvec{y}_{i,t-1}']\) is negative, driving the whole expression towards the zero matrix (recall the analysis in Hayakawa 2009 for the univariate case). Thus, we have a weak instrument problem under the alternative hypothesis that is not induced by cointegration.^{28} By varying the \(\lambda \) parameter we directly vary the relative contributions of the time invariant and time varying parts of the variance components in \({\text {var}}{\varvec{y}_{i,t}}\). For larger values of \(\lambda \) the time invariant part is more pronounced, resulting in substantial effects of the “negative” effect nonstationarity. On the other hand, for \(\lambda \approx 0\) the idiosyncratic part is dominant and there are no substantial effects of the “negative” effect nonstationary initialization.
Remark 6
This nonmonotonicity is further illustrated in Tables 6, 7, where we show how the minimum eigenvalue of the jacobian matrix changes for different nuisance parameters (for very large N). Those patterns resemble the power curves of the rkJ test as presented in Fig. 1.
As can be expected, the results for \(\varvec{\varUpsilon }=1.5\times \varvec{I}_{m}\) are more straightforward. In this case the power curves are monotonic, and rejection frequencies uniformly dominate the ones from the effect stationary case irrespective of other design parameters. Results for \(\varvec{\varUpsilon }^{(4)}\) seem to combine the properties of both \(\varvec{\varUpsilon }^{(3)}\) and \(\varvec{\varUpsilon }^{(1)}\).^{29} Finally, the results for \(\varvec{\varUpsilon }^{(5)}\) are somewhat in between those of \(\varvec{\varUpsilon }^{(1)}\) and \(\varvec{\varUpsilon }^{(2)}\), but are slightly closer to \(\varvec{\varUpsilon }^{(2)}\). This serves as an indication that the off-diagonal element in \(\varvec{\varUpsilon }^{(5)}\) is not of any great importance (given the choice of other design parameters).
Remark 7
In this paper, we do not provide extensive results for the TML estimator of Binder et al. (2005). The main reason for this (besides the theoretical problems discussed in Sect. 2.2) is the possibly bimodal log-likelihood function (see e.g. Calzolari and Magazzini 2012; Bun et al. 2016; Juodis 2016). For models with stable dynamics, Juodis (2016) presents several alternatives for choosing the maximizer of the log-likelihood function from the set of local maximizers. Unfortunately, no results are available for the nonstationary dynamics analyzed in this paper. Thus, in order to avoid a situation in which the test procedure based on the TML estimator unintentionally performs suboptimally, we present only some limited results, see Table 5. The results suggest that for alternatives close to the null hypothesis the LR test has low power, as the critical value from the \(\chi ^{2}(1)\) distribution is too large. On the other hand, for some very distant alternatives (where the rkJ test struggles to reject the null hypothesis), the LR test has sizeable power.
Remark 8
As a robustness check in 6, we also consider a model with spatial dependence in the error terms. A uniform upward shift in size can be observed when designs with spatial dependence are considered.
5 Empirical illustration
5.1 Data
In this section, we analyze the Spanish firm panel dataset of 738 manufacturing companies covering 1983–1990 from Alonso-Borrego and Arellano (1999). This dataset constitutes a balanced panel of manufacturing companies recorded in the database of the Bank of Spain’s Central Balance Sheet Office from 1983 to 1990. As it contains data only for firms that were observed for the full time span and in all years satisfied specific coherency requirements, it cannot be considered a random sample from the population of all firms. For example, this dataset only contains firms that have majority private shareholding, thus state-owned companies are not represented. All results therefore need to be interpreted as conditional on the underlying characteristics used for sample selection.
Descriptive statistics of the dependent variables
| Year | \(\log (\mathrm{employment})\): Mean | Median | Min | Max | \(\log (\mathrm{wages})\): Mean | Median | Min | Max |
|---|---|---|---|---|---|---|---|---|
| 1983 | 4.83 | 4.82 | 2.30 | 9.31 | 0.45 | 0.47 | \(-0.89\) | 1.32 |
| 1984 | 4.83 | 4.79 | 2.30 | 9.29 | 0.43 | 0.45 | \(-1.13\) | 1.29 |
| 1985 | 4.83 | 4.78 | 2.30 | 9.21 | 0.45 | 0.47 | \(-0.91\) | 1.33 |
| 1986 | 4.84 | 4.79 | 2.40 | 9.05 | 0.52 | 0.53 | \(-0.92\) | 1.55 |
| 1987 | 4.86 | 4.84 | 2.40 | 8.97 | 0.58 | 0.59 | \(-0.65\) | 1.67 |
| 1988 | 4.88 | 4.83 | 2.30 | 8.91 | 0.61 | 0.62 | \(-0.77\) | 1.74 |
| 1989 | 4.90 | 4.86 | 2.40 | 8.85 | 0.67 | 0.67 | \(-0.62\) | 1.83 |
| 1990 | 4.90 | 4.87 | 2.30 | 8.80 | 0.74 | 0.75 | \(-0.75\) | 1.90 |
5.2 Results
Estimation results based on full sample
| Estimator | \(\phi _{11}\) | \(\phi _{21}\) | \(\phi _{12}\) | \(\phi _{22}\) |
|---|---|---|---|---|
| AB (2) | 0.86 | \(-0.02\) | 0.14 | 0.36 |
| AB (1) | 0.86 | \(-0.03\) | 0.12 | 0.28 |
| Sys (2) | 1.00 | 0.06 | 0.07 | 0.81 |
| Sys (1) | 0.99 | 0.05 | 0.07 | 0.81 |
| FE | 0.71 | 0.06 | 0.08 | 0.44 |
| FEBC (HK) | 0.98 | 0.02 | 0.14 | 0.62 |
| FEBC (K, Sys(2)) | 1.02 | 0.02 | 0.08 | 0.77 |
| FEBC (SPJ) | 1.01 | 0.02 | 0.05 | 0.78 |
| FEBC (BC) | 1.05 | \(-0.02\) | 0.04 | 0.74 |
| TMLE \((r=1)\) | 1.00 | 0.00 | 0.07 | 0.68 |
| TMLE \((r=2)\) | 1.01 | 0.01 | 0.08 | 0.68 |
Cointegration testing based on full sample
| Name | Test statistic |
|---|---|
| ABGMM | 14.46 (7.20) |
| SysGMM | 4.88** (1.31) |
| LRTMLE | 0.59 |
| LRRMLE | 0.55 |
| rkJ | 13.35*** |
From Table 3 we can see that only the rkJ test based on the Anderson and Hsiao (1982) moment conditions rejects \(H_{0}\). Results for the system GMM estimator are mixed: based on the Windmeijer (2005) corrected standard errors the null hypothesis is not rejected, while it is rejected when using the conventional two-step standard errors. Numerous reasons might account for the differences in conclusions. First of all, we suspect that the initialization moment conditions of the System estimator are not valid, and it does not come as a surprise that this estimator fails to reject \(H_{0}\). Hayakawa and Nagata (2016) provide some evidence based on an incremental Sargan test in support of the latter statement.^{33} Another explanation of the results in Table 3 might be the low power of the cointegration test used directly on the estimate of \(\varvec{\varPi }\).
Now we turn our attention to the likelihood ratio tests. Based on the analytical results in this paper for \(T=2\), we suspect that the likelihood procedures under \(H_{0}\) of cointegration lack power against close alternatives (recall the limited MC results in Table 5), as \(\chi ^{2}(1)\) is a poor approximation of the finite sample distribution. Furthermore, we know that both likelihood methods are robust to violations of mean stationarity, but not to time-series heteroscedasticity. Thus, we cannot rule out that this is one of the reasons for the divergence in conclusions.^{34}
5.3 Subsample analysis
Subsample \(rkJ\) test
| Years | T | Time effects: Yes | Time effects: No |
|---|---|---|---|
| 1983–1990 | 7 | 13.35*** | 29.01*** |
| 1983–1989 | 6 | 16.09*** | 35.04*** |
| 1983–1988 | 5 | 15.60*** | 28.35*** |
| 1983–1987 | 4 | 18.74*** | 20.14*** |
| 1984–1990 | 6 | 4.59** | 27.19*** |
| 1985–1990 | 5 | 2.57 | 21.54*** |
| 1986–1990 | 4 | 0.79 | 15.94*** |
The same cannot generally be said when the data are cross-sectionally demeaned. Note how the value of the test statistic increases as T increases for subsamples ending in 1990. In particular, for \(T=\{4;5\}\) the null hypothesis is not rejected at any conventional significance level. This behavior emphasizes the value of additional time-series observations and a possible lack of power for small values of T. As can be seen from Table 4, the observations for 1983 are especially informative about the properties of the bivariate system, as for all subsamples starting in 1983 the test rejects the null hypothesis.
Overall, omission of time effects from the model does not affect the conclusions from Sect. 5.2. However, a moderate amount of time variation in the magnitude of the test statistics suggests that this conclusion is sensitive to different estimation horizons.
6 Conclusions
In this paper, we study the properties of the standard Anderson and Hsiao (1982) moment conditions in a PVAR(1) for cointegrated processes. Under assumptions similar to those of Binder et al. (2005), we show that the jacobian of these moment conditions is of reduced rank if the process is cointegrated. Based on this observation we propose a rank based test for the null hypothesis of cointegration. We prove that the testing procedure in Binder et al. (2005) is invalid due to the singularity of the hessian matrix for persistent data. Monte Carlo results suggest that for most designs the new test is reasonably sized and has good power properties, but might exhibit nonmonotonic power curves for models with substantial effect nonstationarity. We apply our testing procedure to the Spanish manufacturing data of Alonso-Borrego and Arellano (1999) and, unlike the test of BHP, we find no evidence of cointegration.
Footnotes
 1.
\(\rho (\varvec{A})\equiv \max _{k}(|\lambda _{k}|)\), where \(\lambda _{k}\) are the (possibly complex) eigenvalues of a matrix \(\varvec{A}\).
 2.
\(\ddot{\varvec{y}}_{i-}\equiv \bar{\varvec{y}}_{i-}-\varvec{y}_{i,0}\) and \(\ddot{\varvec{y}}_{i}\equiv \bar{\varvec{y}}_{i}-\varvec{y}_{i,0}\).
 3.
Unlike time series models, we do not define cointegration as a property of time series, as in our setup we keep T fixed.
 4.
We slightly abuse the notation in this case, so that it remains consistent with the general practice of the time series cointegration literature.
 5.
Dhaene and Jochmans (2016) prove singularity of the hessian matrix for \(m=1\).
 6.
However, in this setup we still, for simplicity, assume that the initial observation has a zero mean, i.e. \({\text {E}}[\Delta {}\varvec{y}_{i,1}]=\varvec{0}_{m}\).
 7.
Here subscript r is introduced to highlight that these matrices are of rank r.
 8.
Kruiniger (2013) extended their results by allowing cross-sectional heteroscedasticity in the error terms.
 9.
Our numerical simulations suggest that \({\varvec{\mathcal {H}}}_{\ell }\) has rank deficiency larger than one (for \(m=2\) the rank of \({\varvec{\mathcal {H}}}_{\ell }\) is equal to 7, while full rank is 10), hence the results of Rotnitzky et al. (2000) need to be generalized to take this possibility into account.
 10.
In particular, setups of BHP were considered.
 11.
See e.g. Magnus and Neudecker (2007).
 12.
Note that for large T consistent estimation is possible, but with nonstandard distribution theory, see Phillips (2015).
 13.
Also referred to as “mean nonstationarity”, see e.g. Bun and Sarafidis (2015).
 14.
Here \(\varvec{C}\equiv \varvec{\beta }_{\perp }\left( \varvec{\alpha }_{\perp }'\varvec{\beta }_{\perp }\right) ^{-1}\varvec{\alpha }_{\perp }'\) is a matrix of rank \(m-r\), while \(\varvec{\alpha }_{\perp }, \varvec{\beta }_{\perp }\) are the orthogonal complements of \(\varvec{\alpha }, \varvec{\beta }\).
 15.
In (DGP.3) for \(\rho (\varvec{\varPhi })<1\) we have \(\varvec{C}=\mathbf {O}_{m}\), resulting in stationary initialization. On the other hand, \(\varvec{\varPhi }=\varvec{I}_{m}\) implies \(\varvec{C}=\varvec{I}_{m}\) (by definition) so that (DGP.3) and (DGP.2) coincide (by redefining M to \(M+1\)).
 16.
 17.
Note that positive definiteness of \({\text {E}}[\varvec{u}_{i,t-1}\varvec{y}_{i,t-1}']\) is a sufficient, but not a necessary condition. The term can be negative definite or even indefinite, as long as it has full rank.
 18.
In principle, other pooling schemes with weighted averages are possible, but for ease of exposition we consider a simple time average in this paper.
 19.
However, in order to accommodate \(\varvec{\eta }_{i}\) that does not satisfy the common dynamics assumption, the data generating process of the initial condition needs to be modified.
 20.
For example, if \({\text {var}}{\varepsilon _{i,0}}=\sigma ^{2}/(1-\phi ^{2})\).
 21.
As e.g. in Andrews and Cheng (2012).
 22.
 23.
Other studies, like Mutl (2009), adopted the setups of BHP.
 24.
Results for \(M=5\) are qualitatively and quantitatively similar to the ones presented in this paper.
 25.
Later we use notation \(\varvec{\varUpsilon }^{(q)}\) with q indicating the particular element of this set.
 26.
Setting \(S=50\) would be another option, but it is of similar arbitrariness.
 27.
As in this case the orders of magnitude of N and T are not substantially different, we suspect that critical values obtained as \(N,T\rightarrow \infty \) (jointly) might be more appropriate.
 28.
Some preliminary MC results, not presented in this paper, suggest that the effect of \(\tau \) in this setup is non-monotonic, in the sense that higher values of \(\tau \) lead to an increase in power rather than a further decrease. At least for this particular design, \(\tau =5\) appears to be close to the worst possible scenario, as the minimum is reached for \(\tau \approx 6.2\).
 29.
Some non-monotonicities are inherited from \(\varvec{\varUpsilon }^{(1)}\). Apart from that, the superior power properties (as compared to the effect stationary case) of \(\varvec{\varUpsilon }^{(3)}\) are dominant. This combined behavior is due to the fact that \(\varvec{\varUpsilon }^{(4)}\) changes with \(\lambda \). In designs with \(\lambda \) substantially lower than 0 we have \(\varvec{\varUpsilon }^{(4)}\approx \varvec{I}_{m}\); consequently, the weak instrument problem under the alternative is less pronounced.
 30.
For a more detailed description of the data, please refer to Alonso-Borrego and Arellano (1999).
 31.
Other testing procedures are described below.
 32.
We focus only on testing \(r=1\) vs. \(r=2\) because, e.g., using the LR test based on the TML estimator, \(H_{0}: r_{0}=0\) is rejected against the alternative \(H_{A}: r_{0}=1\) with the value of the test statistic equal to 117.561. This value is substantial even if the \(\chi ^{2}(1)\) distribution does not provide a correct asymptotic approximation.
 33.
However, this testing procedure cannot be used if series are cointegrated.
 34.
Arellano (2003) presents some evidence of timeseries heteroscedasticity in this dataset.
 35.
The circle is closed by connecting \(i=1\) with \(i=N\).
 36.
For a graphical illustration see Figure 2 of the aforementioned paper.
Notes
Acknowledgements
This paper greatly benefited from comments made by two anonymous referees. Previous versions of this paper were presented at the Tinbergen Institute, the Netherlands Econometrics Study Group 2013 (Amsterdam) and the “Conference on Cross-sectional Dependence in Panel Data” in Cambridge 2013. I would like to thank Ramon van den Akker, Peter Boswijk, Maurice Bun and Vasilis Sarafidis for their comments and suggestions.
References
 Abadir KM, Magnus JR (2002) Notation in econometrics: a proposal for a standard. Econom J 5:76–90
 Ahn SC, Thomas GM (2006) Likelihood based inference for dynamic panel data models. Mimeo, New York City
 Alonso-Borrego C, Arellano M (1999) Symmetrically normalized instrumental-variable estimation using panel data. J Bus Econ Stat 17:36–49
 Anderson TW, Hsiao C (1982) Formulation and estimation of dynamic models using panel data. J Econom 18:47–82
 Andrews DWK (1987) Asymptotic results for generalized Wald tests. Econom Theory 3(3):348–358
 Andrews DWK, Cheng X (2012) Estimation and inference with weak, semi-strong, and strong identification. Econometrica 80(5):2153–2211
 Arellano M (2003) Panel data econometrics. Advanced texts in econometrics. Oxford University Press, Oxford
 Arellano M (2016) Modeling optimal instrumental variables for dynamic panel data models. Res Econ 70(2):238–261
 Arellano M, Bond S (1991) Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Rev Econ Stud 58:277–297
 Baltagi BH, Bresson G, Pirotte A (2007) Panel unit root tests and spatial dependence. J Appl Econom 22:339–360
 Binder M, Hsiao C, Pesaran MH (2005) Estimation and inference in short panel vector autoregressions with unit root and cointegration. Econom Theory 21:795–837
 Blundell RW, Bond S (1998) Initial conditions and moment restrictions in dynamic panel data models. J Econom 87:115–143
 Bond S, Nauges C, Windmeijer F (2005) Unit roots: identification and testing in micro panels. Mimeo, New York City
 Bun MJG, Carree MA (2005) Bias-corrected estimation in dynamic panel data models. J Bus Econ Stat 23(2):200–210
 Bun MJG, Kleibergen FR (2016) Identification and inference in moments based analysis of linear dynamic panel data models. UvA-Econometrics Working Paper Series
 Bun MJG, Sarafidis V (2015) Dynamic panel data models. In: Baltagi BH (ed) The Oxford handbook of panel data, Chap 3. Oxford University Press, Oxford
 Bun MJG, Carree MA, Juodis A (2016) On maximum likelihood estimation of dynamic panel data models. Oxford Bull Econ Stat (forthcoming)
 Calzolari G, Magazzini L (2012) Autocorrelation and masked heterogeneity in panel data models estimated by maximum likelihood. Empir Econ 43(1):145–152
 Dhaene G, Jochmans K (2015) Split-panel jackknife estimation of fixed-effect models. Rev Econ Stud 82(3):991–1030
 Dhaene G, Jochmans K (2016) Likelihood inference in an autoregression with fixed effects. Econom Theory 32(5):1178–1215
 Dovonon P, Hall A (2016) The asymptotic properties of GMM and indirect inference under second-order identification. Mimeo, New York City
 Dovonon P, Renault E (2009) GMM overidentification test with first order underidentification. Mimeo, New York City
 Hahn J, Kuersteiner G (2002) Asymptotically unbiased inference for a dynamic panel model with fixed effects when both n and T are large. Econometrica 70(4):1639–1657
 Harris RD, Tzavalis E (1999) Inference for unit roots in dynamic panels where the time dimension is fixed. J Econom 91(2):201–226
 Hayakawa K (2009) On the effect of mean-nonstationarity in dynamic panel data models. J Econom 153:133–135
 Hayakawa K (2016) An improved GMM estimation of panel VAR models. Comput Stat Data Anal 100:240–264
 Hayakawa K, Nagata S (2016) On the behavior of the GMM estimator in persistent dynamic panel data models with unrestricted initial conditions. Comput Stat Data Anal 100:265–303
 Hsiao C, Pesaran MH, Tahmiscioglu AK (2002) Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. J Econom 109:107–150
 Johansen S (1991) Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59(6):1551–1580
 Johansen S (1995) Likelihood-based inference in cointegrated vector autoregressive models. Advanced texts in econometrics. Oxford University Press, Oxford
 Juodis A (2013) A note on bias-corrected estimation in dynamic panel data models. Econ Lett 118:435–438
 Juodis A (2016) First difference transformation in panel VAR models: robustness, estimation and inference. Econom Rev (forthcoming)
 Kiviet JF (1995) On bias, inconsistency, and efficiency of various estimators in dynamic panel data models. J Econom 68:53–78
 Kleibergen FR, Paap R (2006) Generalized reduced rank tests using the singular value decomposition. J Econom 133:97–126
 Kruiniger H (2008) Maximum likelihood estimation and inference methods for the covariance stationary panel AR(1)/unit root model. J Econom 144:447–464
 Kruiniger H (2013) Quasi ML estimation of the panel AR(1) model with arbitrary initial conditions. J Econom 173:175–188
 Kruiniger H, Tzavalis E (2002) Testing for unit roots in short dynamic panels with serially correlated and heteroscedastic disturbance terms. Working paper 459, Queen Mary, University of London
 Magnus JR, Neudecker H (2007) Matrix differential calculus with applications in statistics and econometrics. Wiley, Hoboken
 Moon HR, Perron B, Phillips PCB (2007) Incidental trends and the power of panel unit root tests. J Econom 141(2):416–459
 Mutl J (2009) Panel VAR models with spatial dependence. Mimeo, New York City
 Nickell S (1981) Biases in dynamic models with fixed effects. Econometrica 49:1417–1426
 Phillips PCB (2015) Dynamic panel Anderson–Hsiao estimation with roots near unity. Econom Theory (forthcoming)
 Ramalho JJS (2005) Feasible bias-corrected OLS, within-groups, and first-differences estimators for typical micro and macro AR(1) panel data models. Empir Econ 30:735–748
 Roodman D (2009) A note on the theme of too many instruments. Oxford Bull Econ Stat 71:135–158
 Rotnitzky A, Cox DR, Bottai M, Robins J (2000) Likelihood-based inference with singular information matrix. Bernoulli 6(2):243–284
 Westerlund J (2016) Pooled panel unit root tests and the effect of past initialization. Econom Rev 35(3):396–427. doi: 10.1080/07474938.2013.833829
 Windmeijer F (2005) A finite sample correction for the variance of linear efficient two-step GMM estimators. J Econom 126(1):25–51
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.