Lag truncation and the local asymptotic distribution of the ADF test for a unit root

The issue of lag selection in ADF unit root testing is important, even asymptotically, for if the number of lags is not allowed to increase at a certain rate the test might not be correctly sized. However, size control is not the only concern. Indeed, simulations have repeatedly shown that longer lag lengths tend to be associated with reductions in power, thus adding to the well-known low power problem when the alternative is local to the unit root. But while the simulation evidence is plentiful, there are as yet almost no asymptotic results that can be used to ascertain whether lag length has any effect on the local asymptotic power of the ADF test. The purpose of the present paper is to fill this gap in the literature.


Introduction
The augmented Dickey-Fuller (ADF) unit root test is the most popular of its kind, with countless applications. An issue that arises with the application of this test is the selection of the order of the lag augmentation, p. There are two considerations. On the one hand, for the test to be correctly sized in the presence of general ARMA errors it is important that p is allowed to increase with the size of the sample, T (see, for example, Said and Dickey 1984). The rate of increase is also important, for only if the rate is fast enough can one rely on conventional data-driven lag selection procedures, such as information criteria (see Ng and Perron 1995; Chang and Park 2002). On the other hand, Monte Carlo evidence indicates that larger values of p are generally associated with reduced power (see Lopez 1997; Ng and Perron 1995; Ng and Perron 2001). Interestingly, while low power is one of the most well-known problems of the ADF test, as far as we are aware no one has as yet derived any asymptotic power results for the case when p is allowed to increase with T. In fact, most studies, such as those of Said and Dickey (1984), Chang and Park (2002), and Xiao and Phillips (1998), only report the asymptotic distribution under the unit root null hypothesis, although there is typically some conjecture about the behaviour under the alternative that the largest AR root is local-to-unity (see Chang and Park 2002; Xiao and Phillips 1998). The only exceptions known to us are Ng and Perron (2001), whose results are designed specifically for the case when the errors follow a first-order MA process with a root that is local to −1, and Paparoditis and Politis (2017), where the alternative is taken to be that the process is stationary. Both studies confirm that p is important, even asymptotically, and that it can in fact dominate the asymptotic behaviour of the ADF test.

Corresponding author: Joakim Westerlund, joakim.westerlund@nek.lu.se
In the present paper, we take the discussion of the last paragraph as our starting point. The purpose is to evaluate the local asymptotic distribution of the ADF test when the errors follow a general linear process driven by martingale difference innovations, which may exhibit conditional heteroskedasticity. The study may therefore be thought of as a local power extension of the study of Chang and Park (2002), who derived the asymptotic null distribution of the ADF test under the same assumption on the errors.
Notation: $L$ is the lag operator, $\to_p$, $\to_w$ and $=_d$ signify convergence in probability, weak convergence, and equality in distribution, respectively, and $\|A\| = \sqrt{\mathrm{tr}(A'A)}$ is the Frobenius norm of any matrix $A$.

Model
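The data generating process (DGP) of $y_t$ is the same as in Chang and Park (2002), and is given by

$$y_t = \alpha y_{t-1} + u_t, \qquad u_t = \pi(L)\varepsilon_t, \tag{1}$$

where $y_0 = 0$, and $\varepsilon_t$ and $\pi(L) = \sum_{k=0}^{\infty} \pi_k L^k$ satisfy Assumptions 1 and 2, respectively.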
Remark 1 Assumptions 1 and 2 are the same as in Chang and Park (2002), and are not very restrictive. The assumption that $y_0 = 0$ is more restrictive than necessary, and can be relaxed, provided that $y_0 = O_p(1)$. The fact that there are no deterministic constant and trend terms is restrictive, but as we discuss later in Remark 3 the analysis can be easily extended to accommodate such terms. Note also that the initialization becomes irrelevant if the DGP contains (at least) a constant.

All the results of Chang and Park (2002) are derived under the unit root restriction that $\alpha = 1$. The main contribution of the present paper is to investigate the effect of a violation of this restriction. The particular assumption that we are going to be working under is given by Assumption 3.
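To make the DGP concrete, the following is a minimal simulation sketch. It is not taken from the paper: it assumes the local-to-unity parametrization $\alpha = 1 + c/T$ (one common choice), i.i.d. standard normal innovations, and the MA(1) filter $\pi(L) = 1 + \pi_1 L$. The function name `simulate_near_unit_root` is hypothetical.

```python
import numpy as np

def simulate_near_unit_root(T, c, pi1=0.0, seed=0):
    """Simulate y_t = alpha*y_{t-1} + u_t, u_t = eps_t + pi1*eps_{t-1}, y_0 = 0.

    Assumptions of this sketch (not from the source): alpha = 1 + c/T and
    eps_t i.i.d. N(0, 1). Returns the array (y_0, y_1, ..., y_T).
    """
    rng = np.random.default_rng(seed)
    alpha = 1.0 + c / T
    eps = rng.standard_normal(T + 1)   # eps_0, ..., eps_T
    u = eps[1:] + pi1 * eps[:-1]       # MA(1) errors u_1, ..., u_T
    y = np.zeros(T + 1)                # y[0] is the initial value y_0 = 0
    for t in range(1, T + 1):
        y[t] = alpha * y[t - 1] + u[t - 1]
    return y

# Example: a local alternative with c = -5 and a negative MA coefficient.
y = simulate_near_unit_root(T=200, c=-5.0, pi1=-0.5)
print(y[:5])
```

Setting `c=0.0` gives the unit root null, and `pi1=0.0` gives serially uncorrelated errors.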
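As in Chang and Park (2002), $\pi(L)$ has the Beveridge-Nelson (BN) decomposition

$$\pi(L) = \pi(1) - (1 - L)\tilde{\pi}(L), \qquad \tilde{\pi}(L) = \sum_{k=0}^{\infty} \tilde{\pi}_k L^k, \qquad \tilde{\pi}_k = \sum_{j=k+1}^{\infty} \pi_j$$

(see Phillips and Solo 1992, Lemma 2.3). We can therefore write

$$u_t = \pi(1)\varepsilon_t - \Delta\bar{u}_t, \qquad y_t = \pi(1) w_t - r_t,$$

where $\bar{u}_t = \tilde{\pi}(L)\varepsilon_t$, $w_t = \sum_{k=1}^{t} \alpha^{t-k}\varepsilon_k$ and $r_t = \sum_{k=1}^{t} \alpha^{t-k}\Delta\bar{u}_k$. Under Assumptions 1 and 2, $\pi(L)$ can be inverted, giving

$$\theta(L) u_t = \varepsilon_t,$$

where $\theta(L) = \pi(L)^{-1} = 1 - \sum_{k=1}^{\infty} \theta_k L^k$ (see Chang and Park 2002). The purpose of this paper is to investigate the effect when this infinite-order AR process is truncated at lag $p$. Let us therefore define

$$u_t = \delta_p(L) u_{t-1} + \varepsilon_{p,t},$$

where

$$\delta_p(L) = \sum_{k=1}^{p} \theta_k L^{k-1}, \qquad \varepsilon_{p,t} = \varepsilon_t + \sum_{k=p+1}^{\infty} \theta_k u_{t-k}.$$

By using this and the fact that $u_t = y_t - \alpha y_{t-1} = \Delta y_t - (\alpha - 1) y_{t-1}$, we obtain the following equation for $y_t$:

$$y_t = \alpha y_{t-1} + \delta_p(L)\Delta y_{t-1} + \varepsilon_{p,t} - \delta_p(L)(\alpha - 1) y_{t-2}.$$

At this point, it would seem natural given the approach of Chang and Park (2002) to take $\alpha y_{t-1} + \delta_p(L)\Delta y_{t-1}$ as the approximating regression function, and $\varepsilon_{p,t} - \delta_p(L)(\alpha - 1) y_{t-2}$ as the approximation error. But while this is indeed a possibility, there is a much more elegant approach. To fix ideas, let us write the regression model to be estimated by ordinary least squares (OLS) as

$$y_t = \beta y_{t-1} + \beta_p(L)\Delta y_{t-1} + e_{p,t}, \tag{9}$$

where $\beta$ and $\beta_p(L)$ are reduced form coefficients, and $e_{p,t}$ is a reduced form error term. We now write these reduced form quantities in terms of the components of the DGP. We begin by noting that

$$y_t = \alpha y_{t-1} + \delta_p(L)\Delta y_{t-1} + \varepsilon_{p,t} - (\alpha - 1)\delta_p(L) L y_{t-1}.$$

Consider the last term on the right. Similarly to the BN decomposition for infinite polynomials, we may decompose

$$\delta_p(L) L = \delta_p(1) - (1 - L)\tilde{\delta}_p(L), \qquad \tilde{\delta}_p(L) = \sum_{k=0}^{p-1} \tilde{\delta}_{p,k} L^k, \qquad \tilde{\delta}_{p,k} = \sum_{j=k+1}^{p} \theta_j,$$

so that $\delta_p(L) L y_{t-1} = \delta_p(1) y_{t-1} - \tilde{\delta}_p(L)\Delta y_{t-1}$. Hence, by collecting the terms,

$$y_t = [\alpha - (\alpha - 1)\delta_p(1)] y_{t-1} + [\delta_p(L) + (\alpha - 1)\tilde{\delta}_p(L)]\Delta y_{t-1} + \varepsilon_{p,t},$$

which is (9) with

$$\beta = \alpha - (\alpha - 1)\delta_p(1), \qquad \beta_p(L) = \delta_p(L) + (\alpha - 1)\tilde{\delta}_p(L), \qquad e_{p,t} = \varepsilon_{p,t}.$$

This is important, for (at least) two reasons. One reason is that it shows how unless $\alpha = 1$ ($c = 0$), such that $\beta = \alpha$, $\alpha$ is not identified. This means that in the regression to be estimated the drift away from a unit root is not determined by $c$ alone, but is in fact affected also by $\delta_p(1)$, as is clear from

$$\beta - 1 = (\alpha - 1)[1 - \delta_p(1)].$$

This has implications for studies such as Moon and Phillips (2000) and Phillips et al. (2001), where the purpose is to estimate $c$. Another reason why the above result is important is that it shows how the regression error in (9) is exactly the same as under the unit root null.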
This is very convenient in that once the model has been reparameterized as in (9), most of the main results regarding the accuracy of the approximation can be taken more or less directly from Chang and Park (2002). However, this requires $p \to \infty$. It is therefore convenient to treat $p$ as a function of $T$.
Assumption 4 $pT^{-1/2} \to 0$ as $p, T \to \infty$.

Assumption 4 restricts the rate at which $p$ is allowed to increase with $T$, but is weak enough to enable lag selection by standard information criteria, such as AIC and BIC.
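The role of the truncation can be illustrated numerically through the factor $[1 - \delta_p(1)]\pi(1)$ that appears in the discussion of the test. The sketch below is an illustration only: it assumes an MA(1) filter $\pi(L) = 1 + \pi_1 L$ with $|\pi_1| < 1$, for which $\theta_k = -(-\pi_1)^k$, and it takes $\delta_p(1) = \sum_{k=1}^{p} \theta_k$, that is, the sum of the first $p$ coefficients of the AR($\infty$) representation. The function name `truncation_factor` is hypothetical.

```python
import numpy as np

def truncation_factor(pi1, p):
    """Return [1 - delta_p(1)] * pi(1) for pi(L) = 1 + pi1*L with |pi1| < 1.

    theta(L) = 1/pi(L) = 1 - sum_{k>=1} theta_k L^k gives theta_k = -(-pi1)**k.
    delta_p(1) is taken to be sum_{k=1}^p theta_k (an assumption of this sketch).
    """
    k = np.arange(1, p + 1)
    theta = -((-pi1) ** k)
    delta_p1 = theta.sum()
    return (1.0 - delta_p1) * (1.0 + pi1)

# Lag lengths growing like T**a with a < 1/2, in line with Assumption 4.
for T in (50, 400, 1600):
    p = int(T ** 0.25)
    print(T, p, truncation_factor(-0.5, p))
```

In this MA(1) case the factor has the closed form $1 - (-\pi_1)^{p+1}$, so it differs from one for any fixed $p$ and approaches one geometrically as $p$ grows.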

The ADF test statistic and its local asymptotic distribution
Let $x_{p,t} = (\Delta y_{t-1}, ..., \Delta y_{t-p})'$ denote the vector of lagged differences appearing in (9). It is important to remember that the OLS estimator of the coefficient of $y_{t-1}$ in (9) is not really estimating $\alpha$, but rather $\beta$. Let us therefore consider the OLS estimator $\hat{\beta}$ of $\beta$ and its standard error, $s(\hat{\beta})$. The test statistic of interest is the usual ADF statistic, which is given by

$$ADF = \frac{\hat{\beta} - 1}{s(\hat{\beta})}.$$

Lemmas 1 and 2, which are analogous to Lemmas 3.1 and 3.2 of Chang and Park (2002), are key in deriving the local asymptotic distribution of $ADF$.
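For readers who want to compute the statistic directly, the following is a minimal sketch of the t-ratio on the coefficient of $y_{t-1}$ in a levels regression with $p$ lagged differences and no deterministic terms. The function name `adf_statistic` is hypothetical, and the degrees-of-freedom correction in the error variance is a choice made for this sketch; it does not matter asymptotically.

```python
import numpy as np

def adf_statistic(y, p):
    """t-ratio for H0: beta = 1 in
        y_t = beta*y_{t-1} + sum_{k=1}^p b_k * dy_{t-k} + e_t,
    estimated by OLS without constant or trend.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[i] = y[i+1] - y[i]
    t_idx = np.arange(p + 1, len(y))      # time indices with all lags available
    cols = [y[t_idx - 1]]                 # regressor y_{t-1}
    cols += [dy[t_idx - 1 - k] for k in range(1, p + 1)]   # regressors dy_{t-k}
    X = np.column_stack(cols)
    target = y[t_idx]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    dof = len(target) - X.shape[1]
    sigma2 = resid @ resid / dof          # error variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return (coef[0] - 1.0) / np.sqrt(cov[0, 0])

# Example on a simulated random walk with four lagged differences.
rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.standard_normal(200))
print(adf_statistic(random_walk, p=4))
```

Because the first regressor is the level $y_{t-1}$, the estimated coefficient corresponds to $\hat{\beta}$ rather than to an estimate of $\alpha$.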

Lemma 1 Under Assumptions 1-3,
Lemma 2 Under the conditions of Lemma 1,

The proofs of Lemmas 1 and 2 are almost identical to the proofs of Lemmas 3.1 and 3.2 in Chang and Park (2002), and are therefore omitted. The only difference is the presence of $\alpha$ in $w_t$, which does not affect the derivations. Lemmas 1 and 2 imply that where the remainder terms are all $o_p(1)$ under Assumption 4. In view of Lemma 1 (c), this implies The asymptotic distribution of the right-hand side is easily evaluated using the results provided in Hansen (1995) for the finite-order AR case, and is summarized in Theorem 1.
The effect of the truncation on the asymptotic distribution of the ADF test statistic is therefore negligible. This finding is in stark contrast to the results reported by Ng and Perron (2001) and Paparoditis and Politis (2017), where the effect of $p$ is nonnegligible. In practice, of course, $p$ is fixed, which means that $[1 - \delta_p(1)]\pi(1) \neq 1$. The asymptotic null distribution of $ADF$ under $c = 0$ is given by

$$ADF \to_w \frac{\int_0^1 W(r)\,dW(r)}{\left(\int_0^1 W(r)^2\,dr\right)^{1/2}},$$

which is independent of $[1 - \delta_p(1)]\pi(1)$. One of the effects of the truncation is therefore to affect the drift of the distribution under the alternative hypothesis that $c < 0$. Hence, while asymptotically negligible, in finite samples we expect $p$ to have an effect on power. This prediction is in agreement with the bulk of the existing Monte Carlo evidence (see, for example, Ng and Perron 1995).

In fact, the local power predictions derived here seem very accurate, even when compared to the stationary predictions of Paparoditis and Politis (2017) when the data are generated as stationary. Let us explain what we mean by this. Paparoditis and Politis (2017) show that the power of the ADF test against stationary alternatives should be decreasing in $p$, even asymptotically. This is their theoretical prediction. They then simulate power under $\alpha \in \{0.985, 0.97\}$, $\pi(L) = 1 + \pi_1 L$, $\pi_1 \in \{-0.5, 0.5\}$, $T \in \{50, 100, 200, 400, 800, 1600\}$ and $p = T^a$ with $a$ going from 0.05 to 0.49 in steps of 0.04. Except for the non-local specification of $\alpha$, this is consistent with the DGP considered here. Note in particular how $p$ satisfies our Assumption 4. According to the results reported in their Table 6 for the case when $\alpha = 0.97$ and $\pi_1 = -0.5$ (in which the effect of $p$ is most pronounced), while when $T = 50$ power decreases almost monotonically from 0.17 when $a = 0.05$ to 0.09 when $a = 0.49$, when $T = 1600$ power is flat at 1. Clearly, this finding does not fit well with the prediction that power should always decrease with increases in $p$.
It is, however, consistent with our prediction that the effect of p should tend to decrease with increasing T .
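The way in which the factor $[1 - \delta_p(1)]\pi(1)$ shifts the distribution under $c < 0$ can be explored by simulation. The sketch below is hedged in two respects. First, it assumes that the local limit has the form $\kappa c (\int_0^1 J_c^2)^{1/2} + \int_0^1 J_c\,dW / (\int_0^1 J_c^2)^{1/2}$ with $\kappa = [1 - \delta_p(1)]\pi(1)$, which is inferred from the discussion above of how the truncation affects the drift. Second, it approximates $J_c$ by an Euler scheme and uses $-1.95$, the usual asymptotic 5% critical value for the case without deterministic terms. The name `local_power` is hypothetical.

```python
import numpy as np

def local_power(c, kappa, n=500, reps=2000, crit=-1.95, seed=0):
    """Monte Carlo rejection rate of the functional
        kappa*c*sqrt(int J_c^2) + int J_c dW / sqrt(int J_c^2),
    with J_c approximated by an Euler scheme on n grid points.

    kappa plays the role of [1 - delta_p(1)] * pi(1). The functional form
    and the critical value are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    dW = rng.standard_normal((reps, n)) / np.sqrt(n)   # Brownian increments
    J = np.zeros(reps)            # J_c(0) = 0 on every simulated path
    int_J2 = np.zeros(reps)       # accumulates int J_c(r)^2 dr
    int_JdW = np.zeros(reps)      # accumulates int J_c(r) dW(r) (Ito sum)
    for i in range(n):
        int_J2 += J ** 2 / n
        int_JdW += J * dW[:, i]
        J = J + c * J / n + dW[:, i]
    stat = kappa * c * np.sqrt(int_J2) + int_JdW / np.sqrt(int_J2)
    return np.mean(stat < crit)

for kappa in (1.0, 0.9, 0.75):
    print(kappa, local_power(c=-5.0, kappa=kappa))
```

Because the same simulated paths are reused for every value of `kappa`, the rejection rate is monotone in `kappa` by construction when $c < 0$, and at $c = 0$ the value of `kappa` has no effect.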
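Remark 2 As already mentioned, Chang and Park (2002) only consider the asymptotic distribution under the unit root null. They also claim (without proof) in their Remark 3.2 that the asymptotic distribution under Assumption 3 with $c \neq 0$ should be the same, but with $W(r)$ replaced by $J_c(r)$. In order to assess the validity of this claim, note that, because $dJ_c(r) = cJ_c(r)dr + dW(r)$, making this replacement in the null distribution gives

$$\frac{\int_0^1 J_c(r)\,dJ_c(r)}{\left(\int_0^1 J_c(r)^2\,dr\right)^{1/2}} = c\left(\int_0^1 J_c(r)^2\,dr\right)^{1/2} + \frac{\int_0^1 J_c(r)\,dW(r)}{\left(\int_0^1 J_c(r)^2\,dr\right)^{1/2}},$$

which is identically the local asymptotic distribution reported by Phillips (1987). The fact that this distribution is also the limit of the local asymptotic distribution in Theorem 1 as $p \to \infty$ proves that the claim of Chang and Park (2002) is in fact correct.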
Remark 3 As discussed in Remark 3.1 of Chang and Park (2002), DGPs with deterministic constant and trend terms can be easily accommodated. Such an extension is interesting not only in its own right, but also because it shows how the results reported here extend to other unit root tests. Let us therefore use $z_t$ to denote the observed data. A common way to accommodate deterministic constant and trend terms is through the following components model: $z_t = \mu + \tau t + y_t$, where $y_t$ is as in (1). In this DGP, testing for a unit root in $z_t$ is equivalent to testing for a unit root in $y_t$. The problem is how to purge the effect of the deterministic terms. Chang and Park (2002) discuss the case when this is done through an auxiliary OLS regression of $z_t$ onto a constant or a constant and trend. In this case, the results reported in this paper are the same, except that $J_c(r)$ has to be replaced by its suitably demeaned or detrended version, $J_c^d(r)$, which is $J_c(r) - \int_{v=0}^{1} J_c(v)dv$ in the constant-only case and $J_c(r) - (4 - 6r)\int_{v=0}^{1} J_c(v)dv - (12r - 6)\int_{v=0}^{1} v J_c(v)dv$ in the trend case. An alternative to OLS is to perform generalized least squares (GLS) under the local alternative, as first suggested by Elliott et al. (1996). As Westerlund (2014) shows, except for $[1 - \delta_p(1)]\pi(1)$, the asymptotic distribution of the resulting ADF-GLS test in the constant-only case is identical to the one given in Theorem 1. The results reported here regarding the effect of $p$ therefore apply also to this other test. Another possibility is to follow, for example, Shin and So (2001) and to perform the OLS demeaning recursively. The asymptotic distribution in this case is again the same as in Theorem 1 but now with $J_c(r)$ replaced by $J_c^d(r) = J_c(r) - r^{-1}\int_{v=0}^{r} J_c(v)dv$. The asymptotic distributions of these other tests in the trend case do not have the same form as in Theorem 1, but the effect of $p$ is still expected to be negligible.
Moreover, these results extend quite naturally to the bulk of the existing panel data unit root tests, which are typically nothing but panel extensions of known time series tests (see, for example, Westerlund 2016, for a discussion of the issue of parametric lag correction in the panel data context).
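The two OLS-based ways of purging deterministic terms mentioned in Remark 3 can be sketched as follows. This is an illustration only: the function names are hypothetical, and `recursive_demean` is one simple variant that subtracts the running mean of the observations up to and including $t$, mirroring the form of $J_c(r) - r^{-1}\int_{v=0}^{r} J_c(v)dv$; the exact recursion used by Shin and So (2001) may differ.

```python
import numpy as np

def ols_detrend(z, trend=False):
    """Residuals from an OLS regression of z_t on a constant (and a linear trend)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    if trend:
        D = np.column_stack([np.ones(n), np.arange(1, n + 1)])
    else:
        D = np.ones((n, 1))
    coef, *_ = np.linalg.lstsq(D, z, rcond=None)
    return z - D @ coef

def recursive_demean(z):
    """Subtract from z_t the mean of z_1, ..., z_t (an assumed simple variant)."""
    z = np.asarray(z, dtype=float)
    running_mean = np.cumsum(z) / np.arange(1, len(z) + 1)
    return z - running_mean

# Example: a random walk observed with an intercept and a linear trend.
rng = np.random.default_rng(0)
z = 1.0 + 0.1 * np.arange(1, 101) + np.cumsum(rng.standard_normal(100))
print(ols_detrend(z, trend=True)[:3])
print(recursive_demean(z)[:3])
```

The adjusted series would then be used in place of $y_t$ when computing the test statistic.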