# Lag truncation and the local asymptotic distribution of the ADF test for a unit root

## Abstract

The issue of lag selection in ADF unit root testing is important, even asymptotically, for if the number of lags is not allowed to increase at a certain rate the test might not be correctly sized. However, size control is not the only concern. Indeed, simulations have repeatedly shown that increasing lag lengths tend to be associated with reductions in power, thus adding to the well-known low power problem when the alternative is local to the unit root. But while the simulation evidence is plentiful, there are as yet almost no asymptotic results that can be used to ascertain whether lag length has any effect on the local asymptotic power of the ADF test. The purpose of the present paper is to fill this gap in the literature.

## Keywords

ADF test, Unit root testing, Lag selection, Local asymptotic power

## JEL Classification

C12, C13, C33

## 1 Introduction

The augmented Dickey–Fuller (ADF) unit root test is the most popular of its kind, with countless applications. An issue that arises with the application of this test is the selection of the order of the lag augmentation, *p*. There are two considerations. On the one hand, for the test to be correctly sized in the presence of general ARMA errors it is important that *p* is allowed to increase with the size of the sample, *T* (see, for example, Said and Dickey 1984). The rate of increase is also important, for only if the rate is fast enough can one rely on conventional data-driven lag selection procedures, such as information criteria (see Ng and Perron 1995; Chang and Park 2002). On the other hand, Monte Carlo evidence indicates that larger values of *p* are generally associated with reduced power (see Lopez 1997; Ng and Perron 1995, 2001). Interestingly, while low power is one of the most well-known problems of the ADF test, as far as we are aware no one has yet derived any asymptotic power results for the case when *p* is allowed to increase with *T*. In fact, most studies, such as those of Said and Dickey (1984), Chang and Park (2002), and Xiao and Phillips (1998), only report the asymptotic distribution under the unit root null hypothesis, although there is typically some conjecture about the behaviour under the alternative that the largest AR root is local-to-unity (see Chang and Park 2002; Xiao and Phillips 1998).^{1} The only exceptions known to us are Ng and Perron (2001), whose results are designed specifically for the case when the errors follow a first-order MA process with a root that is local to \(-1\), and Paparoditis and Politis (2017), where the alternative is taken to be that the process is stationary. Both studies confirm that *p* is important, even asymptotically, and that it can in fact dominate the asymptotic behaviour of the ADF test.

In the present paper, we take the discussion of the last paragraph as our starting point. The purpose is to evaluate the local asymptotic distribution of the ADF test when the errors follow a general linear process driven by martingale difference innovations, which may exhibit conditional heteroskedasticity. The study may therefore be thought of as a local power extension of the study of Chang and Park (2002), who derived the asymptotic null distribution of the ADF test under the same assumption on the errors.

Notation: *L* is the lag operator, \(\rightarrow _p\), \(\rightarrow _w\) and \(=_d\) signify convergence in probability, weak convergence, and equality in distribution, respectively, and \(\Vert A\Vert = \sqrt{\mathrm {tr}(A'A)}\) is the Frobenius norm of any matrix *A*.

## 2 Model

### Assumption 1

\((\varepsilon _t,\mathcal {F}_t)\) *is a martingale difference sequence with some filtration* \((\mathcal {F}_t)\), \(\mathbf E (\varepsilon _t^2)=\sigma ^2\), \(T^{-1}\sum _{t=1}^T\varepsilon _t^2\rightarrow _p\sigma ^2\) *and* \(\mathbf E (|\varepsilon _t|^4)<\infty \).

### Assumption 2

\(\pi (z)\ne 0\) for all \(|z|\le 1\), and \(\sum _{k=0}^\infty |k|^s|\pi _k|<\infty \) for some \(s\ge 1\).

### Remark 1

Assumptions 1 and 2 are the same as in Chang and Park (2002), and are not very restrictive. The assumption that \(y_0=0\) is more restrictive than necessary, and can be relaxed, provided that \(y_0=O_p(1)\). The fact that there are no deterministic constant and trend terms is restrictive, but as we discuss later in Remark 3 the analysis can be easily extended to accommodate such terms. Note also that the initialization becomes irrelevant if the DGP contains (at least) a constant.

All the results of Chang and Park (2002) are derived under the unit root restriction that \(\alpha = 1\). The main contribution of the present paper is to investigate the effect of a violation of this restriction. The particular assumption that we are going to be working under is given by Assumption 3.

### Assumption 3

\(\alpha = 1 + cT^{-1}\), *where* \(c\le 0\).
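To make the local-to-unity setup concrete, the following is a minimal simulation sketch. It assumes the DGP has the form \(y_t = \alpha y_{t-1} + u_t\) with \(u_t = \pi (L)\varepsilon _t\), \(y_0=0\) and \(\alpha = 1 + cT^{-1}\), and it specializes to first-order MA errors, \(\pi (L) = 1+\pi _1L\), with i.i.d. Gaussian innovations (a simple special case of Assumption 1). The function name `simulate_local_to_unity` and its arguments are illustrative choices, not part of the paper.

```python
import numpy as np

def simulate_local_to_unity(T, c, pi1=0.0, sigma=1.0, seed=None):
    """Simulate y_0, ..., y_T from y_t = alpha*y_{t-1} + u_t with alpha = 1 + c/T.

    Assumptions (illustrative only): MA(1) errors u_t = eps_t + pi1*eps_{t-1},
    i.i.d. N(0, sigma^2) innovations, and the initialization y_0 = 0.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=T + 1)  # eps[0] is a pre-sample innovation
    u = eps[1:] + pi1 * eps[:-1]              # u_1, ..., u_T
    alpha = 1.0 + c / T                       # local-to-unity AR root
    y = np.empty(T + 1)
    y[0] = 0.0
    for t in range(1, T + 1):
        y[t] = alpha * y[t - 1] + u[t - 1]
    return y

# c = 0 gives the unit root null; c < 0 gives a local stationary alternative.
y_null = simulate_local_to_unity(T=200, c=0.0, pi1=-0.5, seed=1)
y_local = simulate_local_to_unity(T=200, c=-5.0, pi1=-0.5, seed=1)
```

Holding *c* fixed while *T* grows keeps the alternative in a shrinking neighbourhood of the unit root, which is what distinguishes this design from a fixed stationary alternative.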

*p*. Let us therefore define \(\delta _p(L) = \sum _{k=1}^{p}\theta _kL^{k-1}\), \(\delta ^p(L) = \sum _{k=p+1}^{\infty }\theta _kL^{k-1}\) and \(\delta (L) = \delta _p(L)+\delta ^p(L)\), such that \(\theta (L) = 1-\delta (L)L\). In this notation,

*c* alone, but is in fact affected also by \(\delta _p(1)\), as is clear from

*c*. Another reason why the above result is important is that it shows how the regression error in (9) is exactly the same as under the unit root null. This is very convenient in that once the model has been reparameterized as in (9), most of the main results regarding the accuracy of the approximation can be taken more or less directly from Chang and Park (2002). However, this requires \(p\rightarrow \infty \). It is therefore convenient to treat *p* as a function of *T*.

### Assumption 4

\(pT^{-1/2} \rightarrow 0\) *as* \(p,\,T\rightarrow \infty \).

Assumption 4 restricts the rate at which *p* is allowed to increase with *T*, but is weak enough to enable lag selection by standard information criteria, such as AIC and BIC.
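As a small numerical illustration of Assumption 4, consider a deterministic power rule \(p = T^a\). Since \(pT^{-1/2} = T^{a-1/2}\), the condition holds for any \(a < 1/2\), although convergence is slow when *a* is close to 1/2. The sketch below assumes that *p* is rounded down to an integer with a minimum of one lag; this rounding convention and the helper name `lag_order` are illustrative assumptions.

```python
import numpy as np

def lag_order(T, a):
    """Hypothetical lag rule p = floor(T**a), with at least one lag."""
    return max(1, int(np.floor(T ** a)))

# The ratio p / sqrt(T) should shrink as T grows when a < 1/2.
for T in (50, 200, 800, 3200, 12800):
    p = lag_order(T, 0.25)
    print(f"T={T:6d}  p={p:3d}  p/sqrt(T)={p / np.sqrt(T):.3f}")
```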

## 3 The ADF test statistic and its local asymptotic distribution

*ADF*.
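For readers who wish to experiment, the following is a minimal sketch of how an ADF-type statistic with *p* lagged differences can be computed by OLS. It assumes the t-ratio form of the test and a regression without deterministic terms, namely \(\Delta y_t\) on \(y_{t-1}\) and \(\Delta y_{t-1},\ldots ,\Delta y_{t-p}\). The function name `adf_t_stat` is an illustrative choice and is not taken from the paper.

```python
import numpy as np

def adf_t_stat(y, p):
    """t-ratio on y_{t-1} in an OLS regression of dy_t on y_{t-1} and p lags of dy.

    Illustrative sketch: no constant or trend, homoskedastic OLS standard error.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                            # dy[i] = y[i+1] - y[i]
    z = dy[p:]                                 # dependent variable
    cols = [y[p:-1]]                           # lagged level y_{t-1}
    cols += [dy[p - k:len(dy) - k] for k in range(1, p + 1)]  # dy_{t-k}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    s2 = resid @ resid / (len(z) - X.shape[1])  # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[0] / np.sqrt(cov[0, 0])

# Example on a pure random walk (unit root null), using p = 4 lags.
rng = np.random.default_rng(0)
y = np.concatenate(([0.0], np.cumsum(rng.normal(size=200))))
print(adf_t_stat(y, p=4))
```

Writing the regression in differences means that the coefficient on \(y_{t-1}\) estimates \(\beta - 1\), so the returned t-ratio tests the unit root restriction directly.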

### Lemma 2

^{2}Lemmas 1 and 2 imply that

*ADF*. Note how \(\beta -1 = c[1 - \delta _p(1)]T^{-1}\). Together with Lemmas 1 and 2, this implies

### Theorem 1

*W*(*r*) being a standard Brownian motion on \(r\in [0,1]\).

*p* is non-negligible. In practice, of course, *p* is fixed, which means that \([1 - \delta _p(1)]\pi (1) \ne 1\). The asymptotic null distribution of *ADF* under \(c=0\) is given by

*p* to have an effect on power. This prediction is in agreement with the bulk of the existing Monte Carlo evidence (see, for example, Ng and Perron 1995). In fact, the local power predictions derived here seem very accurate, even when compared to the stationary predictions of Paparoditis and Politis (2017) when the data are generated as stationary. Let us explain what we mean by this. Paparoditis and Politis (2017) show that the power of the ADF test against stationary alternatives should be decreasing in *p*, even asymptotically. This is their theoretical prediction. They then simulate power under \(\alpha \in \{0.985, 0.97\}\), \(\pi (L) = 1 + \pi _1L\), \(\pi _1\in \{-0.5,0.5\}\), \(T\in \{50, 100, 200, 400, 800, 1600\}\) and \(p=T^a\) with *a* going from 0.05 to 0.49 in steps of 0.04. Except for the non-local specification of \(\alpha \), this is consistent with the DGP considered here. Note in particular how *p* satisfies our Assumption 4. Consider the results reported in their Table 6 for the case when \(\alpha = 0.97\) and \(\pi _1 = -0.5\) (in which the effect of *p* is most pronounced). When \(T=50\), power decreases almost monotonically from 0.17 when \(a=0.05\) to 0.09 when \(a = 0.49\), whereas when \(T= 1600\) power is flat at 1. Clearly, this finding does not fit well with the prediction that power should always decrease with increases in *p*. It is, however, consistent with our prediction that the effect of *p* should tend to decrease with increasing *T*.
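The comparison above can be made more tangible with a little arithmetic. Under Assumption 3, a fixed \(\alpha \) corresponds to \(c = T(\alpha - 1)\), so the simulated alternatives move further from the null as *T* grows. The sketch below also evaluates the factor \([1 - \delta _p(1)]\pi (1)\) for MA(1) errors under the assumption that \(\theta (L) = \pi (L)^{-1}\), in which case \(\theta _k = -(-\pi _1)^k\) and the factor reduces to \(1 - (-\pi _1)^{p+1}\). Both this assumption and the rounding of \(p = T^a\) down to an integer (with a minimum of one lag) are ours and are not taken from either paper.

```python
def implied_c(alpha, T):
    """Local-to-unity parameter implied by a fixed AR root: c = T*(alpha - 1)."""
    return T * (alpha - 1.0)

def truncation_factor(pi1, p):
    """[1 - delta_p(1)] * pi(1) for MA(1) errors, assuming theta(L) = 1/pi(L).

    With theta_k = -(-pi1)**k, summing the first p terms gives 1 - (-pi1)**(p+1).
    """
    return 1.0 - (-pi1) ** (p + 1)

for T in (50, 1600):
    for a in (0.05, 0.49):
        p = max(1, int(T ** a))  # assumed rounding convention
        c = implied_c(0.97, T)
        f = truncation_factor(-0.5, p)
        print(f"T={T:5d}  a={a:.2f}  p={p:3d}  c={c:7.2f}  factor={f:.6f}")
```

With \(\alpha = 0.97\), the implied value is \(c=-1.5\) when \(T=50\) but \(c=-48\) when \(T=1600\), which helps explain why power is flat at 1 in the larger sample regardless of *p*.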

### Remark 2

*W*(*r*) replaced by \(J_c(r)\). In order to assess the validity of this claim, note how \(dJ_c(r)=cJ_c(r)dr + dW(r)\), implying

### Remark 3

As discussed in Remark 3.1 of Chang and Park (2002), DGPs with deterministic constant and trend terms can be easily accommodated. Such an extension is interesting not only in its own right, but also because it shows how the results reported here extend to other unit root tests. Let us therefore use \(z_t\) to denote the observed data. A common way to accommodate deterministic constant and trend terms is through the following components model: \(z_t = \mu + \tau t + y_t\), where \(y_t\) is as in (1). In this DGP, testing for a unit root in \(z_t\) is equivalent to testing for a unit root in \(y_t\). The problem is how to purge the effect of the deterministic terms. Chang and Park (2002) discuss the case when this is done through an auxiliary OLS regression of \(z_t\) onto a constant or a constant and trend. In this case, the results reported in this paper are the same, except that \(J_c(r)\) has to be replaced by its suitably demeaned or detrended version, \(J_c^d(r)\) say. Specifically, while in the constant-only case, \(J_c^d(r) = J_c(r)-\int _{v=0}^1J_c(v)dv\), in the case with both a constant and trend, \(J_c^d(r)= J_c(r)+(6r-4)\int _{v=0}^1J_c(v)dv-(12r-6)\int _{v=0}^1vJ_c(v)dv\). An alternative to OLS is to perform generalized least squares (GLS) under the local alternative, as first suggested by Elliott et al. (1996). As Westerlund (2014) shows, except for \([1 - \delta _p(1)]\pi (1)\), the asymptotic distribution of the resulting ADF–GLS test in the constant-only case is identical to the one given in Theorem 1. The results reported here regarding the effect of *p* therefore apply also to this other test. Another possibility is to follow, for example, Shin and So (2001) and to perform the OLS demeaning recursively. The asymptotic distribution in this case is again the same as in Theorem 1 but now with \(J_c(r)\) replaced by \(J_c^d(r) = J_c(r)- r^{-1}\int _{v=0}^r J_c(v)dv\). The asymptotic distributions of these other tests in the trend case do not have the same form as in Theorem 1, but the effect of *p* is still expected to be negligible. Moreover, these results extend quite naturally to the bulk of the existing panel data unit root tests, which are typically nothing but panel extensions of known time series tests (see, for example, Westerlund 2016, for a discussion of the issue of parametric lag correction in the panel data context).
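The three transformations of \(J_c(r)\) described in this remark can be checked numerically. The sketch below simulates \(J_c(r)\) on an equally spaced grid, assuming it is the Ornstein–Uhlenbeck process solving \(dJ_c(r) = cJ_c(r)dr + dW(r)\) with \(J_c(0)=0\), and approximates the integrals by grid averages. The grid size, the function names and the use of an exact AR(1) discretization are illustrative choices.

```python
import numpy as np

def simulate_ou(c, n=2000, seed=None):
    """Simulate J_c on the grid r_i = i/n using the exact AR(1) transition."""
    rng = np.random.default_rng(seed)
    h = 1.0 / n
    rho = np.exp(c * h)
    # Variance of the increment over one step (equals h when c = 0).
    var = h if c == 0 else (np.exp(2.0 * c * h) - 1.0) / (2.0 * c)
    shocks = rng.normal(0.0, np.sqrt(var), size=n)
    J = np.empty(n)
    prev = 0.0
    for i in range(n):
        prev = rho * prev + shocks[i]
        J[i] = prev
    r = np.arange(1, n + 1) / n
    return r, J

def demean(J):
    """Constant-only case: J(r) minus its integral over [0, 1]."""
    return J - J.mean()

def detrend(r, J):
    """Constant and trend: J(r) + (6r - 4)*int J - (12r - 6)*int v*J."""
    A = J.mean()          # approximates the integral of J
    B = (r * J).mean()    # approximates the integral of v*J
    return J + (6.0 * r - 4.0) * A - (12.0 * r - 6.0) * B

def recursive_demean(J):
    """Recursive demeaning: J(r) minus the running mean (1/r)*int_0^r J."""
    return J - np.cumsum(J) / np.arange(1, len(J) + 1)

r, J = simulate_ou(c=-5.0, seed=0)
Jd = detrend(r, J)
# Both integrals should be close to zero if the detrending formula is correct.
print(Jd.mean(), (r * Jd).mean())
```

A quick sanity check on the detrending formula is that the detrended process is orthogonal to both a constant and a linear trend, which is what the two printed averages approximate.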

## Footnotes

## Notes

### Acknowledgements

The authors would like to thank Christine Müller (Editor-in-Chief), and two anonymous referees for many valuable comments and suggestions. Westerlund would like to thank the Knut and Alice Wallenberg Foundation for financial support through a Wallenberg Academy Fellowship, and the Jan Wallander and Tom Hedelius Foundation for financial support under research Grant Number P2014–0112:1.

## References

- Chang Y, Park JY (2002) On the asymptotics of ADF tests for unit roots. Econom Rev 21:431–447
- Elliott G, Rothenberg TJ, Stock JH (1996) Efficient tests for an autoregressive unit root. Econometrica 64:813–836
- Hansen BE (1995) Rethinking the univariate approach to unit root testing. Econom Theory 11:1148–1171
- Lopez JH (1997) The power of the ADF test. Econom Lett 57:5–10
- Moon HR, Phillips PC (2000) Estimation of autoregressive roots near unity using panel data. Econom Theory 16:927–997
- Ng S, Perron P (1995) Unit root tests in ARMA models with data-dependent methods for the selection of the truncation lag. J Am Stat Assoc 90:268–281
- Ng S, Perron P (2001) Lag length selection and the construction of unit root tests with good size and power. Econometrica 69:1519–1554
- Paparoditis E, Politis DN (2017) The asymptotic size and power of the augmented Dickey-Fuller test for a unit root. Econom Rev (forthcoming)
- Phillips PC (1987) Towards a unified asymptotic theory for autoregression. Biometrika 74:535–547
- Phillips PC, Moon HR, Xiao Z (2001) How to estimate autoregressive roots near unity. Econom Theory 17:29–69
- Phillips PC, Solo V (1992) Asymptotics for linear processes. Ann Stat 971–1001
- Said SE, Dickey DA (1984) Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 71:599–607
- Shin DW, So BS (2001) Recursive mean adjustment for unit root tests. J Time Ser Anal 22:595–612
- Stock JH (1991) Confidence intervals for the largest autoregressive root in US macroeconomic time series. J Monet Econ 28:435–459
- Westerlund J (2014) On the asymptotic distribution of the Dickey Fuller-GLS test statistic. Statistics 48:1233–1253
- Westerlund J (2016) The asymptotic distribution of the CADF unit root test in the presence of heterogeneous AR(\(p\)) errors. Stat Pap 57:303–317
- Xiao Z, Phillips PCB (1998) An ADF coefficient test for a unit root in ARMA models of unknown order with empirical applications to the US economy. Econom J 1:27–43

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.