Nearly nonstationary processes under infinite variance GARCH noises

Let Yt be an autoregressive process of order one, i.e., Yt = μ + ϕnYt−1 + εt, where {εt} is a heavy-tailed general GARCH noise with tail index α. Let ϕ̂n be the least squares estimator (LSE) of ϕn. For μ = 0 and α < 2, Zhang and Ling (2015) showed that ϕ̂n is inconsistent when Yt is stationary (i.e., ϕn ≡ ϕ < 1). However, Chan and Zhang (2010) showed that ϕ̂n is still consistent with convergence rate n when Yt is a unit-root process (i.e., ϕn = 1) and {εt} is a GARCH(1, 1) noise. There is a gap between the stationary and nonstationary cases. In this paper, two important issues are considered: (1) What happens in the nearly unit-root case? (2) When can ϕ be estimated consistently by the LSE? We show that when ϕn = 1 − c/n, ϕ̂n converges to a functional of a stable process with convergence rate n.
Further, we show that if limn→∞ kn(1 − ϕn) = c for a positive constant c, then kn(ϕ̂n − ϕ) converges to a functional of two stable variables with tail index α/2, which means that ϕn can be estimated consistently only when kn → ∞.

There is an extensive literature on unit-root estimation and testing for the case c(·) ≡ 0 and g(·) ≡ σ δ , i.e., {ε_t} are i.i.d. random variables. For a concise review of the recent developments on this topic, see Chan (2008) and the references therein.
On the other hand, the unit-root problem for non-i.i.d. errors has also received considerable attention in the literature. Under these circumstances, the original testing for a unit root in (1.1) is tantamount to testing for a unit root with GGARCH(1,1) errors. Motivated by this consideration, extensive research has been conducted. For example, Hall and Yao (2003) considered the QMLE and Peng and Yao (2003) studied least absolute deviations (LAD) estimation when Eε_t^2 < ∞ and Eη_t^4 = ∞. Ling and Li (1998) considered the distribution of the maximum likelihood estimator for nonstationary autoregressive moving average time series with GARCH errors for the case Eε_t^4 < ∞. Ling, Li and McAleer (2003) and Li and Li (2009) generalized the results to the case Eε_t^2 < ∞ and obtained the limit distribution of the estimated unit root as a functional of Brownian motion. Chan and Peng (2005) studied LAD estimation for a stationary AR(1) process with heavy-tailed ARCH(1) noise; see also Zhu and Ling (2015). Chan and Zhang (2010) studied the asymptotic distribution of the Dickey-Fuller test for ϕ_n = 1 under an infinite-variance GARCH(1, 1) noise with tail index α. They showed that, with convergence rate n, the LSE converges to a functional of a stable process when α < 2 and to a functional of Brownian motion when α = 2. On the other hand, it has been shown that the LSE of a stationary AR(p) model is inconsistent when α < 2; see also Zhang and Chan (2021). This means that a big gap exists between the stationary and nonstationary cases when the noise is an infinite-variance GARCH noise.
To shed some intuitive insight into these phenomena, consider the following simple simulation. Let ε_t = (ω + βε_{t−1}^2)^{1/2} η_t, t = 1, 2, . . . , n, where {η_t} is a sequence of i.i.d. standard normal random variables. Note that the tail index of ε_t is given by the solution of E(βη_t^2)^{α/2} = 1; see Kesten (1973). Thus, if β = 1, then the tail index is α = 2; if β = π/2, then the tail index is α = 1. We simulate Y_t = ϕY_{t−1} + ε_t with various ϕ and ARCH(1) noise with ω = 0.4, taking β = 0.5 for the finite variance case, β = 1, 1.5 for the finite mean but infinite variance case (i.e., 1 < α ≤ 2), and β = 2 for the infinite mean case (i.e., α < 1). For each setting, we replicate the exercise 500 times and take n = 500, 1000, 1500. The empirical sampling bias (Bias, ϕ̂_n − ϕ) and standard deviation (SD) of the corresponding estimates (Est) ϕ̂_n based on the 500 repetitions are reported in Table 1. It can be seen from this table that for all ϕ, as β increases, i.e., as the tail index α decreases, the bias and SD of the estimated autoregressive parameter tend to increase. When β > 1, i.e., when the noise has infinite variance, the autoregressive parameter ϕ cannot be estimated well if ϕ < 1, but it can still be estimated consistently when ϕ = 1, and the closer ϕ is to 1, the smaller the bias and SD are. One natural question is whether an Ornstein-Uhlenbeck (O-U) limit distribution still holds when ϕ = 1 − γ/n for some constant γ. We will show that n(ϕ̂_n − ϕ) converges in distribution to a functional of fractional Ornstein-Uhlenbeck (O-U) stable processes.
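The simulation design described above can be sketched in a few lines. The following is a minimal illustration, not the code behind Table 1: the function names, the burn-in length, the starting value Y_0 = 0, the random seed, and the use of the no-intercept LSE are all assumptions made here. The tail index is obtained by bisection on E(βη^2)^{α/2} = 1, using the closed form E|η|^α = 2^{α/2} Γ((α + 1)/2)/√π for standard normal η.

```python
import math
import numpy as np


def arch1_tail_index(beta, lo=1e-3, hi=50.0, tol=1e-10):
    """Solve E(beta * eta^2)^(alpha/2) = 1 for alpha > 0, with eta ~ N(0, 1)."""
    def moment_minus_one(alpha):
        # E|eta|^alpha = 2^(alpha/2) * Gamma((alpha + 1) / 2) / sqrt(pi)
        m = (2.0 * beta) ** (alpha / 2.0) * math.gamma((alpha + 1.0) / 2.0)
        return m / math.sqrt(math.pi) - 1.0

    # The moment function is convex in alpha, negative just above 0 (for a
    # stationary ARCH(1)) and positive for large alpha, so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment_minus_one(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_ar1_arch1(n, phi, omega, beta, rng, burn=200):
    """Simulate Y_t = phi * Y_{t-1} + eps_t with ARCH(1) noise (Y_0 = 0 assumed)."""
    eta = rng.standard_normal(n + burn)
    eps = np.empty(n + burn)
    prev = 0.0
    for t in range(n + burn):
        eps[t] = math.sqrt(omega + beta * prev ** 2) * eta[t]
        prev = eps[t]
    eps = eps[burn:]  # drop the burn-in so the noise is close to stationary
    y = np.zeros(n + 1)
    for t in range(1, n + 1):
        y[t] = phi * y[t - 1] + eps[t - 1]
    return y


def lse(y):
    """No-intercept least squares estimate of the AR(1) coefficient."""
    return float(np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1]))


def bias_sd(n, phi, beta, reps=500, omega=0.4, seed=0):
    """Empirical bias and standard deviation of the LSE over `reps` replications."""
    rng = np.random.default_rng(seed)
    est = np.array([lse(simulate_ar1_arch1(n, phi, omega, beta, rng))
                    for _ in range(reps)])
    return float(est.mean() - phi), float(est.std(ddof=1))
```

For instance, `bias_sd(500, 0.5, 1.5)` and `bias_sd(500, 1.0, 1.5)` can be compared to see how the estimator behaves in the stationary and unit-root cases under infinite-variance noise; the numbers will depend on the seed and need not match Table 1.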
The second question is when ϕ_n can be estimated consistently by ϕ̂_n. We will show that if ϕ_n = 1 − c/k_n for a positive constant c, then ϕ̂_n − ϕ_n = O(1/k_n), which implies that ϕ_n can be estimated consistently only when k_n → ∞. This also gives a smooth transition from a stationary process to a nonstationary process similar to that of Phillips and Magdalinos (2007), who showed that the convergence rate is √(n k_n) when the noise ε_t is a sequence of i.i.d. variables with finite variance.

Throughout the paper, o(1) (o_P(1)) denotes a sequence of numbers (random variables) converging to zero (in probability); O(1) (O_P(1)) denotes a sequence of numbers (random variables) that is bounded (in probability); when two sequences a_n and b_n are of the same order, we write a_n ∼ b_n; $\xrightarrow{P}$ and $\xrightarrow{L}$ denote convergence in probability and in distribution, respectively. C denotes a positive bounded constant that may take different values at different places.

The rest of the paper is organized as follows. The Dickey-Fuller test and asymptotic theory are developed in Section 2. Section 3 concludes. All the technical proofs are relegated to Section 4.

§2 Tests and Asymptotic Distribution

Dickey-Fuller Test
Throughout the paper, we impose the following conditions. Condition 1.
(iii) The density of η 1 is positive in a neighborhood of zero.
Condition 1(i) is a necessary and sufficient condition for the existence of a stationary solution of σ_t^2 (see Nelson (1990)). If Condition 1(ii) holds, then Condition 1(i) is equivalent to E(c(η_1))^µ < 1 for some µ > 0 (see Remark 2.9 of Basrak, Davis and Mikosch (2002)). Conditions 1(i) and (iii) also imply that h_t is not a constant and hence exclude the i.i.d. case. By Lemma 2.1 of , it follows that there exists a unique α ∈ (0, k_0] such that E(c(η_1))^{α/2} = 1. (2.3)
Condition 1(iii) can be weakened to the requirement that the distribution F of η_1 is a mixture of an absolutely continuous component with respect to the Lebesgue measure λ on R and Dirac masses at some points µ_i ∈ R, i = 1, . . . , N. See Francq and Zakoïan (2006).

Asymptotic Distributions
We now derive the limit distributions of the LSE in (2.1). Our first result is about whether the DF test given in (2.2) has power when ε t is a heavy tailed GARCH noise with index α < 2.
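For orientation, the textbook forms of the least squares estimator and of the Dickey-Fuller coefficient statistic in the case µ = 0 are recorded below. It is an assumption here that (2.1) and (2.2) take these standard no-intercept forms; the symbol DF_n is introduced only for this illustration.

```latex
\[
\hat\phi_n = \frac{\sum_{t=1}^{n} Y_{t-1} Y_t}{\sum_{t=1}^{n} Y_{t-1}^2},
\qquad
\hat\phi_n - \phi_n = \frac{\sum_{t=1}^{n} Y_{t-1}\,\varepsilon_t}{\sum_{t=1}^{n} Y_{t-1}^2},
\qquad
\mathrm{DF}_n = n\,(\hat\phi_n - 1).
\]
```

The middle identity follows by substituting Y_t = ϕ_n Y_{t−1} + ε_t into the numerator of the estimator, so the limit behavior of ϕ̂_n is governed by the joint behavior of Σ Y_{t−1} ε_t and Σ Y_{t−1}^2.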
Theorem 2.1. Let α be the tail index defined in (2.3). Suppose that α < 2 and Condition 1 holds. (ii) When lim n→∞ n(1 − ϕ n ) = γ for some constant γ and µ = 0, where Z α (t) is a stable process with index α and Z α,γ (t) is an O-U stable process given by is a stable variable with index α/2, and Z α is a stable variable with index α. However, when lim n→∞ n(1 − ϕ n ) = γ for some constant γ and µ = 0, the asymptotic distribution is the same as Theorem 2.1(ii).
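As a point of reference, an O-U process driven by a stable process Z_α is commonly defined as the solution of a linear stochastic differential equation. The display below sketches that standard construction, under the assumption that Z_{α,γ} in Theorem 2.1 is of this type with Z_{α,γ}(0) = 0; the second equality is integration by parts.

```latex
\begin{align*}
dZ_{\alpha,\gamma}(t) &= -\gamma\, Z_{\alpha,\gamma}(t)\,dt + dZ_\alpha(t),
\qquad Z_{\alpha,\gamma}(0) = 0, \\
Z_{\alpha,\gamma}(t) &= \int_0^t e^{-\gamma(t-s)}\,dZ_\alpha(s)
 = Z_\alpha(t) - \gamma \int_0^t e^{-\gamma(t-s)} Z_\alpha(s)\,ds .
\end{align*}
```

Heuristically, with ϕ_n = 1 − γ/n one has ϕ_n^{[nt]−j} ≈ e^{−γ(t − j/n)}, which is where the exponential kernel comes from.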
From (2.4) and (2.5), we see that the asymptotic behavior of ϕ̂_n is totally different between the stationary and the nearly nonstationary cases. Note that ϕ̂_n is not consistent when ϕ_n < 1 does not depend on n (i.e., Y_t is a stationary process), while it is super-consistent with convergence rate n when ϕ_n = 1 − γ/n for a constant γ. An interesting question is when ϕ can be consistently estimated. Does there exist a smooth transition from the stationary to the nonstationary case? To address this issue, we consider a moderate deviation from unity model as in Phillips and Magdalinos (2007), i.e., Y_t = µ + ϕ_n Y_{t−1} + ε_t, with ϕ_n = 1 − c/k_n, c > 0. The next theorem gives the limit distribution of ϕ̂_n under this setting.
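The moderate deviation setting can be explored numerically. The sketch below is an illustration only: the choice k_n = n^κ with 0 < κ < 1 (which satisfies k_n → ∞ and k_n = o(n)), the ARCH(1) noise with ω = 0.4 and β = 1.5, the burn-in, Y_0 = 0, and all function names are assumptions made here rather than part of the paper. It returns replications of k_n(ϕ̂_n − ϕ_n), the quantity whose limit is studied next.

```python
import math
import numpy as np


def arch1_noise(n, omega, beta, rng, burn=200):
    """ARCH(1) noise eps_t = sqrt(omega + beta * eps_{t-1}^2) * eta_t."""
    eta = rng.standard_normal(n + burn)
    eps = np.empty(n + burn)
    prev = 0.0
    for t in range(n + burn):
        eps[t] = math.sqrt(omega + beta * prev ** 2) * eta[t]
        prev = eps[t]
    return eps[burn:]


def lse_ar1(y):
    """No-intercept least squares estimate of the AR(1) coefficient."""
    return float(np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1]))


def scaled_errors(n, kappa, c=1.0, omega=0.4, beta=1.5, reps=200, seed=0):
    """Replications of k_n * (phi_hat_n - phi_n) with phi_n = 1 - c / k_n.

    k_n = n ** kappa is a hypothetical choice; any k_n -> infinity with
    k_n = o(n) fits the moderate deviation framework.
    """
    rng = np.random.default_rng(seed)
    k_n = n ** kappa
    phi_n = 1.0 - c / k_n
    out = np.empty(reps)
    for r in range(reps):
        eps = arch1_noise(n, omega, beta, rng)
        y = np.zeros(n + 1)  # Y_0 = 0 assumed
        for t in range(1, n + 1):
            y[t] = phi_n * y[t - 1] + eps[t - 1]
        out[r] = k_n * (lse_ar1(y) - phi_n)
    return k_n, phi_n, out
```

Comparing, say, the empirical quantiles of `scaled_errors(2000, 0.3)[2]` and `scaled_errors(2000, 0.7)[2]` gives a rough feel for how the scaled error behaves as ϕ_n moves from the stationary side toward unity.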

Theorem 2.2.
Suppose that µ = 0, α < 2, and Condition 1 holds. If there exists a k_n = o(n) such that lim n→∞ k_n(1 − ϕ_n) = c > 0, then

In this paper, we discuss the limit behaviors of the Dickey-Fuller statistic for a unit-root model with noises driven by heavy-tailed GARCH innovations. It is shown that when the tail index α of the GARCH innovations is less than 2, the autoregressive parameter ϕ cannot be consistently estimated by the LSE. However, for such GARCH noise, when lim n→∞ n(1 − ϕ_n) = γ ∈ R, the LSE ϕ̂_n is still a super-consistent estimator and converges to a functional of O-U stable processes. Further, we also develop an asymptotic theory of the LSE for an AR(1) process with coefficient ϕ_n = 1 − c/k_n, c > 0, which gives a smooth transition from the stationary to the nonstationary case, explains why their convergence rates are so different, and shows that the LSE is consistent only when ϕ_n → 1, i.e., k_n → ∞. The results of this paper can be easily extended to higher order heavy-tailed GARCH-type processes, such as GARCH(p, q). Further, using the same argument as in Chan and Zhang (2009), it is easy to extend the results to the case with nonzero µ.

This paper also opens several interesting questions. First, if a robust procedure is used instead of the LSE, could one detect the unit root more efficiently? Note that Knight (1989) (see also Phillips (1991)) showed that L_1 estimation has significant gains in this framework for the infinite variance case. In view of this fact, one possible way to handle the inconsistency and efficient testing issues is to adopt the L_1 estimate. Second, since the limit distribution of the DF test is complicated, its critical values are difficult to derive; how can one construct a new test that avoids deriving the critical values? These issues will be explored in future work.

§4 Technical Proofs

In this section, we prove the main results.
For any given integers l and H, we define an (H + 1)-dimensional random vector: And denote a n = ( c where $\xrightarrow{S}$ denotes weak convergence under the S-topology in D[0, 1], Z_α(s) is a stable process with index α, and S_{α/2} is an (H + 1)-dimensional stable random vector with index α/2.

Conclusion (4.3) can be found in Lemma 3.1 of Zhang and Ling (2015), and (4.2) can be shown similarly to Theorem 2.2 of Chan and Zhang (2010); we omit the details here.
Proof of Theorem 2.1. Note that when ϕ < 1, This implies thatφ By (4.3),  showed that there exist two stable variables S α/2 and Z α/2 with tail index α/2 such that On the other hand, by (4.2) and a similar argument of Zhang, Sin and Ling (2015), we have Thus, by (4.4) and (4.5), it follows that Since γe −γ(t−i/n) −ϕ

Lemma 4.2. Under the conditions of Theorem 2.2, we have
where Z_α(1) is a stable variable with tail index α given in (4.2), and S_{α/2,i−1} denotes the i-th component of S_{α/2} defined in (4.3).
where v n is a constant sequence satisfying v n /k n → ∞ and v n /n → 0. Since Y 0 is a given random variable, it follows that c k n a n S n1 = (1 + o(1))Y 0 a n = o p (1). For S n3 , we write ε j = ε j I(|σ j | > a n ) + ε j I(|σ j | ≤ a n ) =: ε j,1 + ε j,2 . Note that {ε j,2 /a n } is a martingale difference sequence and Eη 2 t = 1. By Karamata's theorem, it follows that This implies that 1 a n As a result, we have c k n a n ( 1 By Karamata's theorem again, we have for any 0 < p < min{1, α}, which implies that c k n a n ( 1 Thus, c k n a n S n3 = o p (1).