
Self-weighted generalized empirical likelihood methods for hypothesis testing in infinite variance ARMA models


This paper develops the generalized empirical likelihood (GEL) method for infinite variance ARMA models and constructs a robust testing procedure for general linear hypotheses. In particular, we use a GEL method based on least absolute deviations and self-weighting, and construct a natural class of statistics, including the empirical likelihood and the continuous updating generalized method of moments, for infinite variance ARMA models. The self-weighted GEL test statistic is shown to converge in distribution to a \(\chi ^2\)-distribution even though the model may have infinite variance. Therefore, inference requires no estimation of unknown quantities of the model, such as the tail index or the density function of the unobserved innovation process. We also compare, via simulation experiments, the finite-sample performance of the proposed test with that of the Wald-type test of Pan et al. (Econom Theory 23:852–879, 2007).


Fig. 1


  1. Akashi F (2014) Empirical likelihood approach toward discriminant analysis for dynamics of stable processes. Stat Methodol 19:25–43

  2. Akashi F, Liu Y, Taniguchi M (2015) An empirical likelihood approach for symmetric \(\alpha \)-stable processes. Bernoulli 21:2093–2119

  3. Andrews B, Calder M, Davis RA (2009) Maximum likelihood estimation for \(\alpha \)-stable autoregressive processes. Ann Stat 37:1946–1982

  4. Bravo F (2009) Blockwise generalized empirical likelihood inference for non-linear dynamic moment conditions models. Econom J 12:208–231

  5. Brockwell PJ, Davis RA (1991) Time series: theory and methods: Springer series in statistics, 2nd edn. Springer, Berlin

  6. Chen K, Ying Z, Zhang H, Zhao L (2008) Analysis of least absolute deviation. Biometrika 95:107–122

  7. Davis RA, Dunsmuir WT (1997) Least absolute deviation estimation for regression with ARMA errors. J Theor Probab 10:481–497

  8. Davis RA, Wu W (1997) Bootstrapping M-estimates in regression and autoregression with infinite variance. Stat Sin 7:1135–1154

  9. Drees H, de Haan L, Resnick S (2000) How to make a Hill plot. Ann Stat 28:254–274

  10. Hall P, Heyde C (1980) Martingale limit theory and its application. Academic Press, New York

  11. Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica 50:1029–1054

  12. Hansen LP, Heaton J, Yaron A (1996) Finite-sample properties of some alternative GMM estimators. J Bus Econ Stat 14:262–280

  13. Huber PJ (1967) The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1, pp 221–233

  14. Kakizawa Y (2013) Frequency domain generalized empirical likelihood method. J Time Ser Anal 34:691–716

  15. Kitamura Y (1997) Empirical likelihood methods with weakly dependent processes. Ann Stat 25:2084–2102

  16. Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50

  17. Koul HL, Saleh AME (1995) Autoregression quantiles and related rank-scores processes. Ann Stat 23:670–689

  18. Li J, Liang W, He S (2011) Empirical likelihood for LAD estimators in infinite variance ARMA models. Stat Probab Lett 81:212–219

  19. Li J, Liang W, He S (2012) Empirical likelihood for AR-ARCH models based on LAD estimation. Acta Math Appl Sin Engl Ser 28:371–382

  20. Ling S (2005) Self-weighted least absolute deviation estimation for infinite variance autoregressive models. J R Stat Soc Ser B (Stat Methodol) 67:381–393

  21. Mikosch T, Gadrich T, Klüppelberg C, Adler RJ (1995) Parameter estimation for ARMA models with infinite variance innovations. Ann Stat 23:305–326

  22. Monti AC (1997) Empirical likelihood confidence regions in time series models. Biometrika 84:395–405

  23. Newey WK (1991) Uniform convergence in probability and stochastic equicontinuity. Econometrica 59:1161–1167

  24. Newey WK, McFadden D (1994) Large sample estimation and hypothesis testing. In: Engle R, McFadden D (eds) Handbook of econometrics, vol 4, pp 2111–2245

  25. Newey WK, Smith RJ (2004) Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica 72:219–255

  26. Nordman DJ, Lahiri SN (2006) A frequency domain empirical likelihood for short- and long-range dependence. Ann Stat 34:3019–3050

  27. Ogata H, Taniguchi M (2010) An empirical likelihood approach for non-Gaussian vector stationary processes and its application to minimum contrast estimation. Aust N Z J Stat 52:451–468

  28. Owen AB (1988) Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75:237–249

  29. Pan J, Wang H, Yao Q (2007) Weighted least absolute deviations estimation for ARMA models with infinite variance. Econom Theory 23:852–879

  30. Parente PM, Smith RJ (2011) GEL methods for nonsmooth moment indicators. Econom Theory 27:74–113

  31. Petrov VV (1975) Sums of independent random variables. Springer, Berlin

  32. Rao CR, Mitra SK (1971) Generalized inverse of matrices and its applications, vol 7. Wiley, New York

  33. Samoradnitsky G, Taqqu MS (1994) Stable non-Gaussian random processes: stochastic models with infinite variance, vol 1. CRC Press, Boca Raton

  34. Tauchen G (1985) Diagnostic testing and evaluation of maximum likelihood models. J Econom 30:415–443



The author is grateful to two anonymous referees for their comments. The remarks of one of them, who suggested that the EL results in the original draft could be extended to the GEL case, were particularly helpful and led to a significant improvement of the manuscript. The author would also like to thank Professor Marc Hallin and Professor Masanobu Taniguchi for their support and critical comments, and Professor Xiaofeng Shao for expert advice on self-weighting methods.

Author information

Correspondence to Fumiya Akashi.

Ethics declarations

Conflict of interest

The corresponding author states that there is no conflict of interest.

Additional information

This work was supported by Grant-in-Aid for Young Scientists (B) (16K16022, Fumiya Akashi).

Appendix: Proofs


In what follows, C denotes a generic positive constant that may differ from one occurrence to another, and “with probability approaching one” is abbreviated as w.p.a.1. Further, we adopt the following notation:

$$\begin{aligned} {\hat{g}}^*(\beta ) = \frac{1}{n-u}\sum ^{n}_{t=u+1} g^*_t(\beta ) \quad \text {and}\quad g^*(\beta ) = \mathbb {E}\left[ {\hat{g}}^*(\beta )\right] . \end{aligned}$$

Throughout this section, we assume all conditions of Theorem 1.

First, we state a lemma that is equivalent to Assumption 2.2(d) of Parente and Smith (2011).

Lemma 1

For any \({\tau }_n\rightarrow 0\) as \(n\rightarrow \infty \),

$$\begin{aligned} \sup _{\Vert \beta -\beta _0\Vert \le {\tau }_n} \frac{{n^{1/2}}\Vert {\hat{g}}^*(\beta ) - {\hat{g}}^*(\beta _0) - g^*(\beta )\Vert }{1+{n^{1/2}}\Vert \beta -\beta _0\Vert } =o_p(1). \end{aligned}$$


The proof is similar to that of Parente and Smith (2011, Proof of Theorem E.2, p. 113) and is therefore omitted. \(\square \)

The following lemmas are essentially due to Newey and Smith (2004).

Lemma 2

Let \({\varLambda }_n =\{ \lambda \in \mathbb {R}^{m+d} : \Vert \lambda \Vert \le {c_0} n^{-{1/2}} \}\) for some \(c_0\in (0,\infty )\). Then,

$$\begin{aligned} \sup _{\beta \in \mathcal{B},\lambda \in {{\varLambda }_n}}\max _{{u+1}\le t\le n}\left| \lambda ^{\top } g_t^*(\beta )\right| =o_p(1) \end{aligned}$$

and w.p.a.1, \({\varLambda }_n\subset {\hat{{\varLambda }}}_n(\beta )\) for all \(\beta \in \mathcal{B}\).


We first show that \(\Vert w_{t-1}A_{t-1}(\beta )\Vert \) is bounded by a constant independent of t, uniformly in \(\beta \in \mathcal {B}\). Hereafter, denote the ith element of \(A_{t-1}(\beta )\) by \(A_{i, t-1}(\beta )\) (\(i=1,\ldots , m\)). From equation (6) and condition (C1), we have, for \(i=1,\ldots ,p\),

$$\begin{aligned} A_{i,t-1}(\beta ) = -\theta (B;a)^{-1}\tilde{y}_{t-i} = \sum ^{\infty }_{k=0}\kappa ^{(1)}_k(a) \tilde{y}_{t-i-k} = \sum ^{t-i-1}_{k=0}\kappa ^{(1)}_k(a) y_{t-i-k}, \end{aligned}$$

where \(\{\kappa ^{(1)}_{k}(a):k\in \mathbb {Z}\}\) satisfy \(|\kappa ^{(1)}_{k-i}(a)|\le c_i r_i^k\) for some \(c_i>0\) and \(r_i\in (0,1)\) uniformly in a. Therefore, we have

$$\begin{aligned} {\left| w_{t-1}^{1/2}A_{i,t-1}(\beta )\right| }&\le \frac{\sum ^{t-i-1}_{k=0}|\kappa ^{(1)}_k(a)| |y_{t-i-k}|}{1+\sum ^{t-1}_{k=1}k^{-\gamma }|y_{t-k}|}\nonumber \\&= \frac{\sum ^{t-1}_{k=i}|\kappa ^{(1)}_{k-i}(a)| |y_{t-k}|}{1+\sum ^{t-1}_{k=1}k^{-\gamma }|y_{t-k}|}\nonumber \\&\le \frac{\sum ^{t-1}_{k=i}|\kappa ^{(1)}_{k-i}(a)| |y_{t-k}|}{1+\sum ^{t-1}_{k=i}k^{-\gamma }|y_{t-k}|}\nonumber \\&\le c_i\frac{\sum ^{t-1}_{k=i}r_i^k |y_{t-k}|}{1+\sum ^{t-1}_{k=i}k^{-\gamma }|y_{t-k}|}\quad \text {(by }|\kappa ^{(1)}_{k-i}(a)|\le c_i r_i^k)\nonumber \\&\le c_i\sum ^{\infty }_{k=1}k^\gamma r_i^k\quad \text {(by (9))}, \end{aligned}$$

where the right-hand side of (21) is independent of t. Since \(w_{t-1}^{1/2}\le 1\), it follows that \(|w_{t-1}A_{i,t-1}(\beta )|\le |w_{t-1}^{1/2}A_{i,t-1}(\beta )|\) is bounded by a constant independent of t, uniformly in \(\beta \in \mathcal {B}\). By the same argument and definition (4), \(|w_{t-1}A_{p+j,t-1}(\beta )|\) is also bounded by a constant for all \(j=1,\ldots ,q\), uniformly in \(\beta \in \mathcal {B}\) and t. Furthermore, by condition (C3), \(\Vert w_{t-1} \varphi _{t-1}\Vert \) is bounded by a constant independent of t. Hence \(\Vert g_t^*(\beta )\Vert \) is bounded by a constant independent of t and \(\beta \), and therefore

$$\begin{aligned} \sup _{\beta \in \mathcal{B},\lambda \in {{\varLambda }_n}}\max _{{u+1}\le t\le n}\left| \lambda ^{\top } g_t^*(\beta )\right| \le C n ^{-{1/2}} =o_p(1){,} \end{aligned}$$

so w.p.a.1, \(\lambda ^\top g_t^*(\beta )\in \mathcal{V}_\rho \) for all \(\beta \in \mathcal {B}\) and \(\Vert \lambda \Vert \le {c_0} n^{-1/2}\). \(\square \)
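The key step in the proof of Lemma 2 is a deterministic inequality: since \(r^k|y_{t-k}| = (k^\gamma r^k)(k^{-\gamma }|y_{t-k}|)\), the ratio in the last two lines of the display above is at most \(\max _k k^\gamma r^k\le \sum _{k\ge 1}k^\gamma r^k\), whatever the tails of \(y_t\). The Python sketch below checks this numerically on standard Cauchy data. The values of r and \(\gamma \) are illustrative assumptions, and the sketch does not implement the paper's self-weights or ARMA filter; it only exercises the inequality.

```python
import math
import random

def cauchy(rng):
    # Standard Cauchy draw by inverse CDF, tan(pi * (U - 1/2)); it has infinite variance.
    return math.tan(math.pi * (rng.random() - 0.5))

def ratio(y, t, i, r, gamma):
    """sum_{k=i}^{t-1} r^k |y_{t-k}| / (1 + sum_{k=i}^{t-1} k^{-gamma} |y_{t-k}|).

    y holds y_1, ..., y_n, so y_s is stored at index s - 1.
    """
    num = sum(r**k * abs(y[t - k - 1]) for k in range(i, t))
    den = 1.0 + sum(k**(-gamma) * abs(y[t - k - 1]) for k in range(i, t))
    return num / den

rng = random.Random(2024)
n, r, gamma = 400, 0.8, 2.5          # illustrative values, not taken from the paper
y = [cauchy(rng) for _ in range(n)]

# Right-hand side of the bound (geometric decay makes truncation at k = 2000 harmless),
# together with the sharper constant max_k k^gamma r^k that the same argument gives.
rhs = sum(k**gamma * r**k for k in range(1, 2001))
sharp = max(k**gamma * r**k for k in range(1, 2001))

worst = max(ratio(y, t, i, r, gamma) for t in range(2, n + 1) for i in (1, 2, 3) if i < t)
print(f"largest ratio {worst:.2f} <= max_k k^g r^k = {sharp:.2f} <= sum = {rhs:.2f}")
```

Heavy-tailed draws occasionally produce huge \(|y_{t-k}|\), but such values inflate the denominator at the same rate as the numerator, which is exactly why the self-weighted quantity stays bounded.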

Lemma 3

Suppose that there exists \({\bar{\beta }}\in \mathcal {B}\) such that \({\bar{\beta }}\xrightarrow {\mathcal{P}}\beta _0\). Then,

$$\begin{aligned} \left\| \frac{1}{n-u}\sum ^{n}_{t={u+}1}g^*_t({\bar{\beta }})g^*_t({\bar{\beta }})^{\top } - {\varOmega }\right\| = o_p(1). \end{aligned}$$


By definition, \((n-u)^{-1}\sum ^{n}_{t=u+1}g^*_t({{\bar{\beta }}})g^*_t({{\bar{\beta }}})^{\top }\) can be written as

$$\begin{aligned} \frac{1}{n-u}\sum ^{n}_{t=u+1}g^*_t({\bar{\beta }})g^*_t({\bar{\beta }})^{\top } = \frac{1}{n-u}\sum ^{n}_{t=u+1} w_{{t-1}}^2 \left( \begin{array}{ll} A_{{t-1}}({\bar{\beta }}) A_{{t-1}}({\bar{\beta }})^\top &{}\quad A_{{t-1}}({\bar{\beta }}) \varphi _{{t-1}}^\top \\ \varphi _{{t-1}} A_{{t-1}}({\bar{\beta }})^\top &{}\quad \varphi _{{t-1}} \varphi _{{t-1}}^\top \end{array} \right) . \end{aligned}$$

We shall show the consistency of each submatrix in (22) in succession.

First, we focus on the (i, j)th element of the first \(m\times m\) submatrix of (22). For simplicity, we write \({\bar{A}}_{i,t-1} = A_{i,t-1}({\bar{\beta }})\) and \(A^0_{i,t-1} = A_{i,t-1}(\beta _0)\). Then we have the following decomposition:

$$\begin{aligned} \frac{1}{n-u}\sum ^{n}_{t=u+1}{w_{t-1}^2 {\bar{A}}_{i,t-1} {\bar{A}}_{j, t-1}} =\left( {\bar{{\varOmega }}}_{n,A} - {\varOmega }_{n,A} \right) +\left( {\varOmega }_{n,A} - {\varOmega }_{n,S} \right) +{\varOmega }_{n,S}, \end{aligned}$$

where

$$\begin{aligned}&{\bar{{\varOmega }}}_{n,A} = \frac{1}{n-u}\sum ^{n}_{t=u+1}w^2_{t-1}{\bar{A}}_{i,t-1} {\bar{A}}_{j,t-1},\\&{\varOmega }_{n,A} = \frac{1}{n-u}\sum ^{n}_{t=u+1}\delta ^2_{t-1}A^0_{i,t-1} A^0_{j,t-1},\\&{\varOmega }_{n,S} = \frac{1}{n-u}\sum ^{n}_{t=u+1}\delta ^2_{t-1}S_{i,t-1} S_{j,t-1} \end{aligned}$$

and \(S_{i,t-1}\) is the ith element of \(S_{t-1} = (U_{t-1}, \ldots , U_{t-p}, V_{t-1}, \ldots , V_{t-q})^\top \).

For the first part of (23), the expansion \({\bar{A}}_{i,t-1} = A^0_{i,t-1} + (\partial _\beta {\bar{A}}^0_{i,t-1})^\top ({\bar{\beta }} - \beta _0)\) holds, where \(\partial _\beta {\bar{A}}^0_{i,t-1}= (\partial A_{i,t-1}(\beta )/\partial \beta )|_{\beta ={\bar{\beta }}_0}\), and \({\bar{\beta }}_0\) is on the line joining \({\bar{\beta }}\) and \(\beta _0\). So the first term of (23) is decomposed as

$$\begin{aligned} {\bar{{\varOmega }}}_{n,A} - {\varOmega }_{n,A}&= \frac{1}{n-u}\sum ^{n}_{t=u+1}\delta _{t-1}^2\left( {\bar{A}}_{i,t-1}{\bar{A}}_{j,t-1} - A^0_{i,t-1} A^0_{j,t-1} \right) \nonumber \\&\quad + \frac{1}{n-u}\sum ^{n}_{t=u+1} \left( w_{t-1}^2-\delta _{t-1}^2\right) {\bar{A}}_{i,t-1}{\bar{A}}_{j,t-1}\nonumber \\&= \frac{1}{n-u}\sum ^{n}_{t=u+1}\delta _{t-1}^2 \left( A^0_{i,t-1} (\partial _\beta {\bar{A}}^0_{j,t-1})^\top + A^0_{j,t-1} (\partial _\beta {\bar{A}}^0_{i,t-1})^\top \right) ({\bar{\beta }}-\beta _0) \end{aligned}$$
$$\begin{aligned}&\quad + ({\bar{\beta }}-\beta _0)^\top \frac{1}{n-u}\sum ^{n}_{t=u+1}\delta _{t-1}^2 (\partial _\beta {\bar{A}}^0_{i,t-1}) (\partial _\beta {\bar{A}}^0_{j,t-1})^\top ({\bar{\beta }}-\beta _0) \end{aligned}$$
$$\begin{aligned}&\quad + \frac{1}{n-u}\sum ^{n}_{t=u+1} \left( w_{t-1}^2-\delta _{t-1}^2\right) {\bar{A}}_{i,t-1}{\bar{A}}_{j,t-1}. \end{aligned}$$

By an argument similar to that in the proof of Lemma 2, the summands in (24) and (25) are bounded by constants with probability one. Since, in addition, \({\bar{\beta }}-\beta _0\xrightarrow {\mathcal{P}}0\), the terms (24) and (25) converge to zero in probability as \(n\rightarrow \infty \). On the other hand, we have

$$\begin{aligned}&\left| \frac{1}{n-u}\sum ^{n}_{t=u+1} \left( w_{t-1}^2-\delta _{t-1}^2\right) {\bar{A}}_{i,t-1}{\bar{A}}_{j,t-1}\right| \nonumber \\&\quad \le \frac{1}{n-u}\sum ^{n}_{t=u+1} |w_{t-1}-\delta _{t-1}||w_{t-1}+\delta _{t-1}| \left| {\bar{A}}_{i,t-1}\right| \left| {\bar{A}}_{j,t-1}\right| \nonumber \\&\quad \le \frac{2}{n-u}\sum ^{n}_{t=u+1} |w_{t-1}-\delta _{t-1}| \left| w_{t-1}^{1/2}{\bar{A}}_{i,t-1}\right| \left| w_{t-1}^{1/2}{\bar{A}}_{j,t-1}\right| \quad \text {(by }\delta _{t-1}\le w_{t-1}\le 1)\nonumber \\&\quad \le \frac{C}{n-u}\sum ^{n}_{t=u+1} |w_{t-1}-\delta _{t-1}|\xrightarrow {\mathcal{P}}0. \end{aligned}$$

Therefore, the term (26) converges to zero in probability as \(n\rightarrow \infty \).

For the second part of (23), Lemma 1 of Pan et al. (2007) gives \(|A_{i,t-1}^0 -S_{i,t-1}|\le \xi _t\), where \(\xi _t = c'\sum ^{\infty }_{k=t}r^k|y_{t-k}|\) for some \(c'\in (0,\infty )\) and \(r\in (0,1)\). Clearly, \(\xi _t = o_p(1)\) as \(t\rightarrow \infty \), and hence \({\varOmega }_{n,A} - {\varOmega }_{n,S} = o_p(1)\).

For the third part of (23), it is easy to see that \({\varOmega }_{n,S}\) converges to the first \(m\times m\)-submatrix of \({\varOmega }\) by the ergodicity of \(S_{t-1}\).

Second, we consider the last \(d\times d\)-submatrix of (22). For \(i,j\in \{1,\ldots ,d\}\), consider the decomposition

$$\begin{aligned} \frac{1}{n-u} \sum ^{n}_{t=u+1}w_{t-1}^2 \varphi _{i,t-1}\varphi _{j,t-1}&= \frac{1}{n-u} \sum ^{n}_{t=u+1}\delta _{t-1}^2 \varphi _{i,t-1}\varphi _{j,t-1} \end{aligned}$$
$$\begin{aligned}&\quad +\frac{1}{n-u} \sum ^{n}_{t=u+1}(w_{t-1}^2 - \delta _{t-1}^2) \varphi _{i,t-1}\varphi _{j,t-1}. \end{aligned}$$

Note that (28) converges almost surely to \(\mathbb {E}[\delta ^2_{t-1}\varphi _{i,t-1}\varphi _{j,t-1}]\), since

$$\begin{aligned} \mathbb {E}\left[ |\delta _{t-1}^2\varphi _{i,t-1}\varphi _{j,t-1}|\right] \le \mathbb {E}\left[ |w_{t-1}^2\varphi _{i,t-1}\varphi _{j,t-1}|\right] <\infty \end{aligned}$$

by condition (C3), together with the stationarity and ergodicity of \(\delta _{t-1}^2\varphi _{i,t-1}\varphi _{j,t-1}\). On the other hand, (29) converges to zero in probability as \(n\rightarrow \infty \) by the same argument as for (27) and condition (C3).

Third, we show the consistency of the off-diagonal part of (22). For \(i\in \{1,\ldots ,m\}\) and \(j\in \{1,\ldots ,d\}\), we have

$$\begin{aligned} \frac{1}{n-u}\sum ^{n}_{t=u+1} w_{t-1}^2 {\bar{A}}_{i,t-1} \varphi _{j,t-1}&=\frac{1}{n-u}\sum ^{n}_{t=u+1} \delta _{t-1}^2A_{i,t-1}^0\varphi _{j,t-1} \end{aligned}$$
$$\begin{aligned}&\quad +\frac{1}{n-u}\sum ^{n}_{t=u+1} \left( w_{t-1}^2 - \delta _{t-1}^2\right) A_{i,t-1}^0\varphi _{j,t-1} \end{aligned}$$
$$\begin{aligned}&\quad +\frac{1}{n-u}\sum ^{n}_{t=u+1} w_{t-1}^2\left( \partial _\beta {\bar{A}}^0_{i,t-1}\right) ^\top \left( {\bar{\beta }}-\beta _0\right) \varphi _{j,t-1}. \end{aligned}$$

Again by Lemma 1 of Pan et al. (2007), \((n-u)^{-1}\sum ^{n}_{t=u+1} \delta _{t-1}^2\{A^0_{i,t-1}-S_{i,t-1}\}\varphi _{j,t-1} = o_p(1)\), and hence (30) converges in probability to \(\mathbb {E}[\delta _{t-1}^2S_{i,t-1}\varphi _{j,t-1}]\). On the other hand, (31) and (32) converge to zero in probability by the Cauchy-Schwarz inequality and the same arguments as above.

Thus, we get the desired result. \(\square \)
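The algebra behind the first part of (23) in the proof of Lemma 3 reduces to two scalar facts: the exact split \(w^2{\bar{A}}_i{\bar{A}}_j - \delta ^2A^0_iA^0_j = \delta ^2({\bar{A}}_i{\bar{A}}_j - A^0_iA^0_j) + (w^2-\delta ^2){\bar{A}}_i{\bar{A}}_j\), and the bound \(|w^2-\delta ^2||{\bar{A}}_i{\bar{A}}_j|\le 2|w-\delta ||w^{1/2}{\bar{A}}_i||w^{1/2}{\bar{A}}_j|\), valid whenever \(0\le \delta \le w\). The Python sketch below is a randomized sanity check of both with arbitrary stand-in values; it checks the algebra only, not any probabilistic statement.

```python
import random

rng = random.Random(7)
max_split_error = 0.0
for _ in range(10_000):
    w = rng.random()                     # weight w_{t-1} in [0, 1)
    delta = w * rng.random()             # delta_{t-1} with 0 <= delta <= w
    a_i, a_j = rng.gauss(0, 5), rng.gauss(0, 5)      # stand-ins for bar A_{i,t-1}, bar A_{j,t-1}
    a0_i, a0_j = rng.gauss(0, 5), rng.gauss(0, 5)    # stand-ins for A^0_{i,t-1}, A^0_{j,t-1}

    # Exact split used for bar Omega_{n,A} - Omega_{n,A}, one summand at a time.
    lhs = w**2 * a_i * a_j - delta**2 * a0_i * a0_j
    rhs = delta**2 * (a_i * a_j - a0_i * a0_j) + (w**2 - delta**2) * a_i * a_j
    max_split_error = max(max_split_error, abs(lhs - rhs))

    # |w^2 - delta^2| written as |w - delta| (w + delta) to avoid cancellation;
    # the bound uses w + delta <= 2w, i.e. only 0 <= delta <= w.
    term = abs(w - delta) * (w + delta) * abs(a_i * a_j)
    bound = 2.0 * abs(w - delta) * abs(w**0.5 * a_i) * abs(w**0.5 * a_j)
    assert term <= bound * (1 + 1e-12) + 1e-15

print("largest split error:", max_split_error)
```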

Lemma 4

Suppose that there exists \({\bar{\beta }}\in \mathcal{B}\) such that \({\bar{\beta }}\xrightarrow {\mathcal{P}}\beta _0\) and \({\hat{g}}^*({\bar{\beta }}) = O_p(n^{-1/2})\). Then,

$$\begin{aligned} {\bar{\lambda }} = \arg \max _{\lambda \in {\hat{{\varLambda }}}_n({\bar{\beta }})} P^*_n({\bar{\beta }},\lambda ) \end{aligned}$$

exists w.p.a.1, \({\bar{\lambda }}=O_p(n^{-1/2})\) and \(P^*_n({\bar{\beta }},{\bar{\lambda }}) = O_p(1)\).


Since \({\varLambda }_n\) is compact, \(\check{\lambda }= \arg \max _{\lambda \in {\varLambda }_n} P^*_n({\bar{\beta }},\lambda )\) exists with probability one. By Lemma 2, \(P^*_n({\bar{\beta }},\lambda )\) is twice continuously differentiable in \(\lambda \) w.p.a.1. Hence, by a Taylor expansion around \(\lambda =0_{m{+d}}\), there exists \(\dot{\lambda }\) on the line segment joining \(\check{\lambda }\) and \(0_{m{+d}}\) such that

$$\begin{aligned} 0&= P^*_n({\bar{\beta }},0_{m{+d}})\nonumber \\&\le P^*_n({\bar{\beta }},\check{\lambda })\nonumber \\&= -n\check{\lambda }^{\top }{\hat{g}}^*({\bar{\beta }}) +\frac{n}{2}\check{\lambda }^{\top } \left[ \frac{1}{n-u}\sum ^{n}_{t=u+1} {{\bar{\rho }}'_t} g^*_t({\bar{\beta }}) g^*_t({\bar{\beta }})^{\top }\right] \check{\lambda }. \end{aligned}$$

where \({\bar{\rho }}'_{t} = \rho '\{\dot{\lambda }^{\top }g^*_t({\bar{\beta }})\}\). Furthermore, by Lemmas 2 and 3,

$$\begin{aligned} {\bar{{\varOmega }}}_n^\rho = -\frac{1}{n-u}\sum _{t=u+1}^n {\bar{\rho }}'_{t} g^*_t({\bar{\beta }})g^*_t({\bar{\beta }})^{\top } \xrightarrow {\mathcal{P}}{\varOmega }, \end{aligned}$$

and therefore, by (C6), the minimum eigenvalue of \({\bar{{\varOmega }}}_n^\rho \) is bounded away from zero w.p.a.1. Then it holds that

$$\begin{aligned} 0 \le -\check{\lambda }^{\top }{\hat{g}}^*({\bar{\beta }}) -\frac{1}{2}\check{\lambda }^{\top } {\bar{{\varOmega }}}_n^\rho \check{\lambda }\le \Vert \check{\lambda }\Vert \Vert {\hat{g}}^*({\bar{\beta }})\Vert -{C}\Vert \check{\lambda }\Vert ^2 \end{aligned}$$

w.p.a.1. Dividing both sides of (34) by \(\Vert \check{\lambda }\Vert \) gives \({C}\Vert \check{\lambda }\Vert \le \Vert {\hat{g}}^*({\bar{\beta }})\Vert = O_p(n^{-1/2})\), so \(\Vert \check{\lambda }\Vert = {O_p(n^{-1/2})}\) and hence \(\check{\lambda }\in {\varLambda }_n\) w.p.a.1. Again by Lemma 2, the concavity of \(P^*_n({\bar{\beta }},\lambda )\) in \(\lambda \) and the convexity of \({\hat{{\varLambda }}}_n({\bar{\beta }})\), it follows that \({\bar{\lambda }} =\check{\lambda }\) exists w.p.a.1 and \({\bar{\lambda }}=O_p(n^{-1/2})\). These results and (33) with \(\check{\lambda }={\bar{\lambda }}\) also imply that \(P^*_n({\bar{\beta }},{\bar{\lambda }}) = O_p(1)\). \(\square \)
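To make the inner maximization of Lemma 4 concrete, the Python sketch below computes \({\bar{\lambda }}\) for fixed, bounded moment vectors \(g_t\). It assumes the Newey and Smith (2004) normalization \(P_n(\lambda ) = \sum _t\{\rho (\lambda ^\top g_t)-\rho (0)\}\) with either \(\rho (v)=\log (1-v)\) (empirical likelihood) or \(\rho (v)=-(1+v)^2/2\) (continuous updating). This scaling, the helper names and the synthetic moments (a random sign times a bounded function of a Cauchy regressor, standing in for a sign of \(\varepsilon _t\) times a self-weighted regressor) are assumptions for illustration, not the paper's definitions. The output shows \({\bar{\lambda }}\approx -{\hat{\varOmega }}^{-1}{\hat{g}}\), of order \(n^{-1/2}\), and \(P_n({\bar{\lambda }})\approx (n/2){\hat{g}}^\top {\hat{\varOmega }}^{-1}{\hat{g}}\), of order one, mirroring the conclusions of the lemma.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def el_objective(lam, g):
    """Empirical likelihood criterion sum_t log(1 - lam'g_t); -inf outside the domain."""
    vals = [1.0 - dot(lam, gt) for gt in g]
    return sum(math.log(v) for v in vals) if min(vals) > 0 else -math.inf

def el_inner_max(g, iters=50):
    """Maximize the concave EL criterion over lam by damped Newton steps from lam = 0."""
    d = len(g[0])
    lam = [0.0] * d
    for _ in range(iters):
        v = [1.0 - dot(lam, gt) for gt in g]
        grad = [-sum(gt[a] / vt for gt, vt in zip(g, v)) for a in range(d)]
        neg_hess = [[sum(gt[a] * gt[b] / vt**2 for gt, vt in zip(g, v))
                     for b in range(d)] for a in range(d)]
        step = solve(neg_hess, grad)           # Newton direction: (-Hessian)^{-1} gradient
        s, f0 = 1.0, el_objective(lam, g)
        while s > 1e-10 and el_objective([l + s * st for l, st in zip(lam, step)], g) < f0:
            s /= 2.0                           # halve until feasible and non-decreasing
        lam = [l + s * st for l, st in zip(lam, step)]
        if s * max(abs(st) for st in step) < 1e-12:
            break
    return lam

rng = random.Random(1)
n, d = 2000, 2
g = []
for _ in range(n):
    sgn = 1.0 if rng.random() < 0.5 else -1.0          # stand-in for a sign with median zero
    x = math.tan(math.pi * (rng.random() - 0.5))        # Cauchy regressor (infinite variance)
    g.append([sgn, sgn / (1.0 + abs(x))])               # bounded, mean-zero moment vector

g_bar = [sum(gt[a] for gt in g) / n for a in range(d)]
omega_hat = [[sum(gt[a] * gt[b] for gt in g) / n for b in range(d)] for a in range(d)]

w = solve(omega_hat, g_bar)                             # Omega_hat^{-1} g_bar
lam_cue = [-wi for wi in w]                             # exact CUE maximizer
p_cue = 0.5 * n * dot(g_bar, w)                         # exact CUE maximum
lam_el = el_inner_max(g)
p_el = el_objective(lam_el, g)
print("lambda CUE:", lam_cue, " lambda EL:", lam_el)
print("P CUE:", p_cue, " P EL:", p_el)
```

For the continuous-updating choice the inner problem is an exact quadratic, so its maximizer and maximum are available in closed form; the empirical likelihood maximizer is found by Newton's method with step halving, which keeps every \(\lambda ^\top g_t\) inside the domain of \(\rho \), the role played by \(\mathcal{V}_\rho \) in Lemma 2.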

Lemma 5

\({\hat{g}}^*({\hat{\beta }}) = O_p(n^{-1/2})\) as \(n\rightarrow \infty \).


We define \(\hat{{\hat{g}}} = {\hat{g}}^*({\hat{\beta }})\) and \( \tilde{\lambda }= -n^{-1/2}{\hat{{\hat{g}}}}/{\Vert \hat{{\hat{g}}}\Vert }.\)

First, by an argument very similar to that in the proof of Lemma 4, and noting that \(\rho '\{\tilde{\lambda }^\top g_t^*(\beta )\}\ge -C\) uniformly in t and \(\beta \) w.p.a.1 by Lemma 2, we have

$$\begin{aligned} P^*_{n}({\hat{\beta }},\tilde{\lambda })&\ge n \left( n^{-1/2}\Vert \hat{{\hat{g}}}\Vert -\frac{C}{2}\tilde{\lambda }^{\top } \left[ \frac{1}{n-u}\sum ^{n}_{t=u+1}g^*_t({\hat{\beta }}) g^*_t({\hat{\beta }})^\top \right] \tilde{\lambda }\right) \nonumber \\&\ge n \left( n^{-1/2}\Vert \hat{{\hat{g}}}\Vert -C\Vert \tilde{\lambda }\Vert ^2 \right) \nonumber \\&= \left( n^{1/2}\Vert \hat{{\hat{g}}}\Vert -C\right) \end{aligned}$$


Second, by the definition of \({\hat{\lambda }}\),

$$\begin{aligned} P^*_n({\hat{\beta }},{\hat{\lambda }}) = \max _{\lambda \in {\hat{{\varLambda }}}_n({\hat{\beta }})}P^*_n({\hat{\beta }},\lambda ) \ge P^*_n({\hat{\beta }},\tilde{\lambda }). \end{aligned}$$

Third, the central limit theorem yields \({\hat{g}}^*(\beta _0)=O_p(n^{-1/2})\). Applying Lemma 4 with \({\bar{\beta }}=\beta _0\), we get \(\sup _{\lambda \in {\hat{{\varLambda }}}_n(\beta _0)}P^*_n(\beta _0,\lambda ) = O_p(1)\). Thus we obtain

$$\begin{aligned} P^*_n({\hat{\beta }},{\hat{\lambda }}) = \min _{\beta \in \mathcal{B}} \sup _{\lambda \in {\hat{{\varLambda }}}_n(\beta )}P^*_n(\beta ,\lambda ) \le \sup _{\lambda \in {\hat{{\varLambda }}}_n(\beta _0)}P^*_n(\beta _0,\lambda ) = O_p(1). \end{aligned}$$

Finally, from (35)–(37), \( n^{1/2}\Vert \hat{{\hat{g}}}\Vert = O_p(1), \) which implies the assertion of this lemma. \(\square \)
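To see the device behind the lower bound at the start of the proof of Lemma 5 in the simplest case, suppose, purely for illustration, that \(\rho (v)=-(1+v)^2/2\) (the continuous-updating choice in the normalization of Newey and Smith 2004) and that the criterion is \(P_n(\lambda )=\sum _t\{\rho (\lambda ^\top g_t)-\rho (0)\}\) with n summands, sample mean \({\hat{g}}\) and \({\hat{\varOmega }}=n^{-1}\sum _t g_tg_t^\top \); this scaling is an assumption and need not match the definition of \(P^*_n\) in the main text. With \(u={\hat{g}}/\Vert {\hat{g}}\Vert \) and \(\tilde{\lambda }=-n^{-1/2}u\), the criterion is exactly quadratic and

$$\begin{aligned} P_n(\tilde{\lambda })&= -n\tilde{\lambda }^\top {\hat{g}} - \frac{n}{2}\tilde{\lambda }^\top {\hat{\varOmega }}\tilde{\lambda } = n^{1/2}\Vert {\hat{g}}\Vert - \frac{1}{2}u^\top {\hat{\varOmega }}u \ge n^{1/2}\Vert {\hat{g}}\Vert - \frac{1}{2}\lambda _{\max }({\hat{\varOmega }}), \end{aligned}$$

so the constant C can be taken as \(\lambda _{\max }({\hat{\varOmega }})/2\), which is \(O_p(1)\) whenever \({\hat{\varOmega }}\) is consistent. Combined with \(P_n(\tilde{\lambda })\le P_n({\hat{\lambda }})=O_p(1)\), this forces \(n^{1/2}\Vert {\hat{g}}\Vert =O_p(1)\).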

Lemma 6

\(\sup _{\beta \in \mathcal{B}}\Vert {\hat{g}}^*(\beta ) - g^*(\beta )\Vert = o_p(1)\).

Since \({\hat{g}}^*(\beta )\) contains a component that is not smooth in \(\beta \), Corollary 2.2 of Newey (1991) is not easy to apply, and it is also hard to verify the stochastic equicontinuity of \(\{{\hat{g}}^*(\beta ):n\ge 1\}\) directly. We therefore use the approach of Tauchen (1985), which is based on Huber (1967).


Let us define

$$\begin{aligned} h_t(\beta ,{\tau }) = \sup _{\Vert \tilde{\beta }- \beta \Vert <{\tau }}\left\| g_t^*(\tilde{\beta }) - g_t^*(\beta ) \right\| \end{aligned}$$

and show that \(h_t(\beta ,{\tau })\rightarrow 0\) almost surely as \({\tau }\rightarrow 0\). For this, it suffices to show that \(g_t^*(\beta )\) is continuous at each \(\beta \) with probability one. By the definition of \(\varepsilon _t(\beta )\), \(\mathrm{sign}\{\varepsilon _t(\beta )\}\) is continuous at each \(\beta \) with probability one, and hence \(h_t(\beta ,{\tau })\rightarrow 0\) almost surely as \({\tau }\rightarrow 0\). Since \(g_t^*(\beta )\) is bounded uniformly in t and \(\beta \) (see the proof of Lemma 2), so is \(h_t(\beta ,{\tau })\); thus, by dominated convergence, for any \(\epsilon >0\) and each \(\beta \) there exists \({\tau }(\beta )\) such that \(\mathbb {E}[h_t(\beta ,{\tau })]\le \epsilon /4\) for all \({\tau }\le {\tau }(\beta )\). Next, define \(B(\beta ,{\tau }) = \left\{ b\in \mathcal{B}: \Vert b-\beta \Vert <{\tau }\right\} \). By the compactness of \(\mathcal{B}\), there exist \(\beta _1,\ldots ,\beta _K\) such that \(\{B(\beta _1,{\tau }(\beta _1)),\ldots ,B(\beta _K,{\tau }(\beta _K))\}\) is a finite open covering of \(\mathcal{B}\). Let \({\tau }_k = {\tau }(\beta _k)\) and \(\mu _k = \mathbb {E}[h_t(\beta _k,{\tau }_k)]\). By the definition of \({\tau }_k\), \(\mu _k \le \epsilon /4\) for all \(k=1,\ldots ,K\). Now fix any \(\beta \in \mathcal{B}\) and choose k such that \(\beta \in B(\beta _k,{\tau }_k)\). Then

$$\begin{aligned} \left\| {\hat{g}}^*(\beta ) - g^*(\beta )\right\|&\le \left( \frac{1}{n-u}\sum ^{n}_{t=u+1}\left\| g^*_t(\beta ) - g^*_t(\beta _k) \right\| -\mu _k\right) + \mu _k\\&\quad +\left\| \frac{1}{n-u}\sum ^{n}_{t=u+1}g_t^*(\beta _k) - g^*(\beta _k) \right\| + \left\| g^*(\beta _k) - g^*(\beta ) \right\| \\&\le \left( \frac{1}{n-u}\sum ^{n}_{t=u+1}h_t(\beta _k,{\tau }_k) - \mu _k\right) + \mu _k\\&\quad +\left\| \frac{1}{n-u}\sum ^{n}_{t=u+1}g_t^*(\beta _k) - g^*(\beta _k) \right\| + \left\| g^*(\beta _k) - g^*(\beta ) \right\| . \end{aligned}$$

By the ergodic theorem, \((n-u)^{-1}\sum ^{n}_{t=u+1}h_t(\beta _k,{\tau }_k)\xrightarrow {\text {a.s.}}\mu _k\) as \(n\rightarrow \infty \). Therefore, there exists \(n_{1k}(\epsilon )\in \mathbb {N}\) such that \(|(n-u)^{-1}\sum ^{n}_{t=u+1}h_t(\beta _k,{\tau }_k)-\mu _k|\le \epsilon /4\) a.s. for all \(n\ge n_{1k}(\epsilon )\). Similarly, there exists \(n_{2k}(\epsilon )\in \mathbb {N}\) such that \(\Vert (n-u)^{-1}\sum ^{n}_{t=u+1}g^*_t(\beta _k)-g^*(\beta _k)\Vert \le \epsilon /4\) a.s. for all \(n\ge n_{2k}(\epsilon )\). Furthermore,

$$\begin{aligned} \left\| g^*(\beta _k) - g^*(\beta ) \right\|&= \left\| \mathbb {E}\left[ g^*_t(\beta _k) - g_t^*(\beta )\right] \right\| \\&\le \mathbb {E}\left[ \left\| g_t^*(\beta _k) - g_t^*(\beta )\right\| \right] \\&\le \mathbb {E}\left[ \sup _{\beta , \Vert \beta -\beta _k\Vert <{\tau }_k}\left\| g_t^*(\beta _k) - g_t^*(\beta )\right\| \right] \\&= \mathbb {E}\left[ h_t(\beta _k,{\tau }_k)\right] \\&= \mu _k\\&\le \epsilon /4. \end{aligned}$$

Finally, let \(n_k(\epsilon ) = \max \{n_{1k}(\epsilon ),n_{2k}(\epsilon )\}\). Then for any \(\epsilon >0\), it holds that

$$\begin{aligned} \left\| {\hat{g}}^*(\beta ) - g^*(\beta ) \right\| \le \epsilon \quad \text { for all } n\ge n_k(\epsilon )\text { a.s.} \end{aligned}$$

Thus, we obtain

$$\begin{aligned} \sup _{\beta \in \mathcal{B}}\left\| {\hat{g}}^*(\beta ) - g^*(\beta ) \right\| \le \epsilon \quad \text { for all }n\ge n(\epsilon )\text { a.s.,} \end{aligned}$$

where \(n(\epsilon ) = \max \{n_k(\epsilon ) : k=1,\ldots ,K\}\). Hence we get the desired result. \(\square \)
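Lemma 6 replaces a stochastic-equicontinuity argument by a finite-cover argument, which works even though \(g_t^*(\beta )\) jumps in \(\beta \). The Python sketch below illustrates the conclusion in a deliberately simple toy case that is not the paper's model: i.i.d. standard Cauchy data \(y_t\) and the nonsmooth moment \(\mathrm{sign}(y_t-\beta )\), whose mean is \(1-2F(\beta ) = -(2/\pi )\arctan \beta \). The supremum over a grid of the compact set \([-5,5]\) of the gap between the sample mean and its expectation shrinks as n grows; the grid, the parameter set and the toy moment are illustrative assumptions.

```python
import bisect
import math
import random

def sup_error(n, grid, rng):
    """max over grid of |n^{-1} sum_t sign(y_t - beta) - E sign(Y - beta)|, y_t iid Cauchy."""
    y = sorted(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n))
    worst = 0.0
    for beta in grid:
        below = bisect.bisect_left(y, beta)          # number of y_t < beta
        above = n - bisect.bisect_right(y, beta)     # number of y_t > beta (sign(0) = 0)
        g_hat = (above - below) / n                  # sample mean of sign(y_t - beta)
        g_true = -2.0 / math.pi * math.atan(beta)    # 1 - 2F(beta) for the standard Cauchy
        worst = max(worst, abs(g_hat - g_true))
    return worst

rng = random.Random(3)
grid = [-5.0 + 0.01 * k for k in range(1001)]        # grid on the compact set [-5, 5]
errs = {n: sup_error(n, grid, rng) for n in (100, 1_000, 10_000, 100_000)}
for n, e in errs.items():
    print(f"n = {n:>6}: sup |g_hat - g| = {e:.4f}")
```

In this toy case the supremum equals twice a Kolmogorov-Smirnov distance restricted to the grid, so it decays at the rate \(n^{-1/2}\); the lemma itself only needs uniform consistency.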

Lemma 7

\({\hat{\beta }}-\beta _0 = O_p(n^{-1/2})\).


It follows from the triangle inequality and Lemmas 5 and 6 that

$$\begin{aligned} \Vert g^*({\hat{\beta }})\Vert&\le \Vert g^*({\hat{\beta }}) - {\hat{g}}^*({\hat{\beta }})\Vert + \Vert {\hat{g}}^*({\hat{\beta }})\Vert \\&\le \sup _{\beta \in \mathcal{B}}\Vert g^*(\beta ) - {\hat{g}}^*(\beta )\Vert + \Vert {\hat{g}}^*({\hat{\beta }})\Vert = o_p(1). \end{aligned}$$

Since \(g^*(\beta )-\mathbb {E}[g_t^{*0}(\beta )]=o_p(1)\) uniformly in \(\beta \) and \(\mathbb {E}[g_t^{*0}(\beta )]\) has a unique zero at \(\beta _0\) by condition (C5), \(\Vert g^*(\beta )\Vert \) must be bounded away from zero outside any neighborhood of \(\beta _0\). Therefore, \({\hat{\beta }}\) lies inside any given neighborhood of \(\beta _0\) w.p.a.1; that is, \({\hat{\beta }}\xrightarrow {\mathcal{P}}\beta _0\).

Next, we show that \({\hat{\beta }}-\beta _0 = O_p(n^{-1/2})\). By Lemma 5, \({\hat{g}}^*({\hat{\beta }}) =O_p(n^{-1/2})\) and by the central limit theorem, \({\hat{g}}^*(\beta _0)\) is also \(O_p(n^{-1/2})\). Further, from Lemma 1,

$$\begin{aligned} \Vert {\hat{g}}^*({\hat{\beta }}) - {\hat{g}}^*(\beta _0) - g^*({\hat{\beta }})\Vert \le (1+{n^{1/2}}\Vert {\hat{\beta }}-\beta _0\Vert )o_p(n^{-1/2}). \end{aligned}$$

Hence, by the triangle inequality,

$$\begin{aligned} \Vert g^*({\hat{\beta }})\Vert&\le \Vert {\hat{g}}^*({\hat{\beta }})-{\hat{g}}^*(\beta _0)-g^*({\hat{\beta }}) \Vert + \Vert {\hat{g}}^*({\hat{\beta }})\Vert + \Vert {\hat{g}}^*(\beta _0)\Vert \\&= (1+{n^{1/2}}\Vert {\hat{\beta }}-\beta _0\Vert )o_p(n^{-1/2}) + O_p(n^{-1/2}). \end{aligned}$$

In addition, by an argument similar to that of Newey and McFadden (1994, p. 2191) and the differentiability of \(\Vert g^*(\beta )\Vert \), \(\Vert g^*({\hat{\beta }})\Vert \ge C\Vert {\hat{\beta }}-\beta _0\Vert \) w.p.a.1. Therefore, we get

$$\begin{aligned} \Vert {\hat{\beta }} - \beta _0\Vert \le (1+{n^{1/2}}\Vert {\hat{\beta }}-\beta _0\Vert )o_p(n^{-1/2}) + O_p(n^{-1/2}) \end{aligned}$$

and hence \(\Vert {\hat{\beta }}-\beta _0\Vert = O_p(n^{-1/2})/\{1+o_p(1)\} = O_p(n^{-1/2})\). \(\square \)
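The final step of the proof of Lemma 7 can be spelled out as follows. Let \(\epsilon _n\ge 0\) denote the \(o_p(1)\) factor in the remainder \((1+n^{1/2}\Vert {\hat{\beta }}-\beta _0\Vert )o_p(n^{-1/2})\), so that this remainder is bounded by \(\epsilon _n n^{-1/2}+\epsilon _n\Vert {\hat{\beta }}-\beta _0\Vert \). Combining the lower bound \(\Vert g^*({\hat{\beta }})\Vert \ge C\Vert {\hat{\beta }}-\beta _0\Vert \) with the upper bound on \(\Vert g^*({\hat{\beta }})\Vert \) gives, w.p.a.1,

$$\begin{aligned} C\Vert {\hat{\beta }}-\beta _0\Vert \le \epsilon _n n^{-1/2} + \epsilon _n\Vert {\hat{\beta }}-\beta _0\Vert + O_p(n^{-1/2}) \quad \Longrightarrow \quad (C-\epsilon _n)\Vert {\hat{\beta }}-\beta _0\Vert \le O_p(n^{-1/2}). \end{aligned}$$

Since \(\epsilon _n\xrightarrow {\mathcal{P}}0\), we have \(C-\epsilon _n\ge C/2\) w.p.a.1, and therefore \(\Vert {\hat{\beta }}-\beta _0\Vert \le (2/C)\,O_p(n^{-1/2}) = O_p(n^{-1/2})\).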

To prove Theorem 1, we first show, following Parente and Smith (2011), that \(P^*_{n}(\beta ,\lambda )\) is well approximated near its saddle point by a smooth function. Let us define

$$\begin{aligned}&L^*_n(\beta ,\lambda ) = -n\{ G(\beta -\beta _0) + {\hat{g}}^*(\beta _0) \}^{\top }\lambda - \frac{n}{2}\lambda ^{\top }{\varOmega }\lambda . \end{aligned}$$

Furthermore, we hereafter redefine

$$\begin{aligned} \tilde{\beta }= \arg \min _{\beta \in \mathcal{B}}\sup _{\lambda \in \mathbb {R}^{m+d}} L^*_n(\beta ,\lambda ) \quad \text { and }\quad \tilde{\lambda }= \arg \max _{\lambda \in \mathbb {R}^{m{+d}}} L^*_n(\tilde{\beta },\lambda ). \end{aligned}$$

Lemma 8

\(P^*_{n}({\hat{\beta }},{\hat{\lambda }}) = L^*_{n}(\tilde{\beta },\tilde{\lambda }) + o_p(1)\).


It suffices to show the following three relations:

  (i) \(P^*_{n}({\hat{\beta }},{\hat{\lambda }}) - L^*_{n}({\hat{\beta }},{\hat{\lambda }})=o_p(1)\),

  (ii) \(L^*_{n}({\hat{\beta }},{\hat{\lambda }}) - L^*_{n}(\tilde{\beta },{\hat{\lambda }})=o_p(1)\),

  (iii) \(L^*_{n}(\tilde{\beta },{\hat{\lambda }}) - L^*_{n}(\tilde{\beta },\tilde{\lambda })=o_p(1)\).

For (i), a Taylor expansion yields

$$\begin{aligned} P^*_n({\hat{\beta }},{\hat{\lambda }})&= -n{\hat{\lambda }}^{\top } {\hat{g}}^*({\hat{\beta }}) +\frac{n}{2}{\hat{\lambda }}^{\top } \left[ \frac{1}{n-u}\sum ^{n}_{t=u+1} \rho '\{\ddot{\lambda }^{\top }g^*_t({\hat{\beta }})\} g_t^*({\hat{\beta }})g_t^*({\hat{\beta }})^{\top }\right] {\hat{\lambda }}, \end{aligned}$$

where \(\ddot{\lambda }\) is on the line joining \(0_{m{+d}}\) and \({\hat{\lambda }}\). Then

$$\begin{aligned} \left| P^*_n({\hat{\beta }},{\hat{\lambda }}) - L^*_n({\hat{\beta }},{\hat{\lambda }})\right|&\le \left| -n\left( {\hat{g}}^*({\hat{\beta }}) - {\hat{g}}^*(\beta _0) - G({\hat{\beta }}-\beta _0) \right) ^{\top }{\hat{\lambda }} \right| \end{aligned}$$
$$\begin{aligned}&\quad + \left| \frac{n}{2}{\hat{\lambda }}^{\top } \left[ \frac{1}{n-u}\sum ^{n}_{t=u+1} \rho '\{\ddot{\lambda }^{\top }g^*_t({\hat{\beta }})\} g_t^*({\hat{\beta }})g_t^*({\hat{\beta }})^{\top } + {\varOmega }\right] {\hat{\lambda }} \right| . \end{aligned}$$

Since \({\hat{\beta }}\xrightarrow {\mathcal{P}}\beta _0\) by Lemma 7 and \({\hat{g}}^*({\hat{\beta }})=O_p(n^{-1/2})\) by Lemma 5, Lemma 4 applies with \({\bar{\beta }}={\hat{\beta }}\), and hence \({\hat{\lambda }}=O_p(n^{-1/2})\). Then, recalling (38), the quantity (39) satisfies

$$\begin{aligned}&\left| -n\left( {\hat{g}}^*({\hat{\beta }}) - {\hat{g}}^*(\beta _0) - G({\hat{\beta }}-\beta _0) \right) ^{\top }{\hat{\lambda }} \right| \\&\quad \le n\left\{ \left\| {\hat{g}}^*({\hat{\beta }}) - {\hat{g}}^*(\beta _0) - g^*({\hat{\beta }})\right\| +\left\| g^*({\hat{\beta }}) - G({\hat{\beta }} - \beta _0) \right\| \right\} \left\| {\hat{\lambda }} \right\| \\&\quad =\left\{ \left( 1+{n^{1/2}}\left\| {\hat{\beta }}-\beta _0\right\| \right) o_p(n^{-1/2}) + O_p\left( \left\| {\hat{\beta }} - \beta _0 \right\| ^2\right) \right\} O_p(n^{1/2})\\&\quad =o_p(1). \end{aligned}$$

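Moreover, since \({\hat{\lambda }}=O_p(n^{-1/2})\) and, as in the proof of Lemma 4, Lemmas 2 and 3 imply that the bracketed matrix in (40) converges to zero in probability, (40) is \(o_p(1)\). Hence \( \left| P^*_n({\hat{\beta }},{\hat{\lambda }}) - L^*_n({\hat{\beta }},{\hat{\lambda }})\right| = o_p(1) \).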

To obtain (ii), we first show that \(| P^*_{n}(\tilde{\beta },{\hat{\lambda }}) -L^*_{n}(\tilde{\beta },{\hat{\lambda }}) | = o_p(1)\). Note that \(L^*_{n}(\beta ,\lambda )\) is smooth in \(\beta \) and \(\lambda \), so the first-order conditions for an interior saddle point,

$$\begin{aligned}&0_m = \frac{\partial L^*_{n}(\beta ,\lambda )}{\partial \beta } = -nG^{\top }\lambda , \end{aligned}$$
$$\begin{aligned}&0_{m{+d}} = \frac{\partial L^*_{n}(\beta ,\lambda )}{\partial \lambda } = -n\left\{ G(\beta -\beta _0) +{{\hat{g}}^*(\beta _0)} + {\varOmega }\lambda \right\} \end{aligned}$$

are satisfied at \((\beta ^{\top },\lambda ^{\top })^{\top } = (\tilde{\beta }^{\top },\tilde{\lambda }^{\top })^{\top }\). The conditions above are stacked as

$$\begin{aligned} \left( \begin{array}{ll} O_{m\times m}&{} G^{\top }\\ G &{} {\varOmega }\end{array} \right) \left( \begin{array}{c} \tilde{\beta }-\beta _0\\ \tilde{\lambda }\end{array} \right) + \left( \begin{array}{c} 0_{m}\\ {{\hat{g}}^*(\beta _0)} \end{array} \right) = 0_{2m{+d}}. \end{aligned}$$

Now, set

$$\begin{aligned} {\varSigma }=(G^{\top }{\varOmega }^{-1}G)^{-1},\quad H={\varOmega }^{-1}G{\varSigma }\quad \text {and}\quad P={\varOmega }^{-1}-H{\varSigma }^{-1}H^{\top }. \end{aligned}$$

Then (43) is equivalent to

$$\begin{aligned} \left( \begin{array}{c} \tilde{\beta }-\beta _0\\ \tilde{\lambda }\end{array} \right) = -\left( \begin{array}{lll} {\varSigma }&{}\quad -H^{\top } \\ -H &{}\quad -P \end{array} \right) \left( \begin{array}{c} 0_m\\ -{{\hat{g}}^*(\beta _0)} \end{array} \right) = \left( \begin{array}{c} -H^{\top } {{\hat{g}}^*(\beta _0)} \\ -P {{\hat{g}}^*(\beta _0)} \end{array} \right) , \end{aligned}$$

so both \(\tilde{\beta }- \beta _0\) and \(\tilde{\lambda }\) are \(O_p(n^{-1/2})\), since \(n^{1/2}{\hat{g}}^*(\beta _0) = O_p(1)\). Therefore, by the same argument as in part (i) of this proof, \(| P^*_{n}(\tilde{\beta },{\hat{\lambda }}) - L^*_{n}(\tilde{\beta },{\hat{\lambda }}) | = o_p(1)\). This relationship and the fact that \(({\hat{\beta }}^{\top },{\hat{\lambda }}^{\top })^{\top }\) and \((\tilde{\beta }^{\top },\tilde{\lambda }^{\top })^{\top }\) are, respectively, the saddle points of \(P^*_{n}(\beta ,\lambda )\) and \(L^*_{n}(\beta ,\lambda )\) imply that

$$\begin{aligned} L^*_{n}({\hat{\beta }},{\hat{\lambda }}) = P^*_{n}({\hat{\beta }},{\hat{\lambda }}) + o_p(1) \le P^*_{n}(\tilde{\beta },{\hat{\lambda }}) + o_p(1) = L^*_{n}(\tilde{\beta },{\hat{\lambda }}) + o_p(1). \end{aligned}$$

On the other hand,

$$\begin{aligned} L^*_{n}(\tilde{\beta },{\hat{\lambda }})&\le L^*_{n}(\tilde{\beta },\tilde{\lambda })\nonumber \\&\le L^*_{n}({\hat{\beta }},\tilde{\lambda })= P^*_{n}({\hat{\beta }},\tilde{\lambda }) + o_p(1)\nonumber \\&\le P^*_{n}({\hat{\beta }},{\hat{\lambda }}) + o_p(1) = L^*_{n}({\hat{\beta }},{\hat{\lambda }}) + o_p(1). \end{aligned}$$

Thus, (45) and (46) yield \(L^*_{n}({\hat{\beta }},{\hat{\lambda }}) - L^*_{n}(\tilde{\beta },{\hat{\lambda }})=o_p(1)\).

Part (iii) is proved similarly. Indeed,

$$\begin{aligned} L^*_{n}(\tilde{\beta },\tilde{\lambda })&\le L^*_{n}({\hat{\beta }},\tilde{\lambda }) = P^*_{n}({\hat{\beta }},\tilde{\lambda }) + o_p(1)\\&\le P^*_{n}({\hat{\beta }},{\hat{\lambda }} ) + o_p(1)\\&\le P^*_{n}(\tilde{\beta },{\hat{\lambda }} ) + o_p(1) = L^*_{n}(\tilde{\beta },{\hat{\lambda }} ) + o_p(1) \end{aligned}$$

and \( L^*_{n}(\tilde{\beta },{\hat{\lambda }} )\le L^*_{n}(\tilde{\beta },\tilde{\lambda }).\) Combining these two relations gives \(L^*_{n}(\tilde{\beta },{\hat{\lambda }} )= L^*_{n}(\tilde{\beta },\tilde{\lambda })+o_p(1)\).

Proof of Theorem 1.

First, we show the assertion (i) by establishing the relations \({n^{1/2}}({\hat{\beta }}-\tilde{\beta }) = o_p(1)\) and \({n^{1/2}}({\hat{\lambda }}-\tilde{\lambda }) = o_p(1)\). For \({\hat{\beta }}\), by \(L^*_n({\hat{\beta }},{\hat{\lambda }}) - L^*_n(\tilde{\beta },{\hat{\lambda }}) = o_p(1)\), we have

$$\begin{aligned} o_p\left( \frac{1}{n}\right) = \frac{1}{n}(L^*_n({\hat{\beta }},{\hat{\lambda }}) - L^*_n(\tilde{\beta },{\hat{\lambda }})) = -\left( {\hat{\beta }}-\tilde{\beta }\right) ^\top G^\top {\hat{\lambda }}. \end{aligned}$$

Since G has full column rank and \({\hat{\lambda }} = O_p(n^{-1/2})\) by Lemma 4, we get \({\hat{\beta }}-\tilde{\beta }= o_p(n^{-1/2})\). For \({\hat{\lambda }}\), from (iii) in the proof of Lemma 8, (41) and (42), we obtain

$$\begin{aligned} o_p\left( \frac{1}{n}\right) = \frac{1}{n}(L^*_n(\tilde{\beta },{\hat{\lambda }}) - L^*_n(\tilde{\beta },\tilde{\lambda })) = -\frac{1}{2}\left( {\hat{\lambda }} - \tilde{\lambda }\right) ^\top {\varOmega }\left( {\hat{\lambda }} - \tilde{\lambda }\right) . \end{aligned}$$
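The identity above is exact for a quadratic \(L^*_n\) and uses only that \(\tilde{\lambda }\) maximizes \(L^*_n(\tilde{\beta },\cdot )\). A minimal numerical check in Python with NumPy follows. It assumes \(L^*_n(\beta ,\lambda ) = -n\{G(\beta -\beta _0)+{\hat{g}}^*(\beta _0)\}^{\top }\lambda - \frac{n}{2}\lambda ^{\top }{\varOmega }\lambda \), the quadratic whose derivatives are the first-order conditions in the proof of Lemma 8 (an additive constant, if present, cancels in the difference); all inputs are random illustrative values.

```python
import numpy as np

def L_star(b, lam, G, Omega, g_hat, n):
    """Assumed form of L*_n with b = beta - beta_0: the quadratic whose
    gradients are -n G' lam and -n (G b + g_hat + Omega lam)."""
    return -n * (G @ b + g_hat) @ lam - 0.5 * n * lam @ Omega @ lam

rng = np.random.default_rng(1)
n, m, d = 200, 3, 2                               # illustrative values only
k = m + d
G = rng.standard_normal((k, m))
A = rng.standard_normal((k, k))
Omega = A @ A.T + k * np.eye(k)
g_hat = rng.standard_normal(k) / np.sqrt(n)       # mimics g*_hat(beta_0) = O_p(n^{-1/2})

# Saddle point (b_tilde, lam_tilde) from the stacked first-order conditions.
K = np.block([[np.zeros((m, m)), G.T], [G, Omega]])
sol = np.linalg.solve(K, -np.concatenate([np.zeros(m), g_hat]))
b_tilde, lam_tilde = sol[:m], sol[m:]

lam = lam_tilde + rng.standard_normal(k)          # an arbitrary multiplier value
lhs = (L_star(b_tilde, lam, G, Omega, g_hat, n)
       - L_star(b_tilde, lam_tilde, G, Omega, g_hat, n)) / n
rhs = -0.5 * (lam - lam_tilde) @ Omega @ (lam - lam_tilde)
print(np.isclose(lhs, rhs))                       # True
```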

Since \({\varOmega }\) is nonsingular, and hence positive definite as a variance matrix, \({\hat{\lambda }} - \tilde{\lambda }= o_p(n^{-1/2})\). As in the proof of Lemma 1 of Li et al. (2011) and by (C3), we have

$$\begin{aligned} {n^{1/2}}{\hat{g}}^*(\beta _0) - \frac{1}{{n^{1/2}}}\sum ^{n}_{t=u+1}m_t^* = o_p\left( 1\right) , \end{aligned}$$

where \(m_t^* = \delta _{t-1} \text {sign}(e_t) Q_{t-1}\). Since \(\{m_t^*\}\) is a stationary, ergodic, square-integrable martingale difference sequence with respect to \(\mathcal{F}_t = \sigma \{e_s : s\le t\}\) with \(E(m_t^* m_t^{*\top }) = {\varOmega }\), Theorem 3.2 of Hall and Heyde (1980) gives \(n^{-1/2}\sum ^{n}_{t=u+1}m_t^* \xrightarrow {\mathcal{L}}N(0_{m+d}, {\varOmega })\). Therefore, \({n^{1/2}}{\hat{g}}^*(\beta _0) \xrightarrow {\mathcal{L}}N(0_{m+d}, {\varOmega })\), and the desired result follows from (44).
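The key point of this step is that the self-weighted sign score has finite variance even when the innovations do not. A toy Monte Carlo sketch in Python with NumPy illustrates this; it is not the paper's estimator. It assumes an AR(1) model with standard Cauchy innovations, replaces \(Q_{t-1}\) by \(y_{t-1}\), and uses the hypothetical weight \(w_{t-1} = 1/(1+|y_{t-1}|)\) in place of \(\delta _{t-1}\). The self-normalized sums should then be approximately standard normal.

```python
import numpy as np

def self_normalized_scores(n, phi, reps, rng):
    """Simulate `reps` AR(1) paths y_t = phi * y_{t-1} + e_t with standard Cauchy
    innovations and return sum_t m_t / sqrt(sum_t m_t^2) for each path, where
    m_t = w_{t-1} * sign(e_t) * y_{t-1} is a toy self-weighted sign score."""
    y_prev = np.zeros(reps)
    s1 = np.zeros(reps)                     # running sum of m_t
    s2 = np.zeros(reps)                     # running sum of m_t^2
    for _ in range(n):
        e = rng.standard_cauchy(reps)       # infinite-variance innovations
        w = 1.0 / (1.0 + np.abs(y_prev))    # hypothetical self-weight, depends on the past only
        m_t = w * np.sign(e) * y_prev       # martingale difference with |m_t| < 1
        s1 += m_t
        s2 += m_t ** 2
        y_prev = phi * y_prev + e
    return s1 / np.sqrt(s2)

rng = np.random.default_rng(2024)
z = self_normalized_scores(n=500, phi=0.5, reps=4000, rng=rng)
print(z.mean(), z.std())                    # expected to be near 0 and 1
```

The bounded weight is what keeps \(E(m_t^2)\) finite although \(e_t\) has infinite variance; self-normalization is used here only so that the toy example does not need the variance of the score.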

Second, we show the assertion (ii). By Lemma 8, we expand the second part of (13) as

$$\begin{aligned} 2P^*_n({\hat{\beta }},{\hat{\lambda }})&= 2L^*_n(\tilde{\beta },\tilde{\lambda }) + o_p(1)\nonumber \\&= n {\hat{g}}^*(\beta _0)^{\top }\left( {\varOmega }^{-1}-{\varOmega }^{-1} G {\varSigma }G^{\top }{\varOmega }^{-1} \right) {\hat{g}}^*(\beta _0) + o_p(1)\nonumber \\&= \left\{ n^{1/2}{\varOmega }^{-1/2}{\hat{g}}^*(\beta _0)\right\} ^{\top } {\varLambda }\left\{ n^{1/2}{\varOmega }^{-1/2}{\hat{g}}^*(\beta _0)\right\} + o_p(1), \end{aligned}$$

where \({\varLambda }= I_{m+d} -{\varOmega }^{-1/2} G {\varSigma }G^{\top }{\varOmega }^{-1/2}\). On the other hand, by the same argument as above,

$$\begin{aligned} 2\inf _{R\beta =c}\sup _{\lambda \in {\hat{{\varLambda }}}_n(\beta )}P^*_n(\beta ,\lambda )&= 2L^*_n(\tilde{\beta }^r,\tilde{\lambda }^r) + o_p(1), \end{aligned}$$

where \( \tilde{\beta }^r = \arg \min _{R\beta = c}\sup _{\lambda \in \mathbb {R}^{{m+d}}}L^*_n(\beta , \lambda ) \) and \( \tilde{\lambda }^r = \arg \max _{\lambda \in \mathbb {R}^{{m+d}}}L^*_n(\tilde{\beta }^r,\lambda )\). We first derive the stochastic expansion for \(L^*_n(\tilde{\beta }^r,\tilde{\lambda }^r)\). If we define the Lagrangian

$$\begin{aligned} F(\beta ,\lambda ,\eta ) = -\{ G(\beta -\beta _0) + {\hat{g}}^*(\beta _0) \}^{\top }\lambda - \frac{1}{2}\lambda ^{\top }{\varOmega }\lambda +\eta ^{\top }(R\beta -c), \end{aligned}$$

then the following first-order conditions hold:

$$\begin{aligned}&0_{m} = \frac{\partial F(\beta , \lambda , \eta )}{\partial \beta } = -G^{\top }\lambda +R^\top \eta , \nonumber \\&0_{m{+d}} = \frac{\partial F(\beta , \lambda , \eta )}{\partial \lambda } = -G(\beta -\beta _0) - {\hat{g}}^*(\beta _0) - {\varOmega }\lambda , \\&0_r = \frac{\partial F(\beta , \lambda , \eta )}{\partial \eta } = R\beta - c. \end{aligned}$$

Solving these equations under the null hypothesis \(R\beta _0 = c\), we finally obtain

$$\begin{aligned} 2\inf _{R\beta =c}\sup _{\lambda \in {\hat{{\varLambda }}}_n(\beta )}P^*_n(\beta ,\lambda )&= \left\{ n^{1/2}{\varOmega }^{-1/2}{\hat{g}}^*(\beta _0)\right\} ^{\top } {\varLambda }^r \left\{ n^{1/2}{\varOmega }^{-1/2}{\hat{g}}^*(\beta _0)\right\} + o_p(1), \end{aligned}$$

where \({\varLambda }^r = I_{m{+d}} -{\varOmega }^{-1/2} G P^r G^{\top }{\varOmega }^{-1/2}\) and \(P^r = {\varSigma }-{\varSigma }R^{\top }(R{\varSigma }R^{\top })^{-1}R{\varSigma }\).
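Both stochastic expansions, the unrestricted one for \(2P^*_n({\hat{\beta }},{\hat{\lambda }})\) and the restricted one just obtained, rest on exact algebra for the quadratic \(L^*_n\), so they can be checked numerically. The sketch below (Python with NumPy) solves the two sets of first-order conditions with a dense linear solver and compares \(2L^*_n\) at each solution with the corresponding quadratic form in \({\varLambda }\) and \({\varLambda }^r\). It assumes the quadratic form of \(L^*_n\) implied by the first-order conditions, writes \(b=\beta -\beta _0\) so that the restriction becomes \(Rb=0\) under the null hypothesis \(R\beta _0=c\), and uses random illustrative inputs.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def L_star(b, lam, G, Omega, g_hat, n):
    """Assumed quadratic form of L*_n, with b = beta - beta_0."""
    return -n * (G @ b + g_hat) @ lam - 0.5 * n * lam @ Omega @ lam

rng = np.random.default_rng(11)
n, m, d, r = 100, 3, 2, 2                        # illustrative values only
k = m + d
G = rng.standard_normal((k, m))
A = rng.standard_normal((k, k))
Omega = A @ A.T + k * np.eye(k)
R = rng.standard_normal((r, m))                  # full row rank with probability one
g_hat = rng.standard_normal(k) / np.sqrt(n)

# Unrestricted saddle point: [[0, G'], [G, Omega]] (b, lam) = (0, -g_hat).
K = np.block([[np.zeros((m, m)), G.T], [G, Omega]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(m), -g_hat]))
two_L = 2 * L_star(sol[:m], sol[m:], G, Omega, g_hat, n)

# Restricted saddle point: first-order conditions of F in (b, lam, eta) with R b = 0.
K_r = np.block([
    [np.zeros((m, m)), -G.T,             R.T],
    [-G,               -Omega,           np.zeros((k, r))],
    [R,                np.zeros((r, k)), np.zeros((r, r))],
])
sol_r = np.linalg.solve(K_r, np.concatenate([np.zeros(m), g_hat, np.zeros(r)]))
b_r, lam_r = sol_r[:m], sol_r[m:m + k]
two_L_r = 2 * L_star(b_r, lam_r, G, Omega, g_hat, n)

# Closed-form expansions in terms of z = n^{1/2} Omega^{-1/2} g_hat.
W = inv_sqrt(Omega)
Sigma = np.linalg.inv(G.T @ np.linalg.inv(Omega) @ G)
P_r = Sigma - Sigma @ R.T @ np.linalg.inv(R @ Sigma @ R.T) @ R @ Sigma
Lam = np.eye(k) - W @ G @ Sigma @ G.T @ W
Lam_r = np.eye(k) - W @ G @ P_r @ G.T @ W
z = np.sqrt(n) * W @ g_hat

print(np.isclose(two_L, z @ Lam @ z), np.isclose(two_L_r, z @ Lam_r @ z))  # True True
print(np.allclose(R @ b_r, 0))   # True: the restriction holds at the restricted solution
```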

By (47) and (48), \(r_{\rho ,n}^*\) is shown to have the expansion

$$\begin{aligned} r_{\rho ,n}^* = \left\{ n^{1/2}{\varOmega }^{-1/2}{\hat{g}}^*(\beta _0)\right\} ^{\top } ({\varLambda }^r - {\varLambda }) \left\{ n^{1/2}{\varOmega }^{-1/2}{\hat{g}}^*(\beta _0)\right\} + o_p(1). \end{aligned}$$

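Using \(G^{\top }{\varOmega }^{-1}G={\varSigma }^{-1}\), one finds \({\varLambda }^r - {\varLambda }= {\varOmega }^{-1/2} G {\varSigma }R^{\top }(R{\varSigma }R^{\top })^{-1}R{\varSigma }G^{\top }{\varOmega }^{-1/2}\), from which it follows directly that \(({\varLambda }^r - {\varLambda })^2={\varLambda }^r - {\varLambda }\) and \(\mathrm{rank}({\varLambda }^r - {\varLambda }) = \mathrm{tr}({\varLambda }^r - {\varLambda }) = r\). Since \(n^{1/2}{\varOmega }^{-1/2}{\hat{g}}^*(\beta _0)\xrightarrow {\mathcal{L}}N(0_{m+d}, I_{m+d})\), it follows from Rao and Mitra (1971) that \(r_{\rho ,n}^*\xrightarrow {\mathcal{L}}\chi ^2_r\) as \(n\rightarrow \infty \).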
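The two matrix facts and the limiting argument can be illustrated numerically. In the sketch below (Python with NumPy, random illustrative inputs), \(D = {\varLambda }^r - {\varLambda }\) is built from its definitions; the code checks that \(D\) is idempotent with trace and rank \(r\), and then draws \(z \sim N(0, I_{m+d})\), the limit law of \(n^{1/2}{\varOmega }^{-1/2}{\hat{g}}^*(\beta _0)\), to compare the first two moments of \(z^{\top }Dz\) with those of \(\chi ^2_r\) (mean \(r\), variance \(2r\)).

```python
import numpy as np

rng = np.random.default_rng(3)
m, d, r = 3, 2, 2                                       # illustrative values only
k = m + d
G = rng.standard_normal((k, m))
A = rng.standard_normal((k, k))
Omega = A @ A.T + k * np.eye(k)
R = rng.standard_normal((r, m))

vals, vecs = np.linalg.eigh(Omega)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T               # Omega^{-1/2}
Sigma = np.linalg.inv(G.T @ np.linalg.inv(Omega) @ G)
P_r = Sigma - Sigma @ R.T @ np.linalg.inv(R @ Sigma @ R.T) @ R @ Sigma
Lam = np.eye(k) - W @ G @ Sigma @ G.T @ W
Lam_r = np.eye(k) - W @ G @ P_r @ G.T @ W
D = Lam_r - Lam

print(np.allclose(D @ D, D))                                 # True: idempotent
print(round(np.trace(D), 6))                                 # 2.0, i.e. r
print(int(np.sum(np.linalg.eigvalsh((D + D.T) / 2) > 0.5)))  # 2 unit eigenvalues

# If z ~ N(0, I_k), then z' D z ~ chi^2_r, with mean r and variance 2r.
Z = rng.standard_normal((200_000, k))
q = np.einsum("ij,jk,ik->i", Z, D, Z)
print(q.mean(), q.var())                                     # approximately r and 2r
```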

About this article

Cite this article

Akashi, F. Self-weighted generalized empirical likelihood methods for hypothesis testing in infinite variance ARMA models. Stat Inference Stoch Process 20, 291–313 (2017).

  • Generalized empirical likelihood
  • Linear hypothesis
  • Heavy-tailed time series
  • Infinite variance
  • Self-weighted least absolute deviations