
On estimation of nonparametric regression models with autoregressive and moving average errors

Published in: Annals of the Institute of Statistical Mathematics

Abstract

The nonparametric regression model with correlated errors is a powerful tool for time series forecasting. We are interested in the estimation of such a model, where the errors follow an autoregressive and moving average (ARMA) process, and the covariates can also be correlated. Instead of estimating the constituent parts of the model in a sequential fashion, we propose a spline-based method to estimate the mean function and the parameters of the ARMA process jointly. We establish the desirable asymptotic properties of the proposed approach under mild regularity conditions. Extensive simulation studies demonstrate that our proposed method performs well and generates strong evidence supporting the established theoretical results. Our method provides a new addition to the arsenal of tools for analyzing serially correlated data. We further illustrate the practical usefulness of our method by modeling and forecasting the weekly natural gas scraping data for the state of Iowa.


References

  • Armstrong J.S., Collopy F. (1992) Error measures for generalizing about forecasting methods: empirical comparisons. International Journal of Forecasting, 8(1), 69–80

  • Bowerman B.L., O’Connell R.T., Koehler A.B. (2005) Forecasting, time series, and regression: an applied approach, 4th ed., Boston, MA: Brooks/Cole, Cengage Learning

  • Box G.E., Jenkins G.M., Reinsel G.C., Ljung G.M. (2016) Time series analysis: forecasting and control, 5th ed., Hoboken, NJ: John Wiley and Sons

  • Brockwell P.J., Davis R.A. (1991) Time series: theory and methods, 2nd ed., Springer Series in Statistics, New York: Springer

  • Carroll R.J., Fan J., Gijbels I., Wand M.P. (1997) Generalized partially linear single-index models. Journal of the American Statistical Association, 92(438), 477–489

  • Chernozhukov V., Chetverikov D., Kato K. (2013) Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6), 2786–2819

  • Davis R.A., Dunsmuir W.T. (1997) Least absolute deviation estimation for regression with ARMA errors. Journal of Theoretical Probability, 10(2), 481–497

  • Davis R.A., Knight K., Liu J. (1992) M-estimation for autoregressions with infinite variance. Stochastic Processes and Their Applications, 40(1), 145–180

  • De Boor C. (1978) A practical guide to splines, Vol. 27, New York: Springer

  • Durbán M., Currie I.D. (2003) A note on P-spline additive models with correlated errors. Computational Statistics, 18(2), 251–262

  • Fan J. (1993) Local linear regression smoothers and their minimax efficiencies. The Annals of Statistics, 21(1), 196–216

  • Ganesh E., Rajendran V., Ravikumar D., Kumar P.S., Revathy G., Harivardhan P. (2021) Detection and route estimation of ship vessels using linear filtering and ARMA model from AIS data. International Journal of Oceans and Oceanography, 15(1), 1–10

  • Greenhouse J.B., Kass R.E., Tsay R.S. (1987) Fitting nonlinear models with ARMA errors to biological rhythm data. Statistics in Medicine, 6(2), 167–183

  • Hall P., Heyde C.C. (2014) Martingale limit theory and its application, New York: Academic Press

  • Hall P., Van Keilegom I. (2003) Using difference-based methods for inference in nonparametric regression with time series errors. Journal of the Royal Statistical Society, Series B, 65(2), 443–456

  • Hart J.D. (1994) Automated kernel smoothing of dependent data by using time series cross-validation. Journal of the Royal Statistical Society, Series B, 56(3), 529–542

  • Hart J.D., Wehrly T.E. (1986) Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association, 81(396), 1080–1088

  • Hastie T.J., Tibshirani R.J. (1990) Generalized additive models, Boca Raton: Routledge

  • Huang J.Z. (2003) Local asymptotics for polynomial spline regression. The Annals of Statistics, 31(5), 1600–1635

  • Hyndman R.J., Koehler A.B., Ord J.K., Snyder R.D. (2008) Forecasting with exponential smoothing: the state space approach, Berlin: Springer-Verlag

  • Kohn R., Ansley C.F., Wong C.-M. (1992) Nonparametric spline regression with autoregressive moving average errors. Biometrika, 79(2), 335–346

  • Krivobokova T., Kauermann G. (2007) A note on penalized spline smoothing with correlated errors. Journal of the American Statistical Association, 102(480), 1328–1337

  • Lee Y.K., Mammen E., Park B.U. (2010) Bandwidth selection for kernel regression with correlated errors. Statistics, 44(4), 327–340

  • Liang H.-Y., Jing B.-Y. (2009) Asymptotic normality in partial linear models based on dependent errors. Journal of Statistical Planning and Inference, 139(4), 1357–1371

  • Merlevède F., Peligrad M., Rio E. (2011) A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probability Theory and Related Fields, 151(3–4), 435–474

  • Miaou S.-P. (1990) A stepwise time series regression procedure for water demand model identification. Water Resources Research, 26(9), 1887–1897

  • Mokkadem A. (1988) Mixing properties of ARMA processes. Stochastic Processes and Their Applications, 29(2), 309–315

  • Opsomer J., Wang Y., Yang Y. (2001) Nonparametric regression with correlated errors. Statistical Science, 16(2), 134–153

  • Petropoulos F., Apiletti D., Assimakopoulos V., Babai M.Z., Barrow D.K., Ben Taieb S., Ziel F., et al. (2022) Forecasting: theory and practice. International Journal of Forecasting, 38(3), 705–871

  • Qiu D., Shao Q., Yang L. (2013) Efficient inference for autoregressive coefficients in the presence of trends. Journal of Multivariate Analysis, 114, 40–53

  • Roussas G.G., Tran L.T. (1992) Asymptotic normality of the recursive kernel regression estimate under dependence conditions. The Annals of Statistics, 20(1), 98–120

  • Roussas G.G., Tran L.T., Ioannides D.A. (1992) Fixed design regression for time series: asymptotic normality. Journal of Multivariate Analysis, 40(2), 262–291

  • Serra P., Krivobokova T., Rosales F. (2018) Adaptive non-parametric estimation of mean and autocovariance in regression with dependent errors. arXiv preprint arXiv:1812.06948

  • Shao Q., Yang L. (2011) Autoregressive coefficient estimation in nonparametric analysis. Journal of Time Series Analysis, 32(2), 587–597

  • Shao Q., Yang L. (2017) Oracally efficient estimation and consistent model selection for auto-regressive moving average time series with trend. Journal of the Royal Statistical Society, Series B, 79(2), 507–524

  • Stone C.J. (1980) Optimal rates of convergence for nonparametric estimators. The Annals of Statistics, 8(6), 1348–1360

  • Stone C.J. (1986) The dimensionality reduction principle for generalized additive models. The Annals of Statistics, 14(2), 590–606

  • Straumann D., Mikosch T. (2006) Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: a stochastic recurrence equations approach. The Annals of Statistics, 34(5), 2449–2495

  • Tibshirani R. (1996) Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1), 267–288

  • Tran L., Roussas G., Yakowitz S., Van B.T. (1996) Fixed-design regression for linear time series. The Annals of Statistics, 24(3), 975–991

  • Truong Y.K. (1991) Nonparametric curve estimation with time series errors. Journal of Statistical Planning and Inference, 28(2), 167–183

  • Truong-Van B., Bru N. (2001) Asymptotic normality of spline estimator when the errors are a linear stationary process. Journal of Nonparametric Statistics, 13(5), 741–761

  • Van de Geer S., Bühlmann P., Ritov Y., Dezeure R. (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3), 1166–1202

  • Volkonskii V., Rozanov Y.A. (1959) Some limit theorems for random functions. I. Theory of Probability & Its Applications, 4(2), 178–197

  • Wu R., Wang Q. (2012) Shrinkage estimation for linear regression with ARMA errors. Journal of Statistical Planning and Inference, 142(7), 2136–2148

  • Zhou S., Shen X., Wolfe D. (1998) Local asymptotics for regression splines and confidence regions. The Annals of Statistics, 26(5), 1760–1782

  • Zinde-Walsh V., Galbraith J.W. (1991) Estimation of a linear regression model with stationary ARMA(p, q) errors. Journal of Econometrics, 47(2–3), 333–357


Acknowledgements

The authors thank the two anonymous referees for their invaluable comments and suggestions that have significantly improved the quality of the paper. The natural gas scrape data were obtained by the second author through the research contract (Grant 5040224) between the Applied Mathematics Laboratory of Towson University and Exelon Generation Company LLC. This work was partially supported by the National Institutes of Health grants R03AG067611 and R21AG070659, and the National Science Foundation grant DMS-1952486.

Author information

Correspondence to Qi Zheng.


The online version of this article contains supplementary material.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 255 KB)

Appendix

A. The main results and proofs

We present the main results in this section. The proofs for Theorem 1 and Theorem 2 are provided. The remaining proofs are all relegated to the supplementary materials.

As \(\mathcal {L}_{n}({\varvec{\xi }})=\sum _{t=1}^{n} \zeta _{t}^{2}({\varvec{\xi }})\) is not convex with respect to \({\varvec{\xi }}\), due to the MA component \({\varvec{\theta }}\), in order to study the asymptotic property of \({\hat{{\varvec{\xi }}}}\), we employ a second-order Taylor’s expansion of \(\zeta _{t}({\varvec{\xi }})\) around \({\varvec{\xi }}_{*}\) (Davis and Dunsmuir, 1997): \(\zeta _{t}({\varvec{\xi }})\approx \zeta _{t}({\varvec{\xi }}_{*})-\textbf{D}_{t}^{{\top }}({\varvec{\xi }}_{*})({\varvec{\xi }}-{\varvec{\xi }}_{*})-({\varvec{\xi }}-{\varvec{\xi }}_{*})^{{\top }}\textbf{H}_{t}({\varvec{\xi }}_{*})({\varvec{\xi }}-{\varvec{\xi }}_{*})/2\), where \(\textbf{D}_{t}({\varvec{\xi }})=-\partial \zeta _{t}({\varvec{\xi }})/\partial {\varvec{\xi }}\) and \(\textbf{H}_{t}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/(\partial {\varvec{\xi }}\partial {\varvec{\xi }}^{{\top }})\).
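The objective \(\mathcal {L}_{n}({\varvec{\xi }})\) is built from the filtered residuals \(\zeta _{t}({\varvec{\xi }})\), which can be computed by the standard ARMA filtering recursion. Below is a minimal sketch, assuming the parameterization \({\varvec{\phi }}(B)=1-\sum _{i}\phi _{i}B^{i}\), \({\varvec{\theta }}(B)=1+\sum _{j}\theta _{j}B^{j}\) and zero initial conditions (a common truncation; the paper's exact convention may differ):

```python
import numpy as np

def arma_residuals(eps, phi, theta):
    """Recover the innovations zeta_t from the regression residuals eps_t,
    assuming phi(B) eps_t = theta(B) zeta_t with phi(B) = 1 - sum_i phi_i B^i,
    theta(B) = 1 + sum_j theta_j B^j, and zero initial conditions."""
    eps = np.asarray(eps, dtype=float)
    n, p, q = len(eps), len(phi), len(theta)
    zeta = np.zeros(n)
    for t in range(n):
        # AR part: apply phi(B) to the residual series
        ar = eps[t] - sum(phi[i] * eps[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # MA part: subtract the theta-weighted past innovations
        ma = sum(theta[j] * zeta[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        zeta[t] = ar - ma
    return zeta
```

The sum of squares \(\sum _{t}\zeta _{t}^{2}({\varvec{\xi }})\) of these filtered residuals is then the quantity minimized jointly over the spline coefficients and the ARMA parameters.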

We decompose \(\textbf{D}_{t}({\varvec{\xi }})\) as \((\textbf{D}_{t1}({\varvec{\xi }}), \textbf{D}_{t2}({\varvec{\xi }}), \textbf{D}_{t3}({\varvec{\xi }}))^{{\top }}\), such that \(\textbf{D}_{t1}({\varvec{\xi }})=-\partial \zeta _{t}({\varvec{\xi }})/\partial {\varvec{\beta }}\), \(\textbf{D}_{t2}({\varvec{\xi }})=-\partial \zeta _{t}({\varvec{\xi }})/\partial {\varvec{\phi }}\), and \(\textbf{D}_{t3}({\varvec{\xi }})=-\partial \zeta _{t}({\varvec{\xi }})/\partial {\varvec{\theta }}\), and partition \(\textbf{H}_{t}({\varvec{\xi }})\) as follows:

$$\begin{aligned} \textbf{H}_{t}({\varvec{\xi }})=\left( \begin{array}{ccc} \textbf{H}_{t,11}({\varvec{\xi }}) &{} \textbf{H}_{t,12}({\varvec{\xi }}) &{} \textbf{H}_{t,13}({\varvec{\xi }})\\ \textbf{H}_{t,21}({\varvec{\xi }}) &{} \textbf{H}_{t,22}({\varvec{\xi }})&{} \textbf{H}_{t,23}({\varvec{\xi }})\\ \textbf{H}_{t,31}({\varvec{\xi }}) &{} \textbf{H}_{t,32}({\varvec{\xi }}) &{} \textbf{H}_{t,33}({\varvec{\xi }}) \end{array} \right) \end{aligned}$$

where \(\textbf{H}_{t,11}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\beta }}\partial {\varvec{\beta }}^{{\top }}\) is a zero \(J\times J\) matrix, \(\textbf{H}_{t,12}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\beta }}\partial {\varvec{\phi }}^{{\top }}\) is a \(J\times p\) matrix, \(\textbf{H}_{t,13}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\beta }}\partial {\varvec{\theta }}^{{\top }}\) is a \(J\times q\) matrix, \(\textbf{H}_{t,21}({\varvec{\xi }})=\textbf{H}_{t,12}^{{\top }}({\varvec{\xi }})\), \(\textbf{H}_{t,22}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\phi }}\partial {\varvec{\phi }}^{{\top }}\) is a zero \(p\times p\) matrix, \(\textbf{H}_{t,23}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\phi }}\partial {\varvec{\theta }}^{{\top }}\) is a \(p\times q\) matrix, \(\textbf{H}_{t,31}({\varvec{\xi }})=\textbf{H}_{t,13}^{{\top }}({\varvec{\xi }})\), \(\textbf{H}_{t,32}({\varvec{\xi }})=\textbf{H}_{t,23}^{{\top }}({\varvec{\xi }})\), and \(\textbf{H}_{t,33}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\theta }}\partial {\varvec{\theta }}^{{\top }}\) is a \(q\times q\) matrix.
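As a concrete illustration, the symmetric block structure of \(\textbf{H}_{t}\) can be assembled numerically; all sizes and entries below are hypothetical stand-ins, not quantities from the paper:

```python
import numpy as np

# Hypothetical dimensions: J spline coefficients, p AR and q MA parameters.
J, p, q = 4, 2, 1
rng = np.random.default_rng(0)
H12 = rng.standard_normal((J, p))   # stands for -d^2 zeta / d beta d phi^T
H13 = rng.standard_normal((J, q))   # stands for -d^2 zeta / d beta d theta^T
H23 = rng.standard_normal((p, q))   # stands for -d^2 zeta / d phi d theta^T
H33 = rng.standard_normal((q, q))
H33 = (H33 + H33.T) / 2             # the theta-theta block is symmetric

# The beta-beta and phi-phi diagonal blocks are zero; the lower-triangular
# blocks are transposes of the upper ones, so H is symmetric by construction.
H = np.block([
    [np.zeros((J, J)), H12,              H13],
    [H12.T,            np.zeros((p, p)), H23],
    [H13.T,            H23.T,            H33],
])
assert H.shape == (J + p + q, J + p + q)
assert np.allclose(H, H.T)
```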

Let \([\textbf{A}]_{l}\) denote the \(l^{\text {th}}\) element of the vector \(\textbf{A}\). By simple algebra, we obtain that \(\textbf{D}_{t1}({\varvec{\xi }})={\varvec{\theta }}^{-1}(B){\varvec{\phi }}(B)\textbf{W}_{t}\), \(\left[ \textbf{D}_{t2}({\varvec{\xi }})\right] _{l}={\varvec{\theta }}^{-1}(B)\epsilon _{t-l}({\varvec{\beta }}), 1\le l\le p\), \(\left[ \textbf{D}_{t3}({\varvec{\xi }})\right] _{l}={\varvec{\theta }}^{-1}(B)\zeta _{t-l}({\varvec{\xi }})\), \(1\le l\le q\),

$$\begin{aligned}&\left[ \frac{\partial }{\partial {\varvec{\phi }}}\left[ \frac{\partial \zeta _{t}({\varvec{\xi }})}{\partial {\varvec{\beta }}}\right] _{l}\right] _{m}=\frac{1}{{\varvec{\theta }}(B)}\left[ \textbf{W}_{t-m}\right] _{l}, 1\le l\le J, 1\le m\le p, \\&\left[ \frac{\partial }{\partial {\varvec{\theta }}}\left[ \frac{\partial \zeta _{t}({\varvec{\xi }})}{\partial {\varvec{\beta }}}\right] _{l}\right] _{m}=\frac{{\varvec{\phi }}(B)}{{\varvec{\theta }}^{2}(B)}\left[ \textbf{W}_{t-m}\right] _{l}, 1\le l\le J, 1\le m\le q, \\&\left[ \frac{\partial }{\partial {\varvec{\theta }}}\left[ \frac{\partial \zeta _{t}({\varvec{\xi }})}{\partial {\varvec{\phi }}}\right] _{l}\right] _{m} = \frac{\epsilon _{t-l-m}({\varvec{\beta }})}{{\varvec{\phi }}(B){\varvec{\theta }}(B)}, 1\le l\le p, 1\le m\le q, \text { and }\\&\left[ \frac{\partial }{\partial {\varvec{\theta }}}\left[ \frac{\partial \zeta _{t}({\varvec{\xi }})}{\partial {\varvec{\theta }}}\right] _{l}\right] _{m} =\frac{2}{{\varvec{\theta }}^{2}(B)}\zeta _{t-l-m}({\varvec{\xi }}), 1\le l, m\le q. \end{aligned}$$

Furthermore, let \(\textbf{V}_{t}\) be a symmetric matrix of dimension \((J+p+q)\times (J+p+q)\), whose upper triangular elements are given as

$$\begin{aligned} \left[ \textbf{V}_{t}\right] _{l,m}=\left\{ \begin{array}{ll} 0 &{} \text {if } 1\le l\le m\le J \text { or } J+1\le l\le m\le J+p, \\ -{\varvec{\theta }}_{*}^{-1}(B)\left[ \textbf{W}_{t-(m-J)}\right] _{l} &{} \text {if } 1\le l\le J, 1\le m-J\le p, \\ -{\varvec{\theta }}_{*}^{-2}(B){\varvec{\phi }}_{*}(B)\left[ \textbf{W}_{t-(m-J-p)}\right] _{l} &{} \text {if } 1\le l\le J, 1\le m-J-p\le q, \\ -{\varvec{\theta }}_{*}^{-1}(B){\varvec{\phi }}_{*}^{-1}(B)\zeta _{t-(l-J)-(m-J-p)} &{} \text {if } 1\le l-J\le p, 1\le m-J-p\le q, \\ -2{\varvec{\theta }}_{*}^{-2}(B)\zeta _{t-(l-J-p)-(m-J-p)} &{} \text {if } J+p+1\le l\le m\le J+p+q. \end{array} \right. \end{aligned}$$

We partition \(\textbf{V}_{t}\) as follows:

$$\begin{aligned} \textbf{V}_{t}=\left( \begin{array}{ccc} \textbf{V}_{t,11} &{} \textbf{V}_{t,12} &{} \textbf{V}_{t,13}\\ \textbf{V}_{t,21} &{} \textbf{V}_{t,22} &{} \textbf{V}_{t,23}\\ \textbf{V}_{t,31} &{} \textbf{V}_{t,32} &{} \textbf{V}_{t,33} \end{array} \right) , \end{aligned}$$

where \(\textbf{V}_{t,11}\) is a \(J\times J\) matrix, \(\textbf{V}_{t,12}\) is a \(J\times p\) matrix, \(\textbf{V}_{t,13}\) is a \(J\times q\) matrix, \(\textbf{V}_{t,22}\) is a \(p\times p\) matrix, \(\textbf{V}_{t,23}\) is a \(p\times q\) matrix, and \(\textbf{V}_{t,33}\) is a \(q\times q\) matrix. By the definition, \(\textbf{V}_{t,11}=\textbf{0}\) and \(\textbf{V}_{t,22}=\textbf{0}\).

In addition, let \(R_{t}=(g_{0}(X_{t})-{\varvec{\beta }}_{*}^{{\top }}\textbf{B}(X_t))1\{t>0\}=(\epsilon _{t}({\varvec{\beta }}_{*})-\epsilon _{t})1\{t>0\}\) be the spline approximation error at time t. In the following Proposition 1, we show that \(\textbf{D}_{t}({\varvec{\xi }}_{*})\) and \(\textbf{H}_{t}({\varvec{\xi }}_{*})\) are well approximated by \(\textbf{Q}_{t}\) and \(\textbf{V}_{t}\), respectively.

Proposition 1

Suppose Conditions (C1)–(C4) hold. There exist constants \(\delta _{1}>0\) and \(\delta _{2}>0\) such that, for all \(\Vert {\varvec{\beta }}-{\varvec{\beta }}_{*}\Vert \le \delta _{1}\) and \(\Vert ({\varvec{\phi }}^{{\top }},{\varvec{\theta }}^{{\top }})-({\varvec{\phi }}_{*}^{{\top }},{\varvec{\theta }}_{*}^{{\top }})\Vert \le \delta _{2}\),

  1. (i)

\(\left| \zeta _{t}\right| \le \eta _{t}\), \(|\zeta _{t}({\varvec{\xi }}_{*})-{\varvec{\phi }}_{*}(B){\varvec{\theta }}_{*}^{-1}(B)R_{t}-\zeta _{t}|\le r^{t}\eta _{0}\), \(|\zeta _{t}({\varvec{\xi }})|\le \eta _{t}+C_{2}(\varDelta +\delta _{1})\), and \(\left| \zeta _{t}({\varvec{\xi }})-\zeta _{t}({\varvec{\xi }}_{*})\right| \le C_{3}\delta _{2}\eta _{t}+C_{2}C_{3}\delta _{2}(\delta _{1}+\varDelta )+C_{2}\delta _{1}\),

  2. (ii)

    \(\left\| \textbf{D}_{t}({\varvec{\xi }}) \right\| _{\infty }\le \omega _{t}\), \(\textbf{D}_{t1}({\varvec{\xi }}_{*})-\textbf{Q}_{t1}=\textbf{0}\), and \(\left\| \left( \textbf{D}_{t2}^{{\top }}({\varvec{\xi }}_{*}),\textbf{D}_{t3}^{{\top }}({\varvec{\xi }}_{*})\right) -(\textbf{Q}_{t2}^{{\top }},\textbf{Q}_{t3}^{{\top }}) \right\| _{\infty }\le r^{t}\eta _{0}+C_{2}\varDelta\),

  3. (iii)

    \(\left\| \textbf{H}_{t}({\varvec{\xi }})\right\| _{\max }\le \omega _{t}\), \(\textbf{H}_{t,11}({\varvec{\xi }}_{*})-\textbf{V}_{t,11}=\textbf{0}\), and \(\left\| \textbf{H}_{t}({\varvec{\xi }}_{*})-\textbf{V}_{t}\right\| _{\max } \le r^{t}\eta _{0}+C_{2}\varDelta\),

where \(\eta _{t}=C_{1}\sum _{j=0}^{\infty }r^{j}\left| \epsilon _{t-j}\right|\), \(\omega _{t}= \max \left\{ C_{2}, r^{-(p+q)}\eta _{t}+C_{2}\left( \varDelta +\delta _{1}\right) \right\}\), and \(C_{3}\) is defined in Lemma 7.

Proposition 1 confirms that \(\textbf{D}_{t}({\varvec{\xi }}_{*})\) and \(\textbf{H}_{t}({\varvec{\xi }}_{*})\) can be approximated by \(\textbf{Q}_{t}\) and \(\textbf{V}_{t}\), respectively. Moreover, if \({\varvec{\xi }}\) is sufficiently close to the true parameter vector \({\varvec{\xi }}_{*}\), then \(\Vert \textbf{D}_{t}({\varvec{\xi }})\Vert _{\infty }\) and \(\Vert \textbf{H}_{t}({\varvec{\xi }})\Vert _{\max }\) are bounded, and the difference between \(\zeta _{t}({\varvec{\xi }})\) and \(\zeta _{t}({\varvec{\xi }}_{*})\) is well controlled.

To circumvent the non-convexity of \(T(\textbf{h})\) with respect to \(\textbf{h}\), we study a convex objective function

$$\begin{aligned} T_{1}(\textbf{h})=\sum _{t=1}^{n}\left[ \left( \zeta _{t}+\frac{{\varvec{\phi }}_{*}(B)}{{\varvec{\theta }}_{*}(B)}R_{t}-\textbf{h}^{{\top }}\textbf{Q}_{t} \right) ^{2}-\left( \zeta _{t}+\frac{{\varvec{\phi }}_{*}(B)}{{\varvec{\theta }}_{*}(B)}R_{t}\right) ^{2} \right] . \end{aligned}$$

To facilitate the investigation of the properties of \(T_{1}(\textbf{h})\), two auxiliary terms, \(T_{2}(\textbf{h})\) and \(T_{3}(\textbf{h})\), are introduced for the theoretical development:

$$\begin{aligned}&T_{2}(\textbf{h})=\sum _{t=1}^{n}\left[ \left( \zeta _{t}+\frac{{\varvec{\phi }}_{*}(B)}{{\varvec{\theta }}_{*}(B)}R_{t}-\textbf{h}^{{\top }}\textbf{Q}_{t}-\frac{1}{2}\textbf{h}^{{\top }}\textbf{V}_{t}\textbf{h}\right) ^{2}-\left( \zeta _{t}+\frac{{\varvec{\phi }}_{*}(B)}{{\varvec{\theta }}_{*}(B)}R_{t}\right) ^{2} \right] , \\&T_{3}(\textbf{h})=\sum _{t=1}^{n}\left[ \left( \zeta _{t}({\varvec{\xi }}_{*})-\textbf{h}^{{\top }}\textbf{D}_{t}({\varvec{\xi }}_{*})- \frac{1}{2}\textbf{h}^{{\top }}\textbf{H}_{t}({\varvec{\xi }}_{*})\textbf{h}\right) ^{2}-\zeta _{t}^{2}({\varvec{\xi }}_{*}) \right] , \end{aligned}$$

which are investigated in Lemmas 4–6 to bridge the gap between \(T_{1}(\textbf{h})\) and \(T(\textbf{h})\). It is noteworthy that, as these terms involve unknown quantities such as \(\textbf{Q}_{t}\) and \(R_{t}\), they cannot be computed in practice.
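Although \(T_{1}(\textbf{h})\) involves unobservable quantities, it is an ordinary least-squares criterion in \(\textbf{h}\) (up to a constant), so its minimizer solves the normal equations. A synthetic sketch, in which all quantities are hypothetical stand-ins for \(\textbf{Q}_{t}\) and \(\zeta _{t}+{\varvec{\phi }}_{*}(B){\varvec{\theta }}_{*}^{-1}(B)R_{t}\):

```python
import numpy as np

# Synthetic stand-ins (hypothetical): rows of Q play the role of Q_t,
# and y_t plays the role of zeta_t + phi*(B)/theta*(B) R_t.
rng = np.random.default_rng(0)
n, d = 500, 3
Q = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def T1(h):
    # T1(h) = sum_t [(y_t - h^T Q_t)^2 - y_t^2]: an OLS criterion up to a constant
    return np.sum((y - Q @ h) ** 2 - y ** 2)

# The minimizer solves the normal equations (sum_t Q_t Q_t^T) h = sum_t Q_t y_t.
h_hat = np.linalg.solve(Q.T @ Q, Q.T @ y)

# The gradient vanishes at h_hat, and perturbations cannot decrease T1 (convexity).
grad = Q.T @ (y - Q @ h_hat)
assert np.allclose(grad, 0.0, atol=1e-7)
assert T1(h_hat) <= T1(h_hat + 0.01)   # shift every coordinate by 0.01
```

The convexity of \(T_{1}\) is exactly what the non-convex \(T(\textbf{h})\) lacks, which is why the local approximation argument of Propositions 2 and 3 is needed.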

In light of Lemmas 4–6, we first establish that \(T_{1}(\textbf{h})\) is an excellent approximation of \(T(\textbf{h})\). Define \(\varOmega (C):=\{\textbf{h}: \Vert \textbf{h}_{1}\Vert \le CJn^{-1/2}, \Vert \left( \textbf{h}_{2}^{{\top }},\textbf{h}_{3}^{{\top }} \right) \Vert \le CJ^{1/2}n^{-1/2} \}\) for any \(C>0\). We use \({\bar{\varOmega }}(C)\) and \(\varOmega ^{c}(C)\) to denote the boundary and the complement of \(\varOmega (C)\), respectively.

Proposition 2

Suppose Conditions (C1)–(C4) hold. If \(J=n^{1/(2\alpha +1)}\), for any \(C>0\),

$$\begin{aligned} \sup _{\textbf{h}\in \varOmega (C)}\left| T_{1}(\textbf{h})-T(\textbf{h})\right| \rightarrow _{p} 0. \end{aligned}$$

Proposition 2 is inspired by Davis and Dunsmuir (1997). It demonstrates that \(T(\textbf{h})\) can be locally well approximated by \(T_{1}(\textbf{h})\). Therefore, we can study the properties of the minimizer of \(T_{1}(\textbf{h})\) and infer from them the properties of the minimizer of \(T(\textbf{h})\); see Davis and Dunsmuir (1997) for a detailed discussion. We next show that \(T_{1}(\textbf{h})\) achieves its minimum in a ball around 0.

Proposition 3

Under the same conditions as in Proposition 2, given any \(0<\varepsilon <1\), there exists some \(C_{\varepsilon }>0\), such that

$$\begin{aligned} P\left( \inf _{\textbf{h}\in {\bar{\varOmega }}(C_{\varepsilon })\bigcup \varOmega ^{c}(C_{\varepsilon })} T_{1}(\textbf{h})>1\right) >1-\varepsilon . \end{aligned}$$

Propositions 2 and 3 together enable us to establish the consistency of \({\hat{\textbf{h}}}\) and, subsequently, of \({\hat{{\varvec{\xi }}}}\). We are now ready to prove Theorems 1 and 2.

Proof of Theorem 1:

By Proposition 3, given any \(0<\varepsilon <1\), there exists some \(C_{\varepsilon }\), such that

$$\begin{aligned} P\left( \inf _{\textbf{h}\in {\bar{\varOmega }}(C_{\varepsilon })\bigcup \varOmega ^{c}(C_{\varepsilon })} T_{1}(\textbf{h})>1\right) >1-\varepsilon . \end{aligned}$$

Under the event \(\{\inf _{\textbf{h}\in {\bar{\varOmega }}(C_{\varepsilon })\bigcup \varOmega ^{c}(C_{\varepsilon })} T_{1}(\textbf{h})>1\}\), we claim that there exists a local minimizer \({\widehat{\textbf{h}}}\) of \(T(\textbf{h})\) satisfying \({\widehat{\textbf{h}}}\in \varOmega (C_{\varepsilon })\) but \({\widehat{\textbf{h}}}\notin {\bar{\varOmega }}(C_{\varepsilon })\). Suppose, to the contrary, that the claim fails. Then we can find an \(\textbf{h}_{a} \in {\bar{\varOmega }}(C_{\varepsilon })\) such that \(T(\textbf{h}_{a})= \min _{\textbf{h}\in \varOmega (C_{\varepsilon })}T(\textbf{h}).\)

By Proposition 2, for any \(C>0\), \(\sup _{\textbf{h}\in \varOmega (C)}\left| T_{1}(\textbf{h})-T(\textbf{h})\right| \rightarrow _{p} 0.\) Choosing \(C=C_{\varepsilon }\), we obtain \(0\ge T(\textbf{h}_{a})-T(\textbf{0})\rightarrow _{p}T_{1}(\textbf{h}_{a})-T_{1}(\textbf{0})= T_{1}(\textbf{h}_{a})>1\), a contradiction. Therefore, for any \(0<\varepsilon <1\), there exists \(C_{\varepsilon }\) such that \({\widehat{\textbf{h}}}\in \varOmega (C_{\varepsilon })\) with probability at least \(1-\varepsilon.\)

Given any \(\textbf{h}\in \varOmega (C_{\varepsilon })\), \(E\left[ \textbf{h}_{1}^{{\top }}\textbf{W}_{t}\textbf{W}_{t}^{{\top }}\textbf{h}_{1} \right] \le \lambda _{\max }J^{-1}\left( C_{\varepsilon }^{2}J^{2}n^{-1}\right) =\lambda _{\max }C_{\varepsilon }^{2}Jn^{-1}\). Noting that \({\hat{{\varvec{\xi }}}}={\varvec{\xi }}_{*}+{\widehat{\textbf{h}}}\), with probability at least \(1-\varepsilon\),

$$\begin{aligned}&{\hspace{0.2in}} E\left[ \big ({\hat{g}}(X_{t})-g_{0}(X_{t})\big )^{2}\right] \le 2 E\left[ \big ({\hat{g}}(X_{t})-g_{*}(X_{t})\big )^{2}\right] +2E\left[ \big (g_{*}(X_{t})-g_{0}(X_{t})\big )^{2}\right] \\&=2E\left[ {\widehat{\textbf{h}}}_{1}^{{\top }}\textbf{W}_{t}\textbf{W}_{t}^{{\top }}{\widehat{\textbf{h}}}_{1}\right] +2C_{0}^{2}J^{-2\alpha }\le 2\lambda _{\max }C_{\varepsilon }^{2}Jn^{-1}+2C_{0}^{2}J^{-2\alpha }. \end{aligned}$$

Thus, \(E\left[ \big ({\hat{g}}(X_{t})-g_{0}(X_{t})\big )^{2}\right] =O_{p}(Jn^{-1}+J^{-2\alpha })=O_{p}\left( n^{-2\alpha /(2\alpha +1)}\right)\). This completes the proof of Theorem 1. \(\square\)
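The choice \(J=n^{1/(2\alpha +1)}\) balances the variance term \(Jn^{-1}\) against the squared-bias term \(J^{-2\alpha }\), yielding Stone's optimal rate. A quick numerical sanity check, with a hypothetical sample size and smoothness index:

```python
# Hypothetical sample size n and smoothness index alpha; not values from the paper.
n, alpha = 10_000, 2.0
J = n ** (1 / (2 * alpha + 1))        # number of spline basis functions
variance_term = J / n                 # estimation-variance component of the risk
bias_sq_term = J ** (-2 * alpha)      # squared spline-approximation bias
optimal_rate = n ** (-2 * alpha / (2 * alpha + 1))

# J = n^{1/(2*alpha+1)} equates the two error components, and both match
# Stone's optimal nonparametric rate n^{-2*alpha/(2*alpha+1)}.
assert abs(variance_term - bias_sq_term) < 1e-12
assert abs(variance_term - optimal_rate) < 1e-12
```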

Proof of Theorem 2:

In the proof of Theorem 1, we have shown that for any \(0<\varepsilon <1\), there exists \(C_{\varepsilon }\), such that \({\widehat{\textbf{h}}}\in \varOmega (C_{\varepsilon })\) with probability at least \(1-\varepsilon\). Thus, we restrict our attention to the event that \({\widehat{\textbf{h}}}\in \varOmega (C_{\varepsilon })\).

We consider \(S(\textbf{b}_{2},\textbf{b}_{3}):= T_{1}(({\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{b}_{2}^{{\top }}/\sqrt{n},\textbf{b}_{3}^{{\top }}/\sqrt{n})^{{\top }})- T_{1}(({\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{0}^{{\top }}, \textbf{0}^{{\top }})^{{\top }})\). It is easily seen that

$$\begin{aligned} S(\textbf{b}_{2},\textbf{b}_{3})&= \sum _{t=1}^{n}\left( \frac{\textbf{b}_{2}^{{\top }}\textbf{Q}_{t2}}{\sqrt{n}}+ \frac{\textbf{b}_{3}^{{\top }}\textbf{Q}_{t3}}{\sqrt{n}} \right) ^2-2\sum _{t=1}^{n}\zeta _{t}\left( \frac{\textbf{b}_{2}^{{\top }}\textbf{Q}_{t2}}{\sqrt{n}}+ \frac{\textbf{b}_{3}^{{\top }}\textbf{Q}_{t3}}{\sqrt{n}} \right) \\&{\hspace{0.2in}}-2\sum _{t=1}^{n}\frac{{\varvec{\phi }}_{*}(B)}{{\varvec{\theta }}_{*}(B)}R_{t}\left( \frac{\textbf{b}_{2}^{{\top }}\textbf{Q}_{t2}}{\sqrt{n}}+ \frac{\textbf{b}_{3}^{{\top }}\textbf{Q}_{t3}}{\sqrt{n}} \right) \\&{\hspace{0.2in}} +2\sum _{t=1}^{n}{\widehat{\textbf{h}}}_{1}^{{\top }}\textbf{Q}_{t1}\left( \frac{\textbf{b}_{2}^{{\top }}\textbf{Q}_{t2}}{\sqrt{n}} +\frac{\textbf{b}_{3}^{{\top }}\textbf{Q}_{t3}}{\sqrt{n}} \right) \end{aligned}$$

By Lemma 3, we obtain that

$$\begin{aligned}{} & {} {\hspace{0.2in}} \sum _{t=1}^{n}\left[ \left( \frac{\textbf{b}_{2}^{{\top }}\textbf{Q}_{t2}}{\sqrt{n}}+ \frac{\textbf{b}_{3}^{{\top }}\textbf{Q}_{t3}}{\sqrt{n}} \right) ^2-2\zeta _{t}\left( \frac{\textbf{b}_{2}^{{\top }}\textbf{Q}_{t2}}{\sqrt{n}}+ \frac{\textbf{b}_{3}^{{\top }}\textbf{Q}_{t3}}{\sqrt{n}} \right) \right] \nonumber \\{} & {} \rightarrow _{d}\sigma ^2\left( \textbf{b}_{2}^{{\top }}, \textbf{b}_{3}^{{\top }}\right) {\varvec{\varSigma }}\left( \textbf{b}_{2}, \textbf{b}_{3}\right) -2\left( \textbf{b}_{2}^{{\top }}, \textbf{b}_{3}^{{\top }}\right) N(0,\sigma ^2{\varvec{\varSigma }}), \end{aligned}$$
(4)

over \(\Vert (\textbf{b}_{2}^{{\top }},\textbf{b}_{3}^{{\top }})\Vert \le C\) for any \(C>0\).

According to Condition (C3), \(\{\zeta _{t}\}\) and \(\{X_{t}\}\) are independent. Hence, \(\{R_{t}\}\) and \(\{(\textbf{Q}_{t2}, \textbf{Q}_{t3})\}\) are independent. As \(|R_{t}|\le \varDelta \le C_{0}J^{-\alpha }\) and hence \(|{\varvec{\phi }}_{*}(B){\varvec{\theta }}_{*}^{-1}(B)R_{t}|\le C_{0}C_{2}J^{-\alpha }\rightarrow 0\), by the same arguments as used for Lemma 3, we can show that

$$\begin{aligned} \sup _{\Vert (\textbf{b}_{2}^{{\top }},\textbf{b}_{3}^{{\top }})\Vert \le C} 2\left| \sum _{t=1}^{n}\frac{{\varvec{\phi }}_{*}(B)}{{\varvec{\theta }}_{*}(B)}R_{t}\left( \frac{\textbf{b}_{2}^{{\top }}\textbf{Q}_{t2}}{\sqrt{n}}+ \frac{\textbf{b}_{3}^{{\top }}\textbf{Q}_{t3}}{\sqrt{n}} \right) \right| =o_{p}(1). \end{aligned}$$
(5)

The independence between \(\{\zeta _{t}\}\) and \(\{X_{t}\}\) again implies the independence between \(\textbf{Q}_{t1}\) and \((\textbf{Q}_{t2},\textbf{Q}_{t3})\). Thus, \(E\left[ \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\left( \textbf{b}_{2}^{{\top }}\textbf{Q}_{t2}+ \textbf{b}_{3}^{{\top }}\textbf{Q}_{t3} \right) \right] =0\), as \(E\left[ \textbf{Q}_{t2}\right] =E\left[ \textbf{Q}_{t3}\right] =0\). Noting that \(\Vert {\widehat{\textbf{h}}}_{1}\Vert \le C_{\varepsilon }Jn^{-1/2}\), it follows from Lemma 2 that

$$\begin{aligned} 2\left| \sum _{t=1}^{n}{\widehat{\textbf{h}}}_{1}^{{\top }}\textbf{Q}_{t1}\left( \frac{\textbf{b}_{2}^{{\top }}\textbf{Q}_{t2}}{\sqrt{n}}+ \frac{\textbf{b}_{3}^{{\top }}\textbf{Q}_{t3}}{\sqrt{n}} \right) \right|= & {} C_{\varepsilon }Jn^{-1/2}O_{p}\left( 7(p^{1/2}+q^{1/2})\sqrt{C_{4}J\log n}\right) \nonumber \\= & {} o_{p}(1). \end{aligned}$$
(6)

Combining (4), (5), and (6) yields

$$\begin{aligned} S(\textbf{b}_{2},\textbf{b}_{3})\rightarrow _{d} \sigma ^2\left( \textbf{b}_{2}^{{\top }}, \textbf{b}_{3}^{{\top }}\right) {\varvec{\varSigma }}\left( \textbf{b}_{2}, \textbf{b}_{3}\right) -2\left( \textbf{b}_{2}^{{\top }}, \textbf{b}_{3}^{{\top }}\right) N(0,\sigma ^2{\varvec{\varSigma }}) \end{aligned}$$

over \(\Vert (\textbf{b}_{2}^{{\top }},\textbf{b}_{3}^{{\top }})\Vert \le C\) for any \(C>0\).

Following from Lemmas 4–6, we have, uniformly over \(\Vert (\textbf{b}_{2}^{{\top }},\textbf{b}_{3}^{{\top }})\Vert \le C\) for any \(C>0\),

$$\begin{aligned}&{\hspace{0.2in}} T\left( \left( {\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{b}_{2}^{{\top }}/\sqrt{n}, \textbf{b}_{3}^{{\top }}/\sqrt{n}\right) ^{{\top }}\right) -T\left( \left( {\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{0}^{{\top }}, \textbf{0}^{{\top }}\right) ^{{\top }}\right) \rightarrow _{p} S(\textbf{b}_{2},\textbf{b}_{3}). \end{aligned}$$

Noting that \(N(0, \sigma ^2{\varvec{\varSigma }}^{-1})\) is the minimizer of the limiting random process of \(S(\textbf{b}_{2},\textbf{b}_{3})\), by Lemma 2.2 and Remark 1 in Davis et al. (1992), there exists \(\left( {\widehat{\textbf{b}}}_{2}^{{\top }},{\widehat{\textbf{b}}}_{3}^{{\top }}\right)\), a local minimizer of \(T\left( \left( {\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{b}_{2}^{{\top }}/\sqrt{n}, \textbf{b}_{3}^{{\top }}/\sqrt{n}\right) ^{{\top }}\right) -T\left( \left( {\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{0}^{{\top }}, \textbf{0}^{{\top }}\right) ^{{\top }}\right)\), such that \(\left( {\widehat{\textbf{b}}}_{2}^{{\top }},{\widehat{\textbf{b}}}_{3}^{{\top }}\right) ^{{\top }}\rightarrow _{d} N(0, \sigma ^2{\varvec{\varSigma }}^{-1})\).

Since \({\widehat{\textbf{h}}}\) is the minimizer of \(T(\textbf{h})\), \(\left( {\widehat{\textbf{h}}}_{2}^{{\top }},{\widehat{\textbf{h}}}_{3}^{{\top }}\right)\) must also be the minimizer of

$$\begin{aligned} T\left( \left( {\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }}\right) ^{{\top }}\right) -T\left( \left( {\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{0}^{{\top }}, \textbf{0}^{{\top }}\right) ^{{\top }}\right) . \end{aligned}$$

We thus have \(\sqrt{n}\left( {\widehat{\textbf{h}}}_{2}^{{\top }},{\widehat{\textbf{h}}}_{3}^{{\top }}\right) =\left( {\widehat{\textbf{b}}}_{2}^{{\top }},{\widehat{\textbf{b}}}_{3}^{{\top }}\right)\) and \(\sqrt{n}\left( {\widehat{\textbf{h}}}_{2}^{{\top }},{\widehat{\textbf{h}}}_{3}^{{\top }}\right) ^{{\top }}\rightarrow _{d} N\left( 0, \sigma ^2{\varvec{\varSigma }}^{-1}\right)\). This completes the proof of Theorem 2. \(\square\)
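Although the argument above is abstract, the conclusion of Theorem 2 mirrors classical \(\sqrt{n}\)-asymptotics for ARMA coefficient estimates. The Monte Carlo sketch below is purely illustrative and is not the estimator studied in the paper: it fits a plain AR(1) model (no regression component) by least squares and checks that the sampling standard deviation of \({\widehat{\phi }}\) is close to the asymptotic value \(\sqrt{(1-\phi ^{2})/n}\). All settings (\(\phi\), \(n\), the number of replications) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary illustrative settings (not from the paper).
phi, n, reps, burn = 0.5, 2000, 300, 100

est = np.empty(reps)
for rep in range(reps):
    e = rng.normal(size=n + burn)
    y = np.empty(n + burn)
    y[0] = e[0]
    for t in range(1, n + burn):
        y[t] = phi * y[t - 1] + e[t]  # AR(1) recursion
    y = y[burn:]                      # discard burn-in
    # Least-squares estimate of the AR(1) coefficient.
    est[rep] = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])

# Compare the sampling sd of phi-hat with the asymptotic value sqrt((1 - phi^2) / n).
ratio = est.std() / np.sqrt((1 - phi ** 2) / n)
print(ratio)  # typically close to 1
```

The ratio settling near 1 is exactly the \(\sqrt{n}\)-scaling asserted by the theorem in this simple special case.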

1.2 B. Preliminary proposition and lemmas

Next, we present the technical proposition and lemmas that are used in the proofs of our theorems and corollaries. The proofs of the proposition and lemmas are relegated to supplementary materials.

Proposition 4

If Condition (C4) is satisfied,

$$\begin{aligned} \sup _{\Vert \textbf{h}_{1}\Vert =1, \Vert ({\varvec{\phi }}^{{\top }},{\varvec{\theta }}^{{\top }})-({\varvec{\phi }}_{*}^{{\top }},{\varvec{\theta }}_{*}^{{\top }})\Vert \le \delta _{2}}\textbf{h}_{1}^{{\top }} E\left[ \left( \frac{{\varvec{\phi }}(B)}{{\varvec{\theta }}(B)}\textbf{W}_{t}\right) \left( \frac{{\varvec{\phi }}(B)}{{\varvec{\theta }}(B)}\textbf{W}_{t}^{{\top }}\right) \right] \textbf{h}_{1}\le \lambda _{\max }J^{-1}C_{2}^{2}, \end{aligned}$$

where \(\delta _{2}\) is chosen as in Proposition 1.

Lemma 1

Suppose Condition (C3) holds. Then

  1. (i)

    \(P\left( |\zeta _{t}|>v\right) \le 2\exp \left( \frac{-v^{2}}{2(C_{B}^{2}+C_{B}v)}\right)\) and

  2. (ii)

    \(E\left[ \left| \sum _{i=0}^{\infty }a_{i}\zeta _{t-i}\right| ^{k}\right] \le \left( \sum _{i=0}^{\infty }|a_{i}|\right) ^{k}k!C_{B}^{k}/2\), for any sequence \(\{a_{i}, i\ge 0\}\) and any \(k\ge 1\).
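As a sanity check outside the formal argument, the moment bound in (ii) can be verified numerically in a simple special case. In the sketch below, \(\zeta _{t}\) is taken i.i.d. uniform on \([-1,1]\) (a bounded, mean-zero choice with \(C_{B}=1\)), with a short coefficient sequence and \(k=2\); these are illustrative choices, not values from the paper.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the paper): bounded noise with C_B = 1,
# a short coefficient sequence a, and moment order k = 2.
C_B = 1.0
a = np.array([1.0, 0.5, 0.25])
k = 2

# Monte Carlo estimate of E|sum_i a_i * zeta_{t-i}|^k with zeta ~ Uniform[-1, 1].
zeta = rng.uniform(-1.0, 1.0, size=(200_000, a.size))
lhs = np.mean(np.abs(zeta @ a) ** k)

# Right-hand side of Lemma 1(ii): (sum_i |a_i|)^k * k! * C_B^k / 2.
rhs = np.abs(a).sum() ** k * math.factorial(k) * C_B ** k / 2
print(lhs < rhs)  # True: the bound holds with a wide margin here
```

For this choice the left-hand side is the variance \(\sum _{i}a_{i}^{2}/3\approx 0.44\), well below the bound \(3.0625\).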

Lemma 2

Suppose Conditions (C1)–(C4) hold. There exists some constant \(C_{4}>0\) that does not depend on n, such that if \(J=O(n^{1/(2\alpha +1)})\),

  1. (i)
    $$\begin{aligned} P\left( \sup _{\Vert \textbf{h}_{1}\Vert \le 1} \left| \mathbb {G}_{n} \left[ \left( \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\right) ^{2}\zeta _{t}^{2}\right] \right| >7C_{2}\sqrt{C_{4}J\log n} \right) \le 2\exp (-6J\log n). \end{aligned}$$
  2. (ii)
    $$\begin{aligned} \sup _{\Vert \textbf{h}_{1}\Vert \le 1, \textbf{h}_{1}\ne 0} \left| \left( \sigma ^{2}E\left[ \left( \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\right) ^{2}\right] \right) ^{-1} \mathbb {E}_{n}\left[ \left( \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\right) ^{2}\zeta _{t}^{2}\right] \right| =1+o_{p}(1). \end{aligned}$$
  3. (iii)
    $$\begin{aligned} P\left( \sup _{\Vert \textbf{h}_{1}\Vert \le 1, \Vert \textbf{h}_{2}\Vert \le 1} n^{-1/2}\left| \mathbb {G}_{n} \left[ \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\textbf{Q}_{t2}^{{\top }}\textbf{h}_{2}\right] \right| >7p^{1/2}\sqrt{C_{4}Jn^{-1}\log n} \right) \le 2p\exp (-6J\log n), \\ P\left( \sup _{\Vert \textbf{h}_{1}\Vert \le 1, \Vert \textbf{h}_{3}\Vert \le 1} n^{-1/2}\left| \mathbb {G}_{n} \left[ \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\textbf{Q}_{t3}^{{\top }}\textbf{h}_{3}\right] \right| >7q^{1/2}\sqrt{C_{4}Jn^{-1}\log n} \right) \le 2q\exp (-6J\log n). \end{aligned}$$

Lemma 3

Suppose Conditions (C1)–(C4) hold. Then,

  1. (i)

    \(\sup _{\Vert (\textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }}) \Vert \le 1} \left| \mathbb {E}_{n}\left[ (\textbf{h}_{2}^{{\top }}\textbf{Q}_{t2}+\textbf{h}_{3}^{{\top }}\textbf{Q}_{t3})^{2}\zeta _{t}^{2}\right] - \sigma ^{2}\left( \textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }}\right) {\varvec{\varSigma }}\left( \textbf{h}_{2}^{{\top }},\textbf{h}_{3}^{{\top }}\right) ^{{\top }} \right| \rightarrow _{a.s.}0\),

  2. (ii)

    \(\mathbb {G}_{n}\left[ (\textbf{h}_{2}^{{\top }}\textbf{Q}_{t2}+\textbf{h}_{3}^{{\top }}\textbf{Q}_{t3})\zeta _{t}\right] \rightarrow _{d} \left( \textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }}\right) N(0,\sigma ^2{\varvec{\varSigma }})\), given any \((\textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }})\) such that \(\Vert (\textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }})\Vert \le C\), for any \(C>0\).

  3. (iii)

    \(\mathbb {G}_{n}\left[ (\textbf{h}_{2}^{{\top }}\textbf{Q}_{t2}+\textbf{h}_{3}^{{\top }}\textbf{Q}_{t3})\zeta _{t}\right] \rightarrow _{d} \left( \textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }}\right) N(0,\sigma ^2{\varvec{\varSigma }})\) as a random process on \(\Vert (\textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }})\Vert \le C\), for any \(C>0\).

Lemmas 4–6 follow from the steps in Davis and Dunsmuir (1997) and Brockwell and Davis (1991).

According to Proposition 1, \(\left| \zeta _{t}\right| \le \eta _{t}\), \(\Vert \textbf{Q}_{t}\Vert _{\infty }\le \Vert \textbf{Q}_{t}-\textbf{D}_{t}({\varvec{\xi }}_{*})\Vert _{\infty }+\Vert \textbf{D}_{t}({\varvec{\xi }}_{*})\Vert _{\infty }\le r^{t}\eta _{0}+C_{2}\varDelta +\omega _{t}=: \chi _t\), and similarly \(\Vert \textbf{V}_{t}\Vert _{\max }\le \chi _t\). Thus,

$$\begin{aligned} \left| \textbf{h}^{{\top }}\textbf{Q}_{t}\right| \le \left| \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\right| +\left| \textbf{h}_{2}^{{\top }}\textbf{Q}_{t2}+\textbf{h}_{3}^{{\top }}\textbf{Q}_{t3}\right| \le C_{2}\Vert \textbf{h}_{1}\Vert +\chi _{t}(\sqrt{p}\Vert \textbf{h}_{2}\Vert +\sqrt{q}\Vert \textbf{h}_{3}\Vert ), \end{aligned}$$
(7)
$$\begin{aligned} \left| \textbf{h}^{{\top }}\textbf{V}_{t}\textbf{h}\right| &= \left| 2\textbf{h}_{2}^{{\top }}\textbf{V}_{t,21}\textbf{h}_{1}+2\textbf{h}_{3}^{{\top }}\textbf{V}_{t,31}\textbf{h}_{1}+2\textbf{h}_{3}^{{\top }}\textbf{V}_{t,32}\textbf{h}_{2}+ \textbf{h}_{3}^{{\top }}\textbf{V}_{t,33}\textbf{h}_{3}\right| \\ &\le 2C_{2}(\sqrt{p}\Vert \textbf{h}_{2}\Vert +\sqrt{q}\Vert \textbf{h}_{3}\Vert )\Vert \textbf{h}_{1}\Vert +2\sqrt{pq}\chi _{t}\Vert \textbf{h}_{2}\Vert \Vert \textbf{h}_{3}\Vert +q\chi _{t}\Vert \textbf{h}_{3}\Vert ^{2}. \end{aligned}$$
(8)
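Display (7) is the triangle inequality combined with Cauchy–Schwarz, using \(\Vert \textbf{Q}_{t1}\Vert \le C_{2}\) and the entrywise bound \(\chi _{t}\) on \(\textbf{Q}_{t2}\) and \(\textbf{Q}_{t3}\). The snippet below checks this bound numerically on randomly generated vectors satisfying those norm constraints; the dimensions and constants are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative dimensions and constants (not from the paper).
J, p, q = 8, 2, 3
C2, chi = 2.0, 1.5

violations = 0
for _ in range(1000):
    h1, h2, h3 = rng.normal(size=J), rng.normal(size=p), rng.normal(size=q)
    # Q_t1 with Euclidean norm <= C2; Q_t2, Q_t3 with entries bounded by chi.
    Q1 = rng.normal(size=J)
    Q1 *= C2 * rng.uniform() / np.linalg.norm(Q1)
    Q2 = rng.uniform(-chi, chi, size=p)
    Q3 = rng.uniform(-chi, chi, size=q)
    lhs = abs(h1 @ Q1 + h2 @ Q2 + h3 @ Q3)
    rhs = (C2 * np.linalg.norm(h1)
           + chi * (np.sqrt(p) * np.linalg.norm(h2) + np.sqrt(q) * np.linalg.norm(h3)))
    violations += lhs > rhs + 1e-12
print(violations)  # 0
```

No draw can violate the bound, since each term is controlled exactly as in (7).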

Lemma 4

Suppose Conditions (C1)–(C4) hold. If \(J^{2}\log n=o(n^{1/2})\), then for any \(C>0\), \(\sup _{\textbf{h}\in \varOmega (C)}\left| T_{1}(\textbf{h})-T_{2}(\textbf{h})\right| \rightarrow _{p} 0\).

Lemma 5

Suppose Conditions (C1)–(C4) hold. If \(J^{-2\alpha +1/2}=o(n^{-1/2})\), then for any \(C>0\), \(\sup _{\textbf{h}\in \varOmega (C)}\left| T_{2}(\textbf{h})-T_{3}(\textbf{h})\right| \rightarrow _{p} 0\).

Lemma 6

Suppose Conditions (C1)–(C4) hold. If \(J^{2}\log n=o(n^{1/2})\), then for any \(C>0\), \(\sup _{\textbf{h}\in \varOmega (C)}\left| T_{3}(\textbf{h})-T(\textbf{h})\right| \rightarrow _{p} 0\).

Lemma 7

Under the same conditions as in Proposition 1, for any sequence \(\{a_{t}\}, t\ge 1\), there exists some constant \(C_{3}\) such that

$$\begin{aligned} {\hspace{0.2in}} \left| \left( \frac{{\varvec{\phi }}(B)}{{\varvec{\theta }}(B)}-\frac{{\varvec{\phi }}_{*}(B)}{{\varvec{\theta }}_{*}(B)}\right) a_{t}\right| \le C_{3}\delta _{2}\sum _{i=0}^{\infty }r^{i} |a_{t-i}|, \end{aligned}$$

where \(\delta _{2}\) and r are defined in Proposition 1.
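Lemma 7 says the impulse response of the difference of two nearby ARMA filters is dominated termwise by a geometric sequence. A minimal illustration for ARMA(1,1): the filter \((1-\phi B)/(1-\theta B)\) has coefficients \(\psi _{0}=1\) and \(\psi _{i}=(\theta -\phi )\theta ^{i-1}\) for \(i\ge 1\), so the coefficients of the difference of two such filters can be checked against a geometric envelope. The parameter values and envelope constants below are arbitrary illustrative choices, not the \(C_{3}\), \(\delta _{2}\), \(r\) of the lemma.

```python
# Coefficients of (1 - phi*B) / (1 - theta*B): psi_0 = 1, psi_i = (theta - phi) * theta**(i-1).
def arma11_coefs(phi, theta, n):
    return [1.0] + [(theta - phi) * theta ** (i - 1) for i in range(1, n)]

# Two nearby ARMA(1,1) filters (arbitrary illustrative values).
c_star = arma11_coefs(0.50, 0.30, 50)
c_pert = arma11_coefs(0.45, 0.35, 50)
diff = [abs(a - b) for a, b in zip(c_star, c_pert)]

# Termwise geometric envelope C * r**i; here r = 0.35 dominates both theta values.
C, r = 0.9, 0.35
ok = all(d <= C * r ** i for i, d in enumerate(diff))
print(ok)  # True
```

The envelope holds because each filter's coefficients are themselves geometric with ratio at most \(r\), which is the mechanism behind the lemma.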

Cite this article

Zheng, Q., Cui, Y. & Wu, R. On estimation of nonparametric regression models with autoregressive and moving average errors. Ann Inst Stat Math 76, 235–262 (2024). https://doi.org/10.1007/s10463-023-00882-6
