Abstract
The nonparametric regression model with correlated errors is a powerful tool for time series forecasting. We are interested in the estimation of such a model, where the errors follow an autoregressive and moving average (ARMA) process, and the covariates can also be correlated. Instead of estimating the constituent parts of the model in a sequential fashion, we propose a spline-based method to estimate the mean function and the parameters of the ARMA process jointly. We establish the desirable asymptotic properties of the proposed approach under mild regularity conditions. Extensive simulation studies demonstrate that our proposed method performs well and generates strong evidence supporting the established theoretical results. Our method provides a new addition to the arsenal of tools for analyzing serially correlated data. We further illustrate the practical usefulness of our method by modeling and forecasting the weekly natural gas scraping data for the state of Iowa.
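The joint-estimation idea summarized above can be sketched numerically. The toy example below is a hypothetical illustration, not the paper's implementation: it assumes ARMA(1,1) errors, an illustrative cubic truncated-power spline basis with arbitrary knot placement, a sine mean function, and a generic Nelder-Mead optimizer; it then minimizes the sum of squared innovations over the spline coefficients and the ARMA parameters jointly, rather than estimating them sequentially.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 400
x = rng.uniform(0.0, 1.0, n)

# ARMA(1,1) errors e_t = phi*e_{t-1} + z_t + theta*z_{t-1} (illustrative values)
phi0, theta0 = 0.5, 0.3
z = rng.normal(0.0, 0.2, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = phi0 * e[t - 1] + z[t] + theta0 * z[t - 1]

y = np.sin(2 * np.pi * x) + e  # y_t = g0(X_t) + e_t with g0 illustrative

# hypothetical cubic truncated-power spline basis (not the paper's B-splines)
knots = np.linspace(0.1, 0.9, 5)
B = np.column_stack([x**k for k in range(4)] +
                    [np.clip(x - kn, 0.0, None) ** 3 for kn in knots])
J = B.shape[1]

def loss(xi):
    """Sum of squared innovations zeta_t(xi), computed recursively."""
    beta, phi, theta = xi[:J], xi[J], xi[J + 1]
    eps = y - B @ beta  # epsilon_t(beta)
    zeta, prev_eps, prev_zeta = np.zeros(n), 0.0, 0.0
    for t in range(n):
        zeta[t] = eps[t] - phi * prev_eps - theta * prev_zeta
        prev_eps, prev_zeta = eps[t], zeta[t]
    return float(np.sum(zeta**2))

# start from the ordinary least-squares spline fit, ARMA parameters at zero
xi0 = np.concatenate([np.linalg.lstsq(B, y, rcond=None)[0], [0.0, 0.0]])
fit = minimize(loss, xi0, method="Nelder-Mead",
               options={"maxiter": 2000, "fatol": 1e-10})
```

The point of the sketch is only the structure of the objective: a single criterion in which the mean function (through the spline coefficients) and the ARMA parameters enter jointly through the recursively computed innovations.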
References
Armstrong J.S., Collopy F. (1992) Error measures for generalizing about forecasting methods: empirical comparisons. International Journal of Forecasting, 8(1), 69–80
Bowerman B.L., O’Connell R.T., Koehler A.B. (2005) Forecasting, time series, and regression: an applied approach, 4th ed., Boston, MA: Brooks/Cole, Cengage Learning
Box G.E., Jenkins G.M., Reinsel G.C., Ljung G.M. (2016) Time series analysis: forecasting and control, 5th ed., Hoboken, New Jersey: John Wiley and Sons Inc.
Brockwell P.J., Davis R.A. (1991) Time series: theory and methods, 2nd ed., Springer Series in Statistics, New York: Springer
Carroll R.J., Fan J., Gijbels I., Wand M.P. (1997) Generalized partially linear single-index models. Journal of the American Statistical Association, 92(438), 477–489
Chernozhukov V., Chetverikov D., Kato K. (2013) Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6), 2786–2819
Davis R.A., Dunsmuir W.T. (1997) Least absolute deviation estimation for regression with ARMA errors. Journal of Theoretical Probability, 10(2), 481–497
Davis R.A., Knight K., Liu J. (1992) M-estimation for autoregressions with infinite variance. Stochastic Processes and Their Applications, 40(1), 145–180
De Boor C. (1978) A practical guide to splines, Vol. 27, New York: Springer
Durbán M., Currie I.D. (2003) A note on p-spline additive models with correlated errors. Computational Statistics, 18(2), 251–262
Fan J. (1993) Local linear regression smoothers and their minimax efficiencies. The Annals of Statistics, 21(1), 196–216
Ganesh E., Rajendran V., Ravikumar D., Kumar P.S., Revathy G., Harivardhan P. (2021) Detection and route estimation of ship vessels using linear filtering and ARMA model from AIS data. International Journal of Oceans and Oceanography 15(1), 1–10
Greenhouse J.B., Kass R.E., Tsay R.S. (1987) Fitting nonlinear models with ARMA errors to biological rhythm data. Statistics in Medicine 6(2), 167–183
Hall P., Heyde C.C. (2014) Martingale limit theory and its application, New York: Academic Press Inc
Hall P., Keilegom I. V. (2003) Using difference-based methods for inference in nonparametric regression with time series errors. Journal of the Royal Statistical Society. Series B, 65(2), 443–456
Hart J.D. (1994) Automated kernel smoothing of dependent data by using time series cross-validation. Journal of the Royal Statistical Society, Series B, 56(3), 529–542
Hart J.D., Wehrly T.E. (1986) Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association, 81(396), 1080–1088
Hastie T.J., Tibshirani R.J. (1990) Generalized additive models, Boca Raton: Routledge
Huang J.Z. (2003) Local asymptotics for polynomial spline regression. The Annals of Statistics, 31(5), 1600–1635
Hyndman R.J., Koehler A.B., Ord J.K., Snyder R.D. (2008) Forecasting with exponential smoothing: the state space approach, Berlin: Springer-Verlag
Kohn R., Ansley C.F., Wong C.-M. (1992) Nonparametric spline regression with autoregressive moving average errors. Biometrika, 79(2), 335–346
Krivobokova T., Kauermann G. (2007) A note on penalized spline smoothing with correlated errors. Journal of the American Statistical Association, 102(480), 1328–1337
Lee Y.K., Mammen E., Park B.U. (2010) Bandwidth selection for kernel regression with correlated errors. Statistics, 44(4), 327–340
Liang H.-Y., Jing B.-Y. (2009) Asymptotic normality in partial linear models based on dependent errors. Journal of Statistical Planning and Inference, 139(4), 1357–1371
Merlevède F., Peligrad M., Rio E. (2011) A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probability Theory and Related Fields, 151(3–4), 435–474
Miaou S.-P. (1990) A stepwise time series regression procedure for water demand model identification. Water Resources Research, 26(9), 1887–1897
Mokkadem A. (1988) Mixing properties of ARMA processes. Stochastic Processes and Their Applications, 29(2), 309–315
Opsomer J., Wang Y., Yang Y. (2001) Nonparametric regression with correlated errors. Statistical Science, 16(2), 134–153
Petropoulos F., Apiletti F., Assimakopoulos V., Babai M.Z., Barrow D.K., Ben Taieb S., Ziel F. et al. (2022) Forecasting: Theory and practice. International Journal of Forecasting, 38(3), 705–871
Qiu D., Shao Q., Yang L. (2013) Efficient inference for autoregressive coefficients in the presence of trends. Journal of Multivariate Analysis, 114, 40–53
Roussas G.G., Tran L.T. (1992) Asymptotic normality of the recursive kernel regression estimate under dependence conditions. The Annals of Statistics 20(1), 98–120
Roussas G.G., Tran L.T., Ioannides D.A. (1992) Fixed design regression for time series: Asymptotic normality. Journal of Multivariate Analysis 40(2), 262–291
Serra P., Krivobokova T., Rosales F. (2018) Adaptive non-parametric estimation of mean and autocovariance in regression with dependent errors. arXiv preprint arXiv:1812.06948
Shao Q., Yang L. (2011) Autoregressive coefficient estimation in nonparametric analysis. Journal of Time Series Analysis 32(2), 587–597
Shao Q., Yang L. (2017) Oracally efficient estimation and consistent model selection for auto-regressive moving average time series with trend. Journal of the Royal Statistical Society Series B 79(2), 507–524
Stone C.J. (1980) Optimal rates of convergence for nonparametric estimators. The Annals of Statistics, 8(6), 1348–1360
Stone C.J. (1986) The dimensionality reduction principle for generalized additive models. The Annals of Statistics 14(2), 590–606
Straumann D., Mikosch T. (2006) Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: A stochastic recurrence equations approach. The Annals of Statistics 34(5), 2449–2495
Tibshirani R. (1996) Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288
Tran L., Roussas G., Yakowitz S., Van B.T. (1996) Fixed-design regression for linear time series. The Annals of Statistics, 24(3), 975–991
Truong Y.K. (1991) Nonparametric curve estimation with time series errors. Journal of Statistical Planning and Inference, 28(2), 167–183
Truong-Van B., Bru N. (2001) Asymptotic normality of spline estimator when the errors are a linear stationary process. Journal of Nonparametric Statistics, 13(5), 741–761
Van de Geer S., Bühlmann P., Ritov Y., Dezeure R. (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3), 1166–1202
Volkonskii V., Rozanov Y.A. (1959) Some limit theorems for random functions. I. Theory of Probability & Its Applications, 4(2), 178–197
Wu R., Wang Q. (2012) Shrinkage estimation for linear regression with ARMA errors. Journal of Statistical Planning and Inference, 142(7), 2136–2148
Zhou S., Shen X., Wolfe D. (1998) Local asymptotics for regression splines and confidence regions. The Annals of Statistics, 26(5), 1760–1782
Zinde-Walsh V., Galbraith J.W. (1991) Estimation of a linear regression model with stationary ARMA(p, q) errors. Journal of Econometrics, 47(2–3), 333–357
Acknowledgements
The authors thank the two anonymous referees for their invaluable comments and suggestions, which have significantly improved the quality of the paper. The natural gas scrape data were obtained by the second author through the research contract (Grant 5040224) between the Applied Mathematics Laboratory of Towson University and Exelon Generation Company LLC. This work was partially supported by the National Institutes of Health grants R03AG067611 and R21AG070659, and the National Science Foundation grant DMS-1952486.
The online version of this article contains supplementary material.
Appendix
1.1 A. The main results and proofs
We present the main results in this section. The proofs for Theorem 1 and Theorem 2 are provided. The remaining proofs are all relegated to the supplementary materials.
Because \(\mathcal {L}_{n}({\varvec{\xi }})=\sum _{t=1}^{n} \zeta _{t}^{2}({\varvec{\xi }})\) is not convex in \({\varvec{\xi }}\), owing to the MA component \({\varvec{\theta }}\), we study the asymptotic properties of \({\hat{{\varvec{\xi }}}}\) via a second-order Taylor expansion of \(\zeta _{t}({\varvec{\xi }})\) around \({\varvec{\xi }}_{*}\) (Davis and Dunsmuir, 1997): \(\zeta _{t}({\varvec{\xi }})\approx \zeta _{t}({\varvec{\xi }}_{*})-\textbf{D}_{t}^{{\top }}({\varvec{\xi }}_{*})({\varvec{\xi }}-{\varvec{\xi }}_{*})-({\varvec{\xi }}-{\varvec{\xi }}_{*})^{{\top }}\textbf{H}_{t}({\varvec{\xi }}_{*})({\varvec{\xi }}-{\varvec{\xi }}_{*})/2\), where \(\textbf{D}_{t}({\varvec{\xi }})=-\partial \zeta _{t}({\varvec{\xi }})/\partial {\varvec{\xi }}\) and \(\textbf{H}_{t}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/(\partial {\varvec{\xi }}\partial {\varvec{\xi }}^{{\top }})\).
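The innovations and their derivatives admit simple recursions; as an illustrative sanity check (assuming an ARMA(1,1) error specification, with an arbitrary stand-in series for \(\epsilon _{t}({\varvec{\beta }})\) and an assumed finite-difference step), the sketch below computes \(\zeta _{t}\) via \(\zeta _{t}=\epsilon _{t}-\phi \epsilon _{t-1}-\theta \zeta _{t-1}\) and verifies numerically that \(-\partial \zeta _{t}/\partial \phi\) satisfies the recursion \(d_{t}=\epsilon _{t-1}-\theta d_{t-1}\), i.e., \({\varvec{\theta }}^{-1}(B)\epsilon _{t-1}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
eps = rng.normal(size=n)   # stand-in for epsilon_t(beta); purely illustrative
phi, theta = 0.4, 0.25     # illustrative ARMA(1,1) parameters

def innovations(phi, theta):
    """zeta_t via zeta_t = eps_t - phi*eps_{t-1} - theta*zeta_{t-1}, zeta_0 = eps_0."""
    zeta = np.zeros(n)
    for t in range(n):
        zeta[t] = eps[t] - (phi * eps[t - 1] + theta * zeta[t - 1] if t > 0 else 0.0)
    return zeta

# analytic derivative D_{t2} = -d zeta_t / d phi, via d_t = eps_{t-1} - theta*d_{t-1}
d = np.zeros(n)
for t in range(1, n):
    d[t] = eps[t - 1] - theta * d[t - 1]

# central finite difference of zeta_t in phi (step size assumed)
h = 1e-6
fd = -(innovations(phi + h, theta) - innovations(phi - h, theta)) / (2 * h)
err = np.max(np.abs(fd - d))
```

Since \(\zeta _{t}\) is affine in \(\phi\) for fixed \(\theta\), the central difference here is exact up to floating-point roundoff, so `err` is tiny.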
We decompose \(\textbf{D}_{t}({\varvec{\xi }})\) as \((\textbf{D}_{t1}({\varvec{\xi }}), \textbf{D}_{t2}({\varvec{\xi }}), \textbf{D}_{t3}({\varvec{\xi }}))^{{\top }}\), such that \(\textbf{D}_{t1}({\varvec{\xi }})=-\partial \zeta _{t}({\varvec{\xi }})/\partial {\varvec{\beta }}\), \(\textbf{D}_{t2}({\varvec{\xi }})=-\partial \zeta _{t}({\varvec{\xi }})/\partial {\varvec{\phi }}\), and \(\textbf{D}_{t3}({\varvec{\xi }})=-\partial \zeta _{t}({\varvec{\xi }})/\partial {\varvec{\theta }}\), and partition \(\textbf{H}_{t}({\varvec{\xi }})\) as follows:
where \(\textbf{H}_{t,11}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\beta }}\partial {\varvec{\beta }}^{{\top }}\) is a zero \(J\times J\) matrix, \(\textbf{H}_{t,12}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\beta }}\partial {\varvec{\phi }}^{{\top }}\) is a \(J\times p\) matrix, \(\textbf{H}_{t,13}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\beta }}\partial {\varvec{\theta }}^{{\top }}\) is a \(J\times q\) matrix, \(\textbf{H}_{t,21}({\varvec{\xi }})=\textbf{H}_{t,12}^{{\top }}({\varvec{\xi }})\), \(\textbf{H}_{t,22}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\phi }}\partial {\varvec{\phi }}^{{\top }}\) is a zero \(p\times p\) matrix, \(\textbf{H}_{t,23}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\phi }}\partial {\varvec{\theta }}^{{\top }}\) is a \(p\times q\) matrix, \(\textbf{H}_{t,31}({\varvec{\xi }})=\textbf{H}_{t,13}^{{\top }}({\varvec{\xi }})\), \(\textbf{H}_{t,32}({\varvec{\xi }})=\textbf{H}_{t,23}^{{\top }}({\varvec{\xi }})\), and \(\textbf{H}_{t,33}({\varvec{\xi }})=-\partial ^{2}\zeta _{t}({\varvec{\xi }})/\partial {\varvec{\theta }}\partial {\varvec{\theta }}^{{\top }}\) is a \(q\times q\) matrix.
Let \([\textbf{A}]_{l}\) denote the \(l^{\text {th}}\) element of the vector \(\textbf{A}\). By simple algebra, we obtain that \(\textbf{D}_{t1}({\varvec{\xi }})={\varvec{\theta }}^{-1}(B){\varvec{\phi }}(B)\textbf{W}_{t}\), \(\left[ \textbf{D}_{t2}({\varvec{\xi }})\right] _{l}={\varvec{\theta }}^{-1}(B)\epsilon _{t-l}({\varvec{\beta }}), 1\le l\le p\), \(\left[ \textbf{D}_{t3}({\varvec{\xi }})\right] _{l}={\varvec{\theta }}^{-1}(B)\zeta _{t-l}({\varvec{\xi }})\), \(1\le l\le q\),
Furthermore, let \(\textbf{V}_{t}\) be a symmetric matrix of dimension \((J+p+q)\times (J+p+q)\), whose upper triangular elements are given as
We partition \(\textbf{V}_{t}\) as follows:
where \(\textbf{V}_{t,11}\) is a \(J\times J\) matrix, \(\textbf{V}_{t,12}\) is a \(J\times p\) matrix, \(\textbf{V}_{t,13}\) is a \(J\times q\) matrix, \(\textbf{V}_{t,22}\) is a \(p\times p\) matrix, \(\textbf{V}_{t,23}\) is a \(p\times q\) matrix, and \(\textbf{V}_{t,33}\) is a \(q\times q\) matrix. By the definition, \(\textbf{V}_{t,11}=\textbf{0}\) and \(\textbf{V}_{t,22}=\textbf{0}\).
In addition, let \(R_{t}=(g_{0}(X_{t})-{\varvec{\beta }}_{*}^{{\top }}\textbf{B}(X_t))1\{t>0\}=(\epsilon _{t}({\varvec{\beta }}_{*})-\epsilon _{t})1\{t>0\}\) be the spline approximation error at time t. In the following Proposition 1, we show that \(\textbf{D}_{t}({\varvec{\xi }}_{*})\) and \(\textbf{H}_{t}({\varvec{\xi }}_{*})\) are well approximated by \(\textbf{Q}_{t}\) and \(\textbf{V}_{t}\), respectively.
Proposition 1
Suppose Conditions (C1)–(C4) hold. There exist constants \(\delta _1\) and \(\delta _2\) such that, for all \(\Vert {\varvec{\beta }}-{\varvec{\beta }}_{*}\Vert \le \delta _{1}\) and \(\Vert ({\varvec{\phi }}^{{\top }},{\varvec{\theta }}^{{\top }})-({\varvec{\phi }}_{*}^{{\top }},{\varvec{\theta }}_{*}^{{\top }})\Vert \le \delta _{2}\),
-
(i)
\(\left| \zeta _{t}\right| \le \eta _{t}\), \(|\zeta _{t}({\varvec{\xi }}_{*})-{\varvec{\phi }}_{*}(B){\varvec{\theta }}_{*}^{-1}(B)R_{t}-\zeta _{t}|\le r^{t}\eta _{0}\), \(|\zeta _{t}({\varvec{\xi }})|\le \eta _{t}+C_{2}(\varDelta +\delta _{1})\), and \(\left| \zeta _{t}({\varvec{\xi }})-\zeta _{t}({\varvec{\xi }}_{*})\right| \le C_{3}\delta _{2}\eta _{t}+C_{2}C_{3}\delta _{2}(\delta _{1}+\varDelta )+C_{2}\delta _{1}\),
-
(ii)
\(\left\| \textbf{D}_{t}({\varvec{\xi }}) \right\| _{\infty }\le \omega _{t}\), \(\textbf{D}_{t1}({\varvec{\xi }}_{*})-\textbf{Q}_{t1}=\textbf{0}\), and \(\left\| \left( \textbf{D}_{t2}^{{\top }}({\varvec{\xi }}_{*}),\textbf{D}_{t3}^{{\top }}({\varvec{\xi }}_{*})\right) -(\textbf{Q}_{t2}^{{\top }},\textbf{Q}_{t3}^{{\top }}) \right\| _{\infty }\le r^{t}\eta _{0}+C_{2}\varDelta\),
-
(iii)
\(\left\| \textbf{H}_{t}({\varvec{\xi }})\right\| _{\max }\le \omega _{t}\), \(\textbf{H}_{t,11}({\varvec{\xi }}_{*})-\textbf{V}_{t,11}=\textbf{0}\), and \(\left\| \textbf{H}_{t}({\varvec{\xi }}_{*})-\textbf{V}_{t}\right\| _{\max } \le r^{t}\eta _{0}+C_{2}\varDelta\),
where \(\eta _{t}=C_{1}\sum _{j=0}^{\infty }r^{j}\left| \epsilon _{t-j}\right|\), \(\omega _{t}= \max \left\{ C_{2}, r^{-(p+q)}\eta _{t}+C_{2}\left( \varDelta +\delta _{1}\right) \right\}\), and \(C_{3}\) is defined in Lemma 7.
Proposition 1 indicates that \(\textbf{D}_{t}({\varvec{\xi }}_{*})\) and \(\textbf{H}_{t}({\varvec{\xi }}_{*})\) can be approximated by \(\textbf{Q}_{t}\) and \(\textbf{V}_{t}\), respectively. Moreover, if \({\varvec{\xi }}\) is sufficiently close to the true parameter vector \({\varvec{\xi }}_{*}\), then \(\Vert \textbf{D}_{t}({\varvec{\xi }})\Vert _{\infty }\) and \(\Vert \textbf{H}_{t}({\varvec{\xi }})\Vert _{\max }\) are bounded, and the difference between \(\zeta _{t}({\varvec{\xi }})\) and \(\zeta _{t}({\varvec{\xi }}_{*})\) is also well controlled.
To circumvent the non-convexity of \(T(\textbf{h})\) with respect to \(\textbf{h}\), we study a convex objective function
To facilitate the investigation of the property of \(T_{1}(\textbf{h})\), two extra terms, \(T_{2}(\textbf{h})\) and \(T_{3}(\textbf{h})\), are introduced for the theoretical development
which are investigated in Lemmas 4–6 to bridge the gap between \(T_{1}(\textbf{h})\) and \(T(\textbf{h})\). Note that, because these terms involve unknown quantities such as \(\textbf{Q}_{t}\) and \(R_{t}\), they cannot be computed in practice.
In light of Lemmas 4–6, we first establish that \(T_{1}(\textbf{h})\) approximates \(T(\textbf{h})\) well. Define \(\varOmega (C):=\{\textbf{h}: \Vert \textbf{h}_{1}\Vert \le CJn^{-1/2}, \Vert \left( \textbf{h}_{2}^{{\top }},\textbf{h}_{3}^{{\top }} \right) \Vert \le CJ^{1/2}n^{-1/2} \}\) for any \(C>0\). We use \({\bar{\varOmega }}(C)\) and \(\varOmega ^{c}(C)\) to denote the boundary and the complement of \(\varOmega (C)\), respectively.
Proposition 2
Suppose Conditions (C1)–(C4) hold. If \(J=n^{1/(2\alpha +1)}\), for any \(C>0\),
Proposition 2 is inspired by Davis and Dunsmuir (1997). It shows that \(T(\textbf{h})\) is well approximated by \(T_{1}(\textbf{h})\) locally, so we can study the properties of the minimizer of \(T_{1}(\textbf{h})\) and infer those of the minimizer of \(T(\textbf{h})\); see Davis and Dunsmuir (1997) for a detailed discussion. In the following proposition, we show that \(T_{1}(\textbf{h})\) achieves its minimum in a ball around \(\textbf{0}\).
Proposition 3
Under the same conditions as in Proposition 2, given any \(0<\varepsilon <1\), there exists some \(C_{\varepsilon }>0\), such that
Propositions 2 and 3 together enable us to establish the consistency of \({\hat{\textbf{h}}}\), and hence of \({\hat{{\varvec{\xi }}}}\). We now present the proofs of Theorems 1 and 2.
Proof of Theorem 1:
By Proposition 3, given any \(0<\varepsilon <1\), there exists some \(C_{\varepsilon }\), such that
Under the event \(\{\inf _{\textbf{h}\in {\bar{\varOmega }}(C_{\varepsilon })\bigcup \varOmega ^{c}(C_{\varepsilon })} T_{1}(\textbf{h})>1\}\), we claim that there exists a local minimizer \({\widehat{\textbf{h}}}\) of \(T(\textbf{h})\) that satisfies \({\widehat{\textbf{h}}}\in \varOmega (C_{\varepsilon })\) but \({\widehat{\textbf{h}}}\notin {\bar{\varOmega }}(C_{\varepsilon })\). Suppose the claim is false. Then we can find an \(\textbf{h}_{a} \in {\bar{\varOmega }}(C_{\varepsilon })\) such that \(T(\textbf{h}_{a})= \min _{\textbf{h}\in \varOmega (C_{\varepsilon })}T(\textbf{h}).\)
By Proposition 2, for any \(C>0\), \(\sup _{\textbf{h}\in \varOmega (C)}\left| T_{1}(\textbf{h})-T(\textbf{h})\right| \rightarrow _{p} 0.\) Choosing \(C=C_{\varepsilon }\), we obtain \(0\ge T(\textbf{h}_{a})-T(\textbf{0})\rightarrow _{p}T_{1}(\textbf{h}_{a})-T_{1}(\textbf{0})= T_{1}(\textbf{h}_{a})>1\), a contradiction. Therefore, for any \(0<\varepsilon <1\), there exists \(C_{\varepsilon }\), such that \({\widehat{\textbf{h}}}\in \varOmega (C_{\varepsilon })\) with probability at least \(1-\varepsilon.\)
Given any \(\textbf{h}\in \varOmega (C_{\varepsilon })\), \(E\left[ \textbf{h}_{1}^{{\top }}\textbf{W}_{t}\textbf{W}_{t}^{{\top }}\textbf{h}_{1} \right] \le \lambda _{\max }J^{-1}\left( C_{\varepsilon }^2J^{2}n^{-1}\right) =\lambda _{\max }C_{\varepsilon }^{2}Jn^{-1}\). Noting that \({\hat{{\varvec{\xi }}}}={\varvec{\xi }}_{*}+{\widehat{\textbf{h}}}\), with probability at least \(1-\varepsilon\),
Thus, \(E\left[ \big ({\hat{g}}(X_{t})-g_{0}(X_{t})\big )^{2}\right] =O_{p}(Jn^{-1}+J^{-2\alpha })=O_{p}\left( n^{-2\alpha /(2\alpha +1)}\right)\). This completes the proof of Theorem 1. \(\square\)
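The bias-variance trade-off behind this rate can be illustrated numerically. Treating \(Jn^{-1}+J^{-2\alpha }\) as a risk proxy with all constants set to one (purely for illustration; the values of \(n\) and \(\alpha\) below are arbitrary), the integer minimizer scales like \(n^{1/(2\alpha +1)}\) and the minimum like \(n^{-2\alpha /(2\alpha +1)}\):

```python
import numpy as np

n, alpha = 10**6, 2.0
J = np.arange(1, 200)
risk = J / n + J ** (-2 * alpha)   # variance proxy + squared-bias proxy
J_star = J[np.argmin(risk)]        # minimizing spline dimension

rate_J = n ** (1 / (2 * alpha + 1))              # theoretical J scaling
rate_risk = n ** (-2 * alpha / (2 * alpha + 1))  # theoretical risk scaling
```

With these values, `J_star` lies within a constant factor of `rate_J`, and `risk.min()` within a constant factor of `rate_risk`, matching the balance of the two terms at \(J=n^{1/(2\alpha +1)}\).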
Proof of Theorem 2:
In the proof of Theorem 1, we have shown that for any \(0<\varepsilon <1\), there exists \(C_{\varepsilon }\), such that \({\widehat{\textbf{h}}}\in \varOmega (C_{\varepsilon })\) with probability at least \(1-\varepsilon\). Thus, we restrict our attention to the event that \({\widehat{\textbf{h}}}\in \varOmega (C_{\varepsilon })\).
Consider \(S(\textbf{b}_{2},\textbf{b}_{3}):= T_{1}(({\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{b}_{2}^{{\top }}/\sqrt{n},\textbf{b}_{3}^{{\top }}/\sqrt{n})^{{\top }})- T_{1}(({\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{0}^{{\top }}, \textbf{0}^{{\top }})^{{\top }})\). It is easily seen that
By Lemma 3, we obtain that
over \(\Vert (\textbf{b}_{2}^{{\top }},\textbf{b}_{3}^{{\top }})\Vert \le C\) for any \(C>0\).
According to Condition (C3), \(\{\zeta _{t}\}\) and \(\{X_{t}\}\) are independent. Hence, \(\{R_{t}\}\) and \(\{(\textbf{Q}_{t2}, \textbf{Q}_{t3})\}\) are independent. As \(|R_{t}|\le \varDelta \le C_{0}J^{-\alpha }\) and hence \(|{\varvec{\phi }}_{*}(B){\varvec{\theta }}_{*}^{-1}(B)R_{t}|\le C_{0}C_{2}J^{-\alpha }\rightarrow 0\), by the same arguments as used for Lemma 3, we can show that
The independence between \(\{\zeta _{t}\}\) and \(\{X_{t}\}\) again implies that \(\textbf{Q}_{t1}\) is independent of \((\textbf{Q}_{t2},\textbf{Q}_{t3})\). Thus, \(E\left[ \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\left( \textbf{b}_{2}^{{\top }}\textbf{Q}_{t2}+ \textbf{b}_{3}^{{\top }}\textbf{Q}_{t3} \right) \right] =0\), as \(E\left[ \textbf{Q}_{t2}\right] =E\left[ \textbf{Q}_{t3}\right] =0\). Noting that \(\Vert {\widehat{\textbf{h}}}_{1}\Vert \le C_{\varepsilon }Jn^{-1/2}\), it follows from Lemma 2 that
Combining (4), (5), and (6) yields that
over \(\Vert (\textbf{b}_{2}^{{\top }},\textbf{b}_{3}^{{\top }})\Vert \le C\) for any \(C>0\).
By Lemmas 4–6, we have, uniformly over \(\Vert (\textbf{b}_{2}^{{\top }},\textbf{b}_{3}^{{\top }})\Vert \le C\) for any \(C>0\),
Noting that \(N(0, \sigma ^2{\varvec{\varSigma }}^{-1})\) is the minimizer of the random process to which \(S(\textbf{b}_{2},\textbf{b}_{3})\) converges, by Lemma 2.2 and Remark 1 in Davis et al. (1992), there exists \(\left( {\widehat{\textbf{b}}}_{2}^{{\top }},{\widehat{\textbf{b}}}_{3}^{{\top }}\right)\), a local minimizer of \(T\left( \left( {\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{b}_{2}^{{\top }}/\sqrt{n}, \textbf{b}_{3}^{{\top }}/\sqrt{n}\right) ^{{\top }}\right) -T\left( \left( {\widehat{\textbf{h}}}_{1}^{{\top }}, \textbf{0}^{{\top }}, \textbf{0}^{{\top }}\right) ^{{\top }}\right)\), such that \(\left( {\widehat{\textbf{b}}}_{2}^{{\top }},{\widehat{\textbf{b}}}_{3}^{{\top }}\right) ^{{\top }}\rightarrow _{d} N(0, \sigma ^2{\varvec{\varSigma }}^{-1})\).
Since \({\widehat{\textbf{h}}}\) is the minimizer of \(T(\textbf{h})\), \(\left( {\widehat{\textbf{h}}}_{2}^{{\top }},{\widehat{\textbf{h}}}_{3}^{{\top }}\right)\) must also be the minimizer of
We thus have \(\sqrt{n}\left( {\widehat{\textbf{h}}}_{2}^{{\top }},{\widehat{\textbf{h}}}_{3}^{{\top }}\right) =\left( {\widehat{\textbf{b}}}_{2}^{{\top }},{\widehat{\textbf{b}}}_{3}^{{\top }}\right)\) and \(\sqrt{n}\left( {\widehat{\textbf{h}}}_{2}^{{\top }},{\widehat{\textbf{h}}}_{3}^{{\top }}\right) ^{{\top }}\rightarrow _{d} N\left( 0, \sigma ^2{\varvec{\varSigma }}^{-1}\right)\). This completes the proof of Theorem 2. \(\square\)
1.2 B. Preliminary proposition and lemmas
Next, we present the technical proposition and lemmas used in the proofs of our theorems and corollaries. Their proofs are relegated to the supplementary materials.
Proposition 4
If Condition (C4) is satisfied,
where \(\delta _{2}\) is chosen as in Proposition 1.
Lemma 1
Suppose Condition (C3) holds. Then
-
(i)
\(P\left( |\zeta _{t}|>v\right) \le 2\exp \left( \frac{-v^{2}}{2(C_{B}^{2}+C_{B}v)}\right)\) and
-
(ii)
\(E[\left| \sum _{i=0}^{\infty }a_{i}\zeta _{t-i}\right| ^{k} ]\le \left( \sum _{i=0}^{\infty }|a_{i}|\right) ^{k}k!C_{B}^{k}/2\), for any sequence \(\{a_{t}, t\ge 0\}\) and \(k\ge 1\).
Lemma 2
Suppose Conditions (C1)–(C4) hold. There exists some constant \(C_{4}>0\) that does not depend on n, such that if \(J=O(n^{1/(2\alpha +1)})\),
-
(i)
$$\begin{aligned} P\left( \sup _{\Vert \textbf{h}_{1}\Vert \le 1} \left| \mathbb {G}_{n} \left[ \left( \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\right) ^{2}\zeta _{t}^{2}\right] \right| >7C_{2}\sqrt{C_{4}J\log n} \right) \le 2\exp (-6J\log n). \end{aligned}$$
-
(ii)
$$\begin{aligned} \sup _{\Vert \textbf{h}_{1}\Vert \le 1, \textbf{h}_{1}\ne 0} \left| \left( \sigma ^{2}E\left[ \left( \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\right) ^{2}\right] \right) ^{-1} \mathbb {E}_{n}\left[ \left( \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\right) ^{2}\zeta _{t}^{2}\right] \right| =1+o_{p}(1). \end{aligned}$$
-
(iii)
$$\begin{aligned}&P\left( \sup _{\Vert \textbf{h}_{1}\Vert \le 1, \Vert \textbf{h}_{2}\Vert \le 1} n^{-1/2}\left| \mathbb {G}_{n} \left[ \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\textbf{Q}_{t2}^{{\top }}\textbf{h}_{2}\right] \right|>7p^{1/2}\sqrt{C_{4}Jn^{-1}\log n} \right) \\ \le&2p\exp (-6J\log n).\\&P\left( \sup _{\Vert \textbf{h}_{1}\Vert \le 1, \Vert \textbf{h}_{3}\Vert \le 1} n^{-1/2}\left| \mathbb {G}_{n} \left[ \textbf{h}_{1}^{{\top }}\textbf{Q}_{t1}\textbf{Q}_{t3}^{{\top }}\textbf{h}_{3}\right] \right| >7q^{1/2}\sqrt{C_{4}Jn^{-1}\log n} \right) \\ \le&2q\exp (-6J\log n). \end{aligned}$$
Lemma 3
Suppose Conditions (C1)–(C4) hold. Then,
-
(i)
\(\sup _{\Vert (\textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }}) \Vert \le 1} \left| \mathbb {E}_{n}\left[ (\textbf{h}_{2}^{{\top }}\textbf{Q}_{t2}+\textbf{h}_{3}^{{\top }}\textbf{Q}_{t3})^{2}\zeta _{t}^{2}\right] - \sigma ^{2}\left( \textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }}\right) {\varvec{\varSigma }}\left( \textbf{h}_{2}^{{\top }},\textbf{h}_{3}^{{\top }}\right) ^{{\top }} \right| \rightarrow _{a.s.}0\),
-
(ii)
\(\mathbb {G}_{n}\left[ (\textbf{h}_{2}^{{\top }}\textbf{Q}_{t2}+\textbf{h}_{3}^{{\top }}\textbf{Q}_{t3})\zeta _{t}\right] \rightarrow _{d} \left( \textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }}\right) N(0,\sigma ^2{\varvec{\varSigma }})\), given any \((\textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }})\) such that \(\Vert (\textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }})\Vert \le C\), for any \(C>0\).
-
(iii)
\(\mathbb {G}_{n}\left[ (\textbf{h}_{2}^{{\top }}\textbf{Q}_{t2}+\textbf{h}_{3}^{{\top }}\textbf{Q}_{t3})\zeta _{t}\right] \rightarrow _{d} \left( \textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }}\right) N(0,\sigma ^2{\varvec{\varSigma }})\) on \(\Vert (\textbf{h}_{2}^{{\top }}, \textbf{h}_{3}^{{\top }})\Vert \le C\), for any \(C>0\).
Lemmas 4–6 follow from the steps in Davis and Dunsmuir (1997) and Brockwell and Davis (1991).
According to Proposition 1, \(\left| \zeta _{t}\right| \le \eta _{t}\), \(\Vert \textbf{Q}_{t}\Vert _{\infty }\le \Vert \textbf{Q}_{t}-\textbf{D}_{t}({\varvec{\xi }}_{*})\Vert _{\infty }+\Vert \textbf{D}_{t}({\varvec{\xi }}_{*})\Vert _{\infty }\le r^{t}\eta _{0}+C_{2}\varDelta +\omega _{t}=: \chi _t\), and similarly \(\Vert \textbf{V}_{t}\Vert _{\max }\le \chi _t\). Thus,
Lemma 4
Suppose Conditions (C1) – (C4) hold. If \(J^{2}\log n=o(n^{1/2})\), then for any \(C>0\), \(\sup _{\textbf{h}\in \varOmega (C)}\left| T_{1}(\textbf{h})-T_{2}(\textbf{h})\right| \rightarrow _{p} 0\).
Lemma 5
Suppose Conditions (C1) – (C4) hold. If \(J^{-2\alpha +1/2}=o(n^{-1/2})\), then for any \(C>0\), \(\sup _{\textbf{h}\in \varOmega (C)}\left| T_{2}(\textbf{h})-T_{3}(\textbf{h})\right| \rightarrow _{p} 0\).
Lemma 6
Suppose Conditions (C1) – (C4) hold. If \(J^{2}\log n=o(n^{1/2})\), then for any \(C>0\), \(\sup _{\textbf{h}\in \varOmega (C)}\left| T_{3}(\textbf{h})-T(\textbf{h})\right| \rightarrow _{p} 0\).
Lemma 7
Under the same conditions as in Proposition 1, for any sequence \(\{a_{t}\}, t\ge 1\), there exists some constant \(C_{3}\) such that
where \(\delta _{2}\) and r are defined in Proposition 1.
Cite this article
Zheng, Q., Cui, Y. & Wu, R. On estimation of nonparametric regression models with autoregressive and moving average errors. Ann Inst Stat Math 76, 235–262 (2024). https://doi.org/10.1007/s10463-023-00882-6