Additive models with autoregressive symmetric errors based on penalized regression splines

Original paper · Computational Statistics

Abstract

In this paper, additive models with p-order autoregressive conditional symmetric errors based on penalized regression splines are proposed for modeling trend and seasonality in time series. The aim of this approach is to model the autocorrelation and seasonality properly in order to assess the existence of a significant trend. An iterative backfitting process combined with a quasi-Newton algorithm is developed for estimating the additive components, the dispersion parameter and the autocorrelation coefficients. The effective degrees of freedom of the fit are derived from an appropriate smoother. Inferential results and model selection procedures are proposed, as well as diagnostic methods such as residual analysis based on the conditional quantile residual and sensitivity studies based on the local influence approach. Simulation studies are performed to assess the large-sample behavior of the maximum penalized likelihood estimators. Finally, the methodology is applied to model the daily average temperature of the city of San Francisco from January 1995 to April 2020.

Notes

  1. http://academic.udayton.edu/kissock/http/Weather/default.htm.

References

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) International symposium on information theory. Akademiai Kiado, Budapest, Hungary, pp 267–281

  • Barros M, Paula GA (2019) Discussion of Birnbaum-Saunders distributions: a review of models, analysis and applications. Appl Stoch Models Bus Ind 35:96–99

  • Byrd RH, Lu P, Nocedal J, Zhu C (1995) A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput 16:1190–1208

  • Cao CZ, Lin JG, Zhu LX (2010) Heteroscedasticity and/or autocorrelation diagnostics in nonlinear models with AR(1) and symmetrical errors. Stat Pap 51:813–836

  • Cook RD (1986) Assessment of local influence. J R Stat Soc B 48:133–169

  • Cook RD, Weisberg S (1982) Residuals and influence in regression. Chapman and Hall, London

  • Cleveland WS, McRae JE, Terpenning I (1990) STL: a seasonal-trend decomposition. J Off Stat 6:3–73

  • Cysneiros FJA, Paula GA (2005) Restricted methods in symmetrical linear regression models. Comput Stat Data Anal 49:689–708

  • Davidon WC (1991) Variable metric method for minimization. SIAM J Optim 1:1–17

  • Dunn PK, Smyth GK (1996) Randomized quantile residuals. J Comput Graph Stat 5:236–244

  • Efron B, Hinkley DV (1978) Assessing the accuracy of the maximum likelihood estimator: observed versus expected Fisher information. Biometrika 65:457–487

  • Eilers PH, Marx BD (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11:89–102

  • Fang KT, Kotz S, Ng KW (1990) Symmetric multivariate and related distributions. Chapman and Hall, London

  • Fox J (2015) Applied regression analysis and generalized linear models, 3rd edn. Sage Publications, London

  • Green PJ, Silverman BW (1994) Nonparametric regression and generalized linear models: a roughness penalty approach. Chapman and Hall/CRC, London

  • Hastie TJ, Tibshirani RJ (1990) Generalized additive models. Chapman and Hall/CRC, London

  • Huang L, Jiang H, Wang H (2019) A novel partial-linear single-index model for time series data. Comput Stat Data Anal 134:110–122

  • Huang L, Xia Y, Qin X (2016) Estimation of semivarying coefficient time series models with ARMA errors. Ann Stat 44:1618–1660

  • Ibacache-Pulgar G, Paula GA, Cysneiros FJA (2013) Semiparametric additive models under symmetric distributions. TEST 22:103–121

  • Judge GG, Griffiths WE, Hill RC, Lutkepohl H, Lee TC (1985) The theory and practice of econometrics, 2nd edn. Wiley, New York

  • Kissock JK (1999) UD EPA average daily temperature archive. http://academic.udayton.edu/kissock/http/Weather/default.htm. Accessed 20 Feb 2021

  • Lancaster P, Salkauskas K (1986) Curve and surface fitting: an introduction. Academic Press, London

  • Lee SY, Xu L (2004) Influence analyses of nonlinear mixed-effects models. Comput Stat Data Anal 45:321–341

  • Liu JM, Chen R, Yao Q (2010) Nonparametric transfer function models. J Econom 157:151–164

  • Liu S (2004) On diagnostics in conditionally heteroscedastic time series models under elliptical distributions. J Appl Probab 41A:393–405

  • Lucas A (1997) Robustness of the student-t based M-estimator. Commun Stat Theory Methods 26:1165–1182

  • Mittelhammer RC, Judge GG, Miller DJ (2000) Econometric foundations. Cambridge University Press, New York

  • Paula GA, Medeiros MJ, Vilca-Labra FE (2009) Influence diagnostics for linear models with first-order autoregressive elliptical errors. Stat Probab Lett 79:339–346

  • Poon WY, Poon YS (1999) Conformal normal curvature and assessment of local influence. J R Stat Soc B 61:51–61

  • R Core Team (2020) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org. Accessed 10 Jan 2021

  • Relvas CEM, Paula GA (2016) Partially linear models with first-order autoregressive symmetric errors. Stat Pap 57:795–825

  • Schwarz GE (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Vanegas LH, Paula GA (2016) An extension of log-symmetric regression models: R codes and applications. J Stat Comput Simul 86:1709–1735

  • Wise J (1955) The autocorrelation function and the spectral density function. Biometrika 42:151–159

  • Wood SN (2017) Generalized additive models: an introduction with R, 2nd edn. Chapman and Hall/CRC, London

Acknowledgements

The authors are grateful to the Associate Editor and reviewers for their helpful comments. This study was partially supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) - Finance Code 001 and CNPq, Brazil.

Author information

Corresponding author

Correspondence to Gilberto A. Paula.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Penalized score function

Let \(\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})\) denote the penalized log-likelihood function for the parameter vector \({\varvec{\theta }}=({\varvec{\gamma }}_T^\top ,{\varvec{\gamma }}_S^\top ,\phi ,\rho _1, \ldots , \rho _p)^\top \). One has that

$$\begin{aligned} \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})=-\frac{n}{2}\log (\phi ) + \sum _{i=1}^{n}\log \{g(\delta _i)\} - \frac{\lambda _T}{2}{\varvec{\gamma }}_T^\top \mathbf{M}_T {\varvec{\gamma }}_T - \frac{\lambda _S}{2}{\varvec{\gamma }}_S^\top \mathbf{M}_S {\varvec{\gamma }}_S, \end{aligned}$$

where \(\delta _i=\frac{(\epsilon _i-\rho _1\epsilon _{i-1}-\ldots -\rho _p\epsilon _{i-p})^2}{\phi }\), \(\epsilon _i=y_i-\mathbf{n}_T(t_i)^\top {\varvec{\gamma }}_T-\mathbf{n}_S(s_i)^\top {\varvec{\gamma }}_S\), for \(i=1,\ldots ,n\).
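
To fix ideas, a minimal numerical sketch of \(\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})\) is given below for the particular case of Student-t errors with \(\nu \) degrees of freedom, for which \(\log g(u)=-\frac{\nu +1}{2}\log (1+u/\nu )\) up to an additive constant, and with the pre-sample errors set to zero (the convention adopted in Appendix D). All function and argument names are illustrative and do not correspond to the code used in the paper.

```python
import numpy as np

def penalized_loglik(y, N_T, N_S, gamma_T, gamma_S, phi, rho, M_T, M_S,
                     lam_T, lam_S, nu=4.0):
    """Sketch of L_p(theta, lambda) for Student-t errors with nu d.f.
    (additive constant of log g dropped); epsilon_0 = epsilon_{-1} = ... = 0."""
    p = len(rho)
    eps = y - N_T @ gamma_T - N_S @ gamma_S        # epsilon_i
    eps_pad = np.concatenate((np.zeros(p), eps))   # pre-sample errors set to zero
    a_eps = eps.copy()                             # epsilon_i - sum_j rho_j epsilon_{i-j}
    for j, r in enumerate(rho, start=1):
        a_eps -= r * eps_pad[p - j:p - j + len(eps)]
    delta = a_eps ** 2 / phi
    log_g = -(nu + 1.0) / 2.0 * np.log1p(delta / nu)
    return (-0.5 * len(y) * np.log(phi) + log_g.sum()
            - 0.5 * lam_T * gamma_T @ M_T @ gamma_T
            - 0.5 * lam_S * gamma_S @ M_S @ gamma_S)
```

The penalized score functions below can be checked numerically against finite differences of such a function.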

The penalized score functions for \(\phi \) and \({\varvec{\rho }}\) are, respectively, given by

$$\begin{aligned} \text {U}_p^\phi = \frac{\partial \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \phi }= -\frac{1}{2\phi }\left\{ n+\sum _{i=1}^{n}2W_g(\delta _i)\delta _i\right\} \end{aligned}$$

and

$$\begin{aligned} \text {U}_p^{\rho _j} \!=\! \frac{\partial \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \rho _j}\!=\! \!-\! \frac{2}{\phi }\sum _{i\!=\!1}^{n}W_g(\delta _i)(\epsilon _i\!-\!\rho _1\epsilon _{i\!-\!1}\!-\!\ldots \!-\!\rho _p\epsilon _{i\!-\!p}) (\epsilon _{i\!-\!j}),\ \ (j\!=\!1,\ldots ,p). \end{aligned}$$

In addition, the derivatives of \(\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})\) with respect to \(\gamma _{T_j}\) and \(\gamma _{S_l}\) yield

$$\begin{aligned} \text {U}_p^{\gamma _{T_j}}= & {} \frac{\partial \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \gamma _{T_ j}}\\\!=\! & {} \! -\!\frac{2}{\phi } \sum _{i\!=\!1}^{n}W_g(\delta _i)(\epsilon _i\!-\!\rho _1\epsilon _{i\!-\!1}\!-\!\cdots \!-\!\rho _p\epsilon _{i\!-\!p})(n_{T_{ij}} \!-\!\rho _1 n_{T_{(i\!-\!1)j}}\!-\!\cdots \!-\!\rho _p n_{T_{(i\!-\!p)j}})\\&-\lambda _T[\mathbf{M}_T{\varvec{\gamma }}_T]_j, \ \ (j=1,\ldots ,r_T) \end{aligned}$$

and

$$\begin{aligned} \text {U}_p^{\gamma _{S_l}}= & {} \frac{\partial \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \gamma _{S_l}}\\\!=\! & {} \!-\!\frac{2}{\phi } \sum _{i\!=\!1}^{n}W_g(\delta _i)(\epsilon _i\!-\!\rho _1\epsilon _{i\!-\!1}\!-\!\cdots \!-\!\rho _p\epsilon _{i\!-\!p})(n_{S_{il}} \!-\!\rho _1 n_{S_{(i\!-\!1)l}}\!-\!\cdots \!-\!\rho _p n_{S_{(i\!-\!p)l}})\\&-\lambda _S[\mathbf{M}_S{\varvec{\gamma }}_S]_l,\ \ (l=1,\ldots ,r_S-1), \end{aligned}$$

where \([\mathbf{M}_T{\varvec{\gamma }}_T]_j\) and \([\mathbf{M}_S{\varvec{\gamma }}_S]_l\) denote the jth and lth positions of the vectors \(\mathbf{M}_T{\varvec{\gamma }}_T\) and \(\mathbf{M}_S{\varvec{\gamma }}_S\), respectively.

In matrix notation we obtain

$$\begin{aligned} \mathbf{U}_p^{\gamma _T}= & {} \frac{\partial \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial {\varvec{\gamma }}_T} = \frac{1}{\phi }(\mathbf{A}\mathbf{N}_T)^\top \mathbf{D}_v\mathbf{A}{\varvec{\epsilon }}-\lambda _T\mathbf{M}_T{\varvec{\gamma }}_T,\\ \mathbf{U}_p^{\gamma _S}= & {} \frac{\partial \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial {\varvec{\gamma }}_S} = \frac{1}{\phi }(\mathbf{A}\mathbf{N}_S)^\top \mathbf{D}_v\mathbf{A}{\varvec{\epsilon }}-\lambda _S\mathbf{M}_S{\varvec{\gamma }}_S,\\ \text {U}_p^{\phi }= & {} \frac{\partial \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \phi } = \frac{1}{2\phi }{} \mathbf{1} _n^\top \left( \mathbf{D}_m\mathbf{1} _n-\mathbf{1} _n\right) \ \ \text {and} \\ \text {U}_p^{\rho _j}= & {} \frac{\partial \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \rho _j} = -\frac{1}{\phi }(\mathbf{C}_j{\varvec{\epsilon }})^\top \mathbf{D}_v\mathbf{A}{\varvec{\epsilon }},\ \ (j=1,\ldots ,p). \end{aligned}$$

where the quantities \(\mathbf{N}_T,\mathbf{N}_S,{\varvec{\epsilon }},\mathbf{D}_v,\mathbf{D}_m,\mathbf{A}\) and \(\mathbf{C}_j\) were defined in Sect. 4.

Appendix B: Penalized Hessian matrix

For notational simplicity we write \(n_{T_{ij}}=n_{T_j}(t_i)\) and \(n_{S_{il}}=n_{S_l}(s_i)\), \((i=1,\ldots ,n)\), \((j=1,\ldots ,r_T)\) and \((l=1,\ldots ,r_S-1)\).

Consider the parameters \(({\gamma _{T_j}},{\gamma _{T_h}})\) for which we obtain the derivatives

$$\begin{aligned} \ddot{\mathbf{L }}_p^{\gamma _{T_j}\gamma _{T_h}}= & {} \frac{\partial ^2 \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \gamma _{T_j}\partial \gamma _{T_h}^\top }\\\!=\! & {} \frac{4}{\phi }\sum _{i\!=\!1}^{n}W'_g(\delta _i)\delta _i(n_{T_{ij}}\!-\!\rho _1 n_{T_{(i\!-\!1)j}}\!-\!\cdots \!-\!\rho _p n_{T_{(i\!-\!p)j}})(n_{T_{ih}}\!-\! \rho _1 n_{T_{(i\!-\!1)h}}\!-\!\cdots \\&-\rho _p n_{T_{(i-p)h}}) +\frac{2}{\phi }\sum _{i=1}^{n}W_g(\delta _i)(n_{T_{ij}}-\rho _1 n_{T_{(i-1)j}}-\cdots -\rho _p n_{T_{(i-p)j}})\times \\&\times (n_{T_{ih}}\!-\!\rho _1 n_{T_{(i\!-\!1)h}}\!-\!\cdots \!-\!\rho _p n_{T_{(i\!-\!p)h}})\!-\!\lambda _T\left[ \mathbf{M} _T\right] _{jh}, \ \ (j,h=1,\ldots ,r_T). \end{aligned}$$

In matrix notation, we obtain

$$\begin{aligned} \ddot{\mathbf{L }}_p^{\gamma _T\gamma _T}= \frac{\partial ^2\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \gamma _T \partial \gamma _T^\top }= \frac{1}{\phi }\left\{ (\mathbf{A}\mathbf{N}_T)^\top (-\mathbf{D}_v+4\mathbf{D}_{d})(\mathbf{A}\mathbf{N}_T)\right\} -\lambda _T\mathbf{M}_T. \end{aligned}$$

Similarly, for the parameter vector \({\varvec{\gamma }}_S\) one has

$$\begin{aligned} \ddot{\mathbf{L }}_p^{\gamma _S\gamma _S}= \frac{\partial ^2\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \gamma _S \partial \gamma _S^\top }= \frac{1}{\phi }\left\{ (\mathbf{A}\mathbf{N}_S)^\top (-\mathbf{D}_v+4\mathbf{D}_d)(\mathbf{A}\mathbf{N}_S)\right\} -\lambda _S\mathbf{M}_S. \end{aligned}$$

The second derivatives of \(\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})\) with respect to \(\phi \) and \(\rho _j\) yield

$$\begin{aligned} \ddot{\text {L}}_p^{\phi \phi }= & {} \frac{\partial ^2\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \phi ^2}= \frac{n}{2\phi ^2}+\frac{2}{\phi ^2}\sum _{i=1}^{n}W_g(\delta _i)\delta _i+\frac{1}{\phi ^2}\sum _{i=1}^{n}W'_g({\delta _i})\delta _i^2\\= & {} \frac{1}{\phi ^2}\left\{ \frac{n}{2}+{\varvec{\delta }}^\top \mathbf{D}_c{\varvec{\delta }}-{\varvec{\delta }}^\top \mathbf{D}_{v}{} \mathbf{1} _n\right\} \end{aligned}$$

and

$$\begin{aligned} \ddot{\text {L}}_p^{\rho _{j}\rho _{j}}= & {} \frac{\partial ^2\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \rho _j^2}= \frac{4}{\phi }\sum _{i=1}^{n}W'_g(\delta _i)\delta _i\epsilon _{i-j}^2+\frac{2}{\phi }\sum _{i=1}^{n}W_g(\delta _i)\epsilon _{i-j}^2\\= & {} \frac{1}{\phi }\left\{ (\mathbf{C}_j{\varvec{\epsilon }})^\top \left( -\mathbf{D}_v+4\mathbf{D}_d\right) (\mathbf{C}_j{\varvec{\epsilon }}) \right\} , \ \ (j=1,\ldots , p), \end{aligned}$$

where \(\mathbf{D}_c=\text {diag}\left\{ c_1,\ldots ,c_n\right\} \) with \(c_i=W'_g(\delta _i)\) and \(\mathbf{D}_d=\text {diag}\left\{ d_1,\ldots ,d_n\right\} \) with \(d_i=W'_g(\delta _i)\delta _i\).
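
For illustration only, the diagonal weight matrices above can be evaluated as follows in the Student-t case, for which \(W_g(u)=-\frac{\nu +1}{2(\nu +u)}\) and \(W'_g(u)=\frac{\nu +1}{2(\nu +u)^2}\); the normal case corresponds to \(W_g(u)=-1/2\) and \(W'_g(u)=0\). The function name is illustrative.

```python
import numpy as np

def t_weight_matrices(delta, nu=4.0):
    """Sketch of D_v, D_c and D_d for Student-t errors, with
    v_i = -2 W_g(delta_i), c_i = W_g'(delta_i) and d_i = W_g'(delta_i) * delta_i."""
    W_g = -(nu + 1.0) / (2.0 * (nu + delta))
    W_g_prime = (nu + 1.0) / (2.0 * (nu + delta) ** 2)
    D_v = np.diag(-2.0 * W_g)
    D_c = np.diag(W_g_prime)
    D_d = np.diag(W_g_prime * delta)
    return D_v, D_c, D_d
```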

The derivatives of \(\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})\) with respect to \((\gamma _{T_j},\gamma _{S_l})\) yield

$$\begin{aligned} \ddot{\mathbf{L }}_p^{\gamma _{T_j}\gamma _{S_l}}= & {} \frac{\partial ^2 \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \gamma _{T_j}\partial \gamma _{S_l}^\top }\\\!\!=\!\! & {} \frac{1}{\phi }\sum _{i\!=\!1}^{n}(n_{T_{ij}}\!-\!\rho _1 n_{T_{(i\!-\!1)j}}\!-\!\cdots \!-\!\rho _p n_{T_{(i\!-\!p)j}}) (n_{S_{il}}\!-\!\rho _1 n_{S_{(i\!-\!1)l}}\!-\!\cdots \!-\!\rho _p n_{S_{(i\!-\!p)l}}) \times \\&\times \left\{ 4W'_g(\delta _i)\delta _i+ 2W_g(\delta _i)\right\} , (j=1,\ldots ,r_T) \quad \text {and} \quad (l=1,\ldots ,r_S-1). \end{aligned}$$

In matrix notation, we obtain

$$\begin{aligned} \ddot{\mathbf{L }}_p^{\gamma _T\gamma _S}= \frac{\partial ^2 \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial {\varvec{\gamma }}_T \partial {\varvec{\gamma }}_S^\top }= \frac{1}{\phi }\left\{ (\mathbf{A}\mathbf{N}_T)^\top (4\mathbf{D}_d-\mathbf{D}_v)(\mathbf{A}\mathbf{N}_S)\right\} . \end{aligned}$$

For the derivatives of \(\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})\) with respect to \((\gamma _{T_j},\phi )\) and \((\gamma _{T_j},\rho _{j'})\), we obtain

$$\begin{aligned} \ddot{\mathbf{L }}_p^{\gamma _{T_j}\phi }= & {} \frac{\partial ^2 \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \gamma _{T_j}\partial \phi }\\= & {} \phi ^{-2}\sum _{i=1}^{n}\left\{ 2W_g(\delta _i)+2W'_g(\delta _i)\delta _i\right\} (n_{T_{ij}}-\rho _1 n_{T_{(i-1)j}}-\cdots -\rho _p n_{T_{(i-p)j}})\\&\times (\epsilon _i-\rho _1\epsilon _{i-1}-\cdots -\rho _p\epsilon _{i-p}) \ \ \text {and} \\ \ddot{\mathbf{L }}_p^{\gamma _{T_j}\rho _{j'}}= & {} \frac{\partial ^2 \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \gamma _{T_j}\partial \rho _{j'}} \\= & {} \frac{1}{\phi }\left\{ \sum _{i=1}^{n}\left\{ 4W'_g(\delta _i)\delta _i+2W_g(\delta _i)\right\} (n_{T_{ij}}-\rho _1 n_{T_{(i-1)j}}-\cdots -\rho _p n_{T_{(i-p)j}})(\epsilon _{i-j'})\right\} \\&+\frac{1}{\phi }\left\{ 2\sum _{i=1}^{n}W_g(\delta _i)(\epsilon _i-\rho _1\epsilon _{i-1}-\cdots -\rho _p\epsilon _{i-p}) (n_{T_{(i-j')j}})\right\} ,\ \ (j'=1,\ldots ,p), \end{aligned}$$

which in matrix form may be expressed as

$$\begin{aligned} \ddot{\mathbf{L }}_p^{\gamma _T\phi }= \frac{1}{\phi ^2}\left\{ (\mathbf{A}\mathbf{N}_T)^\top (2\mathbf{D}_d-\mathbf{D}_v)(\mathbf{A}{\varvec{\epsilon }})\right\} \end{aligned}$$

and

$$\begin{aligned} \ddot{\mathbf{L }}_p^{\gamma _T\rho _{j'}}= \frac{1}{\phi }\left\{ (\mathbf{C}_{j'}{\varvec{\epsilon }})^\top (\mathbf{D}_v-4\mathbf{D}_d)(\mathbf{A}\mathbf{N}_T) +(\mathbf{C}_{j'}\mathbf{N}_T)^\top \mathbf{D}_v(\mathbf{A}{\varvec{\epsilon }})\right\} . \end{aligned}$$

Similarly, the derivatives of \(\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})\) with respect to \(({\varvec{\gamma }}_S,\phi )\) and \(({\varvec{\gamma }}_S,\rho _{j'})\) yield

$$\begin{aligned} \ddot{\mathbf{L }}_p^{\gamma _S\phi }= \frac{1}{\phi ^2}\left\{ (\mathbf{A}\mathbf{N}_S)^\top (2\mathbf{D}_d-\mathbf{D}_v)(\mathbf{A}{\varvec{\epsilon }})\right\} \end{aligned}$$

and

$$\begin{aligned} \ddot{\mathbf{L }}_p^{\gamma _S\rho _{j'}}= \frac{1}{\phi }\left\{ (\mathbf{C}_{j'}{\varvec{\epsilon }})^\top (\mathbf{D}_{v}-4\mathbf{D}_d) (\mathbf{A}\mathbf{N}_S)+(\mathbf{C}_{j'}\mathbf{N}_S)^\top \mathbf{D}_v(\mathbf{A}{\varvec{\epsilon }})\right\} . \end{aligned}$$

Finally, for the derivatives of \(\text {L}_p({\varvec{\theta }},{\varvec{\lambda }})\) with respect to \((\phi , \rho _j)\) and \((\rho _j, \rho _{j'})\) we obtain

$$\begin{aligned} \ddot{\text {L}}_p^{\phi \rho _j}= & {} \frac{\partial ^2 \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \phi \partial \rho _j}\\= & {} \frac{1}{\phi ^2}\left\{ \sum _{i=1}^{n} \epsilon _{i-j}\left\{ 2W'_g(\delta _i)\delta _i+2W_g(\delta _i)\right\} (\epsilon _i-\rho _1\epsilon _{i-1}-\cdots -\rho _p\epsilon _{i-p})\right\} \\= & {} \frac{1}{\phi ^2}\left\{ (\mathbf{C}_j{\varvec{\epsilon }})^\top (\mathbf{D}_v-2\mathbf{D}_{d})(\mathbf{A}{\varvec{\epsilon }})\right\} , \ \ (j=1,\ldots , p) \ \ \text {and}\\ \ddot{\text {L}}_p^{\rho _j\rho _{j'}}= & {} \frac{\partial ^2 \text {L}_p({\varvec{\theta }},{\varvec{\lambda }})}{\partial \rho _j \partial \rho _{j'}}\\= & {} -\frac{2}{\phi }\sum _{i=1}^{n}(-\epsilon _{i-j})(\epsilon _{i-j'})\left\{ 2W'_g(\delta _i)\delta _i+W_g(\delta _i)\right\} \\= & {} \frac{1}{\phi }\left\{ (\mathbf{C}_{j'}{\varvec{\epsilon }})^\top (4\mathbf{D}_{d}-\mathbf{D}_v)(\mathbf{C}_j{\varvec{\epsilon }})\right\} , \ \ (j\ne j'=1,\ldots ,p). \end{aligned}$$

Appendix C: Penalized Fisher information matrix

As in Relvas and Paula (2016), the penalized Fisher information matrix is derived from the regularity conditions satisfied by the regular log-likelihood function \(\text {L}({\varvec{\theta }})\), namely \(\text {E}\{\partial \text {L}({\varvec{\theta }})/\partial {\varvec{\theta }}\}=\mathbf{0}\) and \(\text {E}\{\partial ^2 \text {L}({\varvec{\theta }})/\partial {\varvec{\theta }}\partial {\varvec{\theta }}^\top \}=-\text {E}\{[\partial \text {L}({\varvec{\theta }})/\partial {\varvec{\theta }}][\partial \text {L}({\varvec{\theta }})/\partial {\varvec{\theta }}^\top ]\}\), together with the results \(f_g=\text {E}\left\{ W_g^2(z^2)z^4\right\} \) and \(d_g=\text {E}\left\{ W_g^2(z^2)z^2\right\} \), where \(z\sim S(0,1)\).

For the parameter \({\varvec{\gamma }}_T\) it follows that

$$\begin{aligned} \mathbf{K}_p^{\gamma _T\gamma _T}=-\text {E}\Bigg \{\frac{1}{\phi }(\mathbf{A}\mathbf{N}_T)^\top \left( -\mathbf{D}_v+4\mathbf{D}_d\right) (\mathbf{A}\mathbf{N}_T)-\lambda _T\mathbf{M}_T\Bigg \}. \end{aligned}$$

We may show that E\(\left( -\mathbf{D}_v+4\mathbf{D}_d\right) =-4d_g\mathbf{I} _n\), and consequently the penalized Fisher information matrix for \({\varvec{\gamma }}_T\) is

$$\begin{aligned} \mathbf{K}_p^{\gamma _T\gamma _T}= \frac{4d_g}{\phi }(\mathbf{A}\mathbf{N}_T)^\top (\mathbf{A}\mathbf{N}_T) +\lambda _T\mathbf{M}_T. \end{aligned}$$

Similarly, we obtain

$$\begin{aligned} \mathbf{K}_p^{\gamma _S\gamma _S}= \frac{4d_g}{\phi }(\mathbf{A}\mathbf{N}_S)^\top (\mathbf{A}\mathbf{N}_S) +\lambda _S\mathbf{M}_S. \end{aligned}$$

From the regularity condition E\(\left( \text {U}_{\theta }^\phi \right) =0\) we obtain E\(\{W_g(\delta _i)\delta _i\}=-\frac{1}{2}, \ (i=1,\ldots ,n)\). Then,

$$\begin{aligned} \text {E}\Bigg \{\left( \frac{\partial \text {L}_{p_i}({\varvec{\theta }},{\varvec{\lambda }})}{\partial \phi }\right) ^2\Bigg \}= & {} \text {E}\Bigg \{\left( \frac{1}{2\phi }+\frac{1}{\phi }W_g(\delta _i)\delta _i\right) ^2\Bigg \}\\= & {} \frac{1}{\phi ^2}\left( f_g-\frac{1}{4}\right) , \ (i=1,\ldots ,n), \end{aligned}$$

where \(\text {L}_{p_i}({\varvec{\theta }},{\varvec{\lambda }})\) denotes the ith element of the penalized log-likelihood function. Therefore, we obtain \(\text {K}_p^{\phi \phi }=\frac{n}{4\phi ^2}(4f_g-1)\).
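
For instance, under normal errors one has \(W_g(u)=-1/2\), so that \(d_g=1/4\) and \(f_g=3/4\); the expressions above then reduce to

$$\begin{aligned} \mathbf{K}_p^{\gamma _T\gamma _T}=\frac{1}{\phi }(\mathbf{A}\mathbf{N}_T)^\top (\mathbf{A}\mathbf{N}_T)+\lambda _T\mathbf{M}_T \ \ \text {and} \ \ \text {K}_p^{\phi \phi }=\frac{n}{2\phi ^2}, \end{aligned}$$

the familiar penalized least-squares and dispersion information quantities of the Gaussian model.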

For the parameter \(\rho _j\) one has that

$$\begin{aligned} \left( \frac{\partial \text {L}_{p_i}({\varvec{\theta }},{\varvec{\lambda }})}{\partial \rho _j}\right) ^2 =\left( \frac{2}{\phi }W_g(\delta _i)(\epsilon _i-\rho _1\epsilon _{i-1}-\cdots -\rho _p\epsilon _{i-p})(\epsilon _{i-j})\right) ^2 =\frac{4}{\phi }W_g^2(\delta _i)\delta _i\epsilon _{i-j}^2, \end{aligned}$$

which implies

$$\begin{aligned} \text {E}\left( \frac{4}{\phi }W_g^2(\delta _i)\delta _i\epsilon _{i-j}^2\right) =\text {E}\Bigg [\text {E}\left\{ \frac{4}{\phi }(W_g^2(\delta _i)\delta _i\epsilon _{i-j}^2)\big | (y_{i-1},\ldots ,y_{i-p}) \right\} \Bigg ]=\text {E}\left( \frac{4}{\phi }d_g\epsilon _{i-j}^2\right) . \end{aligned}$$

For AR(1) errors, we may express the error \(\epsilon _i\) (see, for instance, Judge et al. 1985) as

$$\begin{aligned} \begin{aligned} \epsilon _i=&\,e_{i}+\rho _1e_{i-1}+\rho _1^2e_{i-2}+\cdots \\ =&\sum _{j=0}^{\infty }\rho _1^j e_{i-j}, \end{aligned} \end{aligned}$$

where \(j\rightarrow \infty \) indicates that the series extends indefinitely into the past. Since \(|\rho _1|<1\) the process is stationary. One obtains

$$\begin{aligned} \text {E}(\epsilon _i)=\sum _{j=0}^{\infty }\rho _1^j\text {E}(e_{i-j})=0 \end{aligned}$$

and

$$\begin{aligned} \phi _\epsilon \equiv \text {Var}(\epsilon _i)=\text {E}(\epsilon _i^2)&= \text {E}\left\{ (e_i+\rho _1e_{i-1}+\rho _1^2e_{i-2}+\cdots )^2\right\} \\&= \text {E}\left( e_i^2+\rho _1^2e_{i-1}^2+\rho _1^4e_{i-2}^2+\cdots +2\rho _1e_ie_{i-1}+2\rho _1^2e_ie_{i-2}+\cdots \right) \\&=\text {E}(e_i^2)+\rho _1^2\text {E}(e_{i-1}^2)+\rho _1^4\text {E}(e_{i-2}^2)+\cdots \\&=\phi \xi (1+\rho _1^2+\rho _1^4+\cdots )\\&=\frac{\phi \xi }{(1-\rho _1^2)}. \end{aligned}$$

Then, the Fisher information of \(\rho _1\) reduces to

$$\begin{aligned} \text {K}_p^{\rho _1\rho _1}= & {} \frac{4}{\phi }d_g\sum _{i=2}^{n}E(\epsilon _{i-1}^2)=\frac{4}{\phi }d_g\sum _{i=1}^{n-1}\text {E} (\epsilon _{i}^2)=4d_g\xi \sum _{i=1}^{n-1}\frac{1}{(1-\rho _1^2)}\\= & {} \frac{4d_g\xi (n-1)}{1-\rho _1^2}. \end{aligned}$$

It is also a simple matter to find the autocovariance at any lag. For example, at lag 1, one has

$$\begin{aligned} \text {Cov}(\epsilon _i,\epsilon _{i-1})= & {} E(\epsilon _{i}\epsilon _{i-1})\\= & {} E\{(\rho _1\epsilon _{i-1}+e_i)\epsilon _{i-1}\}\\= & {} \rho _1\phi _\epsilon . \end{aligned}$$

Similarly, for lag 2, one obtains

$$\begin{aligned} \text {Cov}(\epsilon _i,\epsilon _{i-2})= & {} \text {E}(\epsilon _{i}\epsilon _{i-2})\\= & {} \text {E}\left[ \{\rho _1(\rho _1\epsilon _{i-2}+e_{i-1})+e_i\}\epsilon _{i-2}\right] \\= & {} \rho _1^2\phi _\epsilon , \end{aligned}$$

and the covariance between two errors \(l\) periods apart is given by

$$\begin{aligned} \text {Cov}(\epsilon _i,\epsilon _{i-l})=\text {E}(\epsilon _i\epsilon _{i-l})=\text {E} (\epsilon _{i+l}\epsilon _i)=\frac{\rho _1^l\phi \xi }{1-\rho _1^2}. \end{aligned}$$

Thus, matrix \({\varvec{\varUpsilon }}_1\) may be constructed.
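
As a small illustration (not part of the original derivation), \({\varvec{\varUpsilon }}_1\) can be filled in directly from these covariances; the sketch below uses illustrative names only.

```python
import numpy as np

def upsilon_ar1(n, rho1, phi, xi):
    """Sketch of Upsilon_1 for AR(1) errors: the (i, k) entry is
    Cov(eps_i, eps_k) = rho1**|i - k| * phi * xi / (1 - rho1**2), with |rho1| < 1."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return (phi * xi / (1.0 - rho1 ** 2)) * rho1 ** lags
```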

For AR(2) errors \(\epsilon _i\) may be expressed as

$$\begin{aligned} \begin{aligned} \epsilon _i&=\rho _1\epsilon _{i-1}+\rho _2\epsilon _{i-2}+e_i\\&=\rho _1(\rho _1\epsilon _{i-2}+\rho _2\epsilon _{i-3}+e_{i-1})+\rho _2(\rho _1\epsilon _{i-3}+\rho _2\epsilon _{i-4}+e_{i-2})+e_i\\&=e_i+\rho _1e_{i-1}+\rho _2e_{i-2}+\ldots ,\\ \end{aligned} \end{aligned}$$
(9)

where E\((e_i)=0\), E\((e_ie_{i'})=0\), for \(i\ne i'\), and E\((e_i^2) = \phi \xi \), so that E\((\epsilon _i)=0\). This process is stationary if \(\rho _1 + \rho _2 <1 \), \(\rho _2-\rho _1<1 \) and \(-1<\rho _2<1\). According to Judge et al. (1985) and Fox (2015), the elements of the covariance matrix \({\varvec{\varUpsilon }}_2\) may be found from the variance

$$\begin{aligned} \phi _\epsilon \equiv \text {Var}(\epsilon _i)=\text {E}(\epsilon _i^2)=\phi \xi \frac{(1-\rho _2)}{(1+\rho _2) \{(1-\rho _2)^2-\rho _1^2\}}. \end{aligned}$$

Multiplying (9) by \(\epsilon _{i-1} \) and taking expectation, one obtains

$$\begin{aligned} \text {Cov}(\epsilon _i,\epsilon _{i-1})= & {} \rho _1\text {E}(\epsilon _{i-1}^2)+\rho _2\text {E}(\epsilon _{i-1}\epsilon _{i-2})\\= & {} \rho _1\phi _{\epsilon }+\rho _2\text {Cov}(\epsilon _i,\epsilon _{i-1}), \end{aligned}$$

and since E\((\epsilon _{i-1}^2)=\phi _\epsilon \) and E\((\epsilon _{i-1}\epsilon _{i-2})= \text {Cov}(\epsilon _{i-1},\epsilon _{i-2})=\text {Cov}(\epsilon _i,\epsilon _{i-1})\), solving for autocovariance one obtains

$$\begin{aligned} \phi _1\equiv \text {Cov}(\epsilon _i,\epsilon _{i-1})=\frac{\rho _1}{1-\rho _2}\phi _\epsilon . \end{aligned}$$

Similarly, for \(l>1\),

$$\begin{aligned} \phi _l\equiv \text {Cov}(\epsilon _i,\epsilon _{i-l})= & {} \rho _1\text {E} (\epsilon _{i-1}\epsilon _{i-l})+\rho _2\text {E}(\epsilon _{i-2}\epsilon _{i-l})\\= & {} \rho _1\phi _{l-1}+\rho _2\phi _{l-2}, \end{aligned}$$

so we may find the autocovariances recursively. For example, for \(l=2\), one has

$$\begin{aligned} \phi _2= & {} \rho _1\phi _1+\rho _2\phi _0\\= & {} \rho _1\phi _{1}+\rho _2\phi _{\epsilon }, \end{aligned}$$

where \(\phi _0=\phi _{\epsilon }\), and for \(l=3\),

$$\begin{aligned} \phi _3=\rho _1\phi _2+\rho _2\phi _1. \end{aligned}$$
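
The recursion above is straightforward to implement; the sketch below (illustrative names, assuming the stationarity conditions hold) returns \(\phi _0,\phi _1,\ldots \), from which \({\varvec{\varUpsilon }}_2\) has \((i,k)\) entry \(\phi _{|i-k|}\).

```python
import numpy as np

def ar2_autocovariances(rho1, rho2, phi, xi, max_lag):
    """Sketch of the AR(2) autocovariances: phi_0 = phi_eps,
    phi_1 = rho1 * phi_eps / (1 - rho2), and phi_l = rho1*phi_{l-1} + rho2*phi_{l-2}."""
    phi_eps = phi * xi * (1.0 - rho2) / ((1.0 + rho2) * ((1.0 - rho2) ** 2 - rho1 ** 2))
    acov = [phi_eps, rho1 * phi_eps / (1.0 - rho2)]
    for _ in range(2, max_lag + 1):
        acov.append(rho1 * acov[-1] + rho2 * acov[-2])
    return np.array(acov[:max_lag + 1])
```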

In general for AR(p) errors one may write

$$\begin{aligned} \text {K}_p^{\rho _j\rho _j}=\frac{4d_g}{\phi }\sum _{i=j+1}^{n}\text {E}\left( \epsilon _{i-j}^2\right) =\frac{4d_g}{\phi }\sum _{i=1}^{n-j} \text {E}\left( \epsilon _{i}^2\right) =\frac{4d_g(n-j)}{\phi }\phi _\epsilon , \end{aligned}$$

and the Fisher information for (\(\rho _j,\rho _{j'})\), \(j \ne j'=1,\ldots ,p\), is given by

$$\begin{aligned} \text {K}_p^{\rho _j\rho _{j'}}= & {} \text {E}\left( -\frac{\partial ^2 \text {L}_{{p}_{i}}({\varvec{\theta }},{\varvec{\lambda }})}{\partial \rho _{j}\partial \rho _{j'}}\right) \\= & {} -\text {E}\left\{ \frac{4}{\phi }W'_g(\delta _i)\delta _i(\epsilon _{i-j})(\epsilon _{i-j'})+\frac{2}{\phi }W_g(\delta _i) (\epsilon _{i-j})(\epsilon _{i-j'})\right\} \\= & {} -\frac{1}{\phi }\text {E}\left\{ \left( 4W'_g(\delta _i)\delta _i+2W_g(\delta _i)\right) (\epsilon _{i-j})(\epsilon _{i-j'})\right\} \\= & {} -\frac{1}{\phi }\text {E}\left[ \text {E}\left\{ \left( 4W'_g(\delta _i)\delta _i+2W_g(\delta _i)\right) (\epsilon _{i-j}) (\epsilon _{i-j'})\mid y_{i-1},\ldots ,y_{i-p}\right\} \right] \\= & {} \frac{4d_g}{\phi }\text {E}(\epsilon _{i-j}\epsilon _{i-j'}). \end{aligned}$$

Considering \(j'<j\), we may write the Fisher information for \((\rho _j,\rho _{j'})\), \(j \ne j'\), as follows

$$\begin{aligned} \text {K}_p^{\rho _j\rho _{j'}}=\frac{4d_g}{\phi }\sum _{i=j+1}^{n+j'}\text {E}(\epsilon _{i-j}\epsilon _{i-j'})=\frac{4d_g}{\phi }\sum _{i=1}^{n-j+j'}\text {E}(\epsilon _{i}\epsilon _{i+j-j'})=\frac{4d_g(n-j+j')}{\phi }\phi _{j-j'}. \end{aligned}$$

The expressions for the elements of \({\varvec{\varUpsilon }}_p\) become progressively more complicated as \(p\) increases; a general expression is given in Wise (1955).

The penalized Fisher information matrix for \(({\varvec{\gamma }}_T^\top ,{\varvec{\gamma }}_S^\top )^\top \) is given by

$$\begin{aligned} \mathbf{K}_p^{\gamma _T\gamma _S}=-\text {E}\left\{ \frac{1}{\phi }(\mathbf{A}\mathbf{N}_T)^\top (4\mathbf{D}_d-\mathbf{D}_v) (\mathbf{A}\mathbf{N}_S)\right\} =\frac{4d_g}{\phi }\left\{ (\mathbf{A}\mathbf{N}_T)^\top (\mathbf{A}\mathbf{N}_S)\right\} , \end{aligned}$$

and we can see that \({\varvec{\gamma }}_T\) and \({\varvec{\gamma }}_S\) are not orthogonal.

From the properties of the symmetric distributions we have that

$$\begin{aligned} \text {E}\left\{ \left( W_g(\delta _i)+W'_g(\delta _i)\delta _i\right) (\epsilon _i-\rho _1\epsilon _{i-1}-\cdots -\rho _p\epsilon _{i-p})\mid y_{i-1},\ldots ,y_{i-p}\right\} =0. \end{aligned}$$

Then, we may obtain the following penalized Fisher information matrices:

$$\begin{aligned} \mathbf{K}_p^{\phi \gamma _T}= & {} -\text {E}\left[ \frac{1}{\phi ^2} \left\{ (\mathbf{A}\mathbf{N}_T)^\top (-\mathbf{D}_v+2\mathbf{D}_d)(\mathbf{A}{\varvec{\epsilon }})\right\} \right] =\mathbf{0} ,\\ \mathbf{K}_p^{\phi \gamma _S}= & {} -\text {E}\left[ \frac{1}{\phi ^2}\left\{ (\mathbf{A}\mathbf{N}_S)^\top (-\mathbf{D}_v+2\mathbf{D}_d)(\mathbf{A}{\varvec{\epsilon }})\right\} \right] =\mathbf{0} \ \ \text {and} \\ \text {K}_p^{\phi \rho _j}= & {} -\text {E}\left[ \frac{1}{\phi ^2}\left\{ (\mathbf{C}_j{\varvec{\epsilon }})^\top (\mathbf{D}_v-2\mathbf{D}_d)(\mathbf{A}{\varvec{\epsilon }})\right\} \right] =0, \ \ \forall j=1,\ldots ,p. \end{aligned}$$

From E\((\mathbf{U}_p^{\gamma _T})=\mathbf{0}\) it follows that

$$\begin{aligned} \text {E}\left\{ \frac{1}{\phi }(\mathbf{A}\mathbf{N}_T)^\top \mathbf{D}_{v}\mathbf{A}{\varvec{\epsilon }}\right\} =\mathbf{0} , \end{aligned}$$

then \(\text {E}(\mathbf{D}_v\mathbf{A}{\varvec{\epsilon }})=\mathbf{0} \), so we may obtain

$$\begin{aligned} \mathbf{K}_p^{\rho _j\gamma _T}= & {} -\frac{1}{\phi }\text {E} \left\{ (\mathbf{C}_j{\varvec{\epsilon }})^\top (\mathbf{D}_v-4\mathbf{D}_d) (\mathbf{A}\mathbf{N}_T)+(\mathbf{C}_j\mathbf{N}_T)^\top \mathbf{D}_v(\mathbf{A}{\varvec{\epsilon }})\right\} \\= & {} -\frac{1}{\phi }\text {E}\left\{ (\mathbf{C}_j{\varvec{\epsilon }})^\top (\mathbf{D}_v-4\mathbf{D}_d) (\mathbf{A}\mathbf{N}_T)\right\} , \end{aligned}$$

and since E\(\left\{ (\mathbf{C}_j{\varvec{\epsilon }})^\top \right\} =\mathbf{0} \) we have that

$$\begin{aligned} \mathbf{K}_p^{\rho _j \gamma _T}= & {} -\text {E}\left[ \text {E} \left\{ \frac{1}{\phi }\left( (\mathbf{C}_j{\varvec{\epsilon }})^\top (\mathbf{D}_v-4\mathbf{D}_d)\mathbf{A}\mathbf{N}_T\right) \mid y_{i-1},\ldots ,y_{i-p}\right\} \right] \\= & {} \text {E}\left\{ \frac{4d_g}{\phi }(\mathbf{C}_j{\varvec{\epsilon }})^\top (\mathbf{A}\mathbf{N}_T)\right\} =\mathbf{0} . \end{aligned}$$

Similarly, we may show that \(\mathbf{K}_p^{\rho _j\gamma _S}=\mathbf{0} \).

Appendix D: Case-weight perturbation scheme

Consider attributing weights to the individual contributions of the penalized log-likelihood function, that is,

$$\begin{aligned} \text {L}_p({\varvec{\theta }},{\varvec{\lambda }}\mid {\varvec{\omega }})= \text {L}_p({\varvec{\theta }}\mid {\varvec{\omega }}) - \frac{\lambda _T}{2}{\varvec{\gamma }}_T^\top \mathbf{M}_T {\varvec{\gamma }}_T -\frac{\lambda _S}{2}{\varvec{\gamma }}_S^\top \mathbf{M}_S {\varvec{\gamma }}_S, \end{aligned}$$

where \(\text {L}_p({\varvec{\theta }}\mid {\varvec{\omega }})= \sum _{i=1}^{n}\omega _i\text {L}_i({\varvec{\theta }})\) and \({\varvec{\omega }}=(\omega _1,\ldots ,\omega _n)^\top \) is the vector of weights, with \(0\le \omega _i\le 1\). In this case the no-perturbation vector is given by \({\varvec{\omega }}_0=\mathbf{1} _{n}\).

For this perturbation scheme we obtain

$$\begin{aligned} {\varvec{\varDelta }}_1= & {} \frac{1}{\hat{\phi }}(\widehat{\mathbf{A}}\mathbf{N}_T)^\top \widehat{\mathbf{D}}_v \widehat{\mathbf{D}}_{(\text {A}_{\epsilon })},\\ {\varvec{\varDelta }}_2= & {} \frac{1}{\hat{\phi }}(\widehat{\mathbf{A}}\mathbf{N}_S)^\top \widehat{\mathbf{D}}_v \widehat{\mathbf{D}}_{(\text {A}_{\epsilon })},\\ {\varvec{\varDelta }}_3= & {} \frac{1}{2\hat{\phi }}{} \mathbf{1} _n^\top \left( \widehat{\mathbf{D}}_m-\mathbf{I} _n\right) \ \ \text {and} \\ {\varvec{\varDelta }}_{4_j}= & {} \frac{1}{\hat{\phi }}(\widehat{\mathbf{C}}_j\widehat{{\varvec{\epsilon }}})^\top \widehat{\mathbf{D}}_v \widehat{\mathbf{D}}_{(\text {A}_{\epsilon })}, \ \ \forall j=1,\ldots , p, \end{aligned}$$

where \(\mathbf{D}_v=\text {diag}\left\{ v_1,\ldots ,v_n\right\} \) with \(v_i=-2W_g(\delta _i)\), \(\mathbf{D}_m=\text {diag}\left\{ m_1,\ldots ,m_n\right\} \) with \(m_i~=~v_i~\delta _i\), for \(i=1,\ldots ,n\), \(\epsilon _0=0\) and \(\mathbf{D}_{(\text {A}_{\epsilon })}\) is a diagonal matrix with elements given by \(\mathbf{A}{\varvec{\epsilon }}\) evaluated at \(\widehat{{\varvec{\theta }}}\). In addition,

$$\begin{aligned} {\varvec{\epsilon }}=\mathbf{y}-\mathbf{N}_T{\varvec{\gamma }}_T-\mathbf{N}_S{\varvec{\gamma }}_S \ \ \text {and} \ \ \delta _i=\frac{(\epsilon _i-\rho _1\epsilon _{i-1}-\ldots -\rho _p\epsilon _{i-p})^2}{\phi }. \end{aligned}$$
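
A minimal sketch of \({\varvec{\varDelta }}_1\) under this perturbation scheme is given below. It assumes Student-t errors and that \(\mathbf{A}\) is the \(n\times n\) banded matrix implied by \(\delta _i\) above (ones on the diagonal and \(-\rho _j\) on the jth subdiagonal, with pre-sample errors set to zero); these structural assumptions and all names are illustrative, and the matrices should be evaluated at the penalized maximum likelihood estimates.

```python
import numpy as np

def case_weight_delta1(y, N_T, N_S, gamma_T, gamma_S, phi, rho, nu=4.0):
    """Sketch of Delta_1 = (1/phi) (A N_T)^T D_v D_(A eps), with
    v_i = -2 W_g(delta_i) = (nu + 1)/(nu + delta_i) for Student-t errors."""
    n = len(y)
    eps = y - N_T @ gamma_T - N_S @ gamma_S
    A = np.eye(n)
    for j, r in enumerate(rho, start=1):
        A -= r * np.eye(n, k=-j)        # assumed AR(p) filtering matrix
    a_eps = A @ eps
    delta = a_eps ** 2 / phi
    D_v = np.diag((nu + 1.0) / (nu + delta))
    return (A @ N_T).T @ D_v @ np.diag(a_eps) / phi
```

\({\varvec{\varDelta }}_2\) and \({\varvec{\varDelta }}_{4_j}\) follow the same pattern, with \(\widehat{\mathbf{A}}\mathbf{N}_S\) and \(\widehat{\mathbf{C}}_j\widehat{{\varvec{\epsilon }}}\) replacing \(\widehat{\mathbf{A}}\mathbf{N}_T\).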

Cite this article

Oliveira, R.A., Paula, G.A. Additive models with autoregressive symmetric errors based on penalized regression splines. Comput Stat 36, 2435–2466 (2021). https://doi.org/10.1007/s00180-021-01106-2
