Abstract
In this paper we discuss estimation and diagnostic procedures in semiparametric additive models with symmetric errors, which accommodate error distributions with both heavier and lighter tails than the normal, such as the Student-t, Pearson VII, power exponential, logistic I and II, and contaminated normal, among others. Such models belong to the general class of GAMLSS models proposed by Rigby and Stasinopoulos (Appl. Stat. 54:507–554, 2005). A back-fitting algorithm to obtain the maximum penalized likelihood estimates (MPLEs) using natural cubic smoothing splines is presented. In particular, the score functions and Fisher information matrices for the parameters of interest are expressed in a notation similar to that used in parametric symmetric models. Sufficient conditions for the existence of the MPLEs are presented, together with some inferential results and a discussion of degrees of freedom and smoothing parameter estimation. Diagnostic quantities such as leverage, standardized residuals and normal curvatures of local influence under two perturbation schemes are derived. A real data set previously analyzed under normal linear models is reanalyzed under semiparametric additive models with symmetric errors.
References
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csáki F (eds) Second international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281
Atkinson AC (1981) Two graphical displays for outlying and influential observations in regression. Biometrika 68:13–20
Belsley DA, Kuh E, Welsch RE (1980) Regression diagnostics. Identifying influential data and sources of collinearity. Wiley, New York
Buja A, Hastie T, Tibshirani R (1989) Linear smoothers and additive models. Ann Stat 17:453–555
Cook RD (1986) Assessment of local influence (with discussion). J R Stat Soc B 48:133–169
Craven P, Wahba G (1979) Smoothing noisy data with spline functions. Numer Math 31:377–403
Cysneiros FJA, Paula GA (2005) Restricted methods in symmetrical linear regression models. Comput Stat Data Anal 49:689–708
Cysneiros FJA, Paula GA, Galea M (2007) Heteroscedastic symmetrical linear models. Stat Probab Lett 77:1084–1090
Eilers PHC, Marx BD (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11:89–121
Eubank RL (1984) The hat matrix for smoothing splines. Stat Probab Lett 2:9–14
Fang KT, Kotz S, Ng KW (1990) Symmetric multivariate and related distributions. Chapman and Hall, London
Fung WK, Zhu ZY, Wei BC, He X (2002) Influence diagnostics and outlier tests for semiparametric mixed models. J R Stat Soc B 64:565–579
Galea M, Paula GA, Uribe-Opazo M (2003) On influence diagnostic in univariate elliptical linear regression models. Stat Pap 44:23–45
Galea M, Paula GA, Cysneiros FJA (2005) On diagnostics in symmetrical nonlinear models. Stat Probab Lett 73:459–467
Gourieroux C, Monfort A (1995) Statistics and econometric models, vols 1 and 2. Cambridge University Press, Cambridge
Green PJ, Silverman BW (1994) Nonparametric regression and generalized linear models. Chapman and Hall, Boca Raton
Hastie T, Tibshirani R (1990) Generalized additive models. Chapman and Hall, London
Hurvich CM, Simonoff JS, Tsai C-L (1998) Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J R Stat Soc B 60:271–293
Ibacache-Pulgar G, Paula GA (2011) Local influence for Student-t partially linear models. Comput Stat Data Anal 55:1462–1478
Ibacache-Pulgar G, Paula GA, Galea M (2012) Influence diagnostics for elliptical semiparametric mixed models. Stat Model 12:165–193
Lange KL, Little RJA, Taylor JMG (1989) Robust statistical modeling using the t distribution. J Am Stat Assoc 84:881–896
Poon W, Poon YS (1999) Conformal normal curvature and assessment of local influence. J R Stat Soc B 61:51–61
Rigby R, Stasinopoulos D (2005) Generalized additive models for location, scale and shape. Appl Stat 54:507–554
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Silverman BW (1985) Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J R Stat Soc B 47:1–52
Simonoff JS, Tsai CL (1999) Semiparametric and additive model selection using an improved Akaike information criterion. J Comput Graph Stat 8:22–40
Wahba G (1983) Bayesian confidence intervals for the cross-validated smoothing spline. J R Stat Soc B 45:133–150
Acknowledgements
The authors are grateful to the editor, associate editor and reviewers for their helpful comments. This work was supported by CAPES, CNPq and FAPESP, Brazil.
Appendices
Appendix A
Let \(\mathbf{D}_{a} = \mathrm{diag}_{1\le i\le n}(a_{i})\), with \(a_{i} = -2(\zeta_{i} + 2\zeta_{i}'\delta_{i})\) and \(\zeta_{i}' = \mathrm{d}\zeta_{i}/\mathrm{d}\delta_{i}\). Below we derive sufficient conditions to guarantee the concavity of the penalized log-likelihood function \(L_{\mathrm{p}}(\boldsymbol{\beta}, \mathbf{f}_{1}, \mathbf{f}_{2}, \phi, \boldsymbol{\alpha})\) in \(\boldsymbol{\beta}\), \(\mathbf{f}_{1}\), \(\mathbf{f}_{2}\) and \(\phi\). In effect, we have the following.
- (a′): In step (a) (Sect. 3.1), the concavity (in \(\boldsymbol{\beta}\)) of \(L_{\mathrm{p}}(\boldsymbol{\beta}, \mathbf{f}_{1}, \mathbf{f}_{2}, \phi, \boldsymbol{\alpha})\) is guaranteed if and only if the matrix \(\mathbf{L}_{\mathrm{p}}^{\beta\beta} = -\frac{1}{\phi}\mathbf{X}^{T}\mathbf{D}_{a}\mathbf{X} \le 0\) (negative semidefinite) or, equivalently, if and only if \(-\mathbf{L}_{\mathrm{p}}^{\beta\beta} \ge 0\) (non-negative definite). One has \(-\mathbf{L}_{\mathrm{p}}^{\beta\beta} \ge 0\) if the matrix \(\mathbf{D}_{a} \ge 0\), that is, if \(a_{i} \ge 0\) for all \(i=1,\dots,n\).
- (b′): Then, in step (b) (Sect. 3.1), one has concavity (in \(\mathbf{f}_{1}\)) of \(L_{\mathrm{p}}^{c}(\mathbf{f}_{1}, \mathbf{f}_{2}, \phi, \boldsymbol{\alpha})\) if and only if the matrix \(\mathbf{L}_{\mathrm{p}}^{f_{1} f_{1}} = -(\frac{1}{\phi}\mathbf{N}_{1}^{T}\mathbf{D}_{a}\mathbf{N}_{1} + \alpha_{1}\mathbf{K}_{1}) \le 0\) or, equivalently, if and only if \(-\mathbf{L}_{\mathrm{p}}^{f_{1} f_{1}} \ge 0\). This holds, in particular, if \(\frac{1}{\phi}\mathbf{N}_{1}^{T}\mathbf{D}_{a}\mathbf{N}_{1} \ge 0\) and \(\alpha_{1}\mathbf{K}_{1} \ge 0\). Since \(\alpha_{1}\) is a positive scalar and \(\mathbf{K}_{1} \ge 0\), we have \(\alpha_{1}\mathbf{K}_{1} \ge 0\). On the other hand, \(\frac{1}{\phi}\mathbf{N}_{1}^{T}\mathbf{D}_{a}\mathbf{N}_{1} \ge 0\) if \(\mathbf{D}_{a} \ge 0\), that is, if \(a_{i} \ge 0\) for all \(i=1,\dots,n\).
- (c′): Analogously, in step (c) (Sect. 3.1), one has concavity (in \(\mathbf{f}_{2}\)) of \(L_{\mathrm{p}}^{c}(\mathbf{f}_{2}, \phi, \boldsymbol{\alpha})\) if and only if the matrix \(\mathbf{L}_{\mathrm{p}}^{f_{2} f_{2}} = -(\frac{1}{\phi}\mathbf{N}_{2}^{T}\mathbf{D}_{a}\mathbf{N}_{2} + \alpha_{2}\mathbf{K}_{2}) \le 0\) or, equivalently, if and only if \(-\mathbf{L}_{\mathrm{p}}^{f_{2} f_{2}} \ge 0\). This holds, in particular, if \(\frac{1}{\phi}\mathbf{N}_{2}^{T}\mathbf{D}_{a}\mathbf{N}_{2} \ge 0\) and \(\alpha_{2}\mathbf{K}_{2} \ge 0\). Since \(\alpha_{2}\) is a positive scalar and \(\mathbf{K}_{2} \ge 0\), we have \(\alpha_{2}\mathbf{K}_{2} \ge 0\). On the other hand, \(\frac{1}{\phi}\mathbf{N}_{2}^{T}\mathbf{D}_{a}\mathbf{N}_{2} \ge 0\) if \(\mathbf{D}_{a} \ge 0\), that is, if \(a_{i} \ge 0\) for all \(i=1,\dots,n\).
- (d′): Finally, in step (d) (Sect. 3.1), the concavity (in \(\phi\)) of \(L_{\mathrm{p}}^{c}(\phi, \boldsymbol{\alpha})\) is guaranteed if and only if \(\partial^{2} L_{\mathrm{p}}^{c}(\phi, \boldsymbol{\alpha})/\partial\phi^{2} < 0\) for all \(\phi\).
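As an illustration of the condition \(a_{i} \ge 0\), consider Student-t errors and assume that \(\zeta_{i}\) denotes the logarithmic derivative of the density generator evaluated at \(\delta_{i}\), that is, \(\zeta(u) = g'(u)/g(u)\) with \(g(u) \propto (1+u/\nu)^{-(\nu+1)/2}\). Under this assumption \(a_{i} = (\nu+1)(\nu-\delta_{i})/(\nu+\delta_{i})^{2}\), so the sufficient condition holds exactly when \(\delta_{i} \le \nu\). The following minimal sketch evaluates the condition numerically on simulated errors.

import numpy as np

# Minimal sketch for Student-t errors, assuming zeta(u) = g'(u)/g(u) with
# g(u) = (1 + u/nu)^(-(nu+1)/2) up to a normalizing constant.
def a_coefficients(delta, nu):
    """a_i = -2 (zeta_i + 2 zeta_i' delta_i), with zeta evaluated at delta_i."""
    zeta = -(nu + 1.0) / (2.0 * (nu + delta))
    zeta_prime = (nu + 1.0) / (2.0 * (nu + delta) ** 2)
    return -2.0 * (zeta + 2.0 * zeta_prime * delta)

rng = np.random.default_rng(0)
nu, phi = 4.0, 1.5
eps = np.sqrt(phi) * rng.standard_t(df=nu, size=200)  # simulated t errors
delta = eps ** 2 / phi                                 # delta_i = eps_i^2 / phi
a = a_coefficients(delta, nu)
# Algebraically a_i = (nu+1)(nu-delta_i)/(nu+delta_i)^2 here, so the sufficient
# condition fails exactly for the observations with delta_i > nu.
print(f"a_i >= 0 for {np.mean(a >= 0):.1%} of the simulated observations")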
Appendix B
B.1 Score function
Consider the penalized log-likelihood function given by (4). The score function of \(\boldsymbol{\theta}\) is given by \(\mathbf{U}_{\mathrm{p}} = \partial L_{\mathrm{p}}(\boldsymbol{\theta}, \boldsymbol{\alpha})/\partial\boldsymbol{\theta}\). In particular, we obtain
where \(\mathbf{D}_{v}\) is defined in Sect. 3.3 and \(\boldsymbol{\epsilon} = \mathbf{y} - \boldsymbol{\mu}\), with \(\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta} + \sum_{k=1}^{s}\mathbf{N}_{k}\mathbf{f}_{k}\).
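For orientation, a sketch of the typical structure of these score components in symmetric models, written under the convention \(v_{i} = -2\zeta_{i}\) and \(\mathbf{D}_{v} = \mathrm{diag}_{1\le i\le n}(v_{i})\) (the general form suggested by the parametric case, e.g. Cysneiros and Paula 2005, rather than a reproduction of the display), is
\[
\mathbf{U}_{\mathrm{p}}^{\beta} = \frac{1}{\phi}\,\mathbf{X}^{T}\mathbf{D}_{v}\boldsymbol{\epsilon}, \qquad
\mathbf{U}_{\mathrm{p}}^{f_{k}} = \frac{1}{\phi}\,\mathbf{N}_{k}^{T}\mathbf{D}_{v}\boldsymbol{\epsilon} - \alpha_{k}\mathbf{K}_{k}\mathbf{f}_{k}\ (k=1,\dots,s), \qquad
U_{\mathrm{p}}^{\phi} = -\frac{n}{2\phi} + \frac{1}{2\phi^{2}}\,\boldsymbol{\epsilon}^{T}\mathbf{D}_{v}\boldsymbol{\epsilon}.
\]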
B.2 Hessian matrix
Let \(\mathbf{L}_{\mathrm{p}}\) \((p^{*} \times p^{*})\) be the Hessian matrix with \((j^{*}, \ell^{*})\)-element given by \(\partial^{2} L_{\mathrm{p}}(\boldsymbol{\theta}, \boldsymbol{\alpha})/\partial\theta_{j^{*}}\partial\theta_{\ell^{*}}\), for \(j^{*}, \ell^{*} = 1, \dots, p^{*}\). After some algebraic manipulations we find
where \(\mathbf{D}_{\zeta'} = \mathrm{diag}_{1\le i\le n}(\zeta_{i}')\), \(\mathbf{b} = (b_{1}, \dots, b_{n})^{T}\), \(\boldsymbol{\delta} = (\delta_{1}, \dots, \delta_{n})^{T}\), and \(b_{i} = (\zeta_{i} + \zeta_{i}'\delta_{i})\epsilon_{i}\), for \(i = 1, \dots, n\).
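For orientation, and under the same conventions as above, the nonzero blocks of \(\mathbf{L}_{\mathrm{p}}\) typically take the form (a sketch of the general structure, not a reproduction of the display)
\[
\mathbf{L}_{\mathrm{p}}^{\beta\beta} = -\frac{1}{\phi}\mathbf{X}^{T}\mathbf{D}_{a}\mathbf{X},\qquad
\mathbf{L}_{\mathrm{p}}^{\beta f_{k}} = -\frac{1}{\phi}\mathbf{X}^{T}\mathbf{D}_{a}\mathbf{N}_{k},\qquad
\mathbf{L}_{\mathrm{p}}^{f_{k} f_{k}} = -\frac{1}{\phi}\mathbf{N}_{k}^{T}\mathbf{D}_{a}\mathbf{N}_{k} - \alpha_{k}\mathbf{K}_{k},
\]
\[
\mathbf{L}_{\mathrm{p}}^{\beta\phi} = \frac{2}{\phi^{2}}\mathbf{X}^{T}\mathbf{b},\qquad
\mathbf{L}_{\mathrm{p}}^{f_{k}\phi} = \frac{2}{\phi^{2}}\mathbf{N}_{k}^{T}\mathbf{b},\qquad
L_{\mathrm{p}}^{\phi\phi} = \frac{1}{\phi^{2}}\biggl(\frac{n}{2} + 2\sum_{i=1}^{n}\zeta_{i}\delta_{i} + \boldsymbol{\delta}^{T}\mathbf{D}_{\zeta'}\boldsymbol{\delta}\biggr),
\]
with the \(\boldsymbol{\beta}\) and \(\mathbf{f}_{k}\) blocks agreeing with those used in Appendix A.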
B.3 Expected information matrix
Let \(\mathbf{D}_{d} = \frac{4 d_{g}}{\phi}\mathbf{I}_{(n,n)}\), with \(d_{g} = \mathrm{E}\{\zeta^{2}(\epsilon_{i}^{2})\epsilon_{i}^{2}\}\), \(f_{g} = \mathrm{E}\{\zeta^{2}(\epsilon_{i}^{2})\epsilon_{i}^{4}\}\) and \(\epsilon_{i} \sim S(0, \phi, g)\). By calculating the expectation of \(-\mathbf{L}_{\mathrm{p}}\) we find that the \((p^{*} \times p^{*})\) expected information matrix takes the following block-diagonal form:
Now, taking \(\mathcal{I}_{\mathrm{p}}^{\phi\phi} = \frac{n(4 f_{g} - 1)}{4\phi^{2}}\), \(\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\beta} = \mathbf{X}^{T}\mathbf{D}_{d}\mathbf{X}\), \(\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta f} = (\mathbf{X}^{T}\mathbf{D}_{d}\mathbf{N}_{1}\ \ \cdots\ \ \mathbf{X}^{T}\mathbf{D}_{d}\mathbf{N}_{s})\) and
we have
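As an illustrative check, \(d_{g}\) and \(f_{g}\) can be approximated by Monte Carlo. The sketch below does so for the Student-t case, assuming that the expectations are taken over the standardized error \(z \sim S(0,1,g)\) and that \(\zeta(u) = -(\nu+1)/\{2(\nu+u)\}\); under these assumptions the closed forms reported for the t case in the symmetric-models literature are \(d_{g} = (\nu+1)/\{4(\nu+3)\}\) and \(f_{g} = 3(\nu+1)/\{4(\nu+3)\}\).

import numpy as np

# Monte Carlo sketch of d_g = E{zeta^2(z^2) z^2} and f_g = E{zeta^2(z^2) z^4}
# for Student-t errors, assuming z ~ S(0, 1, g) is a standard t_nu variable
# and zeta(u) = -(nu + 1) / (2 (nu + u)).
rng = np.random.default_rng(1)
nu = 5.0
z = rng.standard_t(df=nu, size=1_000_000)
u = z ** 2
zeta = -(nu + 1.0) / (2.0 * (nu + u))
d_g = np.mean(zeta ** 2 * u)       # close to (nu+1)/(4(nu+3)) = 0.1875 for nu = 5
f_g = np.mean(zeta ** 2 * u ** 2)  # close to 3(nu+1)/(4(nu+3)) = 0.5625 for nu = 5
print(d_g, f_g)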
Appendix C
In this appendix we present the expressions of the matrix \(\boldsymbol{\varDelta}_{\mathrm{p}} = \partial^{2} L_{\mathrm{p}}(\boldsymbol{\theta}, \boldsymbol{\alpha} \mid \boldsymbol{\omega})/\partial\boldsymbol{\theta}\,\partial\boldsymbol{\omega}^{T}\) for the case-weight and explanatory variable perturbation schemes.
C.1 Case-weight perturbation
To attribute weights to the observations in the penalized log-likelihood function, consider \(L_{\mathrm{p}}(\boldsymbol{\theta}, \boldsymbol{\alpha} \mid \boldsymbol{\omega}) = L(\boldsymbol{\theta} \mid \boldsymbol{\omega}) - \sum_{k=1}^{s}\frac{\alpha_{k}}{2}\mathbf{f}_{k}^{T}\mathbf{K}_{k}\mathbf{f}_{k}\), where \(L(\boldsymbol{\theta} \mid \boldsymbol{\omega}) = \sum_{i=1}^{n}\omega_{i} L_{i}(\boldsymbol{\theta})\) and \(\boldsymbol{\omega} = (\omega_{1}, \dots, \omega_{n})^{T}\) is the vector of weights, with \(0 \le \omega_{i} \le 1\). In this case, the vector of no perturbation is given by \(\boldsymbol{\omega}_{0} = \mathbf{1}_{(n\times 1)}\). Differentiating \(L_{\mathrm{p}}(\boldsymbol{\theta}, \boldsymbol{\alpha} \mid \boldsymbol{\omega})\) with respect to the elements of \(\boldsymbol{\theta}\) and \(\omega_{i}\), we obtain
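For orientation, a sketch of the typical form of the resulting blocks, under the convention \(v_{i} = -2\zeta_{i}\) and with \(\widehat{\mathbf{D}}_{\epsilon} = \mathrm{diag}_{1\le i\le n}(\widehat{\epsilon}_{i})\), all quantities evaluated at the MPLEs (again the general structure rather than a reproduction of the display), is
\[
\boldsymbol{\varDelta}_{\mathrm{p}}^{\beta} = \frac{1}{\widehat{\phi}}\,\mathbf{X}^{T}\widehat{\mathbf{D}}_{v}\widehat{\mathbf{D}}_{\epsilon},\qquad
\boldsymbol{\varDelta}_{\mathrm{p}}^{f_{k}} = \frac{1}{\widehat{\phi}}\,\mathbf{N}_{k}^{T}\widehat{\mathbf{D}}_{v}\widehat{\mathbf{D}}_{\epsilon},\qquad
\boldsymbol{\varDelta}_{\mathrm{p}}^{\phi} = \frac{1}{2\widehat{\phi}}\bigl(\widehat{v}_{1}\widehat{\delta}_{1}-1,\;\dots,\;\widehat{v}_{n}\widehat{\delta}_{n}-1\bigr).
\]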
C.2 Explanatory variable perturbation
Here the \(d\)th explanatory variable, assumed continuous, is perturbed by considering the additive perturbation scheme, namely \(x_{id\omega} = x_{id} + \omega_{i}\), where \(\boldsymbol{\omega} = (\omega_{1}, \dots, \omega_{n})^{T}\) is the vector of perturbations such that \(\omega_{i} \in \mathbb{R}\). In this case, the vector of no perturbation is given by \(\boldsymbol{\omega}_{0} = \mathbf{0}_{(n\times 1)}\) and the perturbed penalized log-likelihood function is given by \(L_{\mathrm{p}}(\boldsymbol{\theta}, \boldsymbol{\alpha} \mid \boldsymbol{\omega}) = L(\boldsymbol{\theta} \mid \boldsymbol{\omega}) - \sum_{k=1}^{s}\frac{\alpha_{k}}{2}\mathbf{f}_{k}^{T}\mathbf{K}_{k}\mathbf{f}_{k}\), where \(L(\cdot)\) is given by (3) with \(\delta_{i\omega} = \phi^{-1}(y_{i} - \mu_{i\omega})^{2}\) in place of \(\delta_{i}\) and \(\mu_{i\omega} = \mu_{i} + \omega_{i}\beta_{d}\). Differentiating \(L_{\mathrm{p}}(\boldsymbol{\theta}, \boldsymbol{\alpha} \mid \boldsymbol{\omega})\) with respect to the elements of \(\boldsymbol{\theta}\) and \(\omega_{i}\), we obtain, after some algebraic manipulation,
Here \(\mathbf{z}_{d}\) denotes a \((p\times 1)\) vector with 1 in the \(d\)th position and zeros elsewhere, and \(\widehat{\beta}_{d}\) denotes the \(d\)th element of \(\widehat{\boldsymbol{\beta}}\).
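Once \(\boldsymbol{\varDelta}_{\mathrm{p}}\) and \(\mathbf{L}_{\mathrm{p}}\) have been evaluated at the MPLEs under either scheme, the local-influence quantities of Cook (1986) and Poon and Poon (1999) can be computed numerically. The sketch below, with placeholder names Delta_p and Lpp, obtains the direction of maximum normal curvature and the total local influence of each observation.

import numpy as np

# Generic sketch: local-influence quantities from the perturbation matrix
# Delta_p (p* x n) and the penalized Hessian Lpp (p* x p*), both evaluated at
# the MPLEs under the chosen perturbation scheme (Appendix C.1 or C.2).
def local_influence(Delta_p, Lpp):
    # B = Delta_p^T (-Lpp)^{-1} Delta_p, so the normal curvature in direction l
    # is C_l = 2 l^T B l (Cook 1986).
    B = Delta_p.T @ np.linalg.solve(-Lpp, Delta_p)
    eigvals, eigvecs = np.linalg.eigh(B)
    d_max = eigvecs[:, -1]            # direction of maximum normal curvature
    C_i = 2.0 * np.abs(np.diag(B))    # total local influence of case i
    return d_max, C_i

Observations with large components in d_max, or with large values of C_i, are flagged as locally influential.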