
Semiparametric additive models under symmetric distributions


Abstract

In this paper we discuss estimation and diagnostic procedures in semiparametric additive models with symmetric errors, which permit distributions with heavier and lighter tails than the normal, such as the Student-t, Pearson VII, power exponential, logistic I and II, and contaminated normal, among others. Such models belong to the general class of GAMLSS statistical models proposed by Rigby and Stasinopoulos (Appl. Stat. 54:507–554, 2005). A back-fitting algorithm for attaining the maximum penalized likelihood estimates (MPLEs) using natural cubic smoothing splines is presented. In particular, the score functions and Fisher information matrices for the parameters of interest are expressed in a notation similar to that used in parametric symmetric models. Sufficient conditions for the existence of the MPLEs are presented, as well as some inferential results and discussions on degrees of freedom and smoothing parameter estimation. Diagnostic quantities such as leverage, standardized residuals and normal curvatures of local influence under two perturbation schemes are derived. A real data set previously analyzed under normal linear models is reanalyzed under semiparametric additive models with symmetric errors.
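As an illustration of the kind of back-fitting cycle described above, a minimal numpy sketch follows. It assumes Student-t errors, uses a second-difference penalty as a stand-in for the natural-cubic-spline penalty matrices K_k, and takes incidence matrices N_k mapping observations to the distinct values of each nonparametric covariate; all names (t_weights, backfit_mple) and defaults are ours, not the paper's.

    import numpy as np

    def t_weights(delta, nu):
        # v_i = -2*zeta(delta_i) for Student-t errors, where
        # zeta(d) = d log g(d)/dd = -(nu + 1) / (2 * (nu + d)).
        return (nu + 1.0) / (nu + delta)

    def second_diff_penalty(m):
        # Second-difference penalty matrix, a simple stand-in for the
        # natural-cubic-spline penalty K_k used in the paper.
        D = np.diff(np.eye(m), n=2, axis=0)
        return D.T @ D

    def backfit_mple(y, X, N_list, alpha, nu=4.0, n_iter=100, tol=1e-8):
        """Penalized back-fitting sketch: steps (a)-(d) of the algorithm."""
        n = len(y)
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        f_list = [np.zeros(N.shape[1]) for N in N_list]
        K_list = [second_diff_penalty(N.shape[1]) for N in N_list]
        fit = lambda: X @ beta + sum(N @ f for N, f in zip(N_list, f_list))
        phi = np.mean((y - fit()) ** 2)
        for _ in range(n_iter):
            beta_old = beta.copy()
            eps = y - fit()
            v = t_weights(eps ** 2 / phi, nu)   # diagonal of D_v
            # (a) beta: weighted least squares on partial residuals
            r = y - sum(N @ f for N, f in zip(N_list, f_list))
            beta = np.linalg.solve((X.T * v) @ X, (X.T * v) @ r)
            # (b), (c) f_k: penalized weighted smoothing of partial residuals
            for k, (N, K) in enumerate(zip(N_list, K_list)):
                r = y - X @ beta - sum(Nl @ fl for l, (Nl, fl)
                                       in enumerate(zip(N_list, f_list)) if l != k)
                f_list[k] = np.linalg.solve((N.T * v) @ N + phi * alpha[k] * K,
                                            (N.T * v) @ r)
            # (d) phi: closed-form update from the score equation U_p^phi = 0
            eps = y - fit()
            v = t_weights(eps ** 2 / phi, nu)
            phi = np.sum(v * eps ** 2) / n
            if np.max(np.abs(beta - beta_old)) < tol:
                break
        return beta, f_list, phi

Each pass performs the steps (a)-(d) referred to in Appendix A: a weighted least-squares update for the regression coefficients, a penalized weighted smoother for each nonparametric function, and a closed-form update for the scale parameter.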


References

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csáki F (eds) International symposium on information theory, Akadémiai Kiadó, Budapest, Hungary, pp 267–281


  • Atkinson AC (1981) Two graphical displays for outlying and influential observations in regression. Biometrika 68:13–20


  • Belsley DA, Kuh E, Welsch RE (1980) Regression diagnostics. Identifying influential data and sources of collinearity. Wiley, New York


  • Buja A, Hastie T, Tibshirani R (1989) Linear smoothers and additive models. Ann Stat 17:453–555


  • Cook RD (1986) Assessment of local influence (with discussion). J R Stat Soc B 48:133–169


  • Craven P, Wahba G (1979) Smoothing noisy data with spline functions. Numer Math 31:377–403


  • Cysneiros FJA, Paula GA (2005) Restricted methods in symmetrical linear regression models. Comput Stat Data Anal 49:689–708


  • Cysneiros FJA, Paula GA, Galea M (2007) Heteroscedastic symmetrical linear models. Stat Probab Lett 77:1084–1090


  • Eilers PHC, Marx BD (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11(2):89–121


  • Eubank RL (1984) The hat matrix for smoothing splines. Stat Probab Lett 2:9–14


  • Fang KT, Kotz S, Ng KW (1990) Symmetric multivariate and related distributions. Chapman and Hall, London


  • Fung WK, Zhu ZY, Wei BC, He X (2002) Influence diagnostics and outlier tests for semiparametric mixed models. J R Stat Soc B 64:565–579


  • Galea M, Paula GA, Uribe-Opazo M (2003) On influence diagnostic in univariate elliptical linear regression models. Stat Pap 44:23–45


  • Galea M, Paula GA, Cysneiros FJA (2005) On diagnostics in symmetrical nonlinear models. Stat Probab Lett 73:459–467


  • Green PJ, Silverman BW (1994) Nonparametric regression and generalized linear models. Chapman and Hall, Boca Raton


  • Gourieroux C, Monfort A (1995) Statistics and econometric models, vols 1 and 2. Cambridge University Press, Cambridge


  • Hastie T, Tibshirani R (1990) Generalized additive models. Chapman and Hall, London


  • Hurvich CM, Simonoff JS, Tsai C-L (1998) Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J R Stat Soc B 60:271–293


  • Ibacache-Pulgar G, Paula GA (2011) Local influence for Student-t partially linear models. Comput Stat Data Anal 55:1462–1478


  • Ibacache-Pulgar G, Paula GA, Galea M (2012) Influence diagnostics for elliptical semiparametric mixed models. Stat Model 12:165–193


  • Lange KL, Little RJA, Taylor JMG (1989) Robust statistical modeling using the t distribution. J Am Stat Assoc 84:881–896


  • Poon W, Poon YS (1999) Conformal normal curvature and assessment of local influence. J R Stat Soc B 61:51–61


  • Rigby R, Stasinopoulos D (2005) Generalized additive models for location, scale and shape. Appl Stat 54:507–554


  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464


  • Silverman BW (1985) Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J R Stat Soc B 47:1–52


  • Simonoff JS, Tsai C-L (1999) Semiparametric and additive model selection using an improved Akaike information criterion. J Comput Graph Stat 8:22–40


  • Wahba G (1983) Bayesian confidence intervals for the cross-validated smoothing spline. J R Stat Soc B 45:133–150



Acknowledgements

The authors are grateful to the editor, associate editor and reviewers for their helpful comments. This work was supported by CAPES, CNPq and FAPESP, Brazil.

Author information


Corresponding author

Correspondence to Gilberto A. Paula.

Appendices

Appendix A

Let \(\mathbf{D}_a=\operatorname{diag}_{1\leq i\leq n}(a_i)\), with \(a_i=-2(\zeta_i+2\zeta_i'\delta_i)\) and \(\zeta_i'=\mathrm{d}\zeta_i/\mathrm{d}\delta_i\). Below we derive sufficient conditions that guarantee the concavity of the penalized log-likelihood function \(L_{\mathrm{p}}(\boldsymbol{\beta},\mathbf{f}_1,\mathbf{f}_2,\phi,\boldsymbol{\alpha})\) in \(\boldsymbol{\beta}\), \(\mathbf{f}_1\), \(\mathbf{f}_2\) and \(\phi\). In effect, we have the following.

(a′):

In step (a) (Sect. 3.1), the concavity (in \(\boldsymbol{\beta}\)) of \(L_{\mathrm{p}}(\boldsymbol{\beta},\mathbf{f}_1,\mathbf{f}_2,\phi,\boldsymbol{\alpha})\) is guaranteed if and only if the matrix \(\mathbf{L}_{\mathrm{p}}^{\beta\beta}=-\frac{1}{\phi}\mathbf{X}^{T}\mathbf{D}_a\mathbf{X}\leq 0\) (negative semidefinite) or, equivalently, if and only if \(-\mathbf{L}_{\mathrm{p}}^{\beta\beta}\geq 0\) (nonnegative definite). One has \(-\mathbf{L}_{\mathrm{p}}^{\beta\beta}\geq 0\) if \(\mathbf{D}_a\geq 0\), that is, if \(a_i\geq 0\) for all \(i=1,\ldots,n\) (see the worked example following item (d′)).

(b′):

Then, in step (b) (Sect. 3.1), one has concavity (in \(\mathbf{f}_1\)) of \(L_{\mathrm{p}}^{c}(\mathbf{f}_1,\mathbf{f}_2,\phi,\boldsymbol{\alpha})\) if and only if the matrix \(\mathbf{L}_{\mathrm{p}}^{f_1f_1}=-(\frac{1}{\phi}\mathbf{N}_1^{T}\mathbf{D}_a\mathbf{N}_1+\alpha_1\mathbf{K}_1)\leq 0\) or, equivalently, if and only if \(-\mathbf{L}_{\mathrm{p}}^{f_1f_1}\geq 0\). This holds if \(\frac{1}{\phi}\mathbf{N}_1^{T}\mathbf{D}_a\mathbf{N}_1\geq 0\) and \(\alpha_1\mathbf{K}_1\geq 0\). Since \(\alpha_1\) is a positive scalar and \(\mathbf{K}_1\geq 0\), we have \(\alpha_1\mathbf{K}_1\geq 0\). On the other hand, \(\frac{1}{\phi}\mathbf{N}_1^{T}\mathbf{D}_a\mathbf{N}_1\geq 0\) if \(\mathbf{D}_a\geq 0\), that is, if \(a_i\geq 0\) for all \(i=1,\ldots,n\).

(c′):

Analogously, in step (c) (Sect. 3.1), one has concavity (in \(\mathbf{f}_2\)) of \(L_{\mathrm{p}}^{c}(\mathbf{f}_2,\phi,\boldsymbol{\alpha})\) if and only if the matrix \(\mathbf{L}_{\mathrm{p}}^{f_2f_2}=-(\frac{1}{\phi}\mathbf{N}_2^{T}\mathbf{D}_a\mathbf{N}_2+\alpha_2\mathbf{K}_2)\leq 0\) or, equivalently, if and only if \(-\mathbf{L}_{\mathrm{p}}^{f_2f_2}\geq 0\). This holds under the same conditions: \(\alpha_2\mathbf{K}_2\geq 0\) because \(\alpha_2\) is a positive scalar and \(\mathbf{K}_2\geq 0\), while \(\frac{1}{\phi}\mathbf{N}_2^{T}\mathbf{D}_a\mathbf{N}_2\geq 0\) if \(\mathbf{D}_a\geq 0\), that is, if \(a_i\geq 0\) for all \(i=1,\ldots,n\).

(d′):

Finally, in step (d) (Sect. 3.1), the concavity (in \(\phi\)) of \(L_{\mathrm{p}}^{c}(\phi,\boldsymbol{\alpha})\) is guaranteed if and only if \(\partial^{2}L_{\mathrm{p}}^{c}(\phi,\boldsymbol{\alpha})/\partial\phi^{2}<0\) for all \(\phi\).
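As an illustration, consider two members of the family, assuming (as is standard for symmetric models) that \(\zeta(\delta)=\mathrm{d}\log g(\delta)/\mathrm{d}\delta\) for density generator \(g\). For the normal distribution, \(g(\delta)\propto e^{-\delta/2}\), so \(\zeta_i=-1/2\), \(\zeta_i'=0\) and \(a_i=1>0\) for every observation: the concavity condition holds everywhere. For the Student-t distribution with \(\nu\) degrees of freedom, \(g(\delta)\propto(1+\delta/\nu)^{-(\nu+1)/2}\), so that

$$\zeta_i=-\frac{\nu+1}{2(\nu+\delta_i)},\qquad \zeta_i'=\frac{\nu+1}{2(\nu+\delta_i)^{2}},\qquad a_i=\frac{(\nu+1)(\nu-\delta_i)}{(\nu+\delta_i)^{2}},$$

and \(a_i\geq 0\) only when \(\delta_i\leq\nu\): the condition can fail at observations with large squared scaled residuals, so concavity is local rather than global.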

Appendix B

B.1 Score function

Consider the penalized log-likelihood function given by (4). The score function of \(\boldsymbol{\theta}\) is given by \(\mathbf{U}_{\mathrm{p}}=\partial L_{\mathrm{p}}(\boldsymbol{\theta},\boldsymbol{\alpha})/\partial\boldsymbol{\theta}\), where \(\mathbf{D}_v\) is defined in Sect. 3.3 and \(\boldsymbol{\epsilon}=\mathbf{y}-\boldsymbol{\mu}\), with \(\boldsymbol{\mu}=\mathbf{X}\boldsymbol{\beta}+\sum_{k=1}^{s}\mathbf{N}_k\mathbf{f}_k\).
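In particular, under the convention \(\mathbf{D}_v=\operatorname{diag}_{1\leq i\leq n}(v_i)\) with \(v_i=-2\zeta_i\) (an assumption about the definition in Sect. 3.3, consistent with Cysneiros and Paula 2005), the components take the form

$$\mathbf{U}_{\mathrm{p}}^{\beta}=\frac{1}{\phi}\mathbf{X}^{T}\mathbf{D}_v\boldsymbol{\epsilon},\qquad \mathbf{U}_{\mathrm{p}}^{f_k}=\frac{1}{\phi}\mathbf{N}_k^{T}\mathbf{D}_v\boldsymbol{\epsilon}-\alpha_k\mathbf{K}_k\mathbf{f}_k\quad(k=1,\ldots,s),\qquad U_{\mathrm{p}}^{\phi}=\frac{1}{2\phi}\biggl(\frac{1}{\phi}\boldsymbol{\epsilon}^{T}\mathbf{D}_v\boldsymbol{\epsilon}-n\biggr).$$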

B.2 Hessian matrix

Let \(\mathbf{L}_{\mathrm{p}}\) (\(p^{*}\times p^{*}\)) be the Hessian matrix with \((j^{*},\ell^{*})\)-element given by \(\partial^{2}L_{\mathrm{p}}(\boldsymbol{\theta},\boldsymbol{\alpha})/\partial\theta_{j^{*}}\,\partial\theta_{\ell^{*}}\), for \(j^{*},\ell^{*}=1,\ldots,p^{*}\). Its blocks are expressed through \(\mathbf{D}_{\zeta'}=\operatorname{diag}_{1\leq i\leq n}(\zeta_i')\), \(\mathbf{b}=(b_1,\ldots,b_n)^{T}\) and \(\boldsymbol{\delta}=(\delta_1,\ldots,\delta_n)^{T}\), where \(b_i=(\zeta_i+\zeta_i'\delta_i)\epsilon_i\), for \(i=1,\ldots,n\).
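After some algebraic manipulation one finds blocks of the following form (a sketch derived from the definitions above, assuming \(v_i=-2\zeta_i\) as in B.1 and writing \(\boldsymbol{\zeta}=(\zeta_1,\ldots,\zeta_n)^{T}\)):

$$\mathbf{L}_{\mathrm{p}}^{\beta\beta}=-\frac{1}{\phi}\mathbf{X}^{T}\mathbf{D}_a\mathbf{X},\qquad \mathbf{L}_{\mathrm{p}}^{\beta f_k}=-\frac{1}{\phi}\mathbf{X}^{T}\mathbf{D}_a\mathbf{N}_k,\qquad \mathbf{L}_{\mathrm{p}}^{f_kf_k}=-\biggl(\frac{1}{\phi}\mathbf{N}_k^{T}\mathbf{D}_a\mathbf{N}_k+\alpha_k\mathbf{K}_k\biggr),\qquad \mathbf{L}_{\mathrm{p}}^{f_kf_l}=-\frac{1}{\phi}\mathbf{N}_k^{T}\mathbf{D}_a\mathbf{N}_l\ (k\neq l),$$

$$\mathbf{L}_{\mathrm{p}}^{\beta\phi}=\frac{2}{\phi^{2}}\mathbf{X}^{T}\mathbf{b},\qquad \mathbf{L}_{\mathrm{p}}^{f_k\phi}=\frac{2}{\phi^{2}}\mathbf{N}_k^{T}\mathbf{b},\qquad L_{\mathrm{p}}^{\phi\phi}=\frac{n}{2\phi^{2}}+\frac{1}{\phi^{2}}\bigl(2\boldsymbol{\zeta}+\mathbf{D}_{\zeta'}\boldsymbol{\delta}\bigr)^{T}\boldsymbol{\delta}.$$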

B.3 Expected information matrix

Let \(\mathbf{D}_d=\frac{4d_g}{\phi}\mathbf{I}_{(n,n)}\), with \(d_g=\mathrm{E}(\zeta^{2}(\epsilon_i^{2})\epsilon_i^{2})\), \(f_g=\mathrm{E}(\zeta^{2}(\epsilon_i^{2})\epsilon_i^{4})\) and \(\epsilon_i\sim S(0,\phi,g)\). By calculating the expectation of \(-\mathbf{L}_{\mathrm{p}}\) we find that the \((p^{*}\times p^{*})\) expected information matrix takes the following block-diagonal form:

$$\boldsymbol{\mathcal{I}}_{\mathrm{p}}=\begin{pmatrix} \mathbf{X}^{T}\mathbf{D}_d\mathbf{X} & \mathbf{X}^{T}\mathbf{D}_d\mathbf{N}_1 & \cdots & \mathbf{X}^{T}\mathbf{D}_d\mathbf{N}_s & \mathbf{0}\\ \mathbf{N}_1^{T}\mathbf{D}_d\mathbf{X} & \mathbf{N}_1^{T}\mathbf{D}_d\mathbf{N}_1+\alpha_1\mathbf{K}_1 & \cdots & \mathbf{N}_1^{T}\mathbf{D}_d\mathbf{N}_s & \mathbf{0}\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ \mathbf{N}_s^{T}\mathbf{D}_d\mathbf{X} & \mathbf{N}_s^{T}\mathbf{D}_d\mathbf{N}_1 & \cdots & \mathbf{N}_s^{T}\mathbf{D}_d\mathbf{N}_s+\alpha_s\mathbf{K}_s & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} & \mathcal{I}_{\mathrm{p}}^{\phi\phi} \end{pmatrix}.$$

Now, taking \(\mathcal{I}_{\mathrm{p}}^{\phi\phi}=\frac{n(4f_g-1)}{4\phi^{2}}\), \(\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\beta}=\mathbf{X}^{T}\mathbf{D}_d\mathbf{X}\), \(\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}}=(\mathbf{X}^{T}\mathbf{D}_d\mathbf{N}_1\ \cdots\ \mathbf{X}^{T}\mathbf{D}_d\mathbf{N}_s)\) and

$$\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\mathrm{f}\mathrm{f}}=\begin{pmatrix} \mathbf{N}_1^{T}\mathbf{D}_d\mathbf{N}_1+\alpha_1\mathbf{K}_1 & \cdots & \mathbf{N}_1^{T}\mathbf{D}_d\mathbf{N}_s\\ \vdots & \ddots & \vdots\\ \mathbf{N}_s^{T}\mathbf{D}_d\mathbf{N}_1 & \cdots & \mathbf{N}_s^{T}\mathbf{D}_d\mathbf{N}_s+\alpha_s\mathbf{K}_s \end{pmatrix},$$

we have

$$\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{-1}=\begin{pmatrix} \bigl(\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\beta}-\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\mathrm{f}\mathrm{f}\,-1}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}\,T}\bigr)^{-1} & -\bigl(\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\beta}-\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\mathrm{f}\mathrm{f}\,-1}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}\,T}\bigr)^{-1}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\mathrm{f}\mathrm{f}\,-1} & \mathbf{0}\\ \Bigl(-\bigl(\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\beta}-\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\mathrm{f}\mathrm{f}\,-1}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}\,T}\bigr)^{-1}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\mathrm{f}\mathrm{f}\,-1}\Bigr)^{T} & \bigl(\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\mathrm{f}\mathrm{f}}-\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}\,T}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\beta\,-1}\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}}\bigr)^{-1} & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \mathcal{I}_{\mathrm{p}}^{\phi\phi\,-1} \end{pmatrix}.$$
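The zero blocks reflect the orthogonality between \((\boldsymbol{\beta},\mathbf{f})\) and \(\phi\), and the nonzero blocks follow from the standard partitioned-inverse (Schur complement) identity: for a symmetric positive definite matrix,

$$\begin{pmatrix}\mathbf{A}&\mathbf{B}\\ \mathbf{B}^{T}&\mathbf{C}\end{pmatrix}^{-1}=\begin{pmatrix}\mathbf{S}^{-1}&-\mathbf{S}^{-1}\mathbf{B}\mathbf{C}^{-1}\\ -\mathbf{C}^{-1}\mathbf{B}^{T}\mathbf{S}^{-1}&\bigl(\mathbf{C}-\mathbf{B}^{T}\mathbf{A}^{-1}\mathbf{B}\bigr)^{-1}\end{pmatrix},\qquad \mathbf{S}=\mathbf{A}-\mathbf{B}\mathbf{C}^{-1}\mathbf{B}^{T},$$

applied here with \(\mathbf{A}=\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\beta}\), \(\mathbf{B}=\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\beta\mathrm{f}}\) and \(\mathbf{C}=\boldsymbol{\mathcal{I}}_{\mathrm{p}}^{\mathrm{f}\mathrm{f}}\).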

Appendix C

In this appendix we present the expressions of the matrix \(\boldsymbol{\varDelta}_{\mathrm{p}}=\partial^{2}L_{\mathrm{p}}(\boldsymbol{\theta},\boldsymbol{\alpha}\mid\boldsymbol{\omega})/\partial\boldsymbol{\theta}\,\partial\boldsymbol{\omega}^{T}\) for the case-weight and explanatory-variable perturbation schemes.

C.1 Case-weight perturbation

Let us attach weights to the observations in the penalized log-likelihood function as \(L_{\mathrm{p}}(\boldsymbol{\theta},\boldsymbol{\alpha}\mid\boldsymbol{\omega})=L(\boldsymbol{\theta}\mid\boldsymbol{\omega})-\sum_{k=1}^{s}\frac{\alpha_k}{2}\mathbf{f}_k^{T}\mathbf{K}_k\mathbf{f}_k\), where \(L(\boldsymbol{\theta}\mid\boldsymbol{\omega})=\sum_{i=1}^{n}\omega_iL_i(\boldsymbol{\theta})\) and \(\boldsymbol{\omega}=(\omega_1,\ldots,\omega_n)^{T}\) is the vector of weights, with \(0\leq\omega_i\leq 1\). In this case, the vector of no perturbation is given by \(\boldsymbol{\omega}_0=\mathbf{1}_{(n\times 1)}\). Differentiating \(L_{\mathrm{p}}(\boldsymbol{\theta},\boldsymbol{\alpha}\mid\boldsymbol{\omega})\) with respect to the elements of \(\boldsymbol{\theta}\) and \(\omega_i\) yields the blocks of \(\boldsymbol{\varDelta}_{\mathrm{p}}\).
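A sketch of these blocks, assuming \(v_i=-2\zeta_i\) as in B.1 and writing \(\mathbf{D}_{\epsilon}=\operatorname{diag}_{1\leq i\leq n}(\widehat{\epsilon}_i)\) (a shorthand of ours):

$$\boldsymbol{\varDelta}_{\mathrm{p}}^{\beta}=\frac{1}{\widehat{\phi}}\mathbf{X}^{T}\mathbf{D}_v\mathbf{D}_{\epsilon},\qquad \boldsymbol{\varDelta}_{\mathrm{p}}^{f_k}=\frac{1}{\widehat{\phi}}\mathbf{N}_k^{T}\mathbf{D}_v\mathbf{D}_{\epsilon}\quad(k=1,\ldots,s),\qquad \boldsymbol{\varDelta}_{\mathrm{p}}^{\phi}=-\frac{1}{2\widehat{\phi}}\mathbf{1}^{T}+\frac{1}{2\widehat{\phi}^{2}}\widehat{\boldsymbol{\epsilon}}^{T}\mathbf{D}_v\mathbf{D}_{\epsilon},$$

with all quantities evaluated at the maximum penalized likelihood estimates; the penalty term does not depend on \(\boldsymbol{\omega}\) and so does not contribute.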

C.2 Explanatory variable perturbation

Here the \(d\)th explanatory variable, assumed continuous, is perturbed by considering the additive perturbation scheme \(x_{id\omega}=x_{id}+\omega_i\), where \(\boldsymbol{\omega}=(\omega_1,\ldots,\omega_n)^{T}\) is the vector of perturbations, with \(\omega_i\in\mathbb{R}\). In this case, the vector of no perturbation is given by \(\boldsymbol{\omega}_0=\mathbf{0}_{(n\times 1)}\) and the perturbed penalized log-likelihood function is given by \(L_{\mathrm{p}}(\boldsymbol{\theta},\boldsymbol{\alpha}\mid\boldsymbol{\omega})=L(\boldsymbol{\theta}\mid\boldsymbol{\omega})-\sum_{k=1}^{s}\frac{\alpha_k}{2}\mathbf{f}_k^{T}\mathbf{K}_k\mathbf{f}_k\), where \(L(\cdot)\) is given by (3) with \(\delta_{i\omega}=\phi^{-1}(y_i-\mu_{i\omega})^{2}\) in place of \(\delta_i\) and \(\mu_{i\omega}=\mu_i+\omega_i\beta_d\). Differentiating \(L_{\mathrm{p}}(\boldsymbol{\theta},\boldsymbol{\alpha}\mid\boldsymbol{\omega})\) with respect to the elements of \(\boldsymbol{\theta}\) and \(\omega_i\), we obtain, after some algebraic manipulation, the blocks of \(\boldsymbol{\varDelta}_{\mathrm{p}}\) described below.

Here \(\mathbf{z}_d\) denotes a \((p\times 1)\) vector with 1 in the \(d\)th position and zeros elsewhere, and \(\widehat{\beta}_d\) denotes the \(d\)th element of \(\widehat{\boldsymbol{\beta}}\).
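A sketch of these blocks under the same assumptions as in C.1, derived here from the perturbed likelihood:

$$\boldsymbol{\varDelta}_{\mathrm{p}}^{\beta}=-\frac{\widehat{\beta}_d}{\widehat{\phi}}\mathbf{X}^{T}\mathbf{D}_a+\frac{1}{\widehat{\phi}}\mathbf{z}_d\bigl(\mathbf{D}_v\widehat{\boldsymbol{\epsilon}}\bigr)^{T},\qquad \boldsymbol{\varDelta}_{\mathrm{p}}^{f_k}=-\frac{\widehat{\beta}_d}{\widehat{\phi}}\mathbf{N}_k^{T}\mathbf{D}_a\quad(k=1,\ldots,s),\qquad \boldsymbol{\varDelta}_{\mathrm{p}}^{\phi}=\frac{2\widehat{\beta}_d}{\widehat{\phi}^{2}}\widehat{\mathbf{b}}^{T},$$

with \(\mathbf{b}\) as in Appendix B and all quantities evaluated at the maximum penalized likelihood estimates.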


About this article

Cite this article

Ibacache-Pulgar, G., Paula, G.A. & Cysneiros, F.J.A. Semiparametric additive models under symmetric distributions. TEST 22, 103–121 (2013). https://doi.org/10.1007/s11749-012-0309-z
