A semiparametric approach for joint modeling of median and skewness

Vanegas, Luis Hernando; Paula, Gilberto A.

doi:10.1007/s11749-014-0401-7

Original Paper
Published: 22 October 2014

Volume 24, pages 110–135, (2015)
TEST

Luis Hernando Vanegas¹,² & Gilberto A. Paula¹

Abstract

We motivate this paper by showing through Monte Carlo simulation that ignoring the skewness of the response variable distribution in non-linear regression models may introduce biases on the parameter estimates and/or on the estimation of the associated variability measures. Then, we propose a semiparametric regression model suitable for data set analysis in which the distribution of the response is strictly positive and asymmetric. In this setup, both median and skewness of the response variable distribution are explicitly modeled, the median using a parametric non-linear function and the skewness using a semiparametric function. The proposed model allows for the description of the response using the log-symmetric distribution, which is a generalization of the log-normal distribution and is flexible enough to consider bimodal distributions in special cases as well as distributions having heavier or lighter tails than those of the log-normal one. An iterative estimation process as well as some diagnostic methods are derived. Two data sets previously analyzed under parametric models are reanalyzed using the proposed methodology.

References

Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) International symposium on information theory. Akademiai Kiado, Budapest, Hungary, pp 267–281
Andrews DF, Mallows CL (1974) Scale mixtures of normal distributions. J R Stat Soc Ser B 36:99–102
Barndorff-Nielsen O (1977) Exponentially decreasing distributions for the logarithm of particle size. Proc R Soc Lond A Math 353:401–419
Birnbaum ZW, Saunders SC (1969) A new family of life distributions. J Appl Probab 6:319–327
Box GEP, Tiao GC (1973) Bayesian inference in statistical analysis. Addison-Wesley, USA
Cancho VG, Lachos VH, Ortega EMM (2010) A nonlinear regression model with skew-normal errors. Stat Pap 51:547–558
Conover WJ (1971) Practical nonparametric statistics. Wiley, New York
Cook RD (1986) Assessment of local influence (with discussion). J R Stat Soc Ser B 48:133–169
Cysneiros FJA, Paula GA, Galea M (2007) Heteroscedastic symmetrical linear models. Stat Probab Lett 77:1084–1090
Cysneiros FJA, Vanegas LH (2008) Residuals and their statistical properties in symmetrical nonlinear models. Stat Probab Lett 78:3269–3273
Cysneiros FJA, Cordeiro GM, Cysneiros AHMA (2010) Corrected maximum likelihood estimators in heteroscedastic symmetric nonlinear models. J Stat Comput Simul 80:451–461
Diaz-Garcia JA, Leiva V (2005) A new family of life distributions based on elliptically contoured distributions. J Stat Plan Inference 128:445–457
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Fang KT, Kotz S, Ng KW (1990) Symmetric multivariate and related distributions. Chapman and Hall, London
Galea M, Paula GA, Cysneiros FJA (2005) On diagnostic in symmetrical nonlinear models. Stat Probab Lett 73:459–467
Green PJ, Silverman BW (1994) Nonparametric regression and generalized linear models. Chapman and Hall, Boca Raton
Groeneveld RA, Meeden G (1984) Measuring skewness and kurtosis. Statistician 33:391–399
Hastie TJ, Tibshirani RJ (1990) Generalized additive models. Chapman and Hall, London
Ibacache-Pulgar G, Paula GA, Cysneiros FJA (2013) Semiparametric additive models under symmetric distributions. Test 22:103–121
Labra FV, Garay AM, Lachos VH, Ortega EMM (2012) Estimation and diagnostics for heteroscedastic nonlinear regression models based on scale mixtures of skew-normal distributions. J Stat Plan Inference 142:2149–2165
Lachos VH, Bandyopadhyay D, Garay AM (2011) Heteroscedastic nonlinear regression models based on scale mixtures of skew-normal distributions. Stat Probab Lett 81:1208–1217
Lange KL, Little RJ, Taylor J (1989) Robust statistical modelling using the $t$-distribution. J Am Stat Assoc 84:881–896
Lemonte AJ, Cordeiro GM (2009) Birnbaum–Saunders nonlinear regression models. Comput Stat Data Anal 53:4441–4452
Leiva V, Riquelme M, Balakrishnan N, Sanhueza A (2008) Lifetime analysis based on the generalized Birnbaum–Saunders distribution. Comput Stat Data Anal 52:2079–2097
Li AP, Chen ZX, Xie FC (2012) Diagnostics analysis for heterogeneous log-Birnbaum–Saunders regression models. Stat Probab Lett 82:1690–1698
Lin JG, Xie FC, Wei BC (2009) Statistical diagnostics for skew-$t$-normal nonlinear models. Commun Stat Simul Comput 38:2096–2110
Lucas A (1997) Robustness of the Student-$t$ based M-estimator. Commun Stat Theory Methods 26:1165–1182
Paula GA, Leiva V, Barros M, Liu S (2012) Robust statistical modeling using the Birnbaum–Saunders-$t$ distribution applied to insurance distribution. Appl Stoch Models Bus Ind 28:16–34
Poon WY, Poon YS (1999) Conformal normal curvature and assessment of local influence. J R Stat Soc Ser B 61:51–61
Rieck JR, Nedelman JR (1991) A log-linear model for the Birnbaum–Saunders distribution. Technometrics 33:51–60
Rigby RA, Stasinopoulos DM (2005) Generalized additive models for location, scale and shape. J Appl Stat 54:507–554
Rigby RA, Stasinopoulos DM (2006) Using the Box–Cox $t$ distribution in GAMLSS to model skewness and kurtosis. Stat Model 6:209–229
Rigby RA, Stasinopoulos DM (2007) Generalized additive models for location, scale and shape (GAMLSS) in R. J Stat Softw 23:1–46
Rogers WH, Tukey JW (1972) Understanding some long-tailed symmetrical distributions. Stat Neerl 26:211–226
Vanegas LH, Paula GA (2014) ssym: Fitting semiparametric symmetric regression models. R package version 1.5. http://CRAN.R-project.org/package=ssym
Villegas C, Paula GA, Cysneiros FJA, Galea M (2013) Influence diagnostics in generalized symmetric linear models. Comput Stat Data Anal 59:161–170
West M (1987) On scale mixtures of normal distributions. Biometrika 74:646–648
Wood SN (2006) Generalized additive models: an introduction with R. Chapman and Hall, Boca Raton

Acknowledgments

The authors are grateful to the editors and two anonymous referees for useful comments and suggestions. This research project was partially supported by CNPq, FAPESP and CAPES, Brazil.

Author information

Authors and Affiliations

Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
Luis Hernando Vanegas & Gilberto A. Paula
Departamento de Estadística, Universidad Nacional de Colombia, Bogotá, Colombia
Luis Hernando Vanegas

Corresponding author

Correspondence to Gilberto A. Paula.

Appendices

Appendix A. Log-symmetric distributions

An absolutely continuous and positive random variable $T$ follows a log-symmetric distribution with density generator $g(\cdot )$, scale parameter $\eta $ and power parameter $\phi $ if its density function can be written as

$$\begin{aligned} f_{T}(t;\eta ,\phi ,g(\cdot ))=\frac{1}{\sqrt{\phi }\,t}\,g\left( \tilde{t}^{\,2}\right) ,\quad t>0, \end{aligned} \tag{7}$$

where $\tilde{t}=\log \left[ \left( {t}/{\eta }\right) ^{\frac{1}{\sqrt{\phi }}}\right] $, $\eta > 0$, $\phi >0$, and the density generator satisfies $g(u)>0$ for $u>0$ and $\int _{0}^{\infty }u^{-\frac{1}{2}}g(u)\,du=1$. When these conditions hold we write $T\sim {\mathcal {LS}}(\eta ,\phi ,g(\cdot ))$. For example, the following density generators lead to well-known distributions for $T$:

log-normal: $g(u)\propto \exp (-u/2)$.
log-Student-$t$ (having $\nu >0$ degrees of freedom): $g(u)\propto (1+\frac{u}{\nu })^{-\frac{\nu +1}{2}}$.
log-power-exponential (having shape parameter $-1<\varrho <1$): $g(u)\propto \exp [-\frac{1}{2}u^{\frac{1}{1+\varrho }}]$.
log-hyperbolic (having shape parameter $\varsigma >0$): $g(u)\propto \exp [-\varsigma \sqrt{1+u}]$.
log-slash (having shape parameter $\iota >0$): $g(u)\propto \mathrm{IGF}\left( \iota + \frac{1}{2},\frac{u}{2}\right) $.
Birnbaum–Saunders (having shape parameter $\alpha >0$), for $\phi =4$: $g(u)\propto \mathrm{cosh}({u}^{\frac{1}{2}})\exp [-\frac{2}{\alpha ^2}\,\mathrm{sinh}^2({u}^{\frac{1}{2}})]$.
Birnbaum–Saunders-$t$ (having shape parameter $\alpha >0$ and $\nu >0$ degrees of freedom), for $\phi =4$: $g(u)\propto \mathrm{cosh}({u}^{\frac{1}{2}})\times [1 + 4\,\mathrm{sinh}^2({u}^{\frac{1}{2}})/(\nu \alpha ^2)]^{-\frac{\nu +1}{2}}$.

Here $\mathrm{IGF}(a,x)=\frac{1}{x^a}\int \nolimits _{0}^{x}\exp (-t)\,t^{a-1}\,dt$, for $a>0$ and $x\ge 0$, is the incomplete gamma function. In fact, it can be shown that the generalized Birnbaum–Saunders distribution (Diaz-Garcia and Leiva 2005; Leiva et al. 2008) belongs to the class of log-symmetric distributions with $g(u;\alpha ,\bar{\zeta })\propto \bar{g}\left( 4\,\mathrm{sinh}^2({u}^{\frac{1}{2}})/\alpha ^2;\bar{\zeta }\right) \times \mathrm{cosh}({u}^{\frac{1}{2}})$, where $\phi =4$ and $\bar{g}(u;\bar{\zeta })$ is the kernel that characterizes the generalized Birnbaum–Saunders distribution.
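The density (7) can be evaluated directly once a normalized density generator is supplied. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the normalizing constants of the log-normal and log-Student-$t$ generators are filled in from the standard normal and Student-$t$ densities, and the distribution function is approximated with a plain midpoint rule over $y=\log (t)$.

```python
import math


def g_normal(u):
    """Normalized density generator for the log-normal case."""
    return math.exp(-u / 2.0) / math.sqrt(2.0 * math.pi)


def make_g_student(nu):
    """Normalized density generator for the log-Student-t case (nu > 0)."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return lambda u: const * (1.0 + u / nu) ** (-(nu + 1) / 2)


def log_symmetric_pdf(t, eta, phi, g):
    """Density (7): g(t_tilde^2) / (sqrt(phi) * t) for t > 0, zero otherwise."""
    if t <= 0:
        return 0.0
    t_tilde = math.log(t / eta) / math.sqrt(phi)
    return g(t_tilde ** 2) / (math.sqrt(phi) * t)


def log_symmetric_cdf(t_upper, eta, phi, g, width=200.0, steps=200_000):
    """Midpoint-rule approximation of P(T <= t_upper), integrating over y = log(t)."""
    y_lo = math.log(eta) - width * math.sqrt(phi)  # truncation point (assumption)
    y_hi = math.log(t_upper)
    h = (y_hi - y_lo) / steps
    total = 0.0
    for i in range(steps):
        t = math.exp(y_lo + (i + 0.5) * h)
        total += log_symmetric_pdf(t, eta, phi, g) * t  # Jacobian: dt = t dy
    return total * h


def lognormal_pdf(t, mu, sigma):
    """Textbook log-normal density, used only as a cross-check."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
```

With $g(u)\propto \exp (-u/2)$ the sketch reproduces the usual log-normal density with $\mu =\log (\eta )$ and $\sigma ^2=\phi $, and for both generators the approximate distribution function evaluated at $\eta $ is close to $1/2$, in line with $\eta $ being the median.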

1.1 Some statistical properties

If $T\sim {\mathcal {LS}}(\eta ,\phi ,g(\cdot ))$ then it is possible to verify that

$Y=\log (T) \sim {\mathcal S}(\mu ,\phi ,g(\cdot ))$, i.e., the distribution of $Y$ belongs to the class of symmetric distributions (Fang et al. 1990) having density generator $g(\cdot )$, location parameter $\mu =\log (\eta )$ and dispersion parameter $\phi $.
${c}\,T \sim {\mathcal {LS}}(c\,\eta ,\phi ,g(\cdot ))$ for every constant $c>0$.
$T^{c}\sim {\mathcal {LS}}(\eta ^c,c^2\,\phi ,g(\cdot ))$ for every constant $c\ne 0$.
The median of $T$ is $\eta $.
The quantile function of $T$ is given by $\vartheta (q)=\eta \,\exp (\sqrt{\phi }\,Z^{(q)})$, where $Z^{(q)}$ is the $100(q)\,\%$ quantile of the distribution of $Z=({Y-\mu })/{\sqrt{\phi }}\sim \mathcal {S}(0,1,g(\cdot ))$.
The skewness measure proposed by Groeneveld and Meeden (1984) is given by $\varkappa ^*({q})=[{\vartheta (q)+\vartheta (1-q)-2\vartheta (1/2)}]/[{\vartheta (1-q)-\vartheta (q)}]= \mathrm{cosech}\left( \sqrt{\phi }Z^{(q)}\right) -\mathrm{cotanh}\left( \sqrt{\phi }Z^{(q)}\right) $, where $q \in (0,1/2)$ and $\mathrm{cotanh}(\cdot )$ and $\mathrm{cosech}(\cdot )$ are the hyperbolic cotangent and cosecant functions, respectively. It is possible to verify that, for every $q \in (0,1/2)$ and for fixed $\zeta $: (i) $\varkappa ^*({q})>0$, (ii) $\varkappa ^*({q})$ does not depend on $\eta $, (iii) the higher the skewness of $T$, the higher the value of $\varkappa ^*({q})$, and (iv) $\varkappa ^*({q})$ is a monotone increasing function of $\phi $. As a consequence, the parameter $\phi $ may be interpreted as the skewness of the distribution of $T$ (for fixed $\zeta $), which is always positive.
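The quantile function and the skewness measure above can be checked numerically. The sketch below is illustrative only and is restricted to the log-normal case, so that $Z^{(q)}$ is the standard normal quantile available in the Python standard library; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def quantile(q, eta, phi):
    """Quantile function eta * exp(sqrt(phi) * Z^(q)) in the log-normal case."""
    z_q = NormalDist().inv_cdf(q)
    return eta * math.exp(math.sqrt(phi) * z_q)


def gm_skewness(q, eta, phi):
    """Groeneveld-Meeden skewness computed directly from three quantiles."""
    lower = quantile(q, eta, phi)
    median = quantile(0.5, eta, phi)
    upper = quantile(1.0 - q, eta, phi)
    return (lower + upper - 2.0 * median) / (upper - lower)


def gm_skewness_closed_form(q, phi):
    """Closed form cosech(a) - cotanh(a) with a = sqrt(phi) * Z^(q)."""
    a = math.sqrt(phi) * NormalDist().inv_cdf(q)
    return 1.0 / math.sinh(a) - 1.0 / math.tanh(a)
```

For any $q \in (0,1/2)$ the two functions agree, the value does not change with `eta`, and it grows with `phi`, which mirrors properties (i), (ii) and (iv).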

Appendix B. ${v}(z_k)$, $d_g(\zeta )$ and $f_g(\zeta )$ expressions

For example, when $\xi $ is assumed to follow log-normal, log-Student-$t$, log-hyperbolic, log-power-exponential and Birnbaum–Saunders distributions one obtains ${v}(z_k)=1$, ${v}(z_k)=(\nu +1)/(\nu +z^2_k)$, ${v}(z_k)=\varsigma /\sqrt{1+z^2_k}$, ${v}(z_k)=|z_k|^{-\frac{2\varrho }{\varrho +1}}/(1+\varrho )$ and ${v}(z_k)=4\, \mathrm{sinh}(z_k)\mathrm{cosh}(z_k) /(\alpha ^2z_k)- \mathrm{tanh}(z_k)/z_k$, respectively. Similarly, the quantity $d_g(\zeta )$ is equal to 1, $(\nu +1)/(\nu +3)$, $\{2^{1-\varrho }\Gamma [(3-\varrho )/2]\}/\{(1+\varrho )^2\Gamma [(1+\varrho )/2]\}$ and $2+\frac{4}{\alpha ^2}-\frac{\sqrt{2\pi }}{\alpha }\left\{ 1-\mathrm{erf}\left( \frac{\sqrt{2}}{\alpha }\right) \right\} \exp \left( \frac{2}{\alpha ^2}\right) $ when $\xi $ is assumed to follow log-normal, log-Student-$t$, log-power-exponential and Birnbaum–Saunders distributions, respectively, where $\Gamma (\cdot )$ represents the gamma function and $\mathrm{erf}(x)=(2/\sqrt{\pi })\int _0^x e^{-t^2}\,dt$. Also, the quantity $f_g(\zeta )$ is equal to 3, $3(\nu +1)/(\nu +3)$ and $(\varrho +3)/(\varrho +1)$ when $\xi $ follows log-normal, log-Student-$t$ and log-power-exponential distributions, respectively.
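The weights ${v}(z_k)$ listed above can be cross-checked against the density generators of Appendix A. The sketch below assumes the relation ${v}(z)=-2\,g'(z^2)/g(z^2)$, which is not stated in this appendix but is consistent with every pair listed here; the function names are hypothetical and the derivative is approximated by a central difference of $\log g$.

```python
import math

# Closed-form weights v(z) as listed in Appendix B.
def v_normal(z):
    return 1.0

def v_student(z, nu):
    return (nu + 1.0) / (nu + z * z)

def v_hyperbolic(z, varsigma):
    return varsigma / math.sqrt(1.0 + z * z)

def v_power_exp(z, rho):
    return abs(z) ** (-2.0 * rho / (rho + 1.0)) / (1.0 + rho)

def v_birnbaum_saunders(z, alpha):
    return 4.0 * math.sinh(z) * math.cosh(z) / (alpha ** 2 * z) - math.tanh(z) / z

# Logarithms of the (unnormalized) density generators from Appendix A.
def log_g_normal(u):
    return -u / 2.0

def log_g_student(u, nu):
    return -(nu + 1.0) / 2.0 * math.log1p(u / nu)

def log_g_hyperbolic(u, varsigma):
    return -varsigma * math.sqrt(1.0 + u)

def log_g_power_exp(u, rho):
    return -0.5 * u ** (1.0 / (1.0 + rho))

def log_g_birnbaum_saunders(u, alpha):
    r = math.sqrt(u)
    return math.log(math.cosh(r)) - 2.0 / alpha ** 2 * math.sinh(r) ** 2

def v_from_generator(log_g, z, step=1e-6):
    """Assumed relation v(z) = -2 d/du log g(u) at u = z^2 (central difference)."""
    u = z * z
    return -2.0 * (log_g(u + step) - log_g(u - step)) / (2.0 * step)
```

Under this assumption the numerical and closed-form weights coincide, and the log-Student-$t$ weight decreases as $|z_k|$ grows, so observations with large standardized residuals receive less weight than under the log-normal model, where ${v}(z_k)=1$.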

Appendix C. Expressions of $d_k(\hat{\varvec{\mu }}|\hat{\varvec{\phi }})$ and $d_k(\hat{\varvec{\phi }}|\hat{\varvec{\mu }})$

Tables 5 and 6 list the expressions of the individual contributions to the deviances for some log-symmetric distributions.

Appendix D. Observed information matrix of $\varvec{\theta }$

The observed information matrix of $\varvec{\theta }$ becomes $-\ddot{\mathbf{L}}_{_{\theta \theta }}$, where

$$\begin{aligned} \ddot{\mathbf{L}}_{\theta \theta }=\frac{\partial ^2 \ell ^*({\varvec{\theta }})}{\partial \varvec{\theta }\,\partial \varvec{\theta }^{\top }}= \begin{bmatrix} \ddot{\mathbf{L}}_{\beta \beta }&\quad \ddot{\mathbf{L}}_{\beta \gamma }&\quad \ddot{\mathbf{L}}_{\beta \mathrm{f}}\\ \ddot{\mathbf{L}}_{\gamma \beta }&\quad \ddot{\mathbf{L}}_{\gamma \gamma }&\quad \ddot{\mathbf{L}}_{\gamma \mathrm{f}}\\ \ddot{\mathbf{L}}_{\mathrm{f}\beta }&\quad \ddot{\mathbf{L}}_{\mathrm{f} \gamma }&\quad \ddot{\mathbf{L}}_{\mathrm{f}\, \mathrm{f}} \end{bmatrix}, \end{aligned}$$

with

$$\begin{aligned} \ddot{\mathbf{L}}_{\beta \beta }&= -\mathbf{D}_{\beta }^{\top }\varvec{\Omega }^{-1}\mathbf{D}_{(a)}\mathbf{D}_{\beta } + \sum _{k=1}^{n}\frac{z_k}{\sqrt{\phi _k}}\,{v}_k\,\mathbf{D}_{\beta \beta }^{(k)},\quad \ddot{\mathbf{L}}_{\gamma \gamma }=-\mathbf{W}^{\top }\mathbf{D}_{(c)}\mathbf{W},\\ \ddot{\mathbf{L}}_{\mathrm{f}\, \mathrm{f}}&= -\mathbf{N}^{\top }\mathbf{D}_{(c)}\mathbf{N}-\lambda \mathbf{M}, \quad \ddot{\mathbf{L}}_{\gamma \beta }^{\top }=\ddot{\mathbf{L}}_{\beta \gamma }=-\mathbf{D}_{\beta }^{\top }\varvec{\Omega }^{-\frac{1}{2}}\mathbf{D}_{(h)}\mathbf{W},\\ \ddot{\mathbf{L}}_{\mathrm{f}\beta }^{\top }&= \ddot{\mathbf{L}}_{\beta \mathrm{f}}=-\mathbf{D}_{\beta }^{\top }\varvec{\Omega }^{-\frac{1}{2}}\mathbf{D}_{(h)}\mathbf{N}\quad \text {and}\quad \ddot{\mathbf{L}}^{\top }_{\mathrm{f}\gamma }=\ddot{\mathbf{L}}_{\gamma \mathrm{f}}=-\mathbf{W}^{\top }\mathbf{D}_{(c)}\mathbf{N}, \end{aligned}$$

in which $\mathbf{D}_{\beta \beta }^{(k)}=\Bigl [{\partial ^2\mu _k}/{\partial \beta _i\partial \beta _j}\Bigr ]$ for $i,j=1,\ldots ,p$, $\mathbf{D}_{(a)}=\mathrm{diag}\{{a}_1,\ldots ,{a}_n\}$, $\mathbf{D}_{(h)}=\mathrm{diag}\{{h}_1,\ldots ,{h}_n\}$ and $\mathbf{D}_{(c)}=\mathrm{diag}\{{c}_1,\ldots ,{c}_n\}$, with ${a}_k={v}_k'z_k + {v}_k$ (${v}_k'$ is the first derivative of ${v}_k$ with respect to $z_k$), ${h}_k= z_k({v}_k'z_k+2{v}_k)/2$ and ${c}_k=z_k {h}_k/2$.
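The diagonal entries ${a}_k$, ${h}_k$ and ${c}_k$ depend only on $z_k$, ${v}_k$ and ${v}_k'$, so they are simple to compute once a distribution is chosen. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the derivative of the log-Student-$t$ weight, ${v}'(z)=-2z(\nu +1)/(\nu +z^2)^2$, is obtained here by differentiating the expression in Appendix B rather than taken from the paper.

```python
import math


def diag_entries(z, v, v_prime):
    """Return (a_k, h_k, c_k) for one standardized residual z_k."""
    a = v_prime(z) * z + v(z)
    h = z * (v_prime(z) * z + 2.0 * v(z)) / 2.0
    c = z * h / 2.0
    return a, h, c


def student_weight(nu):
    """Weight v(z) and its derivative for the log-Student-t case."""
    def v(z):
        return (nu + 1.0) / (nu + z * z)

    def v_prime(z):
        return -2.0 * z * (nu + 1.0) / (nu + z * z) ** 2

    return v, v_prime


def normal_weight():
    """Weight v(z) = 1 and its derivative for the log-normal case."""
    return (lambda z: 1.0), (lambda z: 0.0)
```

In the log-normal case the entries reduce to ${a}_k=1$, ${h}_k=z_k$ and ${c}_k=z_k^2/2$, and the log-Student-$t$ entries approach these values as $\nu $ grows.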

About this article

Cite this article

Vanegas, L.H., Paula, G.A. A semiparametric approach for joint modeling of median and skewness. TEST 24, 110–135 (2015). https://doi.org/10.1007/s11749-014-0401-7

Received: 29 May 2013
Accepted: 25 August 2014
Published: 22 October 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s11749-014-0401-7
