Likelihood-based inference for multivariate skew scale mixtures of normal distributions

Ferreira, Clécio S.; Lachos, Víctor H.; Bolfarine, Heleno

doi:10.1007/s10182-016-0266-z

Original Paper
Published: 19 January 2016

Volume 100, pages 421–441, (2016)
AStA Advances in Statistical Analysis

Abstract

Scale mixtures of normal distributions are often used as a challenging class for statistical analysis of symmetrical data. Recently, Ferreira et al. (Stat Methodol 8:154–171, 2011) defined the univariate skew scale mixtures of normal distributions, which offer much-needed flexibility by combining skewness with heavy tails. In this paper, we develop a multivariate version of the skew scale mixtures of normal distributions, with emphasis on the multivariate skew-Student-t, skew-slash and skew-contaminated normal distributions. The main virtues of the members of this family of distributions are that they are easy to simulate from and that they admit genuine expectation/conditional maximisation either (ECME) algorithms for maximum likelihood estimation. The observed information matrix is derived analytically to obtain standard errors. Results obtained from real and simulated datasets are reported to illustrate the usefulness of the proposed method.

References

Andrews, D.F., Mallows, C.L.: Scale mixtures of normal distributions. J. R. Stat. Soc. Ser. B 36, 99–102 (1974)
Arellano-Valle, R.B., Bolfarine, H., Lachos, V.H.: Skew-normal linear mixed models. J. Data Sci. 3, 415–438 (2005)
Azzalini, A.: A class of distributions which includes the normal ones. Scand. J. Stat. 12, 171–178 (1985)
Azzalini, A., Capitanio, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew $t$-distribution. J. R. Stat. Soc. Ser. B 65, 367–389 (2003)
Azzalini, A., Dalla-Valle, A.: The multivariate skew-normal distribution. Biometrika 83(4), 715–726 (1996)
Azzalini, A., Capello, T.D., Kotz, S.: Log-skew-normal and log-skew-$t$ distributions as models for family income data. J. Income Distrib. 11, 13–21 (2003)
Bolfarine, H., Lachos, V.: Skew probit error-in-variables models. Stat. Methodol. 3, 1–12 (2007)
Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 79, 99–113 (2001)
Cabral, C.R.B., Lachos, V.H., Prates, M.O.: Multivariate mixture modeling using skew-normal independent distributions. Comput. Stat. Data Anal. 56(1), 126–142 (2012)
Cabral, C.R.B., Lachos, V.H., Zeller, C.B.: Multivariate measurement error models using finite mixtures of skew-Student $t$ distributions. J. Multivar. Anal. 124, 179–198 (2014)
Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, Hoboken (1994)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39(1), 1–38 (1977)
Ferreira, C.S., Bolfarine, H., Lachos, V.H.: Skew scale mixtures of normal distributions: properties and estimation. Stat. Methodol. 8, 154–171 (2011)
Gómez, H.W., Venegas, O., Bolfarine, H.: Skew-symmetric distributions generated by the normal distribution function. Environmetrics 18, 395–407 (2007)
Harville, D.: Matrix Algebra From a Statistician’s Perspective. Springer, New York (1997)
Johnson, N.L., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions, vol. 1. Wiley, New York (1994)
Lachos, V.H., Vilca, L.F., Bolfarine, H., Ghosh, P.: Robust multivariate measurement error models with scale mixtures of skew-normal distributions. Statistics 44(6), 541–556 (2009)
Lachos, V.H., Ghosh, P., Arellano-Valle, R.B.: Likelihood based inference for skew-normal independent linear mixed models. Stat. Sin. 20(1), 303 (2010)
Lange, K.L., Sinsheimer, J.S.: Normal/independent distributions and their applications in robust regression. J. Comput. Graph. Stat. 2, 175–198 (1993)
Lange, K.L., Little, R., Taylor, J.: Robust statistical modeling using $t$ distribution. J. Am. Stat. Assoc. 84, 881–896 (1989)
Lin, T.I., Ho, H.J., Lee, C.R.: Flexible mixture modelling using the multivariate skew-$t$-normal distribution. Stat. Comput. 24, 531–546 (2013)
Little, R.J.A.: Robust estimation of the mean and covariance matrix from data with missing values. Appl. Stat. 37, 23–38 (1988)
Liu, C., Rubin, D.B.: The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81, 633–648 (1994)
Osorio, F., Paula, G.A., Galea, M.: Assessment of local influence in elliptical linear models with longitudinal structure. Comput. Stat. Data Anal. 51(9), 4354–4368 (2007)
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2015). http://www.R-project.org/
Sahu, S.K., Dey, D.K., Branco, M.D.: A new class of multivariate distributions with applications to Bayesian regression models. Can. J. Stat. 31, 129–150 (2003)
Wang, J., Boyer, J., Genton, M.: A skew-symmetric representation of multivariate distributions. Stat. Sin. 14, 1259–1270 (2004)

Acknowledgments

We thank the editor, associate editor and two referees whose constructive comments led to an improved presentation of the paper. C.S. acknowledges support from FAPEMIG (Minas Gerais State Foundation for Research Development), Grant CEX APQ 01845/14. V.H. acknowledges support from CNPq-Brazil (Grant 305054/2011-2) and FAPESP-Brazil (Grant 2014/02938-9).

Author information

Authors and Affiliations

Department of Statistics, Federal University of Juiz de Fora, Juiz de Fora, Minas Gerais, Brazil
Clécio S. Ferreira
Departamento de Estatística, Universidade Estadual de Campinas, Cidade Universitaria “Zeferino Vaz”, Campinas, São Paulo, Brazil
Víctor H. Lachos
Departamento de Estatística, Universidade de São Paulo, São Paulo, Brazil
Heleno Bolfarine

Corresponding author

Correspondence to Víctor H. Lachos.

Appendices

Appendix 1: Details of the observed information matrix

Let ${\varvec{\alpha }}=\mathrm{Vech}({\mathbf {B}})$, where ${\varvec{\varSigma }}^{1/2}={\mathbf {B}}={\mathbf {B}}({\varvec{\alpha }})$. This appendix gives the first and second derivatives of $\log |{\varvec{\varSigma }}|$, $A_i$ and $d_i$. The notation used is that of Sect. 2 and, for a p-dimensional vector ${\varvec{\rho }}=(\rho _1,\ldots ,\rho _p)^{\top }$, we will use the notation $\dot{{\mathbf {B}}}_r=\partial {{\mathbf {B}}({\varvec{\alpha }})}/\partial {\alpha _r}$, with $r=1,2,\ldots ,p(p+1)/2$. Thus,

For $\log |{\varvec{\varSigma }}|$:
$$\begin{aligned} \frac{\partial ^2 \log {|{\varvec{\varSigma }}|}}{\partial \alpha _k\partial \alpha _s}=-2 \text {tr}({\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_{s}{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_{k}), \end{aligned}$$
For $A_i$:
$$\begin{aligned} \frac{\partial A_i}{\partial {\varvec{\mu }}}= & {} -{\mathbf {B}}^{-1}{\varvec{\lambda }},\quad \frac{\partial A_i}{\partial \alpha _{k}}=-{\varvec{\lambda }}^{\top }{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_k{\mathbf {B}}^{-1}({\mathbf {y}}_i-{\varvec{\mu }}),\quad \frac{\partial A_i}{\partial {\varvec{\lambda }}}={\mathbf {B}}^{-1}({\mathbf {y}}_i-{\varvec{\mu }}), \\ \frac{\partial ^2 A_i}{\partial {\varvec{\mu }}\partial {\varvec{\mu }}^{\top }}= & {} {\mathbf {0}},\quad \frac{\partial ^2 A_i}{\partial {\varvec{\mu }}\partial \alpha _k}={\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_k{\mathbf {B}}^{-1}{\varvec{\lambda }},\quad \frac{\partial ^2 A_i}{\partial {\varvec{\mu }}\partial {\varvec{\lambda }}^{\top }}=-{\mathbf {B}}^{-1},\\ \frac{\partial ^2 A_i}{\partial \alpha _k\partial \alpha _s}= & {} {\varvec{\lambda }}^{\top }{\mathbf {B}}^{-1} [\dot{{\mathbf {B}}}_s{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_k+\dot{{\mathbf {B}}}_k{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_s]{\mathbf {B}}^{-1}({\mathbf {y}}_i-{\varvec{\mu }}),\\ \frac{\partial ^2 A_i}{\partial \alpha _k\partial {\varvec{\lambda }}}= & {} -{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_k{\mathbf {B}}^{-1}({\mathbf {y}}_i-{\varvec{\mu }}),\quad \frac{\partial ^2 A_i}{\partial {\varvec{\lambda }}\partial {\varvec{\lambda }}^{\top }}={\mathbf {0}}, \end{aligned}$$
For $d_i$:
$$\begin{aligned} \frac{\partial d_i}{\partial {\varvec{\mu }}}= & {} -2{\mathbf {B}}^{-2}({\mathbf {y}}_i-{\varvec{\mu }}),\quad \frac{\partial d_i}{\partial \alpha _k}=-({\mathbf {y}}_i-{\varvec{\mu }})^{\top }{\mathbf {B}}^{-1} [\dot{{\mathbf {B}}}_k{\mathbf {B}}^{-1}+{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_k]{\mathbf {B}}^{-1}({\mathbf {y}}_i-{\varvec{\mu }}),\\ \frac{\partial d_i}{\partial {\varvec{\lambda }}}= & {} {\mathbf {0}},\quad \frac{\partial ^2 d_i}{\partial {\varvec{\mu }}\partial {\varvec{\mu }}^{\top }}=2{\mathbf {B}}^{-2},\quad \frac{\partial ^2 d_i}{\partial {\varvec{\mu }}\partial \alpha _k}=2{\mathbf {B}}^{-1} [\dot{{\mathbf {B}}}_k{\mathbf {B}}^{-1}+{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_k]{\mathbf {B}}^{-1}({\mathbf {y}}_i-{\varvec{\mu }}),\\ \frac{\partial ^2 d_i}{\partial {\varvec{\mu }}\partial {\varvec{\lambda }}^{\top }}= & {} {\mathbf {0}},\quad \frac{\partial ^2 d_i}{\partial \alpha _k\partial {\varvec{\lambda }}^{\top }}={\mathbf {0}},\quad \frac{\partial ^2 d_i}{\partial {\varvec{\lambda }}\partial {\varvec{\lambda }}^{\top }}={\mathbf {0}},\\ \frac{\partial ^2 d_i}{\partial \alpha _k\partial \alpha _s}= & {} ({\mathbf {y}}_i-{\varvec{\mu }})^{\top }{\mathbf {B}}^{-1} [\dot{{\mathbf {B}}}_s{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_k{\mathbf {B}}^{-1}+\dot{{\mathbf {B}}}_k{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_s{\mathbf {B}}^{-1}+\dot{{\mathbf {B}}}_k{\mathbf {B}}^{-2}\dot{{\mathbf {B}}}_s+\dot{{\mathbf {B}}}_s{\mathbf {B}}^{-2}\dot{{\mathbf {B}}}_k\\&+{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_s{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_k+{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_k{\mathbf {B}}^{-1}\dot{{\mathbf {B}}}_s]{\mathbf {B}}^{-1}({\mathbf {y}}_i-{\varvec{\mu }}). \end{aligned}$$
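Closed-form derivatives of this kind can be sanity-checked against finite differences. The sketch below is not part of the paper. It assumes $A_i={\varvec{\lambda }}^{\top }{\mathbf {B}}^{-1}({\mathbf {y}}_i-{\varvec{\mu }})$ and $d_i=({\mathbf {y}}_i-{\varvec{\mu }})^{\top }{\mathbf {B}}^{-2}({\mathbf {y}}_i-{\varvec{\mu }})$, definitions inferred from the derivatives listed above because Sect. 2 is not reproduced here. It orders the lower triangle of ${\mathbf {B}}$ row by row, which is one possible Vech convention, and the helper names `unvech`, `B_dot` and `max_errors` are hypothetical. It checks $\partial A_i/\partial \alpha _k$, $\partial d_i/\partial \alpha _k$ and $\partial ^2\log |{\varvec{\varSigma }}|/\partial \alpha _k\partial \alpha _s$.

```python
import numpy as np

def unvech(alpha, p):
    """Rebuild the symmetric p x p matrix B from its lower triangle (row-wise order)."""
    B = np.zeros((p, p))
    B[np.tril_indices(p)] = alpha
    return B + np.tril(B, -1).T

def B_dot(r, p):
    """dB/d(alpha_r): B is linear in alpha, so this is unvech of a unit vector."""
    e = np.zeros(p * (p + 1) // 2)
    e[r] = 1.0
    return unvech(e, p)

def max_errors(B, lam, res, h=1e-4):
    """Largest gaps between the analytic formulas and central finite differences.

    res plays the role of (y_i - mu). Returns errors for dA_i/dalpha_k,
    dd_i/dalpha_k and d^2 log|Sigma| / dalpha_k dalpha_s.
    """
    p = B.shape[0]
    alpha = B[np.tril_indices(p)]
    q = alpha.size
    A = lambda a: lam @ np.linalg.solve(unvech(a, p), res)           # assumed A_i
    d = lambda a: np.sum(np.linalg.solve(unvech(a, p), res) ** 2)    # assumed d_i
    ld = lambda a: 2.0 * np.linalg.slogdet(unvech(a, p))[1]          # log|Sigma| = 2 log|B|
    Bi = np.linalg.inv(B)
    E = np.eye(q) * h
    err_A = err_d = err_ld = 0.0
    for k in range(q):
        Bk = B_dot(k, p)
        dA = -lam @ Bi @ Bk @ Bi @ res
        dd = -res @ Bi @ (Bk @ Bi + Bi @ Bk) @ Bi @ res
        err_A = max(err_A, abs(dA - (A(alpha + E[k]) - A(alpha - E[k])) / (2 * h)))
        err_d = max(err_d, abs(dd - (d(alpha + E[k]) - d(alpha - E[k])) / (2 * h)))
        for s in range(q):
            Bs = B_dot(s, p)
            d2 = -2.0 * np.trace(Bi @ Bs @ Bi @ Bk)
            num = (ld(alpha + E[k] + E[s]) - ld(alpha + E[k] - E[s])
                   - ld(alpha - E[k] + E[s]) + ld(alpha - E[k] - E[s])) / (4 * h * h)
            err_ld = max(err_ld, abs(d2 - num))
    return err_A, err_d, err_ld

# Illustrative inputs (arbitrary values, not from the paper).
B_example = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]])
errors = max_errors(B_example, np.array([1.0, -0.5, 2.0]), np.array([0.4, -1.2, 0.7]))
```

All three entries of `errors` should be close to zero, up to finite-difference and rounding error, if the formulas and the assumed definitions agree.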

Appendix 2: Joint, conditional and marginal distributions of $({\mathbf {Y}},U,T)$

Note first that from (7), it follows that

$$\begin{aligned} \begin{array}{rcl} {\mathbf {Y}}|T=t, U= u&{}\sim &{} N_p({\varvec{\mu }}+ \frac{t}{u^{1/2}}{\varvec{\varSigma }}^{1/2}{\varvec{\delta }}_u,\frac{1}{u}{\varvec{\varSigma }}^{1/2}({\mathbf {I}}_p+{\varvec{\lambda }}_u{{\varvec{\lambda }}_u}^\top )^{-1}{\varvec{\varSigma }}^{1/2}),\\ U&{} \sim &{}H({\varvec{\tau }}),\quad T\sim TN(0,1;(0,+\infty )),\end{array} \end{aligned}$$

(23)

with U and T independent, ${\varvec{\delta }}_u=\frac{{\varvec{\lambda }}}{\sqrt{u+{\varvec{\lambda }}^{\top }{\varvec{\lambda }}}}$, ${\varvec{\lambda }}_u={\varvec{\lambda }}/\sqrt{u}$.
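The hierarchical representation (23) suggests a direct way to simulate from the family. The following is a minimal sketch under stated assumptions, not the authors' code: it takes $H$ to be the Gamma$(\nu /2,\nu /2)$ distribution (shape and rate), the usual mixing distribution for the Student-t case, and the function names `sym_sqrt` and `rssmn_t` are hypothetical. It uses the fact that $({\mathbf {I}}_p+{\varvec{\lambda }}_u{\varvec{\lambda }}_u^{\top })^{-1}={\mathbf {I}}_p-{\varvec{\lambda }}{\varvec{\lambda }}^{\top }/(u+{\varvec{\lambda }}^{\top }{\varvec{\lambda }})$, whose symmetric square root shrinks only the component along ${\varvec{\lambda }}$.

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root of a positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def rssmn_t(n, mu, Sigma, lam, nu, rng):
    """Draw n vectors using representation (23), assuming U ~ Gamma(nu/2, rate nu/2)."""
    mu = np.asarray(mu, dtype=float)
    lam = np.asarray(lam, dtype=float)
    p = mu.size
    B = sym_sqrt(np.asarray(Sigma, dtype=float))      # Sigma^{1/2}
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    t = np.abs(rng.standard_normal(n))                # T ~ TN(0, 1; (0, +inf))
    z = rng.standard_normal((n, p))
    ll = lam @ lam
    if ll > 0:
        # Apply (I - lam lam^T / (u + lam^T lam))^{1/2} to each row of z:
        # the component along lam is scaled by sqrt(u / (u + lam^T lam)).
        s = np.sqrt(u / (u + ll))
        z = z - ((1.0 - s) * (z @ lam) / ll)[:, None] * lam
    delta_u = lam[None, :] / np.sqrt(u + ll)[:, None]
    w = (t[:, None] * delta_u + z) / np.sqrt(u)[:, None]
    return mu + w @ B                                 # B is symmetric

rng = np.random.default_rng(0)
sample = rssmn_t(1000, [0.0, 1.0], [[2.0, 0.5], [0.5, 1.0]], [3.0, 0.0], 4.0, rng)
```

Other members of the family would only change the line that draws `u`.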

Using some results given in Lachos et al. (2010), it follows that the joint distribution of $({\mathbf {Y}},U,T)$ is given by

$$\begin{aligned} f({\mathbf {y}},u,t)= & {} 2\phi _p\left( {\mathbf {y}}|{\varvec{\mu }}+{\mathbf {A}}t,{\varvec{\varSigma }}_a\right) \phi _1(t|0,1)h(u;{\varvec{\tau }})\\= & {} 2\phi _p({\mathbf {y}}|{\varvec{\mu }},{\varvec{\varSigma }}_a+{\mathbf {A}}{\mathbf {A}}^\top )\phi _1(t|\varLambda {\mathbf {A}}^\top {\varvec{\varSigma }}_a^{-1}({\mathbf {y}}-{\varvec{\mu }}),\varLambda )h(u;{\varvec{\tau }}),\\&{\mathbf {y}}\in {\mathbb {R}}^p,\;t >0,\;u>0, \end{aligned}$$

where ${\mathbf {A}}=\frac{{\varvec{\varSigma }}^{1/2}{\varvec{\delta }}_u}{u^{1/2}}$, ${\varvec{\varSigma }}_a=\frac{1}{u}{\varvec{\varSigma }}^{1/2}({\mathbf {I}}_p+{\varvec{\lambda }}_u{\varvec{\lambda }}_u^\top )^{-1}{\varvec{\varSigma }}^{1/2}$ and $\varLambda =(1+{\mathbf {A}}^\top {\varvec{\varSigma }}_a^{-1}{\mathbf {A}})^{-1}$. Using the results given in Harville (1997), and after some algebraic manipulations, it follows that ${\varvec{\varSigma }}_a+{\mathbf {A}}{\mathbf {A}}^\top =\frac{1}{u}{\varvec{\varSigma }}$, $\varLambda =\frac{u}{u+{\varvec{\lambda }}^\top {\varvec{\lambda }}}$ and $\varLambda {\mathbf {A}}^\top {\varvec{\varSigma }}_a^{-1}=\varLambda ^{1/2}{\varvec{\lambda }}^\top {\varvec{\varSigma }}^{-1/2}$.
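The three matrix identities stated above can be confirmed numerically for any particular choice of ${\varvec{\varSigma }}$, ${\varvec{\lambda }}$ and $u$. The sketch below is only an illustration with arbitrary inputs; `sym_sqrt` and `check_identities` are hypothetical helper names.

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def check_identities(Sigma, lam, u):
    """Return three booleans, one per identity, for the given Sigma, lambda and u."""
    p = lam.size
    B = sym_sqrt(Sigma)                               # Sigma^{1/2}
    ll = lam @ lam
    delta_u = lam / np.sqrt(u + ll)
    lam_u = lam / np.sqrt(u)
    A = B @ delta_u / np.sqrt(u)
    Sigma_a = B @ np.linalg.inv(np.eye(p) + np.outer(lam_u, lam_u)) @ B / u
    Lam = 1.0 / (1.0 + A @ np.linalg.solve(Sigma_a, A))
    # Sigma_a + A A^T = Sigma / u
    ok_cov = np.allclose(Sigma_a + np.outer(A, A), Sigma / u)
    # Lambda = u / (u + lambda^T lambda)
    ok_lam = np.isclose(Lam, u / (u + ll))
    # Lambda A^T Sigma_a^{-1} = Lambda^{1/2} lambda^T Sigma^{-1/2} (compared as column vectors)
    ok_mean = np.allclose(Lam * np.linalg.solve(Sigma_a, A),
                          np.sqrt(Lam) * np.linalg.solve(B, lam))
    return bool(ok_cov), bool(ok_lam), bool(ok_mean)

result = check_identities(np.array([[2.0, 0.5], [0.5, 1.0]]), np.array([1.5, -0.7]), 0.8)
```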

Thus, the marginal distribution of ${\mathbf {Y}}\sim \mathrm{SSMN}_p({\varvec{\mu }},{\varvec{\varSigma }},{\varvec{\lambda }};H)$ is given by

$$\begin{aligned} f({\mathbf {y}})= & {} 2\int _0^{+\infty }\int _0^{+\infty }\phi _p \left( {\mathbf {y}}|{\varvec{\mu }},\frac{{\varvec{\varSigma }}}{u}\right) \phi _1 (t|\varLambda {\mathbf {A}}^\top {\varvec{\varSigma }}_a^{-1}({\mathbf {y}}-{\varvec{\mu }}),\varLambda )h(u;{\varvec{\tau }})\mathrm{d}t\mathrm{d}u\\= & {} 2\int _0^{+\infty }\phi _p\left( {\mathbf {y}}|{\varvec{\mu }},\frac{{\varvec{\varSigma }}}{u}\right) h(u;{\varvec{\tau }})\int _0^{+\infty }\phi _1(t|\varLambda ^{1/2}{\varvec{\lambda }}^\top {\varvec{\varSigma }}^{-1/2}({\mathbf {y}}-{\varvec{\mu }}),\varLambda )\mathrm{d}t\mathrm{d}u\\= & {} 2\int _0^{+\infty }\int _0^{+\infty }\phi _p\left( {\mathbf {y}}|{\varvec{\mu }},\frac{{\varvec{\varSigma }}}{u}\right) h(u;{\varvec{\tau }})\phi _1(t|{\varvec{\lambda }}^\top {\varvec{\varSigma }}^{-1/2}({\mathbf {y}}-{\varvec{\mu }}),1)\mathrm{d}t\mathrm{d}u\\= & {} 2\int _0^{+\infty }\phi _p\left( {\mathbf {y}}|{\varvec{\mu }},\frac{{\varvec{\varSigma }}}{u}\right) h(u;{\varvec{\tau }})\mathrm{d}u\varPhi _1({\varvec{\lambda }}^\top {\varvec{\varSigma }}^{-1/2}({\mathbf {y}}-{\varvec{\mu }})). \end{aligned}$$
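The last line gives the density as twice a scale-mixture-of-normals density times a normal distribution function. As an illustration, the sketch below evaluates it for the skew-Student-t member, assuming the mixture integral then reduces to the multivariate Student-t density with $\nu$ degrees of freedom (the standard result when $U$ is Gamma$(\nu /2,\nu /2)$ distributed). The function name `skew_t_pdf` is hypothetical, and the normal distribution function is computed from `math.erf` to keep the block dependency-free.

```python
import math
import numpy as np

def skew_t_pdf(y, mu, Sigma, lam, nu):
    """2 * t_p(y | mu, Sigma, nu) * Phi(lam^T Sigma^{-1/2} (y - mu)), one value per row of y."""
    y = np.atleast_2d(np.asarray(y, dtype=float))
    mu = np.asarray(mu, dtype=float)
    lam = np.asarray(lam, dtype=float)
    p = mu.size
    w, V = np.linalg.eigh(np.asarray(Sigma, dtype=float))
    B_inv = (V / np.sqrt(w)) @ V.T                    # symmetric Sigma^{-1/2}
    z = (y - mu) @ B_inv                              # rows: Sigma^{-1/2} (y_i - mu)
    d = np.sum(z ** 2, axis=1)                        # Mahalanobis distances
    log_f0 = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
              - 0.5 * p * math.log(nu * math.pi) - 0.5 * np.sum(np.log(w))
              - 0.5 * (nu + p) * np.log1p(d / nu))
    a = z @ lam
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(a / math.sqrt(2.0)))
    return 2.0 * np.exp(log_f0) * Phi

# Univariate check: the density should integrate to (approximately) one.
grid = np.linspace(-300.0, 300.0, 60001)
dens = skew_t_pdf(grid[:, None], [0.0], [[1.0]], [3.0], 4.0)
area = np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid))
```

Because the symmetric factor is even in ${\mathbf {y}}-{\varvec{\mu }}$ and $\varPhi _1(a)+\varPhi _1(-a)=1$, the factor 2 is exactly what makes the density integrate to one, which is what `area` checks in one dimension.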

Then the joint distribution of $({\mathbf {Y}},T)$ is given by

$$\begin{aligned} f({\mathbf {y}},t)=2f_0({\mathbf {y}}|{\varvec{\mu }},{\varvec{\varSigma }})\phi _1 (t|{\varvec{\lambda }}^\top {\varvec{\varSigma }}^{-1/2}({\mathbf {y}}-{\varvec{\mu }}),1),\quad {\mathbf {y}}\in {\mathbb {R}}^p,\;t>0, \end{aligned}$$

(24)

and

$$\begin{aligned} f(t|{\mathbf {y}})=\frac{\phi _1(t|{\varvec{\lambda }}^\top {\varvec{\varSigma }}^{-1/2} ({\mathbf {y}}-{\varvec{\mu }}),1)}{\varPhi _1({\varvec{\lambda }}^\top {\varvec{\varSigma }}^{-1/2}({\mathbf {y}}-{\varvec{\mu }}))}, \end{aligned}$$

(25)

so that $T|{\mathbf {Y}}={\mathbf {y}}\sim TN({\varvec{\lambda }}^\top {\varvec{\varSigma }}^{-1/2}({\mathbf {y}}-{\varvec{\mu }}),1;(0,+\infty ))$.
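A truncated normal conditional distribution of this form has a closed-form mean, the kind of quantity an EM-type algorithm typically needs in its E-step. The sketch below uses the standard result that, for $T\sim TN(a,1;(0,+\infty ))$, $E[T]=a+\phi _1(a)/\varPhi _1(a)$, with $a={\varvec{\lambda }}^\top {\varvec{\varSigma }}^{-1/2}({\mathbf {y}}-{\varvec{\mu }})$. Whether the paper's algorithm uses exactly this expression is an assumption, and the function name `truncnorm_mean` is hypothetical.

```python
import math

def truncnorm_mean(a):
    """Mean of a N(a, 1) variable truncated to (0, +inf): a + phi(a) / Phi(a)."""
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    # erfc keeps Phi(a) accurate when a is moderately negative.
    Phi = 0.5 * math.erfc(-a / math.sqrt(2.0))
    return a + phi / Phi

# With a = 0 the variable is half-normal, whose mean is sqrt(2 / pi).
half_normal_mean = truncnorm_mean(0.0)
```

For very negative `a` the ratio $\phi _1(a)/\varPhi _1(a)$ can lose precision or underflow, so a production implementation would likely switch to an asymptotic expansion there.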

About this article

Cite this article

Ferreira, C.S., Lachos, V.H. & Bolfarine, H. Likelihood-based inference for multivariate skew scale mixtures of normal distributions. AStA Adv Stat Anal 100, 421–441 (2016). https://doi.org/10.1007/s10182-016-0266-z

Received: 29 March 2015
Accepted: 05 January 2016
Issue Date: October 2016
