Abstract
We introduce a multivariate version of the modified skew-normal distribution, which contains the multivariate normal distribution as a special case. Unlike the Azzalini multivariate skew-normal distribution, this new distribution has a nonsingular Fisher information matrix when the skewness parameters are all zero, and its profile log-likelihood in the skewness parameters is always a non-monotonic function. We study some basic properties of the proposed family of distributions and present an expectation-maximization (EM) algorithm for parameter estimation, which we validate through simulation studies. Finally, we apply the proposed model to univariate frontier data and to trivariate wind speed data, and compare its performance with that of the Azzalini skew-normal model.
References
Adcock CJ (2004) Capital asset pricing for UK stocks under the multivariate skew-normal distribution. In: Genton MG (ed) Skew elliptical distributions and their applications: a journey beyond normality. Chapman and Hall, London
Adcock CJ (2005) Exploiting skewness to build an optimal hedge fund with a currency overlay. Eur J Financ 11(5):445–462
Adcock CJ, Shutes K (2001) Portfolio selection based on the multivariate skew normal distribution. In: Skulimowski A (ed) Financial modelling. Progress & Business Publishers, Krakow, pp 167–177
Arellano-Valle RB, Azzalini A (2008) The centred parametrization for the multivariate skew-normal distribution. J Multivar Anal 99(7):1362–1382
Arellano-Valle RB, Genton MG (2010) An invariance property of quadratic forms in random vectors with a selection distribution, with application to sample variogram and covariogram estimators. Ann Inst Stat Math 62(2):363–381
Arellano-Valle RB, Gómez HW, Quintana FA (2004) A new class of skew-normal distributions. Commun Stat Theory Methods 33(7):1465–1480
Arellano-Valle RB, Bolfarine H, Lachos V (2005) Skew-normal linear mixed models. J Data Sci 3(4):415–438
Arellano-Valle RB, Contreras-Reyes JE, Stehlík M (2017) Generalized skew-normal negentropy and its application to fish condition factor time series. Entropy 19(10):528
Arrué J, Arellano-Valle RB, Gómez HW (2016) Bias reduction of maximum likelihood estimates for a modified skew-normal distribution. J Stat Comput Simul 86(15):2967–2984
Arrué J, Arellano-Valle RB, Gómez HW, Leiva V (2020) On a new type of Birnbaum–Saunders models and its inference and application to fatigue data. J Appl Stat 47(13–15):2690–2710
Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178
Azzalini A (2005) The skew-normal distribution and related multivariate families. Scand J Stat 32(2):159–188
Azzalini A, Arellano-Valle RB (2013) Maximum penalized likelihood estimation for skew-normal and skew-t distributions. J Stat Plan Inference 143(2):419–433
Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew normal distribution. J R Stat Soc Ser B Stat Methodol 61(3):579–602
Azzalini A, Capitanio A (2003) Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J R Stat Soc Ser B Stat Methodol 65(2):367–389
Azzalini A, Capitanio A (2014) The skew-normal and related families, vol 3. Cambridge University Press, Cambridge
Azzalini A, Dalla-Valle A (1996) The multivariate skew-normal distribution. Biometrika 83(4):715–726
Bayes CL, Branco MD (2007) Bayesian inference for the skewness parameter of the scalar skew-normal distribution. Braz J Probab Stat 21(2):141–163
Chiogna M (2005) A note on the asymptotic distribution of the maximum likelihood estimator for the scalar skew-normal distribution. Stat Methods Appl 14(3):331–341
Genton MG (2004) Skew-elliptical distributions and their applications: a journey beyond normality. CRC Press, Boca Raton
Genton MG, Loperfido NM (2005) Generalized skew-elliptical distributions and their quadratic forms. Ann Inst Stat Math 57(2):389–401
Ghosh P, Branco MD, Chakraborty H (2007) Bivariate random effect model using skew-normal distribution with application to HIV-RNA. Stat Med 26(6):1255–1267
Gómez HW, Venegas O, Bolfarine H (2007) Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 18(4):395–407
Hallin M, Paindaveine D (2006) Semiparametrically efficient rank-based inference for shape. I. Optimal rank-based tests for sphericity. Ann Stat 34(6):2707–2756
Henze N (1986) A probabilistic representation of the ‘skew-normal’ distribution. Scand J Stat 13(4):271–275
Jin L, Xu W, Zhu L, Zhu L (2016) Penalized maximum likelihood estimator for skew normal mixtures. arXiv:1608.01513
Lachos VH, Ghosh P, Arellano-Valle RB (2010) Likelihood based inference for skew-normal independent linear mixed models. Stat Sin 20(1):303–322
Ley C, Paindaveine D (2010) On Fisher information matrices and profile log-likelihood functions in generalized skew-elliptical models. Metron 68(3):235–250
Lin TI, Lee JC, Yen SY (2007) Finite mixture modelling using the skew normal distribution. Stat Sin 17(3):909–927
Magnus JR, Neudecker H (1979) The commutation matrix: some properties and applications. Ann Stat 7(2):381–394
Mardia KV (1970) Measures of multivariate skewness and kurtosis with applications. Biometrika 57(3):519–530
McNeil AJ, Frey R, Embrechts P (2015) Quantitative risk management: concepts, techniques and tools, revised edn. Princeton University Press, Princeton
Pewsey A (2000) Problems of inference for Azzalini’s skew-normal distribution. J Appl Stat 27(7):859–870
R Core Team (2021) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/
Rotnitzky A, Cox DR, Bottai M, Robins J (2000) Likelihood-based inference with singular information matrix. Bernoulli 6(2):243–284
Sartori N (2006) Bias prevention of maximum likelihood estimates for scalar skew normal and skew t distributions. J Stat Plan Inference 136(12):4259–4275
Yip CMA (2018) Statistical characteristics and mapping of near-surface and elevated wind resources in the Middle East. Ph.D. Thesis, KAUST
This research was supported by King Abdullah University of Science and Technology (KAUST).
Appendices
Appendix A
We discuss the benefits of using the alternative parameterization for defining the \(\mathcal{S}\mathcal{N}\) distribution compared to the \(\mathcal {ASN}\) distribution. We start with a preliminary stochastic representation of the \(\mathcal{S}\mathcal{N}\) distribution, from which most of its basic properties can be derived.
Proposition 10
(Stochastic representation of \(\mathcal{S}\mathcal{N}\) distribution) If \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\), then \(\varvec{X} \,{\buildrel d \over =}\, \varvec{\xi }+T \varvec{\eta }+ \varvec{V}\), where T and \(\varvec{V}\) are independently distributed, with half-normal T denoted by \(T \sim \mathcal{H}\mathcal{N}(0,1)\), and \(\varvec{V} \sim {\mathcal {N}}_p(\varvec{0},\varvec{\Psi })\).
Proof
Let \(\tilde{\varvec{X}} = \varvec{\xi }+T \varvec{\eta }+ \varvec{V}\). The conditional mpdf of \(\tilde{\varvec{X}}| T = t\) is \( f_{\tilde{\varvec{X}}|T = t}(\varvec{x}) = \phi _p(\varvec{x};\varvec{\xi }+ t \varvec{\eta },\varvec{\Psi })\) for \(t>0 \), and T has marginal density \(f_T(t) = 2\phi (t)I_{(t>0)}\); hence, for the mpdf of \(\tilde{\varvec{X}}\) we have
where we have used the identity (see Lemma 2 in Arellano-Valle et al. (2005))
Finally, using the following result:
it can be easily established that \(\tilde{\varvec{X}} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\). \(\square \)
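The stochastic representation above lends itself to a direct simulation check. The following Python sketch (illustrative parameter values are our own) draws from \(\varvec{\xi }+T\varvec{\eta }+\varvec{V}\) and compares the empirical moments with the closed forms \({\mathbb {E}}(\varvec{X})=\varvec{\xi }+\sqrt{2/\pi }\,\varvec{\eta }\) and \({\mathbb {V}}ar (\varvec{X})=\varvec{\Psi }+(1-2/\pi )\varvec{\eta }\varvec{\eta }^\top \), which follow from the half-normal moments \({\mathbb {E}}(T)=\sqrt{2/\pi }\) and \({\mathbb {V}}ar (T)=1-2/\pi \):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 2
xi = np.array([1.0, -0.5])
eta = np.array([2.0, 1.0])
Psi = np.array([[1.0, 0.3], [0.3, 1.0]])

n = 200_000
# Stochastic representation of Proposition 10: X = xi + T*eta + V,
# with T half-normal and V ~ N_p(0, Psi), independent
T = np.abs(rng.standard_normal(n))
V = rng.multivariate_normal(np.zeros(p), Psi, size=n)
X = xi + T[:, None] * eta + V

# Closed-form moments implied by E(T) = sqrt(2/pi), Var(T) = 1 - 2/pi
mean_theory = xi + np.sqrt(2 / np.pi) * eta
cov_theory = Psi + (1 - 2 / np.pi) * np.outer(eta, eta)

print(np.abs(X.mean(axis=0) - mean_theory).max())  # close to 0
print(np.abs(np.cov(X.T) - cov_theory).max())      # close to 0
```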
Further basic properties As immediate consequences of the above stochastic representation of a random vector \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\), we have the following basic properties:
1) Expectation and covariance: The mean vector and covariance matrix of \(\varvec{X}\) are \({\mathbb {E}}(\varvec{X}) = \varvec{\xi }+ \sqrt{2/\pi }\,\varvec{\eta }\) and \({\mathbb {V}}ar (\varvec{X}) = \varvec{\Psi }+ (1-2/\pi )\,\varvec{\eta }\varvec{\eta }^\top \), which follow directly from the stochastic representation since \({\mathbb {E}}(T)=\sqrt{2/\pi }\) and \({\mathbb {V}}ar (T)=1-2/\pi \).
2) Distribution of an affine transformation: For any fixed vector \(\varvec{a} \in {\mathbb {R}}^q\) and any fixed matrix \(\varvec{B} \in {\mathbb {R}}^{q \times p}\) of full row rank and \(q \le p\), we have \( \varvec{a} + \varvec{B} \varvec{X} \,{\buildrel d \over =}\, \varvec{a} + \varvec{B} \varvec{\xi }+ T \varvec{B} \varvec{\eta }+ \varvec{B} \varvec{V}\sim \mathcal{S}\mathcal{N}_q(\varvec{a} + \varvec{B} \varvec{\xi }, \varvec{B} \varvec{\Psi }\varvec{B}^\top , \varvec{B} \varvec{\eta })\), since, by assumption, T and \(\varvec{V}\) are independently distributed, with \(T \sim \mathcal{H}\mathcal{N}(0,1)\) and \(\varvec{V} \sim {\mathcal {N}}_p(\varvec{0},\varvec{\Psi })\).
3) Marginal distributions: Partition now \(\varvec{X}\) in two sub-vectors of sizes \(p_1\) and \(p_2\) such that \(p_1+p_2 = p\), with corresponding partitions of the parameters in blocks of matching sizes, as follows
Thus, by using property 2) with \(\varvec{a}=\varvec{0}\) and \(\varvec{B} = (I _{p_1}, \varvec{0})\) for \(\varvec{X}_1\) and \(\varvec{B} = (\varvec{0},I _{p_2} )\) for \(\varvec{X}_2\) it follows for their respective marginals that \(\varvec{X}_1 \sim \mathcal{S}\mathcal{N}_{p_1} (\varvec{\xi }_1,\varvec{\Psi }_{11}, \varvec{\eta }_1)\) and \(\varvec{X}_2 \sim \mathcal{S}\mathcal{N}_{p_2} (\varvec{\xi }_2,\varvec{\Psi }_{22}, \varvec{\eta }_2)\).
4) Moment generating function: The multivariate moment generating function (mmgf) of the \(\mathcal{S}\mathcal{N}\) distribution can be derived in closed form; we present it in the next proposition.
Proposition 11
The mmgf of \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) is \(M_{\varvec{X}}(\varvec{t}) = 2 \exp \left( \varvec{t}^\top \varvec{\xi }+ \tfrac{1}{2}\varvec{t}^\top \varvec{\Omega }\varvec{t}\right) \Phi (\varvec{\eta }^\top \varvec{t})\), \(\varvec{t}\in {\mathbb {R}}^p\),
where \(\varvec{\Omega }= \varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top \).
Proof
The mpdf of \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) in (15) can be rewritten as
where \(\phi _p(\varvec{z})=\phi _p(\varvec{z}; \varvec{0}, I _p)\), \(\varvec{\Omega }= \varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top \), \(\varvec{\gamma }= (1- \varvec{\eta }^\top \varvec{\Omega }^{-1} \varvec{\eta })^{-1/2}\varvec{\Omega }^{-1/2} \varvec{\eta }\), and, conversely, we have \(\varvec{\eta }=(1 + \varvec{\gamma }^\top \varvec{\gamma })^{-1/2}\varvec{\Omega }^{1/2}\varvec{\gamma }\) and \(\varvec{\Psi }= \varvec{\Omega }-(1+\varvec{\gamma }^\top \varvec{\gamma })^{-1}\varvec{\Omega }^{1/2} \varvec{\gamma }\varvec{\gamma }^\top \varvec{\Omega }^{1/2}\). From Eq. (18) and by using the change of variable \(\varvec{z}=\varvec{\Omega }^{-1/2}(\varvec{x}-\varvec{\xi })\) we have that the mmgf of \(\varvec{X}\), \(M_{\varvec{X}}(\varvec{t})={\mathbb {E}}\{\exp (\varvec{t}^\top \varvec{X}) \}\), is given by
\(\square \)
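The closed form of the mmgf can be cross-checked numerically. Since \(T\sim \mathcal{H}\mathcal{N}(0,1)\) satisfies \({\mathbb {E}}\{\exp (sT)\}=2\exp (s^2/2)\Phi (s)\), the representation of Proposition 10 yields \(M_{\varvec{X}}(\varvec{t})=2\exp (\varvec{t}^\top \varvec{\xi }+\varvec{t}^\top \varvec{\Omega }\varvec{t}/2)\Phi (\varvec{\eta }^\top \varvec{t})\); the sketch below (illustrative parameter values are our own) compares this with a Monte Carlo estimate of \({\mathbb {E}}\{\exp (\varvec{t}^\top \varvec{X})\}\):

```python
import numpy as np
from math import erf, sqrt

def Phi(z):  # standard normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(1)
p = 2
xi = np.array([0.5, -1.0])
eta = np.array([1.0, 0.5])
Psi = np.array([[1.0, 0.2], [0.2, 0.8]])
Omega = Psi + np.outer(eta, eta)

t = np.array([0.3, -0.2])
# Closed form: M_X(t) = 2 exp(t'xi + t'Omega t / 2) Phi(eta' t)
M_theory = 2.0 * np.exp(t @ xi + 0.5 * t @ Omega @ t) * Phi(eta @ t)

# Monte Carlo estimate via the stochastic representation X = xi + T eta + V
n = 400_000
T = np.abs(rng.standard_normal(n))
V = rng.multivariate_normal(np.zeros(p), Psi, size=n)
X = xi + T[:, None] * eta + V
M_mc = np.exp(X @ t).mean()
print(M_theory, M_mc)  # the two values agree to Monte Carlo error
```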
5) Cumulative distribution function: In the next proposition we present the exact functional form of the multivariate cumulative distribution function (mcdf) of the \(\mathcal{S}\mathcal{N}\) distribution.
Proposition 12
The mcdf of \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) is \(F_{\varvec{X}}(\varvec{x}) = 2\,\Phi _{p+1}(\varvec{x}_*; \varvec{\xi }_*, \varvec{\Omega }_*)\),
where \(\varvec{x}_* = (\varvec{x}^\top ,0)^\top \), \(\varvec{\xi }_* = (\varvec{\xi }^\top ,0)^\top \) and \(\varvec{\Omega }_* = \begin{pmatrix}\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top &{} -\varvec{\eta }\\ -\varvec{\eta }^\top &{} 1 \end{pmatrix}\).
Proof
The mpdf of \(\varvec{X}\) is
where we use the change of variable \(z_0 = u - \varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{z} - \varvec{\xi })/\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }}\). Now, from the marginal-conditional factorization of a \((p+1)\)-variate normal mpdf we can write
where \(\varvec{z}_* = (\varvec{z}^\top ,z_0)^\top \) and \(\varvec{\Omega }_{**} = \begin{pmatrix}\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top &{} - \sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }} \varvec{\eta }\\ - \sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }} \varvec{\eta }^\top &{} 1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta } \end{pmatrix}\). Moreover, we have \(\varvec{\Omega }_{**} = \varvec{D}_{*} \varvec{\Omega }_{*} \varvec{D}_*\), with \(\varvec{D}_* = \text {diag} \left( I _p,\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }} \right) \). Then, the mcdf of \(\varvec{X}\) is
But, since \(\varvec{x}_* - \varvec{\xi }_* = ((\varvec{x} - \varvec{\xi })^\top , 0)^\top \), then \(\varvec{D}_*^{-1} (\varvec{x}_* - \varvec{\xi }_*) = \varvec{x}_* - \varvec{\xi }_*\) and so
\(\square \)
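Proposition 12 reduces the \(\mathcal{S}\mathcal{N}\) mcdf to a single \((p+1)\)-variate normal probability, \(F_{\varvec{X}}(\varvec{x})=2\Phi _{p+1}(\varvec{x}_*;\varvec{\xi }_*,\varvec{\Omega }_*)\) with \(\varvec{x}_*=(\varvec{x}^\top ,0)^\top \), which standard multivariate-normal routines can evaluate. A minimal check against simulation (Python/SciPy; illustrative parameter values are our own):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
p = 2
xi = np.array([0.0, 0.5])
eta = np.array([1.5, -0.5])
Psi = np.array([[1.0, 0.4], [0.4, 1.0]])

# Augmented parameters of Proposition 12
xi_star = np.append(xi, 0.0)
Omega_star = np.block([
    [Psi + np.outer(eta, eta), -eta[:, None]],
    [-eta[None, :], np.ones((1, 1))],
])

x = np.array([1.0, 0.8])
# F_X(x) = 2 * Phi_{p+1}((x, 0); (xi, 0), Omega_star)
F_theory = 2.0 * multivariate_normal(mean=xi_star, cov=Omega_star).cdf(np.append(x, 0.0))

# Monte Carlo estimate of P(X <= x) via the stochastic representation
n = 400_000
T = np.abs(rng.standard_normal(n))
V = rng.multivariate_normal(np.zeros(p), Psi, size=n)
X = xi + T[:, None] * eta + V
F_mc = np.mean(np.all(X <= x, axis=1))
print(F_theory, F_mc)  # the two values agree to Monte Carlo error
```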
Behavior of the likelihood function In the following two results, we show that the alternative parameterization used in (3) for defining the \(\mathcal{S}\mathcal{N}\) mpdf fixes the problem of the infinite maximum likelihood estimate of the skewness parameter that affects the original parameterization used in (2) for defining the \(\mathcal {ASN}\) mpdf, but fails to resolve the problem of the singular Fisher information matrix. In fact, let \(\varvec{x}_1,\ldots ,\varvec{x}_n\) be an observed random sample from \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\). The corresponding likelihood function for \((\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) is
For fixed \(\varvec{\xi }\) and \(\varvec{\Psi }\), (19) becomes the profile likelihood function of \(\varvec{\eta }\), which we denote by \(L(\varvec{\eta })\), \(\varvec{\eta }\in {\mathbb {R}}^p\).
Proposition 13
The maximum likelihood estimator of the skewness parameter \(\varvec{\eta }\) of the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) family is always finite.
Proof
We find the limit of \(L(\varvec{\eta })\) when some (or all) components of \(\varvec{\eta }\) tend to \(+\infty \) or \(-\infty \). For this, first note from the Cauchy–Schwarz inequality that
for \(i=1,\ldots ,n\). From this inequality, it clearly follows, for all \(\varvec{\eta }\in {\mathbb {R}}^p\) and each \(i=1,\ldots ,n\), that
and
These results hold for any fixed value of \((\varvec{\xi },\varvec{\Psi })\). Noting also that \(\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }\rightarrow \infty \) whenever some (or all) components of \(\varvec{\eta }\) tend to \(\pm \infty \), we deduce from the first of the previous inequalities that \(\phi _p (\varvec{x}_i;\varvec{\xi },\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top ) \rightarrow 0\) for each \(i=1,\ldots ,n\) in that limit. Taking into account the second inequality, we then find that \(L(\varvec{\eta }) \rightarrow 0\) whenever some (or all) components of \(\varvec{\eta }\) tend to \(\pm \infty \), for any fixed value of \((\varvec{\xi },\varvec{\Psi })\). Consequently, \(L(\varvec{\eta })\) is not a monotonically increasing or decreasing function of any of the components of \(\varvec{\eta }\), and the profile likelihood of the skewness parameter \(\varvec{\eta }\) is always maximized at a finite point for the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\) family.\(\square \)
Remark Proposition 13 shows that the MLE of the skewness parameter of the \(\mathcal{S}\mathcal{N}\) distribution is always finite under the new parameterization. This is not the case under the \(\mathcal {ASN}\) parameterization. Thus, if we use the new parameterization for a particular data set, we will obtain a finite MLE of the skewness parameter, whereas the MLE of the skewness parameter may be infinite under the \(\mathcal {ASN}\) parameterization. Because the two parameterizations are one-to-one transformations of each other, by the invariance property of the MLE, transforming the MLE from the new parameterization back to the \(\mathcal {ASN}\) parameterization recovers the old results. In other words, although the MLE of the skewness parameter \(\varvec{\eta }\) is always finite, it may sometimes correspond to an MLE of the skewness parameter \(\varvec{\alpha }\) that is infinite.
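Proposition 13 can also be illustrated numerically. The Python sketch below (illustrative data and the fixed values \(\xi =0\), \(\psi =1\) are our own) evaluates the univariate profile log-likelihood on a wide grid, using the \((\varvec{\Omega },\varvec{\gamma })\) form of the mpdf from the proof of Proposition 11 (for \(p=1\), \(\Omega =\psi +\eta ^2\) and \(\gamma =\eta /\sqrt{\psi }\)), and confirms that it falls off at both ends, so the maximizer is interior:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):  # standard normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sn_loglik(eta, x, xi=0.0, psi=1.0):
    # Univariate SN(xi, psi, eta) log-likelihood, written from the
    # (Omega, gamma) form: 2 phi(x; xi, Omega) Phi(gamma (x - xi)/sqrt(Omega))
    omega2 = psi + eta ** 2
    z = x - xi
    logphi = -0.5 * np.log(2 * np.pi * omega2) - 0.5 * z ** 2 / omega2
    skew = np.log([2.0 * Phi(eta * zi / np.sqrt(psi * omega2)) for zi in z])
    return np.sum(logphi + skew)

rng = np.random.default_rng(3)
x = rng.normal(0.2, 1.0, size=100)  # a mildly off-center sample

grid = np.linspace(-30, 30, 601)
ll = np.array([sn_loglik(e, x) for e in grid])
best = grid[ll.argmax()]
print(best)                     # finite interior maximizer
print(ll[0], ll[-1], ll.max())  # profile drops off at both edges
```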
In what follows, we use the notation of Magnus and Neudecker (1979) related to the Kronecker product and matrix vectorization. For instance, let \(\text{ vech }(\varvec{\Psi })\) be the \(p(p + 1)/2\)-subvector of \(\text{ vec }(\varvec{\Psi })\), which retains only the distinct entries of the symmetric matrix \(\varvec{\Psi }\) (those on and below the diagonal). Also, let \(\varvec{K}_p\) be the \(p^2 \times p^2\) commutation matrix, i.e., \(\varvec{K}_p \text{ vec }(\varvec{A})=\text{ vec }(\varvec{A}^\top )\) for any \(p\times p\) matrix \(\varvec{A}\), and let \(\varvec{D}_p\) be the \(p^2\times p(p+1)/2\) duplication matrix, i.e., \(\varvec{D}_p \text{ vech }(\varvec{A}) = \text{ vec }(\varvec{A})\) for any \(p\times p\) symmetric matrix \(\varvec{A}\).
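For readers implementing these expressions, the commutation and duplication matrices, and the Moore-Penrose inverse \(\varvec{D}_p^+\) that appears in the proof of Proposition 14, can be built directly from their defining identities. A small Python sketch (our own construction, following the Magnus and Neudecker conventions):

```python
import numpy as np

def commutation_matrix(p, q):
    # K_{pq} such that K_{pq} @ vec(A) = vec(A.T) for any p x q matrix A,
    # where vec stacks columns (A.flatten(order="F"))
    K = np.zeros((p * q, p * q))
    for i in range(p):
        for j in range(q):
            K[i * q + j, j * p + i] = 1.0
    return K

def duplication_matrix(p):
    # D_p such that D_p @ vech(A) = vec(A) for symmetric p x p A, where
    # vech stacks the on-and-below-diagonal entries column by column
    D = np.zeros((p * p, p * (p + 1) // 2))
    def h(i, j):  # vech position of entry (i, j) with i >= j
        return j * p - j * (j - 1) // 2 + (i - j)
    for i in range(p):
        for j in range(p):
            D[j * p + i, h(max(i, j), min(i, j))] = 1.0
    return D

p = 3
A = np.arange(1.0, 10.0).reshape(p, p)
K = commutation_matrix(p, p)
print(np.allclose(K @ A.flatten(order="F"), A.T.flatten(order="F")))  # True

S = A + A.T  # a symmetric matrix
vech = np.array([S[i, j] for j in range(p) for i in range(j, p)])
D = duplication_matrix(p)
Dplus = np.linalg.solve(D.T @ D, D.T)  # D_p^+ = (D_p' D_p)^{-1} D_p'
print(np.allclose(D @ vech, S.flatten(order="F")))      # True
print(np.allclose(Dplus @ S.flatten(order="F"), vech))  # True
```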
Proposition 14
The Fisher information matrix of the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) family is singular when the skewness parameter \(\varvec{\eta }\) is set to zero.
Proof
The score vector and Fisher information matrix for the \(\mathcal {ASN}_p(\varvec{\xi }, \varvec{\Omega }, \varvec{\alpha })\) family are derived by Arellano-Valle and Azzalini (2008) in terms of the reparametrization \((\varvec{\xi },\varvec{\Omega },\varvec{\lambda })\), where \(\varvec{\lambda }=\varvec{\omega }^{-1}\varvec{\alpha }\). These authors also showed that the Fisher information matrix \(\varvec{I}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })\) is singular at \(\varvec{\lambda }=\varvec{0}\):
Since the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) family corresponds to a reparametrization of the \(\mathcal {ASN}_p(\varvec{\xi }, \varvec{\Omega }, \varvec{\alpha })\) family, its Fisher information matrix becomes \(\varvec{I}(\varvec{\xi },\varvec{\Psi },\varvec{\eta })=\varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })^\top \varvec{I}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })\varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })\), where \(\varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })\) denotes the Jacobian matrix of the transformation from \((\varvec{\xi },\text{ vech }(\varvec{\Omega }),\varvec{\lambda })\) to \((\varvec{\xi },\text{ vech }(\varvec{\Psi }),\varvec{\eta })\). Thus, since the inverse transformation from \((\varvec{\xi },\text{ vech }(\varvec{\Psi }),\varvec{\eta })\) to \((\varvec{\xi },\text{ vech }(\varvec{\Omega }),\varvec{\lambda })\) turns out to be \(\varvec{\xi }=\varvec{\xi },\) \(\varvec{\Omega }=\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^{\top }\) and \(\varvec{\lambda }= (1 + \varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta })^{-1/2}\varvec{\Psi }^{-1} \varvec{\eta },\) for the Jacobian matrix we have
where \(\varvec{J}_{23}=\varvec{D}_p^+(I _p\otimes \varvec{\eta }+\varvec{\eta }\otimes I _p)\), with \(\varvec{D}_p^+=(\varvec{D}_p^\top \varvec{D}_p)^{-1}\varvec{D}_p^\top \), \(\varvec{J}_{32}=(1 + \varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta })^{-1/2}\{\frac{1}{2}(\varvec{\eta }^{\top }\varvec{\Psi }^{-1}\otimes \varvec{\Psi }^{-1}\varvec{\eta }\varvec{\eta }^{\top }\varvec{\Psi }^{-1})-(\varvec{\eta }^{\top }\varvec{\Psi }^{-1}\otimes \varvec{\Psi }^{-1})\}\varvec{D}_p\) and \(\varvec{J}_{33}=(1 + \varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta })^{-1/2}(\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^{\top })^{-1}\). When \(\varvec{\eta }=\varvec{0}\) we have that \(\varvec{\lambda }=\varvec{0}\), \(\varvec{\Omega }=\varvec{\Psi }\) and the Jacobian matrix \(\varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{0})=\text {diag} (I _p,I _{p(p+1)/2},\varvec{\Psi }^{-1})\). Hence, at \(\varvec{\eta }=\varvec{0}\), the Fisher information matrix \(\varvec{I}(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) of the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) family becomes
which is clearly singular. \(\square \)
Remark Since \(\varvec{I}(\varvec{\xi },\varvec{\Omega },\varvec{0})\) is singular, to prove that \(\varvec{I}(\varvec{\xi },\varvec{\Psi },\varvec{0})\) is also singular it suffices to show that the Jacobian matrix \(\varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{0})\) is finite (in the matrix sense).
Obviously, the singularity of the Fisher information matrix of the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\) family when \(\varvec{\eta }= \varvec{0}\) is due to the fact that the score vectors corresponding to the location vector \(\varvec{\xi }\) and the skewness vector \(\varvec{\eta }\) are linearly dependent at \(\varvec{\eta }= \varvec{0}\). In fact, the score vector of \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\) for \((\varvec{\xi }^\top ,\text{ vech }(\varvec{\Psi })^\top ,\varvec{\eta }^\top )^\top \), at \(\varvec{\eta }= \varvec{0}\), becomes
It is evident from (20) that, at \(\varvec{\eta }= \varvec{0}\), the score vectors of \(\varvec{\xi }\) and \(\varvec{\eta }\) are linearly related. Consequently, the Fisher information matrix, which is the covariance matrix of the score vector, is singular at \(\varvec{\eta }= \varvec{0}\). The score vector in (20) can be obtained from a direct differentiation of the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\) log-likelihood function, or from the results in Arellano-Valle and Azzalini (2008) as well as from Ley and Paindaveine (2010) since the \(\mathcal{S}\mathcal{N}\) family belongs to the generalized skew-normal family (see Genton and Loperfido (2005)). This last fact can be easily verified by the form of the mpdf of \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) in Equation (18), and from there it is clear that \(\mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) is a generalized skew-normal distribution with location parameter \(\varvec{\xi }\), dispersion matrix \(\varvec{\Omega }\), density generator \(\phi _p (\varvec{z})\), and skewing function \(\pi (\varvec{z}): {\mathbb {R}}^p \rightarrow [0,1]\) with \(\pi (\varvec{z}) = \Phi \left( \varvec{\gamma }^\top \varvec{z} \right) .\)
Appendix B
1.1 Proof of Proposition 1
Let \(\tilde{\varvec{X}} = \varvec{\xi }+ \dfrac{1}{\sqrt{1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y}}}\,\varvec{\Omega }\varvec{Y}\,T + \varvec{U}\). Since, by assumption, T is independent of \((\varvec{U}^\top ,\varvec{Y}^\top )^\top \), it is then clear that, conditionally on \(\varvec{Y} = \varvec{y}\), \(\tilde{\varvec{X}}\) has the same distribution as \(\varvec{\xi }+ \dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{\Omega }\varvec{y}\,T + \varvec{U}_{\varvec{y}}\), where \(\varvec{U}_{\varvec{y}} \,{\buildrel d \over =}\, \varvec{U}| \varvec{Y} = \varvec{y} \sim {\mathcal {N}}_p \left( \varvec{0}, (\varvec{\Omega }^{-1} + \varvec{y} \varvec{y} ^\top )^{-1} \right) \), independent of T. By Proposition 10, this means \(\tilde{\varvec{X}} | \varvec{Y} = \varvec{y} \sim \mathcal{S}\mathcal{N}_p \left( \varvec{\xi }, (\varvec{\Omega }^{-1} + \varvec{y} \varvec{y}^\top )^{-1},\dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{\Omega }\varvec{y} \right) \), and hence the mpdf of \(\tilde{\varvec{X}}| \varvec{Y} = \varvec{y}\) becomes \(f_{\tilde{\varvec{X}} | \varvec{Y} = \varvec{y}}(\varvec{x}) = 2 \phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega }) \Phi \{ \varvec{y}^\top (\varvec{x}- \varvec{\xi }) \}.\) Thus, since the mpdf of \(\tilde{\varvec{X}}\) is \(f_{\tilde{\varvec{X}}}(\varvec{x}) = \int _{{\mathbb {R}}^p} f_{\tilde{\varvec{X}} | \varvec{Y}}(\varvec{x}|\varvec{y}) f_{\varvec{Y}} (\varvec{y}) d \varvec{y}\), we have
Now, using the fact that, if \(X \sim {\mathcal {N}}(\mu , \sigma ^2)\), then \({\mathbb {E}} \{ \Phi (X) \}=\Phi \left( \dfrac{\mu }{\sqrt{1+\sigma ^2}}\right) \), we get the above result. \(\square \)
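The identity \({\mathbb {E}}\{\Phi (X)\}=\Phi \big (\mu /\sqrt{1+\sigma ^2}\big )\) for \(X\sim {\mathcal {N}}(\mu ,\sigma ^2)\) is easy to verify numerically; the following Python sketch (illustrative values of \(\mu \) and \(\sigma \) are our own) evaluates the left-hand side by Gauss-Hermite quadrature:

```python
import numpy as np
from math import erf, sqrt, pi

def Phi(z):  # standard normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 0.7, 1.3  # illustrative values

# Probabilists' Gauss-Hermite quadrature: sum w_i f(x_i) approximates
# int f(x) exp(-x^2/2) dx, so E{f(Z)} = sum w_i f(x_i) / sqrt(2 pi)
nodes, weights = np.polynomial.hermite_e.hermegauss(60)
lhs = np.sum(weights * np.array([Phi(mu + sigma * z) for z in nodes])) / sqrt(2 * pi)
rhs = Phi(mu / sqrt(1 + sigma ** 2))
print(lhs, rhs)  # both approximately 0.665
```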
1.2 Proof of Corollary 1
Since, by assumption, \(\varvec{X} \sim \mathcal {MSN}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\), we have by Proposition 1 that \(\varvec{X}\) can be represented as \({\varvec{X}} | \varvec{Y} = \varvec{y} \sim \mathcal{S}\mathcal{N}_p \left( \varvec{\xi }, (\varvec{\Omega }^{-1} + \varvec{y} \varvec{y}^\top )^{-1},\dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{\Omega }\varvec{y} \right) \) and \(\varvec{Y} \sim {\mathcal {N}}_p(\varvec{\lambda },\varvec{\Omega }^{-1})\). Combining this statement with the stochastic representation of the \(\mathcal{S}\mathcal{N}\) distribution as given in Proposition 10, the mpdf of \(\varvec{X}\) can be expressed as
Now, using Lemma 2 in Arellano-Valle et al. (2005) and that \((\varvec{\Omega }^{-1} + \varvec{y} \varvec{y}^\top )^{-1} = \varvec{\Omega }- \dfrac{1}{1+\varvec{y}^\top {\varvec{\Omega }} \varvec{y}}\,\varvec{\Omega }\varvec{y} \varvec{y} ^\top \varvec{\Omega }\), we have the identity given by
Thus, we have
Considering the transformation \(w = \sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}\,t\), we have, for the joint mpdf of \((\varvec{X}, \varvec{Y},W)\):
Again, using Lemma 2 in Arellano-Valle et al. (2005), we have
and using this result in (21), we get
for \( \varvec{x} \in {\mathbb {R}}^p\), \(\varvec{y} \in {\mathbb {R}}^p\), and \(w>0\). The rest of the proof is trivial from (22). \(\square \)
1.3 Proof of Corollary 2
From Eq. (21), we get that the joint mpdf of \((\varvec{X},\varvec{Y})\) is given by
From this mpdf it follows that the marginal mpdf of \(\varvec{X}\) and the conditional mpdf of \(\varvec{Y}|\varvec{X}=\varvec{x}\) are:
Also, the conditional multivariate moment generating function of \(\varvec{Y}|\varvec{X}=\varvec{x}\) is
where the last step follows from Lemma 5.3 in Azzalini and Capitanio (2014). Thus, we have
Now, since \({\mathbb {E}}(W\varvec{Y}|\varvec{X} =\varvec{x}) = {\mathbb {E}}\{ W {\mathbb {E}}(\varvec{Y}|W,\varvec{X}=\varvec{x})|\varvec{X}=\varvec{x}\}\), then for the evaluation of this quantity we need the conditional mpdfs of \(\varvec{Y}|W=w,\varvec{X}=\varvec{x}\) and \(W|\varvec{X} = \varvec{x}\). Again, from (21):
where, from Lemma 2 in Arellano-Valle et al. (2005), the product \(\phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi \{w;\varvec{y}^\top (\varvec{x}- \varvec{\xi }),1\} \) is equal to \(\phi \{ w;(\varvec{x} - \varvec{\xi })^\top \varvec{\lambda },1+(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi }) \} \phi _p[\varvec{y};\varvec{\lambda }+ \varvec{\Lambda }(\varvec{x}-\varvec{\xi }) \{ w-\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi }) \},\varvec{\Lambda }]\), where \(\varvec{\Lambda }= \{\varvec{\Omega }+(\varvec{x}- \varvec{\xi })(\varvec{x}- \varvec{\xi })^\top \}^{-1}\). Thus,
That is, \(W|\varvec{X}=\varvec{x} \sim \mathcal{T}\mathcal{N} \{ (\varvec{x} - \varvec{\xi })^\top \varvec{\lambda },1+(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi });(0,\infty ) \}\), and hence
Furthermore, from Eq. (21), we have \( f_{\varvec{Y}|\varvec{X} = \varvec{x},W = w}(\varvec{y}) \propto \phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi \{w;\varvec{y}^\top (\varvec{x}- \varvec{\xi }),1\}\), where again, from Lemma 2 in Arellano-Valle et al. (2005), the product \(\phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi \{w;\varvec{y}^\top (\varvec{x}- \varvec{\xi }),1\} \) is equal to \(\phi \{w;(\varvec{x} - \varvec{\xi })^\top \varvec{\lambda },1+(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })\}\phi _p[\varvec{y};\varvec{\lambda }+ \varvec{\Lambda }(\varvec{x}-\varvec{\xi })\{w-\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })\},\varvec{\Lambda }]\), with \(\varvec{\Lambda }= \{\varvec{\Omega }+(\varvec{x}- \varvec{\xi })(\varvec{x}- \varvec{\xi })^\top \}^{-1}\). Therefore,
That is, \(\varvec{Y}|\varvec{X} = \varvec{x},W = w\sim {\mathcal {N}}_p(\varvec{\lambda }+ \varvec{\Lambda }(\varvec{x}-\varvec{\xi })\{w-\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })\},\varvec{\Lambda })\). Hence,
\(\square \)
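The conditional distribution \(W|\varvec{X}=\varvec{x}\) in the proof above is a normal distribution truncated to \((0,\infty )\), whose mean is given by the standard formula \(\mu + \sigma \,\phi (\mu /\sigma )/\Phi (\mu /\sigma )\) for \(\mathcal{T}\mathcal{N}(\mu ,\sigma ^2;(0,\infty ))\). A minimal check of this formula against SciPy's truncated-normal implementation (illustrative parameter values are our own):

```python
import numpy as np
from scipy.stats import norm, truncnorm

mu, sigma = 0.4, 1.5  # illustrative TN(mu, sigma^2; (0, inf)) parameters

# scipy's truncnorm takes standardized truncation bounds
a = (0.0 - mu) / sigma
mean_scipy = truncnorm.mean(a, np.inf, loc=mu, scale=sigma)

# Closed-form mean of a normal truncated to (0, inf)
mean_formula = mu + sigma * norm.pdf(mu / sigma) / norm.cdf(mu / sigma)
print(mean_scipy, mean_formula)  # both approximately 1.354
```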
1.4 Proof of Proposition 2
From Proposition 1, we have that \(\varvec{X} \sim \mathcal {MSN}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\) can be represented as \({\varvec{X}} | \varvec{Y} = \varvec{y} \sim \mathcal{S}\mathcal{N}_p \left( \varvec{\xi }, \varvec{\Omega }- \dfrac{1}{1 + \varvec{y}^\top \varvec{\Omega }\varvec{y}}\,\varvec{\Omega }\varvec{y} \varvec{y}^\top \varvec{\Omega }, \dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{\Omega }\varvec{y} \right) \) and \(\varvec{Y} \sim {\mathcal {N}}_p(\varvec{\lambda },\varvec{\Omega }^{-1})\). Now, using the results in Eq. (16), we get \({\mathbb {E}}(\varvec{X}|\varvec{Y} = \varvec{y}) = \varvec{\xi }+ \sqrt{\dfrac{2}{\pi }}\dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{\Omega }\varvec{y}\) and \({\mathbb {V}}ar (\varvec{X}|\varvec{Y} = \varvec{y}) = \varvec{\Omega }- \dfrac{2}{\pi }\dfrac{1}{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}\,\varvec{\Omega }\varvec{y} \varvec{y} ^\top \varvec{\Omega }\), where \(\varvec{Y} \sim {\mathcal {N}}_p\left( \varvec{\lambda },\varvec{\Omega }^{-1}\right) \), so that
and
Thus,
\(\square \)
1.5 Proof of Proposition 3
Let \(\varvec{X}\sim \mathcal {MSN}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\). Then, by Proposition 1 we have that \(\varvec{X}|\varvec{Y}= \varvec{y} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi }_{\varvec{y}},\varvec{\eta }_{\varvec{y}})\), with \(\varvec{\Psi }_{\varvec{y}}=\varvec{\Omega }- \varvec{\eta }_{\varvec{y}} \varvec{\eta }_{\varvec{y}}^\top \) and \(\varvec{\eta }_{\varvec{y}}=(1+\varvec{y}^\top \varvec{\Omega }\varvec{y})^{-1/2} \varvec{\Omega }\varvec{y}\), where \(\varvec{Y}\sim {\mathcal {N}}_p(\varvec{\lambda },\varvec{\Omega }^{-1})\), with \(\varvec{\lambda }=(1-\varvec{\eta }^\top \varvec{\Omega }^{-1} \varvec{\eta })^{-1/2} \varvec{\Omega }^{-1} \varvec{\eta }\) and \(\varvec{\Omega }=\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top \) as defined in Eq. (6). Hence by adapting the result of the \(\mathcal{S}\mathcal{N}\)-mmgf from Proposition 11 to the conditional mmgf of \(\varvec{X}|\varvec{Y}=\varvec{y}\sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi }_{\varvec{y}},\varvec{\eta }_{\varvec{y}})\), we have
since \(\varvec{\Omega }_{\varvec{y}}=\varvec{\Psi }_{\varvec{y}}+ \varvec{\eta }_{\varvec{y}} \varvec{\eta }_{\varvec{y}}^\top =\varvec{\Omega }\). Hence, the mmgf of \(\varvec{X}\sim \mathcal {MSN}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) becomes
where as before \(\varvec{Y}\sim {\mathcal {N}}_p(\varvec{\lambda },\varvec{\Omega }^{-1})\), and we note that
where \(\varvec{Z}=\varvec{\Omega }\varvec{Y} \sim {\mathcal {N}}_p(\bar{\varvec{\lambda }},\varvec{\Omega })\), with \(\bar{\varvec{\lambda }}=\varvec{\Omega }\varvec{\lambda }=(1-\varvec{\eta }^\top \varvec{\Omega }^{-1} \varvec{\eta })^{-1/2} \varvec{\eta }\). \(\square \)
1.6 Proof of Proposition 4
From Proposition 12, the conditional mcdf of \(\varvec{X}|\varvec{Y}=\varvec{y} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }_{\varvec{y}},\varvec{\eta }_{\varvec{y}})\) is given by
where
since \(\varvec{\Omega }_{\varvec{y}}=\varvec{\Omega }\), \(\varvec{\gamma }_{\varvec{y}}=(1-\varvec{\eta }_{\varvec{y}}^\top \varvec{\Omega }_{\varvec{y}}^{-1} \varvec{\eta }_{\varvec{y}})^{-1/2} \varvec{\Omega }_{\varvec{y}}^{-1/2} \varvec{\eta }_{\varvec{y}}=\varvec{\Omega }^{1/2} \varvec{y}\) and \(\varvec{\eta }_{\varvec{y}}=(1+\varvec{y}^\top \varvec{\Omega }\varvec{y})^{-1/2} \varvec{\Omega }\varvec{y}\). Therefore, the mcdf of \(\varvec{X}\sim \mathcal {MSN}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) becomes
where as before \(\varvec{Z}=\varvec{\Omega }\varvec{Y} \sim {\mathcal {N}}_p(\bar{\varvec{\lambda }},\varvec{\Omega })\), with \(\bar{\varvec{\lambda }}=\varvec{\Omega }\varvec{\lambda }=(1-\varvec{\eta }^\top \varvec{\Omega }^{-1} \varvec{\eta })^{-1/2} \varvec{\eta }\). \(\square \)
1.7 Proof of Proposition 5
By assumption \(\widetilde{\varvec{X}}=\varvec{a} + \varvec{B} \varvec{X}\), with \(\varvec{X} \sim \mathcal {MSN}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\), and therefore, from the stochastic representation of \(\varvec{X}\) given in Proposition 10, \(\widetilde{\varvec{X}}\) admits the stochastic representation \(\widetilde{\varvec{X}}\buildrel d\over = \varvec{a} + \varvec{B} \varvec{\xi }+ \dfrac{1}{\sqrt{1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y}}}\,\varvec{B}\varvec{\Omega }\varvec{Y}\, T + \varvec{B}\varvec{U},\) where \(T\sim \mathcal{H}\mathcal{N}(0,1)\) and \(\varvec{Y}\sim {\mathcal {N}}_p(\varvec{\lambda },\varvec{\Omega }^{-1})\) are independent, and \(\varvec{U}|\varvec{Y}=\varvec{y}\sim {\mathcal {N}}_p\left( \varvec{0},(\varvec{\Omega }^{-1}+\varvec{y}\varvec{y}^\top )^{-1}\right) \). Also, by conditioning \(\widetilde{\varvec{X}}\) on \(\varvec{Y}=\varvec{y}\) in this stochastic representation, we have that \(\widetilde{\varvec{X}}|\varvec{Y}=\varvec{y}\sim \mathcal{S}\mathcal{N}_k(\varvec{a} + \varvec{B} \varvec{\xi }, \varvec{\Psi }_{\varvec{y}}, \varvec{\eta }_{\varvec{y}})\), with conditional mpdf
where \(\varvec{Y} \sim {\mathcal {N}}_p\left( \varvec{\lambda },\varvec{\Omega }^{-1}\right) \), \(\varvec{\Psi }_{\varvec{y}}=\varvec{B}(\varvec{\Omega }^{-1}+\varvec{y}\varvec{y}^\top )^{-1}\varvec{B}^\top \) and \(\varvec{\eta }_{\varvec{y}}=\dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{B} \varvec{\Omega }\varvec{y}.\) Thus, since \(f_{\widetilde{\varvec{X}}}(\widetilde{\varvec{x}}) = \int _{{\mathbb {R}}^p} f_{\widetilde{\varvec{X}} | \varvec{Y}}(\widetilde{\varvec{x}}\,|\,\varvec{y}) f_{\varvec{Y}}(\varvec{y})\, d \varvec{y}\), we have, after some extensive but straightforward algebra, that the mpdf of \(\widetilde{\varvec{X}}\) becomes
where \(\varvec{Y} \sim {\mathcal {N}}_p(\varvec{\lambda }, \varvec{\Omega }^{-1})\).
Now, let \(\varvec{Y}_B = \varvec{B} \varvec{\Omega }\varvec{Y}\) and \(\varvec{Y}_C = \varvec{C} \varvec{Y}\), where \(\varvec{C} = I _p - \varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }\). Note that \(\varvec{B} \varvec{C}^\top = \varvec{0}\) and \( \varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }=\varvec{C}^\top \varvec{\Omega }\varvec{C}\), and so \(\varvec{Y}^\top \{\varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }\}\varvec{Y}=\varvec{Y}^\top \varvec{C}^\top \varvec{\Omega }\varvec{C} \varvec{Y}=\varvec{Y}_C^\top \varvec{\Omega }\varvec{Y}_C\). Since \(\varvec{Y} \sim {\mathcal {N}}_p(\varvec{\lambda }, \varvec{\Omega }^{-1})\), it follows, from the properties of the multivariate normal distribution, that \(\varvec{Y}_B \sim {\mathcal {N}}_k(\varvec{B} \varvec{\Omega }\varvec{\lambda }, \varvec{B} \varvec{\Omega }\varvec{B}^\top )\) and \(\varvec{Y}_C \sim {\mathcal {N}}_p(\varvec{C} \varvec{\lambda }, \varvec{C}\varvec{\Omega }^{-1} \varvec{C}^\top )\), and they are independent since \({\mathbb {C}}ov (\varvec{Y}_B,\varvec{Y}_C)=\varvec{B} \varvec{C}^\top = \varvec{0}\). In turn, this means that \((\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1}\varvec{Y}_B = (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1}\varvec{B} \varvec{\Omega }\varvec{Y} \sim {\mathcal {N}}_k((\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1}\varvec{B} \varvec{\Omega }\varvec{\lambda },(\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1})\), and it is independent of \(\varvec{Y}_C^\top \varvec{\Omega }\varvec{Y}_C= \varvec{Y}^\top \{\varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }\}\varvec{Y}\). Thus, for the above expectation, we have, by using the same arguments as in the proof of Proposition 1, that
When \(\varvec{B}\) is a nonsingular square matrix, this expectation, which corresponds to the skewing function of the mpdf of \(\widetilde{\varvec{X}}\), reduces to \(\Phi \bigg \{ \dfrac{\varvec{\lambda }^\top \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })}{\sqrt{1 + (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })}} \bigg \},\) thus it follows that \(\widetilde{\varvec{X}} \sim \mathcal {MSN}_p(\varvec{a} + \varvec{B} \varvec{\xi },\varvec{B} \varvec{\Psi }\varvec{B} ^\top , \varvec{B} \varvec{\eta })\). \(\square \)
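For intuition, the stochastic representation used in this proof can be sketched numerically in the univariate case. This is a minimal sketch under the representation quoted above with \(p=1\); the function name `rmsn1` and all parameter choices are ours, not from the paper:

```python
import math
import random

def rmsn1(n, xi, psi, eta, seed=0):
    """Sample from MSN_1(xi, psi, eta) via the stochastic representation
    X = xi + Omega*Y*T / sqrt(1 + Omega*Y^2) + U, where
    T ~ HN(0,1) (half-normal), Y ~ N(lam, Omega^{-1}),
    U | Y = y ~ N(0, (Omega^{-1} + y^2)^{-1}),
    Omega = psi + eta^2 and lam = (1 - eta^2/Omega)^{-1/2} * eta/Omega."""
    rng = random.Random(seed)
    omega = psi + eta * eta
    lam = (eta / omega) / math.sqrt(1.0 - eta * eta / omega)
    xs = []
    for _ in range(n):
        y = rng.gauss(lam, 1.0 / math.sqrt(omega))                 # Y ~ N(lam, 1/Omega)
        t = abs(rng.gauss(0.0, 1.0))                               # T ~ HN(0, 1)
        u = rng.gauss(0.0, 1.0 / math.sqrt(1.0 / omega + y * y))   # U | Y = y
        xs.append(xi + omega * y * t / math.sqrt(1.0 + omega * y * y) + u)
    return xs
```

With \(\eta = 0\) the \(\mathcal {MSN}\) distribution reduces to \({\mathcal {N}}(\xi ,\Psi )\), which gives a quick sanity check on the sampler: the simulated mean and variance should match \(\xi \) and \(\Psi \).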
1.8 Proof of Corollary 3
By considering the partitions of \(\varvec{X}, \varvec{\xi },\varvec{\Psi }\), and \(\varvec{\eta }\), as in Eq. (17), the mpdf of \(\varvec{X}_1\) can be found using Proposition 5, putting \(\varvec{a} = \varvec{0}\) and \(\varvec{B} = (I _{p_1}\, \varvec{0})\) to obtain
where \(\varvec{y} = (\varvec{y}_1^\top , \varvec{y}_2^\top )^\top \) and \(\varvec{\lambda }= (\varvec{\lambda }_1^\top , \varvec{\lambda }_2^\top )^\top \), with \(\varvec{y}_i, \varvec{\lambda }_i \in {\mathbb {R}}^{p_i}\) for \(i = 1,2\), and \(\varvec{\Omega }= (\varvec{\Omega }_{ij})\), with \(\varvec{\Omega }_{ij} = \varvec{\Psi }_{ij} + \varvec{\eta }_i \varvec{\eta }_j ^\top \), \(i,j =1,2\). Finally, by using the relations stated in the corollary we have, after some algebra, that \(\dfrac{\varvec{\lambda }_1 + \varvec{\Omega }_{11}^{-1} \varvec{\Omega }_{12} \varvec{\lambda }_2}{\sqrt{1 + \varvec{y}_2^\top (\varvec{\Omega }_{22}- \varvec{\Omega }_{21} \varvec{\Omega }_{11} ^{-1} \varvec{\Omega }_{12} ) \varvec{y}_2}}=\dfrac{1}{\sqrt{1 + \varvec{y}_2^\top \varvec{\Omega }_{22\cdot 1} \varvec{y}_2 }\sqrt{1-\varvec{\eta }^\top \varvec{\Omega }^{-1} \varvec{\eta }}}\,\varvec{\Omega }_{11}^{-1}\varvec{\eta }_1=\sqrt{\dfrac{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }}{1 + \varvec{y}_2^\top \varvec{\Omega }_{22\cdot 1}\varvec{y}_2}}\dfrac{1}{\sqrt{1+\varvec{\eta }_1^\top \varvec{\Psi }_{11}^{-1} \varvec{\eta }_1}}\,\varvec{\Psi }_{11}^{-1}\varvec{\eta }_1=\varvec{\lambda }_{1\cdot 2}\). \(\square \)
1.9 Proof of Proposition 6
To prove this, we will show that the density function of \(\mathcal {MSN}_1(0,1, \eta )\) is not a log-concave function. The log-density of \(\mathcal {MSN}_1(0,1, \eta )\) is
The second derivative of \(\log \{f(x)\}\) with respect to x is
where \(t = x/\sqrt{1+\eta ^2}\). It has been found numerically that the sign of \(\dfrac{\textrm{d}^2 \log \{f(x)\}}{\textrm{d}x^2}\) changes on \(x \in (-10,10)\) for various values of \(\eta \). Thus, f(x) is not log-concave in general. \(\square \)
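The numerical claim above is easy to reproduce. The sketch below assumes the univariate \(\mathcal {MSN}_1(0,1,\eta )\) density \(f(x)=2\,\phi (x;0,1+\eta ^2)\,\Phi \{\eta x/\sqrt{1+\eta ^2+x^2}\}\), as inferred from the proof of Proposition 7; the function names and the choice \(\eta =5\) are ours:

```python
import math

def log_pdf(x, eta):
    """Log-density of MSN_1(0, 1, eta):
    log f(x) = log 2 + log phi(x; 0, 1+eta^2) + log Phi(eta*x / sqrt(1+eta^2+x^2))."""
    s2 = 1.0 + eta * eta
    log_phi = -0.5 * math.log(2.0 * math.pi * s2) - x * x / (2.0 * s2)
    t = eta * x / math.sqrt(1.0 + eta * eta + x * x)
    big_phi = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return math.log(2.0) + log_phi + math.log(big_phi)

def d2_signs(eta, h=1e-3):
    """Signs taken by the central finite-difference second derivative of
    log f over a grid on (-10, 10)."""
    signs = set()
    for k in range(-199, 200):
        x = 0.05 * k
        d2 = (log_pdf(x + h, eta) - 2.0 * log_pdf(x, eta)
              + log_pdf(x - h, eta)) / (h * h)
        signs.add(d2 > 0.0)
    return signs
```

For \(\eta =5\), `d2_signs` returns both signs, confirming that the second derivative of the log-density changes sign on \((-10,10)\).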
1.10 Proof of Proposition 7
Let \(X \sim \mathcal {MSN}_1(\xi ,\Psi ,\eta )\). Then, the pdf of X is
As \(\Psi \rightarrow 0\) with \(\eta > 0\), we have \(\dfrac{\eta }{\sqrt{\Psi }} \dfrac{(x - \xi )}{\sqrt{\Psi + \eta ^2 + (x- \xi )^2}} \rightarrow -\infty \) for \(x < \xi \), and \(\dfrac{\eta }{\sqrt{\Psi }} \dfrac{(x - \xi )}{\sqrt{\Psi + \eta ^2 + (x- \xi )^2}} \rightarrow \infty \) for \(x > \xi \). As a consequence, as \(\Psi \rightarrow 0\): \(f_X(x) \rightarrow 0\) if \(x < \xi \) and \(f_X(x) \rightarrow 2 \phi (x;\xi , \eta ^2)\) if \(x > \xi \), which completes the proof for \(\eta >0\). The proof for \(\eta < 0\) is similar. \(\square \)
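This limiting behaviour can be checked numerically. The sketch assumes the univariate density \(f(x)=2\,\phi (x;\xi ,\Psi +\eta ^2)\,\Phi \{(\eta /\sqrt{\Psi })(x-\xi )/\sqrt{\Psi +\eta ^2+(x-\xi )^2}\}\), as used in the proof; the function names and the values \(\xi =0\), \(\eta =1\), \(\Psi =10^{-10}\) are our illustration:

```python
import math

def pdf_msn1(x, xi, psi, eta):
    """Univariate MSN density used in the proof of Proposition 7."""
    s2 = psi + eta * eta
    d = x - xi
    phi = math.exp(-d * d / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
    arg = (eta / math.sqrt(psi)) * d / math.sqrt(s2 + d * d)
    return 2.0 * phi * 0.5 * (1.0 + math.erf(arg / math.sqrt(2.0)))

def half_normal(x, xi, eta):
    """The claimed limit 2*phi(x; xi, eta^2), valid for x > xi."""
    d = x - xi
    return 2.0 * math.exp(-d * d / (2.0 * eta * eta)) / math.sqrt(2.0 * math.pi * eta * eta)
```

With \(\Psi =10^{-10}\), the density at \(x=1>\xi \) already agrees with \(2\phi (x;\xi ,\eta ^2)\) to several decimal places, while at \(x=-1<\xi \) it is numerically zero.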
1.11 Proof of Proposition 8
The likelihood function for \((\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) based on a random sample \(\varvec{x}_1,\ldots ,\varvec{x}_n\) from \(\varvec{X} \sim \mathcal {MSN}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) is
which becomes the profile likelihood function for the skewness parameter \(\varvec{\eta }\) when \(\varvec{\xi }\) and \(\varvec{\Psi }\) are fixed. Arguing exactly as in the proof of Proposition 13 for the \(\mathcal{S}\mathcal{N}\) distribution, the maximum likelihood estimator of the skewness parameter is always finite for the \(\mathcal {MSN}\) distribution as well. Indeed, as in the \(\mathcal{S}\mathcal{N}\) case, for the skewing function of the \(\mathcal {MSN}\) distribution we have
Also, as we already established in the proof of Proposition 13, \(\phi _p (\varvec{x}_i;\varvec{\xi },\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top ) \rightarrow 0\) for each \(i=1,\ldots ,n\) as some (or all) components of \(\varvec{\eta }\) tend to \(\pm \infty \). Hence, for any fixed \(\varvec{\xi }\in {\mathbb {R}}^p\) and \(\varvec{\Psi }>0\), \(L(\varvec{\xi },\varvec{\Psi },\varvec{\eta }) \rightarrow 0\) whenever some (or all) components of \(\varvec{\eta }\) tend to \(\pm \infty \). It follows that \(L(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) is not a monotonically increasing or decreasing function of any component of \(\varvec{\eta }\), and thus the profile likelihood of the skewness parameter \(\varvec{\eta }\) is always maximized at a finite point for the \(\mathcal {MSN}\) family. \(\square \)
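The vanishing of the likelihood for large \(|\varvec{\eta }|\) can be illustrated in the univariate case. This sketch assumes the density form inferred from the proof of Proposition 7, and the sample values are arbitrary:

```python
import math

def msn1_loglik(data, xi, psi, eta):
    """Log-likelihood under the univariate MSN density
    f(x) = 2*phi(x; xi, psi+eta^2)
             * Phi( eta*(x-xi) / (sqrt(psi)*sqrt(psi+eta^2+(x-xi)^2)) )."""
    s2 = psi + eta * eta
    ll = 0.0
    for x in data:
        d = x - xi
        arg = eta * d / (math.sqrt(psi) * math.sqrt(s2 + d * d))
        big_phi = 0.5 * (1.0 + math.erf(arg / math.sqrt(2.0)))
        ll += (math.log(2.0) - 0.5 * math.log(2.0 * math.pi * s2)
               - d * d / (2.0 * s2) + math.log(big_phi))
    return ll
```

Evaluating the profile log-likelihood along \(\eta = 1, 10, 100, 1000\) with \(\xi \) and \(\Psi \) fixed shows it decaying steadily, consistent with \(L \rightarrow 0\) as \(|\eta | \rightarrow \infty \).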
1.12 Proof of Proposition 9
The non-singularity of the matrix \(\varvec{i}_{\varvec{\Psi }\varvec{\Psi }}\) is just a special case of a more general result proven in Hallin and Paindaveine (2006). Thus, the Fisher information matrix \(\varvec{I}(\varvec{\xi },\varvec{\Psi }, \varvec{0})\) is nonsingular if
is nonsingular. Let \({\mathbb {E}} \left( \dfrac{\varvec{Z} \varvec{Z}^\top }{\sqrt{1+\varvec{Z}^\top \varvec{Z}}} \right) = \varvec{U}\) and \({\mathbb {E}} \left( \dfrac{\varvec{Z} \varvec{Z}^\top }{1+\varvec{Z}^\top \varvec{Z}}\right) = \varvec{V}\). Then, \(\varvec{I}_{\varvec{\xi }\varvec{\eta }}\) can be written as
Thus, we conclude that \(\varvec{I}(\varvec{\xi },\varvec{\Psi }, \varvec{0})\) is nonsingular iff the matrix \(\varvec{V} - \varvec{U}^2\) is nonsingular.
Let \(R=|\varvec{Z}|\), \(\varvec{W}=\varvec{Z}/R\) and \(\varvec{Z}_*=\varvec{Z}/\sqrt{1+R^2}.\) Since \(\varvec{Z}= R \varvec{W}\sim {\mathcal {N}}_p(\varvec{0},I _p)\), R and \(\varvec{W}\) are independent. Also, we know that \({\mathbb {E}}(\varvec{W})=\varvec{0}\) and \({\mathbb {V}}ar (\varvec{W})={\mathbb {E}}(\varvec{W} \varvec{W}^\top )=(1/p) I _p\). Now, note that \(\varvec{Z}_*=R_* \varvec{W}\), where \(R_*=R/\sqrt{1+R^2}\) is a function of R alone and hence independent of \(\varvec{W}\). Then, we have \(\varvec{U}={\mathbb {C}}ov (\varvec{Z},\varvec{Z}_*)={\mathbb {C}}ov (R \varvec{W},R_* \varvec{W})={\mathbb {E}}(RR_*){\mathbb {E}}(\varvec{W} \varvec{W}^\top )=(1/p){\mathbb {E}}(R^2/\sqrt{1+R^2})I _p\) and \(\varvec{V}={\mathbb {V}}ar (\varvec{Z}_*)={\mathbb {V}}ar (R_* \varvec{W})={\mathbb {E}}(R_*^2){\mathbb {E}}(\varvec{W} \varvec{W}^\top )=(1/p){\mathbb {E}}(R_*^2)I _p\). Hence, \(\varvec{V}- \varvec{U}^2\) is positive definite iff \(p{\mathbb {E}}(R_*^2)-\{{\mathbb {E}}(RR_*)\}^2>0\), i.e., iff \(p{\mathbb {E}}\{R^2/(1+R^2)\}-\{{\mathbb {E}}(R^2/\sqrt{1+R^2})\}^2>0\).
Now, by the Cauchy–Schwarz inequality, we get
Since equality in the previous inequality would require \(R_*\) to be almost surely proportional to R, which is impossible because \(R_*/R=1/\sqrt{1+R^2}\) is nondegenerate, we conclude that \(\varvec{V}- \varvec{U}^2\) is positive definite, and, consequently, \(\varvec{I}(\varvec{\xi },\varvec{\Psi }, \varvec{0})\) is also positive definite. \(\square \)
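The strict inequality \(p\,{\mathbb {E}}\{R^2/(1+R^2)\}>\{{\mathbb {E}}(R^2/\sqrt{1+R^2})\}^2\) is also easy to verify by simulation. This is a Monte Carlo sketch; the function name, sample size, and seed are our choices:

```python
import math
import random

def moment_gap(p, n=100_000, seed=1):
    """Monte Carlo estimate of p*E{R^2/(1+R^2)} - {E(R^2/sqrt(1+R^2))}^2,
    where R^2 = Z'Z with Z ~ N_p(0, I_p); the Cauchy-Schwarz argument in the
    proof implies this gap is strictly positive."""
    rng = random.Random(seed)
    m1 = 0.0  # running sum for E{R^2/(1+R^2)}
    m2 = 0.0  # running sum for E{R^2/sqrt(1+R^2)}
    for _ in range(n):
        r2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(p))
        m1 += r2 / (1.0 + r2)
        m2 += r2 / math.sqrt(1.0 + r2)
    m1 /= n
    m2 /= n
    return p * m1 - m2 * m2
```

For moderate dimensions such as \(p=2\) or \(p=3\), the estimated gap is comfortably positive, well beyond Monte Carlo noise.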
Cite this article
Mondal, S., Arellano-Valle, R.B. & Genton, M.G. A multivariate modified skew-normal distribution. Stat Papers 65, 511–555 (2024). https://doi.org/10.1007/s00362-023-01397-1