A multivariate modified skew-normal distribution

  • Regular Article
Statistical Papers

Abstract

We introduce a multivariate version of the modified skew-normal distribution, which contains the multivariate normal distribution as a special case. Unlike the Azzalini multivariate skew-normal distribution, this new distribution has a nonsingular Fisher information matrix when the skewness parameters are all zero, and its profile log-likelihood for the skewness parameters is always a non-monotonic function. We study some basic properties of the proposed family of distributions and present an expectation-maximization (EM) algorithm for parameter estimation, which we validate through simulation studies. Finally, we apply the proposed model to the univariate frontier data and to a trivariate wind speed data set, and compare its performance with that of the Azzalini skew-normal model.

References

  • Adcock CJ (2004) Capital asset pricing for UK stocks under the multivariate skew-normal distribution. In: Genton MG (ed) Skew elliptical distributions and their applications: a journey beyond normality. Chapman and Hall, London

  • Adcock CJ (2005) Exploiting skewness to build an optimal hedge fund with a currency overlay. Eur J Financ 11(5):445–462

  • Adcock CJ, Shutes K (2001) Portfolio selection based on the multivariate skew normal distribution. In: Skulimowski A (ed) Financial modelling. Progress & Business Publishers, Krakow, pp 167–177

  • Arellano-Valle RB, Azzalini A (2008) The centred parametrization for the multivariate skew-normal distribution. J Multivar Anal 99(7):1362–1382

  • Arellano-Valle RB, Genton MG (2010) An invariance property of quadratic forms in random vectors with a selection distribution, with application to sample variogram and covariogram estimators. Ann Inst Stat Math 62(2):363–381

  • Arellano-Valle RB, Gómez HW, Quintana FA (2004) A new class of skew-normal distributions. Commun Stat Theory Methods 33(7):1465–1480

  • Arellano-Valle RB, Bolfarine H, Lachos V (2005) Skew-normal linear mixed models. J Data Sci 3(4):415–438

  • Arellano-Valle RB, Contreras-Reyes JE, Stehlík M (2017) Generalized skew-normal negentropy and its application to fish condition factor time series. Entropy 19(10):528

  • Arrué J, Arellano-Valle RB, Gómez HW (2016) Bias reduction of maximum likelihood estimates for a modified skew-normal distribution. J Stat Comput Simul 86(15):2967–2984

  • Arrué J, Arellano-Valle RB, Gómez HW, Leiva V (2020) On a new type of Birnbaum–Saunders models and its inference and application to fatigue data. J Appl Stat 47(13–15):2690–2710

  • Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178

  • Azzalini A (2005) The skew-normal distribution and related multivariate families. Scand J Stat 32(2):159–188

  • Azzalini A, Arellano-Valle RB (2013) Maximum penalized likelihood estimation for skew-normal and skew-t distributions. J Stat Plan Inference 143(2):419–433

  • Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew normal distribution. J R Stat Soc Ser B Stat Methodol 61(3):579–602

  • Azzalini A, Capitanio A (2003) Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J R Stat Soc Ser B Stat Methodol 65(2):367–389

  • Azzalini A, Capitanio A (2014) The skew-normal and related families, vol 3. Cambridge University Press, Cambridge

  • Azzalini A, Dalla-Valle A (1996) The multivariate skew-normal distribution. Biometrika 83(4):715–726

  • Bayes CL, Branco MD (2007) Bayesian inference for the skewness parameter of the scalar skew-normal distribution. Braz J Probab Stat 21(2):141–163

  • Chiogna M (2005) A note on the asymptotic distribution of the maximum likelihood estimator for the scalar skew-normal distribution. Stat Methods Appl 14(3):331–341

  • Genton MG (2004) Skew-elliptical distributions and their applications: a journey beyond normality. CRC Press, Boca Raton

  • Genton MG, Loperfido NM (2005) Generalized skew-elliptical distributions and their quadratic forms. Ann Inst Stat Math 57(2):389–401

  • Ghosh P, Branco MD, Chakraborty H (2007) Bivariate random effect model using skew-normal distribution with application to HIV-RNA. Stat Med 26(6):1255–1267

  • Gómez HW, Venegas O, Bolfarine H (2007) Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 18(4):395–407

  • Hallin M, Paindaveine D (2006) Semiparametrically efficient rank-based inference for shape. I. Optimal rank-based tests for sphericity. Ann Stat 34(6):2707–2756

  • Henze N (1986) A probabilistic representation of the ‘skew-normal’ distribution. Scand J Stat 13(4):271–275

  • Jin L, Xu W, Zhu L, Zhu L (2016) Penalized maximum likelihood estimator for skew normal mixtures. arXiv:1608.01513

  • Lachos VH, Ghosh P, Arellano-Valle RB (2010) Likelihood based inference for skew-normal independent linear mixed models. Stat Sin 20(1):303–322

  • Ley C, Paindaveine D (2010) On Fisher information matrices and profile log-likelihood functions in generalized skew-elliptical models. Metron 68(3):235–250

  • Lin TI, Lee JC, Yen SY (2007) Finite mixture modelling using the skew normal distribution. Stat Sin 17(3):909–927

  • Magnus JR, Neudecker H (1979) The commutation matrix: some properties and applications. Ann Stat 7(2):381–394

  • Mardia KV (1970) Measures of multivariate skewness and kurtosis with applications. Biometrika 57(3):519–530

  • McNeil AJ, Frey R, Embrechts P (2015) Quantitative risk management: concepts, techniques and tools-revised edition. Princeton University Press, Princeton

  • Pewsey A (2000) Problems of inference for Azzalini’s skew-normal distribution. J Appl Stat 27(7):859–870

  • R Core Team (2021) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/

  • Rotnitzky A, Cox DR, Bottai M, Robins J (2000) Likelihood-based inference with singular information matrix. Bernoulli 6(2):243–284

  • Sartori N (2006) Bias prevention of maximum likelihood estimates for scalar skew normal and skew t distributions. J Stat Plan Inference 136(12):4259–4275

  • Yip CMA (2018) Statistical characteristics and mapping of near-surface and elevated wind resources in the middle east. Ph.D. Thesis, KAUST

Author information

Corresponding author

Correspondence to Sagnik Mondal.

Additional information

This research was supported by King Abdullah University of Science and Technology (KAUST).

Appendices

Appendix A

We discuss the benefits of using the alternative parameterization for defining the \(\mathcal{S}\mathcal{N}\) distribution compared to the \(\mathcal {ASN}\) distribution. We start with a preliminary stochastic representation of the \(\mathcal{S}\mathcal{N}\) distribution, from which we can derive most of its main basic properties.

Proposition 10

(Stochastic representation of \(\mathcal{S}\mathcal{N}\) distribution) If \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\), then \(\varvec{X} \,{\buildrel d \over =}\, \varvec{\xi }+T \varvec{\eta }+ \varvec{V}\), where T and \(\varvec{V}\) are independently distributed, with half-normal T denoted by \(T \sim \mathcal{H}\mathcal{N}(0,1)\), and \(\varvec{V} \sim {\mathcal {N}}_p(\varvec{0},\varvec{\Psi })\).

Proof

Let \(\tilde{\varvec{X}} = \varvec{\xi }+T \varvec{\eta }+ \varvec{V}\). The conditional mpdf of \(\tilde{\varvec{X}}\) given \(T = t\) is \( f_{\tilde{\varvec{X}}|T = t}(\varvec{x}) = \phi _p(\varvec{x};\varvec{\xi }+ t \varvec{\eta },\varvec{\Psi })\) for \(t>0 \), and T has marginal density \(f_T(t) = 2\phi (t)I_{(t>0)}\). Hence, the mpdf of \(\tilde{\varvec{X}}\) is

$$\begin{aligned} f_{\tilde{\varvec{X}}}( \varvec{x})&= \int _{0}^{\infty } f_{\tilde{\varvec{X}}|T=t}(\varvec{x}) f_T(t) d t = 2 \int _{0}^{\infty } \phi _p(\varvec{x};\varvec{\xi }+ t \varvec{\eta },\varvec{\Psi }) \phi (t)d t \\&= 2 \phi _p\left( \varvec{x}; \varvec{\xi }, \varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top \right) \int _{0}^{\infty }\phi \{ t; \varvec{\eta }^\top (\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top )^{-1}(\varvec{x} -\varvec{\xi }), \\&\quad 1 - \varvec{\eta }^\top (\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top )^{-1}\varvec{\eta }\} d t\\&= 2\phi _p(\varvec{x}; \varvec{\xi }, \varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top ) \Phi \Bigg \{ \dfrac{\varvec{\eta }^\top (\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top )^{-1}(\varvec{x} - \varvec{\xi })}{\sqrt{1 - \varvec{\eta }^\top (\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top )^{-1} \varvec{\eta }}} \Bigg \}, \quad \varvec{x} \in {\mathbb {R}}^p, \end{aligned} \tag{15}$$

where we have used the identity (see Lemma 2 in Arellano-Valle et al. (2005))

$$\begin{aligned} \phi _p(\varvec{x};\varvec{\xi }+ t \varvec{\eta },\varvec{\Psi }) \phi (t)&= \phi _p(\varvec{x}; \varvec{\xi }, \varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top ) \\&\quad \times \phi \{ t; \varvec{\eta }^\top (\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top )^{-1}(\varvec{x} -\varvec{\xi }), 1 - \varvec{\eta }^\top (\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top )^{-1}\varvec{\eta }\}. \end{aligned}$$

Finally, using the following result:

$$\begin{aligned} (\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top )^{-1}= & {} \varvec{\Psi }^{-1} - \dfrac{\varvec{\Psi }^{-1} \varvec{\eta }\varvec{\eta }^\top \varvec{\Psi }^{-1}}{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }} \Rightarrow \\ \dfrac{\varvec{\eta }^\top (\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top )^{-1}(\varvec{x} -\varvec{\xi }) }{\sqrt{1-\varvec{\eta }^\top (\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top )^{-1} \varvec{\eta }}}= & {} \dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1}(\varvec{x}- \varvec{\xi })}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }}}, \end{aligned}$$

it can be easily established that \(\tilde{\varvec{X}} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\). \(\square \)
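The stochastic representation and the density in (15) can be explored numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`skew_args`, `sn_pdf`, `sn_rvs`) and all parameter values are illustrative choices, not part of the paper. It evaluates the argument of \(\Phi \) in the two equivalent forms used in the proof, and checks in the univariate case that the density integrates to one.

```python
import math
import numpy as np


def std_normal_cdf(u):
    """Standard normal cdf Phi(u) for a scalar u."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def skew_args(x, xi, Psi, eta):
    """Argument of Phi in (15), computed in the two equivalent forms."""
    d = x - xi
    Omega = Psi + np.outer(eta, eta)
    w = np.linalg.solve(Omega, eta)      # (Psi + eta eta^T)^{-1} eta
    via_omega = (w @ d) / math.sqrt(1.0 - eta @ w)
    v = np.linalg.solve(Psi, eta)        # Psi^{-1} eta
    via_psi = (v @ d) / math.sqrt(1.0 + eta @ v)
    return via_omega, via_psi


def sn_pdf(x, xi, Psi, eta):
    """Density (15) of SN_p(xi, Psi, eta) at a single point x."""
    p = xi.size
    d = x - xi
    Omega = Psi + np.outer(eta, eta)
    quad = d @ np.linalg.solve(Omega, d)
    normal_part = math.exp(-0.5 * quad) / math.sqrt(
        (2.0 * math.pi) ** p * np.linalg.det(Omega)
    )
    return 2.0 * normal_part * std_normal_cdf(skew_args(x, xi, Psi, eta)[1])


def sn_rvs(n, xi, Psi, eta, rng):
    """Draw n vectors xi + T*eta + V, with T half-normal and V ~ N_p(0, Psi)."""
    T = np.abs(rng.standard_normal(n))
    V = rng.multivariate_normal(np.zeros(xi.size), Psi, size=n)
    return xi + T[:, None] * eta + V


# Illustrative (hypothetical) parameter values for p = 2.
xi = np.array([1.0, -2.0])
Psi = np.array([[1.0, 0.3], [0.3, 2.0]])
eta = np.array([1.5, -0.5])
arg_omega, arg_psi = skew_args(np.array([0.2, 0.7]), xi, Psi, eta)

# Univariate sanity check: trapezoidal integral of the density over a wide grid.
xi1, Psi1, eta1 = np.array([0.5]), np.array([[1.5]]), np.array([2.0])
grid = np.linspace(-15.0, 20.0, 4001)
dens = np.array([sn_pdf(np.array([g]), xi1, Psi1, eta1) for g in grid])
total_mass = float(np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid)))

sample = sn_rvs(5, xi, Psi, eta, np.random.default_rng(0))
```

Here `arg_omega` and `arg_psi` should agree up to rounding error, and `total_mass` should be numerically close to one.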

Further basic properties

As immediate consequences of the above stochastic representation of a random vector \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\), we have the following basic properties:

1) Expectation and covariance: The mean vector and covariance matrix of \(\varvec{X}\) are

$$\begin{aligned} {\mathbb {E}}(\varvec{X}) = \varvec{\xi }+ \sqrt{\frac{2}{\pi }} \varvec{\eta }\quad \text {and} \quad {\mathbb {V}}(\varvec{X}) = \varvec{\Psi }+ \left( 1-\frac{2}{\pi }\right) \varvec{\eta }\varvec{\eta }^\top . \end{aligned} \tag{16}$$
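The moments in (16) follow from \({\mathbb {E}}(T)=\sqrt{2/\pi }\) and \({\mathbb {V}}(T)=1-2/\pi \) for the half-normal variable T. A rough Monte Carlo check, sketched here under the assumption that NumPy is available and with purely illustrative parameter values, compares the sample mean and covariance of simulated draws with these formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) parameter values for p = 2.
xi = np.array([1.0, -2.0])
Psi = np.array([[1.0, 0.3], [0.3, 2.0]])
eta = np.array([1.5, -0.5])

# Simulate X = xi + T * eta + V with T half-normal and V ~ N_2(0, Psi).
n = 400_000
T = np.abs(rng.standard_normal(n))
V = rng.multivariate_normal(np.zeros(2), Psi, size=n)
X = xi + T[:, None] * eta + V

# Theoretical moments from (16).
mean_theory = xi + np.sqrt(2.0 / np.pi) * eta
cov_theory = Psi + (1.0 - 2.0 / np.pi) * np.outer(eta, eta)

# Largest absolute deviation between sample and theoretical moments.
mean_error = float(np.max(np.abs(X.mean(axis=0) - mean_theory)))
cov_error = float(np.max(np.abs(np.cov(X, rowvar=False) - cov_theory)))
```

Both errors should shrink at the usual \(n^{-1/2}\) Monte Carlo rate as the sample size grows.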

2) Distribution of an affine transformation: For any fixed vector \(\varvec{a} \in {\mathbb {R}}^q\) and any fixed matrix \(\varvec{B} \in {\mathbb {R}}^{q \times p}\) of full row rank, with \(q \le p\), we have \( \varvec{a} + \varvec{B} \varvec{X} \,{\buildrel d \over =}\, \varvec{a} + \varvec{B} \varvec{\xi }+ T \varvec{B} \varvec{\eta }+ \varvec{B} \varvec{V}\sim \mathcal{S}\mathcal{N}_q(\varvec{a} + \varvec{B} \varvec{\xi }, \varvec{B} \varvec{\Psi }\varvec{B}^\top , \varvec{B} \varvec{\eta })\), since T and \(\varvec{B} \varvec{V}\) are independently distributed, with \(T \sim \mathcal{H}\mathcal{N}(0,1)\) and \(\varvec{B} \varvec{V} \sim {\mathcal {N}}_q(\varvec{0},\varvec{B} \varvec{\Psi }\varvec{B}^\top )\), where \(\varvec{B} \varvec{\Psi }\varvec{B}^\top \) is nonsingular because \(\varvec{B}\) has full row rank.

3) Marginal distributions: Partition \(\varvec{X}\) into two subvectors of sizes \(p_1\) and \(p_2\) such that \(p_1+p_2 = p\), with corresponding partitions of the parameters in blocks of matching sizes, as follows

$$\begin{aligned} \varvec{X} = \begin{pmatrix} \varvec{X}_1\\ \varvec{X}_2 \end{pmatrix},\quad \varvec{\xi }= \begin{pmatrix} \varvec{\xi }_1 \\ \varvec{\xi }_2 \end{pmatrix},\quad \varvec{\Psi }= \begin{pmatrix} \varvec{\Psi }_{11} &{} \varvec{\Psi }_{12}\\ \varvec{\Psi }_{21} &{} \varvec{\Psi }_{22} \end{pmatrix},\quad \varvec{\eta }= \begin{pmatrix} \varvec{\eta }_1 \\ \varvec{\eta }_2 \end{pmatrix}. \end{aligned} \tag{17}$$

Thus, by using property 2) with \(\varvec{a}=\varvec{0}\) and \(\varvec{B} = (I _{p_1}, \varvec{0})\) for \(\varvec{X}_1\) and \(\varvec{B} = (\varvec{0},I _{p_2} )\) for \(\varvec{X}_2\) it follows for their respective marginals that \(\varvec{X}_1 \sim \mathcal{S}\mathcal{N}_{p_1} (\varvec{\xi }_1,\varvec{\Psi }_{11}, \varvec{\eta }_1)\) and \(\varvec{X}_2 \sim \mathcal{S}\mathcal{N}_{p_2} (\varvec{\xi }_2,\varvec{\Psi }_{22}, \varvec{\eta }_2)\).

4) Moment generating function: The multivariate moment generating function (mmgf) of the \(\mathcal{S}\mathcal{N}\) distribution can be derived in closed form. We present the mmgf of the \(\mathcal{S}\mathcal{N}\) distribution in the next proposition.

Proposition 11

The mmgf of \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) is

$$\begin{aligned} M_{\varvec{X}}(\varvec{t})&=2\exp \left( \varvec{t}^\top \varvec{\xi }+\frac{1}{2} \varvec{t}^\top \varvec{\Omega }\varvec{t}\right) \Phi (\varvec{\eta }^\top \varvec{t}),\quad \varvec{t} \in {\mathbb {R}}^p, \end{aligned}$$

where \(\varvec{\Omega }= \varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top \).

Proof

The mpdf of \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) in (15) can be rewritten as

$$\begin{aligned} f_{\varvec{X}} (\varvec{x}) = \dfrac{2}{|\varvec{\Omega }|^{1/2}} \phi _p\left\{ \varvec{\Omega }^{-1/2}(\varvec{x} -\varvec{\xi })\right\} \Phi \left\{ \varvec{\gamma }^\top \varvec{\Omega }^{-1/2}(\varvec{x}-\varvec{\xi }) \right\} , \quad \varvec{x}\in {\mathbb {R}}^p, \end{aligned} \tag{18}$$

where \(\phi _p(\varvec{z})=\phi _p(\varvec{z}; \varvec{0}, I _p)\), \(\varvec{\Omega }= \varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top \), \(\varvec{\gamma }= (1- \varvec{\eta }^\top \varvec{\Omega }^{-1} \varvec{\eta })^{-1/2}\varvec{\Omega }^{-1/2} \varvec{\eta }\), and, conversely, we have \(\varvec{\eta }=(1 + \varvec{\gamma }^\top \varvec{\gamma })^{-1/2}\varvec{\Omega }^{1/2}\varvec{\gamma }\) and \(\varvec{\Psi }= \varvec{\Omega }-(1+\varvec{\gamma }^\top \varvec{\gamma })^{-1}\varvec{\Omega }^{1/2} \varvec{\gamma }\varvec{\gamma }^\top \varvec{\Omega }^{1/2}\). From Eq. (18) and by using the change of variable \(\varvec{z}=\varvec{\Omega }^{-1/2}(\varvec{x}-\varvec{\xi })\) we have that the mmgf of \(\varvec{X}\), \(M_{\varvec{X}}(\varvec{t})={\mathbb {E}}\{\exp (\varvec{t}^\top \varvec{X}) \}\), is given by

$$\begin{aligned} M_{\varvec{X}}(\varvec{t})&=\int _{{\mathbb {R}}^p} \exp (\varvec{t}^\top \varvec{x}) 2\phi _p(\varvec{x};\varvec{\xi },\varvec{\Omega })\Phi \{\varvec{\gamma }^\top \varvec{\Omega }^{-1/2}(\varvec{x}-\varvec{\xi })\} \text {d} \varvec{x}\\&=2 \exp (\varvec{t}^\top \varvec{\xi }) \int _{{\mathbb {R}}^p} \exp (\varvec{s}^\top \varvec{z}) \phi _p(\varvec{z};\varvec{0},{\textbf {I}}_p)\Phi \{ \varvec{\gamma }^\top \varvec{z}\} \text {d} \varvec{z},\quad (\varvec{s}=\varvec{\Omega }^{1/2} \varvec{t})\\&=2 \exp (\varvec{t}^\top \varvec{\xi }+\frac{1}{2} \varvec{s}^\top \varvec{s}) \int _{{\mathbb {R}}^p}\phi _p(\varvec{z};\varvec{s},{\textbf {I}}_p)\Phi \{ \varvec{\gamma }^\top \varvec{z}\} \text {d} \varvec{z}\\&=2 \exp \left( \varvec{t}^\top \varvec{\xi }+\frac{1}{2} \varvec{s}^\top \varvec{s}\right) {\mathbb {E}}\{\Phi (\varvec{\gamma }^\top \varvec{Z})\},\quad \varvec{\gamma }^\top \varvec{Z}\sim {\mathcal {N}}_1(\varvec{\gamma }^\top \varvec{s},\varvec{\gamma }^\top \varvec{\gamma })\\&=2 \exp \left( \varvec{t}^\top \varvec{\xi }+\frac{1}{2} \varvec{s}^\top \varvec{s}\right) \Phi \left( \frac{\varvec{\gamma }^\top \varvec{s}}{\sqrt{1+ \varvec{\gamma }^\top \varvec{\gamma }}}\right) ,\\&=2\exp \left( \varvec{t}^\top \varvec{\xi }+\frac{1}{2} \varvec{t}^\top \varvec{\Omega }\varvec{t}\right) \Phi (\varvec{\eta }^\top \varvec{t}). \end{aligned}$$

\(\square \)
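As a sanity check of Proposition 11, the closed-form mmgf can be compared with a direct numerical integral of \(\exp (tx)f_X(x)\) in the univariate case. The sketch below assumes NumPy is available; the parameter values, the integration range and the grid size are illustrative assumptions.

```python
import math
import numpy as np


def Phi(u):
    """Standard normal cdf for a scalar u."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


# Illustrative (hypothetical) univariate parameters: location, Psi (a variance), skewness.
xi, psi, eta = 0.3, 0.8, -1.2
omega = psi + eta ** 2


def sn_pdf_1d(x):
    """Density (15) with p = 1."""
    normal_part = math.exp(-0.5 * (x - xi) ** 2 / omega) / math.sqrt(2.0 * math.pi * omega)
    return 2.0 * normal_part * Phi(eta * (x - xi) / (psi * math.sqrt(1.0 + eta ** 2 / psi)))


def mgf_closed_form(t):
    """Closed form from Proposition 11 with p = 1."""
    return 2.0 * math.exp(t * xi + 0.5 * omega * t * t) * Phi(eta * t)


def mgf_numeric(t, lo=-30.0, hi=30.0, m=6001):
    """Trapezoidal approximation of E{exp(t X)} on [lo, hi]."""
    grid = np.linspace(lo, hi, m)
    vals = np.array([math.exp(t * x) * sn_pdf_1d(x) for x in grid])
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid)))


t_values = (-0.7, 0.0, 0.7)
rel_errors = [
    abs(mgf_numeric(t) - mgf_closed_form(t)) / mgf_closed_form(t) for t in t_values
]
```

The relative errors in `rel_errors` should be negligible for moderate values of t. For large |t| the integration range would need to be widened.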

5) Cumulative distribution function: In the next proposition we present the exact functional form of the multivariate cumulative distribution function (mcdf) of the \(\mathcal{S}\mathcal{N}\) distribution.

Proposition 12

The mcdf of \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) is

$$\begin{aligned} F_{\varvec{X}}(\varvec{x}) = 2 \Phi _{p+1}(\varvec{x}_*;\varvec{\xi }_*,\varvec{\Omega }_{*}), \quad \text {with} \quad \varvec{x}_* = (\varvec{x}^\top ,0)^\top , \quad \varvec{x} \in {\mathbb {R}}^p, \end{aligned}$$

where \(\varvec{\xi }_* = (\varvec{\xi }^\top ,0)^\top \) and \(\varvec{\Omega }_* = \begin{pmatrix}\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top &{} -\varvec{\eta }\\ -\varvec{\eta }^\top &{} 1 \end{pmatrix}\).

Proof

The mpdf of \(\varvec{X}\) is

$$\begin{aligned} f_{\varvec{X}}(\varvec{z})&= 2 \phi _p (\varvec{z};\varvec{\xi },\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top ) \Phi \left\{ \dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{z} - \varvec{\xi })}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }}}\right\} \\&= 2 \phi _p (\varvec{z};\varvec{\xi },\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top ) \int _{-\infty }^{\frac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{z} - \varvec{\xi })}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }}}} \phi (u;0,1) \text {d} u\\&=2 \phi _p (\varvec{z};\varvec{\xi },\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top ) \int _{-\infty }^{0} \phi \left( z_0+ \dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{z} - \varvec{\xi })}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }}};0,1\right) \text {d} z_0\\&=2 \phi _p (\varvec{z};\varvec{\xi },\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top ) \int _{-\infty }^{0} \phi \left( z_0;- \dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{z} - \varvec{\xi })}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }}},1\right) \text {d} z_0 \end{aligned}$$

where we use the change of variable \(z_0 = u - \varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{z} - \varvec{\xi })/\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }}\). Now, from the marginal-conditional factorization of a \((p+1)\)-variate normal mpdf we can write

$$\begin{aligned} \phi _p(\varvec{z};\varvec{\xi },\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top ) \phi \left( z_0;- \dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{z} - \varvec{\xi })}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }}},1\right) = \phi _{p+1}(\varvec{z}_*;\varvec{\xi }_*,\varvec{\Omega }_{**}), \end{aligned}$$

where \(\varvec{z}_* = (\varvec{z}^\top ,z_0)^\top \) and \(\varvec{\Omega }_{**} = \begin{pmatrix}\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top &{} - \sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }} \varvec{\eta }\\ - \sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }} \varvec{\eta }^\top &{} 1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }\end{pmatrix}\). Moreover, we have \(\varvec{\Omega }_{**} = \varvec{D}_{*} \varvec{\Omega }_{*} \varvec{D}_*\), with \(\varvec{D}_* = \text {diag} \left( {\textbf {I}}_p,\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }} \right) \). Then, the mcdf of \(\varvec{X}\) is

$$\begin{aligned} F_{\varvec{X}} (\varvec{x})&= 2 \int _{(- \varvec{\infty }, \varvec{x}]} \phi _p (\varvec{z};\varvec{\xi },\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top ) \Phi \left\{ \dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{z} - \varvec{\xi })}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }}}\right\} \text {d} \varvec{z}\\&= 2 \int _{(- \varvec{\infty }, \varvec{x}]} \int _{-\infty }^0 \phi _p (\varvec{z};\varvec{\xi },\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top ) \phi \left( z_0; -\dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{z} - \varvec{\xi })}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1}\varvec{\eta }}} ,1\right) \text {d}z_0 \text {d} \varvec{z}\\&= 2 \int _{(- \varvec{\infty }, \varvec{x}_*]} \phi _{p+1}(\varvec{z}_*;\varvec{\xi }_*,\varvec{\Omega }_{**}) \text {d} \varvec{z}_*\\&= 2 \Phi _{p+1}(\varvec{x}_*;\varvec{\xi }_*,\varvec{\Omega }_{**}),\quad \text {with} \quad \varvec{x}_* = (\varvec{x}^\top ,0)^\top , \quad \varvec{x} \in {\mathbb {R}}^p. \end{aligned}$$

Since the last component of \(\varvec{x}_* - \varvec{\xi }_* = ((\varvec{x} - \varvec{\xi })^\top , 0)^\top \) is zero, we have \(\varvec{D}_*^{-1} (\varvec{x}_* - \varvec{\xi }_*) = \varvec{x}_* - \varvec{\xi }_*\), and so

$$\begin{aligned} \Phi _{p+1} (\varvec{x}_*;\varvec{\xi }_*,\varvec{\Omega }_{**})= & {} \Phi _{p+1} (\varvec{x}_*;\varvec{\xi }_*,\varvec{D}_* \varvec{\Omega }_{*} \varvec{D}_*) = \Phi _{p+1}(\varvec{D}_*^{-1}(\varvec{x}_* - \varvec{\xi }_*);\varvec{0}, \varvec{\Omega }_*)\\= & {} \Phi _{p+1}(\varvec{x}_*;\varvec{\xi }_*, \varvec{\Omega }_*). \end{aligned}$$

\(\square \)
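Proposition 12 can be illustrated by simulation. The sketch below, which assumes NumPy and uses illustrative parameter values and an arbitrary evaluation point, estimates the left-hand side from draws of the stochastic representation and the right-hand side from draws of the \((p+1)\)-variate normal distribution with covariance \(\varvec{\Omega }_*\). Both sides are Monte Carlo estimates here, so agreement is only up to simulation error; a numerical multivariate normal cdf routine could replace the second estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (hypothetical) parameter values and evaluation point for p = 2.
xi = np.array([1.0, -2.0])
Psi = np.array([[1.0, 0.3], [0.3, 2.0]])
eta = np.array([1.5, -0.5])
x = np.array([2.0, -1.5])
n = 500_000

# Left-hand side: empirical cdf of X = xi + T * eta + V at x.
T = np.abs(rng.standard_normal(n))
V = rng.multivariate_normal(np.zeros(2), Psi, size=n)
X = xi + T[:, None] * eta + V
cdf_direct = float(np.mean(np.all(X <= x, axis=1)))

# Right-hand side: 2 * Phi_{p+1}(x_*; xi_*, Omega_*), estimated by simulation.
xi_star = np.append(xi, 0.0)
x_star = np.append(x, 0.0)
Omega_star = np.block([
    [Psi + np.outer(eta, eta), -eta[:, None]],
    [-eta[None, :], np.ones((1, 1))],
])
Z = rng.multivariate_normal(xi_star, Omega_star, size=n)
cdf_formula = 2.0 * float(np.mean(np.all(Z <= x_star, axis=1)))

cdf_gap = abs(cdf_direct - cdf_formula)
```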

Behavior of the likelihood function

In the following two results, we show that the alternative parameterization used in (3) for defining the \(\mathcal{S}\mathcal{N}\) mpdf fixes the problem of the infinite maximum likelihood estimate of the skewness parameter, unlike the original parameterization used in (2) for defining the \(\mathcal {ASN}\) mpdf, but fails to resolve the problem of the singular Fisher information matrix. Let \(\varvec{x}_1,\ldots ,\varvec{x}_n\) be an observed random sample from \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\). The corresponding likelihood function for \((\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) is

$$\begin{aligned} L(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta }) = \prod _{i=1}^n 2 \phi _p\left( \varvec{x}_i;\varvec{\xi },\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top \right) \Phi \bigg \{ \dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{x}_i - \varvec{\xi })}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }}}\bigg \}. \end{aligned} \tag{19}$$

For fixed \(\varvec{\xi }\) and \(\varvec{\Psi }\), (19) becomes the profile likelihood function of \(\varvec{\eta }\), which we denote by \(L(\varvec{\eta })\), \(\varvec{\eta }\in {\mathbb {R}}^p\).

Proposition 13

The maximum likelihood estimator of the skewness parameter \(\varvec{\eta }\) of the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) family is always finite.

Proof

We find the limit of \(L(\varvec{\eta })\) when some (or all) components of \(\varvec{\eta }\) tend to \(+\infty \) or \(-\infty \). For this, first note from the Cauchy–Schwarz inequality that

$$\begin{aligned}&\dfrac{ |\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{x}_i - \varvec{\xi }) |}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }} } \\ {}&\le \sqrt{\dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }}{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }}}\,\sqrt{(\varvec{x}_i-\varvec{\xi })^\top \varvec{\Psi }^{-1}(\varvec{x}_i-\varvec{\xi })} \le \sqrt{(\varvec{x}_i-\varvec{\xi })^\top \varvec{\Psi }^{-1}(\varvec{x}_i-\varvec{\xi })},\quad \forall \,\varvec{\eta }\in {\mathbb {R}}^p, \end{aligned}$$

for \(i=1,\ldots ,n\). From this inequality, it clearly follows, for all \(\varvec{\eta }\in {\mathbb {R}}^p\) and each \(i=1,\ldots ,n\), that

$$\begin{aligned} \phi _p (\varvec{x}_i;\varvec{\xi },\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top )&= \dfrac{\exp \bigg \{ -\dfrac{1}{2} (\varvec{x}_i -\varvec{\xi })^\top (\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top )^{-1} (\varvec{x}_i- \varvec{\xi }) \bigg \}}{(2 \pi )^{p/2} |\varvec{\Psi }|^{1/2}\sqrt{1 + \varvec{\eta }^ \top \varvec{\Psi }^{-1} \varvec{\eta }}} \\&= \dfrac{\exp \bigg \{ -\dfrac{1}{2} (\varvec{x}_i - \varvec{\xi })^\top \varvec{\Psi }^{-1} (\varvec{x}_i -\varvec{\xi }) \bigg \}}{(2 \pi )^{p/2} |\varvec{\Psi }|^{1/2}}\,\,\\&\quad \times \dfrac{\exp \bigg [ \dfrac{1}{2} \dfrac{ \{ \varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{x}_i - \varvec{\xi })\}^2}{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }} \bigg ]}{\sqrt{1 + \varvec{\eta }^ \top \varvec{\Psi }^{-1} \varvec{\eta }}}\\&\le \dfrac{1}{(2 \pi )^{p/2} |\varvec{\Psi }|^{1/2}\sqrt{1 + \varvec{\eta }^ \top \varvec{\Psi }^{-1} \varvec{\eta }}}, \end{aligned}$$

and

$$\begin{aligned} 0\le \Phi \bigg \{-\sqrt{(\varvec{x}_i - \varvec{\xi })^\top \varvec{\Psi }^{-1} (\varvec{x}_i -\varvec{\xi })} \bigg \} \le \Phi \bigg \{ \dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{x}_i - \varvec{\xi })}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }}}\bigg \}\le \Phi \bigg \{\sqrt{(\varvec{x}_i - \varvec{\xi })^\top \varvec{\Psi }^{-1} (\varvec{x}_i -\varvec{\xi })} \bigg \} \le 1. \end{aligned}$$

These results hold for any fixed value of \((\varvec{\xi },\varvec{\Psi })\). Whenever some (or all) components of \(\varvec{\eta }\) tend to \(\pm \infty \), we have \(\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }\rightarrow \infty \), so the first of the previous inequalities gives \(\phi _p (\varvec{x}_i;\varvec{\xi },\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top ) \rightarrow 0\) for each \(i=1,\ldots ,n\). Taking into account the second inequality, which bounds each \(\Phi \) factor by 1, we find that \(L(\varvec{\eta }) \rightarrow 0\) whenever some (or all) components of \(\varvec{\eta }\) tend to \(\pm \infty \), for any fixed value of \((\varvec{\xi },\varvec{\Psi })\). Since \(L(\varvec{\eta })\) is continuous and strictly positive on \({\mathbb {R}}^p\), it cannot be a monotonically increasing or decreasing function of any of the components of \(\varvec{\eta }\), and it attains its maximum at a finite point. This means that the profile likelihood of the skewness parameter \(\varvec{\eta }\) is always maximized at a finite point for the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\) family. \(\square \)
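The decay of \(L(\varvec{\eta })\) at infinity can be visualized in the univariate case. The following sketch, assuming NumPy and using simulated data with illustrative parameter values, evaluates the logarithm of (19) on a grid of \(\eta \) values with \((\xi ,\Psi )\) held fixed and locates the maximizer on that grid. The grid range and the sample size are arbitrary choices.

```python
import math
import numpy as np


def log_Phi(u):
    """Log of the standard normal cdf for a scalar u (erfc keeps the lower tail accurate)."""
    return math.log(0.5 * math.erfc(-u / math.sqrt(2.0)))


def log_lik_eta(eta, x, xi, psi):
    """Logarithm of L(eta) in (19) for p = 1, with (xi, psi) held fixed."""
    omega = psi + eta ** 2
    d = x - xi
    log_phi = -0.5 * np.log(2.0 * np.pi * omega) - 0.5 * d ** 2 / omega
    u = eta * d / (psi * math.sqrt(1.0 + eta ** 2 / psi))
    return float(np.sum(math.log(2.0) + log_phi) + sum(log_Phi(v) for v in u))


# Simulated data from the stochastic representation (hypothetical parameter values).
rng = np.random.default_rng(2)
xi, psi, eta_true, n = 0.0, 1.0, 2.0, 50
x = xi + np.abs(rng.standard_normal(n)) * eta_true + math.sqrt(psi) * rng.standard_normal(n)

# Evaluate the log-likelihood on a wide grid and locate its maximizer.
eta_grid = np.linspace(-50.0, 50.0, 2001)
loglik = np.array([log_lik_eta(e, x, xi, psi) for e in eta_grid])
k_max = int(np.argmax(loglik))
eta_hat = float(eta_grid[k_max])
```

On such a grid the maximizer is expected to fall strictly inside the range, with the log-likelihood decreasing toward both ends, in line with Proposition 13.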

Remark

Proposition 13 shows that the MLE of the skewness parameter of the \(\mathcal{S}\mathcal{N}\) distribution is always finite under the new parameterization. This is not the case for the \(\mathcal {ASN}\) parameterization. Thus, for a given data set, the new parameterization always yields a finite MLE of the skewness parameter, whereas the \(\mathcal {ASN}\) parameterization may yield an infinite one. Because the two parameterizations are one-to-one transformations of each other, the invariance property of the MLE implies that transforming the MLE from the new parameterization back to the \(\mathcal {ASN}\) parameterization recovers the old results. In other words, although the MLE of the skewness parameter \(\varvec{\eta }\) is always finite, it may sometimes correspond to an MLE of the skewness parameter \(\varvec{\alpha }\) that is infinite.

In what follows, we use the notation of Magnus and Neudecker (1979) related to the Kronecker product and matrix vectorization. For instance, let \(\text{ vech }(\varvec{\Psi })\) be the \(p(p + 1)/2\)-subvector of \(\text{ vec }(\varvec{\Psi })\), where only upper-diagonal entries of \(\varvec{\Psi }\) are considered. Also, let \(\varvec{K}_p\) be the \(p^2 \times p^2\) commutation matrix, i.e., \(\varvec{K}_p \text{ vec }(\varvec{A})=\text{ vec }(\varvec{A}^\top )\) for any \(p\times p\) matrix \(\varvec{A}\), and let \(\varvec{D}_p\) be the \(p^2\times p(p+1)/2\) duplication matrix, i.e., \(\varvec{D}_p \text{ vech }(\varvec{A}) = \text{ vec }(\varvec{A})\) for any \(p\times p\) symmetric matrix \(\varvec{A}\).
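The vectorization operators above can be made concrete with a small construction. This is a minimal sketch assuming NumPy; the helper names are hypothetical, vec stacks columns, and the ordering of the vech entries (upper-triangular entries taken column by column) is an assumption made for illustration.

```python
import numpy as np


def vec(A):
    """Stack the columns of A into one vector."""
    return np.asarray(A).reshape(-1, order="F")


def vech_index_pairs(p):
    """Index pairs (i, j) with i <= j, ordered column by column."""
    return [(i, j) for j in range(p) for i in range(j + 1)]


def vech(A):
    """Upper-triangular entries of a square matrix, in the order above."""
    A = np.asarray(A)
    return np.array([A[i, j] for i, j in vech_index_pairs(A.shape[0])])


def duplication_matrix(p):
    """D_p such that D_p vech(A) = vec(A) for symmetric p x p A."""
    pairs = vech_index_pairs(p)
    D = np.zeros((p * p, len(pairs)))
    for k, (i, j) in enumerate(pairs):
        D[j * p + i, k] = 1.0  # position of A[i, j] in vec(A)
        D[i * p + j, k] = 1.0  # position of A[j, i] in vec(A)
    return D


def commutation_matrix(p):
    """K_p such that K_p vec(A) = vec(A^T) for p x p A."""
    K = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            # vec(A^T)[j*p + i] = A[j, i] = vec(A)[i*p + j]
            K[j * p + i, i * p + j] = 1.0
    return K


p = 3
rng = np.random.default_rng(3)
B = rng.standard_normal((p, p))   # arbitrary square matrix
A = B + B.T                       # symmetric matrix
D = duplication_matrix(p)
K = commutation_matrix(p)
D_plus = np.linalg.solve(D.T @ D, D.T)  # (D^T D)^{-1} D^T
```

With these definitions, `D @ vech(A)` reproduces `vec(A)`, `K @ vec(B)` reproduces `vec(B.T)`, and `D_plus @ vec(A)` recovers `vech(A)`.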

Proposition 14

The Fisher information matrix of the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) family is singular when the skewness parameter \(\varvec{\eta }\) is set to zero.

Proof

The score vector and Fisher information matrix for the \(\mathcal {ASN}_p(\varvec{\xi }, \varvec{\Omega }, \varvec{\alpha })\) family are derived by Arellano-Valle and Azzalini (2008) in terms of the reparametrization \((\varvec{\xi },\varvec{\Omega },\varvec{\lambda })\), where \(\varvec{\lambda }=\varvec{\omega }^{-1}\varvec{\alpha }\). These authors also showed that the Fisher information matrix \(\varvec{I}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })\) is singular at \(\varvec{\lambda }=\varvec{0}\):

$$\begin{aligned} \varvec{I}(\varvec{\xi },\varvec{\Omega },\varvec{0})= \begin{bmatrix}\varvec{\Omega }^{-1}&{}\varvec{0}&{}\sqrt{\frac{2}{\pi }}I _p\\ \varvec{0}&{}\frac{1}{2}\varvec{D}_p^\top (\varvec{\Omega }^{-1}\otimes \varvec{\Omega }^{-1})\varvec{D}_p&{}\varvec{0}\\ \sqrt{\frac{2}{\pi }}I _p&{}\varvec{0}&{}\frac{2}{\pi }\varvec{\Omega }\end{bmatrix}. \end{aligned}$$

Since the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) family corresponds to a reparametrization of the \(\mathcal {ASN}_p(\varvec{\xi }, \varvec{\Omega }, \varvec{\alpha })\) family, its Fisher information matrix becomes \(\varvec{I}(\varvec{\xi },\varvec{\Psi },\varvec{\eta })=\varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })^\top \varvec{I}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })\varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })\), where \(\varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })\) denotes the Jacobian matrix of the transformation from \((\varvec{\xi },\text {vech}(\varvec{\Omega }),\varvec{\lambda })\) to \((\varvec{\xi },\text {vech}(\varvec{\Psi }),\varvec{\eta })\). Since the inverse transformation from \((\varvec{\xi },\text {vech}(\varvec{\Psi }),\varvec{\eta })\) to \((\varvec{\xi },\text {vech}(\varvec{\Omega }),\varvec{\lambda })\) is given by \(\varvec{\xi }=\varvec{\xi },\) \(\varvec{\Omega }=\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^{\top }\) and \(\varvec{\lambda }= (1 + \varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta })^{-1/2}\varvec{\Psi }^{-1} \varvec{\eta },\) for the Jacobian matrix we have

$$\begin{aligned} \varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{\lambda })=\begin{bmatrix}\frac{\partial \varvec{\xi }}{\partial \varvec{\xi }^\top }&{} \frac{\partial vec (\varvec{\Omega })}{\partial \varvec{\xi }^\top }&{} \frac{\partial \varvec{\lambda }}{\partial \varvec{\xi }^\top }\\ \frac{\partial \varvec{\xi }}{\partial vec (\varvec{\Psi })^\top } &{} \frac{\partial vec (\varvec{\Omega })}{\partial vec (\varvec{\Psi })^\top } &{} \frac{\partial \varvec{\lambda }}{\partial vec (\varvec{\Psi })^\top }\\ \frac{\partial \varvec{\xi }}{\partial \varvec{\eta }^\top } &{} \frac{\partial vec (\varvec{\Omega })}{\partial \varvec{\eta }^\top } &{}\frac{\partial \varvec{\lambda }}{\partial \varvec{\eta }^\top } \end{bmatrix}=\begin{bmatrix}I _p &{} \varvec{0} &{} \varvec{0}\\ \varvec{0} &{} I _{p(p+1)/2} &{} \varvec{J}_{23}\\ \varvec{0} &{} \varvec{J}_{32} &{} \varvec{J}_{33}\end{bmatrix}, \end{aligned}$$

where \(\varvec{J}_{23}=\varvec{D}_p^+(I _p\otimes \varvec{\eta }+\varvec{\eta }\otimes I _p)\), with \(\varvec{D}_p^+=(\varvec{D}_p^\top \varvec{D}_p)^{-1}\varvec{D}_p^\top \), \(\varvec{J}_{32}=(1 + \varvec{\eta }^{\top }\varvec{\Psi }^{-1} \varvec{\eta })^{-1/2}\{\frac{1}{2}(\varvec{\eta }^{\top }\varvec{\Psi }^{-1}\otimes \varvec{\Psi }^{-1}\varvec{\eta }\varvec{\eta }^{\top }\varvec{\Psi }^{-1})-(\varvec{\eta }^{\top }\varvec{\Psi }^{-1}\otimes \varvec{\Psi }^{-1})\}\varvec{D}_p\) and \(\varvec{J}_{33}=(1 + \varvec{\eta }^{\top }\varvec{\Psi }^{-1} \varvec{\eta })^{-1/2}(\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^{\top })^{-1}\). When \(\varvec{\eta }=\varvec{0}\), we have \(\varvec{\lambda }=\varvec{0}\) and \(\varvec{\Omega }=\varvec{\Psi }\), and the Jacobian matrix reduces to \(\varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{0})=diag (I _p,I _{p(p+1)/2},\varvec{\Psi }^{-1})\). Hence, at \(\varvec{\eta }=\varvec{0}\), the Fisher information matrix \(\varvec{I}(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) of the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) family becomes

$$\begin{aligned} \varvec{I}(\varvec{\xi },\varvec{\Psi },\varvec{0})=\varvec{I}(\varvec{\xi },\varvec{\Omega },\varvec{0})= \begin{bmatrix}\varvec{\Psi }^{-1}&{}\varvec{0}&{}\sqrt{\frac{2}{\pi }}\varvec{\Psi }^{-1}\\ \varvec{0}&{}\frac{1}{2}\varvec{D}_p^\top (\varvec{\Psi }^{-1}\otimes \varvec{\Psi }^{-1})\varvec{D}_p&{}\varvec{0}\\ \sqrt{\frac{2}{\pi }}\varvec{\Psi }^{-1}&{}\varvec{0}&{}\frac{2}{\pi }\varvec{\Psi }^{-1} \end{bmatrix}, \end{aligned}$$

which is singular, since its third block row equals \(\sqrt{2/\pi }\) times its first block row. \(\square \)

Remark. Since \(\varvec{I}(\varvec{\xi },\varvec{\Omega },\varvec{0})\) is singular, to prove that \(\varvec{I}(\varvec{\xi },\varvec{\Psi },\varvec{0})\) is also singular it suffices to show that the Jacobian matrix \(\varvec{J}(\varvec{\xi },\varvec{\Omega },\varvec{0})\) is finite (in the matrix sense).

Obviously, the singularity of the Fisher information matrix of the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\) family when \(\varvec{\eta }= \varvec{0}\) is due to the fact that the score vectors corresponding to the location vector \(\varvec{\xi }\) and the skewness vector \(\varvec{\eta }\) are linearly dependent at \(\varvec{\eta }= \varvec{0}\). In fact, the score vector of \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\) for \((\varvec{\xi }^\top ,\text{ vech }(\varvec{\Psi })^\top ,\varvec{\eta }^\top )^\top \), at \(\varvec{\eta }= \varvec{0}\), becomes

$$\begin{aligned} \varvec{l}_{\varvec{\xi },\varvec{\Psi },\varvec{0}}(\varvec{X})&= \left( \varvec{\Psi }^{-1} (\varvec{X}- \varvec{\xi }), \dfrac{1}{2} \varvec{D}_p^\top (\varvec{\Psi }\otimes \varvec{\Psi })^{-1} \text{ vec }\left\{ (\varvec{X}- \varvec{\xi })(\varvec{X}- \varvec{\xi })^\top - \varvec{\Psi }\right\} , \right. \nonumber \\&\left. \sqrt{\dfrac{2}{\pi }} \varvec{\Psi }^{-1}(\varvec{X}-\varvec{\xi })\right) ^\top . \end{aligned}$$
(20)

It is evident from (20) that, at \(\varvec{\eta }= \varvec{0}\), the score vectors of \(\varvec{\xi }\) and \(\varvec{\eta }\) are linearly related. Consequently, the Fisher information matrix, which is the covariance matrix of the score vector, is singular at \(\varvec{\eta }= \varvec{0}\). The score vector in (20) can be obtained from a direct differentiation of the \(\mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\) log-likelihood function, or from the results in Arellano-Valle and Azzalini (2008) as well as from Ley and Paindaveine (2010) since the \(\mathcal{S}\mathcal{N}\) family belongs to the generalized skew-normal family (see Genton and Loperfido (2005)). This last fact can be easily verified by the form of the mpdf of \(\varvec{X} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) in Equation (18), and from there it is clear that \(\mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) is a generalized skew-normal distribution with location parameter \(\varvec{\xi }\), dispersion matrix \(\varvec{\Omega }\), density generator \(\phi _p (\varvec{z})\), and skewing function \(\pi (\varvec{z}): {\mathbb {R}}^p \rightarrow [0,1]\) with \(\pi (\varvec{z}) = \Phi \left( \varvec{\gamma }^\top \varvec{z} \right) .\)

Appendix B

1.1 Proof of Proposition 1

Let \(\tilde{\varvec{X}} = \varvec{\xi }+ \dfrac{1}{\sqrt{1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y}}}\,\varvec{\Omega }\varvec{Y}\,T + \varvec{U}\). Since, by assumption, T is independent of \((\varvec{U}^\top ,\varvec{Y}^\top )^\top \), it is then clear that, conditionally on \(\varvec{Y} = \varvec{y}\), \(\tilde{\varvec{X}}\) has the same distribution as \(\varvec{\xi }+ \dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{\Omega }\varvec{y}\,T + \varvec{U}_{\varvec{y}}\), where \(\varvec{U}_{\varvec{y}} \,{\buildrel d \over =}\, \varvec{U}| \varvec{Y} = \varvec{y} \sim {\mathcal {N}}_p \left( \varvec{0}, (\varvec{\Omega }^{-1} + \varvec{y} \varvec{y} ^\top )^{-1} \right) \), independent of T. By Proposition 10, this means \(\tilde{\varvec{X}} | \varvec{Y} = \varvec{y} \sim \mathcal{S}\mathcal{N}_p \left( \varvec{\xi }, (\varvec{\Omega }^{-1} + \varvec{y} \varvec{y}^\top )^{-1},\dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{\Omega }\varvec{y} \right) \), and hence, by Proposition 10, the mpdf of \(\tilde{\varvec{X}}| \varvec{Y} = \varvec{y}\) becomes \(f_{\tilde{\varvec{X}} | \varvec{Y} = \varvec{y}}(\varvec{x}) = 2 \phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega }) \Phi \{ \varvec{y}^\top (\varvec{x}- \varvec{\xi }) \}.\) Thus, since the mpdf of \(\tilde{\varvec{X}}\) is \(f_{\tilde{\varvec{X}}}(\varvec{x}) = \int _{{\mathbb {R}}^p} f_{\tilde{\varvec{X}} | \varvec{Y}}(\varvec{x}|\varvec{y}) f_{\varvec{Y}} (\varvec{y}) d \varvec{y}\), we have

$$\begin{aligned} f_{\tilde{\varvec{X}}}(\varvec{x})&=2 \phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega }) \int _{{\mathbb {R}}^p} \Phi \{ \varvec{y}^\top (\varvec{x}- \varvec{\xi }) \} \phi _p\left( \varvec{y}; \varvec{\lambda },\varvec{\Omega }^{-1} \right) d \varvec{y}\\&=2 \phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega }) {\mathbb {E}}[\Phi \{ \varvec{Y}^\top (\varvec{x}- \varvec{\xi }) \} ]. \end{aligned}$$

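Now, since \(\varvec{Y}^\top (\varvec{x}-\varvec{\xi }) \sim {\mathcal {N}}\{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi }), (\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1}(\varvec{x}-\varvec{\xi })\}\), and using the fact that, if \(X \sim {\mathcal {N}}(\mu , \sigma ^2)\), then \({\mathbb {E}} \{ \Phi (X) \}=\Phi \left( \dfrac{\mu }{\sqrt{1+\sigma ^2}}\right) \), we obtain the stated result. \(\square \)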
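The identity \({\mathbb {E}} \{ \Phi (X) \}=\Phi (\mu /\sqrt{1+\sigma ^2})\) used in this last step can be checked numerically. The sketch below uses only the Python standard library and a plain trapezoidal rule; the function names and the values of `mu` and `sigma2` are arbitrary choices for illustration.

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def expected_Phi(mu, sigma2, n=20001):
    """Trapezoidal approximation of E{Phi(X)} for X ~ N(mu, sigma2)."""
    sd = math.sqrt(sigma2)
    lo, hi = mu - 10.0 * sd, mu + 10.0 * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * Phi(x) * normal_pdf(x, mu, sigma2)
    return total * h

mu, sigma2 = 0.7, 2.5  # arbitrary illustrative values
numeric = expected_Phi(mu, sigma2)
closed_form = Phi(mu / math.sqrt(1.0 + sigma2))
print(numeric, closed_form)
```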

1.2 Proof of Corollary 1

Since, by assumption, \(\varvec{X} \sim \mathcal {MSN}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\), we have by Proposition 1 that \(\varvec{X}\) can be represented as \({\varvec{X}} | \varvec{Y} = \varvec{y} \sim \mathcal{S}\mathcal{N}_p \left( \varvec{\xi }, (\varvec{\Omega }^{-1} + \varvec{y} \varvec{y}^\top )^{-1},\dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{\Omega }\varvec{y} \right) \) and \(\varvec{Y} \sim {\mathcal {N}}_p(\varvec{\lambda },\varvec{\Omega }^{-1})\). Combining this statement with the stochastic representation of the \(\mathcal{S}\mathcal{N}\) distribution as given in Proposition 10, the mpdf of \(\varvec{X}\) can be expressed as

$$\begin{aligned} f_{\varvec{X}} (\varvec{x})&= 2 \displaystyle {\int }_{{\mathbb {R}}^p} \displaystyle {\int }_{0} ^{\infty } \phi _p \left( \varvec{x};\varvec{\xi }+ \dfrac{1}{\sqrt{1+\varvec{y}^\top {\varvec{\Omega }} \varvec{y}}}\,\varvec{\Omega }\varvec{y}\, t, (\varvec{\Omega }^{-1} + \varvec{y} \varvec{y}^\top )^{-1} \right) \\&\times \phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi (t) d t d \varvec{y}. \end{aligned}$$

Now, using Lemma 2 in Arellano-Valle et al. (2005) and that \((\varvec{\Omega }^{-1} + \varvec{y} \varvec{y}^\top )^{-1} = \varvec{\Omega }- \dfrac{1}{1+\varvec{y}^\top {\varvec{\Omega }} \varvec{y}}\,\varvec{\Omega }\varvec{y} \varvec{y} ^\top \varvec{\Omega }\), we have the identity given by

$$\begin{aligned}&\phi _p \left( \varvec{x};\varvec{\xi }+ \dfrac{1}{\sqrt{1+\varvec{y}^\top {\varvec{\Omega }} \varvec{y}}}\,\varvec{\Omega }\varvec{y}\, t, \varvec{\Omega }- \dfrac{1}{1+\varvec{y}^\top {\varvec{\Omega }} \varvec{y}}\,\varvec{\Omega }\varvec{y} \varvec{y} ^\top \varvec{\Omega }\right) \phi (t)\\&\quad = \phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega }) \phi \bigg \{ t;\dfrac{\varvec{y}^\top (\varvec{x}- \varvec{\xi })}{\sqrt{1+\varvec{y}^\top {\varvec{\Omega }}\varvec{y}}},\dfrac{1}{1+\varvec{y}^\top {\varvec{\Omega }} \varvec{y}} \bigg \}. \end{aligned}$$
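Both the rank-one inverse formula and the density factorization above are easy to confirm numerically. The following is a sketch assuming NumPy; `mvn_pdf` is a hypothetical helper written for this check, and the values of `Omega`, `xi`, `y`, `x` and `t` are arbitrary.

```python
import numpy as np

def mvn_pdf(x, mean, cov):
    """Density of N(mean, cov) at x; scalars are treated as 1-dimensional."""
    x, mean = np.atleast_1d(x), np.atleast_1d(mean)
    cov = np.atleast_2d(cov)
    d = x - mean
    quad = d @ np.linalg.solve(cov, d)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** d.size * np.linalg.det(cov))

Omega = np.array([[2.0, 0.3], [0.3, 1.0]])
xi = np.array([0.5, -1.0])
y = np.array([0.4, -0.7])
x = np.array([1.2, 0.1])
t = 0.8

q = y @ Omega @ y                                  # y' Omega y
eta_y = Omega @ y / np.sqrt(1.0 + q)               # conditional skewness vector
Psi_y = np.linalg.inv(np.linalg.inv(Omega) + np.outer(y, y))

# Rank-one update: (Omega^{-1} + y y')^{-1} = Omega - Omega y y' Omega / (1 + y' Omega y)
sm_rhs = Omega - np.outer(Omega @ y, Omega @ y) / (1.0 + q)
sm_gap = np.max(np.abs(Psi_y - sm_rhs))

# Factorization of the product of the two normal densities
lhs = mvn_pdf(x, xi + eta_y * t, Psi_y) * mvn_pdf(t, 0.0, 1.0)
rhs = mvn_pdf(x, xi, Omega) * mvn_pdf(t, y @ (x - xi) / np.sqrt(1.0 + q), 1.0 / (1.0 + q))
print(sm_gap, lhs, rhs)
```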

Thus, we have

$$\begin{aligned} f_{\varvec{X}} (\varvec{x}) = 2 \int _{{\mathbb {R}}^p} \int _0 ^{\infty } \phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega }) \phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi \bigg \{ t;\dfrac{\varvec{y}^\top (\varvec{x}- \varvec{\xi })}{\sqrt{1+\varvec{y}^\top {\varvec{\Omega }}\varvec{y}}},\dfrac{1}{1+\varvec{y}^\top {\varvec{\Omega }} \varvec{y}} \bigg \} d t d \varvec{y}. \end{aligned}$$

Considering the transformation \(w = \sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}\,t\), we have, for the joint mpdf of \((\varvec{X}, \varvec{Y},W)\):

$$\begin{aligned} f_{\varvec{X},\varvec{Y},W}(\varvec{x},\varvec{y},w)&= 2 \phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega }) \phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi \{ w;\varvec{y}^\top (\varvec{x}- \varvec{\xi }),1 \}, \nonumber \\&\varvec{x} \in {\mathbb {R}}^p,\varvec{y} \in {\mathbb {R}}^p, w>0. \end{aligned}$$
(21)

Again, using Lemma 2 in Arellano-Valle et al. (2005), we have

$$\begin{aligned}&\phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega })\phi \{ w;\varvec{y}^\top (\varvec{x}- \varvec{\xi }),1 \}=\phi _p \bigg \{ \varvec{x}; \varvec{\xi }+ w \dfrac{1}{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}\,\varvec{\Omega }\varvec{y}, (\varvec{\Omega }^{-1} + \varvec{y} \varvec{y}^\top )^{-1} \bigg \}\\&\quad \times \phi ( w;0,1+\varvec{y}^\top \varvec{\Omega }\varvec{y} ), \end{aligned}$$

and using this result in (21), we get

$$\begin{aligned} f_{\varvec{X},\varvec{Y},W}(\varvec{x},\varvec{y},w)&= 2 \phi _p \bigg \{ \varvec{x}; \varvec{\xi }+ w \dfrac{1}{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}\,\varvec{\Omega }\varvec{y}, (\varvec{\Omega }^{-1} + \varvec{y} \varvec{y}^\top )^{-1} \bigg \}\nonumber \\&\times \phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi ( w;0,1+\varvec{y}^\top \varvec{\Omega }\varvec{y} ) \end{aligned}$$
(22)

for \( \varvec{x} \in {\mathbb {R}}^p\), \(\varvec{y} \in {\mathbb {R}}^p\), and \(w>0\). The rest of the proof is trivial from (22). \(\square \)

1.3 Proof of Corollary 2

From Eq. (21), we get that the joint mpdf of \((\varvec{X},\varvec{Y})\) is given by

$$\begin{aligned} f_{\varvec{X},\varvec{Y}}(\varvec{x},\varvec{y})&= 2 \phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega }) \phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \int _{0} ^ \infty \phi \{ w;\varvec{y}^\top (\varvec{x}- \varvec{\xi }),1 \} d w\\&= 2 \phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega }) \phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \Phi \{ \varvec{y}^\top (\varvec{x}-\varvec{\xi })\}, \quad \varvec{x},\varvec{y} \in {\mathbb {R}}^p. \end{aligned}$$

From this mpdf it follows that the marginal mpdf of \(\varvec{X}\) and the conditional mpdf of \(\varvec{Y}|\varvec{X}=\varvec{x}\) are:

$$\begin{aligned}&f_{\varvec{X}}(\varvec{x}) = 2 \phi _p(\varvec{x}; \varvec{\xi },\varvec{\Omega }) \Phi \bigg \{ \dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}} \bigg \}, \quad \varvec{x} \in {\mathbb {R}}^p, \\ {}&f_{\varvec{Y}|\varvec{X} = \varvec{x}}(\varvec{y}) \\ {}&\quad = \dfrac{1}{\Phi \bigg \{ \dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}} \bigg \} }\phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \Phi \{ \varvec{y}^\top (\varvec{x}-\varvec{\xi })\},\quad \varvec{y} \in {\mathbb {R}}^p. \end{aligned}$$

Also, the conditional multivariate moment generating function of \(\varvec{Y}|\varvec{X}=\varvec{x}\) is

$$\begin{aligned} M_{\varvec{Y}|\varvec{X}=\varvec{x}}(\varvec{t})&= {\mathbb {E}}\{\exp (\varvec{t}^\top \varvec{Y})|\varvec{X}=\varvec{x}\} \\&= \int _{{\mathbb {R}}^p} \dfrac{\exp (\varvec{t}^\top \varvec{y})}{\Phi \bigg \{ \dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}} \bigg \}}\phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \Phi \{ \varvec{y}^\top (\varvec{x}-\varvec{\xi }) \} d \varvec{y}\\&= \dfrac{\exp \left( -\dfrac{1}{2}\varvec{\lambda }^\top \varvec{\Omega }\varvec{\lambda }\right) }{\Phi \bigg \{ \dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}} \bigg \}} \\&\quad \times \int _{{\mathbb {R}}^p} \dfrac{1}{(\sqrt{2 \pi })^p\{\det (\varvec{\Omega }^{-1}) \}^{1/2}} \exp [-\dfrac{1}{2}\{ \varvec{y}^\top \varvec{\Omega }\varvec{y} -2 (\varvec{\Omega }\varvec{\lambda }+\varvec{t})^\top \varvec{y} \}]\\&\quad \times \Phi \{ \varvec{y}^\top (\varvec{x} -\varvec{\xi }) \} d \varvec{y}\\&= \dfrac{\exp \bigg (\varvec{\lambda }^\top \varvec{t} +\dfrac{1}{2} \varvec{t}^\top \varvec{\Omega }^{-1} \varvec{t}\bigg )}{\Phi \bigg \{ \dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}} \bigg \}}\\&\quad \times \Phi \bigg \{ \dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })+(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1} \varvec{t}}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}} \bigg \}, \quad \varvec{t} \in {\mathbb {R}}^p, \end{aligned}$$

where the last step follows from Lemma 5.3 in Azzalini and Capitanio (2014). Thus, we have

$$\begin{aligned} {\mathbb {E}}(\varvec{Y}|\varvec{X}=\varvec{x})&= \dfrac{\partial }{\partial \varvec{t}} M_{\varvec{Y}|\varvec{X}=\varvec{x}}(\varvec{t}) \bigg |_{\varvec{t} =\varvec{0}}\\&= \varvec{\lambda }+ W_{\phi }\bigg \{ \dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}} \bigg \}\\&\quad \times \dfrac{1}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}}\,\varvec{\Omega }^{-1}(\varvec{x} - \varvec{\xi }), \\&{\mathbb {E}}(\varvec{Y} \varvec{Y}^\top |\varvec{X}= \varvec{x}) = \dfrac{\partial ^2}{\partial \varvec{t} \partial \varvec{t}^\top } M_{\varvec{Y}|\varvec{X}=\varvec{x}}(\varvec{t}) \bigg |_{\varvec{t} =\varvec{0}}\\&= \varvec{\lambda }\varvec{\lambda }^\top + \varvec{\Omega }^{-1} + W_{\phi }\bigg \{ \dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}} \bigg \}\\&\quad \times \dfrac{1}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}}\\&\quad \times \bigg \{\varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi }) \varvec{\lambda }^\top + \varvec{\lambda }(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1}\\&\quad -\dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}\,\varvec{\Omega }^{-1}(\varvec{x} - \varvec{\xi })(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} \bigg \}. \end{aligned}$$
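As a sanity check, the two conditional moments above can be compared with direct numerical integration in the univariate case \(p=1\). This is only a sketch using the standard library; `omega`, `lam` and `d` (standing for \(\varvec{\Omega }\), \(\varvec{\lambda }\) and \(\varvec{x}-\varvec{\xi }\)) are arbitrary illustrative values, and the helper names are hypothetical.

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def trapezoid(f, lo, hi, n=40001):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n - 1):
        total += f(lo + i * h)
    return total * h

omega, lam, d = 1.5, 0.8, 1.1          # Omega, lambda and x - xi for p = 1
var_y = 1.0 / omega
lo, hi = lam - 12.0 * math.sqrt(var_y), lam + 12.0 * math.sqrt(var_y)

# Unnormalized conditional density of Y given X = x
kernel = lambda y: normal_pdf(y, lam, var_y) * Phi(y * d)
norm_const = trapezoid(kernel, lo, hi)
m1_numeric = trapezoid(lambda y: y * kernel(y), lo, hi) / norm_const
m2_numeric = trapezoid(lambda y: y * y * kernel(y), lo, hi) / norm_const

# Closed forms from the displayed expressions
c = math.sqrt(1.0 + d * d / omega)
a = lam * d
w_phi = normal_pdf(a / c, 0.0, 1.0) / Phi(a / c)
m1_closed = lam + w_phi * (d / omega) / c
m2_closed = lam ** 2 + 1.0 / omega + (w_phi / c) * (
    2.0 * lam * d / omega - a / (c * c) * (d / omega) ** 2
)
print(m1_numeric, m1_closed, m2_numeric, m2_closed)
```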

Now, since \({\mathbb {E}}(W\varvec{Y}|\varvec{X} =\varvec{x}) = {\mathbb {E}}\{ W {\mathbb {E}}(\varvec{Y}|W,\varvec{X}=\varvec{x})|\varvec{X}=\varvec{x}\}\), then for the evaluation of this quantity we need the conditional mpdfs of \(\varvec{Y}|W=w,\varvec{X}=\varvec{x}\) and \(W|\varvec{X} = \varvec{x}\). Again, from (21):

$$\begin{aligned} f_{W|\varvec{X} = \varvec{x}}(w)&\propto \int _{{\mathbb {R}}^p} \phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi \{w;\varvec{y}^\top (\varvec{x}- \varvec{\xi }),1\} d \varvec{y} , \quad w>0, \end{aligned}$$

where, from Lemma 2 in Arellano-Valle et al. (2005), the product \(\phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi \{w;\varvec{y}^\top (\varvec{x}- \varvec{\xi }),1\} \) is equal to \(\phi \{ w;(\varvec{x} - \varvec{\xi })^\top \varvec{\lambda },1+(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi }) \} \phi _p[\varvec{y};\varvec{\lambda }+ \varvec{\Lambda }(\varvec{x}-\varvec{\xi }) \{ w-\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi }) \},\varvec{\Lambda }]\), where \(\varvec{\Lambda }= \{\varvec{\Omega }+(\varvec{x}- \varvec{\xi })(\varvec{x}- \varvec{\xi })^\top \}^{-1}\). Thus,

$$\begin{aligned} f_{W|\varvec{X} = \varvec{x}}(w) \propto \phi \{ w;(\varvec{x} - \varvec{\xi })^\top \varvec{\lambda },1+(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi }) \}, \quad w>0. \end{aligned}$$

That is, \(W|\varvec{X}=\varvec{x} \sim \mathcal{T}\mathcal{N} \{ (\varvec{x} - \varvec{\xi })^\top \varvec{\lambda },1+(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi });(0,\infty ) \}\), and hence

$$\begin{aligned} {\mathbb {E}}(W|\varvec{X}=\varvec{x})&= (\varvec{x} - \varvec{\xi })^\top \varvec{\lambda }+\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}\,\\&\quad \times W_{\phi }\bigg \{\dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}} \bigg \},\\ {\mathbb {E}}(W^2|\varvec{X}=\varvec{x})&= \{(\varvec{x} - \varvec{\xi })^\top \varvec{\lambda }\}^2 + \{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })\} \\&\quad +(\varvec{x} - \varvec{\xi })^\top \varvec{\lambda }\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}\,\\&\quad \times W_{\phi }\bigg \{\dfrac{\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{\sqrt{1+(\varvec{x} - \varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}} \bigg \}. \end{aligned}$$
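The two truncated-normal moments can be verified against numerical integration over \((0,\infty )\). The sketch below relies only on the standard library; the function names and the test values of the mean and variance are illustrative assumptions.

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def truncated_moments_closed(mu, var):
    """E(W) and E(W^2) for W ~ N(mu, var) truncated to (0, inf), as in the text."""
    s = math.sqrt(var)
    ratio = phi(mu / s) / Phi(mu / s)   # W_phi(mu / s)
    return mu + s * ratio, mu * mu + var + mu * s * ratio

def truncated_moments_numeric(mu, var, n=40001):
    """Same moments by the trapezoidal rule on (0, max(mu, 0) + 12 sd)."""
    s = math.sqrt(var)
    hi = max(mu, 0.0) + 12.0 * s
    h = hi / (n - 1)
    m0 = m1 = m2 = 0.0
    for i in range(n):
        w = i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        dens = weight * phi((w - mu) / s) / s
        m0 += dens
        m1 += w * dens
        m2 += w * w * dens
    return m1 / m0, m2 / m0   # the step size h cancels in the ratios

closed = truncated_moments_closed(-0.4, 1.8)
numeric = truncated_moments_numeric(-0.4, 1.8)
print(closed, numeric)
```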

Furthermore, from Eq. (21), we have \( f_{\varvec{Y}|\varvec{X} = \varvec{x},W = w}(\varvec{y}) \propto \phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi \{w;\varvec{y}^\top (\varvec{x}- \varvec{\xi }),1\}\), where again, from Lemma 2 in Arellano-Valle et al. (2005), the product \(\phi _p(\varvec{y};\varvec{\lambda },{\varvec{\Omega }}^{-1}) \phi \{w;\varvec{y}^\top (\varvec{x}- \varvec{\xi }),1\} \) is equal to \(\phi \{w;(\varvec{x} - \varvec{\xi })^\top \varvec{\lambda },1+(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })\}\phi _p[\varvec{y};\varvec{\lambda }+ \varvec{\Lambda }(\varvec{x}-\varvec{\xi })\{w-\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })\},\varvec{\Lambda }]\), with \(\varvec{\Lambda }= \{\varvec{\Omega }+(\varvec{x}- \varvec{\xi })(\varvec{x}- \varvec{\xi })^\top \}^{-1}\). Therefore,

$$\begin{aligned} f_{\varvec{Y}|\varvec{X} = \varvec{x},W = w}(\varvec{y}) = \phi _p[\varvec{y};\varvec{\lambda }+ \varvec{\Lambda }(\varvec{x}-\varvec{\xi })\{w-\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })\},\varvec{\Lambda }], \quad \varvec{y} \in {\mathbb {R}}^p. \end{aligned}$$

That is, \(\varvec{Y}|\varvec{X} = \varvec{x},W = w\sim {\mathcal {N}}_p(\varvec{\lambda }+ \varvec{\Lambda }(\varvec{x}-\varvec{\xi })\{w-\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })\},\varvec{\Lambda })\). Hence,

$$\begin{aligned} {\mathbb {E}}(W\varvec{Y}|\varvec{X} =\varvec{x})&= {\mathbb {E}}\{W {\mathbb {E}}(\varvec{Y}|W,\varvec{X}=\varvec{x})|\varvec{X}=\varvec{x}\} \\&= {\mathbb {E}}\left[ W\left\{ \varvec{\lambda }+ \dfrac{W-\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi })}{1+(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1} (\varvec{x}-\varvec{\xi })}\,\varvec{\Omega }^{-1}(\varvec{x}-\varvec{\xi }) \right\} \left| \right. \varvec{X}=\varvec{x}\right] \\&={\mathbb {E}}(W|\varvec{X}=\varvec{x})\varvec{\lambda }\\&\quad +\dfrac{\{{\mathbb {E}}(W^2|\varvec{X}=\varvec{x}) -\varvec{\lambda }^\top (\varvec{x}-\varvec{\xi }){\mathbb {E}}(W|\varvec{X}=\varvec{x})\}}{1+(\varvec{x}-\varvec{\xi })^\top \varvec{\Omega }^{-1}(\varvec{x}-\varvec{\xi })}\,\varvec{\Omega }^{-1}(\varvec{x}-\varvec{\xi }). \end{aligned}$$

\(\square \)
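The conditional expectations derived in this proof are the kind of quantities an E-step would evaluate for each observation. The following sketch, assuming NumPy, collects them in one function; the name `conditional_moments` and its signature are hypothetical and do not reproduce the authors' implementation.

```python
import math
import numpy as np

def W_phi(u):
    """Ratio phi(u) / Phi(u) for the standard normal distribution."""
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    return pdf / cdf

def conditional_moments(x, xi, Omega, lam):
    """Return E(W|x), E(W^2|x), E(Y|x) and E(WY|x) from the closed forms in the text."""
    diff = x - xi
    d = np.linalg.solve(Omega, diff)        # Omega^{-1}(x - xi)
    s2 = 1.0 + float(diff @ d)              # 1 + (x - xi)' Omega^{-1} (x - xi)
    s = math.sqrt(s2)
    a = float(lam @ diff)                   # lambda'(x - xi)
    w = W_phi(a / s)
    e_w = a + s * w
    e_w2 = a * a + s2 + a * s * w
    e_y = lam + (w / s) * d
    e_wy = e_w * lam + (e_w2 - a * e_w) / s2 * d
    return e_w, e_w2, e_y, e_wy

Omega = np.array([[2.0, 0.3], [0.3, 1.0]])
xi = np.array([0.5, -1.0])
lam = np.array([0.9, -0.4])

# At x = xi the formulas reduce to half-normal moments for W and to lambda for Y.
e_w0, e_w20, e_y0, e_wy0 = conditional_moments(xi.copy(), xi, Omega, lam)
print(e_w0, e_w20, e_y0, e_wy0)
```

For strongly negative arguments the ratio in `W_phi` may underflow, so a production version would likely need a numerically safer evaluation.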

1.4 Proof of Proposition 2

From Proposition 1, we have that \(\varvec{X} \sim \mathcal {MSN}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\) can be represented as \({\varvec{X}} | \varvec{Y} = \varvec{y} \sim \mathcal{S}\mathcal{N}_p \left( \varvec{\xi }, \varvec{\Omega }- \dfrac{1}{1 + \varvec{y}^\top \varvec{\Omega }\varvec{y}}\,\varvec{\Omega }\varvec{y} \varvec{y}^\top \varvec{\Omega }, \dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{\Omega }\varvec{y} \right) \) and \(\varvec{Y} \sim {\mathcal {N}}_p(\varvec{\lambda },\varvec{\Omega }^{-1})\). Now, using the results in Eq. (16), we get \({\mathbb {E}}(\varvec{X}|\varvec{Y} = \varvec{y}) = \varvec{\xi }+ \sqrt{\dfrac{2}{\pi }}\dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{\Omega }\varvec{y}\) and \({\mathbb {V}}ar (\varvec{X}|\varvec{Y} = \varvec{y}) = \varvec{\Omega }- \dfrac{2}{\pi }\dfrac{1}{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}\,\varvec{\Omega }\varvec{y} \varvec{y} ^\top \varvec{\Omega }\), where \(\varvec{Y} \sim {\mathcal {N}}_p\left( \varvec{\lambda },\varvec{\Omega }^{-1}\right) \), so that

$$\begin{aligned} {\mathbb {E}}(\varvec{X}) = {\mathbb {E}}\{ {\mathbb {E}}(\varvec{X}|\varvec{Y}) \}= \varvec{\xi }+ \sqrt{\dfrac{2}{\pi }} {\mathbb {E}} \left( \dfrac{1}{\sqrt{1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y}}}\,\varvec{\Omega }\varvec{Y} \right) \end{aligned}$$

and

$$\begin{aligned} {\mathbb {V}}ar (\varvec{X})&= {\mathbb {E}}\{ {\mathbb {V}}ar (\varvec{X}|\varvec{Y})\} + {\mathbb {V}}ar \{ {\mathbb {E}}(\varvec{X}| \varvec{Y}) \} \\ {}&= \varvec{\Omega }- \dfrac{2}{\pi }{\mathbb {E}}\left( \dfrac{1}{1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y}}\,\varvec{\Omega }\varvec{Y} \varvec{Y} ^\top \varvec{\Omega }\right) + \dfrac{2}{\pi } {\mathbb {V}}ar \left( \dfrac{1}{\sqrt{1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y}}} \,\varvec{\Omega }\varvec{Y}\right) . \end{aligned}$$

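Thus, since \({\mathbb {V}}ar (\varvec{Z}) = {\mathbb {E}}(\varvec{Z}\varvec{Z}^\top ) - {\mathbb {E}}(\varvec{Z}){\mathbb {E}}(\varvec{Z})^\top \) for any random vector \(\varvec{Z}\) with finite second moments, the two terms involving \({\mathbb {E}}\{(1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y})^{-1}\varvec{\Omega }\varvec{Y} \varvec{Y} ^\top \varvec{\Omega }\}\) cancel, and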

$$\begin{aligned} {\mathbb {V}}ar (\varvec{X}) = \varvec{\Omega }- \dfrac{2}{\pi } {\mathbb {E}} \left( \dfrac{1}{\sqrt{1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y}}}\,\varvec{\Omega }\varvec{Y} \right) {\mathbb {E}} \left( \dfrac{1}{\sqrt{1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y}}}\,\varvec{\Omega }\varvec{Y} \right) ^\top . \end{aligned}$$

\(\square \)
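The mean and variance expressions can be illustrated by simulation in the univariate case \(p=1\), drawing from the hierarchical representation with \(T\) half-normal, \(Y\) normal and \(U|Y=y\) normal. This is a rough Monte Carlo sketch with the standard library only; the function name, the seed and the parameter values are arbitrary assumptions, and agreement holds only up to sampling error.

```python
import math
import random

def simulate_msn_1d(xi, omega, lam, n, seed=0):
    """Draw n values of X = xi + eta_Y * T + U for p = 1 (illustrative sampler)."""
    rng = random.Random(seed)
    xs, etas = [], []
    for _ in range(n):
        y = rng.gauss(lam, math.sqrt(1.0 / omega))        # Y ~ N(lambda, 1/omega)
        q = omega * y * y
        eta = omega * y / math.sqrt(1.0 + q)              # conditional skewness
        t = abs(rng.gauss(0.0, 1.0))                      # T ~ HN(0, 1)
        u = rng.gauss(0.0, math.sqrt(omega / (1.0 + q)))  # U | Y = y
        xs.append(xi + eta * t + u)
        etas.append(eta)
    return xs, etas

def mean(values):
    return sum(values) / len(values)

xi, omega, lam = 0.5, 1.0, 1.2
xs, etas = simulate_msn_1d(xi, omega, lam, n=200_000)

e_eta = mean(etas)                                   # Monte Carlo estimate of E(eta_Y)
sample_mean = mean(xs)
sample_var = mean([(x - sample_mean) ** 2 for x in xs])
pred_mean = xi + math.sqrt(2.0 / math.pi) * e_eta    # formula for E(X)
pred_var = omega - (2.0 / math.pi) * e_eta ** 2      # formula for Var(X)
print(sample_mean, pred_mean, sample_var, pred_var)
```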

1.5 Proof of Proposition 3

Let \(\varvec{X}\sim \mathcal {MSN}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\). Then, by Proposition 1 we have that \(\varvec{X}|\varvec{Y}= \varvec{y} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi }_{\varvec{y}},\varvec{\eta }_{\varvec{y}})\), with \(\varvec{\Psi }_{\varvec{y}}=\varvec{\Omega }- \varvec{\eta }_{\varvec{y}} \varvec{\eta }_{\varvec{y}}^\top \) and \(\varvec{\eta }_{\varvec{y}}=(1+\varvec{y}^\top \varvec{\Omega }\varvec{y})^{-1/2} \varvec{\Omega }\varvec{y}\), where \(\varvec{Y}\sim {\mathcal {N}}_p(\varvec{\lambda },\varvec{\Omega }^{-1})\), with \(\varvec{\lambda }=(1-\varvec{\eta }^\top \varvec{\Omega }^{-1} \varvec{\eta })^{-1/2} \varvec{\Omega }^{-1} \varvec{\eta }\) and \(\varvec{\Omega }=\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top \) as defined in Eq. (6). Hence by adapting the result of the \(\mathcal{S}\mathcal{N}\)-mmgf from Proposition 11 to the conditional mmgf of \(\varvec{X}|\varvec{Y}=\varvec{y}\sim \mathcal{S}\mathcal{N}_p(\varvec{\xi },\varvec{\Psi }_{\varvec{y}},\varvec{\eta }_{\varvec{y}})\), we have

$$\begin{aligned} M_{\varvec{X}|\varvec{Y}=\varvec{y}}(\varvec{t})= & {} 2 \exp \left( \varvec{t}^\top \varvec{\xi }+\frac{1}{2} \varvec{t}^\top \varvec{\Omega }_{\varvec{y}} \varvec{t}\right) \Phi (\varvec{\eta }_{\varvec{y}}^\top \varvec{t})\\ \quad= & {} 2 \exp \left( \varvec{t}^\top \varvec{\xi }+\frac{1}{2} \varvec{t}^\top \varvec{\Omega }\varvec{t}\right) \Phi \left( \frac{\varvec{y}^\top \varvec{\Omega }\varvec{t}}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\right) , \end{aligned}$$

since \(\varvec{\Omega }_{\varvec{y}}=\varvec{\Psi }_{\varvec{y}}+ \varvec{\eta }_{\varvec{y}} \varvec{\eta }_{\varvec{y}}^\top =\varvec{\Omega }\). Hence, the mmgf of \(\varvec{X}\sim \mathcal {MSN}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) becomes

$$\begin{aligned} M_{\varvec{X}}(\varvec{t})={\mathbb {E}}\{M_{\varvec{X}|\varvec{Y}}(\varvec{t})\}= 2 \exp \left( \varvec{t}^\top \varvec{\xi }+\frac{1}{2} \varvec{t}^\top \varvec{\Omega }\varvec{t}\right) {\mathbb {E}}\left\{ \Phi \left( \frac{\varvec{Y}^\top \varvec{\Omega }\varvec{t}}{\sqrt{1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y}}}\right) \right\} , \end{aligned}$$

where as before \(\varvec{Y}\sim {\mathcal {N}}_p(\varvec{\lambda },\varvec{\Omega }^{-1})\), and we note that

$$\begin{aligned} {\mathbb {E}}\left\{ \Phi \left( \frac{\varvec{Y}^\top \varvec{\Omega }\varvec{t}}{\sqrt{1+ \varvec{Y}^\top \varvec{\Omega }\varvec{Y}}}\right) \right\} ={\mathbb {E}}\left\{ \Phi \left( \frac{\varvec{Z}^\top \varvec{t}}{\sqrt{1+\varvec{Z}^\top \varvec{\Omega }^{-1} \varvec{Z}}}\right) \right\} , \end{aligned}$$

where \(\varvec{Z}=\varvec{\Omega }\varvec{Y} \sim {\mathcal {N}}_p(\bar{\varvec{\lambda }},\varvec{\Omega })\), with \(\bar{\varvec{\lambda }}=\varvec{\Omega }\varvec{\lambda }=(1-\varvec{\eta }^\top \varvec{\Omega }^{-1} \varvec{\eta })^{-1/2} \varvec{\eta }\). \(\square \)
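The two expressions for \(\varvec{\lambda }\) used in the appendices, one in terms of \(\varvec{\Psi }\) and one in terms of \(\varvec{\Omega }=\varvec{\Psi }+\varvec{\eta }\varvec{\eta }^\top \), together with the form of \(\bar{\varvec{\lambda }}\), can be checked to agree numerically. The sketch below assumes NumPy; the example values of `Psi` and `eta` are arbitrary.

```python
import numpy as np

Psi = np.array([[1.5, 0.2], [0.2, 0.8]])   # arbitrary positive definite matrix
eta = np.array([0.6, -0.9])                # arbitrary skewness vector
Omega = Psi + np.outer(eta, eta)

psi_inv_eta = np.linalg.solve(Psi, eta)
omega_inv_eta = np.linalg.solve(Omega, eta)

# lambda written with Psi: (1 + eta' Psi^{-1} eta)^{-1/2} Psi^{-1} eta
lam_psi = psi_inv_eta / np.sqrt(1.0 + eta @ psi_inv_eta)
# lambda written with Omega: (1 - eta' Omega^{-1} eta)^{-1/2} Omega^{-1} eta
lam_omega = omega_inv_eta / np.sqrt(1.0 - eta @ omega_inv_eta)

# lambda_bar = Omega lambda = (1 - eta' Omega^{-1} eta)^{-1/2} eta
lam_bar = Omega @ lam_omega
lam_bar_closed = eta / np.sqrt(1.0 - eta @ omega_inv_eta)
print(lam_psi, lam_omega, lam_bar, lam_bar_closed)
```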

1.6 Proof of Proposition 4

From Proposition 12, the conditional mcdf of \(\varvec{X}|\varvec{Y}=\varvec{y} \sim \mathcal{S}\mathcal{N}_p(\varvec{\xi }, \varvec{\Psi }_{\varvec{y}},\varvec{\eta }_{\varvec{y}})\) is given by

$$\begin{aligned} F_{\varvec{X}|\varvec{Y}=\varvec{y}}(\varvec{x})&=\int _{\varvec{u}\le \varvec{x}} 2\phi _p(\varvec{u};\varvec{\xi },\varvec{\Omega }_{\varvec{y}})\Phi \left\{ \varvec{\gamma }_{\varvec{y}}^\top \varvec{\Omega }_{\varvec{y}}^{-1/2}(\varvec{u}-\varvec{\xi })\right\} \text {d} \varvec{u}\\&=2\Phi _{p+1}(\varvec{x}^*;\varvec{\xi }^*,\varvec{\Omega }_{\varvec{y}}^*), \end{aligned}$$

where

$$\begin{aligned}{} & {} \varvec{x}^*=\left( \begin{array}{c} \varvec{x} \\ 0 \\ \end{array} \right) ,\quad \varvec{\xi }^*=\left( \begin{array}{c} \varvec{\xi }\\ 0 \\ \end{array} \right) ,\\ {}{} & {} \varvec{\Omega }_{\varvec{y}}^*=\left( \begin{array}{cc} \varvec{\Omega }_{\varvec{y}} &{} - \varvec{\Omega }_{\varvec{y}}^{1/2} \varvec{\gamma }_{\varvec{y}} \\ - \varvec{\gamma }_{\varvec{y}}^\top \varvec{\Omega }_{\varvec{y}}^{1/2} &{} 1+ \varvec{\gamma }_{\varvec{y}}^\top \varvec{\gamma }_{\varvec{y}} \\ \end{array} \right) =\left( \begin{array}{cc} \varvec{\Omega }&{} -\varvec{\Omega }\varvec{y} \\ -\varvec{y}^\top \varvec{\Omega }&{} 1+\varvec{y}^\top \varvec{\Omega }\varvec{y} \\ \end{array} \right) , \end{aligned}$$

since \(\varvec{\Omega }_{\varvec{y}}=\varvec{\Omega }\), \(\varvec{\gamma }_{\varvec{y}}=(1-\varvec{\eta }_{\varvec{y}}^\top \varvec{\Omega }_{\varvec{y}}^{-1} \varvec{\eta }_{\varvec{y}})^{-1/2} \varvec{\Omega }_{\varvec{y}}^{-1/2} \varvec{\eta }_{\varvec{y}}=\varvec{\Omega }^{1/2} \varvec{y}\) and \(\varvec{\eta }_{\varvec{y}}=(1+\varvec{y}^\top \varvec{\Omega }\varvec{y})^{-1/2} \varvec{\Omega }\varvec{y}\). Therefore, the mcdf of \(\varvec{X}\sim \mathcal {MSN}_p(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) becomes

$$\begin{aligned} F_{\varvec{X}}(\varvec{x})&=2\,{\mathbb {E}} \left[ \Phi _{p+1}\left\{ \left( \begin{array}{c} \varvec{x} \\ 0 \\ \end{array} \right) ;\left( \begin{array}{c} \varvec{\xi }\\ 0 \\ \end{array} \right) ,\left( \begin{array}{cc} \varvec{\Omega }&{} - \varvec{\Omega }\varvec{Y} \\ -\varvec{Y}^\top \varvec{\Omega }&{} 1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y} \\ \end{array} \right) \right\} \right] \\&=2\,{\mathbb {E}}\left[ \Phi _{p+1}\left\{ \left( \begin{array}{c} \varvec{x} \\ 0 \\ \end{array} \right) ;\left( \begin{array}{c} \varvec{\xi }\\ 0 \\ \end{array} \right) ,\left( \begin{array}{cc} \varvec{\Omega }&{} -\varvec{Z} \\ -\varvec{Z}^\top &{} 1+\varvec{Z}^\top \varvec{\Omega }^{-1} \varvec{Z} \\ \end{array} \right) \right\} \right] , \end{aligned}$$

where as before \(\varvec{Z}=\varvec{\Omega }\varvec{Y} \sim {\mathcal {N}}_p(\bar{\varvec{\lambda }},\varvec{\Omega })\), with \(\bar{\varvec{\lambda }}=\varvec{\Omega }\varvec{\lambda }=(1-\varvec{\eta }^\top \varvec{\Omega }^{-1} \varvec{\eta })^{-1/2} \varvec{\eta }\). \(\square \)

1.7 Proof of Proposition 5

By assumption, \(\widetilde{\varvec{X}}=\varvec{a} + \varvec{B} \varvec{X}\), with \(\varvec{X} \sim \mathcal {MSN}_p(\varvec{\xi }, \varvec{\Psi },\varvec{\eta })\), and therefore, from the stochastic representation of \(\varvec{X}\) given in Proposition 10, \(\widetilde{\varvec{X}}\) has the stochastic representation \(\widetilde{\varvec{X}}\buildrel d\over = \varvec{a} + \varvec{B} \varvec{\xi }+ \dfrac{1}{\sqrt{1+\varvec{Y}^\top \varvec{\Omega }\varvec{Y}}}\,\varvec{B}\varvec{\Omega }\varvec{Y}\, T + \varvec{B}\varvec{U},\) where \(T\sim \mathcal{H}\mathcal{N}(0,1)\) and \(\varvec{Y}\sim {\mathcal {N}}_p(\varvec{\lambda },\varvec{\Omega }^{-1})\) are independent, and \(\varvec{U}|\varvec{Y}=\varvec{y}\sim {\mathcal {N}}_p\left( \varvec{0},(\varvec{\Omega }^{-1}+\varvec{y}\varvec{y}^\top )^{-1}\right) \). Also, by conditioning \(\widetilde{\varvec{X}}\) on \(\varvec{Y}=\varvec{y}\) in this stochastic representation, we have that \(\widetilde{\varvec{X}}|\varvec{Y}=\varvec{y}\sim \mathcal{S}\mathcal{N}_k(\varvec{a} + \varvec{B} \varvec{\xi }, \varvec{\Psi }_{\varvec{y}}, \varvec{\eta }_{\varvec{y}})\), with conditional mpdf

$$\begin{aligned} f_{\widetilde{\varvec{X}} | \varvec{Y}=\varvec{y}}(\widetilde{\varvec{x}}) = 2 \phi _p(\widetilde{\varvec{x}};\varvec{a} + \varvec{B} \varvec{\xi },\varvec{\Psi }_{\varvec{y}}+\varvec{\eta }_{\varvec{y}}\varvec{\eta }_{\varvec{y}}^\top ) \Phi \left\{ \dfrac{\varvec{\eta }_{\varvec{y}}^\top \varvec{\Psi }_{\varvec{y}}^{-1}(\widetilde{\varvec{x}}-\varvec{a} - \varvec{B} \varvec{\xi })}{\sqrt{1+\varvec{\eta }_{\varvec{y}}^\top \varvec{\Psi }_{\varvec{y}}^{-1}\varvec{\eta }_{\varvec{y}}}}\right\} , \end{aligned}$$

where \(\varvec{Y} \sim {\mathcal {N}}_p\left( \varvec{\lambda },\varvec{\Omega }^{-1}\right) \), \(\varvec{\Psi }_{\varvec{y}}=\varvec{B}(\varvec{\Omega }^{-1}+\varvec{y}\varvec{y}^\top )^{-1}\varvec{B}^\top \) and \(\varvec{\eta }_{\varvec{y}}=\dfrac{1}{\sqrt{1+\varvec{y}^\top \varvec{\Omega }\varvec{y}}}\,\varvec{B} \varvec{\Omega }\varvec{y}.\) Thus, since \(f_{\widetilde{\varvec{X}}}(\widetilde{\varvec{x}}) = \int _{{\mathbb {R}}^p} f_{\widetilde{\varvec{X}} | \varvec{Y}}(\widetilde{\varvec{x}}) f_{\varvec{Y}}(\varvec{y}) d \varvec{y}\), we have, after some extensive but straightforward algebra, that the mpdf of \(\widetilde{\varvec{X}}\) becomes

$$\begin{aligned} f_{\widetilde{\varvec{X}}}(\widetilde{\varvec{x}})&=2 \phi _p(\widetilde{\varvec{x}}; \varvec{a} + \varvec{B} \varvec{\xi }, \varvec{B} \varvec{\Omega }\varvec{B}^\top ) \int _{{\mathbb {R}}^p} \Phi \bigg [ \dfrac{\varvec{y}^\top \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })}{\sqrt{1 + \varvec{y}^\top \{\varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }\} \varvec{y} }} \bigg ] \phi _p\left( \varvec{y}; \varvec{\lambda },\varvec{\Omega }^{-1}\right) d \varvec{y}\\&=2 \phi _p(\widetilde{\varvec{x}}; \varvec{a} + \varvec{B} \varvec{\xi }, \varvec{B} \varvec{\Omega }\varvec{B}^\top ) {\mathbb {E}}\left( \Phi \bigg [ \dfrac{\varvec{Y}^\top \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })}{\sqrt{1 + \varvec{Y}^\top \{\varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }\} \varvec{Y} }} \bigg ]\right) , \quad \widetilde{\varvec{x}} \in {\mathbb {R}}^p, \end{aligned}$$

where \(\varvec{Y} \sim {\mathcal {N}}_p(\varvec{\lambda }, \varvec{\Omega }^{-1})\).

Now, let \(\varvec{Y}_B = \varvec{B} \varvec{\Omega }\varvec{Y}\) and \(\varvec{Y}_C = \varvec{C} \varvec{Y}\), where \(\varvec{C} = I _p - \varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }\). Note that \(\varvec{B} \varvec{C}^\top = \varvec{0}\) and \( \varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }=\varvec{C}^\top \varvec{\Omega }\varvec{C}\) and so \(\varvec{Y}^\top \{\varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }\}\varvec{Y}=\varvec{Y}^\top \varvec{C}^\top \varvec{\Omega }\varvec{C} \varvec{Y}=\varvec{Y}_C^\top \varvec{\Omega }\varvec{Y}_C\). Since \(\varvec{Y} \sim {\mathcal {N}}_p(\varvec{\lambda }, \varvec{\Omega }^{-1})\), it follows, from the properties of the multivariate normal distribution, that \(\varvec{Y}_B \sim {\mathcal {N}}_k(\varvec{B} \varvec{\Omega }\varvec{\lambda }, \varvec{B} \varvec{\Omega }\varvec{B}^\top )\) and \(\varvec{Y}_C \sim {\mathcal {N}}_p(\varvec{C} \varvec{\lambda }, \varvec{C}\varvec{\Omega }^{-1} \varvec{C}^\top )\), and they are independent since \(cov (\varvec{Y}_B,\varvec{Y}_C)=\varvec{B} \varvec{\Omega }\varvec{\Omega }^{-1} \varvec{C}^\top =\varvec{B} \varvec{C}^\top = \varvec{0}\). In turn, this means that \((\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1}\varvec{Y}_B = (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1}\varvec{B} \varvec{\Omega }\varvec{Y} \sim {\mathcal {N}}_k((\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1}\varvec{B} \varvec{\Omega }\varvec{\lambda },(\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1})\) and it is independent of \(\varvec{Y}_C^\top \varvec{\Omega }\varvec{Y}_C= \varvec{Y}^\top \{\varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }\}\varvec{Y}\). Thus, for the above expectation, we have, by using the same arguments as in the proof of Proposition 1, that

$$\begin{aligned}&{\mathbb {E}}\left( \Phi \bigg [ \dfrac{\varvec{Y}^\top \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })}{\sqrt{1 + \varvec{Y}^\top \{\varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }\} \varvec{Y} }} \bigg ]\right) \\&\quad = {\mathbb {E}}\left[ \Phi \bigg \{ \dfrac{\varvec{Y}_B^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })}{\sqrt{1 + \varvec{Y}_C^\top \varvec{\Omega }\varvec{Y}_C }} \bigg \}\right] \\&\quad = {\mathbb {E}}\left[ \Phi \bigg \{ \dfrac{\varvec{\lambda }^\top \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })}{\sqrt{1 + \varvec{Y}^\top \{\varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} \varvec{B} \varvec{\Omega }\}\varvec{Y}}\sqrt{1 + (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })}} \bigg \}\right] . \end{aligned}$$

When \(\varvec{B}\) is a nonsingular square matrix, this expectation, which corresponds to the skewing function of the mpdf of \(\widetilde{\varvec{X}}\), reduces to \(\Phi \bigg \{ \dfrac{\varvec{\lambda }^\top \varvec{\Omega }\varvec{B}^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })}{\sqrt{1 + (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })^\top (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1} (\widetilde{\varvec{x}} - \varvec{a} - \varvec{B} \varvec{\xi })}} \bigg \},\) thus it follows that \(\widetilde{\varvec{X}} \sim \mathcal {MSN}_p(\varvec{a} + \varvec{B} \varvec{\xi },\varvec{B} \varvec{\Psi }\varvec{B} ^\top , \varvec{B} \varvec{\eta })\). \(\square \)
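For completeness, the two identities for \(\varvec{C}\) used in this proof can be checked directly. The lines below are only a sketch: the shorthand \(\varvec{M} = (\varvec{B} \varvec{\Omega }\varvec{B}^\top )^{-1}\) is introduced here for brevity and does not appear in the original argument, and the symmetry of \(\varvec{\Omega }\) is used so that \(\varvec{C}^\top = I _p - \varvec{\Omega }\varvec{B}^\top \varvec{M} \varvec{B}\).

$$\begin{aligned} \varvec{B} \varvec{C}^\top&= \varvec{B} - (\varvec{B} \varvec{\Omega }\varvec{B}^\top ) \varvec{M} \varvec{B} = \varvec{B} - \varvec{B} = \varvec{0},\\ \varvec{C}^\top \varvec{\Omega }\varvec{C}&= (I _p - \varvec{\Omega }\varvec{B}^\top \varvec{M} \varvec{B})\, \varvec{\Omega }\, (I _p - \varvec{B}^\top \varvec{M} \varvec{B} \varvec{\Omega })\\&= \varvec{\Omega }- 2\, \varvec{\Omega }\varvec{B}^\top \varvec{M} \varvec{B} \varvec{\Omega }+ \varvec{\Omega }\varvec{B}^\top \varvec{M} (\varvec{B} \varvec{\Omega }\varvec{B}^\top ) \varvec{M} \varvec{B} \varvec{\Omega }\\&= \varvec{\Omega }- \varvec{\Omega }\varvec{B}^\top \varvec{M} \varvec{B} \varvec{\Omega }. \end{aligned}$$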

1.8 Proof of Corollary 3

By considering the partitions of \(\varvec{X}, \varvec{\xi },\varvec{\Psi }\), and \(\varvec{\eta }\), as in Eq. (17), the mpdf of \(\varvec{X}_1\) can be found using Proposition 5, putting \(\varvec{a} = \varvec{0}\) and \(\varvec{B} = (I _{p_1}\, \varvec{0})\) to obtain

$$\begin{aligned} f_{\varvec{X}_1}(\varvec{x}_1)&= 2 \phi _{p_1}(\varvec{x}_1; \varvec{\xi }_1, \varvec{\Omega }_{11} ) \int _{{\mathbb {R}}^p} \Phi \bigg \{ \dfrac{(\varvec{\lambda }_1 + \varvec{\Omega }_{11}^{-1} \varvec{\Omega }_{12} \varvec{\lambda }_2)^\top (\varvec{x}_1 - \varvec{\xi }_1)}{\sqrt{1 + \varvec{y}_2^\top (\varvec{\Omega }_{22}- \varvec{\Omega }_{21} \varvec{\Omega }_{11} ^{-1} \varvec{\Omega }_{12} )\varvec{y}_2 }} \bigg \}\\&\quad \times \phi _p\left( \varvec{y}; \varvec{\lambda },\varvec{\Omega }^{-1}\right) d \varvec{y}\\&= 2 \phi _{p_1}(\varvec{x}_1; \varvec{\xi }_1, \varvec{\Omega }_{11} ) \int _{{\mathbb {R}}^{p_2}} \Phi \bigg \{ \dfrac{(\varvec{\lambda }_1 + \varvec{\Omega }_{11}^{-1} \varvec{\Omega }_{12} \varvec{\lambda }_2)^\top (\varvec{x}_1 - \varvec{\xi }_1)}{\sqrt{1 + \varvec{y}_2^\top (\varvec{\Omega }_{22}- \varvec{\Omega }_{21} \varvec{\Omega }_{11} ^{-1} \varvec{\Omega }_{12} )\varvec{y}_2 }} \bigg \}\\&\quad \times \phi _{p_2}\left( \varvec{y}_2; \varvec{\lambda }_2,\varvec{\Omega }_{22\cdot 1} ^{-1}\right) d \varvec{y}_2,~\varvec{x}_1 \in {\mathbb {R}}^{p_1}, \end{aligned}$$

where \(\varvec{y} = (\varvec{y}_1^\top , \varvec{y}_2^\top )^\top \) and \(\varvec{\lambda }= (\varvec{\lambda }_1^\top , \varvec{\lambda }_2^\top )^\top \), with \(\varvec{y}_i, \varvec{\lambda }_i \in {\mathbb {R}}^{p_i}\), for \(i = 1,2\), and \(\varvec{\Omega }= (\varvec{\Omega }_{ij})\), with \(\varvec{\Omega }_{ij} = \varvec{\Psi }_{ij} + \varvec{\eta }_i \varvec{\eta }_j ^\top \), \(i,j =1,2\). Finally, by using the relations described at the beginning of the corollary we have, after some algebra, that \(\dfrac{\varvec{\lambda }_1 + \varvec{\Omega }_{11}^{-1} \varvec{\Omega }_{12} \varvec{\lambda }_2}{\sqrt{1 + \varvec{y}_2^\top (\varvec{\Omega }_{22}- \varvec{\Omega }_{21} \varvec{\Omega }_{11} ^{-1} \varvec{\Omega }_{12} ) \varvec{y}_2}}=\dfrac{1}{\sqrt{1 + \varvec{y}_2^\top \varvec{\Omega }_{22\cdot 1} \varvec{y}_2 }\sqrt{1-\varvec{\eta }^\top \varvec{\Omega }^{-1} \varvec{\eta }}}\,\varvec{\Omega }_{11}^{-1}\varvec{\eta }_1=\sqrt{\dfrac{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }}{1 + \varvec{y}_2^\top \varvec{\Omega }_{22\cdot 1}\varvec{y}_2}}\dfrac{1}{\sqrt{1+\varvec{\eta }_1^\top \varvec{\Psi }_{11}^{-1} \varvec{\eta }_1}}\,\varvec{\Psi }_{11}^{-1}\varvec{\eta }_1=\varvec{\lambda }_{1\cdot 2}\). \(\square \)

1.9 Proof of Proposition 6

To prove this, we will show that the density function of \(\mathcal {MSN}_1(0,1, \eta )\) is not a log-concave function. The log-density of \(\mathcal {MSN}_1(0,1, \eta )\) is

$$\begin{aligned} \log \{f(x)\} = \log \Bigg [ 2 \phi (x;0,1+\eta ^2) \Phi \Bigg \{ \dfrac{\eta x/\sqrt{1+\eta ^2}}{\sqrt{1+x^2/(1+\eta ^2)}} \Bigg \} \Bigg ]. \end{aligned}$$

The second derivative of \(\log \{f(x)\}\) with respect to x is

$$\begin{aligned} \dfrac{\textrm{d}^2 \log \{f(x)\}}{\textrm{d}x^2}&= \dfrac{1}{1+\eta ^2} \Bigg [-1 - \dfrac{\eta ^2}{(1+t^2)^3} \Bigg \{ \dfrac{\phi (\eta t/\sqrt{1+t^2})}{\Phi (\eta t/\sqrt{1+t^2})} \Bigg \}^2\\&\quad - \dfrac{1}{\Phi (\eta t/\sqrt{1+t^2})} \Bigg \{ \dfrac{3\eta t \phi (\eta t/\sqrt{1+t^2}) }{(1+t^2)^{5/2}} \\&\quad + \dfrac{\eta ^3 t \phi (\eta t/\sqrt{1+t^2}) }{(1+t^2)^{7/2}} \Bigg \} \Bigg ], \end{aligned}$$

where \(t = x/\sqrt{1+\eta ^2}\). It has been found numerically that the sign of \(\dfrac{\textrm{d}^2 \log \{f(x)\}}{\textrm{d}x^2}\) changes for \(x \in (-10,10)\) for various values of \(\eta \). Thus, \(\log \{f(x)\}\) is not always concave, that is, \(f(x)\) is not always log-concave. \(\square \)
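The numerical check mentioned in this proof can be reproduced along the following lines. This is a minimal sketch and not the authors' code: it evaluates the log-density of \(\mathcal {MSN}_1(0,1, \eta )\) given above and approximates its second derivative by central finite differences, so it does not depend on any closed-form expression. The function names, the step size `h` and the grid size are illustrative assumptions.

```python
import math


def log_density(x, eta):
    """Log-density of MSN_1(0, 1, eta), following the display in the proof."""
    var = 1.0 + eta ** 2
    t = x / math.sqrt(var)
    z = eta * t / math.sqrt(1.0 + t ** 2)
    log_phi = -0.5 * math.log(2.0 * math.pi * var) - 0.5 * x ** 2 / var
    # Phi(z) = 0.5 * erfc(-z / sqrt(2)); erfc keeps accuracy in the left tail.
    log_Phi = math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))
    return math.log(2.0) + log_phi + log_Phi


def second_derivative(x, eta, h=1e-3):
    """Central finite-difference approximation of d^2 log f / dx^2 (h is an assumption)."""
    return (log_density(x + h, eta) - 2.0 * log_density(x, eta)
            + log_density(x - h, eta)) / h ** 2


def has_sign_change(eta, lo=-10.0, hi=10.0, n=2001):
    """True if the approximate second derivative takes both signs on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    values = [second_derivative(x, eta) for x in xs]
    return min(values) < 0.0 < max(values)


if __name__ == "__main__":
    for eta in (0.0, 10.0):
        print(eta, has_sign_change(eta))
```

For \(\eta = 0\) the density is normal and no sign change is expected, whereas for a strongly skewed case such as \(\eta = 10\) the second derivative is negative near \(x = 0\) and positive near \(x = -10\), which is the behaviour the proof relies on.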

1.10 Proof of Proposition 7

Let \(X \sim \mathcal {MSN}_1(\xi ,\Psi ,\eta )\). Then, the pdf of X is

$$\begin{aligned} f_X (x) = 2 \phi (x;\xi ,\Psi + \eta ^2) \Phi \bigg \{ \dfrac{\eta }{\sqrt{\Psi }} \dfrac{(x - \xi )}{\sqrt{\Psi + \eta ^2 + (x- \xi )^2}} \bigg \}, \quad x \in {\mathbb {R}}. \end{aligned}$$

For \(x < \xi \) and for \(\eta > 0, \dfrac{\eta }{\sqrt{\Psi }} \dfrac{(x - \xi )}{\sqrt{\Psi + \eta ^2 + (x- \xi )^2}} \rightarrow -\infty \), and for \(x > \xi \), \( \dfrac{\eta }{\sqrt{\Psi }} \dfrac{(x - \xi )}{\sqrt{\Psi + \eta ^2 + (x- \xi )^2}} \rightarrow \infty \), when \(\Psi \rightarrow 0\). As a consequence, we have, as \(\Psi \rightarrow 0\): \(f_X(x) \rightarrow 0\) if \(x < \xi \) and \(f_X(x) \rightarrow 2 \phi (x;\xi , \eta ^2)\) if \(x > \xi \), which completes the proof for \(\eta >0\). The proof for \(\eta < 0\) is similar. \(\square \)
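The limit in this proof can be illustrated numerically. The sketch below, with hypothetical function names, evaluates the pdf of \(\mathcal {MSN}_1(\xi ,\Psi ,\eta )\) as displayed above for a very small \(\Psi \) and compares it with the half-normal limit for \(\eta > 0\); the variance parametrization of \(\phi (x;\xi ,\sigma ^2)\) is taken from the paper's notation.

```python
import math


def normal_pdf(x, mean, var):
    """Normal density phi(x; mean, var), with the variance as third argument."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(z):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def msn1_pdf(x, xi, psi, eta):
    """Pdf of MSN_1(xi, psi, eta) as written in the proof of Proposition 7."""
    d = x - xi
    arg = (eta / math.sqrt(psi)) * d / math.sqrt(psi + eta ** 2 + d ** 2)
    return 2.0 * normal_pdf(x, xi, psi + eta ** 2) * normal_cdf(arg)


def half_normal_limit(x, xi, eta):
    """Limit as psi -> 0 stated in the proof for eta > 0 (x = xi is not covered)."""
    return 2.0 * normal_pdf(x, xi, eta ** 2) if x > xi else 0.0


if __name__ == "__main__":
    xi, eta = 0.0, 1.5
    for psi in (1.0, 1e-2, 1e-4, 1e-10):
        print(psi, msn1_pdf(0.7, xi, psi, eta), msn1_pdf(-0.7, xi, psi, eta))
    print("limit", half_normal_limit(0.7, xi, eta), half_normal_limit(-0.7, xi, eta))
```

As \(\Psi \) decreases, the value at \(x > \xi \) should approach \(2 \phi (x;\xi , \eta ^2)\) and the value at \(x < \xi \) should approach zero.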

1.11 Proof of Proposition 8

The likelihood function for \((\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) based on a random sample \(\varvec{x}_1,\ldots ,\varvec{x}_n\) from \(\varvec{X} \sim \mathcal {MSN}_p(\varvec{\xi }, \varvec{\Psi }, \varvec{\eta })\) is

$$\begin{aligned}{} & {} L(\varvec{\xi },\varvec{\Psi },\varvec{\eta }) = \prod _{i=1}^n 2 \phi _p\left( \varvec{x}_i;\varvec{\xi },\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top \right) \\ {}{} & {} \quad \times \Phi \bigg \{ \dfrac{1}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }}} \dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{x}_i - \varvec{\xi })}{\sqrt{1+(\varvec{x}_i-\varvec{\xi })^\top (\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top ) ^{-1} (\varvec{x}_i -\varvec{\xi })}}\bigg \}, \end{aligned}$$

which becomes the profile likelihood function for the skewness parameter \(\varvec{\eta }\) when \(\varvec{\xi }\) and \(\varvec{\Psi }\) are fixed. Similar to the \(\mathcal{S}\mathcal{N}\), it can be argued exactly in the same way as in the proof of Proposition 13 that the maximum likelihood estimator of the skewness parameter is always finite for the \(\mathcal {MSN}\) distribution as well. Indeed, as in the \(\mathcal{S}\mathcal{N}\) case, for the skewing function of the \(\mathcal {MSN}\) distribution we have

$$\begin{aligned} 0&\le \Phi \bigg \{-\sqrt{(\varvec{x}_i - \varvec{\xi })^\top \varvec{\Psi }^{-1} (\varvec{x}_i -\varvec{\xi })} \bigg \}\\ {}&\le \Phi \bigg \{ \dfrac{1}{\sqrt{1+\varvec{\eta }^\top \varvec{\Psi }^{-1} \varvec{\eta }}} \dfrac{\varvec{\eta }^\top \varvec{\Psi }^{-1} (\varvec{x}_i - \varvec{\xi })}{\sqrt{1+(\varvec{x}_i-\varvec{\xi })^\top (\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top ) ^{-1} (\varvec{x}_i -\varvec{\xi })}}\bigg \}\\&\le \Phi \bigg \{\sqrt{(\varvec{x}_i - \varvec{\xi })^\top \varvec{\Psi }^{-1} (\varvec{x}_i -\varvec{\xi })} \bigg \} \le 1,\quad \forall \, \varvec{\eta }\in {\mathbb {R}}^p. \end{aligned}$$

Also, as we already have established in the proof of Proposition 13, \(\phi _p (\varvec{x}_i;\varvec{\xi },\varvec{\Psi }+ \varvec{\eta }\varvec{\eta }^\top ) \rightarrow 0\) for each \(~i=1,\ldots ,n\) as some (or all) components of \(\varvec{\eta }\) tend to \(\pm \infty \). Hence, for any fixed value of \(\varvec{\xi }\in {\mathbb {R}}^p\) and \(\varvec{\Psi }>0\), we can say that \(L(\varvec{\xi },\varvec{\Psi },\varvec{\eta }) \rightarrow 0\) whenever some (or all) components of \(\varvec{\eta }\) tend to \(\pm \infty \). This observation leads us to the conclusion that \(L(\varvec{\xi },\varvec{\Psi },\varvec{\eta })\) is not a monotonically increasing or decreasing function of any of the components of \(\varvec{\eta }\). Thus, the profile likelihood of the skewness parameter \(\varvec{\eta }\) is always maximized at a finite point for the \(\mathcal {MSN}\) family.
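To illustrate the conclusion in the univariate case, the following sketch evaluates the log-likelihood in \(\eta \) for fixed \(\xi \) and \(\Psi \), using the \(\mathcal {MSN}_1\) pdf from the proof of Proposition 7. The sample is hypothetical and chosen so that all observations lie on the same side of \(\xi \), and the function names are assumptions made for this example only.

```python
import math


def log_normal_cdf(z):
    """Logarithm of the standard normal cdf."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def msn1_loglik(eta, sample, xi=0.0, psi=1.0):
    """Log-likelihood of eta for MSN_1(xi, psi, eta) with xi and psi held fixed."""
    var = psi + eta ** 2
    total = 0.0
    for x in sample:
        d = x - xi
        log_phi = -0.5 * math.log(2.0 * math.pi * var) - 0.5 * d ** 2 / var
        arg = (eta / math.sqrt(psi)) * d / math.sqrt(var + d ** 2)
        total += math.log(2.0) + log_phi + log_normal_cdf(arg)
    return total


if __name__ == "__main__":
    # Hypothetical sample with every observation above xi = 0.
    sample = [0.5, 1.0, 1.5]
    for eta in (0.0, 1.0, 2.0, 5.0, 50.0, 1000.0):
        print(eta, msn1_loglik(eta, sample))
```

For this sample the log-likelihood first increases from its value at \(\eta = 0\) and then decreases without bound as \(\eta \) grows, because the normal factor \(\phi (x_i;\xi ,\Psi + \eta ^2)\) vanishes while the skewing factor stays bounded.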

1.12 Proof of Proposition 9

The non-singularity of the matrix \(\varvec{i}_{\varvec{\Psi }\varvec{\Psi }}\) is just a special case of a more general result proven in Hallin and Paindaveine (2006). Thus, the Fisher information matrix \(\varvec{I}(\varvec{\xi },\varvec{\Psi }, \varvec{0})\) is nonsingular if

$$\begin{aligned} \varvec{I}_{\varvec{\xi }\varvec{\eta }} = \begin{bmatrix} \varvec{i}_{\varvec{\xi }\varvec{\xi }} &{} \varvec{i}_{\varvec{\xi }\varvec{\eta }}\\ \varvec{i}_{\varvec{\xi }\varvec{\eta }} &{} \varvec{i}_{\varvec{\eta }\varvec{\eta }} \end{bmatrix} \end{aligned}$$

is nonsingular. Let \({\mathbb {E}} \left( \dfrac{\varvec{Z} \varvec{Z}^\top }{\sqrt{1+\varvec{Z}^\top \varvec{Z}}} \right) = \varvec{U}\) and \({\mathbb {E}} \left( \dfrac{\varvec{Z} \varvec{Z}^\top }{1+\varvec{Z}^\top \varvec{Z}}\right) = \varvec{V}\). Then, \(\varvec{I}_{\varvec{\xi }\varvec{\eta }}\) can be written as

$$\begin{aligned} \varvec{I}_{\varvec{\xi }\varvec{\eta }} = \varvec{\Psi }^{-1/2}\begin{bmatrix} I _p &{} \sqrt{\dfrac{2}{\pi }} \varvec{U}\\ \sqrt{\dfrac{2}{\pi }} \varvec{U} &{} \dfrac{2}{\pi } \varvec{V} \end{bmatrix} \varvec{\Psi }^{-1/2}. \end{aligned}$$

Thus, we conclude that \(\varvec{I}(\varvec{\xi },\varvec{\Psi }, \varvec{0})\) is nonsingular iff the matrix \(\varvec{V} - \varvec{U}^2\) is nonsingular.
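One way to see this equivalence, offered here as a sketch rather than as the authors' own argument, is to take the determinant of the inner block matrix through the Schur complement of its upper-left block \(I _p\):

$$\begin{aligned} \det \begin{bmatrix} I _p &{} \sqrt{\dfrac{2}{\pi }} \varvec{U}\\ \sqrt{\dfrac{2}{\pi }} \varvec{U} &{} \dfrac{2}{\pi } \varvec{V} \end{bmatrix} = \det (I _p)\, \det \left( \dfrac{2}{\pi } \varvec{V} - \dfrac{2}{\pi } \varvec{U}^2 \right) = \left( \dfrac{2}{\pi }\right) ^{p} \det (\varvec{V} - \varvec{U}^2), \end{aligned}$$

which is nonzero exactly when \(\varvec{V} - \varvec{U}^2\) is nonsingular.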

Let \(R=\Vert \varvec{Z}\Vert \), \(\varvec{W}=\varvec{Z}/R\) and \(\varvec{Z}_*=\varvec{Z}/\sqrt{1+R^2}.\) Since \(\varvec{Z}= R \varvec{W}\sim {\mathcal {N}}_p(\varvec{0},I _p)\), R and \(\varvec{W}\) are independent. Also, we know that \({\mathbb {E}}(\varvec{W})=\varvec{0}\) and \({\mathbb {V}} ar (\varvec{W})={\mathbb {E}}(\varvec{W} \varvec{W}^\top )=(1/p) I _p\). Now, note that \(\varvec{Z}_*=R_* \varvec{W}\), where \(R_*=R/\sqrt{1+R^2}\) is a function of R alone, and so \(R_*\) is independent of \(\varvec{W}\). Then, we have \(\varvec{U}={\mathbb {C}} ov (\varvec{Z},\varvec{Z}_*)={\mathbb {C}} ov (R \varvec{W},R_* \varvec{W})={\mathbb {E}}(RR_*){\mathbb {E}}(\varvec{W} \varvec{W}^\top )=(1/p){\mathbb {E}}(R^2/\sqrt{1+R^2})I _p\) and \(\varvec{V}={\mathbb {V}}ar (\varvec{Z}_*)={\mathbb {V}}ar (R_* \varvec{W})={\mathbb {E}}(R_*^2){\mathbb {E}}(\varvec{W} \varvec{W}^\top )=(1/p){\mathbb {E}}(R_*^2)I _p\). Hence, \(\varvec{V}- \varvec{U}^2\) is positive definite iff \(p{\mathbb {E}}(R_*^2)-\{{\mathbb {E}}(RR_*)\}^2>0\), i.e., iff \(p{\mathbb {E}}\{R^2/(1+R^2)\}-\{{\mathbb {E}}(R^2/\sqrt{1+R^2})\}^2>0\).

Now, by the Cauchy–Schwarz inequality, we get

$$\begin{aligned} \bigg \{ {\mathbb {E}} \left( \dfrac{R^2}{\sqrt{1+R^2}} \right) \bigg \}^2&\le {\mathbb {E}}(R^2)\, {\mathbb {E}} \left( \dfrac{R^2}{1+R^2} \right) \\ \Rightarrow \quad p{\mathbb {E}}\{R^2/(1+R^2)\}-\{{\mathbb {E}}(R^2/\sqrt{1+R^2})\}^2&\ge 0, \quad \text {as } {\mathbb {E}}(R^2)=p. \end{aligned}$$

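Equality in the previous inequality would require \(R\) and \(R_*=R/\sqrt{1+R^2}\) to be proportional almost surely, that is, \(R\) to be almost surely constant, which cannot happen because \(R^2\) follows a chi-squared distribution with p degrees of freedom. We therefore conclude that \(\varvec{V}- \varvec{U}^2\) is positive definite, and, consequently, \(\varvec{I}(\varvec{\xi },\varvec{\Psi }, \varvec{0})\) is also positive definite. \(\square \)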
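The strict inequality can also be checked numerically for small dimensions. The sketch below is an illustration under stated assumptions, not part of the original proof: it uses the fact that \(R = \Vert \varvec{Z}\Vert \) follows a chi distribution with p degrees of freedom, and approximates the expectations with a composite Simpson rule. The function names, the truncation point `upper` and the number of subintervals `n` are choices made for this example.

```python
import math


def chi_expectation(g, p, upper=40.0, n=20000):
    """Approximate E[g(R)] for R = ||Z||, Z ~ N_p(0, I_p), by composite Simpson.

    The chi density is 2^(1 - p/2) r^(p-1) exp(-r^2/2) / Gamma(p/2) for r > 0;
    the integral is truncated at `upper`, where the density is negligible.
    """
    log_norm = (1.0 - 0.5 * p) * math.log(2.0) - math.lgamma(0.5 * p)

    def integrand(r):
        return g(r) * r ** (p - 1) * math.exp(log_norm - 0.5 * r * r)

    h = upper / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def gap(p):
    """p E{R^2/(1+R^2)} - {E(R^2/sqrt(1+R^2))}^2, the quantity shown to be positive."""
    e_v = chi_expectation(lambda r: r * r / (1.0 + r * r), p)
    e_u = chi_expectation(lambda r: r * r / math.sqrt(1.0 + r * r), p)
    return p * e_v - e_u ** 2


if __name__ == "__main__":
    for p in range(1, 7):
        print(p, chi_expectation(lambda r: r * r, p), gap(p))
```

The second printed column serves as a sanity check that the quadrature recovers \({\mathbb {E}}(R^2)=p\); the third column should be strictly positive for every p.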

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Mondal, S., Arellano-Valle, R.B. & Genton, M.G. A multivariate modified skew-normal distribution. Stat Papers 65, 511–555 (2024). https://doi.org/10.1007/s00362-023-01397-1
