# Multivariate skew distributions with mode-invariance through the transformation of scale


## Abstract

The skew-symmetric distribution is often used as a skew distribution, but it is not always unimodal even when the underlying distribution is unimodal. Recently, another type of skew distribution was proposed using the transformation of scale (ToS). It is always unimodal and shows monotonicity of skewness. In this paper, a multivariate skew distribution is considered using the ToS. A skewness measure for the multivariate skew distribution is proposed, and the monotonicity of skewness is shown. The proposed multivariate skew distribution is more flexible than the conventional multivariate skew-symmetric distributions, as illustrated in numerical examples. Additional properties are also presented, including random number generation, half distribution, parameter orthogonality, non-degenerate Fisher information and entropy maximization distribution.

## Keywords

Multivariate skew distributions; Skewness; Transformation of scale; Unimodality

## 1 Introduction

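Let \(\psi (x)\) be a symmetric density function and let *G*(*x*) be a distribution function. A skew-symmetric distribution is given by

\[ f(x)=2\psi (x)G(\lambda x), \]

where \(\lambda\) is a skew parameter. The function \(G(\lambda x)\) was extended to a more general function by Wang et al. (2004). Various related studies were introduced and reviewed by Genton (2004) and Azzalini (2005).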

A skew distribution through the transformation of scale (ToS) has the density

\[ f(x)=\psi (r(x)), \]

where the transformation *r* includes the skew parameter \(\lambda\). The simple condition \(r'(x)>0\) ensures unimodality of *f*(*x*) when \(\psi (y)\) is symmetric and unimodal. The normalizing constant remains unchanged for any transformation *r* when \(s'(y)+s'(-y)=2\) with \(s(y)=r^{-1}(y)\), because \(\int \psi (r(x))dx=\int \psi (y)s'(y)dy=\int \psi (y)\{s'(y)+s'(-y)\}/2\,dy=1\) for symmetric \(\psi\). Jones (2014) proposed some examples of the transformation *r* and showed monotonicity of skewness when *r* is of a special type. Let \(s(y)=y+H(y)\). Fujisawa and Abe (2015) imposed the condition \(H(0)=0\) for mode invariance and showed a more general monotonicity of skewness under monotonicity of *H*.
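As a quick numerical check of the normalization condition \(s'(y)+s'(-y)=2\) and of mode invariance, the following minimal sketch uses a hypothetical choice \(H(y;\lambda )=\lambda (\sqrt{1+y^2}-1)\) with \(|\lambda |<1\) (our own illustration, not the transformation used later in the paper) and the standard normal \(\psi\). This *H* is even with \(H(0)=0\), so the condition holds, and its inverse \(r=s^{-1}\) has a closed form obtained by solving a quadratic.

```python
import numpy as np

# Hypothetical H for illustration only: H(y; lam) = lam * (sqrt(1 + y^2) - 1), |lam| < 1.
# H is even with H(0) = 0, so s'(y) + s'(-y) = 2 and the mode stays at x = 0.

def s(y, lam):
    """s(y) = y + H(y; lam), the inverse of the transformation r."""
    return y + lam * (np.sqrt(1.0 + y**2) - 1.0)

def r(x, lam):
    """Closed-form inverse of s for |lam| < 1 (the admissible root of a quadratic)."""
    u = x + lam
    return (u - lam * np.sqrt(u**2 + 1.0 - lam**2)) / (1.0 - lam**2)

def normal_pdf(y):
    return np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)

def tos_density(x, lam):
    """ToS skew density f(x) = psi(r(x)) with psi the standard normal density."""
    return normal_pdf(r(x, lam))

lam = 0.6
x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]
f = tos_density(x, lam)

total = f.sum() * dx          # Riemann sum; the mass outside [-20, 20] is negligible
mode = x[np.argmax(f)]        # grid point with the largest density
print(total, mode)
```

For any \(\lambda\) in \((-1,1)\) the Riemann sum should stay close to 1 and the grid maximizer close to 0, while \(\lambda >0\) gives a heavier right tail.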

Extensions of univariate skew distributions to the multivariate case have been studied. Azzalini and Dalla Valle (1996) proposed a multivariate skew-normal distribution using the idea of a univariate skew-normal distribution. Its extensions were discussed by Azzalini and Capitanio (1999), Azzalini and Capitanio (2003), Ma and Genton (2004) and Wang et al. (2004). The skew-symmetric distribution suffers from the singularity of the Fisher information matrix. This singularity has been investigated further (Arellano-Valle and Azzalini 2008; Ley and Paindaveine 2010; Hallin and Ley 2012).

Recently, Jones (2016) proposed a different type of multivariate skew distribution through ToS. The underlying multivariate distribution is restricted to be sign-symmetric, but a distinguishing feature is that the marginal distributions again belong to a class of skew distributions through ToS. In this paper, we propose another type of multivariate skew distribution through ToS. The underlying multivariate distribution consists of independent univariate skew distributions through ToS, and the correlation structure is incorporated by a multivariate affine transformation. This idea is often used, but a remarkable feature is that the proposed multivariate skew distribution has a natural monotonicity of skewness for any correlation structure. In addition, the proposed distribution is expected to have more flexible skewness than multivariate skew-symmetric distributions, and it does not suffer from the singularity of the Fisher information matrix.

This paper is organized as follows. In Sect. 2, the multivariate skew distribution through the transformation of scale is proposed and some examples are illustrated. In Sect. 3, a skewness measure for the multivariate distribution is proposed and the monotonicity of skewness is shown. Additional properties are presented in Sect. 4, including random number generation, half distribution, parameter orthogonality, non-degenerate Fisher information and entropy maximization distribution. Numerical examples in Sect. 5 demonstrate that the proposed multivariate skew distribution is more flexible than the skew-*t* distribution.

## 2 Multivariate skew distribution

### 2.1 Density

Here, we mention the difference in the role of the skew parameters between the density (2.1) and the multivariate skew-symmetric density \(2 G({\varvec{\lambda }}'\varvec{x}) \psi (\varvec{x};\varOmega )\). In the multivariate skew-symmetric density, the skewness depends on \(G({\varvec{\lambda }}'\varvec{x})\) only through the scalar \({\varvec{\lambda }}'\varvec{x}\). Hence, the skew parameters \(\lambda _1,\ldots ,\lambda _p\) do not act separately in the multivariate skew-symmetric distribution, because \({\varvec{\lambda }}'\varvec{x}\) is one-dimensional, whereas they do act separately in the density (2.1). For example, consider the case where \(\lambda _1\) is very large and \(\lambda _2\) is not. A change in \(\lambda _2\) has little effect on the multivariate skew-symmetric distribution, but a noticeable effect on the density (2.1).
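Equation (2.1) is not reproduced in this version of the text. The minimal sketch below assumes it has the form \(f^*(\varvec{x};\varvec{\mu },\varSigma ,{\varvec{\lambda }})=|\varSigma |^{-1/2}\prod _j \psi _j(r_j(z_j;\lambda _j))\) with \(\varvec{z}=\varSigma ^{-1/2}(\varvec{x}-\varvec{\mu })\), which is consistent with the use of \(\varSigma ^{-1/2}\) in Theorem 1 and in the half-domain description of Sect. 4, but it is our assumption. Each \(\lambda _j\) enters only through its own coordinate \(z_j\), which is the separation of roles described above. The transformation uses the hypothetical \(H(y;\lambda )=\lambda (\sqrt{1+y^2}-1)\), \(|\lambda |<1\), and every \(\psi _j\) is standard normal; the checks are that the density integrates to one and has its mode at \(\varvec{\mu }\).

```python
import numpy as np

def r(x, lam):
    """Inverse of s(y) = y + lam * (sqrt(1 + y^2) - 1); hypothetical H, |lam| < 1."""
    u = x + lam
    return (u - lam * np.sqrt(u**2 + 1.0 - lam**2)) / (1.0 - lam**2)

def inv_sqrt(sigma):
    """Symmetric inverse square root Sigma^{-1/2} from an eigendecomposition."""
    w, v = np.linalg.eigh(sigma)
    return (v / np.sqrt(w)) @ v.T

def log_density(x, mu, sigma, lams):
    """Assumed form of (2.1): |Sigma|^{-1/2} prod_j psi(r_j(z_j; lam_j)), z = Sigma^{-1/2}(x - mu).

    x has shape (n, p); every psi_j is the standard normal density.
    """
    z = (x - mu) @ inv_sqrt(sigma)            # each row is (Sigma^{-1/2}(x - mu))^t
    y = r(z, lams)                            # lam_j acts on coordinate j only
    _, logdet = np.linalg.slogdet(sigma)
    p = x.shape[1]
    return -0.5 * logdet - 0.5 * np.sum(y**2, axis=1) - 0.5 * p * np.log(2.0 * np.pi)

mu = np.array([1.0, -0.5])
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
lams = np.array([0.7, -0.4])

# Riemann-sum check that the density integrates to one over a wide grid around mu.
g = np.linspace(-15.0, 15.0, 1201)
h = g[1] - g[0]
gx, gy = np.meshgrid(g + mu[0], g + mu[1])
pts = np.column_stack([gx.ravel(), gy.ravel()])
total = np.exp(log_density(pts, mu, sigma, lams)).sum() * h * h

# Mode invariance: the density at mu exceeds the density at random nearby points.
rng = np.random.default_rng(0)
nearby = mu + 0.05 * rng.standard_normal((100, 2))
mode_ok = bool(np.all(log_density(mu[None, :], mu, sigma, lams)
                      > log_density(nearby, mu, sigma, lams)))
print(total, mode_ok)
```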

### 2.2 Illustrative example

A transformation whose *H* implies favorable properties of the skew distribution was given by Fujisawa and Abe (2015). Consider the multivariate skew distribution (2.1) with \(r_j(z_j;\lambda _j)=r(z_j;\lambda _j)\) and \(\psi _j(y_j)=\phi (y_j)\), where \(\phi\) is the standard normal density. Figures 1 and 2 illustrate some shapes and contours for \(p=2\). The mode remains unchanged at zero, and the skewness monotonically increases as the skew parameter \(\lambda _1\) increases.

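Other examples of the transformation *r* were proposed by Jones (2014) and Fujisawa and Abe (2015). Many symmetric distributions can also be used as the underlying density \(\psi _j\), including the symmetric type of the sinh-arcsinh distribution (Jones and Pewsey 2009), given by

\[ \psi (y;\delta )=\frac{\delta \cosh (\delta \sinh ^{-1} y)}{\sqrt{2\pi (1+y^2)}}\exp \left\{ -\frac{1}{2}\sinh ^2(\delta \sinh ^{-1} y)\right\} , \]

where \(\delta >0\) is a tail-weight parameter.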
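The symmetric sinh-arcsinh density is straightforward to evaluate. This sketch assumes the standard Jones and Pewsey (2009) form with skewness parameter \(\varepsilon =0\), \(\psi (y;\delta )=\delta \cosh (\delta \sinh ^{-1}y)\exp \{-\sinh ^2(\delta \sinh ^{-1}y)/2\}/\sqrt{2\pi (1+y^2)}\), where \(\delta <1\) gives heavier and \(\delta >1\) lighter tails than the normal. It checks numerically that the density integrates to one and reduces to \(N(0,1)\) at \(\delta =1\).

```python
import numpy as np

def sinh_arcsinh_pdf(y, delta):
    """Symmetric sinh-arcsinh density (Jones and Pewsey 2009 with epsilon = 0).

    delta > 0 controls tail weight; delta = 1 recovers the standard normal.
    """
    w = delta * np.arcsinh(y)
    S = np.sinh(w)
    C = np.cosh(w)                      # equals sqrt(1 + S^2)
    return delta * C / np.sqrt(2.0 * np.pi * (1.0 + y**2)) * np.exp(-0.5 * S**2)

y = np.linspace(-60.0, 60.0, 600001)
h = y[1] - y[0]
for delta in (0.5, 1.0, 2.0):
    mass = sinh_arcsinh_pdf(y, delta).sum() * h   # Riemann sum over a wide grid
    print(delta, round(mass, 6))
```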

## 3 Skewness

### 3.1 Skewness measure for the multivariate case

There are some skewness measures for the univariate case. In this subsection, we extend the density-based skewness measure by Critchley and Jones (2008) to the multivariate case.

Let *g*(*x*) be a unimodal density and let \(x_{\mathrm{m}}\) be the mode of *g*. Let \(x_L(c)\) and \(x_R(c)\) be the left- and right-side solutions of \(g(x)=c g(x_{\mathrm{m}})\) for \(0<c<1\). Then \(x_{\mathrm{m}}-x_L(c)\) and \(x_R(c)-x_{\mathrm{m}}\) are the left- and right-side distances from the mode to the graph of *g* at height \(c g(x_{\mathrm{m}})\). Critchley and Jones (2008) defined the density-based skewness measure by

\[ \gamma (c)=\frac{\{x_R(c)-x_{\mathrm{m}}\}-\{x_{\mathrm{m}}-x_L(c)\}}{\{x_R(c)-x_{\mathrm{m}}\}+\{x_{\mathrm{m}}-x_L(c)\}}. \]

The density *g*(*x*) is symmetric at \(x_{\mathrm{m}}\) if and only if \(\gamma (c)=0\) for any *c*. Note that the region of *c* is \(0<c<1\) when *g* is continuous and has the whole line as its domain, but it is restricted, e.g., to the region \(\max \{ g(L+)/g(x_{\mathrm{m}}), g(R-)/g(x_{\mathrm{m}}) \}<c < 1\) when *g* is continuous and has the domain (*L*, *R*).
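For a ToS density \(f(x)=\psi (r(x))\) with symmetric unimodal \(\psi\) and mode 0, the level equation \(f(x)=c f(0)\) holds exactly when \(r(x)=\pm y_c\), where \(\psi (y_c)=c\psi (0)\). Hence \(x_R(c)=s(y_c)\), \(x_L(c)=s(-y_c)\), and because *H* is even (its derivative is odd by \(s'(y)+s'(-y)=2\), and \(H(0)=0\)), \(\gamma (c)=H(y_c)/y_c\). The sketch below checks this closed form against a direct bisection on the density, assuming standard normal \(\psi\) (so \(y_c=\sqrt{-2\log c}\)) and the hypothetical \(H(y;\lambda )=\lambda (\sqrt{1+y^2}-1)\), \(|\lambda |<1\).

```python
import numpy as np

def r(x, lam):
    """Inverse of s(y) = y + H(y), hypothetical H(y) = lam * (sqrt(1 + y^2) - 1), |lam| < 1."""
    u = x + lam
    return (u - lam * np.sqrt(u**2 + 1.0 - lam**2)) / (1.0 - lam**2)

def log_ratio(x, lam):
    """log f(x) - log f(0) for f(x) = psi(r(x)), psi standard normal (r(0) = 0)."""
    return -0.5 * r(x, lam) ** 2

def crossing(lam, c, side, tol=1e-12):
    """Bisection for the point x on one side (side = +1 or -1) of the mode 0 with f(x) = c f(0)."""
    target = np.log(c)
    lo, hi = 0.0, 1.0
    while log_ratio(side * hi, lam) > target:   # enlarge until the level is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_ratio(side * mid, lam) > target:
            lo = mid
        else:
            hi = mid
    return side * 0.5 * (lo + hi)

def gamma_numeric(lam, c):
    """Critchley and Jones measure with mode x_m = 0: (x_R + x_L) / (x_R - x_L)."""
    x_r, x_l = crossing(lam, c, 1.0), crossing(lam, c, -1.0)
    return (x_r + x_l) / (x_r - x_l)

def gamma_closed(lam, c):
    """H(y_c) / y_c with y_c = sqrt(-2 log c), the level point of the normal psi."""
    y_c = np.sqrt(-2.0 * np.log(c))
    return lam * (np.sqrt(1.0 + y_c**2) - 1.0) / y_c

for lam in (-0.5, 0.0, 0.5):
    print(lam, gamma_numeric(lam, 0.3), gamma_closed(lam, 0.3))
```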

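Next, let *g*(\(\varvec{x}\)) be a unimodal density on \(\mathbb {R}^p\) and let \(\varvec{x}_{\mathrm{m}}\) be the mode of *g*. Let \(a_+(c;\,\varvec{x}_{\mathrm{0}})\) and \(a_-(c;\,\varvec{x}_{\mathrm{0}})\) be the positive and negative solutions of \(g(\varvec{x}_{\mathrm{m}}+a\varvec{x}_{\mathrm{0}})=c g(\varvec{x}_{\mathrm{m}})\) for the direction \(\varvec{x}_{\mathrm{0}}\) (\(\Vert \varvec{x}_{\mathrm{0}}\Vert =1\)) and \(0<c<1\). In this paper, we define the skewness measure by

\[ \gamma (c;\,\varvec{x}_{\mathrm{0}})=\frac{a_+(c;\,\varvec{x}_{\mathrm{0}})+a_-(c;\,\varvec{x}_{\mathrm{0}})}{a_+(c;\,\varvec{x}_{\mathrm{0}})-a_-(c;\,\varvec{x}_{\mathrm{0}})}. \]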

The case \(\varvec{x}_{\mathrm{0}}=(1,1)^t/\sqrt{2}\) is illustrated in Fig. 2. We clearly see that as the skewness parameter increases, \(a_+(c;\,\varvec{x}_{\mathrm{0}})\) and \(a_-(c;\,\varvec{x}_{\mathrm{0}})\) increase, and hence the simple skewness measure \(a_+(c;\,\varvec{x}_{\mathrm{0}})+a_-(c;\,\varvec{x}_{\mathrm{0}})\) increases. The skewness measure \(\gamma (c;\,\varvec{x}_{\mathrm{0}})\) also appears to increase as the skewness parameter increases. The monotonicity of skewness is considered in Sect. 3.2.
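The roots \(a_\pm (c;\,\varvec{x}_{\mathrm{0}})\) can be computed by bisection along the ray \(\varvec{\mu }+a\varvec{x}_{\mathrm{0}}\). This sketch assumes that (2.1) has the form \(|\varSigma |^{-1/2}\prod _j\psi _j(r_j(z_j;\lambda _j))\) with \(\varvec{z}=\varSigma ^{-1/2}(\varvec{x}-\varvec{\mu })\), that \(\gamma (c;\,\varvec{x}_{\mathrm{0}})\) is the ratio \((a_++a_-)/(a_+-a_-)\) (the multivariate analogue of the Critchley and Jones measure, with \(a_+\) and \(-a_-\) as the right and left distances), standard normal \(\psi _j\), and the hypothetical \(H(y;\lambda )=\lambda (\sqrt{1+y^2}-1)\). The location \(\varvec{\mu }\) drops out because only \(a\varvec{x}_{\mathrm{0}}\) enters. With \(\varSigma =I\) and \(\varvec{x}_{\mathrm{0}}=(1,1)^t/\sqrt{2}\), both roots and the measure should grow with \(\lambda _1\).

```python
import numpy as np

def r(x, lam):
    """Inverse of s(y) = y + lam * (sqrt(1 + y^2) - 1); hypothetical H, |lam| < 1."""
    u = x + lam
    return (u - lam * np.sqrt(u**2 + 1.0 - lam**2)) / (1.0 - lam**2)

def inv_sqrt(sigma):
    w, v = np.linalg.eigh(sigma)
    return (v / np.sqrt(w)) @ v.T

def log_ratio(a, z0, lams):
    """log f*(mu + a x0) - log f*(mu) under the assumed (2.1); z0 = Sigma^{-1/2} x0."""
    return -0.5 * np.sum(r(a * z0, lams) ** 2)

def root(c, z0, lams, side, tol=1e-12):
    """a_+ (side = +1) or a_- (side = -1): root of f*(mu + a x0) = c f*(mu), by bisection."""
    target = np.log(c)
    lo, hi = 0.0, 1.0
    while log_ratio(side * hi, z0, lams) > target:   # bracket the level
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_ratio(side * mid, z0, lams) > target:
            lo = mid
        else:
            hi = mid
    return side * 0.5 * (lo + hi)

def directional_skewness(c, x0, sigma, lams):
    """Return a_+, a_- and the assumed ratio measure (a_+ + a_-) / (a_+ - a_-)."""
    z0 = inv_sqrt(sigma) @ x0
    a_plus, a_minus = root(c, z0, lams, 1.0), root(c, z0, lams, -1.0)
    return a_plus, a_minus, (a_plus + a_minus) / (a_plus - a_minus)

x0 = np.array([1.0, 1.0]) / np.sqrt(2.0)       # the direction illustrated in Fig. 2
sigma = np.eye(2)
lam1_grid = (-0.5, 0.0, 0.5)
results = [directional_skewness(0.3, x0, sigma, np.array([lam1, 0.0])) for lam1 in lam1_grid]
for lam1, res in zip(lam1_grid, results):
    print(lam1, res)
```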

### 3.2 Monotonicity of skewness

The following is the main theorem for monotonicity of skewness.

### Theorem 1

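Let \(f^*\) be the multivariate skew density function defined by (2.1). Assume that \(\psi _j(y_j)\) is differentiable and strictly unimodal, more precisely, \(\psi _j'(y_j)=0\) only when \(y_j=0\), and that \(\partial H_j(y_j;\lambda _j)/\partial \lambda _j>0\) except for \(y_j=0\). Let \(a_+(c;\,\varvec{x}_{\mathrm{0}})\) and \(a_-(c;\,\varvec{x}_{\mathrm{0}})\) be the values defined in Sect. 3.1 from the unimodal density \(f^*\). Let \(\varvec{z}_{\mathrm{0}}= \varSigma ^{-1/2} \varvec{x}_{\mathrm{0}}=(z_{01},\ldots ,z_{0p})^t\). Then, \(a_+(c;\,\varvec{x}_{\mathrm{0}})\) and \(a_-(c;\,\varvec{x}_{\mathrm{0}})\) are increasing in \(\lambda _j\) when \(z_{0j}>0\) and decreasing in \(\lambda _j\) when \(z_{0j}<0\), which implies that \(\gamma (c;\,\varvec{x}_{\mathrm{0}})\) is increasing in \(\lambda _j\) when \(z_{0j}>0\) and decreasing in \(\lambda _j\) when \(z_{0j}<0\).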

The theorem shows that as \(\lambda _j\) increases, the skewness measure \(\gamma (c;\,\varvec{x}_{\mathrm{0}})\) monotonically increases or decreases according to the sign of \(z_{0j}\). A distinguishing point is that the theorem requires only one essential assumption, namely the monotonicity of *H* in the skew parameter. Note that the theorem makes a stronger statement than monotonicity of skewness, because it shows the monotonicity of \(a_+(c;\,\varvec{x}_{\mathrm{0}})\) and \(a_-(c;\,\varvec{x}_{\mathrm{0}})\) themselves. In the univariate case, the proof of the monotonicity of skewness was easy because the skewness measure could be expressed in a closed form. In the multivariate case, the skewness measure cannot be expressed in a closed form, and hence the proof is more involved. The proof is given in Sect. 3.4.
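The role of the sign of \(z_{0j}\), rather than of \(x_{0j}\), can be seen numerically. This sketch assumes the form \(|\varSigma |^{-1/2}\prod _j\psi _j(r_j(z_j;\lambda _j))\) with \(\varvec{z}=\varSigma ^{-1/2}(\varvec{x}-\varvec{\mu })\) for (2.1), the ratio form \((a_++a_-)/(a_+-a_-)\) for \(\gamma (c;\,\varvec{x}_{\mathrm{0}})\), standard normal \(\psi _j\), and the hypothetical \(H(y;\lambda )=\lambda (\sqrt{1+y^2}-1)\), whose derivative in \(\lambda\) is \(\sqrt{1+y^2}-1>0\) for \(y\ne 0\) as the theorem requires. With a strongly correlated \(\varSigma\) and \(\varvec{x}_{\mathrm{0}}=(1,-1)^t/\sqrt{2}\) we get \(z_{01}>0>z_{02}\), so the measure should increase in \(\lambda _1\) and decrease in \(\lambda _2\).

```python
import numpy as np

def r(x, lam):
    """Inverse of s(y) = y + lam * (sqrt(1 + y^2) - 1); hypothetical H, |lam| < 1."""
    u = x + lam
    return (u - lam * np.sqrt(u**2 + 1.0 - lam**2)) / (1.0 - lam**2)

def inv_sqrt(sigma):
    w, v = np.linalg.eigh(sigma)
    return (v / np.sqrt(w)) @ v.T

def gamma(c, z0, lams, tol=1e-12):
    """Assumed ratio measure (a_+ + a_-)/(a_+ - a_-) along x0, with z0 = Sigma^{-1/2} x0."""
    def log_ratio(a):                       # log f*(mu + a x0) - log f*(mu)
        return -0.5 * np.sum(r(a * z0, lams) ** 2)
    roots = []
    for side in (1.0, -1.0):                # a_+ first, then a_-
        lo, hi = 0.0, 1.0
        while log_ratio(side * hi) > np.log(c):
            hi *= 2.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if log_ratio(side * mid) > np.log(c):
                lo = mid
            else:
                hi = mid
        roots.append(side * 0.5 * (lo + hi))
    a_plus, a_minus = roots
    return (a_plus + a_minus) / (a_plus - a_minus)

sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
x0 = np.array([1.0, -1.0]) / np.sqrt(2.0)
z0 = inv_sqrt(sigma) @ x0                   # here z_01 > 0 and z_02 < 0
grid = (-0.6, -0.3, 0.0, 0.3, 0.6)
vary_lam1 = [gamma(0.3, z0, np.array([lam, 0.0])) for lam in grid]
vary_lam2 = [gamma(0.3, z0, np.array([0.0, lam])) for lam in grid]
print(z0)
print(np.round(vary_lam1, 4))
print(np.round(vary_lam2, 4))
```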

### 3.3 Examples of \(H_j\)

In this subsection, we use the notation *H* instead of \(H_j\) for simplicity.

The simplest example of \(H(y;\lambda )\) satisfying the monotonicity assumption of Theorem 1 is a linear function of \(\lambda\), namely \(H(y)=\lambda H_a(y)\) with an appropriate function \(H_a(y)\). In a similar manner, many types of \(H(y;\lambda )\) can be constructed whose dependence on *y* and \(\lambda\) is separable. Another interesting type of \(H(y;\lambda )\) can be obtained from the following proposition.

### Proposition 1

Let \(h_0(y)\) be a strictly monotone increasing odd differentiable function with \(|h_0(y)|<1\). Let \(H_0(y)=\int _0^y h_0(t)dt\).

*Let*

Then, \(|\partial H(y;\lambda )/\partial y |<1\) and \(\partial H(y;\lambda )/\partial \lambda > 0\) except for \(y=0\). In addition, if \(\lim _{y\rightarrow \pm \infty } h_0(y)=\pm 1\), then \(\lim _{\lambda \rightarrow \pm \infty } H(y;\lambda )=\pm |y|\).


The property \(|\partial H(y;\lambda )/\partial y |<1\) is necessary to satisfy \(s'(y;\lambda )>0\), as mentioned in Sect. 2. The property \(\partial H(y;\lambda )/\partial \lambda > 0\) except for \(y=0\) corresponds to the assumption in Theorem 1. The last property is related to a half distribution. For details, see Sect. 4.

A typical example of \(h_0(y)\) satisfying the conditions in Proposition 1 is \(h_0(y)=2G(y)-1\), where *G*(*y*) is a distribution function whose density \(G'(y)\) is symmetric. Many examples of \(H(y;\lambda )\) can be constructed from distribution functions in this way; such examples can be seen in Jones (2014) and Fujisawa and Abe (2015).
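As an illustration, take *G* to be the standard logistic distribution function (our choice; the text does not fix *G*). Then \(h_0(y)=2G(y)-1=\tanh (y/2)\) and \(H_0(y)=2\log \cosh (y/2)\), which grows like \(|y|-2\log 2\). The sketch checks the conditions on \(h_0\) in Proposition 1, compares \(H_0\) with a trapezoidal integral of \(h_0\), and verifies the two properties used elsewhere for the linear example \(H(y;\lambda )=\lambda H_0(y)\) with \(|\lambda |<1\): \(|\partial H/\partial y|<1\) and \(\partial H/\partial \lambda =H_0(y)>0\) for \(y\ne 0\).

```python
import numpy as np

def h0(y):
    """h0(y) = 2G(y) - 1 with G the standard logistic cdf, which equals tanh(y/2)."""
    return np.tanh(0.5 * y)

def H0(y):
    """H0(y) = int_0^y h0(t) dt = 2 log cosh(y/2), written to avoid overflow."""
    a = np.abs(y)
    return a + 2.0 * np.log1p(np.exp(-a)) - 2.0 * np.log(2.0)

def trapezoid_integral(f, upper, n=100001):
    """Trapezoidal approximation of int_0^upper f(t) dt (upper may be negative)."""
    t = np.linspace(0.0, upper, n)
    ft = f(t)
    return np.sum(0.5 * (ft[1:] + ft[:-1]) * np.diff(t))

y = np.linspace(-20.0, 20.0, 4001)
away_from_zero = np.abs(y) > 1e-6

# Conditions on h0 in Proposition 1: odd, strictly increasing, |h0| < 1.
odd = bool(np.allclose(h0(-y), -h0(y)))
increasing = bool(np.all(np.diff(h0(y)) > 0))
bounded = bool(np.all(np.abs(h0(y)) < 1.0))

# Linear example H(y; lam) = lam * H0(y) with |lam| < 1:
# dH/dy = lam * h0(y) has modulus < 1, and dH/dlam = H0(y) > 0 for y != 0.
lam = 0.9
slope_ok = bool(np.all(np.abs(lam * h0(y)) < 1.0))
monotone_in_lam = bool(np.all(H0(y[away_from_zero]) > 0))

# H0 agrees with the integral of h0 and approaches |y| - 2 log 2 from above.
integral_err = max(abs(trapezoid_integral(h0, u) - H0(u)) for u in (-5.0, -0.5, 0.5, 3.0))
tail_gap = H0(20.0) - (20.0 - 2.0 * np.log(2.0))
print(odd, increasing, bounded, slope_ok, monotone_in_lam, integral_err, tail_gap)
```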

We can also obtain a sign-symmetric property when \(H_j(y_j;\lambda _j)\) is of a special type given in Proposition 1.

### Theorem 2

Let \(D=\mathrm{diag}(d_1,\ldots ,d_p)\), where \(d_j \in \{ \pm 1\}\). If the random vector \(\varvec{x}\) has the density \(f^*(\varvec{x};\varvec{\mu },\varSigma ,{\varvec{\lambda }})\), where \(H_j(y_j;\lambda _j)\) is of the special type given in Proposition 1, then \(D\varvec{x}\) has the density \(f^*(\varvec{x};D\varvec{\mu },D\varSigma D,D{\varvec{\lambda }})\).


Another interesting example of \(H(y;\lambda )\) is given in Sect. 2.2. In this case, \(h_0(y)={y}/{\sqrt{1+y^2}}\) is used with the adjusting factor \(\alpha _\lambda =1-e^{-\lambda ^2}\). This choice is special in that it simultaneously yields monotonicity of skewness, a closed form of *r* and a whole-line domain, with only one skew parameter. For details, see Fujisawa and Abe (2015).
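Theorem 2 is stated for the special type of Proposition 1, whose exact formula is not reproduced here. The identity its argument relies on, \(r(-x;\lambda )=-r(x;-\lambda )\) together with a symmetric \(\psi\), also holds for the hypothetical linear example \(H(y;\lambda )=\lambda (\sqrt{1+y^2}-1)\), because that *H* is even in *y* and odd in \(\lambda\). The sketch below therefore checks the sign-symmetry numerically for this example, assuming (2.1) has the form \(|\varSigma |^{-1/2}\prod _j\psi _j(r_j(z_j;\lambda _j))\) with \(\varvec{z}=\varSigma ^{-1/2}(\varvec{x}-\varvec{\mu })\) and standard normal \(\psi _j\). Since *D* is its own inverse with \(|\det D|=1\), the density of \(\varvec{w}=D\varvec{x}\) is \(f^*(D\varvec{w};\varvec{\mu },\varSigma ,{\varvec{\lambda }})\).

```python
import numpy as np

def r(x, lam):
    """Inverse of s(y) = y + lam * (sqrt(1 + y^2) - 1); hypothetical H, |lam| < 1."""
    u = x + lam
    return (u - lam * np.sqrt(u**2 + 1.0 - lam**2)) / (1.0 - lam**2)

def inv_sqrt(sigma):
    w, v = np.linalg.eigh(sigma)
    return (v / np.sqrt(w)) @ v.T

def log_density(x, mu, sigma, lams):
    """Assumed form of (2.1) with standard normal psi_j; x has shape (n, p)."""
    z = (x - mu) @ inv_sqrt(sigma)
    _, logdet = np.linalg.slogdet(sigma)
    p = x.shape[1]
    return -0.5 * logdet - 0.5 * np.sum(r(z, lams) ** 2, axis=1) - 0.5 * p * np.log(2 * np.pi)

rng = np.random.default_rng(1)
mu = np.array([0.3, -1.2, 0.5])
a = rng.standard_normal((3, 3))
sigma = a @ a.T + 3.0 * np.eye(3)              # a positive definite scale matrix
lams = np.array([0.6, -0.2, 0.4])
D = np.diag([1.0, -1.0, -1.0])

w = rng.standard_normal((50, 3))               # evaluation points for D x
lhs = log_density(w @ D, mu, sigma, lams)      # density of D x at w, via f*(D w; mu, Sigma, lam)
rhs = log_density(w, D @ mu, D @ sigma @ D, D @ lams)
print(np.max(np.abs(lhs - rhs)))
```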

### 3.4 Proof of Theorem 1

There exists at least one *k* with \(z_{0k}\ne 0\), because \(\Vert \varvec{z}_{\mathrm{0}}\Vert =\Vert \varSigma ^{-1/2}\varvec{x}_{\mathrm{0}}\Vert >0\). In a similar manner to that described above, it follows that

## 4 Additional properties

Various properties have been discussed for skew distributions in the univariate case, including random number generation, half distribution, parameter orthogonality, non-degenerate Fisher information, entropy maximization distribution, and so on. Some of these properties can easily be extended to the multivariate case.

Random numbers are easily generated in the univariate case (Jones 2014; Fujisawa and Abe 2015). In the multivariate case, random numbers are generated using the formula \(\varvec{x}=\varvec{\mu }+\varSigma ^{1/2}\varvec{z}\), where \(\varvec{z}=(z_1,\ldots ,z_p)^t\) and the \(z_j\) are generated independently from the univariate skew distributions \(f_j(z_j)\).
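The two-step recipe can be sketched as follows. For the univariate step we use a sign-flip construction for ToS densities: with \(V\sim \psi\), set \(Y=V\) with probability \(s'(V)/2\) and \(Y=-V\) otherwise. Then *Y* has density \(\psi (y)s'(y)/2+\psi (y)\{1-s'(-y)/2\}=\psi (y)s'(y)\) because \(s'(y)+s'(-y)=2\), and \(X=s(Y)\) has density \(\psi (r(x))\). We believe this matches the univariate generators cited above, but treat it here as an assumption. The multivariate step uses the symmetric square root \(\varSigma ^{1/2}\), so that \(\varvec{z}=\varSigma ^{-1/2}(\varvec{x}-\varvec{\mu })\) as in Theorem 1 and the half-domain description below. As before, \(\psi\) is standard normal and \(H(y;\lambda )=\lambda (\sqrt{1+y^2}-1)\), \(|\lambda |<1\), is a hypothetical choice.

```python
import numpy as np

def s(y, lam):
    """s(y) = y + H(y), hypothetical H(y) = lam * (sqrt(1 + y^2) - 1), |lam| < 1."""
    return y + lam * (np.sqrt(1.0 + y**2) - 1.0)

def s_prime(y, lam):
    return 1.0 + lam * y / np.sqrt(1.0 + y**2)

def sample_tos(n, lam, rng):
    """Draw n values from f(x) = psi(r(x)), psi = N(0, 1), by a sign-flip construction.

    With V ~ psi, Y = V with probability s'(V)/2 and Y = -V otherwise has density
    psi(y) s'(y); X = s(Y) then has density psi(r(x)).
    """
    v = rng.standard_normal(n)
    keep = rng.uniform(size=n) < 0.5 * s_prime(v, lam)
    return s(np.where(keep, v, -v), lam)

def sample_multivariate(n, mu, sigma, lams, rng):
    """x = mu + Sigma^{1/2} z with independent univariate ToS coordinates z_j."""
    w, v = np.linalg.eigh(sigma)
    sqrt_sigma = (v * np.sqrt(w)) @ v.T                   # symmetric square root
    z = np.column_stack([sample_tos(n, lam, rng) for lam in lams])
    return mu + z @ sqrt_sigma                             # rows are (Sigma^{1/2} z)^t

rng = np.random.default_rng(2024)
lam = 0.6
x = sample_tos(400_000, lam, rng)

# Reference mean E[X] = int s(y) psi(y) s'(y) dy, by a Riemann sum in y.
y = np.linspace(-12.0, 12.0, 240001)
psi = np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)
mean_ref = np.sum(s(y, lam) * psi * s_prime(y, lam)) * (y[1] - y[0])
print(x.mean(), mean_ref)

mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
xs = sample_multivariate(200_000, mu, sigma, [0.6, -0.3], rng)
print(xs.mean(axis=0))
```

The sample means should agree with the integrals up to Monte Carlo error; the sign-flip step needs \(0<s'(V)/2<1\), which holds whenever \(|H'|<1\).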

The function \(H(y)=|y|\) gives a half distribution in the univariate case (Fujisawa and Abe 2015). When we use \(H_j(y_j)=|y_j|\), the multivariate skew distribution \(f^*(\varvec{x};\varvec{\mu },\varSigma ,{\varvec{\lambda }})\) has the restricted half domain \(\{ \varvec{x}; \varvec{q}_j^t (\varvec{x}-\varvec{\mu }) \ge 0 \}\), where \(\varvec{q}_j\) is the *j*th column vector of \(\varSigma ^{-1/2}\). This type of function \(H(y;\lambda )\) can be obtained from Proposition 1 as a limiting case.

Let \(I_{\xi \eta }\) be the entry of the Fisher information matrix corresponding to the parameters \(\xi\) and \(\eta\). Let \(\kappa\) be a parameter of the underlying symmetric density \(\psi (y)=\psi (y;\kappa )\), e.g., a kurtosis parameter. In the univariate case, parameter orthogonality was shown by Jones (2014); more precisely, \(I_{\xi \kappa }=0\) for \(\xi =\mu ,\lambda\). In a similar manner, it can easily be shown that the same parameter orthogonality holds in the multivariate case, where \(\mu\) and \(\lambda\) denote components of \(\varvec{\mu }\) and \({\varvec{\lambda }}\), respectively.

For a univariate skew-symmetric distribution, the Fisher information matrix degenerates when \(\lambda =0\) (Azzalini 1985). However, for the univariate skew distribution \(f^*\), it does not degenerate under mild conditions (Fujisawa and Abe 2015). This property also holds in the multivariate case.

## 5 Numerical example

The Australian Institute of Sport (AIS) data, examined by Cook and Weisberg (1994), contain various biomedical measurements on a group of Australian athletes. The version of the data in the package ‘sn’ of the software ‘R’ includes 13 variables, 2 of which are discrete.

The skew distribution (2.1) with the transformation (2.2) and the symmetric sinh-arcsinh density in Sect. 2.2, called the skew-sinh distribution in this section, was applied to all pairs of the 11 continuous variables and compared with the skew-*t* distribution. To maximize the log-likelihood, the function ‘mst.mle’ in the package ‘sn’ was used for the skew-*t* distribution, and the optimization function ‘nlminb’ was used for the skew-sinh distribution, because ‘mst.mle’ uses ‘nlminb’ as its default optimizer. Appropriate transformations of parameters were used; e.g., the transformation \(\sigma ^2=\exp (\theta )\) is used for the variance parameter \(\sigma ^2\), because the range of \(\sigma ^2\) is \((0,\infty )\) whereas the range of \(\theta\) is the whole line. Convergence failed in 14 cases for the skew-*t* distribution and in 6 cases for the skew-sinh distribution; the latter 6 cases were among the former 14. Maximum likelihood estimation is known to be difficult when the skewness is very large (Azzalini and Capitanio 1999; Pewsey 2000), and we do not pursue this problem further here. From now on, we consider the 41 cases where convergence succeeded for both the skew-sinh and skew-*t* distributions.

*t* distribution. Here, we focus on two special cases, (WCC, Bfat) and (Fe, BMI), where the difference in the maximum log-likelihood is very large (42.02 and 34.58, respectively). The fitted skew distributions are depicted in Fig. 4. The skew-sinh distribution shows a more skewed shape for (WCC, Bfat) and a more triangular shape for (Fe, BMI).

## Notes

### Acknowledgements

Toshihiro Abe was supported in part by JSPS KAKENHI Grant Number 19K11869 and Nanzan University Pache Research Subsidy I-A-2 for the 2019 academic year. Hironori Fujisawa was supported in part by JSPS KAKENHI Grant Number 17K00065.

## References

- Arellano-Valle, R. B., & Azzalini, A. (2008). The centred parametrization for the multivariate skew-normal distribution. *Journal of Multivariate Analysis*, *99*, 1362–1382.
- Azzalini, A. (1985). A class of distributions which includes the normal ones. *Scandinavian Journal of Statistics*, *12*, 171–178.
- Azzalini, A. (2005). The skew-normal distribution and related multivariate families. *Scandinavian Journal of Statistics*, *32*, 159–200.
- Azzalini, A., & Capitanio, A. (1999). Statistical applications of the multivariate skew normal distribution. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, *61*, 579–602.
- Azzalini, A., & Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew \(t\)-distribution. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, *65*, 367–389.
- Azzalini, A., & Dalla Valle, A. (1996). The multivariate skew-normal distribution. *Biometrika*, *83*, 715–726.
- Cook, R. D., & Weisberg, S. (1994). *An introduction to regression graphics*. New York: Wiley.
- Critchley, F., & Jones, M. C. (2008). Asymmetry and gradient asymmetry functions: density-based skewness and kurtosis. *Scandinavian Journal of Statistics*, *35*, 415–437.
- Fujisawa, H., & Abe, T. (2015). A family of skew distributions with mode-invariance through transformation of scale. *Statistical Methodology*, *25*, 89–98.
- Genton, M. G. (Ed.). (2004). *Skew-elliptical distributions and their applications*. Boca Raton, FL: Chapman & Hall.
- Hallin, M., & Ley, C. (2012). Skew-symmetric distributions and Fisher information: a tale of two densities. *Bernoulli*, *18*, 747–763.
- Jones, M. C. (2014). Generating distributions by transformation of scale. *Statistica Sinica*, *24*, 749–771.
- Jones, M. C. (2016). On bivariate transformation of scale distributions. *Communications in Statistics-Theory and Methods*, *45*, 577–588.
- Jones, M. C., & Pewsey, A. (2009). Sinh-arcsinh distributions. *Biometrika*, *96*, 761–780.
- Ley, C., & Paindaveine, D. (2010). On the singularity of multivariate skew-symmetric models. *Journal of Multivariate Analysis*, *101*, 1434–1444.
- Ma, Y., & Genton, M. G. (2004). Flexible class of skew-symmetric distributions. *Scandinavian Journal of Statistics*, *31*, 459–468.
- Pewsey, A. (2000). Problems of inference for Azzalini's skewnormal distribution. *Journal of Applied Statistics*, *27*, 859–870.
- Wang, J., Boyer, J., & Genton, M. G. (2004). A skew-symmetric representation of multivariate distributions. *Statistica Sinica*, *14*, 1259–1270.