Japanese Journal of Statistics and Data Science, Volume 2, Issue 2, pp 529–544

# Multivariate skew distributions with mode-invariance through the transformation of scale

• Toshihiro Abe
• Hironori Fujisawa
Original Paper: Information Theory and Statistics

## Abstract

The skew-symmetric distribution is often used as a skew distribution, but it is not always unimodal even when the underlying distribution is unimodal. Recently, another type of skew distribution was proposed using the transformation of scale (ToS). It is always unimodal and shows the monotonicity of skewness. In this paper, a multivariate skew distribution is considered using the ToS. The skewness for the multivariate skew distribution is proposed and the monotonicity of skewness is shown. The proposed multivariate skew distribution is more flexible than the conventional multivariate skew-symmetric distributions. This is illustrated in numerical examples. Additional properties are also presented, including random number generation, half distribution, parameter orthogonality, non-degenerated Fisher information, and entropy maximization distribution.

## Keywords

Multivariate skew distributions, Skewness, Transformation of scale, Unimodality

## 1 Introduction

In recent years, there have been many studies on skew-symmetric distributions. A remarkable idea was proposed by Azzalini (1985). Let $$\psi (x)$$ be a symmetric density function and let G(x) be a distribution function. A skew-symmetric distribution is given by
\begin{aligned} 2G(\lambda x)\psi (x), \end{aligned}
where $$G'(x)$$ is a symmetric function and $$\lambda$$ is the skew parameter. In particular, when the density and distribution functions are normal, the corresponding distribution is called a skew-normal distribution. It has some useful properties, including unimodality. The multiplicative factor G was extended to a more general function by Wang et al. (2004). Various related studies were introduced and reviewed by Genton (2004) and Azzalini (2005).
The density function $$2G(\lambda x)\psi (x)$$ is very flexible, but not necessarily unimodal even if $$\psi (x)$$ is unimodal. It is often a hard task to investigate whether a new density is unimodal. To avoid it, a different type of skew distribution was considered by Jones (2014) through the transformation of scale (ToS) with the transformed density
\begin{aligned} f(x)=\psi (r(x)), \end{aligned}
where the transformation r includes the skew parameter $$\lambda$$. The simple condition $$r'(x)>0$$ ensures unimodality of f(x) when $$\psi (y)$$ is symmetric and unimodal. The normalizing constant remains unchanged for any transformation r when $$s'(y)+s'(-y)=2$$ with $$s(y)=r^{-1}(y)$$. Jones (2014) proposed some examples of the transformation r and showed monotonicity of skewness when the transformation r is of a special type. Let $$s(y)=y+H(y)$$. Fujisawa and Abe (2015) imposed the condition $$H(0)=0$$ for mode invariance and showed a more general monotonicity of skewness with monotonicity of H.

Extensions of the univariate skew distributions to multivariate skew distributions have been studied. Azzalini and Dalla Valle (1996) proposed a multivariate skew-normal distribution using the idea of a univariate skew-normal distribution. Its extensions were discussed by Azzalini and Capitanio (1999), Azzalini and Capitanio (2003), Ma and Genton (2004) and Wang et al. (2004). The skew-symmetric distribution suffers from the singularity of the Fisher information matrix. This has been further investigated (Arellano-Valle and Azzalini 2008; Ley and Paindaveine 2010; Hallin and Ley 2012).

Recently, Jones (2016) proposed a different type of multivariate skew distribution through ToS. The underlying multivariate distribution is restricted to be a sign-symmetric distribution, but a distinguishing feature is that the marginal distribution again belongs to a class of skew distributions through ToS. In this paper, we propose another type of multivariate skew distribution through ToS. The underlying multivariate distribution consists of independent univariate skew distributions through ToS. The correlation structure is incorporated by a multivariate affine transformation. This idea is often used, but a remarkable feature is that the proposed multivariate skew distribution has a natural monotonicity of skewness for any correlation structure. In addition, the proposed distribution is expected to have a more flexible skewness than multivariate skew-symmetric distributions and does not suffer from the singularity of the Fisher information matrix.

This paper is organized as follows. In Sect. 2, the multivariate skew distribution through the transformation of scale is proposed and some examples are illustrated. In Sect. 3, the skewness measure for the multivariate distribution is proposed and then the monotonicity of skewness is shown. Additional properties are presented in Sect. 4, including random number generation, half distribution, parameter orthogonality, non-degenerated Fisher information, and entropy maximization distribution. Numerical examples are given in Sect. 5, which demonstrate that the proposed multivariate skew distribution is more flexible than the skew-t distribution.

## 2 Multivariate skew distribution

### 2.1 Density

Let
\begin{aligned} s_j(y_j;\lambda _j)=y_j+H_j(y_j;\lambda _j) \qquad \text{ for } j=1,\ldots ,p. \end{aligned}
Let $$h_j(y_j;\lambda _j)=H_j'(y_j;\lambda _j)$$. We assume that $$s_j'(y_j;\lambda _j)>0$$ and $$h_j(y_j;\lambda _j)$$ is an odd function, which implies that $$|h_j(y_j;\lambda _j)|<1$$. Suppose that $$\psi _j(y_j)$$ is the underlying density which is unimodal and symmetric at zero. Let
\begin{aligned} f_j(z_j;\lambda _j) = \psi _j(r_j(z_j;\lambda _j)), \end{aligned}
where $$r_j(z_j;\lambda _j)=s_j^{-1}(z_j;\lambda _j)$$. The normalizing constant remains unchanged and the density is always unimodal. Assume that $$H_j(0;\lambda _j)=0$$. We see that the mode remains unchanged at zero (Fujisawa and Abe 2015). A basic multivariate skew distribution is proposed by
\begin{aligned} f(\varvec{z};{\varvec{\lambda }}) = \prod _{j=1}^p f_j(z_j;\lambda _j) = \prod _{j=1}^p \psi _j(r_j(z_j;\lambda _j)) = \psi (\varvec{r}(\varvec{z};{\varvec{\lambda }})), \end{aligned}
where $$\psi (\varvec{y}) = \prod _{j=1}^p \psi _j(y_j)$$. A general type through multivariate affine transformation is given by
\begin{aligned} f^*(\varvec{x};\varvec{\mu },\varSigma ,{\varvec{\lambda }})= \,&{} f\left( \varSigma ^{-1/2} (\varvec{x}-\varvec{\mu });{\varvec{\lambda }}\right) |\varSigma |^{-1/2} \nonumber \\= \,&{} \psi \left( \varvec{r}\left( \varSigma ^{-1/2} (\varvec{x}-\varvec{\mu });{\varvec{\lambda }}\right) \right) |\varSigma |^{-1/2}. \end{aligned}
(2.1)
The density has a unique mode at $$\varvec{\mu }$$. The number of skew parameters is the same as in multivariate skew-symmetric distributions (Azzalini 2005).
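As a minimal numerical sketch of (2.1) for $$p=2$$, the following Python code evaluates $$f^*$$ under some illustrative assumptions that are not fixed by the text at this point: every $$\psi _j$$ is taken to be the standard normal density, $$r_j$$ is taken to be the closed-form transformation (2.2) of Sect. 2.2, and $$\varSigma ^{-1/2}$$ is the symmetric inverse square root, computed with the closed form for 2x2 positive definite matrices. The function names (`f_star`, `inv_sqrt_2x2`) and the parameter values are hypothetical.

```python
import math

def alpha(lam):
    # Adjusting factor alpha_lambda = 1 - exp(-lambda^2) of Sect. 2.2.
    return 1.0 - math.exp(-lam * lam)

def r(z, lam):
    # Inverse transformation r(z; lambda) of Eq. (2.2); the identity when lambda = 0.
    if lam == 0.0:
        return z
    a = alpha(lam)
    w = lam * z + a
    return (w - a * math.sqrt(w * w + 1.0 - a * a)) / (lam * (1.0 - a * a))

def phi(y):
    # Standard normal density, used here as every psi_j (an assumption).
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def inv_sqrt_2x2(sigma):
    # Symmetric Sigma^{-1/2} for a symmetric positive definite 2x2 matrix, via
    # sqrt(A) = (A + s I) / t with s = sqrt(det A) and t = sqrt(tr A + 2 s).
    (a, b), (_, d) = sigma
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    ra, rb, rd = (a + s) / t, b / t, (d + s) / t
    det = ra * rd - rb * rb
    return [[rd / det, -rb / det], [-rb / det, ra / det]]

def f_star(x, mu, sigma, lams):
    # Density (2.1): psi(r(Sigma^{-1/2}(x - mu); lambda)) |Sigma|^{-1/2}.
    w = inv_sqrt_2x2(sigma)
    dx = (x[0] - mu[0], x[1] - mu[1])
    z = (w[0][0] * dx[0] + w[0][1] * dx[1],
         w[1][0] * dx[0] + w[1][1] * dx[1])
    det_sigma = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    dens = 1.0 / math.sqrt(det_sigma)
    for zj, lj in zip(z, lams):
        dens *= phi(r(zj, lj))
    return dens

# Illustrative (hypothetical) parameter values.
mu = (1.0, -2.0)
sigma = [[1.0, 0.5], [0.5, 1.0]]
lams = (1.0, -0.5)
print(f_star(mu, mu, sigma, lams))            # value at the mode mu
print(f_star((1.5, -2.0), mu, sigma, lams))   # value away from the mode
```

Under these assumptions, the value at $$\varvec{\mu }$$ equals $$\psi (\varvec{0})|\varSigma |^{-1/2}$$ for every choice of the skew parameters, which is the mode invariance described above.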

Here, we mention the difference in the role of skew parameter between the density (2.1) and the multivariate skew-symmetric density $$2 G({\varvec{\lambda }}'\varvec{x}) \psi (\varvec{x};\varOmega )$$. In the multivariate skew-symmetric density, the skewness depends on $$G({\varvec{\lambda }}'\varvec{x})$$ through $${\varvec{\lambda }}'\varvec{x}$$. Hence, the roles of skewness parameters $$\lambda _1,\ldots ,\lambda _p$$ are not independent for the multivariate skew-symmetric distribution because $${\varvec{\lambda }}'\varvec{x}$$ is one-dimensional, not multi-dimensional, but still independent for the density (2.1). For example, consider the case where $$\lambda _1$$ is very large and $$\lambda _2$$ is not large. A change of the skew parameter $$\lambda _2$$ is not significant for the multivariate skew-symmetric distribution, but significant for the density (2.1).

### 2.2 Illustrative example

Consider the function
\begin{aligned} H(y;\lambda ) = \left\{ \begin{array}{ll} \displaystyle { \alpha _\lambda \frac{\sqrt{1+\lambda ^2 y^2} - 1}{\lambda } } &{} \quad \text{ for } \lambda \ne 0, \\ \quad 0 &{} \quad \text{ for } \lambda =0, \end{array} \right. \end{aligned}
where $$\alpha _\lambda =1-e^{-\lambda ^2}$$. Let $$s(y;\lambda )=y+H(y;\lambda )$$ and $$r(z;\lambda )=s^{-1}(z;\lambda )$$. We have
\begin{aligned} r(z;\lambda ) = \left\{ \begin{array}{ll} {\displaystyle \frac{ \lambda z + \alpha _\lambda - \alpha _\lambda \sqrt{ (\lambda z+\alpha _\lambda )^2 + 1-\alpha _\lambda ^2 } }{ \lambda (1-\alpha _\lambda ^2) } } &{} \quad \text{ for } \lambda \ne 0, \\ \quad z &{} \quad \text{ for } \lambda =0. \end{array} \right. \end{aligned}
(2.2)
Let $$\phi (y)$$ be the standard normal density function. The function $$f(z;\lambda )=\phi (r(z;\lambda ))$$ is a univariate skew density function. The reason that the above function H implies favorable properties on the skew distribution was given by Fujisawa and Abe (2015). Consider the multivariate skew distribution (2.1) with $$r_j(z_j;\lambda _j)=r(z_j;\lambda _j)$$ and $$\psi _j(y_j)=\phi (y_j)$$. Figures 1 and 2 illustrate some shapes and contours for $$p=2$$. The mode remains unchanged at zero. We can see that the skewness monotonically increases as the skew parameter $$\lambda _1$$ increases.
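The closed form (2.2) and the properties claimed for $$f(z;\lambda )=\phi (r(z;\lambda ))$$ can be checked numerically. The sketch below is only an illustration: it implements $$H$$, $$s$$ and $$r$$ as defined above, verifies that $$r$$ inverts $$s$$ on a few points, and approximates the total mass of $$f$$ with a composite midpoint rule. The helper `integrate`, the grid and the value $$\lambda =1$$ are arbitrary choices, not taken from the paper.

```python
import math

def alpha(lam):
    # alpha_lambda = 1 - exp(-lambda^2).
    return 1.0 - math.exp(-lam * lam)

def H(y, lam):
    # H(y; lambda) of Sect. 2.2, equal to 0 when lambda = 0.
    if lam == 0.0:
        return 0.0
    return alpha(lam) * (math.sqrt(1.0 + lam * lam * y * y) - 1.0) / lam

def s(y, lam):
    # Forward transformation s(y; lambda) = y + H(y; lambda).
    return y + H(y, lam)

def r(z, lam):
    # Closed-form inverse of s, Eq. (2.2).
    if lam == 0.0:
        return z
    a = alpha(lam)
    w = lam * z + a
    return (w - a * math.sqrt(w * w + 1.0 - a * a)) / (lam * (1.0 - a * a))

def phi(y):
    # Standard normal density.
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def f(z, lam):
    # Univariate skew density through the transformation of scale.
    return phi(r(z, lam))

def integrate(func, lo, hi, n):
    # Composite midpoint rule on [lo, hi] with n cells (hypothetical helper).
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

lam = 1.0
round_trip_error = max(abs(r(s(y, lam), lam) - y)
                       for y in (-5.0, -1.0, -0.1, 0.0, 0.3, 2.0, 6.0))
mass = integrate(lambda z: f(z, lam), -40.0, 40.0, 80000)
print(round_trip_error)   # should be at the level of rounding error
print(mass)               # should be close to 1
print(f(0.0, lam), f(-0.1, lam), f(0.1, lam))  # largest value at z = 0
```

The total mass stays at one because $$\int \phi (r(z))\,dz=\int \phi (y)\{1+h(y)\}\,dy$$ and $$h$$ is odd, which is the normalizing-constant property stated in Sect. 2.1.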
Many types of the transformation r were proposed by Jones (2014) and Fujisawa and Abe (2015). We can also use many symmetric distributions as the underlying density $$\psi _j$$, including the symmetric type of the sinh-arcsinh distribution (Jones and Pewsey 2009), given by
\begin{aligned} \frac{\delta }{\sqrt{2\pi (1+x^2)}} C(x;\delta ) \exp \left\{ - S(x;\delta )^2/2\right\} , \end{aligned}
(2.3)
where $$S(x;\delta )=\sinh (\delta \sinh ^{-1}(x))$$ and $$C(x;\delta )=\cosh (\delta \sinh ^{-1}(x))$$.
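The density (2.3) can be coded directly. The following sketch, with the hypothetical function name `sas_density` and arbitrary values of $$\delta$$, checks three properties that follow from the formula: symmetry about zero, reduction to the standard normal density when $$\delta =1$$ (since $$\cosh (\sinh ^{-1}(x))=\sqrt{1+x^2}$$), and unit total mass, approximated here by a midpoint rule.

```python
import math

def sas_density(x, delta):
    # Symmetric sinh-arcsinh density, Eq. (2.3).
    t = delta * math.asinh(x)
    S = math.sinh(t)
    C = math.cosh(t)
    return delta * C * math.exp(-0.5 * S * S) / math.sqrt(2.0 * math.pi * (1.0 + x * x))

def phi(y):
    # Standard normal density, for comparison at delta = 1.
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def integrate(func, lo, hi, n):
    # Composite midpoint rule (hypothetical helper).
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

# delta < 1 gives heavier tails than the normal, delta > 1 lighter tails.
mass_heavy = integrate(lambda x: sas_density(x, 0.5), -200.0, 200.0, 200000)
mass_light = integrate(lambda x: sas_density(x, 2.0), -20.0, 20.0, 40000)
print(mass_heavy, mass_light)                 # both should be close to 1
print(sas_density(1.3, 1.0) - phi(1.3))       # should be close to 0
```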

## 3 Skewness

### 3.1 Skewness measure for the multivariate case

There are some skewness measures for the univariate case. In this subsection, we extend the density-based skewness measure by Critchley and Jones (2008) to the multivariate case.

Let g(x) be a unimodal density and let $$x_{\mathrm{m}}$$ be the mode of g. Let $$x_L(c)$$ and $$x_R(c)$$ be the left- and right-side solutions of $$g(x)=c g(x_{\mathrm{m}})$$ for $$0<c<1$$. We see that $$x_{\mathrm{m}}-x_L(c)$$ and $$x_R(c)-x_{\mathrm{m}}$$ are the left- and right-side distances from the mode to the graph of g. Critchley and Jones (2008) defined the density-based skewness measure by
\begin{aligned} \gamma (c) = \frac{ \left( x_R(c) - x_{\mathrm{m}} \right) - \left( x_\mathrm{m} - x_L(c) \right) }{ \left( x_R(c) - x_{\mathrm{m}} \right) + \left( x_{\mathrm{m}} - x_L(c) \right) } = \frac{ x_R(c)+x_L(c)-2x_{\mathrm{m}} }{ x_R(c)-x_L(c) }. \end{aligned}
We see that g(x) is symmetric at $$x_{\mathrm{m}}$$ if and only if $$\gamma (c)=0$$ for any c. Note that the region of c is $$0<c<1$$ when g is continuous and has the domain on the whole line, but it is restricted, e.g., into the region $$\max \{ g(L-)/g(x_{\mathrm{m}}), g(R+)/g(x_{\mathrm{m}}) \}<c < 1$$ when g is continuous and has the domain (L, R).
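As a worked instance, consider the univariate ToS density $$g(x)=\psi (r(x))$$ with $$s(y)=r^{-1}(y)=y+H(y)$$ under the assumptions of Sect. 2.1, taking $$\psi$$ to be strictly unimodal so that the level set is unique. The following short calculation is a sketch based only on those assumptions. Since $$H'$$ is odd and $$H(0)=0$$, the function H is even and $$x_{\mathrm{m}}=0$$. Let $$y_c>0$$ be the solution of $$\psi (y_c)=c\,\psi (0)$$. Then $$g(x)=c\,g(0)$$ holds exactly when $$r(x)=\pm y_c$$, so that
\begin{aligned} x_R(c)=s(y_c)=y_c+H(y_c), \qquad x_L(c)=s(-y_c)=-y_c+H(y_c), \qquad \gamma (c)=\frac{x_R(c)+x_L(c)}{x_R(c)-x_L(c)}=\frac{H(y_c)}{y_c}. \end{aligned}
In particular, for fixed c the value $$y_c$$ does not depend on the skew parameter, so $$\gamma (c)$$ depends on the skew parameter only through $$H(y_c)$$.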
Let $$g(\varvec{x})$$ be a multivariate unimodal density and let $$\varvec{x}_{\mathrm{m}}$$ be the mode of g. Let $$a_+(c;\,\varvec{x}_{\mathrm{0}})$$ and $$a_-(c;\,\varvec{x}_{\mathrm{0}})$$ be the positive and negative solutions of $$g(\varvec{x}_{\mathrm{m}}+a\varvec{x}_{\mathrm{0}})=c g(\varvec{x}_{\mathrm{m}})$$ for the direction $$\varvec{x}_{\mathrm{0}}$$ ($$\Vert \varvec{x}_{\mathrm{0}}\Vert =1$$) and $$0<c<1$$. In this paper, we define the skewness measure by
\begin{aligned} \gamma (c;\,\varvec{x}_{\mathrm{0}}) = \frac{ a_+(c;\,\varvec{x}_{\mathrm{0}})+a_-(c;\,\varvec{x}_{\mathrm{0}}) }{a_+(c;\,\varvec{x}_{\mathrm{0}})-a_-(c;\,\varvec{x}_{\mathrm{0}}) }. \end{aligned}
This is a natural extension of the skewness $$\gamma (c)$$. The total skewness measure can also be defined, e.g., by the mean of $$\gamma (c;\,\varvec{x}_{\mathrm{0}})$$ with an appropriate measure on $$\varvec{x}_{\mathrm{0}}$$.

The case $$\varvec{x}_{\mathrm{0}}=(1,1)^t/\sqrt{2}$$ is illustrated in Fig. 2. We clearly see that as the skewness parameter increases, $$a_+(c;\,\varvec{x}_{\mathrm{0}})$$ and $$a_-(c;\,\varvec{x}_{\mathrm{0}})$$ increase and then the simple skewness measure $$a_+(c;\,\varvec{x}_{\mathrm{0}})+a_-(c;\,\varvec{x}_{\mathrm{0}})$$ increases. It also seems that the skewness measure $$\gamma (c;\,\varvec{x}_{\mathrm{0}})$$ increases as the skewness parameter increases. The monotonicity of skewness is considered in Sect. 3.2.

### 3.2 Monotonicity of skewness

The following is the main theorem for monotonicity of skewness.

### Theorem 1

Let $$f^*$$ be the multivariate skew density function defined by (2.1). Assume that $$\psi _j(y_j)$$ is differentiable and strictly unimodal, more precisely, $$\psi _j'(y_j)=0$$ only when $$y_j=0$$ and
\begin{aligned} \frac{\partial H_j(y_j;\lambda _j)}{\partial \lambda _j} > 0 \end{aligned}
except for $$y_j=0$$. Let $$a_+(c;\,\varvec{x}_{\mathrm{0}})$$ and $$a_-(c;\,\varvec{x}_{\mathrm{0}})$$ be the values defined in Sect. 3.1 from the unimodal density $$f^*$$. Let $$\varvec{z}_{\mathrm{0}}= \varSigma ^{-1/2} \varvec{x}_{\mathrm{0}}=(z_{01},\ldots ,z_{0p})^t$$. Then,
\begin{aligned} \mathrm{sgn}\left( \frac{\partial a_+(c;\,\varvec{x}_{\mathrm{0}})}{\partial \lambda _j} \right) = \mathrm{sgn}(z_{0j}), \qquad \mathrm{sgn}\left( \frac{\partial a_-(c;\,\varvec{x}_{\mathrm{0}})}{\partial \lambda _j} \right) = \mathrm{sgn}(z_{0j}), \end{aligned}
which implies
\begin{aligned} \mathrm{sgn}\left( \frac{\partial \gamma (c;\,\varvec{x}_{\mathrm{0}})}{\partial \lambda _j} \right) = \mathrm{sgn}(z_{0j}). \end{aligned}

The theorem shows that as $$\lambda _j$$ increases, the skewness measure $$\gamma (c;\,\varvec{x}_{\mathrm{0}})$$ monotonically increases or decreases according to the sign of $$z_{0j}$$. A distinguishing point is that we obtain the theorem by virtue of only one essential assumption, namely the monotonicity of H in $$\lambda _j$$. Note that the above theorem makes a stronger statement than monotonicity of skewness, because it shows the monotonicity of $$a_+(c;\,\varvec{x}_{\mathrm{0}})$$ and $$a_-(c;\,\varvec{x}_{\mathrm{0}})$$. In the univariate case, the proof for the monotonicity of skewness was easy because the skewness measure could be expressed in a closed form. In the multivariate case, the skewness measure cannot be expressed in a closed form and hence the proof is not easy. The proof is given in Sect. 3.4.
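The sign pattern of Theorem 1 can be illustrated numerically, although such a check is no substitute for the proof. The sketch below assumes $$p=2$$, standard normal $$\psi _j$$, the transformation (2.2), $$\varvec{\mu }=\varvec{0}$$ and the symmetric inverse square root of $$\varSigma$$. It finds $$a_+$$ and $$a_-$$ by bisection along the ray $$a\varvec{x}_{\mathrm{0}}$$, which is valid because the density decreases along the ray away from the mode. All function names and parameter values are hypothetical.

```python
import math

def alpha(lam):
    return 1.0 - math.exp(-lam * lam)

def r(z, lam):
    # Closed-form transformation (2.2); the identity when lambda = 0.
    if lam == 0.0:
        return z
    a = alpha(lam)
    w = lam * z + a
    return (w - a * math.sqrt(w * w + 1.0 - a * a)) / (lam * (1.0 - a * a))

def phi(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def inv_sqrt_2x2(sigma):
    # Symmetric Sigma^{-1/2} of a 2x2 positive definite matrix.
    (a, b), (_, d) = sigma
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    ra, rb, rd = (a + s) / t, b / t, (d + s) / t
    det = ra * rd - rb * rb
    return [[rd / det, -rb / det], [-rb / det, ra / det]]

def xi(a, z0, lams):
    # xi(a; lambda) = psi(r(a z0; lambda)) with psi a product of normal densities.
    out = 1.0
    for z0j, lj in zip(z0, lams):
        out *= phi(r(a * z0j, lj))
    return out

def solve_a(c, z0, lams, sign):
    # Solve xi(a) = c xi(0) on a > 0 (sign = +1) or a < 0 (sign = -1) by bisection.
    target = c * xi(0.0, z0, lams)
    lo, hi = 0.0, float(sign)
    while xi(hi, z0, lams) > target:   # expand until the level is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if xi(mid, z0, lams) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def skewness(c, x0, sigma, lams):
    # Returns (a_plus, a_minus, gamma) for the unit direction x0.
    w = inv_sqrt_2x2(sigma)
    z0 = (w[0][0] * x0[0] + w[0][1] * x0[1],
          w[1][0] * x0[0] + w[1][1] * x0[1])
    a_plus = solve_a(c, z0, lams, +1)
    a_minus = solve_a(c, z0, lams, -1)
    return a_plus, a_minus, (a_plus + a_minus) / (a_plus - a_minus)

sigma = [[1.0, 0.5], [0.5, 1.0]]
x0 = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))   # here z_01 > 0
for lam1 in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(lam1, skewness(0.5, x0, sigma, (lam1, 0.5)))
```

For this direction $$z_{01}>0$$, so all three printed quantities should increase with $$\lambda _1$$. Choosing a direction with $$z_{01}<0$$, for example $$\varvec{x}_{\mathrm{0}}=(-1,0)^t$$, should reverse the trend.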

### 3.3 Examples of $$H_j$$

In this subsection, we use the notation H instead of $$H_j$$ for simplicity.

The simplest example of $$H(y;\lambda )$$ satisfying the monotonicity assumption of Theorem 1 is a linear function of $$\lambda$$: $$H(y;\lambda )=\lambda H_a(y)$$ with an appropriate function $$H_a(y)$$. In a similar manner, we can make many types of $$H(y;\lambda )$$ which consist of independent functions of y and $$\lambda$$. Another interesting type of $$H(y;\lambda )$$ can be obtained from the following proposition.

### Proposition 1

Let $$h_0(y)$$ be a strictly monotone increasing odd differentiable function with $$|h_0(y)|<1$$. Let $$H_0(y)=\int _0^y h_0(t)dt$$. Let
\begin{aligned} H(y;\lambda ) = \left\{ \begin{array}{ll} \displaystyle { H_0(\lambda y)/\lambda } &{} \quad \text{ for } \lambda \ne 0, \\ \quad 0 &{} \quad \text{ for } \lambda =0. \end{array} \right. \end{aligned}
Then, $$|\partial H(y;\lambda )/\partial y |<1$$ and $$\partial H(y;\lambda )/\partial \lambda > 0$$ except for $$y=0$$. In addition, if $$\lim _{y\rightarrow \pm \infty } h_0(y)=\pm 1$$, then $$\lim _{\lambda \rightarrow \pm \infty } H(y;\lambda )=\pm |y|$$.

### Proof

We have $$\partial H(y;\lambda )/\partial y = h_0(\lambda y)$$, which implies $$|\partial H(y;\lambda )/\partial y |<1$$. For $$\lambda \ne 0$$, we see that
\begin{aligned} \frac{\partial H(y;\lambda )}{\partial \lambda } = - \frac{1}{\lambda ^2} H_0( \lambda y) + \frac{y}{\lambda } h_0(\lambda y) = \frac{1}{\lambda ^2} \left\{ - H_0( \lambda y) + \lambda y h_0(\lambda y) \right\} . \end{aligned}
Note that $$H_0(y)=\int _0^y h_0(t) dt< y h_0(y)$$ except for $$y=0$$ since $$h_0(y)$$ is a strictly monotone increasing odd function. Hence, we have $$\partial H(y;\lambda )/\partial \lambda > 0$$ except for $$y=0$$. The case $$\lambda =0$$ can be shown in a similar manner. This completes the proof of the first part. Suppose that $$\lim _{y\rightarrow \pm \infty } h_0(y)=\pm 1$$. It holds that
\begin{aligned} \lim _{\lambda \rightarrow \pm \infty } H(y;\lambda ) = \lim _{\lambda \rightarrow \pm \infty } \frac{H_0(\lambda y)}{\lambda } = \lim _{\lambda \rightarrow \pm \infty } y\, h_0(\lambda y) = \pm |y|. \end{aligned}
The proof of the second part is complete. $$\square$$

The property $$|\partial H(y;\lambda )/\partial y |<1$$ is necessary to satisfy $$s'(y;\lambda )>0$$, as mentioned in Sect. 2. The property $$\partial H(y;\lambda )/\partial \lambda > 0$$ except for $$y=0$$ corresponds to the assumption in Theorem 1. The last property is related to a half distribution. For details, see Sect. 4.

A typical example of $$h_0(y)$$ satisfying the conditions of Proposition 1 is $$h_0(y)=2G(y)-1$$, where G(y) is a distribution function and $$G'(y)$$ is a symmetric function. We can make many examples of $$H(y;\lambda )$$ based on the distribution functions. Such examples can be seen in Jones (2014) and Fujisawa and Abe (2015).
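To make Proposition 1 concrete, the following sketch takes G to be the logistic distribution function, which is our illustrative choice and is not singled out in the text. Then $$h_0(y)=2G(y)-1=\tanh (y/2)$$ and $$H_0(y)=2\log \cosh (y/2)$$. The code evaluates $$H(y;\lambda )=H_0(\lambda y)/\lambda$$ and checks the three conclusions of the proposition on a small grid; the grid values are arbitrary.

```python
import math

def h0(y):
    # h0(y) = 2 G(y) - 1 for the logistic distribution function G (an assumption).
    return math.tanh(0.5 * y)

def H0(y):
    # H0(y) = integral of h0 from 0 to y = 2 log cosh(y / 2),
    # written so that large |y| does not overflow.
    t = abs(0.5 * y)
    return 2.0 * (t + math.log1p(math.exp(-2.0 * t)) - math.log(2.0))

def H(y, lam):
    # H(y; lambda) = H0(lambda y) / lambda, and 0 when lambda = 0.
    if lam == 0.0:
        return 0.0
    return H0(lam * y) / lam

def dH_dy(y, lam):
    # Partial derivative in y, equal to h0(lambda y).
    return h0(lam * y)

ys = (-2.0, -0.5, 0.7, 3.0)
lams = [-5.0 + 0.25 * k for k in range(41)]   # grid containing lambda = 0

for y in ys:
    values = [H(y, lam) for lam in lams]
    increasing = all(values[i] < values[i + 1] for i in range(len(values) - 1))
    print(y, increasing, H(y, 1.0e4), H(y, -1.0e4))
```

For each nonzero y the printed flag should be `True`, and the last two columns should be close to $$|y|$$ and $$-|y|$$, in line with the limiting statement of the proposition.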

We can also obtain a sign-symmetric property when $$H_j(y_j;\lambda _j)$$ is of a special type given in Proposition 1.

### Theorem 2

Let $$D=\mathrm{diag}(d_1,\ldots ,d_p)$$, where $$d_j \in \{ \pm 1\}$$. If the random vector $$\varvec{x}$$ has the density $$f^*(\varvec{x};\varvec{\mu },\varSigma ,{\varvec{\lambda }})$$ where $$H_j(y_j;\lambda _j)$$ is of the special type given in Proposition 1, then $$D\varvec{x}$$ has the density $$f^*(\varvec{x};D\varvec{\mu },D\varSigma D,D{\varvec{\lambda }})$$.

### Proof

Using $$r_j=s_j^{-1}$$, we have
\begin{aligned} y_j= \,&{} s_j(r_j(y_j; \lambda _j d_j); \lambda _j d_j)\\= \,&{} r_j(y_j; \lambda _j d_j) + \frac{1}{\lambda _j d_j}H_0(\lambda _j d_j r_j(y_j; \lambda _j d_j))\\= \,&{} \frac{1}{d_j}s_j (d_j r_j(y_j; \lambda _j d_j); \lambda _j), \end{aligned}
which implies that $$r_j(d_j y_j; \lambda _j) = d_j r_j(y_j; \lambda _j d_j)$$. Then we have $$\varvec{r}(D\varvec{y};{\varvec{\lambda }})=D\, \varvec{r}( \varvec{y}; D{\varvec{\lambda }})$$. Note that $$|\det (D)|=1$$ since $$d_j \in \{\pm 1\}$$. Remember that $$\psi (\varvec{y})=\psi (D\varvec{y})$$ because $$\psi (\varvec{y})=\prod _{j=1}^p \psi _j(y_j)$$ and $$\psi _j(y_j)$$ is an even function. The probability density function of $$\varvec{u}=D\varvec{x}$$ is given by
\begin{aligned} f_U(\varvec{u})= \,&{} f^*(D^{-1}\varvec{u}) \\= \,&{} \psi \left( \varvec{r}\left( \varSigma ^{-1/2}\left( D^{-1}\varvec{u}-\varvec{\mu }\right) ; {\varvec{\lambda }}\right) \right) |\varSigma |^{-1/2} \\= \,&{} \psi \left( \varvec{r}\left( D(D\varSigma ^{1/2} D)^{-1}\left( \varvec{u}-D\varvec{\mu }\right) ; {\varvec{\lambda }}\right) \right) |D\varSigma D|^{-1/2} \\= \,&{} \psi \left( D\, \varvec{r}\left( (D\varSigma ^{1/2} D)^{-1}\left( \varvec{u}-D\varvec{\mu }\right) ; D{\varvec{\lambda }}\right) \right) |D\varSigma D|^{-1/2} \\= \,&{} \psi \left( \varvec{r}\left( (D\varSigma ^{1/2} D)^{-1}\left( \varvec{u}-D\varvec{\mu }\right) ; D{\varvec{\lambda }}\right) \right) |D\varSigma D|^{-1/2}. \end{aligned}
We see that $$(D\varSigma ^{1/2} D)^2 = D\varSigma ^{1/2} D^2 \varSigma ^{1/2} D = D\varSigma D$$. The proof is complete. $$\square$$

Another interesting example of $$H(y;\lambda )$$ is given in Sect. 2.2. In this case, we use the function $$h_0(y)={y}/{\sqrt{1+y^2}}$$ with the adjusting factor $$\alpha _\lambda =1-e^{-\lambda ^2}$$. This function is of a special type, because it enables us to simultaneously treat monotonicity of skewness, a closed form of r, and whole line domain, via only one skewness parameter. For details, see Fujisawa and Abe (2015).

### 3.4 Proof of Theorem 1

Without loss of generality, let $$\varvec{\mu }=\varvec{0}$$, so that the mode of $$f^*$$ is the zero vector. The values $$a_+(c;\,\varvec{x}_{\mathrm{0}})$$ and $$a_-(c;\,\varvec{x}_{\mathrm{0}})$$ are then the positive and negative solutions of $$c f^*(\varvec{0})=f^*(a\varvec{x}_{\mathrm{0}})$$. The left-hand side is $$c f^*(\varvec{0})=c \psi (\varvec{r}(\varvec{0}))|\varSigma |^{-1/2}=c \psi (\varvec{0})|\varSigma |^{-1/2}$$. The right-hand side is
\begin{aligned} f^*(a \varvec{x}_0)= \psi (\varvec{r}(\varSigma ^{-1/2} a\varvec{x}_0;{\varvec{\lambda }})) |\varSigma |^{-1/2} = \psi (\varvec{r}(a \varvec{z}_0;{\varvec{\lambda }})) |\varSigma |^{-1/2}. \end{aligned}
Let $$\xi (a;{\varvec{\lambda }}) = \xi _{\varvec{\lambda }}(a) = \psi (\varvec{r}(a \varvec{z}_0;{\varvec{\lambda }}))$$. The solution is expressed as
\begin{aligned} a=a({\varvec{\lambda }}) = \xi _{\varvec{\lambda }}^{-1}(c \psi (\varvec{0}) ). \end{aligned}
Consider the differential of the equation $$c \psi (\varvec{0}) = \xi (a({\varvec{\lambda }});{\varvec{\lambda }})$$ with respect to $$\lambda _j$$. It holds that
\begin{aligned} 0 = \frac{\partial \xi }{\partial a} \frac{\partial a}{\partial \lambda _j} + \frac{\partial \xi }{\partial \lambda _j} . \end{aligned}
Hence,
\begin{aligned} \frac{\partial a}{\partial \lambda _j} = - \frac{\partial \xi }{\partial \lambda _j} \Bigg / \frac{\partial \xi }{\partial a} . \end{aligned}
It follows from simple calculations (Appendix 1) that
\begin{aligned} \frac{\partial \xi }{\partial \lambda _j} = - \frac{\partial \psi }{\partial \varvec{y}^t} \left( I + \frac{\partial \varvec{H}}{\partial \varvec{y}^t}\right) ^{-1} \frac{\partial \varvec{H}}{\partial \lambda _j}, \qquad \frac{\partial \xi }{\partial a} = \frac{\partial \psi }{\partial \varvec{y}^t} \left( I + \frac{\partial \varvec{H}}{\partial \varvec{y}^t}\right) ^{-1} \varvec{z}_0, \end{aligned}
where the differentials are evaluated at $$\varvec{y}=\varvec{r}(a\varvec{z}_{\mathrm{0}};{\varvec{\lambda }})$$.
Let us take into consideration the specific formulas of $$\psi$$ and $$\varvec{H}$$; $$\psi (\varvec{y})=\prod _{j=1}^p \psi _j(y_j)$$ and $$\varvec{H}(\varvec{y};{\varvec{\lambda }})=(H_1(y_1;\lambda _1),\ldots ,H_p(y_p;\lambda _p))^t$$. Remember that $$\psi _j(y_j)$$ is unimodal and symmetric at zero, and $$\psi _j'(y_j)=0$$ only when $$y_j=0$$. Let $$\psi _j(y_j)= q_j(y_j^2)$$ with $$q_j'(t) < 0$$ for $$t>0$$. We see that
\begin{aligned} \frac{\partial \psi }{\partial y_j}= \,&{} 2 y_j q_j'(y_j^2) \prod _{k \ne j} q_k(y_k^2), \\ \frac{\partial \varvec{H}}{\partial \varvec{y}^t}= \,&{} \mathrm{diag}\left( \frac{\partial H_1}{\partial y_1},\ldots ,\frac{\partial H_p}{\partial y_p} \right) , \\ \frac{\partial \varvec{H}}{\partial \lambda _j}= \,&{} \left( 0,\ldots ,0,\frac{\partial H_j}{\partial \lambda _j},0,\ldots ,0 \right) ^t. \end{aligned}
Thus,
\begin{aligned} \frac{\partial \xi }{\partial \lambda _j} = - 2 y_j q_j'(y_j^2) \prod _{k \ne j} q_k(y_k^2) \left( 1+\frac{\partial H_j}{\partial y_j}\right) ^{-1} \frac{\partial H_j}{\partial \lambda _j}. \end{aligned}
Note that $$\prod _{k \ne j} q_k(y_k^2) >0$$ and $$1+{\partial H_j}/{\partial y_j}=s_j'>0$$. Consider the case $$y_j \ne 0$$. We see that $$q_j'(y_j^2)<0$$ and $${\partial H_j}/{\partial \lambda _j}>0$$. Hence,
\begin{aligned} \mathrm{sgn}\left( \frac{\partial \xi }{\partial \lambda _j} \right) = \mathrm{sgn}(y_j). \end{aligned}
Remember that $$y_j=r_j(a z_{0j})$$. Since $$r_j'>0$$ and $$r_j(0)=0$$, we have
\begin{aligned} \mathrm{sgn}(y_j)=\mathrm{sgn}(a z_{0j}). \end{aligned}
In the case $$y_j=0$$, we have $$\partial \xi /\partial \lambda _j=0$$. Therefore,
\begin{aligned} \mathrm{sgn}\left( \frac{\partial \xi }{\partial \lambda _j} \right) = \mathrm{sgn}(a z_{0j}). \end{aligned}
It also holds that
\begin{aligned} \frac{\partial \xi }{\partial a} = \sum _{k=1}^p 2 y_k z_{0k} q_k'(y_k^2) \prod _{l \ne k} q_l(y_l^2) \left( 1+\frac{\partial H_k}{\partial y_k}\right) ^{-1} . \end{aligned}
Since $$\mathrm{sgn}(y_k)=\mathrm{sgn}(a z_{0k})$$, we have $$\mathrm{sgn}(y_k z_{0k}) = \mathrm{sgn}(a z_{0k}^2)$$. Note that $$z_{0k}^2 >0$$ for some k because $$\Vert \varvec{z}_{\mathrm{0}}\Vert =\Vert \varSigma ^{-1/2}\varvec{x}_{\mathrm{0}}\Vert >0$$. In a manner similar to that described above, it follows that
\begin{aligned} \mathrm{sgn}\left( \frac{\partial \xi }{\partial a} \right) = - \mathrm{sgn}(a). \end{aligned}
Therefore, we have
\begin{aligned} \mathrm{sgn}\left( \frac{\partial a}{\partial \lambda _j} \right) = - \frac{\mathrm{sgn}(a z_{0j})}{-\mathrm{sgn}(a)} = \mathrm{sgn}(z_{0j}). \end{aligned}
Hence, we see that
\begin{aligned} \frac{\partial \gamma }{\partial \lambda _j}= \,&{} \frac{1}{(a_+ - a_-)^2} \left\{ \left( \frac{\partial a_+}{\partial \lambda _j} + \frac{\partial a_-}{\partial \lambda _j} \right) (a_+ - a_-) - (a_+ + a_-) \left( \frac{\partial a_+}{\partial \lambda _j} - \frac{\partial a_-}{\partial \lambda _j} \right) \right\} \\= \,&{} \frac{1}{(a_+ - a_-)^2} \left\{ 2 a_+ \frac{\partial a_-}{\partial \lambda _j} - 2 a_- \frac{\partial a_+}{\partial \lambda _j} \right\} \end{aligned}
and then we have $$\mathrm{sgn}\left( {\partial \gamma }/{\partial \lambda _j} \right) = \mathrm{sgn}(z_{0j})$$. The proof is complete.

## 4 Additional properties

Various properties have been discussed for skew distributions in the univariate case, including random number generation, half distribution, parameter orthogonality, non-degenerated Fisher information, and entropy maximization distribution. Some of these properties can be easily extended to the multivariate case.

Random numbers are easily generated in the univariate case (Jones 2014; Fujisawa and Abe 2015). In the multivariate case, random numbers are generated using the formula $$\varvec{x}=\varvec{\mu }+\varSigma ^{1/2}\varvec{z}$$, which inverts the affine transformation in (2.1), where $$\varvec{z}=(z_1,\ldots ,z_p)^t$$ and $$z_j$$ is the random number generated from the univariate skew distribution $$f_j(z_j)$$.
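One way to realize this recipe is sketched below. The univariate step is an assumption on our part, not a quotation of the cited algorithms: if $$z=s(y)$$ has density $$\psi (r(z))$$, then y has density $$\psi (y)\{1+h(y)\}$$, which can be sampled by drawing $$y_0$$ from $$\psi$$ and keeping $$y_0$$ with probability $$\{1+h(y_0)\}/2$$, taking $$-y_0$$ otherwise. The code uses the function H of Sect. 2.2, standard normal $$\psi _j$$, $$p=2$$ and the symmetric square root of $$\varSigma$$. All names and parameter values are hypothetical.

```python
import math
import random

def alpha(lam):
    return 1.0 - math.exp(-lam * lam)

def h(y, lam):
    # h(y; lambda) = dH/dy for the H of Sect. 2.2; an odd function of y.
    return alpha(lam) * lam * y / math.sqrt(1.0 + lam * lam * y * y)

def s(y, lam):
    # s(y; lambda) = y + H(y; lambda).
    if lam == 0.0:
        return y
    return y + alpha(lam) * (math.sqrt(1.0 + lam * lam * y * y) - 1.0) / lam

def sample_univariate(lam, rng):
    # Draw y with density phi(y) (1 + h(y)) by a random sign flip, then map z = s(y).
    y = rng.gauss(0.0, 1.0)
    if rng.random() >= 0.5 * (1.0 + h(y, lam)):
        y = -y
    return s(y, lam)

def sqrt_2x2(sigma):
    # Symmetric square root of a 2x2 positive definite matrix.
    (a, b), (_, d) = sigma
    root_det = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * root_det)
    return [[(a + root_det) / t, b / t], [b / t, (d + root_det) / t]]

def sample_multivariate(mu, sigma, lams, rng):
    # x = mu + Sigma^{1/2} z with independent univariate skew components z_j.
    z = [sample_univariate(lam, rng) for lam in lams]
    root = sqrt_2x2(sigma)
    return (mu[0] + root[0][0] * z[0] + root[0][1] * z[1],
            mu[1] + root[1][0] * z[0] + root[1][1] * z[1])

rng = random.Random(0)
mu = (1.0, -2.0)
sigma = [[1.0, 0.5], [0.5, 1.0]]
draws = [sample_multivariate(mu, sigma, (1.0, -0.5), rng) for _ in range(5)]
for x in draws:
    print(x)
```

The sign-flip step avoids any rejection loop, so each draw costs one normal and one uniform random number per coordinate.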

The function $$H(y)=|y|$$ gives a half distribution in the univariate case (Fujisawa and Abe 2015). When we use $$H_j(y_j)=|y_j|$$, the multivariate skew distribution $$f^*(\varvec{x};\varvec{\mu },\varSigma ,{\varvec{\lambda }})$$ has the restricted half domain $$\{ \varvec{x}; \varvec{q}_j^t (\varvec{x}-\varvec{\mu }) \ge 0 \}$$, where $$\varvec{q}_j$$ is the jth column vector of $$\varSigma ^{-1/2}$$. This type of function $$H(y;\lambda )$$ can be obtained from Proposition 1 as a limiting case.

Let $$I_{\xi \eta }$$ be the Fisher information matrix with respect to the parameters $$\xi$$ and $$\eta$$. Let $$\kappa$$ be the parameter induced on the underlying symmetric density $$\psi (y)=\psi (y;\kappa )$$, e.g., the kurtosis parameter. In the univariate case, the parameter orthogonality was shown, more precisely, $$I_{\xi \kappa }=0$$ for $$\xi =\mu ,\lambda$$ (Jones 2014). In a similar manner, it can be easily shown that the same parameter orthogonality holds in the multivariate case, when $$\mu$$ and $$\lambda$$ are the components of $$\varvec{\mu }$$ and $${\varvec{\lambda }}$$, respectively.

For a univariate skew-symmetric distribution, the Fisher information matrix degenerates when $$\lambda =0$$ (Azzalini 1985). However, for the univariate skew distribution $$f^*$$, it does not degenerate under mild conditions (Fujisawa and Abe 2015). This property also holds in the multivariate case.

In the univariate case, the skew distribution $$f^*(x)=\phi (r((x-\mu )/\sigma ))/\sigma$$ is an entropy maximization distribution in a class of distributions which satisfy $$\mathrm{E}_g[\{r((x-\mu )/\sigma )\}^2]=1$$ and some additional conditions (Fujisawa and Abe 2015). In a similar manner, it can be shown that the multivariate skew distribution $$f^*(\varvec{x})=\phi (\varvec{r}(\varSigma ^{-1/2}(\varvec{x}-\varvec{\mu })))/|\varSigma |^{1/2}$$ is an entropy maximization distribution in a class of distributions which satisfy
\begin{aligned} \mathrm{E}_g\left[ \left\| \varvec{r}\left( \varSigma ^{-1/2}(\varvec{x}-\varvec{\mu })\right) \right\| ^2\right] =p \end{aligned}
(4.1)
and some additional conditions. Note that the condition (4.1) cannot be replaced by
\begin{aligned} \mathrm{E}_g\left[ \varvec{r}\left( \varSigma ^{-1/2}(\varvec{x}-\varvec{\mu })\right) \varvec{r}\left( \varSigma ^{-1/2}(\varvec{x}-\varvec{\mu })\right) ^t \right] =I_p, \end{aligned}
because this does not hold due to the skew structure.

## 5 Numerical example

The Australian Institute of Sport (AIS) data, which contain various biomedical measurements on a group of Australian athletes, were examined by Cook and Weisberg (1994). The version of the data in the package ‘sn’ for the software ‘R’ includes 13 variables, 2 of which are discrete.

The skew distribution (2.1) with the transformation (2.2) and the symmetric sinh-arcsinh density (2.3) in Sect. 2.2, which is called the skew-sinh distribution in this section, was applied to all the pairs of the 11 continuous variables and compared with the skew-t distribution. When maximizing the log-likelihood, the function ‘mst.mle’ in the package ‘sn’ was used for the skew-t distribution and the optimization function ‘nlminb’ was used for the skew-sinh distribution, because the function ‘mst.mle’ uses the function ‘nlminb’ as a default optimization function, with appropriate transformations of parameters (e.g., the transformation $$\sigma ^2=\exp (\theta )$$ is used for the variance parameter $$\sigma ^2$$ because the region of variance is $$(0,\infty )$$, but the region of $$\theta$$ is the whole line). The convergence failed in 14 cases for the skew-t distribution and in 6 cases for the skew-sinh distribution, all of which were among the previous 14 cases. When the skewness is very large, the difficulty of the maximum likelihood estimation is well known (Azzalini and Capitanio 1999; Pewsey 2000). In this section, we do not examine this problem any further. From now on, we consider the 41 cases (out of the 55 pairs) where the convergence succeeded for both the skew-sinh and skew-t distributions.
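The reparametrization $$\sigma ^2=\exp (\theta )$$ mentioned above can be illustrated on a deliberately simple model. The sketch below is not the fitting code used for the AIS data: it fits only the variance of a zero-mean normal sample, and it replaces ‘nlminb’ with a hand-written golden-section search so that the example stays self-contained. The point is only that the optimizer works on the unconstrained $$\theta$$ while the variance $$\exp (\theta )$$ stays positive automatically. All names and values are hypothetical.

```python
import math
import random

def neg_log_lik(theta, n, sum_sq):
    # Negative log-likelihood of N(0, sigma^2) with sigma^2 = exp(theta),
    # written in terms of the sample size n and the sum of squares.
    return 0.5 * n * (math.log(2.0 * math.pi) + theta) + 0.5 * sum_sq * math.exp(-theta)

def golden_section_minimize(func, lo, hi, tol=1e-8):
    # Golden-section search for the minimum of a unimodal function on [lo, hi].
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)

rng = random.Random(1)
xs = [rng.gauss(0.0, 2.0) for _ in range(2000)]   # simulated data with sigma = 2
n = len(xs)
sum_sq = sum(x * x for x in xs)

# Optimize over theta on the whole line (here a wide bracket), then map back.
theta_hat = golden_section_minimize(lambda th: neg_log_lik(th, n, sum_sq), -10.0, 10.0)
sigma2_hat = math.exp(theta_hat)
print(theta_hat, sigma2_hat, sum_sq / n)   # sigma2_hat should match the closed form sum_sq / n
```

In this toy model the maximum likelihood estimate has the closed form $$\hat{\sigma }^2=n^{-1}\sum _i x_i^2$$, which gives a direct check on the transformed optimization.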

The differences in the maximum of log-likelihood are shown in Fig. 3. There are 34 cases where the maximum of log-likelihood for the skew-sinh distribution is larger than that for the skew-t distribution. Here, we focus on two special cases (WCC, Bfat) and (Fe, BMI) where the difference in the maximum of log-likelihood is very large (42.02 and 34.58). The fitted skew distributions are depicted in Fig. 4. The skew-sinh distribution shows a more skewed shape in the case of (WCC, Bfat) and a more triangular shape in the case of (Fe, BMI).

## Notes

### Acknowledgements

Toshihiro Abe was supported in part by JSPS KAKENHI Grant Number 19K11869 and Nanzan University Pache Research Subsidy I-A-2 for the 2019 academic year. Hironori Fujisawa was supported in part by JSPS KAKENHI Grant Number 17K00065.

## References

1. Arellano-Valle, R. B., & Azzalini, A. (2008). The centred parametrization for the multivariate skew-normal distribution. Journal of Multivariate Analysis, 99, 1362–1382.
2. Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12, 171–178.
3. Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics, 32, 159–200.
4. Azzalini, A., & Capitanio, A. (1999). Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61, 579–602.
5. Azzalini, A., & Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew $$t$$-distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65, 367–389.
6. Azzalini, A., & Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika, 83, 715–726.
7. Cook, R. D., & Weisberg, S. (1994). An introduction to regression graphics. New York: Wiley.
8. Critchley, F., & Jones, M. C. (2008). Asymmetry and gradient asymmetry functions: density-based skewness and kurtosis. Scandinavian Journal of Statistics, 35, 415–437.
9. Fujisawa, H., & Abe, T. (2015). A family of skew distributions with mode-invariance through transformation of scale. Statistical Methodology, 25, 89–98.
10. Genton, M. G. (Ed.). (2004). Skew-elliptical distributions and their applications. Boca Raton, FL: Chapman & Hall.
11. Hallin, M., & Ley, C. (2012). Skew-symmetric distributions and Fisher information - a tale of two densities. Bernoulli, 18, 747–763.
12. Jones, M. C. (2014). Generating distributions by transformation of scale. Statistica Sinica, 24, 749–771.
13. Jones, M. C. (2016). On bivariate transformation of scale distributions. Communications in Statistics-Theory and Methods, 45, 577–588.
14. Jones, M. C., & Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika, 96, 761–780.
15. Ley, C., & Paindaveine, D. (2010). On the singularity of multivariate skew-symmetric models. Journal of Multivariate Analysis, 101, 1434–1444.
16. Ma, Y., & Genton, M. G. (2004). Flexible class of skew-symmetric distributions. Scandinavian Journal of Statistics, 31, 459–468.
17. Pewsey, A. (2000). Problems of inference for Azzalini’s skewnormal distribution. Journal of Applied Statistics, 27, 859–870.
18. Wang, J., Boyer, J., & Genton, M. G. (2004). A skew-symmetric representation of multivariate distributions. Statistica Sinica, 14, 1259–1270.

© Japanese Federation of Statistical Science Associations 2019

## Authors and Affiliations

• Toshihiro Abe (1)
• Hironori Fujisawa (2, 3)
1. Department of Systems and Mathematical Sciences, Nanzan University, Showa-ku, Japan
2. The Institute of Statistical Mathematics, Tachikawa, Japan
3. Center for Advanced Intelligence Project, RIKEN, Chuo-ku, Japan