On Multivariate Skewness and Kurtosis

A unified treatment of all currently available cumulant-based indexes of multivariate skewness and kurtosis is provided here, expressing them in terms of the third- and fourth-order cumulant vectors respectively. Such a treatment helps reveal many subtle features of, and interconnections among, the existing indexes, as well as some hitherto unknown deficiencies. Computational formulae for obtaining these measures are provided for spherically and elliptically symmetric, as well as skew-symmetric, families of multivariate distributions, yielding several new results and a systematic exposition of many known ones.


Introduction
Using the standard normal distribution as the yardstick, statisticians have defined notions of skewness (asymmetry) and kurtosis (peakedness) in the univariate case. Since all the odd central moments (when they exist) are zero for a symmetric distribution on the real line, a first attempt at measuring asymmetry is to ask how different the third central moment is from zero, although in principle one could use any other odd central moment, or even a combination of them. Several alternative indexes of asymmetry, using the mode or the quantiles, are also available; see, for instance, Arnold and Groeneveld (1995), Averous and Meste (1997), and Ekström and Jammalamadaka (2012) for recent work and reviews. Similarly for kurtosis, taking again the standard normal distribution as the yardstick, a coefficient of kurtosis has been developed using the fourth central moment.
When dealing with multivariate distributions, the notions of symmetry and the measurement of skewness, as well as of kurtosis, are not uniquely defined. For example, the mode-based approach in Arnold and Groeneveld (1995), although very popular, does not seem extendable to the multivariate case. Also, in interpreting the cumulant-based measures of skewness and kurtosis, one has to pause, especially for the kurtosis, if there are multiple peaks.
Our focus in this paper is on multivariate distributions, and one may remark that the two fundamental papers by Rao (1948a, b) take us back to the early days of such multivariate analysis. A good starting point is the monograph by Fang et al. (2017) and the more recent review article by Serfling (2004). We consider symmetry of a d-dimensional random vector X around a given point, which we assume without loss of generality to be the origin. Such an X is said to be spherically symmetric, or rotationally symmetric, if for all (d × d) orthogonal matrices A, X has the same distribution as AX. One may generalize this to elliptical or ellipsoidal symmetry in an obvious way. A more common and practical notion of symmetry in multidimensions is reflective or antipodal symmetry, and we say X has this property if it has the same distribution as −X. For measuring departures from such symmetry, various notions of skewness have been proposed by different authors; the list includes Mardia (1970), Malkovich and Afifi (1973), Isogai (1982), Srivastava (1984), Song (2001) and Móri et al. (1994); see also Sumikawa et al. (2013) for an extension of Mardia's multivariate skewness index to the high-dimensional case. In a broad discussion and analysis of multivariate cumulants, their properties and their use in inference, Jammalamadaka et al. (2006) proposed using the full vector of third- and fourth-order cumulants as vectorial measures for multivariate skewness and kurtosis respectively. Such measures based on the cumulant vectors were further discussed by Balakrishnan et al. (2007) and Kollo (2008). A systematic treatment of asymptotic distributions of skewness and kurtosis indexes can be found in Baringhaus and Henze (1991a, b, 1992) and Klar (2002).
In this paper, one of our primary goals is to look at these disparate-looking definitions of skewness and kurtosis based on cumulants which have been proposed in the literature, and to assess and relate them from a unified perspective in terms of the cumulant vectors discussed in Jammalamadaka et al. (2006). Such a unified treatment helps reveal several relationships and features of the many existing proposals. For example, it will be shown (i) that the Mardia (1970) and Malkovich and Afifi (1973) skewness measures can be equivalent; (ii) that the Balakrishnan et al. (2007) and Móri et al. (1994) skewness vectors are just proportional to each other; (iii) that the Kollo (2008) vectorial measure can be a null vector even for some asymmetric distributions; and (iv) that the Srivastava (1984) index is not affine invariant.
In Section 4, we also introduce alternative measures for skewness and kurtosis based just on the distinct cumulants, and evaluate their performance with some examples in Section 7.
Another significant contribution of the paper is to provide clear and easy formulae for computing cumulants up to the fourth order for spherically symmetric, elliptically symmetric, and skew-elliptical families, along with several specific examples. These comprehensive and mostly new results allow for a straightforward computation of all the indexes of skewness and kurtosis discussed here for several important multivariate distributions.
The analysis presented here is based on the cumulant vectors of the third and fourth order, defined below. In our derivations we utilize an elegant and powerful tool, the so-called T-derivative, which we now describe. Let λ = (λ_1, . . . , λ_d)^T be a d-dimensional vector of constants and let φ(λ) = (φ_1(λ), . . . , φ_m(λ))^T denote an m-dimensional vector-valued function (m ∈ N), differentiable in all its arguments. The Jacobian matrix of φ is defined by

Dφ(λ) = [∂φ_i(λ)/∂λ_j]_{i=1,...,m; j=1,...,d}.  (1.1)
Then the operator D_λ^⊗, which we refer to as the T-derivative, is defined as

D_λ^⊗ φ(λ) = vec[(Dφ(λ))^T],

where the symbol ⊗, here and everywhere else in the paper, denotes the Kronecker (tensor) product. Assuming φ is k times differentiable, the k-th T-derivative is obtained by iterating this operation,

D_λ^{⊗k} φ(λ) = D_λ^⊗ [D_λ^{⊗(k−1)} φ(λ)]

(see Terdik, 2002). We note that if φ_X(λ) denotes the characteristic function of a d-dimensional random vector X, then the operator D_λ^{⊗k} applied to φ_X and to log φ_X provides, upon evaluation at λ = 0, the vector of moments of order k and the vector of cumulants of order k respectively.
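As a concrete illustration of how Kronecker (tensor) products organize higher-order moments, the following sketch estimates the k-th order moment vector E[X^{⊗k}] from sample data by iterated Kronecker products; the function name `moment_vector` is our own shorthand, not notation from the paper.

```python
import numpy as np

def moment_vector(sample, k):
    """Empirical k-th order moment vector E[X^{(tensor k)}], estimated from
    the rows of `sample`, built via repeated Kronecker products."""
    acc = np.zeros(sample.shape[1] ** k)
    for x in sample:
        t = x.copy()
        for _ in range(k - 1):
            t = np.kron(t, x)   # build x (x) x (x) ... (x) x
        acc += t
    return acc / len(sample)
```

For instance, the order-2 moment vector of a sample is the vectorized second moment matrix, with all d² entries (including duplicates) laid out in Kronecker order.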
One may also refer to MacRae (1974) for a similar definition of a matrix derivative using the tensor product and a differential operator; indeed, the T-derivative, obtained by vectorizing the transposed Jacobian (1.1), can be seen as closely related to that. While these results can also be obtained using tensor calculus, as is done in McCullagh (1987) and Speed (1990), our approach is more straightforward and simpler, requiring only the calculus of several variables. We believe that using only tensor products of vectors leads to an intuitive and natural way to deal with higher-order moments and cumulants of multivariate distributions, as will be demonstrated in the paper. Another comprehensive reference on matrix derivatives is the book by Mathai (1997).
The paper is organized as follows: Sections 2 and 3 introduce respectively the skewness and kurtosis vectors, and treat several existing measures of skewness and kurtosis based on these cumulant vectors. Section 4 discusses a linear transformation of the skewness and kurtosis vectors, which helps remove redundant/duplicate information from them, and proposes skewness and kurtosis indexes based on the distinct elements of the corresponding vectors. Sections 5 and 6 provide computational formulae for the skewness and kurtosis vectors for spherical, elliptical and asymmetric/skew multivariate distributions. Section 7 provides some examples, while Section 8 offers some final considerations. To improve readability of the paper, more technical details and proofs are placed in an Appendix. A word about notation: bold uppercase letters denote random vectors and matrices, while bold lowercase letters denote their specific values.

Multivariate Skewness
In this section and the next one, it will be shown that all cumulant-based measures of skewness and kurtosis that appear in the literature can be expressed in terms of the third and fourth cumulant vectors respectively. Also, several hitherto unnoticed relationships between different indexes will be brought out.
Let X be a d-dimensional random vector whose first four moments exist, with mean E X = μ and positive-definite variance-covariance matrix Var X = Σ. Consider the standardized vector Y = Σ^{−1/2}(X − μ), which has zero mean and identity variance-covariance matrix. A complete picture of the skewness is contained in the "skewness vector" of X (or Y), defined as κ_3 = Cum_3(Y). Since the third-order cumulants coincide with the third-order central moments, we may write

κ_3 = E Y^{⊗3} = E(Y ⊗ Y ⊗ Y).

Among these d³ elements, only d(d+1)(d+2)/6 are distinct, and in Section 4 we discuss a linear transformation which extracts the distinct elements of κ_3. In the examples below, we denote the unit matrix of dimension k by I_k, and the k-vector of ones by 1_k.
The following 6 examples reveal several relationships among various indexes of skewness which appear in the literature, and their connection to the third-order cumulant vector κ 3 , which can actually be seen as the common denominator.
Example 1. Mardia (1970) suggested the squared norm of the vector κ_3 as a measure of departure from symmetry, viz.

β_{1,d} = ‖κ_3‖² = ‖E Y^{⊗3}‖².

If Y_1 and Y_2 are two independent copies of Y, then we may write β_{1,d} = E(Y_1^T Y_2)³.

Example 2. Móri et al. (1994) propose the skewness vector s(Y) = E[(Y^T Y) Y], which can be written as

s(Y) = ((vec I_d)^T ⊗ I_d) κ_3.

The matrix ((vec I_d)^T ⊗ I_d) contains d unit values per row, whereas all the other entries are 0; as a consequence, this measure does not take into account the contribution of cumulants κ(r, s, t) in which r, s, t are all different.
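A minimal numerical sketch of these definitions follows: the sample analogue of κ_3 is computed from the standardized data, and Mardia's β_{1,d} is its squared norm. The function names are our own, and the symmetric inverse square root is one convenient choice of standardization.

```python
import numpy as np

def standardize(x):
    """Y = Sigma^{-1/2}(X - mu), using the symmetric inverse square root."""
    mu = x.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(x, rowvar=False, bias=True))
    return (x - mu) @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)

def skewness_vector(x):
    """Sample analogue of kappa_3 = E[Y^{(tensor 3)}]."""
    y = standardize(x)
    return np.mean([np.kron(np.kron(v, v), v) for v in y], axis=0)

def mardia_skewness(x):
    """Mardia's beta_{1,d}: the squared norm of the skewness vector."""
    k3 = skewness_vector(x)
    return float(k3 @ k3)
```

For a sample that is exactly reflectively symmetric about its mean, the sample skewness vector vanishes and so does β_{1,d}.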
Example 3. Kollo (2008), noting that not all third-order mixed moments appear in s(Y), proposes an alternate skewness vector b(Y), which can again be expressed in terms of κ_3 as follows:

b(Y) = Σ_{i,j} E(Y_i Y_j Y) = (1_{d²}^T ⊗ I_d) κ_3.  (2.5)

Example 4. Malkovich and Afifi (1973) define their measure as

b_1^* = sup_{‖u‖=1} [E(u^T Y)³]² = sup_{‖u‖=1} (u^{⊗3 T} κ_3)²,

so that, writing cos(a, b) for the cosine of the angle between the vectors a and b, we have b_1^* = ‖κ_3‖² sup_{‖u‖=1} cos²(u^{⊗3}, κ_3). Next note that sup_u cos(u^{⊗3}, κ_3) could be 1 only when there exists a u_0 such that u_0^{⊗3} = κ_3/‖κ_3‖. This can happen only when the normed vector κ_3/‖κ_3‖ has the same form as u^{⊗3}, in which case b_1^* coincides with Mardia's β_{1,d}.

Example 5. Balakrishnan et al. (2007) consider the vector obtained by integrating over the unit sphere,

T = ∫_{S^{d−1}} u E(u^T Y)³ du,  (2.6)

and we see that this extension is a constant times (a matrix multiple of) the skewness vector κ_3. Indeed, one can use Theorem 3.3 of Fang et al. (2017) to show that this matrix multiple reduces to a multiple of the vector s(Y); see also Székely and Rohatgi (1984). In particular, when d = 3, we have T = (3/15) s(Y). It follows that, as in Móri et al. (1994), the vector T does not take into account the contribution of cumulants κ(r, s, t) with r, s, t all different.

Example 6. Srivastava (1984). If X is a d-dimensional random vector with variance matrix Σ, consider Γ^T Σ Γ = Diag(λ_1, . . . , λ_d) = D_λ, with orthogonal matrix Γ. The skewness measure defined by Srivastava can be written as

b_1²(Y) = (1/d) Σ_{i=1}^d (e_i^{⊗3 T} κ_3)².

One notices that e_i^{⊗3} is a unit axis vector in the Euclidean space R^{d³}, so that the measure b_1²(Y) is, up to the factor 1/d, the squared norm of the projection of κ_3 onto a d-dimensional coordinate subspace of R^{d³}, and it does not contain all the information in the vector κ_3. Note that this index is NOT affine invariant (nor is the corresponding kurtosis index).

Multivariate Kurtosis
The kurtosis of Y is measured by the fourth-order cumulant vector, denoted by κ_4, and computed as

κ_4 = Cum_4(Y) = E Y^{⊗4} − K_{2,2} (vec I_d)^{⊗2},  (3.1)

where K_{2,2} denotes the commutator matrix (1.2) (see Appendix 1 for details). Note that κ_4 turns out to be the zero vector for multivariate Gaussian distributions, and may be used as the "standard".
This kurtosis vector κ_4 forms the basis for all multivariate measures of kurtosis proposed in the literature, as our next six examples demonstrate. For instance, one may define its squared norm ‖κ_4‖² as a scalar index of kurtosis, called the total kurtosis, which is one of the measures. We now connect the kurtosis vector κ_4 and the total kurtosis ‖κ_4‖² to the various other indexes discussed in the literature.

Example 7. Mardia (1970) defined an index of kurtosis as

β_{2,d} = E(Y^T Y)²,

and this is related to the kurtosis vector κ_4 as follows:

β_{2,d} = (vec I_{d²})^T κ_4 + d(d + 2).

In particular, for the standard Gaussian vector Y we have κ_4 = 0, so that β_{2,d} = d(d + 2). Note that vec I_{d²} contains only d² ones, and hence Mardia's measure does not take into account all the entries of κ_4; it includes only those entries of E Y^{⊗4} of the form E Y_i² Y_j².

Example 8. Koziol (1989) considered the following index of kurtosis. Let Ỹ be an independent copy of Y; then

E(Y^T Ỹ)⁴ = ‖E Y^{⊗4}‖²

is the next-higher-degree analogue of Mardia's skewness index β_{1,d}.

Example 9. Móri et al. (1994) propose a kurtosis matrix, which may likewise be expressed through κ_4. As in the case of their skewness measure, this measure does not take into account the contribution of cumulants κ(r, s, t, u) with all indices different.

Example 10. Malkovich and Afifi (1973). Similar to the discussion regarding skewness, the measure proposed by Malkovich and Afifi simply provides a different derivation of the total kurtosis in the form

b_2^* = sup_{‖u‖=1} [E(u^T Y)⁴ − 3]² = sup_{‖u‖=1} (u^{⊗4 T} κ_4)²,

because of the fact that, for a unit vector u, E(u^T Y)⁴ − 3 = u^{⊗4 T} κ_4.

Remark 1. It may be noted that the idea used in our (2.6), namely integrating u (u^{⊗4 T} κ_4) over the unit sphere, will not work for the kurtosis, since it can be verified that this results in a zero vector.
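Mardia's kurtosis index admits a direct sample analogue: standardize the data and average (Y^T Y)². A minimal sketch (function name ours); for Gaussian data the value should be close to d(d+2).

```python
import numpy as np

def mardia_kurtosis(x):
    """Mardia's beta_{2,d} = E[(Y'Y)^2] on the standardized sample;
    approximately d(d+2) for Gaussian data, since then kappa_4 = 0."""
    mu = x.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(x, rowvar=False, bias=True))
    y = (x - mu) @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)
    return float(np.mean(np.sum(y * y, axis=1) ** 2))
```

For a two-dimensional Gaussian sample the statistic converges to 2 · 4 = 8 as the sample size grows.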

Alternative Measures Based on Distinct Elements of the Cumulant Vectors
The skewness vector κ_3 and the kurtosis vector κ_4 contain d³ and d⁴ elements respectively, which are not all distinct. Just as the covariance matrix of a d-dimensional vector contains only d(d + 1)/2 distinct elements, a simple computation shows that κ_3 contains d(d+1)(d+2)/6 and κ_4 contains d(d+1)(d+2)(d+3)/24 distinct elements. Similar to the fact that there are many applications, as well as measures, which consider only the distinct elements of a covariance matrix, it is quite sensible and reasonable to follow this approach and define skewness and kurtosis measures based on just the distinct elements of the corresponding cumulant vectors. For example, in estimating the "total skewness" index ‖κ_3‖² discussed in Mardia (1970), one may use an "elimination matrix", since the terms in the summation are symmetric, as in a covariance matrix. Selection of the distinct elements from the vectors κ_3 and κ_4 can be accomplished via linear transformations. This approach can be traced back to Magnus and Neudecker (1980), who introduce two transformation matrices, L and D, consisting of zeros and ones. For an arbitrary (n, n) matrix A, L eliminates from vec(A) the supra-diagonal elements of A, while D performs the reverse transformation for a symmetric A.
In the case of a covariance matrix Σ of a vector X, the elimination matrix L above is a matrix acting on vec(Σ) and in the approach defined in this paper it holds that vec(Σ) = Cum 2 (X, X), i.e. the distinct elements of vec(Σ) correspond to the distinct elements of a tensor product X ⊗ X. In this way the elimination matrix L can be generalized to tensor products of higher orders in a simple way.
We shall use elimination matrices G(3, d) and G(4, d) for shortening the skewness and kurtosis vectors respectively, keeping just the distinct entries in them. See Meijer (2005) for details, where the notations T_3^+ and T_4^+ are used; these are actually the Moore-Penrose inverses of the triplication and quadruplication matrices. This gives the cumulant vectors of distinct elements as

κ_{3,D} = G(3, d) κ_3,  κ_{4,D} = G(4, d) κ_4.

In such a case, the distinct-element vector has dimension dim(κ_{3,D}) = d(d + 1)(d + 2)/6. For instance, when d = 2, dim(κ_{3,D}) is 50% of dim(κ_3), and in general the fraction of distinct elements is (1 + 1/d)(1 + 2/d)/6, getting close to 1/6 for large d. Similarly, the fraction of distinct elements in κ_{4,D} relative to κ_4 approaches 1/24 for large d, a significant reduction.
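The counts of distinct entries are just counts of multisets of indices, which is easy to check directly; the helper name below is ours.

```python
from itertools import combinations_with_replacement
from math import comb

def n_distinct(d, k):
    """Number of distinct entries of the k-th order cumulant vector in
    dimension d: multisets {i_1 <= ... <= i_k} of indices, i.e. C(d+k-1, k)."""
    return sum(1 for _ in combinations_with_replacement(range(d), k))
```

For d = 2 and k = 3 this gives 4 of the 8 entries of κ_3, i.e. the 50% reduction mentioned above.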
Following the discussion of this section, one could define indexes of total skewness and total kurtosis exploiting the squared norms of the skewness and kurtosis vectors containing only the distinct elements, i.e.

‖κ_{3,D}‖²  and  ‖κ_{4,D}‖².  (4.1)

In Section 7, some numerical evidence will be given about the performance of these indexes, which eliminate duplication of information.

Multivariate Symmetric Distributions
Two important classes of symmetric distributions are the spherically symmetric distributions and the elliptically symmetric distributions. In this section, we discuss them in turn and derive their cumulants. Some related results and discussion may be found in Fang et al. (2017).

Multivariate Spherically Symmetric Distributions
A d-dimensional random vector W is spherically symmetric if its characteristic function has the form φ_W(λ) = g(λ^T λ); equivalently, W admits the stochastic representation

W = R U,  (5.1)

where U is uniformly distributed on the unit sphere S^{d−1} and R ≥ 0 is a random variable independent of U. Here g is called the "characteristic generator" and R is the generating variate, with, say, generating distribution F. The relationship between the distribution F of R and g is given through the characteristic function of the uniform distribution on the sphere (see Fang et al., 2017, p. 30).
The marginal distributions of any such spherically or elliptically symmetric distribution have zero skewness, and the same kurtosis value, given by the "kurtosis parameter" of the form (see Muirhead, 2009, p. 41)

κ = E(W_i⁴) / (3 [E(W_i²)]²) − 1.

The next lemma provides the moments of the uniform distribution on S^{d−1}, and is proved in the Appendix.
Lemma 1. Let U be uniform on the sphere S^{d−1}. Then:

For odd-order moments,

E U^{⊗(2k+1)} = 0,  k = 0, 1, . . . ,

while for even-order moments,

E U^{⊗2} = (1/d) vec I_d,  E U^{⊗4} = (1/(d(d+2))) K_{2,2} (vec I_d)^{⊗2}.

For T-products, the analogous expressions are stated in terms of the zero-one vectors e_{d³,d} and e_{d⁴}, given by Eqs. 1.8 and 1.9 respectively; the sums of their entries are given there as well.

The odd moments of the modulus |U| are

E ∏_{i=1}^d |U_i|^{k_i} = (Γ(d/2) / (π^{d_1/2} Γ((d + k)/2))) ∏_{i: k_i > 0} Γ((k_i + 1)/2),

where k = Σ_i k_i (see Eq. 6.3 for the related constants G_k) and d_1 is the number of nonzero k_i. For the corresponding T-products, the zero-one vectors are given in Eqs. 1.10 and 1.11.
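The low-order moments in Lemma 1 are easy to verify by simulation, using the standard fact that a normalized Gaussian vector is uniform on the sphere. A Monte Carlo sketch (sample size and seed arbitrary):

```python
import numpy as np

# Check E[U_i^2] = 1/d, E[U_i^4] = 3/(d(d+2)), E[U_i^2 U_j^2] = 1/(d(d+2))
# for U uniform on S^{d-1}, obtained by normalizing Gaussian vectors.
rng = np.random.default_rng(42)
d, n = 3, 200000
z = rng.standard_normal((n, d))
u = z / np.linalg.norm(z, axis=1, keepdims=True)   # uniform on the sphere
m2 = np.mean(u[:, 0] ** 2)
m4 = np.mean(u[:, 0] ** 4)
m22 = np.mean(u[:, 0] ** 2 * u[:, 1] ** 2)
```

For d = 3 the theoretical values are 1/3, 3/15 and 1/15 respectively.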

Remark 2. Observe that there are only three distinct elements of the third order moments, and four distinct elements of the fourth order moments!
The next result provides the cumulants of W (proof in the Appendix). From the representation of W given in Eq. 5.1, it is easy to see that Cum_{2ℓ+1}(W) = 0, ℓ = 0, 1, . . ., while the second and fourth-order cumulants are calculated directly using Lemmas 1 and 2 above, so that we have the following.

Theorem 1. If W is spherically distributed with the representation W = RU and E(R⁴) < ∞, then Cum_{2ℓ+1}(W) = 0,

Cum_2(W) = (E R² / d) vec I_d,

and

Cum_4(W) = [E R⁴/(d(d+2)) − (E R²/d)²] K_{2,2} (vec I_d)^{⊗2}.

In terms of these moments, the kurtosis parameter becomes

κ = d E R⁴ / ((d + 2)(E R²)²) − 1.
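Theorem 1 can be checked numerically in the simplest case R ≡ r constant, where Cum_2(W) = (r²/d) I_d and the fourth cumulant of a single coordinate is E W_1⁴ − 3(E W_1²)² = 3r⁴/(d(d+2)) − 3r⁴/d². A Monte Carlo sketch (parameters arbitrary):

```python
import numpy as np

# W = R U with R = r fixed: for d = 4, r = 2 the coordinate variance is
# r^2/d = 1 and the coordinate fourth cumulant is 3*16/24 - 3*1 = -1.
rng = np.random.default_rng(7)
d, r, n = 4, 2.0, 300000
z = rng.standard_normal((n, d))
w = r * z / np.linalg.norm(z, axis=1, keepdims=True)
cov = w.T @ w / n                       # E[W W'] (mean is zero)
cum4_11 = np.mean(w[:, 0] ** 4) - 3 * cov[0, 0] ** 2
```

Note that the fourth cumulant is negative here: a fixed radius makes the distribution platykurtic.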

Multivariate elliptically symmetric distributions

A d-vector X has an elliptically symmetric distribution if it has the representation

X = μ + A W,

where μ ∈ R^d, A is a d × d matrix with Σ = A A^T the associated variance-covariance structure, and W is spherically distributed. Hence the cumulants of X are obtained from those of W simply through the matrix A, except for the mean, i.e.

Cum_1(X) = μ,  Cum_k(X) = A^{⊗k} Cum_k(W),  k = 2, 3, . . .
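The multilinearity behind this scaling rule is easy to check numerically: for any linear map A, the empirical third moment vector of AW equals A^{⊗3} times that of W, exactly, for any sample. A minimal sketch (function name ours):

```python
import numpy as np

def third_moment(x):
    """E[X^{(tensor 3)}] estimated from the rows of x
    (equals the third cumulant for zero-mean data)."""
    return np.mean([np.kron(np.kron(v, v), v) for v in x], axis=0)

# Under X = A W the third cumulant transforms as A^{(tensor 3)} Cum_3(W);
# the identity holds exactly for empirical moments of any centered sample.
w = np.array([[1.0, 2.0], [-0.5, 1.0], [-0.5, -3.0]])
w = w - w.mean(axis=0)                  # center so moments equal cumulants
A = np.array([[1.0, 2.0], [0.0, 1.0]])
lhs = third_moment(w @ A.T)
rhs = np.kron(A, np.kron(A, A)) @ third_moment(w)
```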

Moments of elliptically symmetric distributions have also been discussed by Berkane and Bentler (1986). As a special case, we now discuss the multivariate t-distribution.
Let Z ∈ N(0, I_d) and let S² follow a χ² distribution with m degrees of freedom, independently of Z. Then

W = (√m / S) Z  (5.7)

has a multivariate t-distribution; here W = R* U with R* = √m ‖Z‖/S, and R*²/d has an F-distribution with d and m degrees of freedom. Let μ ∈ R^d and let A be a d × d matrix; if X = μ + A W then X ∈ Mt_d(m, μ, Σ), where Σ = A A^T, hence X is an elliptically symmetric random vector. Since the characteristic function is quite complicated (see Fang et al., 2017, Section 3.3.6, p. 85), we utilize the stochastic representation given in Eq. 5.7 for deriving the kurtosis (note that the skewness is zero). The proof of the next lemma is in the Appendix.
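The elliptical kurtosis results can be illustrated on the multivariate t: Mardia's index for Mt_d with m > 4 degrees of freedom is d(d+2)(m−2)/(m−4), a known consequence of the representation above. A Monte Carlo sketch, assuming the construction W = (√m/S)Z (seed and sample size arbitrary):

```python
import numpy as np

# For d = 2, m = 20 Mardia's kurtosis of the multivariate t should be
# d(d+2)(m-2)/(m-4) = 2*4*18/16 = 9.
rng = np.random.default_rng(3)
d, m, n = 2, 20, 300000
z = rng.standard_normal((n, d))
s = np.sqrt(rng.chisquare(m, size=n))
w = np.sqrt(m) * z / s[:, None]
# per-coordinate standardization suffices here: the scatter matrix is I_d
y = (w - w.mean(axis=0)) / w.std(axis=0)
beta2 = np.mean(np.sum(y * y, axis=1) ** 2)
```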

Multivariate Skew Distributions
Starting with Azzalini and Dalla Valle (1996), who suggested methods for obtaining multivariate skew-normal distributions, several authors have discussed different approaches for obtaining asymmetric multivariate distributions by skewing a spherically or an elliptically symmetric distribution. We mention here Branco and Dey (2001), who extend the work of Azzalini and Dalla Valle (1996) to multivariate skew-elliptical distributions; Arnold and Beaver (2002), who apply a conditioning approach to a d-dimensional random vector with an elliptically contoured distribution (although this conditioning is not strictly necessary); Sahu et al. (2003), who use transformation and conditioning techniques; Dey and Liu (2005), who discuss an approach based on linear constraints; and Genton and Loperfido (2005), who introduce a general class of multivariate skew-elliptical distributions, the so-called multivariate generalized skew-elliptical (GSE) distributions. See also Genton (2004) and the references therein.
Here we provide a systematic treatment of several skew-multivariate distributions by providing general formulae for cumulant vectors up to the fourth order, which are needed in deriving the corresponding skewness and kurtosis measures discussed in Sections 2 and 3.
6.1. Multivariate skew spherical distributions

Let Z = (Z_1^T, Z_2^T)^T be spherically symmetrically distributed in dimension (m + d). Define the canonical fundamental skew-spherical (CFSS) distribution (Arellano-Valle and Genton, 2005, Prop. 3.3) by

X = Δ |Z_1| + (I_d − Δ Δ^T)^{1/2} Z_2,  (6.1)

where Δ is a d × m skewness matrix, and write the associated moment constants, for short, as G_k (see Eq. 6.3). We may express the joint moments of p_1, p_2 through these constants, as in Eq. 6.4; in particular, E p_1 = G_{1,0}(m, d) and E p_1 p_2 = G_{1,1}(m, d). The cumulants of X can be obtained from these moments. Using Eq. 6.4 above, and Lemma 1 for the moments of p_k and |U_j|, we obtain the following result (see the Appendix for the proof).

Theorem 2. Let the d-vector X have a CFSS distribution as defined in Eq. 6.1, and write V_2 = (I_d − ΔΔ^T)^{1/2} Z_2 for ease of notation; the cumulants of X up to the fourth order then follow from the moment formulae above.

Multivariate Skew-t Distribution

This distribution goes back to Azzalini and Capitanio (2003); consult also Kim and Mallick (2003) for a derivation of moments up to the fourth order, and of moments of quadratic forms, of the multivariate skew t-distribution. Let X be a d-dimensional vector having the multivariate skew-normal distribution SN_d(0, Ω, α) (see Section 6.3 below for details), and let S² be an independent random variable which follows a χ² distribution with m degrees of freedom. Then the random vector

V = μ + (√m / S) X

has a multivariate skew t-distribution, denoted by St_d(μ, Ω, α, m). Derivation of the cumulants of V up to the fourth order is provided in the next theorem, with detailed proof given in the Appendix.
Theorem 3. Let V ∈ St_d(μ, Ω, α, m) with m > 4; then its first four cumulants are as follows. From Theorem 3, the skewness and kurtosis vectors are obtained directly.

Multivariate Skew-Normal Distribution
Consider the multivariate skew-normal distribution introduced by Azzalini and Dalla Valle (1996), whose marginal densities are scalar skew-normals. A d-dimensional random vector X is said to have a multivariate skew-normal distribution SN_d(μ, Ω, α), with shape parameter α, if it has the density function

f(x) = 2 φ(x; μ, Ω) Φ(α^T (x − μ)),

where φ(x; μ, Ω) is the d-dimensional normal density with mean μ and correlation matrix Ω; here φ and Φ denote the univariate standard normal density and cdf respectively. The cumulant function of SN_d(0, Ω, α) is given by

K(λ) = (1/2) λ^T Ω λ + log(2 Φ(δ^T λ)),  where δ = (1 + α^T Ω α)^{−1/2} Ω α.

Note that the cumulants of order higher than 2 do not depend on Ω but only on δ. Here we use the approach discussed in this paper to get explicit expressions for the cumulants of the multivariate SN distribution. See also Genton et al. (2001) and Kollo et al. (2018) for moments of the SN distribution and of its quadratic forms. The proof of the next lemma is similar to that of Lemma 5, which comes later, and is omitted.

Lemma 4. For X ∈ SN_d(0, Ω, α),

Cum_1(X) = E X = √(2/π) δ,  Cum_2(X) = vec Ω − (2/π) δ^{⊗2},

while for k = 3, 4, . . ., Cum_k(X) = c_k δ^{⊗k}.

From this lemma one observes that δ = √(π/2) μ_X, from which Cum_3(X) = (2 − π/2) μ_X^{⊗3}. Hence Mardia's skewness measure becomes

β_{1,d} = (2 − π/2)² ‖μ_X‖⁶,

and the kurtosis measure

β_{2,d} = (2π − 6) (vec I_{d²})^T μ_X^{⊗4} + d(d + 2).
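The first and third cumulants above can be checked by simulation through the standard stochastic representation X = δ|Z_0| + (I − δδ^T)^{1/2} Z, with Z_0 scalar standard normal and Z ∈ N(0, I_d) independent. A Monte Carlo sketch (the vector δ below is a hypothetical choice with ‖δ‖ < 1):

```python
import numpy as np

# Check Cum_3(X) = (2 - pi/2) mu_X^{(tensor 3)} with mu_X = sqrt(2/pi) delta.
rng = np.random.default_rng(5)
d, n = 2, 500000
dlt = np.array([0.6, 0.3])                          # hypothetical delta
vals, vecs = np.linalg.eigh(np.eye(d) - np.outer(dlt, dlt))
root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T       # (I - delta delta')^{1/2}
x = np.abs(rng.standard_normal(n))[:, None] * dlt \
    + rng.standard_normal((n, d)) @ root
xc = x - x.mean(axis=0)
cum3 = np.einsum('ni,nj,nk->ijk', xc, xc, xc).ravel() / n
mu = np.sqrt(2 / np.pi) * dlt
cum3_th = (2 - np.pi / 2) * np.kron(np.kron(mu, mu), mu)
```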

Canonical Fundamental Skew-Normal (CFUSN) Distribution
Arellano-Valle and Genton (2005) introduced the CFUSN distribution (cf. their Proposition 2.3) to include all existing definitions of SN distributions. The marginal stochastic representation of X with distribution CFUSN_{d,m}(Δ) is given by

X = Δ |Z_1| + (I_d − Δ Δ^T)^{1/2} Z_2,

where Δ is the d × m skewness matrix, such that ‖Δa‖ < 1 for all ‖a‖ = 1, and Z_1 ∈ N(0, I_m) and Z_2 ∈ N(0, I_d) are independent (Proposition 2.2 of Arellano-Valle and Genton, 2005). The general CFUSN_{d,m}(μ, Σ, Δ) can then be defined via the linear transformation μ + Σ^{1/2} X. We then have the following: the cumulant function of CFUSN_{d,m}(0, Σ, Δ) is

K(λ) = (1/2) λ^T Σ λ + log(2^m Φ_m(Δ^T λ)),

where Φ_m denotes the standard normal distribution function of dimension m. Note again that the cumulants of order higher than 2 do not depend on Σ but just on Δ. The proof of the next lemma is given in the Appendix.
Lemma 5. The cumulants of CFUSN_{d,m}(0, Σ, Δ) are given by

Cum_1(X) = √(2/π) Δ 1_m,

Cum_2(X) = vec Σ − (2/π) Δ^{⊗2} vec I_m,  (6.8)

and, for k = 3, 4, . . .,

Cum_k(X) = c_k Δ^{⊗k} vec [e_p^{⊗(k−1)}]_{p=1:m},

where e_k denotes the k-th unit vector in R^m. Expressions for c_k are provided in Lemma 4. Figure 1 provides the contour plots of the density of a CFUSN_{2,2}(0, I_2, Δ) for selected choices of the generating Λ, as described under each figure. Table 1 reports the values of the skewness and kurtosis indexes computed with the help of Lemma 5. These results have been further verified by simulations. The indexes of Malkovich and Afifi and of Balakrishnan et al. are not reported, as they are equivalent to the Mardia and MSR measures respectively.
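The first two cumulants in Lemma 5 (with Σ = I) can be verified directly from the marginal stochastic representation. A Monte Carlo sketch; the matrix `delta` below is a hypothetical skewness matrix satisfying ‖Δa‖ < 1:

```python
import numpy as np

# Sample X = Delta |Z1| + (I - Delta Delta')^{1/2} Z2 and compare the mean
# and covariance with sqrt(2/pi) Delta 1_m and I - (2/pi) Delta Delta'.
rng = np.random.default_rng(11)
d = m = 2
n = 300000
delta = np.array([[0.5, 0.2], [0.1, 0.4]])          # hypothetical Delta
vals, vecs = np.linalg.eigh(np.eye(d) - delta @ delta.T)
root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T       # (I - Delta Delta')^{1/2}
x = np.abs(rng.standard_normal((n, m))) @ delta.T \
    + rng.standard_normal((n, d)) @ root
mean_th = np.sqrt(2 / np.pi) * delta.sum(axis=1)
cov_th = np.eye(d) - (2 / np.pi) * delta @ delta.T
```

Note that vec(cov_th) is exactly vec I_d − (2/π) (Δ ⊗ Δ) vec I_m, in agreement with Eq. 6.8.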

Some Illustrative Examples
Among the skewness indexes, note that Kollo's vector measure is not able to capture the presence of skewness (although it is quite strong) in case (b). As far as the kurtosis indexes are concerned, note that the total kurtosis ‖κ_4‖² and its distinct-element version ‖κ_{4,D}‖² are quite effective in showing differences among the four cases. Figure 2 reports the contour plots of the density of St_2(0, I_2, α, 10) for some choices of α, as given under the figure. Table 2 reports the corresponding values of the skewness and kurtosis measures.

Concluding Remarks
In this paper we have taken a vectorial approach to express information about the skewness and kurtosis of multivariate distributions. This is achieved by applying a vector derivative operator, which we call the T-derivative, to the cumulant generating function. Although some of our methods may appear similar to some existing results, we demonstrate that they lead to a direct and natural way of expressing higher-order cumulants and moments in the multivariate case. This approach can also be easily extended to obtain moments and cumulants beyond the fourth order.
Our careful analysis of existing measures of skewness and kurtosis via the third and fourth order cumulant vectors, reveals some hidden features and relationships among them.
Explicit formulae for cumulant vectors for several distributions have been obtained. Several results are new, such as those in Lemmas 1 and 3, while others complement and complete existing results available in the literature.
The availability of explicit formulae for κ 3 and κ 4 together with available computing formulae for commutators provides a systematic treatment of higher order moments and cumulants for general classes of symmetric and asymmetric multivariate distributions, which are needed in applications, estimation and testing.
Acknowledgments. The work of the third author is partially supported by the Project EFOP3.6.2-16-2017-00015 of the European Union and cofinanced by the European Social Fund.
Funding Information. Open access funding provided by Università degli Studi di Trento within the CRUI-CARE Agreement.
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

The fourth moment can be expressed in terms of cumulants as

E X^{⊗4} = κ_{X,4} + K_{3,1}(κ_{X,3} ⊗ κ_{X,1}) + K_{2,2} κ_{X,2}^{⊗2} + K_{2,1,1}(κ_{X,2} ⊗ κ_{X,1}^{⊗2}) + κ_{X,1}^{⊗4},

where K_{2,1,1} = I + K^{−1}_{(1,3,2,4)} + K^{−1}_{(3,1,2,4)} + K^{−1}_{(3,1,4,2)} + K^{−1}_{(1,3,4,2)} + K^{−1}_{(3,4,1,2)}.
For cumulants, the analogous inversion formulae hold (see, e.g., Jammalamadaka et al., 2006), where the commutators are as defined in Eqs. 1.1 and 1.2.

* Proofs
We start with the proof of Theorem 3, which is one of the central results of the paper. Proofs of the remaining results follow.
Proof of Theorem 3. In order to derive the cumulants of V, set W = V − μ = (√m / S_m) X. Note that the cumulants of W are the same as those of V except for the mean, and that the moments E(S_m^{−k}) are given by G_k(m) of Eq. 6.3 (cf. Johnson et al., 1994, p. 452).
Consider first the computation of mean and variance.
Consider first the computation of the mean and the variance, with δ and α as given in Eq. 6.5. From the formula for cumulants of products (see Terdik, 2002), we need the cumulants C_k(S_m) of S_m together with the cumulants κ_{X,k} of X, where the latter are given by Lemma 4. For m > 2, using the results above, the second cumulant follows; we note that Sahu et al. (2003, p. 137) provide a different approach and expression for the second cumulant. Consider now the third-order cumulant. Again, from the formula for cumulants of products (see Terdik, 2002), we need to compute, among others, the term

C_1(S_m) C_2(S_m) (3 κ_{X,3} + 2 K_{2,1}(κ_{X,2} ⊗ κ_{X,1})).

Notice that W is a scalar multiple of a vector-valued variate, which somewhat simplifies the general vector-valued formula. Now we consider separately the terms in Eq. 1.4: first those depending on X, namely

κ_{X,3} + K_{2,1}(κ_{X,2} ⊗ κ_{X,1}) + κ_{X,1}^{⊗3}.

Here, as well as in the next set of calculations, we use Lemma 4, which expresses the cumulants of a multivariate skew-normal variable in terms of its parameters. Plugging these results into Eq. 1.4, we obtain the third cumulant. The cumulants of S_m that are needed come from the expressions for cumulants in terms of moments and the moment formula (1.3); note that C_1(S_m)³ is simply the third power of the expected value. Finally, we collect the coefficients of the different terms in Eq. 1.6, starting with K_{2,1}(κ_{X,2} ⊗ κ_{X,1}).
Proof of Lemma 1. This lemma can be obtained essentially from the results in Fang et al. (2017), by rearranging them into vector form and using our notation.
Proof of Lemma 5. The first two cumulants are obtained by evaluating the derivatives of the cumulant function at λ = 0, and the third-order cumulant by differentiating once more. Here [e_k^{⊗2}]_{k=1:m} is the m² × m matrix whose columns are the vectors e_k^{⊗2}. Higher-order cumulants come out as a constant times Δ^{⊗k} vec[e_p^{⊗(k−1)}]_{p=1:m}, k = 3, 4, . . ..
Note that c 3 and c 4 coincide with those given in Appendix A.2 of Azzalini and Capitanio (1999), and we omit details about the other constants c k .