1 Introduction

Using the standard normal distribution as the yardstick, statisticians have defined notions of skewness (asymmetry) and kurtosis (peakedness) in the univariate case. Since all the odd central moments (when they exist) are zero for a symmetric distribution on the real line, a first attempt at measuring asymmetry is to ask how far the third central moment is from zero, although in principle one could use any other odd central moment, or even a combination of them. Several alternative indexes of asymmetry, based on the mode or the quantiles, are also available; see for instance Arnold and Groeneveld (1995), Averous and Meste (1997), and Ekström and Jammalamadaka (2012) for recent work and reviews. Similarly for kurtosis, taking again the standard normal distribution as the yardstick, a coefficient of kurtosis has been developed using the fourth central moment.

When dealing with multivariate distributions, the notions of symmetry, and the measurement of skewness and kurtosis, are not uniquely defined. For example, the mode-based approach of Arnold and Groeneveld (1995), although very popular, does not seem extendable to the multivariate case. One also has to be cautious in interpreting the cumulant-based measures of skewness and kurtosis, especially the kurtosis when there are multiple peaks.

Our focus in this paper is on multivariate distributions and one may remark that the two fundamental papers by Rao (1948a, b) take us back to the early days of such multivariate analysis. A good starting point is the monograph by Fang et al. (2017) and a more recent review article by Serfling (2004). We consider symmetry of a d-dimensional random vector X, around a given point, which we will assume without loss of generality, to be the origin. Such an X is said to be spherically symmetric or rotationally symmetric if for all (d × d) orthogonal matrices A, X has the same distribution as AX. One may generalize this to elliptical or ellipsoidal symmetry in an obvious way. A more common and practical notion of symmetry in multi-dimensions, is the reflective or antipodal symmetry, and we say X has this property if it has the same distribution as −X. For measuring departures from such symmetry, various notions of skewness have been proposed by different authors, and the list includes (Mardia, 1970; Malkovich and Afifi, 1973; Isogai, 1982; Srivastava, 1984; Song, 2001) and Móri et al. (1994); see also Sumikawa et al. (2013) for an extension of Mardia’s multivariate skewness index to the high-dimensional case. In a broad discussion and analysis of multivariate cumulants, their properties and their use in inference, Jammalamadaka et al. (2006) proposed using the full vector of third and fourth order cumulants, as vectorial measures for multivariate skewness and kurtosis respectively. Such measures based on the cumulant vectors, were further discussed by Balakrishnan et al. (2007) and Kollo (2008). A systematic treatment of asymptotic distributions of skewness and kurtosis indexes can be found in Baringhaus and Henze (1991a, 1991b, 1992), and Klar (2002).

In this paper, one of our primary goals is to look at these disparate-looking definitions of skewness and kurtosis based on cumulants which have been proposed in the literature, and to assess and relate them from a unified perspective in terms of the cumulant vectors discussed in Jammalamadaka et al. (2006). Such a unified treatment helps reveal several relationships and features of the many existing proposals. For example, it will be shown: (i) that the Mardia (1970) and Malkovich and Afifi (1973) skewness measures can be equivalent; (ii) that the Balakrishnan et al. (2007) and Móri et al. (1994) skewness vectors are just proportional to each other; (iii) that the Kollo (2008) vectorial measure can be a null vector even for some asymmetric distributions; and (iv) that the Srivastava (1984) index is not affine invariant.

In Section 4, we also introduce alternative measures of skewness and kurtosis based on just the distinct cumulants, and evaluate their performance with some examples in Section 7.

Another significant contribution of the paper is to provide clear and simple formulae for computing cumulants up to the fourth order for spherically symmetric, elliptically symmetric, and skew-elliptical families, along with several specific examples. These comprehensive, and mostly new, results allow for a straightforward computation of all the indexes of skewness and kurtosis discussed here for several important multivariate distributions.

The analysis presented here is based on the cumulant vectors of the third and fourth order, defined below. In our derivations we utilize an elegant and powerful tool, the so-called T-derivative, which we now describe. Let \(\boldsymbol {\lambda }=(\lambda _{1},\dots ,\lambda _{d})^{\top }\) be a d-dimensional vector of constants and let \(\boldsymbol {{\phi }}(\boldsymbol {\lambda })=(\phi _{1} (\boldsymbol {\lambda }),\dots ,\phi _{m}(\boldsymbol {\lambda }))^{\top }\) denote an m-dimensional vector-valued function (\(m\in \mathbb {N}\)), which is differentiable in all its arguments. The Jacobian matrix of ϕ is defined by

$$ D_{\boldsymbol{\lambda}}\boldsymbol{{\phi}}(\boldsymbol{\lambda})=\frac{\partial\boldsymbol{\phi}(\boldsymbol{\lambda})}{\partial \boldsymbol{\lambda}^{\top}}=\left[ \frac{\partial\phi_{i}(\boldsymbol{\lambda })}{\partial\lambda_{j}}\right]_{i=1,\dots,m; j=1,\dots,d}. $$

Then the operator \(D_{\boldsymbol {\lambda }}^{\otimes }\), which we refer to as the T-derivative, is defined as

$$ D_{\boldsymbol{\lambda}}^{\otimes}\boldsymbol{{\phi}} (\boldsymbol{\lambda})=\operatorname{vec}\left( \frac{\partial\boldsymbol{\phi }(\boldsymbol{\lambda})}{\partial\boldsymbol{\lambda}^{\top}}\right)^{\top }=\boldsymbol{\phi}(\boldsymbol{\lambda})\otimes\frac{\partial}{\partial \boldsymbol{\lambda}}, $$
(1.1)

where the symbol ⊗ here and everywhere else in the paper, denotes the Kronecker (tensor) product. Assuming ϕ is k times differentiable, the k-th T-derivative is given by

$$ D_{\boldsymbol{\lambda}}^{\otimes k}\boldsymbol{{\phi} }(\boldsymbol{\lambda})=D_{\boldsymbol{\lambda}}^{\otimes}\left( D_{\boldsymbol{\lambda}}^{\otimes k-1}\boldsymbol{{\phi} }(\boldsymbol{\lambda})\right), $$
(1.2)

which is a vector of length \(m\cdot d^{k}\) containing all possible partial derivatives of entries of ϕ(λ) according to the tensor product \(\left (\frac {\partial }{\partial \lambda _{1}},\dots ,\frac {\partial }{\partial \lambda _{d}}\right )^{\top \otimes k}\). Refer to Jammalamadaka et al. (2006) for further details, properties, examples, and applications of the operator \(D_{\boldsymbol {\lambda }}^{\otimes }\) (see also Terdik 2002).
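As a small numerical illustration of the operator (ours, not part of the original text), the T-derivative can be approximated by finite differences: form the Jacobian, transpose, and vectorize. Applied twice to the cumulant generating function \(\boldsymbol{\lambda}^{\top}\boldsymbol{\Sigma}\boldsymbol{\lambda}/2\) of a centered Gaussian vector, it recovers \(\operatorname{vec}\boldsymbol{\Sigma}\):

```python
import numpy as np

def t_derivative(f, lam, h=1e-5):
    """Numerical T-derivative: vec of the transposed Jacobian of f at lam
    (entries ordered as phi (x) d/d-lambda, i.e. row-major flatten of J)."""
    lam = np.asarray(lam, dtype=float)
    d = lam.size
    m = np.atleast_1d(f(lam)).size
    J = np.empty((m, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = h
        J[:, j] = (np.atleast_1d(f(lam + e)) - np.atleast_1d(f(lam - e))) / (2.0 * h)
    return J.reshape(m * d)  # row-major flatten = vec(J^T)

# Cumulant generating function of N(0, Sigma): K(lam) = lam' Sigma lam / 2
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
cgf = lambda lam: lam @ Sigma @ lam / 2.0

# Second T-derivative of the cgf at 0 recovers the second cumulant vector vec(Sigma)
kappa2 = t_derivative(lambda l: t_derivative(cgf, l), np.zeros(2), h=1e-4)
```

Central differences are exact for a quadratic function, so the recovery here is essentially to machine precision.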

We note that if ϕX(λ) denotes the characteristic function of a d-dimensional random vector X, then the operator \(D^{\otimes k}_{\boldsymbol {\lambda }}\) applied to ϕ and to \(\log \phi \) provides the vector of moments of order k and the vector of cumulants of order k, respectively.

One may also refer to MacRae (1974) for a similar definition of a matrix derivative using the tensor product and a differential operator; indeed, the T-derivative we obtain by vectorizing the transposed Jacobian (1.1), can be seen as closely related to that. While these results can also be obtained by using tensor-calculus, as is done in McCullagh (1987) and Speed (1990), our approach is more straightforward and simpler, requiring only the knowledge of calculus of several variables. We believe that using only the tensor products of vectors leads to an intuitive and natural way to deal with higher order moments and cumulants for multivariate distributions, as it will be demonstrated in the paper. Another comprehensive reference on matrix derivatives is the book by Mathai (1997).

The paper is organized as follows: Sections 2 and 3 introduce respectively the skewness and kurtosis vectors, and treat several existing measures of skewness and kurtosis based on these cumulant vectors. Section 4 discusses a linear transformation on the skewness and kurtosis vectors, which removes redundant/duplicate information from them, and proposes skewness and kurtosis indexes based on the distinct elements of the corresponding vectors. Sections 5 and 6 provide computational formulae for the skewness and kurtosis vectors for spherical, elliptical and asymmetric/skew multivariate distributions. Section 7 provides some examples, while Section 8 offers some final considerations. To improve readability of the paper, more technical details and proofs are placed in an Appendix. A word about the notation: bold uppercase letters are used for random vectors and matrices, while bold lowercase letters denote their specific values.

2 Multivariate Skewness

In this section and the next one, it will be shown that all cumulant-based measures of skewness and kurtosis that appear in the literature can be expressed in terms of the third and fourth cumulant vectors respectively. Also, several hitherto unnoticed relationships between different indexes will be brought out.

Let X be a d-dimensional random vector whose first four moments exist. We denote EX = μ, with a positive-definite variance-covariance matrix VarX = Σ. Consider the standardized vector

$$ \mathbf{Y}=\mathbf{\Sigma}^{-1/2}\left( \mathbf{X}-\boldsymbol{\mu}\right) $$
(2.1)

which has zero mean vector and the identity matrix as its variance-covariance matrix. A complete picture of the skewness is contained in the “skewness vector” of X or Y, defined as \( \boldsymbol {\kappa }_{3}=\underline {\operatorname *{Cum}}_{3}\left (\mathbf {Y}\right ). \) Since the third-order cumulants are the same as the third-order central moments, we may write

$$ \boldsymbol{\kappa}_{3}=\operatorname*{E}\mathbf{Y}^{\otimes3}. $$
(2.2)

Note that for a d-dimensional vector \(\mathbf {Y}=(Y_{1},\dots ,Y_{d})^{\top }\), κ3 has length \(d^{3}\) and it contains all terms of the form \(\operatorname *{E}{Y_{r}^{3}}\), \(\operatorname *{E}{Y_{r}^{2}}Y_{s}\), \(\operatorname *{E}Y_{r}Y_{s}Y_{t}\) for \(1\leq r,s,t\leq d\). Among these \(d^{3}\) elements, only d(d + 1)(d + 2)/6 are distinct, and in Section 4 we discuss a linear transformation which extracts the distinct elements of κ3. In the examples below, we denote the unit matrix of dimension k by Ik, and the k-vector of ones by 1k.
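A minimal sketch (with toy data of our choosing) of the empirical version of Eq. 2.2: standardize the sample as in Eq. 2.1 and average the Kronecker cubes. The resulting vector of length \(d^{3}\) is symmetric under permutations of its tensor indices, which is the redundancy addressed in Section 4.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 4000
X = rng.exponential(size=(n, d))             # skewed toy data (illustrative choice)

# Standardize: Y = Sigma^{-1/2} (X - mu), using the symmetric square root
mu = X.mean(axis=0)
Sig = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eigh(Sig)
Sig_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
Y = (X - mu) @ Sig_inv_half                  # symmetric matrix, so no transpose needed

# Empirical skewness vector: kappa3 = E Y^{(x)3}; entry (i,j,k) sits at i*d^2 + j*d + k
kappa3 = np.mean([np.kron(np.kron(y, y), y) for y in Y], axis=0)
```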

The following six examples reveal several relationships among the various indexes of skewness which appear in the literature, and their connection to the third-order cumulant vector κ3, which can be seen as their common denominator.

Example 1.

Mardia (1970) suggested the square norm of the vector κ3 as a measure of departure from symmetry, viz.

$$ \kappa_{3}=\left\Vert \boldsymbol{\kappa}_{3}\right\Vert^{2}, $$

which is denoted by β1,d. If Y1 and Y2 are two independent copies of Y, then we may write

$$ \kappa_{3}=\left\Vert \operatorname*{E}\mathbf{Y}^{\otimes3}\right\Vert^{2}=\operatorname*{E}\mathbf{Y}_{1}^{\top\otimes3}\operatorname*{E} \mathbf{Y}_{2}^{\otimes3}=\operatorname*{E}\left( \mathbf{Y}_{1}^{\top }\mathbf{Y}_{2}\right)^{3}. $$
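This chain of equalities can be verified numerically: for an empirical κ3, the squared norm coincides, up to floating-point error, with the average of \((\mathbf{y}_{i}^{\top}\mathbf{y}_{j})^{3}\) over all sample pairs, since \((\mathbf{y}_{i}^{\otimes3})^{\top}\mathbf{y}_{j}^{\otimes3}=(\mathbf{y}_{i}^{\top}\mathbf{y}_{j})^{3}\). A sketch with toy data of our choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((300, 2))
Y[:, 0] = Y[:, 0] ** 2 - 1                   # make the sample asymmetric (toy choice)

# Empirical third cumulant vector and its squared norm
kappa3 = np.mean([np.kron(np.kron(y, y), y) for y in Y], axis=0)
beta_norm = kappa3 @ kappa3                  # ||kappa3||^2

# The pairwise form: average of (y_i' y_j)^3 over all sample pairs
G = Y @ Y.T
beta_pairs = np.mean(G ** 3)
```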

Example 2.

Móri et al. (1994), after observing that

$$ \left( \operatorname*{vec}\mathbf{I}_{d}\right)^{\top}\mathbf{Y} ^{\otimes2}=\mathbf{Y}^{\top\otimes2}\operatorname*{vec}\mathbf{I} _{d}=\mathbf{Y}^{\top}\mathbf{Y}=\sum\limits_{i=1}^{d} {Y_{i}^{2}} $$

define a “skewness vector”

$$ s(\mathbf{Y})=\operatorname*{E}\left( \mathbf{Y}^{\top}\mathbf{Y} \right) \mathbf{Y}=\left( \left( \operatorname*{vec}\mathbf{I} _{d}\right)^{\top}\otimes\mathbf{I}_{d}\right) \operatorname*{E} \mathbf{Y}^{\otimes3}=\left( \left( \operatorname*{vec}\mathbf{I} _{d}\right)^{\top}\otimes\mathbf{I}_{d}\right) \boldsymbol{\kappa}_{3}. $$
(2.3)

Note that \(\left (\operatorname *{vec}\mathbf {I}_{d}\right )^{\top } \otimes \mathbf {I}_{d}\) is a matrix of dimension \(d\times d^{3}\) which contains d unit values per row, while all other entries are 0; as a consequence, this measure does not take into account the contribution of cumulants of the type \(\operatorname *{E}\left (Y_{r}Y_{s}Y_{t}\right ) \), where (r,s,t) are all different.
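The selection effected by \(\left(\operatorname*{vec}\mathbf{I}_{d}\right)^{\top}\otimes\mathbf{I}_{d}\) is easy to verify numerically: applied to an empirical κ3 it reproduces the sample average of \((\mathbf{Y}^{\top}\mathbf{Y})\mathbf{Y}\) exactly (a sketch with toy data of our choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
Y = rng.gamma(2.0, size=(500, d)) - 2.0      # asymmetric toy data, roughly centered

kappa3 = np.mean([np.kron(np.kron(y, y), y) for y in Y], axis=0)

# M = (vec I_d)' (x) I_d, a (d x d^3) zero-one selection matrix
M = np.kron(np.eye(d).reshape(1, -1), np.eye(d))
s_from_kappa = M @ kappa3

# Direct definition: s(Y) = E (Y'Y) Y
s_direct = np.mean((Y ** 2).sum(axis=1)[:, None] * Y, axis=0)
```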

Example 3.

Kollo (2008), noting that not all third-order mixed moments appear in s(Y), proposes an alternative skewness vector \(b\left (\mathbf {Y}\right ) \), which can again be expressed in terms of κ3 as follows:

$$ b\left( \mathbf{Y}\right) =\operatorname*{E}\left[ \mathbf{1}_{d^{2} }^{\top}\left( \mathbf{Y}\otimes\mathbf{Y}\right) \right] \otimes\mathbf{Y} =\left( \mathbf{1}_{d^{2}}^{\top}\otimes\mathbf{I} _{d}\right) \operatorname*{E}\mathbf{Y}^{\otimes3} =\left( \mathbf{1} _{d^{2}}^{\top}\otimes\mathbf{I}_{d}\right) \boldsymbol{\kappa}_{3}. $$
(2.4)

Comparing \(b\left (\mathbf {Y}\right ) \) in Eq. 2.4 to s(Y) in Eq. 2.3, we see that the difference between the two expressions is that \(b\left (\mathbf {Y}\right ) \) uses \(\mathbf {1}_{d^{2}}^{\top }\) where s(Y) uses \(\left (\operatorname *{vec} \mathbf {I}_{d}\right )^{\top }\). As a result, each component of the Kollo measure sums \(d^{2}\) elements of κ3.

Following this line of reasoning, note that if d = 2 and \(\boldsymbol {\kappa }_{3}=\left [ 1,-1,-1,1,-1,1,1,-1\right ]^{\top } \), then \(b\left (\mathbf {Y} \right ) =0\) even for an asymmetric distribution, so b is not a valid measure of skewness. Section 7 gives specific examples where this actually happens.
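This degeneracy is immediate to check numerically with the stated κ3 for d = 2; for contrast, the Móri et al. (1994) vector s(Y) computed from the same κ3 is nonzero:

```python
import numpy as np

d = 2
kappa3 = np.array([1., -1., -1., 1., -1., 1., 1., -1.])   # the vector from the text

# Kollo's skewness vector: b(Y) = (1_{d^2}' (x) I_d) kappa3
b = np.kron(np.ones((1, d * d)), np.eye(d)) @ kappa3

# Mori et al.'s vector uses (vec I_d)' in place of 1_{d^2}'
s = np.kron(np.eye(d).reshape(1, -1), np.eye(d)) @ kappa3
```

Here b(Y) vanishes identically while s(Y) does not, illustrating the loss of information in the Kollo measure for this κ3.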

Example 4.

Malkovich and Afifi (1973) (see also Balakrishnan and Scarpa 2012) consider the following approach to measuring skewness: let \(\mathbb {S}_{d-1}\) be the (d − 1) dimensional unit sphere in \(\mathbb {R}^{d}\). First, for \(\mathbf {u}\in \mathbb {S}_{d-1}\), note that

$$ \underline{\operatorname*{Cum}}_{3}\left( \mathbf{u}^{\top}\mathbf{Y} \right) =\left( \mathbf{u}^{\top}\right)^{\otimes3} \underline{\operatorname*{Cum}}_{3}\left( \mathbf{Y}\right) =\mathbf{u}^{\top\otimes3}\operatorname*{E}\mathbf{Y}^{\otimes3} =\mathbf{u}^{\top\otimes3}\boldsymbol{\kappa}_{3} . $$
(2.5)

Malkovich–Afifi define their measure as

$$ b_{1}^{*}\left( \mathbf{Y}\right) =\sup_{\mathbf{u}}\left( \left( \mathbf{u}^{\top\otimes3}\boldsymbol{\kappa}_{3}\right)^{2}\right). $$

Consider

$$ \begin{array}{@{}rcl@{}} \left( \mathbf{u}^{\top\otimes3}\boldsymbol{\kappa}_{3}\right)^{2} & =\left\Vert \mathbf{u}^{\top\otimes3}\right\Vert^{2}\left\Vert \boldsymbol{\kappa}_{3}\right\Vert^{2}\cos^{2}\left( \mathbf{u} ^{\top\otimes3},\boldsymbol{\kappa}_{3}\right) , \end{array} $$

where \(\cos \limits \left (\mathbf {a},\mathbf {b} \right )\) indicates the cosine of the angle between the vectors a and b; next note that

$$ \begin{array}{@{}rcl@{}} \left\Vert \mathbf{u}^{\top\otimes3}\right\Vert^{2} & =\mathbf{u} ^{\top\otimes3}\mathbf{u}^{\otimes3}=\left( \mathbf{u}^{\top }\mathbf{u}\right)^{\otimes3}=1, \end{array} $$

and \( \sup _{\mathbf {u}}\cos \limits \left (\mathbf {u}^{\top \otimes 3}, \boldsymbol {\kappa }_{3}\right ) \) equals 1 only if there exists a u0 such that \(\mathbf {u}_{0}^{\top \otimes 3} =\boldsymbol {\kappa }_{3}/\left \Vert \boldsymbol {\kappa }_{3}\right \Vert \), i.e., only when the normalized \(\boldsymbol {\kappa }_{3}/\left \Vert \boldsymbol {\kappa }_{3}\right \Vert \) has the same form as u⊤⊗3. It follows that

$$ b_{1}^{*}\left( \mathbf{Y}\right) \leq \left\Vert \boldsymbol{\kappa}_{3}\right\Vert^{2}. $$
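The bound can be probed numerically: for directions u sampled on the unit sphere, \((\mathbf{u}^{\top\otimes3}\boldsymbol{\kappa}_{3})^{2}\) never exceeds \(\left\Vert\boldsymbol{\kappa}_{3}\right\Vert^{2}\) (a sketch using an arbitrary symmetric toy κ3 and a crude random search for the supremum):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 2
kappa3 = np.array([1., -1., -1., 1., -1., 1., 1., -1.])   # a toy symmetric third-cumulant vector

# Random directions uniform on S_{d-1}
U = rng.standard_normal((20000, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)

proj_sq = np.array([(np.kron(np.kron(u, u), u) @ kappa3) ** 2 for u in U])
b1_star = proj_sq.max()                                    # crude estimate of the sup
```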

Example 5.

Balakrishnan et al. (2007) discuss a multivariate extension of the Malkovich–Afifi measure. Denoting \({\Omega }\left (d \mathbf {u}\right ) \) as the normalized Lebesgue element of surface area on \(\mathbb {S}_{d-1}\), they suggest

$$ \mathbf{T}={\int}_{\mathbb{S}_{d-1}}\mathbf{u}\left( \mathbf{u} ^{\top\otimes3}\boldsymbol{\kappa}_{3}\right) {\Omega}\left( d\mathbf{u} \right) ={\int}_{\mathbb{S}_{d-1}}\mathbf{u}\otimes\mathbf{u} ^{\top\otimes3}{\Omega}\left( d\mathbf{u}\right) \boldsymbol{\kappa}_{3} $$
(2.6)

and we see that this extension is a matrix multiple of the skewness vector κ3. Indeed, one can use Theorem 3.3 of Fang et al. (2017) to show that this matrix multiple reduces to

$$ {\int}_{\mathbb{S}_{d-1}}\mathbf{u}\otimes\mathbf{u}^{\top\otimes3} {\Omega}\left( d\mathbf{u}\right) =\frac{3}{d\left( d+2\right) }\left( \left( \operatorname*{vec}\mathbf{I}_{d}\right)^{\top}\otimes\mathbf{I} _{d}\right) . $$

Therefore T defined in Eq. 2.6 becomes a scalar multiple, \(3/\left(d\left (d+2\right )\right) \), of \(s\left (\mathbf {Y} \right ) \) defined in Eq. 2.3 by Móri et al. (1994). In particular, when d = 3 we have \(\mathbf {T}=\frac {3}{15}s\left (\mathbf {Y}\right ) \). It follows that, as in Móri et al. (1994), the vector T does not take into account the contribution of cumulants of the type \(\operatorname *{E}\left (Y_{r}Y_{s}Y_{t}\right ) \), where r,s,t are all different.
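The proportionality between T and s(Y) can be checked by Monte Carlo integration over the unit sphere, applying both sides of Eq. 2.6 to a toy symmetric κ3 of our choosing (the comparison is approximate because of sampling error):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 2, 400_000
U = rng.standard_normal((n, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)            # uniform on S_{d-1}

kappa3 = np.array([1., -1., -1., 1., -1., 1., 1., -1.])  # toy symmetric cumulant vector

# Monte Carlo estimate of T = int u (u'^{(x)3} kappa3) Omega(du)
U3 = np.einsum('ni,nj,nk->nijk', U, U, U).reshape(n, d ** 3)
T_hat = U.T @ (U3 @ kappa3) / n

# The claimed constant multiple of Mori et al.'s vector: (3/(d(d+2))) s(Y)
s = np.kron(np.eye(d).reshape(1, -1), np.eye(d)) @ kappa3
T_exact = 3.0 / (d * (d + 2)) * s
```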

Example 6.

Srivastava (1984). If X is a d-dimensional random vector with variance matrix Σ, consider \({\Gamma }^{\top }\boldsymbol {\Sigma } \boldsymbol {\Gamma }=\operatorname *{Diag}\left (\lambda _{1},\ldots ,\lambda _{d}\right ) =\boldsymbol {D_{\lambda }}\), where Γ is an orthogonal matrix. The skewness measure defined by Srivastava can be written as

$$ {b_{1}^{2}}\left( \mathbf{Y}\right) =\frac{1}{d}\sum\limits_{i=1}^{d}\left( \operatorname*{E}\widetilde{Y}_{i}^{3}\right)^{2}, $$

where \(\widetilde {\mathbf {Y}}=\boldsymbol {D_{\lambda }}^{-1}\boldsymbol{\Gamma}^{\top}\left (\mathbf {X} -\operatorname *{E}\mathbf {X}\right ) \). We have \( \widetilde {\mathbf {Y}}=\boldsymbol {D_{\lambda }}^{-1}\boldsymbol {\Gamma }^{\top}\boldsymbol {\Sigma } ^{1/2}\mathbf {Y}=\boldsymbol {D_{\lambda }}^{-1/2}\boldsymbol {\Gamma }^{\top}\mathbf {Y}, \) and the ith coordinate of \(\widetilde {\mathbf {Y}}\) is \(\widetilde {Y} _{i}=\mathbf {e}_{i}^{\top }\widetilde {\mathbf {Y}},\) where ei is the ith coordinate axis. Since

\(\operatorname *{E}\widetilde {\mathbf {Y}}=0\) and \(\operatorname *{Var}\left (\widetilde {\mathbf {Y}}\right ) =\boldsymbol{D}_{\lambda }^{-1}\), the Srivastava measure can be re-expressed via

$$ \operatorname*{E}\widetilde{Y}_{i}^{3}=\underline{\operatorname*{Cum}} _{3}\left( \widetilde{Y}_{i}\right) =\underline{\operatorname*{Cum}} _{3}\left( \mathbf{e}_{i}^{\top}\boldsymbol{D_{\lambda}}^{-1/2}\boldsymbol{\Gamma}^{\top}\mathbf{Y}\right) =\mathbf{e}_{i}^{\top\otimes3}\left( \boldsymbol{D_{\lambda}}^{-1/2}\right)^{\otimes 3}\left(\boldsymbol{\Gamma}^{\top}\right)^{\otimes3}\boldsymbol{\kappa}_{3}, $$

and

$$ {b_{1}^{2}}\left( \mathbf{Y}\right) =\frac{1}{d}\sum\limits_{i=1}^{d}\left( \mathbf{e}_{i}^{\top\otimes3}\left( \boldsymbol{ D_{\lambda}}^{-1/2}\right)^{\otimes 3}\left(\boldsymbol{\Gamma}^{\top}\right)^{\otimes3}\boldsymbol{\kappa}_{3}\right)^{2}. $$

One notices that \(\mathbf {e}_{i}^{\top \otimes 3}\) is a unit axis vector in the Euclidean space \(\mathbb {R}^{d^{3}}\), so that the measure \({b_{1}^{2}}\left (\mathbf {Y}\right ) \) is (up to the factor 1/d) the squared norm of the projection of κ3 onto a subspace of \(\mathbb {R}^{d^{3}}\); it does not contain all the information in the vector κ3. Note that this index is NOT affine invariant (nor is the corresponding kurtosis index).

3 Multivariate Kurtosis

The kurtosis of Y is measured by the 4th order cumulant vector denoted by κ4, and is computed as

$$ \boldsymbol{\kappa}_{4}=\underline{\operatorname*{Cum}}_{4}\left( \mathbf{Y}\right) =\operatorname*{E}\mathbf{Y}^{\otimes4} -\mathbf{K}_{2,2}\underline{\operatorname*{Cum}}_{2}\left( \mathbf{Y} \right)^{\otimes2}=\operatorname*{E}\mathbf{Y}^{\otimes4}-\mathbf{K} _{2,2}\left[ \operatorname*{vec}\mathbf{I}_{d}\right]^{\otimes2}, $$
(3.1)

where K2,2 denotes the commutator matrix (see Appendix 1 for details). Note that κ4 turns out to be the zero vector for multivariate Gaussian distributions, and may thus be used as the “standard”.

This kurtosis vector κ4 forms the basis for all multivariate measures of kurtosis proposed in the literature, as the examples below demonstrate. For instance, one may define its square norm as a scalar index of kurtosis, called the total kurtosis:

$$ \kappa_{4} = || \boldsymbol{\kappa}_{4}||^{2} $$
(3.2)

We now connect the kurtosis vector κ4 and the total kurtosis κ4 to various other indexes discussed in the literature.

Example 7.

Mardia (1970) defined an index of kurtosis as \( \beta _{2,d}=\operatorname *{E}\left (\mathbf {Y}^{\top }\mathbf {Y}\right )^{2}. \) Note that

$$ \operatorname*{E}\left( \mathbf{Y}^{\top} \mathbf{Y}\right)^{2}=\operatorname*{E}\mathbf{Y}^{\top\otimes2}\mathbf{Y}^{\otimes2}=\operatorname*{E} \mathbf{Y}^{\top\otimes4}\operatorname*{vec} \mathbf{I}_{d^{2}} $$

and this is related to the kurtosis vector κ4 as follows:

$$ \beta_{2,d}=\left( \operatorname*{vec}\mathbf{I}_{d^{2}}\right)^{\top }\boldsymbol{\kappa}_{4}+\left( \operatorname*{vec}\mathbf{I}_{d^{2}}\right)^{\top}\mathbf{K}_{2,2}\left[ \operatorname*{vec}\mathbf{I}_{d}\right]^{\otimes2}. $$

In particular for the standard Gaussian vector Y, we have κ4 = 0, so that \(\beta _{2,d}=\operatorname *{E}\left (\mathbf {Y}^{\top }\mathbf {Y}\right )^{2}=d\left (d+2\right ) \). As a consequence, for such a Y we have the equation

$$ \left( \operatorname*{vec}\mathbf{I}_{d^{2}}\right)^{\top}\mathbf{K} _{2,2}\left[ \operatorname*{vec}\mathbf{I}_{d}\right]^{\otimes2}=d\left( d+2\right) . $$

Note that \(\operatorname *{vec}\mathbf {I}_{d^{2}}\) contains only \(d^{2}\) ones, and hence Mardia’s measure does not take into account all the entries of κ4; it includes only some of the entries of EY⊗4, namely

$$ \beta_{2,d}=\sum\limits_{r=1}^{d}\operatorname{E}{Y_{r}^{4}}+\sum\limits_{r\neq s} \operatorname{E}{Y_{r}^{2}}{Y_{s}^{2}}. $$
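Both the defining expectation and this entry decomposition can be checked on a sample; for standard Gaussian data β2,d should also be close to d(d + 2). A sketch (the Monte Carlo comparison with d(d + 2) is approximate):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 2, 100_000
Y = rng.standard_normal((n, d))              # already standardized (toy setting)

r2 = (Y ** 2).sum(axis=1)                    # Y'Y per observation
beta_2d = np.mean(r2 ** 2)                   # E (Y'Y)^2

# Entry decomposition: sum_r E Y_r^4 + sum_{r != s} E Y_r^2 Y_s^2
m4 = np.mean(Y ** 4, axis=0)
m22 = np.mean(Y[:, :, None] ** 2 * Y[:, None, :] ** 2, axis=0)
decomp = m4.sum() + (m22.sum() - np.trace(m22))
```

The decomposition holds exactly on any sample; the proximity to d(d + 2) reflects the Gaussian case.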

Example 8.

Koziol (1989) considered the following index of kurtosis. Let \(\widetilde {\mathbf {Y}}\) be an independent copy of Y; then

$$ \operatorname*{E}\left( \widetilde{\mathbf{Y}}^{\top}\mathbf{Y}\right)^{4} =\operatorname*{E}\widetilde{\mathbf{Y}}^{\top\otimes4}\mathbf{Y} ^{\otimes4} =\left\Vert \operatorname*{E}\mathbf{Y}^{\otimes4}\right\Vert^{2}, $$

which is the next-higher-degree analogue of Mardia’s skewness index β1,d. Specifically,

$$ \begin{array}{@{}rcl@{}} \left\Vert \operatorname*{E}\mathbf{Y}^{\otimes4}\right\Vert^{2} & =&\left\Vert \boldsymbol{\kappa}_{4}+\mathbf{K}_{2,2}\left[ \operatorname*{vec} \mathbf{I}_{d}\right]^{\otimes2}\right\Vert^{2}\\ & =& {\kappa}_{4}+2\boldsymbol{\kappa}_{4}^{\top}\mathbf{K}_{2,2}\left( \operatorname*{vec}\mathbf{I}_{d}\right)^{\otimes2}+3d\left( d+2\right)\\ & =&{\kappa}_{4}+6\beta_{2,d}-3d\left( d+2\right) \end{array} $$

where β2,d is Mardia’s (1970) index of kurtosis.

Example 9.

Móri et al. (1994) define kurtosis of Y as

$$ K\left( \mathbf{Y}\right) =\operatorname*{E}\left( \mathbf{Y} \mathbf{Y}^{\top}\mathbf{Y}\mathbf{Y}^{\top}\right) -\left( d+2\right) \mathbf{I}_{d}=\operatorname*{E}\left( \mathbf{Y}^{\top }\mathbf{Y}\right) \mathbf{Y}\mathbf{Y}^{\top}-\left( d+2\right) \mathbf{I}_{d}. $$

Then \( \operatorname *{vec}K\left (\mathbf {Y}\right ) =\left (\mathbf {I}_{d^{2} }\otimes \left (\operatorname *{vec}\mathbf {I}_{d}\right )^{\top }\right ) \operatorname *{E}\mathbf {Y}^{\otimes 4}-\left (d+2\right ) \operatorname *{vec}\mathbf {I}_{d} \) which can be expressed in terms of κ4 as \( \operatorname *{vec}K\left (\mathbf {Y}\right ) =\left (\mathbf {I}_{d^{2} }\otimes \left (\operatorname *{vec}\mathbf {I}_{d}\right )^{\top }\right ) \boldsymbol {\kappa }_{4}\).

As in the case of their skewness measure, this measure does not take into account the contribution of cumulants of the type \(\operatorname *{E}\left (Y_{r}Y_{s}Y_{t}Y_{u}\right ) \) where r,s,t,u are all different.

Example 10.

Malkovich and Afifi (1973). Similar to the discussion regarding skewness, the kurtosis measure proposed by Malkovich and Afifi connects to the total kurtosis through the bound

$$ {b_{2}^{*}}\left( \mathbf{Y}\right) =\sup_{\mathbf{u}}\left( \left( \mathbf{u}^{\top\otimes4}\boldsymbol{\kappa}_{4}\right)^{2}\right) \leq \left\Vert \boldsymbol{\kappa}_{4}\right\Vert^{2}, $$

because of the fact that

$$\left( \mathbf{u}^{\top\otimes4} \boldsymbol{\kappa}_{4}\right)^{2}=\left\Vert \mathbf{u}^{\top\otimes 4}\right\Vert^{2}\left\Vert \boldsymbol{\kappa}_{4}\right\Vert^{2}\cos^{2}\left( \mathbf{u}^{\top\otimes4},\boldsymbol{\kappa}_{4}\right) =\left\Vert \boldsymbol{\kappa}_{4}\right\Vert^{2}\cos^{2}\left( \mathbf{u}^{\top\otimes4},\boldsymbol{\kappa}_{4}\right). $$

Remark 1.

It may be noted that the idea used in Eq. 2.6, namely integrating \(\mathbf {u}\left (\mathbf {u}^{\top \otimes 4}\boldsymbol {\kappa }_{4}\right ) \) over the unit sphere, will not work for the kurtosis, since it can be verified that this results in a zero vector.

Example 11.

Kollo (2008) introduces the kurtosis matrix \(\mathbf {B}\left (\mathbf {Y} \right ) \) as

$$ \begin{array}{@{}rcl@{}} \mathbf{B}\left( \mathbf{Y}\right) & =&\sum\limits_{i,j=1}^{d}\operatorname*{E} Y_{i}Y_{j}\mathbf{Y}\mathbf{Y}^{\top}=\operatorname*{E}\sum\limits_{i,j=1} ^{d}Y_{i}Y_{j}\mathbf{Y}\mathbf{Y}^{\top}=\operatorname*{E}\left( \sum\limits_{i=1}^{d}Y_{i}\right)^{2}\mathbf{Y}\mathbf{Y}^{\top}\\ & =&\operatorname*{E}\left[ \mathbf{1}_{d^{2}}^{\top}\left( \mathbf{Y} \otimes\mathbf{Y}\right) \right] \mathbf{Y}\mathbf{Y}^{\top}. \end{array} $$

Then \( \operatorname *{vec}\mathbf {B}\left (\mathbf {Y}\right ) =\operatorname *{E} \left ({\sum }_{i=1}^{d}Y_{i}\right )^{2}\left (\mathbf {Y}\otimes \mathbf {Y}\right ) , \) which can be written as

$$ \begin{array}{@{}rcl@{}} \operatorname*{vec}\mathbf{B}\left( \mathbf{Y}\right) & =&\operatorname*{E}\left[ \mathbf{1}_{d^{2}}^{\top}\left( \mathbf{Y} \otimes\mathbf{Y}\right) \right] \operatorname*{vec}\mathbf{Y} \mathbf{Y}^{\top}=\operatorname*{E}\mathbf{Y}^{\otimes2}\left[ \mathbf{1}_{d^{2}}^{\top}\left( \mathbf{Y}\otimes\mathbf{Y}\right) \right] \\ & =&\left( \mathbf{I}_{d^{2}}\otimes\mathbf{1}_{d^{2}}^{\top}\right) \operatorname*{E}\mathbf{Y}^{\otimes4}=\left( \mathbf{I}_{d^{2}} \otimes\mathbf{1}_{d^{2}}^{\top}\right) \left( \boldsymbol{\kappa} _{4}+\mathbf{K}_{2,2}\left( \operatorname*{vec}\mathbf{I}_{d}\right)^{\otimes2}\right). \end{array} $$
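The last representation can be verified on a sample by comparing the direct definition \(\operatorname{E}\left(\sum_{i}Y_{i}\right)^{2}\mathbf{Y}\mathbf{Y}^{\top}\) with the selection \(\mathbf{I}_{d^{2}}\otimes\mathbf{1}_{d^{2}}^{\top}\) applied to the empirical \(\operatorname{E}\mathbf{Y}^{\otimes4}\); the two agree exactly (a sketch with toy data):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 2
Y = rng.standard_normal((400, d))

# Direct definition: B(Y) = E (sum_i Y_i)^2 Y Y'
B_direct = np.mean([(y.sum() ** 2) * np.outer(y, y) for y in Y], axis=0)

# Via the raw fourth-moment vector E Y^{(x)4}
m4 = np.mean([np.kron(np.kron(np.kron(y, y), y), y) for y in Y], axis=0)
M = np.kron(np.eye(d * d), np.ones((1, d * d)))   # I_{d^2} (x) 1_{d^2}'
vecB = M @ m4
```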

4 Alternative Measures Based on Distinct Elements of the Cumulant Vectors

The skewness κ3 and the kurtosis κ4 vectors contain \(d^{3}\) and \(d^{4}\) elements respectively, which are not all distinct. Just as the covariance matrix of a d-dimensional vector contains only d(d + 1)/2 distinct elements, a simple computation shows that κ3 contains d(d + 1)(d + 2)/6 distinct elements, while κ4 contains d(d + 1)(d + 2)(d + 3)/24 distinct elements.

Similar to the fact that there are many applications as well as measures which consider only the distinct elements of a covariance matrix, it is quite sensible and reasonable to follow this approach and define skewness and kurtosis measures based on just the distinct elements of the corresponding cumulant vectors. For example, in estimating the “total skewness” index \(\left \Vert \boldsymbol {\kappa }_{3}\right \Vert ^{2}\) discussed in Mardia (1970), one may use the “elimination matrix” since the terms in the summation are symmetric like in a covariance matrix.

Selection of the distinct elements from the vectors κ3 and κ4 can be accomplished via linear transformations. This approach can be traced back to Magnus and Neudecker (1980), who introduce two transformation matrices, L and D, which consist of zeros and ones. For an arbitrary \(\left (n\times n\right ) \) matrix A, L eliminates from vec(A) the supra-diagonal elements of A, while D performs the reverse transformation for a symmetric A.

In the case of a covariance matrix Σ of a vector X, the elimination matrix L above acts on vec(Σ), and in the approach defined in this paper it holds that \(\operatorname {vec}(\boldsymbol {\Sigma })= \underline {\operatorname *{Cum}}_{2}\left (\mathbf {X},\mathbf {X}\right ) \), i.e. the distinct elements of vec(Σ) correspond to the distinct elements of the tensor product \(\mathbf{X}\otimes\mathbf{X}\). In this way the elimination matrix L can be generalized to tensor products of higher orders in a simple way.

We shall use elimination matrices \(\mathbf {G}\left (3,d\right ) \) and \(\mathbf {G}\left (4,d\right ) \), for shortening the skewness and kurtosis vectors respectively and keeping just the distinct entries in them. See Meijer (2005) for details where the notations \(\mathbf {T}_{3}{{~}^{+}}\) and \(\mathbf {T}_{4}{{~}^{+}}\) are used respectively, which are actually the Moore–Penrose inverses of triplication and quadruplication matrices. This gives cumulant vectors of distinct elements as

\(\boldsymbol {\kappa }_{3,D}=\mathbf {G} \left (3,d \right ) \boldsymbol {\kappa }_{3}\) and \(\boldsymbol {\kappa }_{4,D}=\mathbf {G} \left (4,d \right ) \boldsymbol {\kappa }_{4}\).

In such a case, the distinct element vector has dimension \(\dim \left (\boldsymbol {\kappa }_{3,D}\right ) =d\left (d+1\right ) \left (d+2\right ) /6\). For instance, when d = 2, \(\dim \left (\boldsymbol {\kappa } _{3,D}\right ) \) is 50% of the \(\dim \left (\boldsymbol {\kappa }_{3}\right ) \), and in general, the percentage of distinct elements decreases in the proportion \(\left (1+1/d\right ) \left (1+2/d\right ) /6\), getting close to 1/6 for large d. Similarly, the fraction of distinct elements in κ4,D relative to κ4 approaches 1/24 for large d — a significant reduction.

Following the discussion of this section one could define indexes of total skewness and total kurtosis exploiting the square norms of the skewness and kurtosis vectors containing only the distinct elements, i.e.

$$ \kappa_{3,D} = ||\boldsymbol{\kappa}_{3,D}||^{2} = || \mathbf{G}\left( 3,d\right) \boldsymbol{\kappa}_{3}||^{2} \quad\text{and} \quad \kappa_{4,D}= ||\boldsymbol{\kappa}_{4,D}||^{2} = || \mathbf{G}\left( 4,d\right) \boldsymbol{\kappa}_{4}||^{2} . $$
(4.1)

In Section 7 some numerical evidence about the performance of these indexes, which eliminate duplication of information, will be given.

5 Multivariate Symmetric Distributions

Two important classes of symmetric distributions are the spherically symmetric distributions and the elliptically symmetric distributions. In this section, we discuss them in turn and derive their cumulants. Some related results and discussion may be found in Fang et al. (2017).

5.1 Multivariate Spherically Symmetric Distributions

A d-vector \(\mathbf {W}=(W_{1}, {\dots } , W_{d})^{\top }\) has a spherically symmetric distribution if that distribution is invariant under the group of rotations in \(\mathbb {R}^{d}\). This is equivalent to saying that W has the stochastic representation

$$ \mathbf{W}=R\mathbf{U}, $$
(5.1)

where R is a nonnegative random variable, \(\mathbf {U}=(U_{1}, {\dots } , U_{d})^{\top }\) is uniform on the sphere \(\mathbb {S}_{d-1}\), and R and U are independent (see e.g. Fang et al., 2017, Theorem 2.5). The moments of the components of W, when they exist, can be expressed in terms of a one-dimensional integral (Fang et al., 2017, Theorem 2.8, p. 34), and the characteristic function has the form

$$ \phi_{\mathbf{W}}\left( \boldsymbol{\lambda}\right) =g\left( \boldsymbol{\lambda}^{\intercal}\boldsymbol{\lambda}\right) , $$
(5.2)

where g is called the “characteristic generator” and R is the generating variate, with say, a generating distribution F. The relationship between the distribution F of R and g is given through the characteristic function of the uniform distribution on the sphere (see Fang et al., 2017, p. 30).
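The stochastic representation (5.1) yields a direct simulation recipe: draw U by normalizing a standard Gaussian vector and multiply by an independent radius R. A sketch (the exponential radius below is an illustrative choice of ours, not one from the text):

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 3, 200_000

# U uniform on the sphere: normalize a standard Gaussian vector
G = rng.standard_normal((n, d))
U = G / np.linalg.norm(G, axis=1, keepdims=True)

# Independent radius; R ~ Exponential(1) is just one illustrative generating variate
R = rng.exponential(size=n)
W = R[:, None] * U

# By symmetry, E W = 0 and Var(W) = (E R^2 / d) I_d; here E R^2 = 2
cov_W = np.cov(W, rowvar=False)
```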

The marginal distributions of any such spherically or elliptically symmetric distributions have zero skewness, and the same kurtosis value given by the “kurtosis parameter” of the form (see Muirhead, 2009, p. 41)

$$ \kappa_{0}=\frac{g^{\prime\prime}\left( 0\right) -g^{\prime}\left( 0\right)^{2}}{g^{\prime}\left( 0\right)^{2}}. $$
(5.3)

The next lemma provides the moments of the uniform distribution on \(\mathbb {S}_{d-1}\), and is proved in the Appendix.

Lemma 1.

Let U be uniform on sphere \(\mathbb {S}_{d-1}\). Then

  1.

    For odd-order moments

    $$\operatorname*{E}\mathbf{U}^{\otimes\left( 2k+1\right) }=0,\qquad \underline{\operatorname*{Cum}}_{2k+1}\left( \mathbf{U}\right) =0,$$

    while for even-order moments

    $$ \operatorname*{E} {\displaystyle\prod\limits_{i=1}^{d}} U_{i}^{2k_{i}}=\frac{1}{\left( d/2\right)_{k}} {\displaystyle\prod\limits_{i=1}^{d}} \frac{\left( 2k_{i}\right) !}{2^{2k_{i}}k_{i}!}. $$
    (5.4)

    For T-products we have

    $$ \operatorname*{E}\mathbf{U}^{\otimes4}=\frac{1}{d\left( d+2\right) }\left( 3\mathbf{e}\left( d^{3},d\right) +\mathbf{e}\left( d^{4}\right) \right) , $$
    (5.5)

    where the zero-one vectors \(\mathbf {e}\left (d^{3},d\right ) \) and \(\mathbf {e}\left (d^{4}\right ) \) are given by Eqs. 1.8 and 1.9, respectively; moreover, the sum of the entries is

    $$ \sum\operatorname*{E}\mathbf{U}^{\otimes4}=\frac{3d}{d+2}. $$
  2.

    The odd moments of the modulus

    $$ \operatorname*{E} {\displaystyle\prod\limits_{i=1}^{d}} \left\vert U_{i}\right\vert^{k_{i}}=\sqrt{\frac{1}{\pi^{d_{1}}}}\frac {1}{G_{k}} {\displaystyle\prod\limits_{i=1}^{d_{1}}} {\Gamma}\left( \left( k_{i}+1\right) /2\right) $$
    (5.6)

    where \(k=\sum k_{i}\), see Eq. 6.3 for \(G_{k}\), and \(d_{1}\) is the number of nonzero \(k_{i}\); in particular

    $$ \operatorname*{E}\left\vert U_{i}\right\vert^{2k+1}=\sqrt{\frac{1}{\pi}} \frac{k!}{G_{2k+1}}. $$

    For T-products we have

    $$ \operatorname{E}\left\vert \mathbf{U}\right\vert^{\otimes2}=\frac{1} {d}\operatorname*{vec}\mathbf{I}_{d}+\frac{1}{\pi}\frac{1}{G_{2} }\left( \boldsymbol{1}_{d^{2}}-\operatorname*{vec}\mathbf{I} _{d}\right) $$

    and

    $$ \operatorname{E}\left\vert \mathbf{U}\right\vert^{\otimes3}=\sqrt{\frac {1}{\pi}}\frac{1}{G_{3}}\left( \boldsymbol{1}_{d^{3}}\left( 3\right) +\frac{1}{2}\boldsymbol{1}_{d^{3}}\left( 1,2\right) +\frac{1}{\pi }\boldsymbol{1}_{d^{3}}\left( 1,1,1\right) \right) , $$

    where the zero-one vectors are given in Eq. 1.10 and

    $$ \begin{array}{@{}rcl@{}} \operatorname{E}\left\vert \mathbf{U}\right\vert^{\otimes4} & =&\frac {3}{d\left( d+2\right) }\boldsymbol{1}_{d^{4}}\left( 4\right) +\frac {1}{4G_{4}}\boldsymbol{1}_{d^{4}}\left( 2,2\right) \\ && +\frac{1}{\pi}\frac{1}{G_{4}}\left( \boldsymbol{1}_{d^{4}}\left( 1,3\right) +\sqrt{\frac{1}{\pi}}\boldsymbol{1}_{d^{4}}\left( 2,1,1\right) +\frac{1}{\pi}\boldsymbol{1}_{d^{4}}\left( 1,1,1,1\right) \right) \end{array} $$

    where the zero-one vectors are given in Eq. 1.11.
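Equation 5.4 can be checked in exact arithmetic against the familiar special cases E U₁² = 1/d, E U₁⁴ = 3/(d(d + 2)) and E U₁²U₂² = 1/(d(d + 2)). A minimal sketch (ours, in Python; function names are our own):

```python
from fractions import Fraction
from math import factorial

def pochhammer(num, den, k):
    """Rising factorial (a)_k for rational a = num/den, computed exactly."""
    a = Fraction(num, den)
    out = Fraction(1)
    for j in range(k):
        out *= a + j
    return out

def even_moment(d, ks):
    """E prod_i U_i^{2 k_i} for U uniform on S_{d-1}, via Eq. 5.4."""
    k = sum(ks)
    out = Fraction(1) / pochhammer(d, 2, k)  # 1 / (d/2)_k
    for ki in ks:
        out *= Fraction(factorial(2 * ki), 2 ** (2 * ki) * factorial(ki))
    return out

d = 5
assert even_moment(d, [1]) == Fraction(1, d)               # E U_1^2
assert even_moment(d, [2]) == Fraction(3, d * (d + 2))     # E U_1^4
assert even_moment(d, [1, 1]) == Fraction(1, d * (d + 2))  # E U_1^2 U_2^2
```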

Remark 2.

Observe that there are only three distinct elements of the third order moments, and four distinct elements of the fourth order moments!

The next lemma provides the cumulants of W (proof in the Appendix).

Lemma 2.

Let W be spherically symmetric with the representation W = RU and characteristic generator g having a second derivative at 0; then \( \underline {\operatorname *{Cum}}_{1}\) \(\left (\mathbf {W}\right )= \underline {\operatorname *{Cum}}_{3}\left (\mathbf {W}\right ) =0\), \( \underline {\operatorname *{Cum}}_{2}\left (\mathbf {W}\right ) =-2g^{\prime }\left (0\right ) \operatorname *{vec}\mathbf {I}_{d}, \) and

$$ \underline{\operatorname*{Cum}}_{4}\left( \mathbf{W}\right) =4\left( g^{\prime\prime}\left( 0\right) -g^{\prime}\left( 0\right)^{2}\right) \mathbf{K}_{2,2}\left( \operatorname*{vec}\mathbf{I}_{d}\right)^{\otimes 2}. $$

In terms of kurtosis parameter κ0, we have \(\kappa _{4}=3d(d+2){\kappa _{0}^{2}}\), \(\kappa _{4,D}={\kappa _{0}^{2}}(9d+d(d-1)/2)\) and Mardia’s β2,d = d(d + 2)(κ0 + 1).

From the representation of W given in Eq. 5.1, it is easy to see that \( \underline {\operatorname *{Cum}}_{2\ell +1}\left (\mathbf {W}\right ) =0,\quad \ell =0,1,\ldots , \) while the second and fourth order cumulants are calculated directly using Lemmas 1 and 2 above, giving the following result.

Theorem 1.

If W is spherically distributed with the representation W = RU and \(\operatorname {E}(R^{4})<\infty \), then \( \underline {\operatorname *{Cum}}_{2\ell +1}\left (\mathbf {W}\right ) = 0\), \( \underline {\operatorname *{Cum}}_{2}\left (\mathbf {W}\right ) =\frac {\operatorname *{E}R^{2}}{d}\operatorname *{vec}\mathbf {I}_{d} \) and

$$ \underline{\operatorname*{Cum}}_{4}\left( \mathbf{W}\right) =\operatorname*{E}R^{4}\operatorname*{E}\mathbf{U}^{\otimes4}-3\left( \frac{\operatorname*{E}R^{2}}{d}\right)^{2}\left( \operatorname*{vec} \mathbf{I}_{d}\right)^{\otimes2}. $$

In terms of these moments, the kurtosis parameter becomes

$$ \kappa_{0}=\left( \frac{d}{d+2}\frac{\operatorname*{E}R^{4}}{\left( \operatorname*{E}R^{2}\right)^{2}}-1\right) . $$
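As a sanity check, κ₀ can be evaluated exactly for two radial laws: R² ~ χ²_d (the standard normal, for which κ₀ = 0), and the t-radial R*² = m‖Z‖²/S² of Section 5.2.1, for which the standard χ² moments give κ₀ = 2/(m − 4), matching Lemma 3. The following exact-arithmetic sketch is our illustration only:

```python
from fractions import Fraction

def kappa0(ER2, ER4, d):
    """Kurtosis parameter from the radial moments, as in Theorem 1."""
    return Fraction(d, d + 2) * ER4 / ER2 ** 2 - 1

d, m = 3, 9

# Gaussian case: R^2 ~ chi^2_d, so E R^2 = d and E R^4 = d(d+2).
assert kappa0(Fraction(d), Fraction(d * (d + 2)), d) == 0

# Multivariate t: R*^2 = m ||Z||^2 / S^2 with S^2 ~ chi^2_m, hence
# E R*^2 = m d/(m-2) and E R*^4 = m^2 d(d+2)/((m-2)(m-4)) for m > 4.
ER2 = Fraction(m * d, m - 2)
ER4 = Fraction(m * m * d * (d + 2), (m - 2) * (m - 4))
assert kappa0(ER2, ER4, d) == Fraction(2, m - 4)
```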

5.2 Multivariate elliptically symmetric distributions

A d-vector X has an elliptically symmetric distribution if it has the representation

$$ \mathbf{X}=\boldsymbol{\mu}+\boldsymbol{\Sigma}^{1/2}\mathbf{W} $$

where \(\boldsymbol {\mu }\in \mathbb {R}^{d}\), Σ is a variance-covariance matrix and W is spherically distributed. Hence, except for the mean, the cumulants of X are simply those of W multiplied by Kronecker powers of \(\boldsymbol {\Sigma }^{1/2}\), i.e.

$$ \underline{\operatorname*{Cum}}_{m}\left( \mathbf{X}\right) =\left( \boldsymbol{\Sigma}^{1/2}\right)^{\otimes m}\underline{\operatorname*{Cum} }_{m}\left( \mathbf{W}\right) , $$

from which one gets \( \underline {\operatorname *{Cum}}_{1}\left (\mathbf {X}\right ) =\boldsymbol {\mu }\), \(\underline {\operatorname *{Cum}}_{2\ell +1}\left (\mathbf {X}\right ) =0\), \(\ell =1,2,\ldots\), and

$$\underline{\operatorname*{Cum} }_{2\ell}\left( \mathbf{X}\right) =\left( \boldsymbol{\Sigma} ^{1/2}\right)^{\otimes2\ell}\underline{\operatorname*{Cum}}_{2\ell}\left( \mathbf{W}\right) . $$
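For 2ℓ = 2, this relation gives \(\underline {\operatorname *{Cum}}_{2}(\mathbf {X})=(\operatorname *{E}R^{2}/d)\operatorname *{vec}\boldsymbol {\Sigma }\), via the identity (A ⊗ A) vec M = vec(AMA⊤). A small numerical check of this Kronecker step (ours, in NumPy; vec is column-stacking):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# A positive definite Sigma and its symmetric square root.
B = rng.standard_normal((d, d))
Sigma = B @ B.T + d * np.eye(d)
w, V = np.linalg.eigh(Sigma)
Sigma_half = V @ np.diag(np.sqrt(w)) @ V.T

vec = lambda M: M.reshape(-1, order="F")  # column-stacking vec operator

# (Sigma^{1/2} (x) Sigma^{1/2}) vec(I_d) = vec(Sigma^{1/2} I Sigma^{1/2}),
# i.e. the second cumulant of X is proportional to vec(Sigma).
lhs = np.kron(Sigma_half, Sigma_half) @ vec(np.eye(d))
assert np.allclose(lhs, vec(Sigma))
```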

Moments of elliptically symmetric distributions have also been discussed by Berkane and Bentler (1986). As a special case, we now discuss

5.2.1 Multivariate t-distribution

The multivariate t-distribution is spherically symmetric (see Example 2.5, Fang et al., 2017, p. 32); consult also the monograph by Kotz and Nadarajah (2004) for further details. Let

$$ \mathbf{W}=\sqrt{\frac{m}{S^{2}}}\mathbf{Z} $$

where \(\mathbf {Z}\in \mathcal {N}_{d}\left (0,\mathbf {I}_{d}\right ) \) is standard normal and S2 is χ2-distributed with m degrees of freedom, independent of Z. Then \(\mathbf {W}\in Mt_{d}\left (m,0,\mathbf {I}_{d}\right ) \), and we have

$$ \mathbf{W} =\sqrt{m}\frac{\left\Vert \mathbf{Z} \right\Vert }{S}\mathbf{Z}/\left\Vert \mathbf{Z}\right\Vert =R^{\ast }\mathbf{U}, $$
(5.7)

where \(R^{\ast }=\sqrt {m}\left \Vert \mathbf {Z}\right \Vert /S\), and R∗2/d has an F-distribution with d and m degrees of freedom. Let \(\boldsymbol {\mu }\in \mathbb {R}^{d}\) and let A be a d × d matrix; if X = μ + AW, then \(\mathbf {X}\in Mt_{d}\left (m,\boldsymbol {\mu ,}\boldsymbol {\Sigma }\right ) \), where Σ = AA⊤, so X is an elliptically symmetric random vector. Since the characteristic function is quite complicated (see Fang et al., 2017, Section 3.3.6, p. 85), we utilize the stochastic representation given in Eq. 5.7 for deriving the kurtosis (note that the skewness is zero). The proof of the next Lemma is in the Appendix.

Lemma 3.

Let W be a multivariate t-vector with dimension d and m degrees of freedom, where m > 4; then EW = 0, \( \underline {\operatorname *{Cum}}_{2}\left (\mathbf {W}\right ) =\frac {m}{m-2}\operatorname *{vec}\mathbf {I}_{d}\) and \(\underline {\operatorname *{Cum} }_{3}\left (\mathbf {W}\right ) =0\). From Theorem 1, the kurtosis parameter in Eq. 5.3 becomes \(\kappa _{0}=\frac {2}{m-4}\), and the kurtosis is \(\kappa _{4}=3d^{2}\kappa _{0}\). Moreover, if \(\mathbf {X}\in Mt_{d}\left (m,\boldsymbol {\mu },\boldsymbol {\Sigma }\right ) \), where Σ = AA⊤, then EX = μ,

$$ \underline{\operatorname*{Cum}}_{2}\left( \mathbf{X}\right) =\frac {m}{m-2}\operatorname*{vec}\boldsymbol{\Sigma},\qquad \underline{\operatorname*{Cum}}_{3}\left( \mathbf{X}\right) =0, $$

and kurtosis parameter κ0 and kurtosis κ4 are the same as for W.
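A quick Monte Carlo check of the second-order statements in Lemma 3 (our sketch; sample size, seed and tolerances are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, n = 3, 9, 400_000

# W = sqrt(m)/S * Z with Z ~ N(0, I_d) and S^2 ~ chi^2_m, independent.
Z = rng.standard_normal((n, d))
S = np.sqrt(rng.chisquare(m, size=n))
W = np.sqrt(m) / S[:, None] * Z

# Lemma 3: E W = 0 and Cov(W) = m/(m-2) I_d (here m/(m-2) = 9/7).
assert np.allclose(W.mean(axis=0), 0.0, atol=0.02)
assert np.allclose(W.T @ W / n, m / (m - 2) * np.eye(d), atol=0.05)
```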

6 Multivariate Skew Distributions

Starting with Azzalini and Dalla Valle (1996), who suggest methods for obtaining multivariate skew-normal distributions, several authors have discussed different approaches for obtaining asymmetric multivariate distributions by skewing a spherically or an elliptically symmetric distribution. We mention here Branco and Dey (2001), who extend the work in Azzalini and Dalla Valle (1996) to multivariate skew-elliptical distributions; Arnold and Beaver (2002), who apply a conditioning approach to a d-dimensional random vector with an elliptically contoured distribution (although this conditioning is not strictly necessary); Sahu et al. (2003), who use transformation and conditioning techniques; Dey and Liu (2005), who discuss an approach based on linear constraints; and Genton and Loperfido (2005), who introduce a general class of multivariate skew-elliptical distributions, the so-called multivariate generalized skew-elliptical (GSE) distributions. See also Genton (2004) and the references therein.

Here we provide a systematic treatment of several skew-multivariate distributions by providing general formulae for cumulant vectors up to the fourth order, which are needed in deriving the corresponding skewness and kurtosis measures discussed in Sections 2 and 3.

6.1 Multivariate skew spherical distributions

Let \(\mathbf {Z}=\left [ \mathbf {Z}_{1}^{\top },\mathbf {Z}_{2}^{\top }\right ]^{\top }\) be spherically symmetric distributed in dimension (m + d). Define the canonical fundamental skew-spherical (CFSS) distribution (Arellano-Valle and Genton 2005, Prop. 3.3) by

$$ \mathbf{X}=\boldsymbol{\Delta}\left\vert \mathbf{Z}_{1}\right\vert +\left( \mathbf{I}_{d}-\boldsymbol{\Delta{\Delta}}^{\top}\right)^{1/2}\mathbf{Z}_{2}, $$
(6.1)

where the modulus is taken element-wise, Δ is the d × m skewness matrix, and R is the generating variate. A simple construction of Δ is given by \(\boldsymbol {\Delta }=\boldsymbol {\Lambda }\left (\mathbf {I}_{m}+\boldsymbol {\Lambda }^{\top }\boldsymbol {\Lambda }\right )^{-1/2}\) for some real matrix Λ of dimension d × m. If Z = RU, with \(\mathbf {Z}_{1}=p_{1}R\mathbf {U}_{1}\in \mathbb {R}^{m}\) and \(\mathbf {Z}_{2}=p_{2}R\mathbf {U}_{2}\in \mathbb {R}^{d}\), then \({p_{1}^{2}}\) is Beta\(\left (m/2,d/2\right ) \) and \({p_{2}^{2}}=1-{p_{1}^{2}}\) is Beta\(\left (d/2,m/2\right ) \). The variables R, \({p_{1}^{2}}\), U1 and U2 are independent by Theorem 2.6 in Fang et al. (2017). Then we have EZ = 0 and \(\operatorname *{Cov}\left (\mathbf {Z}\right ) =\operatorname *{E}R^{2}\mathbf {I}/d. \) Introduce the function

$$ G_{\ell_{1},\ell_{2}}\left( m,d\right) =\frac{B\left( m/2+\ell_{1} ,d/2+\ell_{2}\right) }{B\left( m/2,d/2\right) }. $$
(6.2)

and

$$ G_{k}(d)=\frac{\Gamma\left( \left( d+k\right) /2\right) }{\Gamma\left( d/2\right) } $$
(6.3)

written as Gk for short. We may express the joint moments of p1, p2 as

$$ \operatorname*{E}p_{1}^{\ell_{1}}p_{2}^{\ell_{2}} =\frac{1}{B\left( m/2,d/2\right) }{{\int}_{0}^{1}}x^{m/2+\ell_{1}-1}\left( 1-x\right)^{d/2+\ell_{2}-1}dx =G_{\ell_{1},\ell_{2}}\left( m,d\right) . $$
(6.4)

In particular \(\operatorname *{E}p_{1}=G_{1,0}\left (m,d\right ) \), \(\operatorname *{E}p_{1}p_{2}=G_{1,1}\left (m,d\right ) \). The cumulants of X can be obtained from these moments. Using Eq. 6.4 above, and Lemma 1 for the moments of pk and \(\left \vert \mathbf {U}_{j}\right \vert \), we obtain the following result (see the Appendix for the proof).

Theorem 2.

Let the d-vector X have a CFSS distribution as defined in Eq. 6.1 and denote by \(\mathbf {V}_{2}=\left (\mathbf {I}-\boldsymbol {\Delta {\Delta }}^{\top }\right )^{1/2}\mathbf {Z}_{2}\) for ease of notation, with

$$ \operatorname*{E}\mathbf{V}_{2}^{\otimes2}=\frac{1}{d}\left( \operatorname*{vec}\mathbf{I}_{d}-\boldsymbol{\Delta}^{\otimes2} \operatorname*{vec}\mathbf{I}_{m}\right) . $$

The moments of X, assuming \(\operatorname {E}(R^{4})<\infty \), are given by:

$$ \operatorname*{E}\mathbf{X}=\boldsymbol{\Delta}\operatorname*{E} p_{1}\operatorname*{E}R\mathbf{1}_{m}, $$
$$ \operatorname*{E}\mathbf{X}^{\otimes2}=\operatorname*{E}p_{1} ^{2}\operatorname*{E}R^{2}\boldsymbol{\Delta}^{\otimes2}\operatorname{E} \left\vert \mathbf{U}_{1}\right\vert^{\otimes2}+\operatorname*{E}\left( p_{2}R\right)^{2}\operatorname*{E}\mathbf{V}_{2}^{\otimes2}, $$
$$ \operatorname{E}\mathbf{X}^{\otimes3}=\operatorname{E}{p_{1}^{3}} \operatorname{E}R^{3}\operatorname{E}\left\vert \mathbf{U}_{1}\right\vert^{\otimes3}+\operatorname*{E}p_{1}{p_{2}^{2}}\operatorname*{E}R^{3} \mathbf{K}_{2,1}\left( \mathbf{I}_{d^{2}}\otimes\boldsymbol{\Delta}\right) \left( \operatorname*{E}\mathbf{V}_{2}^{\otimes2}\otimes\operatorname*{E} \left\vert \mathbf{U}_{1}\right\vert \right) , $$
$$ \begin{array}{@{}rcl@{}} \operatorname{E}\mathbf{X}^{\otimes4}=\operatorname{E}{p_{1}^{4}} \operatorname{E}R^{4}\boldsymbol{\Delta}^{\otimes4}\operatorname{E}\left\vert \mathbf{U}_{1}\right\vert^{\otimes4}+\operatorname*{E}{p_{1}^{2}}p_{2}^{2}\operatorname*{E}R^{4}\mathbf{K}_{2,1,1}\left( \boldsymbol{\Delta}^{\otimes2}\otimes\mathbf{I}_{m^{2}}\right) \times \\ \qquad \times \left( \operatorname{E} \left\vert \mathbf{U}_{1}\right\vert^{\otimes2}\otimes\operatorname*{E} \mathbf{V}_{2}^{\otimes2}\right) +\left( \operatorname*{E}{p_{2}^{4}}\operatorname*{E}R^{4}\right) \boldsymbol{\Delta}_{1}^{\otimes4}\operatorname{E}\mathbf{U}_{2}^{\otimes 4},\qquad\qquad \end{array} $$

where \(\boldsymbol {\Delta }_{1}=\left (\mathbf {I}_{d}-\boldsymbol {\Delta {\Delta }}^{\top }\right )^{1/2}\), with the commutators as given in Eqs. 1.1 and 1.2. The cumulants of X are obtained by using the relations in Section 1.2.

Remark 3.

It can be seen that the cumulants depend on Δ and on the moments of the generating variate R. For instance,

$$ \underline{\operatorname*{Cum}}_{2}\left( \mathbf{X}\right) =\operatorname*{E}R^{2}\mathbf{D}_{2}\left( m,d,\boldsymbol{\Delta}\right) -\left( \operatorname*{E}R\right)^{2}\left( \operatorname*{E}p_{1}\right)^{2}\left( \operatorname*{E}\left\vert \mathbf{U}_{1}\right\vert \right)^{\otimes2} $$

where

$$ \begin{array}{@{}rcl@{}} \mathbf{D}_{2}\left( m,d,\boldsymbol{\Delta}\right) =\frac{G_{2,0}} {m}\boldsymbol{\Delta}^{\otimes2}\left( \operatorname*{vec}\mathbf{I} _{m}+\frac{1}{\pi}\frac{1}{G_{2}\left( m\right) }\left( \boldsymbol{1} _{m^{2}}-\operatorname*{vec}\mathbf{I}_{m}\right) \right) \\ +\frac{G_{0,2}} {d}\left( \mathbf{I}_{d^{2}}-\boldsymbol{\Delta}^{\otimes2}\right) \operatorname*{vec}\mathbf{I}_{m}. \end{array} $$
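The construction Δ = Λ(I_m + Λ⊤Λ)^{−1/2} used after Eq. 6.1 always yields an admissible skewness matrix: each singular value σ of Λ becomes σ/√(1 + σ²) < 1 in Δ, so I_d − ΔΔ⊤ is positive definite and its square root exists. A short numerical confirmation (ours):

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 3, 2

# Delta = Lambda (I_m + Lambda^T Lambda)^{-1/2} for an arbitrary real Lambda.
Lam = rng.standard_normal((d, m))
w, V = np.linalg.eigh(Lam.T @ Lam + np.eye(m))
Delta = Lam @ (V @ np.diag(1 / np.sqrt(w)) @ V.T)

# The largest singular value of Delta is below 1, hence ||Delta a|| < 1
# for every unit vector a, and I_d - Delta Delta^T is positive definite.
assert np.linalg.norm(Delta, 2) < 1
assert np.all(np.linalg.eigvalsh(np.eye(d) - Delta @ Delta.T) > 0)
```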

6.2 Multivariate Skew-t Distribution

This distribution goes back to Azzalini and Capitanio (2003); consult also Kim and Mallick (2003) for a derivation of moments up to the 4th order and moments of quadratic forms of the multivariate skew t-distribution. Let X be a d-dimensional vector having a multivariate skew-normal distribution, \(SN_{d}\left (0,\boldsymbol {\Omega },\boldsymbol {\alpha }\right )\) (see Section 6.3 below for details), and let S2 be a random variable which follows a χ2 distribution with m degrees of freedom, independent of X. Then the random vector

$$ \mathbf{V}=\boldsymbol{\mu}+\frac{\sqrt{m}}{S}\mathbf{X} $$

has a multivariate skew t-distribution, denoted by \(St_{d}\left (\boldsymbol {\mu },\boldsymbol {\Omega },\boldsymbol {\alpha },m\right ) \). The derivation of the cumulants of V up to the 4th order is provided in the next theorem, with a detailed proof given in the Appendix.

Theorem 3.

Let \(\mathbf {V}\in St_{d}\left (\boldsymbol {\mu },\boldsymbol {\Omega },\boldsymbol {\alpha },m\right ) \) with m > 4, then its first four cumulants are:

$$ \underline{\operatorname*{Cum}}_{1}\left( \mathbf{V}\right) =\operatorname*{E}\mathbf{V}=\boldsymbol{\mu}+G_{1}\left( m\right) \sqrt{\frac{m}{\pi}}\boldsymbol{\delta}, $$
$$ \underline{\operatorname*{Cum}}_{2}\left( \mathbf{V}\right) =\frac {m}{m-2}\operatorname*{vec}\boldsymbol{\Omega}-\frac{m}{\pi}\left( G_{1}\left( m\right) \right)^{2}\boldsymbol{\delta}^{\otimes2}, $$
$$ \begin{array}{@{}rcl@{}} \underline{\operatorname*{Cum}}_{3}\left( \mathbf{V}\right) =\sqrt {\frac{2}{\pi}}\left( \frac{m}{2}\right)^{3/2}G_{1}\left( m\right) \times \\ \left( \left( \frac{4}{\pi}G_{1}\left( m\right)^{2}-\frac{2}{m-3}\right) \boldsymbol{\delta}^{\otimes3}+ \frac{2}{\left( m-3\right) \left( m-2\right) }\mathbf{K}_{2,1}\left( \operatorname*{vec}\boldsymbol{\Omega}\otimes \boldsymbol{\delta}\right) \right) , \end{array} $$
$$ \begin{array}{@{}rcl@{}} \underline{\operatorname*{Cum}}_{4}\left( \mathbf{V}\right) & =&\frac {4}{\pi}\left( \frac{m}{2}\right)^{2}G_{-1}\left( m\right)^{2}\left( \frac{4}{m-3}-\frac{6}{\pi}G_{-1}\left( m\right)^{2}\right) \boldsymbol{\delta}^{\otimes4}\\ && +\left( \frac{m}{2}\right)^{2}\frac{8}{\left( m-4\right) \left( m-2\right)^{2}}\mathbf{K}_{2,2}\left( \left( \operatorname*{vec} \boldsymbol{\Omega}\right)^{\otimes2}\right) \\ && -\frac{8}{\pi}\left( \frac{m}{2}\right)^{2}G_{-1}\left( m\right)^{2}\frac{1}{\left( m-3\right) \left( m-2\right) }\mathbf{K} _{2,1,1}\left( \operatorname*{vec}\boldsymbol{\Omega}\otimes\boldsymbol{\delta }^{\otimes2}\right). \end{array} $$

From Theorem 3 the skewness and kurtosis vectors are \(\boldsymbol {\kappa }_{3}=\left (\boldsymbol {\Sigma }_{\mathbf {V} }^{-1/2}\right )^{\otimes 3}\) \(\underline {\operatorname *{Cum}}_{3}\left (\mathbf {V}\right ) ,\) and \(\boldsymbol {\kappa }_{4}=\left (\boldsymbol {\Sigma }_{\mathbf {V}}^{-1/2}\right )^{\otimes 4} \underline {\operatorname *{Cum}}_{4}\left (\mathbf {V}\right ) \), where

$$ \boldsymbol{\Sigma}_{\mathbf{V}}=\operatorname*{Var}\mathbf{V}=\frac {m}{m-2}\boldsymbol{\Omega}-\frac{m}{\pi}\left( G_{1}\left( m\right) \right)^{2}\boldsymbol{\delta}\boldsymbol{\delta}^{\top}. $$

6.3 Multivariate Skew-Normal Distribution

Consider the multivariate skew-normal distribution introduced by Azzalini and Dalla Valle (1996), whose marginal densities are scalar skew-normals. A d-dimensional random vector X is said to have a multivariate skew-normal distribution, \(\text {SN}_{d}\left (\boldsymbol {\mu },\boldsymbol {\Omega },\boldsymbol {\alpha }\right ) \) with shape parameter α if it has the density function

$$ 2\varphi\left( \mathbf{X};\boldsymbol{\mu},\boldsymbol{\Omega}\right) {\Phi}\left( \boldsymbol{\alpha}^{\top}\left( \mathbf{X}-\boldsymbol{\mu }\right) \right) , \quad\mathbf{X} \in\mathbb{R}^{d}. $$

where \(\varphi \left (\mathbf {X};\boldsymbol {\mu },\boldsymbol {\Omega }\right ) \) is the d-dimensional normal density with mean μ and correlation matrix Ω, and Φ denotes the univariate standard normal cdf. The cumulant function of \(\text {SN}_{d}\left (0,\boldsymbol {\Omega },\boldsymbol {\alpha }\right ) \) is given by

$$ C_{\mathbf{X}}\left( \boldsymbol{\lambda}\right) =\log2-\frac{1} {2}\boldsymbol{\lambda}^{\top}\boldsymbol{\Omega}\boldsymbol{\lambda}+\log {\Phi}\left( i\boldsymbol{\delta}^{\top}\boldsymbol{\lambda}\right) , $$

where

$$ \boldsymbol{\delta}=\frac{1}{\left( 1+\boldsymbol{\alpha}^{\top}\boldsymbol{\Omega} \boldsymbol{\alpha}\right)^{1/2}}\boldsymbol{\Omega}\boldsymbol{\alpha} \qquad\text{and} \qquad\boldsymbol{\alpha}=\frac{1}{\left( 1-\boldsymbol{\delta }^{\top}\boldsymbol{\Omega}^{-1}\boldsymbol{\delta}\right)^{1/2}}\boldsymbol{\Omega} ^{-1}\boldsymbol{\delta}. $$
(6.5)

Note that the cumulants of order higher than 2 do not depend on Ω but only on δ. Here we use the approach discussed in this paper to get explicit expressions for the cumulants of the multivariate SN distribution. See also Genton et al. (2001) and Kollo et al. (2018) for moments of SN and its quadratic forms. The proof of the next Lemma is similar to that of Lemma 5 below, and is omitted.

Lemma 4.

The cumulants of the multivariate skew-normal distribution, \(\text {SN}_{d}\left (\boldsymbol {\mu },\boldsymbol {\Omega },\boldsymbol {\alpha }\right ) \) are the following:

$$ \underline{\operatorname*{Cum}}_{1}\left( \mathbf{X}\right) =\operatorname*{E}\mathbf{X}=\sqrt{\frac{2}{\pi}}\boldsymbol{\delta} ,\qquad\text{and} \qquad\underline{\operatorname*{Cum}}_{2}\left( \mathbf{X}\right) =\operatorname*{vec}\boldsymbol{\Omega}-\frac{2}{\pi }\boldsymbol{\delta}^{\otimes2}, $$

while for k = 3,4…, \( \underline {\operatorname *{Cum}}_{k}\left (\mathbf {X}\right ) =c_{k}\boldsymbol {\delta }^{\otimes k}. \) In particular

$$ c_{3} =2\left( \sqrt{\frac{2}{\pi}}\right)^{3}-\sqrt{\frac{2}{\pi}},\qquad c_{4} =-6\left( \sqrt{\frac{2}{\pi}}\right)^{4}+4\left( \sqrt{\frac{2} {\pi}}\right)^{2}, $$
$$ c_{5} =24\left( \sqrt{\frac{2}{\pi}}\right)^{5}-20\left( \sqrt{\frac {2}{\pi}}\right)^{3}+3\sqrt{\frac{2}{\pi}}. $$
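These constants can be verified numerically: c_k coincides with the k-th cumulant of the half-normal variable |Z|, Z ~ N(0,1), as obtained from its raw moments by the standard moment-to-cumulant recursion (our check, not part of the proof):

```python
from math import comb, pi, sqrt

def cumulants_from_moments(m):
    """Raw moments m[1..n] -> cumulants via the standard recursion
    kappa_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) kappa_k m_{n-k}."""
    kap = [0.0] * len(m)
    for n in range(1, len(m)):
        kap[n] = m[n] - sum(comb(n - 1, k - 1) * kap[k] * m[n - k]
                            for k in range(1, n))
    return kap

b = sqrt(2 / pi)
# Raw moments of |Z|, Z ~ N(0,1): E|Z|^k = b, 1, 2b, 3, 8b for k = 1..5.
mom = [1.0, b, 1.0, 2 * b, 3.0, 8 * b]
kap = cumulants_from_moments(mom)

# They reproduce the c_k of Lemma 4.
assert abs(kap[3] - (2 * b**3 - b)) < 1e-12
assert abs(kap[4] - (-6 * b**4 + 4 * b**2)) < 1e-12
assert abs(kap[5] - (24 * b**5 - 20 * b**3 + 3 * b)) < 1e-12
```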

From this Lemma one observes that \(\boldsymbol {\delta }=\sqrt {\pi /2} \boldsymbol {\mu }_{\mathbf {X}}\), from which \(\underline {\operatorname *{Cum} }_{3}\) \(\left (\mathbf {X}\right ) =\left (2-\frac {\pi }{2}\right ) \boldsymbol {\mu }_{\mathbf {X}}^{\otimes 3}\). Hence Mardia’s skewness measure becomes

$$ \beta_{1,d} =\left( \frac{4-\pi}{2}\right)^{2}\left\Vert \underline{\mu }_{\mathbf{X}}^{\otimes3}\right\Vert^{2} =\left( \frac{4-\pi}{2}\right)^{2}\left\Vert \boldsymbol{\mu}_{\mathbf{X}}\right\Vert^{6} $$

and the kurtosis measure \( \beta _{2,d}=\left (2\pi -6\right )^{2}\left (\operatorname *{vec} \mathbf {I}_{d^{2}}\right )^{\top }\boldsymbol {\mu }_{\mathbf {X}}^{\otimes 4}+d(d+2). \)
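The simplification \(\Vert \boldsymbol {\mu }_{\mathbf {X}}^{\otimes 3}\Vert ^{2}=\Vert \boldsymbol {\mu }_{\mathbf {X}}\Vert ^{6}\) used above follows from the multiplicativity of the Euclidean norm under Kronecker products; a one-line numerical confirmation (ours):

```python
import numpy as np

rng = np.random.default_rng(4)
mu = rng.standard_normal(3)

# Kronecker powers multiply norms: ||mu^{(x)3}||^2 = ||mu||^6,
# which is the simplification used in the display for beta_{1,d}.
mu3 = np.kron(np.kron(mu, mu), mu)
assert np.isclose(mu3 @ mu3, np.linalg.norm(mu) ** 6)
```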

6.4 Canonical Fundamental Skew-Normal (CFUSN) Distribution

Arellano-Valle and Genton (2005) introduced the CFUSN distribution (cf. their Proposition 2.3), to include all existing definitions of SN distributions. The marginal stochastic representation of X with distribution \(\text {CFUSN}_{d,m}\left (\boldsymbol {\Delta }\right ) \) is given by

$$ \mathbf{X}=\boldsymbol{\Delta}\left\vert \mathbf{Z}_{1}\right\vert +\left( \mathbf{I}_{d}-\boldsymbol{\Delta{\Delta}}^{\top}\right)^{1/2}\mathbf{Z}_{2} $$
(6.6)

where Δ is the d × m skewness matrix such that \(\left \Vert \boldsymbol {\Delta }\underline {a}\right \Vert <1\) for all \(\left \Vert \underline {a}\right \Vert =1\), and \(\mathbf {Z}_{1}\in \mathcal {N}\left (0,\mathbf {I}_{m}\right ) \) and \(\mathbf {Z}_{2} \in \mathcal {N}\left (0,\mathbf {I}_{d}\right ) \) are independent (Arellano-Valle and Genton, 2005, Proposition 2.2). A simple construction of Δ is \(\boldsymbol {\Delta }=\boldsymbol {\Lambda }\left (\mathbf {I}_{m}\mathbf {+}\boldsymbol {\Lambda }^{\top }\boldsymbol {\Lambda }\right )^{-1/2}\) for some real matrix Λ of dimension d × m. The CFUSNd,m \(\left (\boldsymbol {\mu },\boldsymbol {\Sigma },\boldsymbol {\Delta }\right ) \) distribution can be defined via the linear transformation μ + Σ1/2X. We then have the following:

$$ \operatorname*{E}\mathbf{X}=\boldsymbol{\Delta}\operatorname*{E}\left\vert \mathbf{Z}_{1}\right\vert =\sqrt{\frac{2}{\pi}}\boldsymbol{\Delta }\mathbf{1}_{m}, $$
$$ \operatorname*{Var}\mathbf{X}=\boldsymbol{\Delta}\operatorname*{Var} \left\vert \mathbf{Z}_{1}\right\vert \boldsymbol{\Delta}^{\top }\boldsymbol{+}\left( \mathbf{I}_{d}-\boldsymbol{\Delta{\Delta}}^{\top}\right)^{1/2} \left( \mathbf{I}_{d}-\boldsymbol{\Delta{\Delta}}^{\top}\right)^{1/2}, $$
$$ \operatorname*{Var}\left\vert \mathbf{Z}_{1}\right\vert =\mathbf{I} _{m}-\frac{2}{\pi}\mathbf{I}_{m},\qquad\operatorname*{Var}\mathbf{X} =\mathbf{I}_{d}-\frac{2}{\pi}\boldsymbol{\Delta{\Delta}}^{\top}. $$
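The mean and variance formulas can be confirmed by simulating Eq. 6.6 directly (our sketch; the choice of Λ, the sample size and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
d, m, n = 2, 2, 400_000

# Skewness matrix Delta = Lambda (I_m + Lambda^T Lambda)^{-1/2}.
Lam = np.array([[1.0, 0.5], [0.0, 2.0]])
w, V = np.linalg.eigh(Lam.T @ Lam + np.eye(m))
Delta = Lam @ (V @ np.diag(1 / np.sqrt(w)) @ V.T)

# Symmetric square root of I_d - Delta Delta^T.
w2, V2 = np.linalg.eigh(np.eye(d) - Delta @ Delta.T)
Q = V2 @ np.diag(np.sqrt(w2)) @ V2.T

# X = Delta |Z1| + (I_d - Delta Delta^T)^{1/2} Z2, as in Eq. 6.6.
Z1 = rng.standard_normal((n, m))
Z2 = rng.standard_normal((n, d))
X = np.abs(Z1) @ Delta.T + Z2 @ Q.T

# E X = sqrt(2/pi) Delta 1_m and Var X = I_d - (2/pi) Delta Delta^T.
assert np.allclose(X.mean(axis=0),
                   np.sqrt(2 / np.pi) * Delta @ np.ones(m), atol=0.01)
assert np.allclose(np.cov(X.T),
                   np.eye(d) - 2 / np.pi * Delta @ Delta.T, atol=0.01)
```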

The cumulant function of a \(\text {CFUSN}_{d,m}\left (0,\boldsymbol {\Sigma },\boldsymbol {\Delta }\right ) \) is

$$ C_{\mathbf{X}}\left( \boldsymbol{\lambda}\right) =m\log2-\frac{1} {2}\boldsymbol{\lambda}^{\top}\boldsymbol{\Sigma}\boldsymbol{\lambda}+\log{\Phi}_{m}\left( i\boldsymbol{\Delta}^{\intercal}\boldsymbol{\lambda}\right) , $$

where Φm denotes the standard normal distribution function of dimension m. Note again that the cumulants of order higher than 2 do not depend on Σ but just on Δ. The proof of the next Lemma is given in the Appendix.

Lemma 5.

The cumulants of \(CFUSN_{d,m}\left (0,\boldsymbol {\Sigma },\boldsymbol {\Delta }\right ) \) are given by

$$ \begin{array}{@{}rcl@{}} \underline{\operatorname*{Cum}}_{1}\left( \mathbf{X}\right) & =&\operatorname*{E}\mathbf{X}=\sqrt{\frac{2}{\pi}}\boldsymbol{\Delta }\mathbf{1}_{m}, \end{array} $$
(6.7)
$$ \begin{array}{@{}rcl@{}} \underline{\operatorname*{Cum}}_{2}\left( \mathbf{X}\right) & =&\operatorname*{vec}\boldsymbol{\Sigma}-\frac{2}{\pi}\boldsymbol{\Delta }^{\otimes2}\operatorname*{vec}\mathbf{I}_{m}, \end{array} $$
(6.8)

and for k = 3,4…

$$ \underline{\operatorname*{Cum}}_{k}\left( \mathbf{X}\right) =c_{k}\boldsymbol{\Delta}^{\otimes k}\operatorname*{vec}\left[ \mathbf{e} _{p}^{\otimes k-1}\right]_{p=1:m}. $$
(6.9)

where ep denotes the p-th unit vector in \(\mathbb {R}^{m} \). Expressions for ck are provided in Lemma 4.

7 Some Illustrative Examples

Figure 1 provides the contour plots of the density of a \(\text {CFUSN}_{2,2}(\underline {0},\boldsymbol {I}_{2},\boldsymbol {\Delta })\) for selected choices of the generating Λ as described under each figure.

Figure 1: Contour plots of the densities of the \(\text {CFUSN}_{2,2}(\underline {0},\boldsymbol {I}_{2},\boldsymbol {\Delta })\) distribution with generating Λ as indicated

Table 1 reports the values of skewness and kurtosis indexes computed with the help of Lemma 5. These results have been further verified by simulations. The index of Malkovich and Afifi and that of Balakrishnan et al. are not reported as they are equivalent to Mardia and MSR measures respectively.

Table 1 Skewness and kurtosis measures for the bivariate CFUSN distribution of Fig. 1

Among the skewness indexes, note that Kollo’s vector measure is not able to capture the presence of skewness (although it is quite strong) in case b). As far as the kurtosis indexes are concerned, note that the total kurtosis κ4 and κ4,D are quite effective in showing differences among the four cases.

Figure 2 reports the contour plots of the density of \(\text {St}_{2}(\underline {0},\boldsymbol {I}_{2},\boldsymbol {\alpha },10)\) for some choices of α as given under the figure. Table 2 reports the corresponding values of the skewness and kurtosis measures.

Figure 2: Contour plots of the densities of the skew-t distribution with unit covariance matrix and \(\underline {\alpha }\) as indicated

Table 2 Skewness and kurtosis measures for the bivariate Skew-t distribution of Fig. 2

8 Concluding Remarks

In this paper we have taken a vectorial approach to expressing information about the skewness and kurtosis of multivariate distributions. This is achieved by applying a vector derivative operator, which we call the T-derivative, to the cumulant generating function. Although some of our methods may appear similar to existing results, we demonstrate that they lead to a direct and natural way of expressing higher order cumulants and moments in the multivariate case. This approach can also be easily extended to obtain moments and cumulants beyond the fourth order.

Our careful analysis of existing measures of skewness and kurtosis via the third and fourth order cumulant vectors reveals some hidden features and relationships among them.

Explicit formulae for cumulant vectors for several distributions have been obtained. Several results are new, such as those in Lemmas 1 and 3, while others complement and complete existing results available in the literature.

The availability of explicit formulae for κ3 and κ4 together with available computing formulae for commutators provides a systematic treatment of higher order moments and cumulants for general classes of symmetric and asymmetric multivariate distributions, which are needed in applications, estimation and testing.