Abstract
Classical symmetric distributions like the Gaussian are widely used. However, in reality data often display a lack of symmetry. Multiple distributions, grouped under the name “skewed distributions”, have been developed to cope specifically with asymmetric data. In this paper, we present a broad family of flexible multivariate skewed distributions for which statistical inference is a feasible task. The studied family of multivariate skewed distributions is derived by taking affine combinations of independent univariate distributions. These are members of a flexible family of univariate asymmetric distributions and are an important basis for achieving statistical inference. Besides basic properties of the proposed distributions, statistical inference based on a maximum likelihood approach is also presented. We show that under mild conditions, weak consistency and asymptotic normality of the maximum likelihood estimators hold. These results are supported by a simulation study confirming the developed theoretical results, and by some data examples illustrating practical applicability.
References
Abtahi, A., Towhidi, M. (2013). The new unified representation of multivariate skewed distributions. Statistics, 47(1), 126–140.
Adcock, C., Azzalini, A. (2020). A selective overview of skew-elliptical and related distributions and of their applications. Symmetry, 12(1), 118.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Allman, E. S., Matias, C., Rhodes, J. A., et al. (2009). Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics, 37(6A), 3099–3132.
Arellano-Valle, R. B., Gómez, H. W., Quintana, F. A. (2005). Statistical inference for a general class of asymmetric distributions. Journal of Statistical Planning and Inference, 128(2), 427–443.
Arnold, B. C., Castillo, E., Sarabia, J. M. (2006). Families of multivariate distributions involving the Rosenblatt construction. Journal of the American Statistical Association, 101(476), 1652–1662.
Azzalini, A. (2013). The skew-normal and related families. Institute of Mathematical Statistics Monographs. Cambridge: Cambridge University Press.
Azzalini, A., Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society, Series B, 65(2), 367–389.
Azzalini, A., Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika, 83(4), 715–726.
Babić, S., Ley, C., Veredas, D. (2019). Comparison and classification of flexible distributions for multivariate skew and heavy-tailed data. Symmetry, 11(10), 1216.
Balakrishnan, N., Capitanio, A. (2008). Discussion: The t family and their close and distant relations. Journal of The Korean Statistical Society, 37, 305–307.
Bauwens, L. (2005). A new class of multivariate skew densities, with application to GARCH models. Journal of Business & Economic Statistics, 23(3), 346–354.
Beckmann, C. F., Smith, S. M. (2004). Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Transactions on Medical Imaging, 23(2), 137–152.
Cook, R. D., Weisberg, S. (1994). An introduction to regression graphics. Wiley series in probability and mathematical statistics. New York: Wiley.
Eriksson, J., Koivunen, V. (2004). Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters, 11(7), 601–604.
Fechner, G. (1897). Kollektivmasslehre. Leipzig: Engelmann.
Fernández, C., Steel, M. F. J. (1998). On Bayesian modeling of fat tails and skewness. Journal of the American Statistical Association, 93(441), 359–371.
Ferreira, J. T. A. S., Steel, M. F. J. (2007). A new class of skewed multivariate distributions with application in regression analysis. Statistica Sinica, 17, 505–529.
Gijbels, I., Karim, R., Verhasselt, A. (2019). On quantile-based asymmetric family of distributions: Properties and inference. International Statistical Review, 87(3), 471–504.
Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1: Statistics (pp. 221–233). Berkeley, California: University of California Press.
Jammalamadaka, S. R., Taufer, E., Terdik, G. H. (2020). On multivariate skewness and kurtosis. Sankhya A, 83, 1–38.
Johnson, S. G. (2018). The NLopt nonlinear-optimization package. http://ab-initio.mit.edu/nlopt
Jones, M. C. (2008). The t family and their close and distant relations. Journal of The Korean Statistical Society, 37, 293–302.
Jones, M. C. (2010). Distributions generated by transformation of scale using an extended Cauchy–Schlömilch transformation. Sankhya A, 72, 359–375.
Jones, M. C. (2016). On bivariate transformation of scale distributions. Communications in Statistics-Theory and Methods, 45(3), 577–588.
Kollo, T. (2008). Multivariate skewness and kurtosis measures with an application in ICA. Journal of Multivariate Analysis, 99(10), 2328–2338.
Kotz, S., Kozubowski, T. J., Podgórski, K. (2001). The Laplace distribution and generalizations. New York: Springer.
Ley, C., Paindaveine, D. (2010). Multivariate skewing mechanisms: A unified perspective based on the transformation approach. Statistics and Probability Letters, 80(23–24), 1685–1694.
Liu, R. Y., Parelius, J. M., Singh, K. (1999). Multivariate analysis by data depth: Descriptive statistics, graphics and inference, (with discussion and a rejoinder by Liu and Singh). The Annals of Statistics, 27(3), 783–858.
Louzada, F., Ara, A., Fernandes, G. (2017). The bivariate alpha-skew-normal distribution. Communications in Statistics-Theory and Methods, 46(14), 7147–7156.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530.
Móri, T. F., Rohatgi, V. K., Székely, G. (1994). On multivariate skewness and kurtosis. Theory of Probability & Its Applications, 38(3), 547–551.
Newey, W. K., McFadden, D. (1994). Large sample estimation and hypothesis testing. In R. F. Engle and D. McFadden (Eds.), Handbook of econometrics (Vol. 4, pp. 2111–2245). Amsterdam: North-Holland.
Pircalabelu, E., Claeskens, G., Gijbels, I. (2017). Copula directed acyclic graphs. Statistics and Computing, 27(1), 55–78.
Powell, M. J. D. (2009). The BOBYQA algorithm for bound constrained optimization without derivatives. Report DAMTP 2009/NA06, Department of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, UK.
Punathumparambath, B. (2012). The multivariate asymmetric slash Laplace distribution and its applications. Statistica, 72(2), 235–249.
Rubio, F. J., Steel, M. F. J. (2013). Bayesian inference for \(P(X<Y)\) using asymmetric dependent distributions. Bayesian Analysis, 8(1), 44–62.
Rubio, F. J., Steel, M. F. J. (2014). Inference in two-piece location-scale models with Jeffreys priors. Bayesian Analysis, 9(1), 1–22.
Rubio, F. J., Steel, M. F. J. (2015). Bayesian modelling of skewness and kurtosis with two-piece scale and shape distributions. Electronic Journal of Statistics, 9(2), 1884–1912.
Stan Development Team. (2021). RStan: The R interface to Stan. R package version 2.21.3. https://mc-stan.org/
Struyf, A. J., Rousseeuw, P. J. (1999). Halfspace depth and regression depth characterize the empirical distribution. Journal of Multivariate Analysis, 69(1), 135–153.
Tan, F., Tang, Y., Peng, H. (2015). The multivariate slash and skew-slash student t distributions. Journal of Statistical Distributions and Applications, 2(1), 1–22.
Tukey, J. W. (1975). Mathematics and the picturing of data. Proceedings of the International Congress of Mathematicians, Vancouver, 1975 (Vol. 2, pp. 523–531).
Villani, M., Larsson, R. (2007). The multivariate split normal distribution and asymmetric principal components analysis. Communications in Statistics-Theory and Methods, 35(6), 1123–1140.
Wallis, K. F. (2014). The two-piece normal, binormal, or double Gaussian distribution: Its origin and rediscoveries. Statistical Science, 29(1), 106–112.
Zhang, F. (2011). Matrix theory: Basic results and techniques (2nd ed.). New York: Springer.
Zuo, Y., Serfling, R. (2000). General notions of statistical depth function. The Annals of Statistics, 28(2), 461–482.
Acknowledgements
The authors thank the anonymous reviewers for their valuable comments, which led to an improvement of the work. The first and second authors gratefully acknowledge support from the Research Fund KU Leuven [C16/20/002 project]. The third author was supported by the Special Research Fund (Bijzonder Onderzoeksfonds) of Hasselt University [BOF14NI06].
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Appendix: Proofs of Propositions 2, 3 and 4, and of Theorem 1
Proof of Proposition 2
Suppose that \(f({\mathbf{x}};{\varvec{\theta}})=f({\mathbf{x}};{\varvec{\theta}}^*)\) and that we know \({\mathbf{Z}}\) up to its parameters (e.g., \(Z_1\) is of a QBA-logistic type etc.). We first prove that \({\varvec{\mu}}_a\) is identifiable. By construction, \(f_{Z_j}\), \(j=1,\dots ,d\) is unimodal with mode 0. Together with (5), this implies
Thus, \({\varvec{\mu}}_{a} = {\varvec{\mu}}_{a}^*\) and \(\left|\det ({\mathbf{A}})\right|=\left|\det ({\mathbf{A}}^*)\right|\). Hence \({\varvec{\mu}}_a\) is identifiable. Without loss of generality, we can assume that for the remainder of the proof, \({\varvec{\mu}}_a={\mathbf{0}}\).
The identifiability result we are aiming at is commonly referred to as uniqueness in the ICA literature. In Eriksson and Koivunen (2004), necessary and sufficient conditions are provided for a noiseless ICA model (\({\mathbf{X}}={\mathbf{AZ}}\)) to be unique. These are:

- There are no Gaussian sources; or,
- if \({\mathbf{A}}\) has full column rank, there is at most one Gaussian source.
Since \({\mathbf{A}}\in {\mathbb{R}}^{d\times d}\) is non-singular, it has full column rank. If condition \(( I1 )\) holds, the mixing matrix \({\mathbf{A}}\) is unique, i.e., identifiable up to a possible permutation and rescaling together with the accompanying permutation and rescaling of \({\mathbf{Z}}\). A location difference is not possible as \({\mathbf{Z}}\) does not contain a location parameter.
For the scale ambiguity, note that by (3)
By restricting the sign of a single element of \(({\mathbf{A}}^{-1})_{\cdot ,j}\) as in \(( I2 )\), this problem can no longer occur. By
with \({\widetilde{\mathbf{I}}}_j\in {\mathbb{R}}^{d\times d}\) the identity matrix with \(-1\) at \((\widetilde{I}_j)_{j,j}\), fixing the signs of the diagonal elements of \({\mathbf{A}}\) also suffices.
Since each of the \(Z_j\)’s lacks a scaling parameter and none of the other parameters of \(Z_j\) affects the scaling in a linear way (otherwise it is considered a scaling parameter), any rescaling of \({\mathbf{A}}\) cannot be compensated by rescaling the parameters of \({\mathbf{Z}}\). Hence, \({\mathbf{A}}\) is identifiable up to a permutation. By the identifiability of each of the \(Z_j\), also its parameters are uniquely determined up to the same possible permutation. Thus, \({\varvec{\theta}}={\varvec{\theta}}^*\) up to a possible permutation of \({\mathbf{Z}}\) and \({\mathbf{A}}\). Therefore the model is identifiable. □
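The sign ambiguity addressed by condition \(( I2 )\) can be made concrete numerically. The following NumPy sketch (illustrative code, not from the paper) verifies that flipping the sign of one column of \({\mathbf{A}}\) and of the corresponding source component leaves both \({\mathbf{X}}={\mathbf{AZ}}\) and \(\left|\det ({\mathbf{A}})\right|\) unchanged, so the two parameterisations are observationally equivalent unless signs are pinned down:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = rng.normal(size=(d, d))      # a mixing matrix, non-singular almost surely
z = rng.normal(size=d)           # one realisation of the source vector Z

j = 1
I_tilde = np.eye(d)
I_tilde[j, j] = -1.0             # identity matrix with -1 at position (j, j)

# I_tilde is its own inverse, so A Z = (A I_tilde)(I_tilde Z):
x_original = A @ z
x_flipped = (A @ I_tilde) @ (I_tilde @ z)
assert np.allclose(x_original, x_flipped)

# |det(A)| is also unaffected by the sign flip:
assert np.isclose(abs(np.linalg.det(A)), abs(np.linalg.det(A @ I_tilde)))
```

Fixing the signs of the diagonal elements of \({\mathbf{A}}\), as in the proof, rules out exactly this indeterminacy.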
Proof of Proposition 3
We employ a proof by induction on the dimension of the matrix. For \(d=2\) this is trivial as \({\mathbf{A}}\) is invertible and thus has a non-zero determinant. Suppose the statement holds for any invertible \((d-1)\times (d-1)\)-matrix. Consider the matrix
with \({\mathbf{B}}\in {\mathbb{R}}^{(d-1) \times (d-1)}\), \({\mathbf{C}},{\mathbf{D}}^T\in {\mathbb{R}}^{d-1}\) and \(E\in {\mathbb{R}}\). Since \({\mathbf{A}}\) is invertible, it must hold that
where \(({\mathbf{B}}^{*})_j=\begin{bmatrix} ({\mathbf{B}})_{-j,.} \\ {\mathbf{D}} \end{bmatrix}\), i.e., the \((d-1) \times (d-1)\)-matrix in which the j-th row of \({\mathbf{B}}\) is omitted and \({\mathbf{D}}\) is appended. Now consider the following two cases.
1. \({\text{det}}({\mathbf{B}})\not =0\) and \(E \not = 0\). By induction, the statement holds for \({\mathbf{A}}\).
2. {\({\text{det}}({\mathbf{B}})\not =0\) and \(E = 0\)} or \({\text{det}}({\mathbf{B}})=0\). In this case, by (25), \(\exists j \in \{1,\dots ,d-1\}\) such that \({\text{det}}(({\mathbf{B}}^{*})_j)\not =0\) and \(C_j\not =0\). By swapping the j-th row of \({\mathbf{A}}\) with \(({\mathbf{D}}, E)\), the resulting matrix falls into case 1. This holds because the element replacing E is non-zero, and the new matrix that takes the place of \({\mathbf{B}}\) is invertible since it is a row permutation of \(({\mathbf{B}}^{*})_j\), which preserves the non-zero determinant. Hence, the statement holds.
This concludes the proof as the above two cases contain all possible configurations of \({\mathbf{A}}\). □
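The inductive row-swapping argument above relates to a classical linear-algebra fact: any invertible matrix admits a row permutation under which all leading principal submatrices are again invertible. The brute-force NumPy check below (illustrative code with hypothetical helper names, not the paper's implementation) verifies this for a small example where the unpermuted matrix fails:

```python
import numpy as np
from itertools import permutations

def leading_minors_nonzero(M, tol=1e-12):
    """Check that every leading principal minor of M is non-zero."""
    d = M.shape[0]
    return all(abs(np.linalg.det(M[:k, :k])) > tol for k in range(1, d + 1))

def row_permute(A):
    """Return a row permutation of A with all leading minors non-zero."""
    d = A.shape[0]
    for p in permutations(range(d)):
        P = A[list(p), :]
        if leading_minors_nonzero(P):
            return P
    return None  # never reached for invertible A

# An invertible (permutation) matrix whose own leading minors vanish:
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
assert not leading_minors_nonzero(A)   # fails as given: top-left entry is 0
B = row_permute(A)
assert B is not None and leading_minors_nonzero(B)
```

Searching over all \(d!\) permutations is only feasible for small \(d\); the proof's inductive swap construction shows a suitable permutation always exists without exhaustive search.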
Proof of Proposition 4
The proof is largely based on similar arguments concerning the consistency of the maximum likelihood estimator for the univariate quantile-based asymmetric family of distributions: Theorem 3.3 in Gijbels et al. (2019), which in turn uses Theorem 2.5 of Newey and McFadden (1994). The latter theorem states that under the following conditions (i) to (iv), the maximum likelihood estimator is weakly consistent, i.e., \(\widehat{\varvec{\theta}}_n^{\text{ML}} \overset{P}{\rightarrow }{\varvec{\theta}}_0\) as \(n\rightarrow \infty\).
- (i) If \({\varvec{\theta}} \not = {\varvec{\theta}}_0\), then \(f_{\mathbf{X}}({\mathbf{x}};{\varvec{\theta}}) \not = f_{\mathbf{X}}({\mathbf{x}};{\varvec{\theta}}_0)\).
- (ii) The true parameter \({\varvec{\theta}}_0 \in {\varvec{\varTheta}}\), with \({\varvec{\varTheta}}\) a compact parameter space.
- (iii) The log-likelihood function \(\ell \left( {\varvec{\theta}};{\mathbf{x}}\right)\) is continuous at each \({\varvec{\theta}}\in {\varvec{\varTheta}}\).
- (iv) It holds that \(E[{\sup }_{{\varvec{\theta}}\in {\varvec{\varTheta}}}\left\Vert \ell \left( {\varvec{\theta}};{\mathbf{X}}\right) \right\Vert ]<\infty\), where \(\left\Vert \cdot \right\Vert\) is the Euclidean norm.
Condition (i) is fulfilled by Proposition 2, in which the identifiability of the parameters is guaranteed by Assumption \(( C1 )\). Conditions (ii) and (iii) follow from Assumption \(( C2 )\) and from the continuity of both the natural logarithm and \(f_{Z_j}\), respectively. So only condition (iv) remains to be checked. From (5) and (12), we have that
where boundedness follows from the invertibility of \({\mathbf{A}}\) and Assumption \(( C3 )\), as proven in Theorem 3.3 of Gijbels et al. (2019). Since the inequality holds for all \({\varvec{\theta}}\in {\varvec{\varTheta}}_R\), condition (iv) is satisfied and consistency of the maximum likelihood estimator holds. □
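The weak consistency established here can be illustrated by a small Monte Carlo experiment. The NumPy sketch below is illustrative code, not the paper's: a univariate two-piece normal in the parameterisation of Fernández and Steel (1998) stands in for a single source component, and its skewness parameter \(\gamma\) is estimated by maximising the log-likelihood over a grid. With a large sample the estimate lands close to the true value:

```python
import numpy as np

def tp_logpdf(x, gamma):
    """Log-density of the two-piece (Fernandez-Steel) normal, gamma > 0."""
    s = np.where(x >= 0, 1.0 / gamma, gamma)     # piecewise rescaling of x
    z = x * s
    return np.log(2.0 / (gamma + 1.0 / gamma)) - 0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def tp_sample(n, gamma, rng):
    """Draw via the representation: gamma*|N| w.p. gamma^2/(1+gamma^2), else -|N|/gamma."""
    z = np.abs(rng.standard_normal(n))
    pos = rng.random(n) < gamma**2 / (1.0 + gamma**2)
    return np.where(pos, gamma * z, -z / gamma)

rng = np.random.default_rng(42)
gamma0 = 2.0                                     # true skewness parameter
x = tp_sample(20000, gamma0, rng)

grid = np.linspace(0.5, 4.0, 701)                # profile the likelihood over a grid
loglik = np.array([tp_logpdf(x, g).sum() for g in grid])
gamma_hat = grid[np.argmax(loglik)]

# Right skew (gamma0 > 1) shows up as a positive sample mean,
# and with n = 20000 the ML estimate sits close to the truth:
assert x.mean() > 0
assert abs(gamma_hat - gamma0) < 0.15
```

Repeating this for growing \(n\) would show the estimation error shrinking, in line with \(\widehat{\varvec{\theta}}_n^{\text{ML}} \overset{P}{\rightarrow }{\varvec{\theta}}_0\).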
Proof of Theorem 1
The proof is largely based on Theorem 3 in Huber (1967), which handles asymptotic normality of maximum likelihood estimators for non-differentiable likelihood functions when consistency has been established.
Since consistency is shown in Proposition 4, only the following four conditions from Huber (1967) need to be fulfilled for the theorem to hold:

- (I) For each fixed \({\varvec{\theta}} \in {\varvec{\varTheta}}\), \({\varvec{\varPsi}}({\mathbf{x}};{\varvec{\theta}})\) is \(\varOmega\)-measurable and \({\varvec{\varPsi}}({\mathbf{x}};{\varvec{\theta}})\) is separable. [See Assumptions A-1, p. 222 of Huber (1967).]
- (II) There exists a \({\varvec{\theta}}_0\in {\varvec{\varTheta}}\) for which \({\varvec{\lambda}}({\varvec{\theta}}_0)={\mathbf{0}}\).
- (III) There are strictly positive numbers a, b, c, \(r_0\) such that
  - (i) \(\left\Vert {\varvec{\lambda}}({\varvec{\theta}})\right\Vert \ge a\left\Vert {\varvec{\theta}}-{\varvec{\theta}}_0\right\Vert\) for \(\left\Vert {\varvec{\theta}}-{\varvec{\theta}}_0\right\Vert \le r_0\);
  - (ii) \(E[u({\mathbf{X}};{\varvec{\theta}},r)]\le br\) for \(\left\Vert {\varvec{\theta}}-{\varvec{\theta}}_0\right\Vert +r\le r_0\), \(r\ge 0\);
  - (iii) \(E[(u({\mathbf{X}};{\varvec{\theta}},r))^2]\le cr\) for \(\left\Vert {\varvec{\theta}}-{\varvec{\theta}}_0\right\Vert +r\le r_0\), \(r\ge 0\).
- (IV) The expectation \(E[\left\Vert {\varvec{\varPsi}}({\mathbf{X}};{\varvec{\theta}})\right\Vert ^2]\) is finite.
These conditions are checked in a similar way as in the proof of Theorem 3.4 in Gijbels et al. (2019), which is already quite general. We start with condition (I). By Lemma 2, \({\varvec{\varPsi}}({\mathbf{x}};{\varvec{\theta}})\) is measurable. That \({\varvec{\varPsi}}({\mathbf{x}};{\varvec{\theta}})\) is separable holds under the stated assumptions. Indeed, each of the component functions \({\varvec{\varPsi}}_j({\mathbf{x}};{\varvec{\theta}})\), for \(j=1, \dots , d^2+2d\), is separable, and this is a finite number of functions. That each component function is separable follows from its continuity, except on a set with probability measure zero. Condition (II) is met by Proposition 5, whereas for condition (IV) we have by the definition of the Euclidean norm
where the finiteness follows from Proposition 6.
It remains to look into condition (III). The key property here is the continuity of \({\varvec{\lambda}}({\varvec{\theta}})\) in a neighborhood of \({\varvec{\theta}}_0\), which holds by Lemma 3. The proof can be completed similarly as in Gijbels et al. (2019); for details, the reader is referred to that paper. □
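For orientation, Huber's Theorem 3 delivers asymptotic normality in the familiar sandwich form. Stated here in its generic version, under the standard additional assumption that \({\varvec{\lambda}}\) has a non-singular derivative \({\varvec{\varLambda}}\) at \({\varvec{\theta}}_0\) (the model-specific statement is the paper's Theorem 1):

```latex
% Generic sandwich-form limit implied by Huber (1967, Theorem 3),
% assuming \lambda has a non-singular derivative \Lambda at \theta_0:
\sqrt{n}\left(\widehat{\varvec{\theta}}_n^{\text{ML}}-{\varvec{\theta}}_0\right)
 \overset{d}{\rightarrow}
 N\left(\mathbf{0},\;
 {\varvec{\varLambda}}^{-1}{\varvec{\varSigma}}
 \left({\varvec{\varLambda}}^{-1}\right)^{T}\right),
\qquad
{\varvec{\varSigma}}
 = E\left[{\varvec{\varPsi}}(\mathbf{X};{\varvec{\theta}}_0)\,
          {\varvec{\varPsi}}(\mathbf{X};{\varvec{\theta}}_0)^{T}\right].
```

When the model is correctly specified and the usual regularity holds, this covariance reduces to the inverse Fisher information.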
Cite this article
Baillien, J., Gijbels, I. & Verhasselt, A. Flexible asymmetric multivariate distributions based on two-piece univariate distributions. Ann Inst Stat Math 75, 159–200 (2023). https://doi.org/10.1007/s10463-022-00842-6