Abstract
Much work has been done in the area of the cluster weighted model (CWM), which extends the finite mixture of regression model to include modelling of the covariates. Although many types of distributions have been considered for both the response(s) and covariates, to our knowledge skewed distributions have not yet been considered in this paradigm. Herein, a family of 24 novel CWMs is considered that allows both the responses and covariates to be modelled using one of four skewed distributions (the generalized hyperbolic and three of its skewed special cases, i.e., the skew-t, the variance-gamma and the normal-inverse Gaussian distributions) or the normal distribution. Parameter estimation is performed using the expectation-maximization algorithm and both simulated and real data are used for illustration.
Change history
09 December 2021
A Correction to this paper has been published: https://doi.org/10.1007/s11634-021-00487-y
References
Aas K, Hobæk Haff I (2005) NIG and skew Student's t: two special cases of the generalised hyperbolic distribution. Appl Res Dev Res Rep
Andrews JL, McNicholas PD (2011) Extending mixtures of multivariate t-factor analyzers. Stat Comput 21(3):361–373
Andrews JL, McNicholas PD (2012) Model-based clustering, classification, and discriminant analysis via mixtures of multivariate \(t\)-distributions: the \(t\) EIGEN family. Stat Comput 22(5):1021–1029
Azzalini A (2020) The R package sn: the skew-normal and related distributions such as the skew-\(t\) (version 1.6-1). Università di Padova, Italia. http://azzalini.stat.unipd.it/SN
Baricz Á (2010) Turán type inequalities for some probability density functions. Stud Sci Math Hung 47(2):175–189
Berta P, Ingrassia S, Punzo A, Vittadini G (2016) Multilevel cluster-weighted models for the evaluation of hospitals. METRON 74(3):275–292
Browne RP, McNicholas PD (2015) A mixture of generalized hyperbolic distributions. Can J Stat 43(2):176–198
Chamroukhi F (2017) Skew t mixture of experts. Neurocomputing 266:390–408
Chen L, Pourahmadi M, Maadooliat M (2014) Regularized multivariate regression models with skew-t error distributions. J Stat Plan Inference 149:125–139
Crawford SL (1994) An application of the Laplace method to finite mixture distributions. J Am Stat Assoc 89(425):259–267
Dang UJ, Browne RP, McNicholas PD (2015) Mixtures of multivariate power exponential distributions. Biometrics 71(4):1081–1089
Dang UJ, Punzo A, McNicholas PD, Ingrassia S, Browne RP (2017) Multivariate response and parsimony for Gaussian cluster-weighted models. J Classif 34(1):4–34
Dang UJ, Gallaugher MP, Browne RP, McNicholas PD (2019) Model-based clustering and classification using mixtures of multivariate skewed power exponential distributions. arXiv preprint arXiv:1907.01938
Dayton CM, Macready GB (1988) Concomitant-variable latent-class models. J Am Stat Assoc 83(401):173–178
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B 39(1):1–38
DeSarbo WS, Cron WL (1988) A maximum likelihood methodology for clusterwise linear regression. J Classif 5(2):249–282
Di Mari R, Bakk Z, Punzo A (2020) A random-covariate approach for distal outcome prediction with latent class analysis. Struct Equ Model 27(3):351–368
Doğru FZ, Arslan O (2017) Parameter estimation for mixtures of skew Laplace normal distributions and application in mixture regression modeling. Commun Stat Theory Methods 46(21):10879–10896
Ferreira CS, Lachos VH, Bolfarine H (2015) Inference and diagnostics in skew scale mixtures of normal regression models. J Stat Comput Simul 85(3):517–537
Frimpong EY, Gage TB, Stratton H (2008) Identifiability of bivariate mixtures: an application to infant mortality models. PhD thesis, Citeseer
Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York
Frühwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics 11(2):317–336
Galimberti G, Soffritti G (2020) A note on the consistency of the maximum likelihood estimator under multivariate linear cluster-weighted models. Stat Probab Lett 157:108630
Gallaugher MPB, McNicholas PD (2017) A matrix variate skew-t distribution. Stat 6(1):160–170
Gallaugher MPB, McNicholas PD (2019) Three skewed matrix variate distributions. Statist Probab Lett 145:103–109
Gershenfeld N (1997) Nonlinear inference and cluster-weighted modeling. Ann N Y Acad Sci 808(1):18–24
Göncü A, Yang H (2016) Variance-gamma and normal-inverse Gaussian models: goodness-of-fit to Chinese high-frequency index returns. North Am J Econ Finance 36:279–292
Hennig C (2000) Identifiability of models for clusterwise linear regression. J Classif 17(2):273–296
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Hung W-L, Chang-Chien S-J (2017) Learning-based EM algorithm for normal-inverse Gaussian mixture model with application to extrasolar planets. J Appl Stat 44(6):978–999
Ingrassia S, Minotti SC, Vittadini G (2012) Local statistical modeling via the cluster-weighted approach with elliptical distributions. J Classif 29(3):363–401
Ingrassia S, Minotti SC, Punzo A (2014) Model-based clustering via linear cluster-weighted models. Comput Stat Data Anal 71:159–182
Ingrassia S, Punzo A, Vittadini G, Minotti SC (2015) The generalized linear mixed cluster-weighted model. J Classif 32(1):85–113
Ingrassia S, Punzo A (2016) Decision boundaries for mixtures of regressions. J Korean Stat Soc 45(2):295–306
Jørgensen B (2012) Statistical properties of the generalized inverse Gaussian distribution, vol 9. Springer, New York
Karlis D, Santourian A (2009) Model-based clustering with non-elliptically contoured distributions. Stat Comput 19(1):73–83
Kim N-H, Browne R (2019) Subspace clustering for the finite mixture of generalized hyperbolic distributions. Adv Data Anal Classif 13(3):641–661
Lee S, McLachlan GJ (2014) Finite mixtures of multivariate skew t-distributions: some recent and new results. Stat Comput 24:181–202
Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivar Anal 100(2):257–265
Lin TI (2010) Robust mixture modeling using multivariate skew t distributions. Stat Comput 20(3):343–356
Lin T, McNicholas PD, Hsiu JH (2014) Capturing patterns via parsimonious t mixture models. Statist Probab Lett 88:80–87
Mazza A, Punzo A, Ingrassia S (2018) flexCWM: a flexible framework for cluster-weighted models. J Stat Softw 86(2):1–30
McNeil AJ, Frey R, Embrechts P (2005) Quantitative risk management: concepts, techniques and tools. Princeton University Press, Princeton
McNicholas PD (2016a) Mixture model-based classification. Chapman & Hall/CRC Press, Boca Raton
McNicholas PD (2016b) Model-based clustering. J Classif 33(3):331–373
McNicholas SM, McNicholas PD, Browne RP (2017) A mixture of variance-gamma factor analyzers. In: Ahmed SE (ed) Big and complex data analysis, contributions to statistics. Springer, Cham, pp 369–385
Murphy K, Murphy TB (2020a) Gaussian parsimonious clustering models with covariates and a noise component. Adv Data Anal Classif 14:293–325
Murphy K, Murphy TB (2020b) MoEClust: Gaussian parsimonious clustering models with covariates and a noise component. R package version 1.3.3. https://cran.r-project.org/package=MoEClust
Murray PM, Browne RB, McNicholas PD (2014a) Mixtures of skew-t factor analyzers. Comput Stat Data Anal 77:326–335
Murray PM, McNicholas PD, Browne RB (2014b) A mixture of common skew-\(t\) factor analyzers. Stat 3(1):68–82
Peel D, McLachlan GJ (2000) Robust mixture modelling using the t distribution. Stat Comput 10(4):339–348
Počuča N, Jevtić P, McNicholas PD, Miljkovic T (2020) Modeling frequency and severity of claims with the zero-inflated generalized cluster-weighted models. Insur Math Econ
Punzo A (2014) Flexible mixture modelling with the polynomial Gaussian cluster-weighted model. Stat Model 14(3):257–291
Punzo A, Ingrassia S (2015) Parsimonious generalized linear Gaussian cluster-weighted models. In: Morlini I, Minerva T, Vichi M (eds) Advances in statistical models for data analysis, studies in classification, data analysis and knowledge organization. Springer, Switzerland, pp 201–209
Punzo A, Ingrassia S (2016) Clustering bivariate mixed-type data via the cluster-weighted model. Comput Statist 31(3):989–1013
Punzo A, Bagnato L (2021) The multivariate tail-inflated normal distribution and its application in finance. J Stat Comput Simul 91(1):1–36
Punzo A, Ingrassia S, Maruotti A (2018) Multivariate generalized hidden Markov regression models with random covariates: physical exercise in an elderly population. Stat Med 37(19):2797–2808
Punzo A, Ingrassia S, Maruotti A (2021) Multivariate hidden Markov regression models: random covariates and heavy-tailed distributions. Stat Pap 62(3):1519–1555
Pyne S, Hu X, Wang K, Rossin E, Lin T-I, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA et al (2009) Automated high-dimensional flow cytometric data analysis. Proc Natl Acad Sci 106(21):8519–8524
R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Soffritti G, Galimberti G (2011) Multivariate linear regression with non-normal errors: a solution based on mixture models. Stat Comput 21(4):523–536
Steane MA, McNicholas PD, Yada R (2012) Model-based classification via mixtures of multivariate t-factor analyzers. Commun Stat Simul Comput 41(4):510–523
Subedi S, Punzo A, Ingrassia S, McNicholas PD (2013) Clustering and classification via cluster-weighted factor analyzers. Adv Data Anal Classif 7(1):5–40
Subedi S, Punzo A, Ingrassia S, McNicholas PD (2015) Cluster-weighted \(t\)-factor analyzers for robust model-based clustering and dimension reduction. Stat Methods Appl 24(4):623–649
Tiedeman DV (1955) On the study of types. In: Sells SB (ed) Symposium on pattern analysis. Air University, U.S.A.F. School of Aviation Medicine, Randolph Field, Texas
Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, New York
Tomarchio SD, McNicholas PD, Punzo A (2021) Matrix normal cluster-weighted models. J Classif 38(3)
Tortora C, Browne RP, ElSherbiny A, Franczak BC, McNicholas PD (2021) Model-based clustering, classification, and discriminant analysis using the generalized hyperbolic distribution: MixGHD R package. J Stat Softw 98(3):1–24
Vrbik I, McNicholas PD (2012) Analytic calculations for the EM algorithm for multivariate skew-t mixture models. Statist Probab Lett 82(6):1169–1174
Vrbik I, McNicholas PD (2014) Parsimonious skew mixture models for model-based clustering and classification. Comput Stat Data Anal 71:196–210
Wang K, Ng SK, McLachlan GJ (2009) Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Digital image computing: techniques and applications. IEEE, pp 526–531
Wolfe JH (1965) A computer program for the maximum likelihood analysis of types. Technical Bulletin 65-15, U.S. Naval Personnel Research Activity
Zarei S, Mohammadpour A, Ingrassia S, Punzo A (2019) On the use of the sub-Gaussian \(\alpha \)-stable distribution in the cluster-weighted model. Iran J Sci Technol Trans A Sci 43(3):1059–1069
Additional information
The original online version of this article was revised: “the error in the line after equation (5) has been corrected in original article”.
Appendices
Appendix
Technical details on the ST distribution
Following Kim and Browne (2019), it is possible to show that the pdf in (12) can be obtained from the pdf in (8) by forcing \(\lambda \) and \(\omega \) to be convenient functions of \(\nu \), by letting \(\varvec{\Sigma }\) and \(\varvec{\alpha }\) become large in a controlled way, and by letting \(\omega \) become small in a controlled way. Specifically, let
where \(\gamma >0\) is a scaling factor. By substituting these parameter values into (8) we obtain
which after some manipulation becomes
Now, letting \(\gamma \rightarrow 0\) and using the following asymptotic relation
we obtain
which is the density reported in (12).
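For completeness, the asymptotic relation invoked above is presumably the standard small-argument behavior of the modified Bessel function of the third kind:

```latex
K_{\lambda}(x) \sim \frac{\Gamma(|\lambda|)}{2}\left(\frac{x}{2}\right)^{-|\lambda|},
\qquad x \to 0^{+}, \quad \lambda \neq 0,
```

which follows from the symmetry \(K_{\lambda}=K_{-\lambda}\).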
Parameter estimation
Let \(\left( \varvec{x}_{1}',\varvec{y}_{1}'\right) ',\ldots ,\left( \varvec{x}_{N}',\varvec{y}_{N}'\right) '\) be a random sample of N independent observations from (15). In the context of the EM algorithm, the random sample is considered incomplete. Specifically, we have two sources of incompleteness. The first source arises from the fact that, for each observation, we do not know its component membership; to govern this source, we use an indicator vector \(\varvec{z}_i=\left( z_{i1},\ldots ,z_{iG}\right) \), where \(z_{ig}=1\) if observation i is in group g, and \(z_{ig}=0\) otherwise. The second source arises if \(f\left( \varvec{y}|\varvec{x};{\varvec{\theta }}_{\varvec{Y}|g}\right) \) or \(f\left( \varvec{x};{\varvec{\theta }}_{\varvec{X}|g}\right) \) are skewed; to govern this source, we need the latent variables \(W_{\varvec{Y}|g}\) and \(W_{\varvec{X}|g}\) introduced in (17).
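As a small illustration of the indicator vectors \(\varvec{z}_i\) (all names below are ours), the component memberships can be stacked into a one-hot matrix:

```python
import numpy as np

def one_hot_memberships(labels, G):
    """Build the indicator matrix Z with z[i, g] = 1 iff observation i
    belongs to group g, and z[i, g] = 0 otherwise."""
    N = len(labels)
    Z = np.zeros((N, G), dtype=int)
    Z[np.arange(N), labels] = 1  # each row has exactly one 1
    return Z

# Three observations assigned to groups 0, 2, 1 out of G = 3 components.
Z = one_hot_memberships([0, 2, 1], G=3)
```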
Based on these sources of incompleteness, we can write the complete-data log-likelihood as
where \(\varvec{\pi }=(\pi _1,\ldots ,\pi _G)'\), and
If \(\varvec{X}\) in component g, \(g=1,\ldots ,G\), follows one of the four skewed distributions,
where \(h(w_{ig\varvec{X}};{\varvec{\phi }}_{W_{\varvec{X}}|g})\) is the appropriate pdf for \(W_{ig\varvec{X}}\) discussed in Sect. 2, with parameters notated as \({\varvec{\phi }}_{W_{\varvec{X}}|g}\), while \(C_{\varvec{X}}\) is constant with respect to the parameters. On the other hand, if \(\varvec{X}\) in component g, \(g=1,\ldots ,G\), is normally distributed then
Similarly, if \(\varvec{Y}|\varvec{x}\) in component g, \(g=1,\ldots ,G\), is distributed according to one of the four skewed distributions,
where \(h(w_{ig\varvec{Y}};{\varvec{\phi }}_{W_{\varvec{Y}}|g})\) is the appropriate pdf for \(W_{ig\varvec{Y}}\) discussed in Sect. 2, with parameters notationally compacted as \({\varvec{\phi }}_{W_{\varvec{Y}}|g}\), while \(C_{\varvec{Y}}\) is constant with respect to the parameters. Conversely, if \(\varvec{Y}|\varvec{x}\) in component g, \(g=1,\ldots ,G\), is normally distributed,
After initialization, the EM algorithm proceeds iterating the following two steps until convergence.
E-Step. The E-step requires the calculation of the conditional expectation of (22). Thus, we first need to calculate
which corresponds to the posterior probability that the unlabeled observation \(\left( \varvec{X}_{i}',\varvec{Y}_{i}'\right) '\) belongs to the gth component of the CWM. In addition, if the distribution of \(\varvec{X}\) in component g, \(g=1,\ldots ,G\), is skewed, the following values need to be updated:
If the distribution of \(\varvec{Y}|\varvec{x}\) in component g, \(g=1,\ldots ,G\), is skewed, then the following values are also updated:
These updates depend on which of the skewed distributions is considered. However, as shown in Sect. 2.2, the conditional latent variables are all GIG distributed. Therefore, all of the required expectations can be calculated using (2)–(4).
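These GIG expectations can be sketched numerically. A minimal implementation, assuming the \((\lambda, a, b)\) parameterization with density proportional to \(w^{\lambda-1}\exp\{-(aw + b/w)/2\}\) (the function name is ours), computes \(\mathbb{E}[W]\) and \(\mathbb{E}[1/W]\) via ratios of Bessel functions and \(\mathbb{E}[\log W]\) via a numerical derivative:

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the third kind, K_s(t)

def gig_expectations(lam, a, b, h=1e-5):
    """Conditional expectations of W ~ GIG(lambda, a, b), with density
    proportional to w^(lambda - 1) * exp(-(a*w + b/w) / 2).
    Returns E[W], E[1/W], and E[log W]; the last uses a central-difference
    derivative of log K_lambda, which has no closed form."""
    s = np.sqrt(a * b)
    ratio = kv(lam + 1.0, s) / kv(lam, s)
    e_w = np.sqrt(b / a) * ratio
    e_winv = np.sqrt(a / b) * ratio - 2.0 * lam / b
    dlogK = (np.log(kv(lam + h, s)) - np.log(kv(lam - h, s))) / (2.0 * h)
    e_logw = 0.5 * np.log(b / a) + dlogK
    return e_w, e_winv, e_logw
```

With the component-specific \((\lambda, a, b)\) implied by each skewed distribution, these three quantities supply all the conditional expectations the E-step requires.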
M-Step. The M-step involves the maximization of the conditional expectation of the complete-data log-likelihood, allowing all parameters to be updated. Specifically, the update for \(\pi _g\) is
The parameters related to the distribution of \(\varvec{X}\) in component g, \(g=1,\ldots ,G\), are updated as follows. For skewed distributions, we have the following updates for \({\varvec{\mu }}_{\varvec{X}|g}\) and \(\varvec{\alpha }_{\varvec{X}|g}\):
where \(T_g=\sum _{i=1}^N\hat{z}_{ig}\), \(\overline{l}_{\varvec{X}|g}=(1/T_g)\sum _{i=1}^N\hat{z}_{ig}\hat{l}_{ig\varvec{X}}\) and \(\overline{m}_{\varvec{X}|g}=(1/T_g)\sum _{i=1}^N\hat{z}_{ig}\hat{m}_{ig\varvec{X}}\). The update for \(\varvec{\Sigma }_{\varvec{X}|g}\) is
Instead, for the normal distribution, we have
The parameters related to the distribution of \(\varvec{Y}|\varvec{x}\) in component g, \(g=1,\ldots ,G\), are updated as follows. For skewed distributions the updates for \(\varvec{B}_g\) and \(\varvec{\alpha }_{\varvec{Y}|g}\) are
where
and
with \(\overline{l}_{\varvec{Y}|g}=(1/T_g)\sum _{i=1}^N\hat{z}_{ig}\hat{l}_{ig\varvec{Y}}\). The update for \(\varvec{\Sigma }_{\varvec{Y}|g}\) is
Conversely, in the case of a multivariate normal distribution, the updates for \(\varvec{B}_g\) and \(\varvec{\Sigma }_{\varvec{Y}|g}\) are
Finally, if either \(\varvec{X}\) or \(\varvec{Y}|\varvec{x}\) in component g, \(g=1,\ldots ,G\), follows one of the skewed distributions, then there is an additional tailedness parameter and, in the case of the GH distribution, an index parameter to update. The updates for each distribution are now given.
1.1 Skew-t distribution
In the case of the ST distribution, we need to update the degrees of freedom \(\nu _g\). This update cannot be obtained in closed form and thus must be performed numerically. For the covariates, the update for \(\nu _{\varvec{X}|g}\) is obtained by solving the equation
where \(\varphi (\cdot )\) denotes the digamma function. When the responses are considered, the update for \(\nu _{\varvec{Y}|g}\) is obtained via (23), after the replacement of \(\nu _{\varvec{X}|g}\), \(\hat{m}_{ig\varvec{X}}\) and \(\hat{n}_{ig\varvec{X}}\) with \(\nu _{\varvec{Y}|g}\), \(\hat{m}_{ig\varvec{Y}}\) and \(\hat{n}_{ig\varvec{Y}}\), respectively.
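Equation (23) is not reproduced here, but the numerical mechanism can be illustrated on the closely related degrees-of-freedom equation for the symmetric t mixture (Peel and McLachlan 2000): the ST update is solved the same way, as a bracketed scalar root-find built from weighted sums of latent-variable expectations. All names below are ours, and this objective is a stand-in, not equation (23) itself:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_t_dof(z_hat, w_hat, nu_prev, d):
    """Degrees-of-freedom update for a symmetric t-mixture component
    (Peel and McLachlan 2000), shown as an analogue of equation (23).
    z_hat: posterior membership probabilities for component g.
    w_hat: conditional expectations of the latent scale variables.
    nu_prev: previous value of nu; d: dimension of the data."""
    T_g = z_hat.sum()
    # Constant part of the score, assembled from the weighted sums.
    c = (1.0 + (z_hat * (np.log(w_hat) - w_hat)).sum() / T_g
         + digamma((nu_prev + d) / 2.0) - np.log((nu_prev + d) / 2.0))

    def score(nu):
        return -digamma(nu / 2.0) + np.log(nu / 2.0) + c

    # Bracketing root finder on a plausible range for the degrees of freedom.
    return brentq(score, 2.001, 200.0)
```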
1.2 Generalized hyperbolic distribution
For the GH distribution, we update \(\lambda _g\) and \(\omega _g\). These updates are derived from Browne and McNicholas (2015) and rely on the log-convexity of \(K_{s}(t)\) in both s and t (Baricz 2010). For notational purposes in this section, the superscript “prev” distinguishes the previous value of a parameter from the current one. The resulting updates, when \(\varvec{X}\) is considered, are
where the derivative in (24) is calculated numerically,
and \(\overline{n}_{\varvec{X}|g}=({1}/{T_g})\sum _{i=1}^N\hat{z}_{ig}\hat{n}_{ig\varvec{X}}\). When \(\varvec{Y}\) is considered, \(\lambda _{\varvec{X}|g}\), \(\omega _{\varvec{X}|g}\), \(\overline{l}_{\varvec{X}|g}\), \(\overline{m}_{\varvec{X}|g}\), and \(\overline{n}_{\varvec{X}|g}\) are replaced with \(\lambda _{\varvec{Y}|g}\), \(\omega _{\varvec{Y}|g}\), \(\overline{l}_{\varvec{Y}|g}\), \(\overline{m}_{\varvec{Y}|g}\), and \(\overline{n}_{\varvec{Y}|g}\), respectively, where \(\overline{m}_{\varvec{Y}|g}=(1/T_g)\sum _{i=1}^N\hat{z}_{ig}\hat{m}_{ig\varvec{Y}}\) and \(\overline{n}_{\varvec{Y}|g}=({1}/{T_g})\sum _{i=1}^N\hat{z}_{ig}\hat{n}_{ig\varvec{Y}}\).
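A central-difference sketch of the numerical derivative in (24) (the function name is ours); since \(\partial \log K_{\lambda }(\omega )/\partial \lambda \) has no closed form, a symmetric finite difference over the index is the natural choice:

```python
import numpy as np
from scipy.special import kv  # K_s(t), the modified Bessel function of the third kind

def dlogK_dlambda(lam, omega, h=1e-5):
    """Central-difference approximation to d/d(lambda) of log K_lambda(omega),
    as needed for the GH index-parameter update; the derivative has no
    closed form, so it is computed numerically."""
    return (np.log(kv(lam + h, omega)) - np.log(kv(lam - h, omega))) / (2.0 * h)
```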
1.3 Variance-gamma distribution
For the VG distribution, the update for \(\psi _g\) cannot be obtained in closed form. When \(\varvec{X}\) is considered, this update is obtained by solving the equation
Clearly, when \(\varvec{Y}\) is considered, \(\psi _{\varvec{X}|g}\), \(\overline{n}_{\varvec{X}|g}\) and \(\overline{l}_{\varvec{X}|g}\) are replaced with \(\psi _{\varvec{Y}|g}\), \(\overline{n}_{\varvec{Y}|g}\) and \(\overline{l}_{\varvec{Y}|g}\), respectively.
1.4 Normal inverse Gaussian distribution
The NIG distribution is the only one of the four with a closed-form update for its tailedness parameter. Specifically, when we consider the covariates, the update of \(\kappa _{\varvec{X}|g}\) is
If the responses are considered, we replace \(\kappa _{\varvec{X}|g}\) and \(\overline{l}_{\varvec{X}|g}\) with \(\kappa _{\varvec{Y}|g}\) and \(\overline{l}_{\varvec{Y}|g}\), respectively.
1.5 Initialization of the algorithm
To initialize the EM algorithm, we followed the approach discussed in Dang et al. (2017). Specifically, the \(z_{ig}\) are initialized in two ways: 10 times using a random soft initialization and once with a k-means (hard) initialization. Therefore, for each G, the algorithms are run 11 times until convergence, and the solution producing the highest log-likelihood value is chosen. For the k-means initialization, the initial \(z_{ig}\) are taken from the best of 10 k-means runs with random starting values, implemented via the kmeans() function of the R statistical software (R Core Team 2019).
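This initialization scheme can be sketched as follows; a minimal Lloyd-style k-means stands in for R's kmeans(), and all names are our assumptions:

```python
import numpy as np

def initial_memberships(X, G, n_random=10, n_kmeans_starts=10, seed=1):
    """Generate the 11 starting membership matrices: n_random random soft
    initializations plus one hard initialization from the best of
    n_kmeans_starts k-means runs. EM is then run once from each start and
    the solution with the highest log-likelihood is kept."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    inits = [rng.dirichlet(np.ones(G), size=N) for _ in range(n_random)]

    # Hard initialization: best of n_kmeans_starts k-means runs.
    best_inertia, best_labels = np.inf, None
    for _ in range(n_kmeans_starts):
        centers = X[rng.choice(N, size=G, replace=False)]
        for _ in range(50):  # Lloyd iterations
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            centers = np.array([X[labels == g].mean(axis=0) if (labels == g).any()
                                else centers[g] for g in range(G)])
        inertia = d[np.arange(N), labels].sum()
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    Z_hard = np.zeros((N, G))
    Z_hard[np.arange(N), best_labels] = 1.0
    inits.append(Z_hard)
    return inits  # 11 matrices of initial z_ig values
```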
Proof of Theorem 3.1
Proof
Suppose that
Integrating out \(\varvec{y}\) from each side of (27) yields an equality on the marginal distribution of \(\varvec{X}\), i.e.,
where \({\varvec{\theta }}_{\varvec{X}}=\left\{ {\varvec{\theta }}_{\varvec{X}|g}; \, g=1,\ldots ,G\right\} \), \(\widetilde{{\varvec{\theta }}}_{\varvec{X}}=\left\{ \widetilde{{\varvec{\theta }}}_{\varvec{X}|j}; \, j=1,\ldots ,\widetilde{G}\right\} \), \(\varvec{\pi }=\{\pi _g; \, g=1,\ldots ,G \}\) and \(\widetilde{\varvec{\pi }}=\{\widetilde{\pi }_j; \, j=1,\ldots ,\widetilde{G}\}\). Dividing the left-hand (right-hand) side of (27) by the left-hand (right-hand) side of (28) leads to
For each fixed value of \(\varvec{x}\), \(p_{\text {GH}}\left( \varvec{y}|\varvec{x};\varvec{\vartheta }\right) \) and \(p_{\text {GH}}\left( \varvec{y}|\varvec{x};\widetilde{\varvec{\vartheta }}\right) \) are mixtures of \(d_{\varvec{Y}}\)-variate GH distributions for \(\varvec{Y}\) (see Browne and McNicholas 2015).
Now, recall from Sect. 3.1 that the location parameter \(\varvec{\mu }_{\varvec{Y}|g}\) of the \(d_{\varvec{Y}}\)-variate GH distribution of \(\varvec{Y}\) in the gth mixture component is related to the covariates \(\varvec{X}\), through the regression coefficients \(\varvec{B}_g\), by the relation \(\varvec{B}'_g \varvec{x}^*\), \(g=1, \ldots , G\). Define the set of all covariate points \(\varvec{x}\) that can be used to distinguish different regression coefficients \(\varvec{B}_g\) via different values of \(\varvec{B}'_g \varvec{x}^*\), i.e.
Note that \(\mathcal {X}\) is the complement of a finite union of hyperplanes of \(\mathbb {R}^{d_{\varvec{X}}}\). Therefore,
For \(\varvec{x}\in \mathcal {X}\), all \(\left\{ \varvec{B}'_g \varvec{x}^*,\varvec{\Sigma }_{\varvec{Y}|g},\varvec{\alpha }_{\varvec{Y}|g},\lambda _{\varvec{Y}|g},\omega _{\varvec{Y}|g}\right\} \), \(g=1,\ldots ,G\), are pairwise distinct because all \(\left\{ \varvec{B}_g,\varvec{\Sigma }_{\varvec{Y}|g},\varvec{\alpha }_{\varvec{Y}|g},\lambda _{\varvec{Y}|g},\omega _{\varvec{Y}|g}\right\} \), \(g=1,\ldots ,G\), are pairwise distinct by the hypothesis of the theorem. As mentioned above, for each fixed value of \(\varvec{x}\), \(p_{\text {GH}}\left( \varvec{y}|\varvec{x};\varvec{\vartheta }\right) \) is a mixture of \(d_{\varvec{Y}}\)-variate GH distributions; since this class is identifiable (Browne and McNicholas 2015), it follows that \(G=\widetilde{G}\) and that, for each \(g\in \left\{ 1,\ldots ,G\right\} \), there exists a \(j\in \left\{ 1,\ldots ,G\right\} \) such that
and
Now, based on (28), the equality in (30) simplifies to
Integrating (31) over \(\varvec{x}\in \mathcal {X}\) yields \(\pi _g=\widetilde{\pi }_j\). Therefore, condition (31) further simplifies to
The equalities \(\varvec{\mu }_{\varvec{X}|g}=\widetilde{\varvec{\mu }}_{\varvec{X}|j}\), \(\varvec{\Sigma }_{\varvec{X}|g}=\widetilde{\varvec{\Sigma }}_{\varvec{X}|j}\), \(\varvec{\alpha }_{\varvec{X}|g}=\widetilde{\varvec{\alpha }}_{\varvec{X}|j}\), \(\lambda _{\varvec{X}|g}=\widetilde{\lambda }_{\varvec{X}|j}\), and \(\omega _{\varvec{X}|g}=\widetilde{\omega }_{\varvec{X}|j}\) simply arise from the identifiability of the \(d_{\varvec{X}}\)-variate GH distribution, and this completes the proof. \(\square \)
Cite this article
Gallaugher, M.P.B., Tomarchio, S.D., McNicholas, P.D. et al. Multivariate cluster weighted models using skewed distributions. Adv Data Anal Classif 16, 93–124 (2022). https://doi.org/10.1007/s11634-021-00480-5