Abstract
Much work has been done in the area of the cluster weighted model (CWM), which extends the finite mixture of regression model to include modelling of the covariates. Although many types of distributions have been considered for both the response(s) and covariates, to our knowledge skewed distributions have not yet been considered in this paradigm. Herein, a family of 24 novel CWMs is considered that allows both the responses and covariates to be modelled using one of four skewed distributions (the generalized hyperbolic and three of its skewed special cases, i.e., the skew-t, the variance-gamma and the normal-inverse Gaussian distributions) or the normal distribution. Parameter estimation is performed using the expectation-maximization algorithm and both simulated and real data are used for illustration.
Change history
09 December 2021
A Correction to this paper has been published: https://doi.org/10.1007/s11634-021-00487-y
References
Aas K, Hobæk Haff I (2005) NIG and skew Student's t: two special cases of the generalised hyperbolic distribution. Appl Res Dev Res Rep
Andrews JL, McNicholas PD (2011) Extending mixtures of multivariate t-factor analyzers. Stat Comput 21(3):361–373
Andrews JL, McNicholas PD (2012) Model-based clustering, classification, and discriminant analysis via mixtures of multivariate \(t\)-distributions: the \(t\) EIGEN family. Stat Comput 22(5):1021–1029
Azzalini A (2020) The R package sn: the skew-normal and related distributions such as the skew-\(t\) (version 1.6-1). Università di Padova, Italia. http://azzalini.stat.unipd.it/SN
Baricz Á (2010) Turán type inequalities for some probability density functions. Stud Sci Math Hung 47(2):175–189
Berta P, Ingrassia S, Punzo A, Vittadini G (2016) Multilevel cluster-weighted models for the evaluation of hospitals. METRON 74(3):275–292
Browne RP, McNicholas PD (2015) A mixture of generalized hyperbolic distributions. Can J Stat 43(2):176–198
Chamroukhi F (2017) Skew t mixture of experts. Neurocomputing 266:390–408
Chen L, Pourahmadi M, Maadooliat M (2014) Regularized multivariate regression models with skew-t error distributions. J Stat Plan Inference 149:125–139
Crawford SL (1994) An application of the Laplace method to finite mixture distributions. J Am Stat Assoc 89(425):259–267
Dang UJ, Browne RP, McNicholas PD (2015) Mixtures of multivariate power exponential distributions. Biometrics 71(4):1081–1089
Dang UJ, Punzo A, McNicholas PD, Ingrassia S, Browne RP (2017) Multivariate response and parsimony for Gaussian cluster-weighted models. J Classif 34(1):4–34
Dang UJ, Gallaugher MP, Browne RP, McNicholas PD (2019) Model-based clustering and classification using mixtures of multivariate skewed power exponential distributions. arXiv preprint arXiv:1907.01938
Dayton CM, Macready GB (1988) Concomitant-variable latent-class models. J Am Stat Assoc 83(401):173–178
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B 39(1):1–38
DeSarbo WS, Cron WL (1988) A maximum likelihood methodology for clusterwise linear regression. J Classif 5(2):249–282
Di Mari R, Bakk Z, Punzo A (2020) A random-covariate approach for distal outcome prediction with latent class analysis. Struct Equ Model 27(3):351–368
Doğru FZ, Arslan O (2017) Parameter estimation for mixtures of skew Laplace normal distributions and application in mixture regression modeling. Commun Stat Theory Methods 46(21):10879–10896
Ferreira CS, Lachos VH, Bolfarine H (2015) Inference and diagnostics in skew scale mixtures of normal regression models. J Stat Comput Simul 85(3):517–537
Frimpong EY, Gage TB, Stratton H (2008) Identifiability of bivariate mixtures: an application to infant mortality models. PhD thesis, Citeseer
Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York
Frühwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics 11(2):317–336
Galimberti G, Soffritti G (2020) A note on the consistency of the maximum likelihood estimator under multivariate linear cluster-weighted models. Stat Probab Lett 157:108630
Gallaugher MPB, McNicholas PD (2017) A matrix variate skew-t distribution. Stat 6(1):160–170
Gallaugher MPB, McNicholas PD (2019) Three skewed matrix variate distributions. Statist Probab Lett 145:103–109
Gershenfeld N (1997) Nonlinear inference and cluster-weighted modeling. Ann N Y Acad Sci 808(1):18–24
Göncü A, Yang H (2016) Variance-gamma and normal-inverse Gaussian models: goodness-of-fit to Chinese high-frequency index returns. North Am J Econ Finance 36:279–292
Hennig C (2000) Identifiability of models for clusterwise linear regression. J Classif 17(2):273–296
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Hung W-L, Chang-Chien S-J (2017) Learning-based EM algorithm for normal-inverse Gaussian mixture model with application to extrasolar planets. J Appl Stat 44(6):978–999
Ingrassia S, Minotti SC, Vittadini G (2012) Local statistical modeling via the cluster-weighted approach with elliptical distributions. J Classif 29(3):363–401
Ingrassia S, Minotti SC, Punzo A (2014) Model-based clustering via linear cluster-weighted models. Comput Stat Data Anal 71:159–182
Ingrassia S, Punzo A, Vittadini G, Minotti SC (2015) The generalized linear mixed cluster-weighted model. J Classif 32(1):85–113
Ingrassia S, Punzo A (2016) Decision boundaries for mixtures of regressions. J Korean Stat Soc 45(2):295–306
Jørgensen B (2012) Statistical properties of the generalized inverse Gaussian distribution, vol 9. Springer, New York
Karlis D, Santourian A (2009) Model-based clustering with non-elliptically contoured distributions. Stat Comput 19(1):73–83
Kim N-H, Browne R (2019) Subspace clustering for the finite mixture of generalized hyperbolic distributions. Adv Data Anal Classif 13(3):641–661
Lee S, McLachlan GJ (2014) Finite mixtures of multivariate skew t-distributions: some recent and new results. Stat Comput 24:181–202
Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivar Anal 100(2):257–265
Lin TI (2010) Robust mixture modeling using multivariate skew t distributions. Stat Comput 20(3):343–356
Lin T, McNicholas PD, Hsiu JH (2014) Capturing patterns via parsimonious t mixture models. Statist Probab Lett 88:80–87
Mazza A, Punzo A, Ingrassia S (2018) flexCWM: a flexible framework for cluster-weighted models. J Stat Softw 86(2):1–30
McNeil AJ, Frey R, Embrechts P (2005) Quantitative risk management: concepts, techniques and tools. Princeton University Press, Princeton
McNicholas PD (2016a) Mixture model-based classification. Chapman & Hall/CRC Press, Boca Raton
McNicholas PD (2016b) Model-based clustering. J Classif 33(3):331–373
McNicholas SM, McNicholas PD, Browne RP (2017) A mixture of variance-gamma factor analyzers. In: Ahmed SE (ed) Big and complex data analysis, contributions to statistics. Springer, Cham, pp 369–385
Murphy K, Murphy TB (2020a) Gaussian parsimonious clustering models with covariates and a noise component. Adv Data Anal Classif 14:293–325
Murphy K, Murphy TB (2020b) MoEClust: Gaussian parsimonious clustering models with covariates and a noise component. R package version 1.3.3. https://cran.r-project.org/package=MoEClust
Murray PM, Browne RB, McNicholas PD (2014a) Mixtures of skew-t factor analyzers. Comput Stat Data Anal 77:326–335
Murray PM, McNicholas PD, Browne RB (2014b) A mixture of common skew-\(t\) factor analyzers. Stat 3(1):68–82
Peel D, McLachlan GJ (2000) Robust mixture modelling using the t distribution. Stat Comput 10(4):339–348
Počuča N, Jevtić P, McNicholas PD, Miljkovic T (2020) Modeling frequency and severity of claims with the zero-inflated generalized cluster-weighted models. Insur Math Econ
Punzo A (2014) Flexible mixture modelling with the polynomial Gaussian cluster-weighted model. Stat Model 14(3):257–291
Punzo A, Ingrassia S (2015) Parsimonious generalized linear Gaussian cluster-weighted models. In: Morlini I, Minerva T, Vichi M (eds) Advances in statistical models for data analysis, studies in classification, data analysis and knowledge organization. Springer, Switzerland, pp 201–209
Punzo A, Ingrassia S (2016) Clustering bivariate mixed-type data via the cluster-weighted model. Comput Statist 31(3):989–1013
Punzo A, Bagnato L (2021) The multivariate tail-inflated normal distribution and its application in finance. J Stat Comput Simul 91(1):1–36
Punzo A, Ingrassia S, Maruotti A (2018) Multivariate generalized hidden Markov regression models with random covariates: physical exercise in an elderly population. Stat Med 37(19):2797–2808
Punzo A, Ingrassia S, Maruotti A (2021) Multivariate hidden Markov regression models: random covariates and heavy-tailed distributions. Stat Pap 62(3):1519–1555
Pyne S, Hu X, Wang K, Rossin E, Lin T-I, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA et al (2009) Automated high-dimensional flow cytometric data analysis. Proc Natl Acad Sci 106(21):8519–8524
R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Soffritti G, Galimberti G (2011) Multivariate linear regression with non-normal errors: a solution based on mixture models. Stat Comput 21(4):523–536
Steane MA, McNicholas PD, Yada R (2012) Model-based classification via mixtures of multivariate t-factor analyzers. Commun Stat Simul Comput 41(4):510–523
Subedi S, Punzo A, Ingrassia S, McNicholas PD (2013) Clustering and classification via cluster-weighted factor analyzers. Adv Data Anal Classif 7(1):5–40
Subedi S, Punzo A, Ingrassia S, McNicholas PD (2015) Cluster-weighted \(t\)-factor analyzers for robust model-based clustering and dimension reduction. Stat Methods Appl 24(4):623–649
Tiedeman DV (1955) On the study of types. In: Sells SB (ed) Symposium on pattern analysis. Air University, U.S.A.F. School of Aviation Medicine, Randolph Field, Texas
Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, New York
Tomarchio SD, McNicholas PD, Punzo A (2021) Matrix normal cluster-weighted models. J Classif 38(3)
Tortora C, Browne RP, ElSherbiny A, Franczak BC, McNicholas PD (2021) Model-based clustering, classification, and discriminant analysis using the generalized hyperbolic distribution: MixGHD R package. J Stat Softw 98(3):1–24
Vrbik I, McNicholas PD (2012) Analytic calculations for the EM algorithm for multivariate skew-t mixture models. Statist Probab Lett 82(6):1169–1174
Vrbik I, McNicholas PD (2014) Parsimonious skew mixture models for model-based clustering and classification. Comput Stat Data Anal 71:196–210
Wang K, Ng SK, McLachlan GJ (2009) Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Digital image computing: techniques and applications. IEEE, pp 526–531
Wolfe JH (1965) A computer program for the maximum likelihood analysis of types. Technical Bulletin 65-15, U.S. Naval Personnel Research Activity
Zarei S, Mohammadpour A, Ingrassia S, Punzo A (2019) On the use of the sub-Gaussian \(\alpha \)-stable distribution in the cluster-weighted model. Iran J Sci Technol Trans A Sci 43(3):1059–1069
Additional information
The original online version of this article was revised: “the error in the line after equation (5) has been corrected in original article”.
Appendices
Appendix
Technical details on the ST distribution
Following Kim and Browne (2019), it is possible to show that the pdf in (12) can be obtained from the pdf in (8) by forcing \(\lambda \) and \(\omega \) to be convenient functions of \(\nu \), by letting \(\varvec{\Sigma }\) and \(\varvec{\alpha }\) become large in a controlled way, and by letting \(\omega \) become small in a controlled way. Specifically, let
where \(\gamma >0\) is a scaling factor. By substituting these parameter values into (8) we obtain
which after some manipulation becomes
Now, letting \(\gamma \rightarrow 0\) and using the following asymptotic relation
we obtain
which is the density reported in (12).
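For completeness, the asymptotic relation invoked above is presumably the standard small-argument behavior of the modified Bessel function of the third kind:

```latex
K_{\lambda}(x) \sim \frac{\Gamma(|\lambda|)}{2}\left(\frac{x}{2}\right)^{-|\lambda|},
\qquad x \to 0^{+}, \quad \lambda \neq 0,
```

which follows from the symmetry \(K_{\lambda}=K_{-\lambda}\).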
Parameter estimation
Let \(\left( \varvec{x}_{1}',\varvec{y}_{1}'\right) ',\ldots ,\left( \varvec{x}_{N}',\varvec{y}_{N}'\right) '\) be a random sample of N independent observations from (15). In the context of the EM algorithm, the random sample is considered incomplete. Specifically, we have two sources of incompleteness. The first source arises from the fact that, for each observation, we do not know its component membership; to govern this source, we use an indicator vector \(\varvec{z}_i=\left( z_{i1},\ldots ,z_{iG}\right) \), where \(z_{ig}=1\) if observation i is in group g, and \(z_{ig}=0\) otherwise. The second source arises if \(f\left( \varvec{y}|\varvec{x};{\varvec{\theta }}_{\varvec{Y}|g}\right) \) or \(f\left( \varvec{x};{\varvec{\theta }}_{\varvec{X}|g}\right) \) are skewed; to govern this source, we need the latent variables \(W_{\varvec{Y}|g}\) and \(W_{\varvec{X}|g}\) introduced in (17).
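As a small illustration of the indicator vectors \(\varvec{z}_i\) (all names below are ours), the component memberships can be stacked into a one-hot matrix:

```python
import numpy as np

def one_hot_memberships(labels, G):
    """Build the indicator matrix Z with z[i, g] = 1 iff observation i
    belongs to group g, and z[i, g] = 0 otherwise."""
    N = len(labels)
    Z = np.zeros((N, G), dtype=int)
    Z[np.arange(N), labels] = 1  # each row has exactly one 1
    return Z

# Three observations assigned to groups 0, 2, 1 out of G = 3 components.
Z = one_hot_memberships([0, 2, 1], G=3)
```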
Based on these sources of incompleteness, we can write the complete-data log-likelihood as
where \(\varvec{\pi }=(\pi _1,\ldots ,\pi _G)'\), and
If \(\varvec{X}\) in component g, \(g=1,\ldots ,G\), follows one of the four skewed distributions,
where \(h(w_{ig\varvec{X}};{\varvec{\phi }}_{W_{\varvec{X}}|g})\) is the appropriate pdf for \(W_{ig\varvec{X}}\) discussed in Sect. 2, with parameters notated as \({\varvec{\phi }}_{W_{\varvec{X}}|g}\), while \(C_{\varvec{X}}\) is constant with respect to the parameters. On the other hand, if \(\varvec{X}\) in component g, \(g=1,\ldots ,G\), is normally distributed then
Similarly, if \(\varvec{Y}|\varvec{x}\) in component g, \(g=1,\ldots ,G\), is distributed according to one of the four skewed distributions,
where \(h(w_{ig\varvec{Y}};{\varvec{\phi }}_{W_{\varvec{Y}}|g})\) is the appropriate pdf for \(W_{ig\varvec{Y}}\) discussed in Sect. 2, with parameters notationally compacted as \({\varvec{\phi }}_{W_{\varvec{Y}}|g}\), while \(C_{\varvec{Y}}\) is constant with respect to the parameters. Conversely, if \(\varvec{Y}|\varvec{x}\) in component g, \(g=1,\ldots ,G\), is normally distributed,
After initialization, the EM algorithm proceeds iterating the following two steps until convergence.
E-Step. The E-step requires the calculation of the conditional expectation of (22). Thus, we first need to calculate
which corresponds to the posterior probability that the unlabeled observation \(\left( \varvec{X}_{i}',\varvec{Y}_{i}'\right) '\) belongs to the gth component of the CWM. In addition, if the distribution of \(\varvec{X}\) in component g, \(g=1,\ldots ,G\), is skewed, the following values need to be updated:
If the distribution of \(\varvec{Y}|\varvec{x}\) in component g, \(g=1,\ldots ,G\), is skewed, then the following values are also updated:
These updates depend on which of the skewed distributions is considered. However, as shown in Sect. 2.2, the conditional latent variables are all GIG distributed. Therefore, all of the required expectations can be calculated using (2)–(4).
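These GIG expectations can be sketched numerically. A minimal implementation, assuming the \((\lambda, a, b)\) parameterization with density proportional to \(w^{\lambda-1}\exp\{-(aw + b/w)/2\}\) (the function name is ours), computes \(\mathbb{E}[W]\) and \(\mathbb{E}[1/W]\) via ratios of Bessel functions and \(\mathbb{E}[\log W]\) via a numerical derivative:

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the third kind, K_s(t)

def gig_expectations(lam, a, b, h=1e-5):
    """Conditional expectations of W ~ GIG(lambda, a, b), with density
    proportional to w^(lambda - 1) * exp(-(a*w + b/w) / 2).
    Returns E[W], E[1/W], and E[log W]; the last uses a central-difference
    derivative of log K_lambda, which has no closed form."""
    s = np.sqrt(a * b)
    ratio = kv(lam + 1.0, s) / kv(lam, s)
    e_w = np.sqrt(b / a) * ratio
    e_winv = np.sqrt(a / b) * ratio - 2.0 * lam / b
    dlogK = (np.log(kv(lam + h, s)) - np.log(kv(lam - h, s))) / (2.0 * h)
    e_logw = 0.5 * np.log(b / a) + dlogK
    return e_w, e_winv, e_logw
```

With the component-specific \((\lambda, a, b)\) implied by each skewed distribution, these three quantities supply all the conditional expectations the E-step requires.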
M-Step. The M-step involves the maximization of the conditional expectation of the complete-data log-likelihood, allowing all parameters to be updated. Specifically, the update for \(\pi _g\) is
The parameters related to the distribution of \(\varvec{X}\) in component g, \(g=1,\ldots ,G\), are updated as follows. For skewed distributions, we have the following updates for \({\varvec{\mu }}_{\varvec{X}|g}\) and \(\varvec{\alpha }_{\varvec{X}|g}\):
where \(T_g=\sum _{i=1}^N\hat{z}_{ig}\), \(\overline{l}_{\varvec{X}|g}=(1/T_g)\sum _{i=1}^N\hat{z}_{ig}\hat{l}_{ig\varvec{X}}\) and \(\overline{m}_{\varvec{X}|g}=(1/T_g)\sum _{i=1}^N\hat{z}_{ig}\hat{m}_{ig\varvec{X}}\). The update for \(\varvec{\Sigma }_{\varvec{X}|g}\) is
Instead, for the normal distribution, we have
The parameters related to the distribution of \(\varvec{Y}|\varvec{x}\) in component g, \(g=1,\ldots ,G\), are updated as follows. For skewed distributions the updates for \(\varvec{B}_g\) and \(\varvec{\alpha }_{\varvec{Y}|g}\) are
where
and
with \(\overline{l}_{\varvec{Y}|g}=(1/T_g)\sum _{i=1}^N\hat{z}_{ig}\hat{l}_{ig\varvec{Y}}\). The update for \(\varvec{\Sigma }_{\varvec{Y}|g}\) is
Conversely, in the case of a multivariate normal distribution, the updates for \(\varvec{B}_g\) and \(\varvec{\Sigma }_{\varvec{Y}|g}\) are
Finally, if either \(\varvec{X}\) or \(\varvec{Y}|\varvec{x}\) in component g, \(g=1,\ldots ,G\), follows one of the skewed distributions, then there is an additional tailedness parameter and, in the case of the GH distribution, an index parameter to update. The updates for each distribution are now given.
1.1 Skew-t distribution
In the case of the ST distribution, we need to update the degrees of freedom \(\nu _g\). This update cannot be obtained in closed form and thus must be performed numerically. For the covariates, the update for \(\nu _{\varvec{X}|g}\) is obtained by solving the equation
where \(\varphi (\cdot )\) denotes the digamma function. When the responses are considered, the update for \(\nu _{\varvec{Y}|g}\) is obtained via (23), after the replacement of \(\nu _{\varvec{X}|g}\), \(\hat{m}_{ig\varvec{X}}\) and \(\hat{n}_{ig\varvec{X}}\) with \(\nu _{\varvec{Y}|g}\), \(\hat{m}_{ig\varvec{Y}}\) and \(\hat{n}_{ig\varvec{Y}}\), respectively.
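Equation (23) is not reproduced here, but the numerical mechanism can be illustrated on the closely related degrees-of-freedom equation for the symmetric t mixture (Peel and McLachlan 2000): the ST update is solved the same way, as a bracketed scalar root-find built from weighted sums of latent-variable expectations. All names below are ours, and this objective is a stand-in, not equation (23) itself:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_t_dof(z_hat, w_hat, nu_prev, d):
    """Degrees-of-freedom update for a symmetric t-mixture component
    (Peel and McLachlan 2000), shown as an analogue of equation (23).
    z_hat: posterior membership probabilities for component g.
    w_hat: conditional expectations of the latent scale variables.
    nu_prev: previous value of nu; d: dimension of the data."""
    T_g = z_hat.sum()
    # Constant part of the score, assembled from the weighted sums.
    c = (1.0 + (z_hat * (np.log(w_hat) - w_hat)).sum() / T_g
         + digamma((nu_prev + d) / 2.0) - np.log((nu_prev + d) / 2.0))

    def score(nu):
        return -digamma(nu / 2.0) + np.log(nu / 2.0) + c

    # Bracketing root finder on a plausible range for the degrees of freedom.
    return brentq(score, 2.001, 200.0)
```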
1.2 Generalized hyperbolic distribution
For the GH distribution, we update \(\lambda _g\) and \(\omega _g\). These updates are derived from Browne and McNicholas (2015) and rely on the log-convexity of \(K_{s}(t)\) in both s and t (Baricz 2010). For notational purposes in this section, the superscript “prev” distinguishes the previous value of a parameter from the current one. The resulting updates, when \(\varvec{X}\) is considered, are
where the derivative in (24) is calculated numerically,
and \(\overline{n}_{\varvec{X}|g}=({1}/{T_g})\sum _{i=1}^N\hat{z}_{ig}\hat{n}_{ig\varvec{X}}\). When \(\varvec{Y}\) is considered, \(\lambda _{\varvec{X}|g}\), \(\omega _{\varvec{X}|g}\), \(\overline{l}_{\varvec{X}|g}\), \(\overline{m}_{\varvec{X}|g}\), and \(\overline{n}_{\varvec{X}|g}\) are replaced with \(\lambda _{\varvec{Y}|g}\), \(\omega _{\varvec{Y}|g}\), \(\overline{l}_{\varvec{Y}|g}\), \(\overline{m}_{\varvec{Y}|g}\), and \(\overline{n}_{\varvec{Y}|g}\), respectively, where \(\overline{m}_{\varvec{Y}|g}=(1/T_g)\sum _{i=1}^N\hat{z}_{ig}\hat{m}_{ig\varvec{Y}}\) and \(\overline{n}_{\varvec{Y}|g}=({1}/{T_g})\sum _{i=1}^N\hat{z}_{ig}\hat{n}_{ig\varvec{Y}}\).
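A central-difference sketch of the numerical derivative in (24) (the function name is ours); since \(\partial \log K_{\lambda }(\omega )/\partial \lambda \) has no closed form, a symmetric finite difference over the index is the natural choice:

```python
import numpy as np
from scipy.special import kv  # K_s(t), the modified Bessel function of the third kind

def dlogK_dlambda(lam, omega, h=1e-5):
    """Central-difference approximation to d/d(lambda) of log K_lambda(omega),
    as needed for the GH index-parameter update; the derivative has no
    closed form, so it is computed numerically."""
    return (np.log(kv(lam + h, omega)) - np.log(kv(lam - h, omega))) / (2.0 * h)
```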
1.3 Variance-gamma distribution
For the VG distribution, the update for \(\psi _g\) cannot be obtained in closed form. When \(\varvec{X}\) is considered, this update is obtained by solving the equation
Clearly, when \(\varvec{Y}\) is considered, \(\psi _{\varvec{X}|g}\), \(\overline{n}_{\varvec{X}|g}\) and \(\overline{l}_{\varvec{X}|g}\) are replaced with \(\psi _{\varvec{Y}|g}\), \(\overline{n}_{\varvec{Y}|g}\) and \(\overline{l}_{\varvec{Y}|g}\), respectively.
1.4 Normal inverse Gaussian distribution
The NIG distribution is the only one of the four with a closed-form update for its tailedness parameter. Specifically, when we consider the covariates, the update of \(\kappa _{\varvec{X}|g}\) is
If the responses are considered, we replace \(\kappa _{\varvec{X}|g}\) and \(\overline{l}_{\varvec{X}|g}\) with \(\kappa _{\varvec{Y}|g}\) and \(\overline{l}_{\varvec{Y}|g}\), respectively.
1.5 Initialization of the algorithm
To initialize the EM algorithm, we followed the approach discussed in Dang et al. (2017). Specifically, the \(z_{ig}\) are initialized in two ways: 10 times using a random soft initialization and once with a k-means (hard) initialization. Therefore, for each G, the algorithms are run 11 times until convergence, and the solution producing the highest log-likelihood value is chosen. For the k-means initialization, the initial \(z_{ig}\) are taken from the best of 10 k-means runs with random starting values, implemented via the kmeans() function of the R statistical software (R Core Team 2019).
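This initialization scheme can be sketched as follows; a minimal Lloyd-style k-means stands in for R's kmeans(), and all names are our assumptions:

```python
import numpy as np

def initial_memberships(X, G, n_random=10, n_kmeans_starts=10, seed=1):
    """Generate the 11 starting membership matrices: n_random random soft
    initializations plus one hard initialization from the best of
    n_kmeans_starts k-means runs. EM is then run once from each start and
    the solution with the highest log-likelihood is kept."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    inits = [rng.dirichlet(np.ones(G), size=N) for _ in range(n_random)]

    # Hard initialization: best of n_kmeans_starts k-means runs.
    best_inertia, best_labels = np.inf, None
    for _ in range(n_kmeans_starts):
        centers = X[rng.choice(N, size=G, replace=False)]
        for _ in range(50):  # Lloyd iterations
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            centers = np.array([X[labels == g].mean(axis=0) if (labels == g).any()
                                else centers[g] for g in range(G)])
        inertia = d[np.arange(N), labels].sum()
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    Z_hard = np.zeros((N, G))
    Z_hard[np.arange(N), best_labels] = 1.0
    inits.append(Z_hard)
    return inits  # 11 matrices of initial z_ig values
```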
Proof of Theorem 3.1
Proof
Suppose that
Integrating out \(\varvec{y}\) from each side of (27) yields an equality on the marginal distribution of \(\varvec{X}\), i.e.,
where \({\varvec{\theta }}_{\varvec{X}}=\left\{ {\varvec{\theta }}_{\varvec{X}|g}; \, g=1,\ldots ,G\right\} \), \(\widetilde{{\varvec{\theta }}}_{\varvec{X}}=\left\{ \widetilde{{\varvec{\theta }}}_{\varvec{X}|j}; \, j=1,\ldots ,\widetilde{G}\right\} \), \(\varvec{\pi }=\{\pi _g; \, g=1,\ldots ,G \}\) and \(\widetilde{\varvec{\pi }}=\{\widetilde{\pi }_j; \, j=1,\ldots ,\widetilde{G}\}\). Dividing the left-hand (right-hand) side of (27) by the left-hand (right-hand) side of (28) leads to
For each fixed value of \(\varvec{x}\), \(p_{\text {GH}}\left( \varvec{y}|\varvec{x};\varvec{\vartheta }\right) \) and \(p_{\text {GH}}\left( \varvec{y}|\varvec{x};\widetilde{\varvec{\vartheta }}\right) \) are mixtures of \(d_{\varvec{Y}}\)-variate GH distributions for \(\varvec{Y}\) (see Browne and McNicholas 2015).
Now, recall from Sect. 3.1 that the location parameter \(\varvec{\mu }_{\varvec{Y}|g}\) of the \(d_{\varvec{Y}}\)-variate GH distribution of \(\varvec{Y}\) in the gth mixture component is related to the covariates \(\varvec{X}\), through the regression coefficients \(\varvec{B}_g\), by the relation \(\varvec{B}'_g \varvec{x}^*\), \(g=1, \ldots , G\). Define the set of all covariate points \(\varvec{x}\) that can be used to distinguish different regression coefficients \(\varvec{B}_g\) via different values of \(\varvec{B}'_g \varvec{x}^*\), i.e.
Note that \(\mathcal {X}\) is the complement of a finite union of hyperplanes of \(\mathbb {R}^{d_{\varvec{X}}}\). Therefore,
For \(\varvec{x}\in \mathcal {X}\), all \(\left\{ \varvec{B}'_g \varvec{x}^*,\varvec{\Sigma }_{\varvec{Y}|g},\varvec{\alpha }_{\varvec{Y}|g},\lambda _{\varvec{Y}|g},\omega _{\varvec{Y}|g}\right\} \), \(g=1,\ldots ,G\), are pairwise distinct because all \(\left\{ \varvec{B}_g,\varvec{\Sigma }_{\varvec{Y}|g},\varvec{\alpha }_{\varvec{Y}|g},\lambda _{\varvec{Y}|g},\omega _{\varvec{Y}|g}\right\} \), \(g=1,\ldots ,G\), are pairwise distinct by the hypothesis of the theorem. As mentioned above, for each fixed value of \(\varvec{x}\), \(p_{\text {GH}}\left( \varvec{y}|\varvec{x};\varvec{\vartheta }\right) \) is a mixture of \(d_{\varvec{Y}}\)-variate GH distributions; since this class is identifiable (Browne and McNicholas 2015), it follows that \(G=\widetilde{G}\) and that, for each \(g\in \left\{ 1,\ldots ,G\right\} \), there exists a \(j\in \left\{ 1,\ldots ,G\right\} \) such that
and
Now, based on (28), the equality in (30) simplifies to
Integrating (31) over \(\varvec{x}\in \mathcal {X}\) yields \(\pi _g=\widetilde{\pi }_j\). Therefore, condition (31) further simplifies to
The equalities \(\varvec{\mu }_{\varvec{X}|g}=\widetilde{\varvec{\mu }}_{\varvec{X}|j}\), \(\varvec{\Sigma }_{\varvec{X}|g}=\widetilde{\varvec{\Sigma }}_{\varvec{X}|j}\), \(\varvec{\alpha }_{\varvec{X}|g}=\widetilde{\varvec{\alpha }}_{\varvec{X}|j}\), \(\lambda _{\varvec{X}|g}=\widetilde{\lambda }_{\varvec{X}|j}\), and \(\omega _{\varvec{X}|g}=\widetilde{\omega }_{\varvec{X}|j}\) simply arise from the identifiability of the \(d_{\varvec{X}}\)-variate GH distribution, and this completes the proof. \(\square \)
Cite this article
Gallaugher, M.P.B., Tomarchio, S.D., McNicholas, P.D. et al. Multivariate cluster weighted models using skewed distributions. Adv Data Anal Classif 16, 93–124 (2022). https://doi.org/10.1007/s11634-021-00480-5