Springer Nature is making SARS-CoV-2 and COVID-19 research free. View research | View latest news | Sign up for updates

Cluster-weighted \(t\)-factor analyzers for robust model-based clustering and dimension reduction

  • 258 Accesses

  • 25 Citations

Abstract

Cluster-weighted models represent a convenient approach for model-based clustering, especially when the covariates contribute to defining the cluster-structure of the data. However, applicability may be limited when the number of covariates is high and performance may be affected by noise and outliers. To overcome these problems, common/uncommon \(t\)-factor analyzers for the covariates, and a \(t\)-distribution for the response variable, are here assumed in each mixture component. A family of twenty parsimonious variants of this model is also presented and the alternating expectation-conditional maximization algorithm, for maximum likelihood estimation of the parameters of all models in the family, is described. Artificial and real data show that these models have very good clustering performance and that the algorithm is able to recover the parameters very well.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3

References

  1. Airoldi J, Hoffmann R (1984) Age variation in voles (Microtus californicus, M. ochrogaster) and its significance for systematic studies. Occasional papers of the Museum of Natural History 111, University of Kansas, Lawrence, KS

  2. Aitken AC (1926) On Bernoulli’s numerical solution of algebraic equations. In: Proceedings of the Royal Society of Edinburgh, vol 46, pp 289–305

  3. Andrews JL, McNicholas PD (2011) Extending mixtures of multivariate \(t\)-factor analyzers. Stat Comput 21(3):361–373

  4. Baek J, McLachlan GJ (2011) Mixtures of common \(t\)-factor analyzers for clustering high-dimensional microarray data. Bioinformatics 27(9):1269–1276

  5. Baek J, McLachlan GJ, Flack LK (2010) Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data. IEEE Trans Pattern Anal Mach Intell 32(7):1298–1309

  6. Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

  7. Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay B (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann Inst Stat Math 46(2):373–388

  8. Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38

  9. DeSarbo WS, Cron WL (1988) A maximum likelihood methodology for clusterwise linear regression. J Classif 5(2):249–282

  10. Ehrlich I (1973) Participation in illegitimate activities: a theoretical and empirical investigation. J Polit Econ 81(3):521–565

  11. Flury B (1997) A first course in multivariate statistics. Springer, New York

  12. Gershenfeld N (1997) Nonlinear inference and cluster-weighted modeling. Ann NY Acad Sci 808(1):18–24

  13. Grün B, Leisch F (2008) FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters. J Stat Softw 28(4):1–35

  14. Hennig C (2000) Identifiablity of models for clusterwise linear regression. J Classif 17(2):273–296

  15. Ingrassia S, Minotti SC, Vittadini G (2012) Local statistical modeling via the cluster-weighted approach with elliptical distributions. J Classif 29(3):363–401

  16. Ingrassia S, Minotti SC, Punzo A (2014) Model-based clustering via linear cluster-weighted models. Comput Stat Data Anal 71:159–182

  17. Ingrassia S, Punzo A, Vittadini G (2015) The generalized linear mixed cluster-weighted model. J Classif 32 (in press)

  18. Leisch F (2004) FlexMix: a general framework for finite mixture models and latent class regression in \({\sf R}\). J Stat Softw 11(8):1–18

  19. Lindsay BG (1995) Mixture models: theory, geometry and applications. In: NSF-CBMS regional conference series in probability and statistics, vol 5. Institute of Mathematical Statistics, Hayward

  20. Lo Y (2008) A likelihood ratio test of a homoscedastic normal mixture against a heteroscedastic normal mixture. Stat Comput 18(3):233–240

  21. McLachlan GJ (1987) On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. J R Stat Soc Ser C (Appl Stat) 36(3):318–324

  22. McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

  23. McLachlan GJ, Bean RW, Ben-Tovim Jones L (2007) Extension of the mixture of factor analyzers model to incorporate the multivariate \(t\)-distribution. Comput Stat Data Anal 51(11):5327–5338

  24. Meng XL, van Dyk D (1997) The EM algorithm—an old folk-song sung to a fast new tune. J R Stat Soc Ser B (Stat Methodol) 59(3):511–567

  25. Punzo A (2014) Flexible mixture modeling with the polynomial Gaussian cluster-weighted model. Stat Model 14(3):257–291

  26. Punzo A, Ingrassia S (2015) Parsimonious generalized linear Gaussian cluster-weighted models. In: Morlini I, Minerva T, Palumbo F (eds) Advances in statistical models for data analysis, studies in classification, data analysis and knowledge organization. Springer, Switzerland

  27. Punzo A, Browne RP, McNicholas PD (2014) Hypothesis testing for parsimonious Gaussian mixture models. http://arxiv.org/abs/1405.0377

  28. R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  29. Sakamoto Y, Ishiguro M, Kitagawa G (1983) Akaike information criterion statistics. Reidel, Boston

  30. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  31. Subedi S, Punzo A, Ingrassia S, McNicholas PD (2013) Clustering and classification via cluster-weighted factor analyzers. Adv Data Anal Classif 7(1):5–40

  32. Vandaele W (1987) Participation in illegitimate activities: Ehrlich revisited. Report, U.S. Department of Justice, National Institute of Justice

Download references

Author information

Correspondence to Sanjeena Subedi.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 195 KB)

Appendices

Appendix 1: Estimation of the parameters for the linear \(t\) CWM with \(t\)-factor analyzers using the AECM algorithm

Let \({\mathcal {S}}=\{({\varvec{x}}_i',y_i)';i=1,\ldots ,n\}\) be a sample of size \(n\). In the EM framework (Dempster et al. 1977), the generic observation \(({\varvec{x}}_i',y_i)'\) is viewed as being incomplete and its complete counterpart is given by \(({\varvec{x}}_i',y_i,{\varvec{u}}_{ig}',w_{ig},\varvec{z}_i')',\) where \(\varvec{z}_i\) is the component-label vector such that \(z_{ig}=1\) if \(({\varvec{x}}_i',y_i)'\) comes from component \(g\) and \(z_{ig}=0\) otherwise. The idea of the AECM algorithm (Meng and van Dyk 1997) is to partition \({\varvec{\theta }}\), say \({\varvec{\theta }}=({\varvec{\theta }}'_1,{\varvec{\theta }}'_2)'\), in such a way that the likelihood is easy to maximize for \({\varvec{\theta }}_1\) given \({\varvec{\theta }}_2\) and vice versa. The AECM algorithm consists of two cycles, with an E-step and a CM-step for each cycle. The two CM-steps correspond to the partition of \({\varvec{\theta }}\) into \({\varvec{\theta }}_1\) and \({\varvec{\theta }}_2\). The algorithm is iterated until convergence.

First cycle

Here, \({\varvec{\theta }}_1=\{\pi _g,{\varvec{\mu }}_g,{\varvec{\beta }}_g,\sigma ^2_g,\nu _g;g=1, \ldots , G\}\), where the missing data are the unobserved group labels \(\varvec{z}_i\), \(i=1,\ldots ,n\), and \(w_{ig}\). The complete-data likelihood is

$$\begin{aligned} L_{c1}({\varvec{\theta }}_1)= & {} \prod _{i=1}^n\prod _{g=1}^G \Bigg [ \pi _g \phi \left( y_i|\varvec{x}_i,w_{ig};\mu (\varvec{x}_i;{\varvec{\beta }}_g),\displaystyle \frac{\sigma ^2_g}{w_{ig}}\right) \phi \left( \varvec{x}_i|w_{ig};{\varvec{\mu }}_g,\displaystyle \frac{{\varvec{\varSigma }}_g}{w_{ig}}\right) \\&\times \,\kappa \left( w_{ig};\frac{\nu _g}{2},\frac{\nu _g}{2}\right) \Bigg ]^{z_{ig}}. \end{aligned}$$

The E-step on the first cycle of the \((k+1)\)th iteration requires the calculation of \(Q_1({\varvec{\theta }}_1; {\varvec{\theta }}^{(k)}) = {\mathbb {E}}_{{\varvec{\theta }}^{(k)}}\left[ l_{c1} ({\varvec{\theta }}_1)|{\mathcal {S}}\right] \), the expected complete-data log-likelihood given the observed data and \({\varvec{\theta }}^{(k)}\). In practice, it requires calculating \({\mathbb {E}}_{{\varvec{\theta }}^{(k)}}\left[ Z_{ig}|{\mathcal {S}}\right] \) and \({\mathbb {E}}_{{\varvec{\theta }}^{(k)}}\left[ W_{ig}|{\mathcal {S}}\right] \); this step is achieved by respectively replacing each \(z_{ig}\) by \(z_{ig}^{(k+1)} = \frac{\tau _{ig}}{\sum _{j=1}^G\tau _{ij}},\) where

$$\begin{aligned} \tau _{ig}= & {} \pi _j^{(k)}\phi \left( y_i|{\varvec{x}}_i,w_{ig};\mu \left( \varvec{x}_i; {\varvec{\beta }}_g^{(k)}\right) ,\frac{\sigma _g^{2(k)}}{w_{ig}}\right) \phi \left( {\varvec{x}}_i|w_{ig};{\varvec{\mu }}_g^{(k)},\frac{{\varvec{\varSigma }}^{(k)}_g}{w_{ig}}\right) \\&\times \, \kappa \left( w_{ig};\frac{\nu _g^{(k)}}{2},\frac{\nu _g^{(k)}}{2}\right) \end{aligned}$$

and each \(w_{ig}\) by

$$\begin{aligned} w_{ig}^{(k+1)}=\frac{v_g^{(k)}+p+1}{v_g^{(k)}+\left( {\varvec{x}}-{\varvec{\mu }}_g^{(k)} \right) '\left( {\varvec{\varSigma }}^{(k)}_g\right) ^{-1} \left( {\varvec{x}}-{\varvec{\mu }}_g^{(k)}\right) + \displaystyle \frac{\left( y-\beta _{0g}^{(k)}-{\varvec{\beta }}_{1g}^{(k)'}{\varvec{x}}\right) ^2}{\sigma _g^{2(k)}}}, \end{aligned}$$

where \({\varvec{\varSigma }}^{(k)}_g={\varvec{\varLambda }}_g^{(k)}{\varvec{\varLambda }}_g^{(k)'}+{\varvec{\varPsi }}_g^{(k)}\). For the M-step, the maximization of \(L_{c1}({\varvec{\theta }}_1)\) yields

$$\begin{aligned}&\displaystyle \pi _g^{(k+1)} = \frac{1}{n}\sum _{i=1}^n z_{ig}^{(k+1)}\nonumber \\&\displaystyle {\varvec{\mu }}_g^{(k+1)} = \frac{\sum _{i=1}^n z_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{x}}_i }{\sum _{i=1}^n z_{ig}^{(k+1)}w_{ig}^{(k+1)} } \nonumber \end{aligned}$$
$$\begin{aligned} {\varvec{\beta }}_{1g}^{(k+1)}&= \left[ \frac{1}{\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)}} \sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)} y_i \left( {\varvec{x}}_i - {\varvec{\mu }}_g^{(k+1)} \right) \right] \\&\quad \times \left[ \frac{1}{\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)}}\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)} {\varvec{x}}'_i{\varvec{x}}_i - {\varvec{\mu }}_g^{(k+1)'}{\varvec{\mu }}_g^{(k+1)}\right] ^{-1}\\ \beta _{0g}^{(k+1)}&= \frac{1}{\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)}}\displaystyle \sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)} y_i-{\varvec{\beta }}_{1g}^{(k+1)'}{\varvec{\mu }}_g^{(k+1)}\\ \sigma _g^{2(k+1)}&= \frac{1}{\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)}} \sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)} \left[ y_i-\left( \beta _{0g}^{(k+1)}+{\varvec{\beta }}_{1g}^{(k+1)'}{\varvec{x}}_i \right) \right] ^2. \end{aligned}$$

With regard to updating for the degrees of freedom \(v_g\), their formulation does not exist in closed form. McLachlan et al. (2007) show that \(v_g^{(k+1)}\) is the solution of the equation

$$\begin{aligned}&\Bigg \{ -\psi \left( \displaystyle \frac{v_g}{2}\right) +\ln \left( \displaystyle \frac{v_g}{2} \right) +1+ \displaystyle \frac{1}{n_g^{(k+1)}}\displaystyle \sum _{i=1}^nz_{ig}^{(k+1)}\left[ \ln \left( w_{ig}^{(k+1)}\right) -w_{ig}^{(k+1)}\right]&\nonumber \\&\quad +\psi \left( \displaystyle \frac{v_g^{(k)}+p}{2} \right) -\ln \left( \displaystyle \frac{v_g^{(k)}+p}{2} \right) \Bigg \}=0,&\end{aligned}$$
(6)

where \(\psi (\cdot )\) denotes the digamma function and \(n_g^{(k+1)}=\sum _{i=1}^nz_{ig}^{(k+1)}\). Following the notation in McLachlan and Peel (2000), we set \({\varvec{\theta }}^{(k+1/2)}=\{{\varvec{\theta }}_1^{(k+1)},{\varvec{\theta }}_2^{(k)}\}\).

Second cycle

Here, \({\varvec{\theta }}_2=\left\{ {\varvec{\varSigma }}_g ; g=1, \ldots , G \right\} = \left\{ {\varvec{\varLambda }}_g,{\varvec{\varPsi }}_g;g=1, \ldots , G\right\} \), where the missing data are the unobserved group labels \(\varvec{z}_i\) and the latent factors \(\varvec{u}_{ig}\), \(i=1,\ldots ,n\) and \(g=1,\ldots ,G\). Therefore, the complete-data likelihood is

$$\begin{aligned} L_{c2}({\varvec{\theta }}_2)&= \prod _{i=1}^n\prod _{g=1}^G \Bigg [\pi _g^{(k+1)} \phi \left( y_i|{\varvec{x}}_i,w_{ig}; \mu \left( {\varvec{x}}_i;{\varvec{\beta }}_g^{(k+1)}\right) ,\sigma _g^{2(k+1)}\right) \\&\times \,\phi \left( {\varvec{x}}_i|{\varvec{u}}_{ig},w_{ig};{\varvec{\mu }}_g^{(k+1)}+{\varvec{\varLambda }}_g {\varvec{u}}_{ig},\frac{{\varvec{\varPsi }}_g}{w_{ig}}\right) \\&\times \, \phi \left( {\varvec{u}}_{ig}|w_{ig};{\varvec{0}},\frac{{\varvec{I}}_q}{w_{ig}}\right) \kappa \left( w_{ig};\frac{\nu _g^{(k+1)}}{2},\frac{\nu _g^{(k+1)}}{2} \right) \Bigg ]^{z_{ig}}, \end{aligned}$$

where \(Y\) is conditionally independent of \({\varvec{U}}\) given \({\varvec{X}}={\varvec{x}}\).

The E-step on the second cycle of the \((k+1)\)th iteration requires the calculation of \(Q_2({\varvec{\theta }}_2; {\varvec{\theta }}^{(k+1/2)}) = {\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} \left[ l_{c2} ({\varvec{\theta }}_2)|{\mathcal {S}}\right] \). This involves calculating the following conditional expectations: \({\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}(Z_{ig}|{\mathcal {S}})\), \({\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}(W_{ig}|{\mathcal {S}})\), \({\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}(Z_{ig} W_{ig}{\varvec{U}}_{ig} | {\mathcal {S}})\), and \({\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} (Z_{ig}W_{ig}{\varvec{U}}_{ig} {\varvec{U}}'_{ig} |{\mathcal {S}})\). We have

$$\begin{aligned} {\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} (Z_{ig}W_{ig} {\varvec{U}}_{ig} | {\mathcal {S}}) =z_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{\gamma }}_g^{(k)}\left( {\varvec{x}}_i- {\varvec{\mu }}_g^{(k+1)}\right) \end{aligned}$$

and

$$\begin{aligned} {\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} (Z_{ig}W_{ig}{\varvec{U}}_{ig} {\varvec{U}}'_{ig} | {\mathcal {S}}) = z_{ig}^{(k+1)} w_{ig}^{(k+1)}{\varvec{\varTheta }}^{(k)}_g, \end{aligned}$$

where

$$\begin{aligned} {\varvec{S}}_g^{(k+1)}= & {} \frac{1}{n_g^{(k+1)}}\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)} \left( {\varvec{x}}_i - {\varvec{\mu }}_g^{(k+1)} \right) \left( {\varvec{x}}_i - {\varvec{\mu }}_g^{(k+1)} \right) ' \nonumber \\ {\varvec{\gamma }}^{(k)}_g= & {} {\varvec{\varLambda }}^{(k)'}_g \left( {\varvec{\varLambda }}^{(k)}_g{\varvec{\varLambda }}^{(k)'}_g+{\varvec{\varPsi }}^{(k)}_g \right) ^{-1} \nonumber \\ {\varvec{\varTheta }}^{(k)}_g= & {} {\varvec{I}}_q-{\varvec{\gamma }}^{(k)}_g {\varvec{\varLambda }}^{(k)}_g+ {\varvec{\gamma }}^{(k)}_g {\varvec{S}}_g^{(k+1)} {\varvec{\gamma }}^{(k)'}_g. \end{aligned}$$
(7)

Hence, the \(g\)th term of the expected complete-data log-likelihood \(Q_2({\varvec{\theta }}_2; {\varvec{\theta }}^{(k+1/2)})\) is

$$\begin{aligned} Q_2({\varvec{\varLambda }}_g, {\varvec{\varPsi }}_g; {\varvec{\theta }}^{(k+1/2)})&= \text {C} \left( {\varvec{\theta }}_1^{(k+1)} \right) \!-\! \frac{1}{2} n_g^{(k+1)}\ln | {\varvec{\varPsi }}^{-1}_g| - \frac{1}{2} n_g^{(k+1)} \text {tr}\left( {\varvec{S}}_g^{(k+1)} {\varvec{\varPsi }}_g^{-1} \right) \nonumber \\&\quad + n_g^{(k+1)}\text {tr}\left( {\varvec{\varLambda }}_g {\varvec{\gamma }}_g^{(k)}{\varvec{S}}_g^{(k+1)} {\varvec{\varPsi }}^{-1}_g \right) \nonumber \\&\quad -\frac{1}{2} n_g^{(k+1)}\text {tr} \left( {\varvec{\varLambda }}_g'{\varvec{\varPsi }}_g^{-1} {\varvec{\varLambda }}_g {\varvec{\varTheta }}^{(k)}_g \right) , \end{aligned}$$
(8)

where \(\text {C}({\varvec{\theta }}_1^{(k+1)})\) includes the terms in the complete-data log-likelihood that do not depend on \({\varvec{\theta }}_2\). Then, maximization of (8) with respect to \({\varvec{\varLambda }}_g\) and \({\varvec{\varPsi }}_g\) means that they must satisfy

$$\begin{aligned}&\frac{\partial Q_2}{\partial {\varvec{\varLambda }}_g} = n_g^{(k+1)} {\varvec{\varPsi }}_g^{-1} {\varvec{S}}_g^{(k+1)} {\varvec{\gamma }}_g^{(k)'} - n_g^{(k+1)} {\varvec{\varPsi }}_g^{-1} {\varvec{\varLambda }}_g {\varvec{\varTheta }}^{(k)}_g = {\varvec{0}}\\ \frac{\partial Q_2}{\partial {\varvec{\varPsi }}^{-1}_g}&\!=\! \frac{1}{2}n_g^{(k+1)}{\varvec{\varPsi }}_g-\frac{1}{2}n_g^{(k+1)}{\varvec{S}}_g^{(k+1)} \!+\! n_g^{(k+1)} {\varvec{S}}_g^{(k+1)'} {\varvec{\gamma }}_g^{(k)'} {\varvec{\varLambda }}_g'\\&\quad -\frac{1}{2} n_g^{(k+1)}{\varvec{\varLambda }}_g{\varvec{\varTheta }}_g^{(k)}{\varvec{\varLambda }}_g'= {\varvec{0}}. \end{aligned}$$

Therefore,

$$\begin{aligned}&\displaystyle {\varvec{S}}_g^{(k+1)} {\varvec{\gamma }}_g^{(k)'} - {\varvec{\varLambda }}_g {\varvec{\varTheta }}^{(k)}_g = {\varvec{0}}\end{aligned}$$
(9)
$$\begin{aligned}&\displaystyle {\varvec{\varPsi }}_g-{\varvec{S}}_g^{(k+1)} +2{\varvec{S}}_g^{(k+1)'} {\varvec{\gamma }}_g^{(k)'} {\varvec{\varLambda }}_g'-{\varvec{\varLambda }}_g{\varvec{\varTheta }}_g^{(k)}{\varvec{\varLambda }}_g' = {\varvec{0}}. \end{aligned}$$
(10)

From (9), we get \({\varvec{\varLambda }}_g^{(k+1)} = {\varvec{S}}^{(k+1)}_g {\varvec{\gamma }}^{(k)'}_g\left( {\varvec{\varTheta }}_g^{(k)}\right) ^{-1}\) and, substituting in (10), we obtain

$$\begin{aligned}&{\varvec{\varPsi }}_g^{(k+1)} -{\varvec{S}}_g^{(k+1)}+2 {\varvec{S}}_g^{(k+1)} {\varvec{\gamma }}^{(k)'}_g \left( {\varvec{S}}_g^{(k+1)}{\varvec{\gamma }}^{(k)'}_g{\varvec{\varTheta }}_g^{(k)-1} \right) '\\&\quad -\left( \mathbf {S}_g^{(k+1)}{\varvec{\gamma }}^{(k)'}_g{\varvec{\varTheta }}_g^{(k)-1}\right) {\varvec{\varTheta }}_g^{(k)}\left( \mathbf {S}_g^{(k+1)}{\varvec{\gamma }}_g^{(k)}{\varvec{\varTheta }}_g^{(k)-1} \right) ' = {\varvec{0}}, \end{aligned}$$

which yields \( {\varvec{\varPsi }}_g^{(k+1)} =\text {diag} ( {\varvec{S}}^{(k+1)}_g- {\varvec{\varLambda }}_g^{(k+1)} {\varvec{\gamma }}^{(k)'}_g{\varvec{S}}^{(k+1)}_g ). \)

Appendix 2: Estimation of the parameters for the linear \(t\) CWM with common \(t\)-factor analyzers using the AECM algorithm

Analogous to “Appendix 1”, the AECM algorithm consists of two cycles, with an E-step and a CM-step for each cycle, which are iterated until convergence is achieved.

First cycle

Here, \({\varvec{\theta }}_1=\left\{ \pi _g,{\varvec{\xi }}_g,{\varvec{\beta }}_g,\sigma ^2_g,v_g;g=1, \ldots , G\right\} \), where the missing data are the unobserved group labels \(\varvec{z}_i\), \(i=1,\ldots ,n\). The complete-data likelihood is

$$\begin{aligned} L_{c1}({\varvec{\theta }}_1)= & {} \prod _{i=1}^n\prod _{g=1}^G \Bigg [ \pi _g \phi \left( y_i|\varvec{x}_i,w_{ig};\mu (\varvec{x}_i;{\varvec{\beta }}_g), \displaystyle \frac{\sigma ^2_g}{w_{ig}}\right) \phi \left( \varvec{x}_i|w_{ig};{\varvec{\varLambda }}{\varvec{\xi }}_g,\displaystyle \frac{{\varvec{\varSigma }}_g}{w_{ig}}\right) \\&\times \,\kappa \left( w_{ig};\frac{\nu _g}{2},\frac{\nu _g}{2}\right) \Bigg ]^{z_{ig}}, \end{aligned}$$

where \({\varvec{\varSigma }}_g={\varvec{\varLambda }}{\varvec{\varOmega }}_g{\varvec{\varLambda }}'+{\varvec{\varPsi }}\).

Similar to “First cycle” section of Appendix 1, \({\mathbb {E}}_{{\varvec{\theta }}^{(k)}}\left[ Z_{ig}|{\mathcal {S}}\right] \) is calculated by replacing each \(z_{ig}\) by \(z_{ig}=\frac{\tau _{ig}}{\sum _{j=1}^G\tau _{ij}}\) where

$$\begin{aligned} \tau _{ig}= & {} \pi _g^{(k)} \phi \left( y_i|{\varvec{x}}_i,w_{ig};\mu (\varvec{x}_i;{\varvec{\beta }}_g^{(k)}), \frac{\sigma _g^{2(k)}}{w_{ig}}\right) \phi \left( {\varvec{x}}_i|w_{ig}; {\varvec{\varLambda }}^{(k)}{\varvec{\xi }}_g^{(k)},\frac{{\varvec{\varSigma }}_g^{(k)}}{w_{ig}} \right) \\&\times \kappa \left( w_{ig};\frac{\nu _g^{(k)}}{2},\frac{\nu _g^{(k)}}{2} \right) \end{aligned}$$

and \({\mathbb {E}}\left[ W_{ig}|{\mathcal {S}}\right] \) is computed by replacing each \(w_{ig}\) by

$$\begin{aligned} w_{ig}^{(k+1)}&=(v_g^{(k)}+p+1)\Bigg [v_g^{(k)}+\left( {\varvec{x}}-{\varvec{\varLambda }}^{(k)} {\varvec{\xi }}_g^{(k)'}\right) \left( {\varvec{\varSigma }}_g^{(k)}\right) ^{-1} \left( {\varvec{x}}-{\varvec{\varLambda }}^{(k)}{\varvec{\xi }}_g^{(k)}\right) \\&\quad +\displaystyle \frac{\left( y-\beta _{0g}^{(k)}-{\varvec{\beta }}_{1g}^{(k)'}{\varvec{x}}\right) ^2}{\sigma _g^{2(k)}}\Bigg ]^{-1}, \end{aligned}$$

where \({\varvec{\varSigma }}_g^{(k)}={\varvec{\varLambda }}^{(k)}{\varvec{\varOmega }}_g^{(k)}{\varvec{\varLambda }}^{(k)'}+{\varvec{\varPsi }}^{(k)}\). For the M-step, the maximization of this complete-data log-likelihood yields

$$\begin{aligned} \pi _g^{(k+1)}&= \frac{1}{n}\displaystyle \sum _{i=1}^n z_{ig}^{(k+1)}, \qquad {\varvec{\xi }}_g^{(k+1)} = \frac{ \sum _{i=1}^n z_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{\varLambda }}^{(k)'}{\varvec{x}}_i }{\sum _{i=1}^n z_{ig}^{(k+1)}w_{ig}^{(k+1)} } \nonumber \\ {\varvec{\beta }}_{1g}^{(k+1)}&= \left[ \frac{1}{\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)}} \sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)} y_i \left( {\varvec{x}}_i - {\varvec{\varLambda }}^{(k)}{\varvec{\xi }}_g^{(k+1)}\right) \right] \\&\quad \!\times \! \left[ \frac{1}{\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)}}\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)} {\varvec{x}}'_i{\varvec{x}}_i \!-\!( {\varvec{\varLambda }}^{(k)}{\varvec{\xi }}_g^{(k+1)})'\left( {\varvec{\varLambda }}^{(k)}{\varvec{\xi }}_g^{(k+1)}\right) \right] ^{-1}\\ \beta _{0g}^{(k+1)}&= \frac{1}{\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)}}\displaystyle \sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)} y_i-{\varvec{\beta }}_{1g}^{(k+1)'}{\varvec{\varLambda }}^{(k)}{\varvec{\xi }}_g^{(k+1)}\\ \sigma _g^{2(k+1)}&= \frac{1}{\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)}} \sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)}\left[ y_i-\left( \beta _{0g}^{(k+1)}+{\varvec{\beta }}_{1g}^{(k+1)'}{\varvec{x}}_i\right) \right] ^2. \end{aligned}$$

As in “First cycle” section of Appendix 1, the degrees of freedom \(v_g\) are updated according to (6), and we set \({\varvec{\theta }}^{(k+1/2)}=\left\{ {\varvec{\theta }}_1^{(k+1)},{\varvec{\theta }}_2^{(k)}\right\} \).

Second cycle

Here, \({\varvec{\theta }}_2=\left\{ {\varvec{\varSigma }}_g ; g=1, \ldots , G \right\} = \left\{ {\varvec{\varLambda }},{\varvec{\varOmega }}_g,{\varvec{\varPsi }};g=1, \ldots , G\right\} \), where the missing data are the unobserved group labels \(\varvec{z}_i\) and the latent factors \(\varvec{u}_{ig}\), \(i=1,\ldots ,n\) and \(g=1,\ldots ,G\). Therefore, the complete-data likelihood is

$$\begin{aligned} L_{c2}({\varvec{\theta }}_2)&= \prod _{i=1}^n\prod _{g=1}^G \Bigg [ \pi _g^{(k+1)} \phi \left( y_i|{\varvec{x}}_i,w_{ig};\mu ({\varvec{x}}_i;{\varvec{\beta }}_g^{(k+1)}),\frac{\sigma _g^{2(k+1)}}{w_{ig}}\right) \\&\times \, \phi \left( {\varvec{x}}_i|{\varvec{u}}_{ig},w_{ig};{\varvec{\varLambda }}{\varvec{u}}_{ig},\frac{{\varvec{\varPsi }}}{w_{ig}}\right) \phi \left( {\varvec{u}}_{ig}|w_{ig};{\varvec{\xi }}_g^{(k+1)},\frac{{\varvec{\varOmega }}_g}{w_{ig}} \right) \kappa \left( w_{ig};\frac{\nu _g^{(k)}}{2},\frac{\nu _g^{(k)}}{2} \right) \Bigg ]^{z_{ig}} \end{aligned}$$

because \(Y\) is conditionally independent of \({\varvec{U}}\) given \({\varvec{X}}={\varvec{x}}\). The E-step on the second cycle of the \((k+1)\)th iteration requires the calculation of \(Q_2({\varvec{\theta }}_2; {\varvec{\theta }}^{(k+1/2)})={\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}\left[ l_{c2}({\varvec{\theta }}_2)|{\mathcal {S}}\right] \). Based on (5), the expectations involved in calculating \(Q_2({\varvec{\theta }}_2; {\varvec{\theta }}^{(k+1/2)})\) are given by

$$\begin{aligned}&{\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} \left[ Z_{ig}W_{ig} ({\varvec{X}}_{ig}-{\varvec{\varLambda }}{\varvec{U}}_{ig})({\varvec{X}}_{ig}-{\varvec{\varLambda }}{\varvec{U}}_{ig})'| {\mathcal {S}}\right] = z_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{x}}_i{\varvec{x}}_i' \nonumber \\&\quad \quad -{\varvec{\varLambda }}{\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} (Z_{ig}W_{ig} {\varvec{U}}_{ig}|{\mathcal {S}}){\varvec{x}}'_i -{\varvec{x}}_i{\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} (Z_{ig}W_{ig}{\varvec{U}}_{ig}'|{\mathcal {S}}){\varvec{\varLambda }}' \nonumber \\&\quad \quad +{\varvec{\varLambda }}{\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} (Z_{ig}W_{ig}{\varvec{U}}_{ig}{\varvec{U}}'_{ig}|{\mathcal {S}}){\varvec{\varLambda }}', \end{aligned}$$
(11)
$$\begin{aligned}&{\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}(Z_{ig}W_{ig}{\varvec{U}}_{ig}|{\mathcal {S}}) = z_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{\eta }}_{ig}^{(k+1/2)}, \end{aligned}$$
(12)
$$\begin{aligned}&{\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}\left[ Z_{ig}W_{ig} ({\varvec{U}}_{ig}-{\varvec{\xi }}_g)|{\mathcal {S}}\right] = z_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{\gamma }}_g^{(k)'}{\varvec{\delta }}_{ig}^{(k+1/2)}, \end{aligned}$$
(13)
$$\begin{aligned} {\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}(Z_{ig}W_{ig}{\varvec{U}}_{ig}{\varvec{U}}_{ig}'|{\mathcal {S}})= & {} z_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{\eta }}_{ig}^{(k+1/2)}{\varvec{\eta }}_{ig}^{(k+1/2)'} \nonumber \\&z_{ig}^{(k+1)} \left[ \left( {\varvec{I}}_q-{\varvec{\gamma }}_g^{(k)'}{\varvec{\varLambda }}^{(k)}\right) {\varvec{\varOmega }}_g^{(k)}\right] , \end{aligned}$$
(14)
$$\begin{aligned} {\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}\left[ Z_{ig}W_{ig}({\varvec{U}}_{ig}-{\varvec{\xi }}_g)({\varvec{U}}_{ig}-{\varvec{\xi }}_g)'\big | {\mathcal {S}}\right]= & {} {\varvec{\gamma }}_g^{(k)'}\left[ z_{ig}^{(k+1)}w_{ig}^{(k+1)} {\varvec{\delta }}_{ig}^{(k+1/2)}{\varvec{\delta }}_{ig}^{(k+1/2)'}\right] {\varvec{\gamma }}_g^{(k)} \nonumber \\&+z_{ig}^{(k+1)}\left( {\varvec{I}}_q-{\varvec{\gamma }}_g^{(k)'}{\varvec{\varLambda }}^{(k)}\right) {\varvec{\varOmega }}_g^{(k)}, \end{aligned}$$
(15)

where

$$\begin{aligned} {\varvec{\delta }}_{ig}^{(k+1/2)}= & {} {\varvec{x}}_i - {\varvec{\varLambda }}^{(k)}{\varvec{\xi }}_g^{(k+1)},\\ {\varvec{\eta }}_{ig}^{(k+1/2)}= & {} {\varvec{\gamma }}_g^{(k)'}{\varvec{\delta }}_{ig}^{(k+1/2)} +{\varvec{\xi }}_g^{(k+1)}, \end{aligned}$$

and

$$\begin{aligned} {\varvec{\gamma }}^{(k)}_g = \left( {\varvec{\varLambda }}^{(k)}{\varvec{\varOmega }}_g^{(k)}{\varvec{\varLambda }}^{(k)'}+{\varvec{\varPsi }}^{(k)}\right) ^{-1} {\varvec{\varLambda }}^{(k)}{\varvec{\varOmega }}_g^{(k)}. \end{aligned}$$

Alternatively, the expectation in (11 12 13) can be written as

$$\begin{aligned}&{\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}\left\{ Z_{ig}W_{ig}\left[ {\varvec{X}}_{ig}-{\varvec{\varLambda }}{\varvec{\xi }}_g -({\varvec{\varLambda }}{\varvec{U}}_{ig}\!-\!{\varvec{\varLambda }}{\varvec{\xi }}_g)\right] \left[ {\varvec{X}}_{ig}\!-\!{\varvec{\varLambda }}{\varvec{\xi }}_g\!-\!({\varvec{\varLambda }}{\varvec{U}}_{ig}\!-\!{\varvec{\varLambda }}{\varvec{\xi }}_g)\right] ' \Big |{\mathcal {S}}\right\} \nonumber \\&\quad \!=\!+z_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{\delta }}_{ig}^{(k+1/2)}{\varvec{\delta }}_{ig}^{(k+1/2)'} -{\varvec{\delta }}_{ig}^{(k+1/2)} {\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} \left[ Z_{ig}W_{ig} ({\varvec{U}}_{ig}-{\varvec{\xi }}_g)\big | {\mathcal {S}} \right] {\varvec{\varLambda }}^{(k)'}\nonumber \\&\quad \quad -{\varvec{\varLambda }}^{(k)}{\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}\left[ Z_{ig}W_{ig} ({\varvec{U}}_{ig} -{\varvec{\xi }}_g)\big |{\mathcal {S}}\right] {\varvec{\delta }}_{ig}^{(k+1/2)'}\nonumber \\&\quad \quad +{\varvec{\varLambda }}^{(k)}{\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}\left[ Z_{ig}W_{ig} ({\varvec{U}}_{ig}-{\varvec{\xi }}_g)\big |{\mathcal {S}}\right] {\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}\left[ Z_{ig}W_{ig}({\varvec{U}}_{ig}-{\varvec{\xi }}_g)'\Big | {\mathcal {S}}\right] {\varvec{\varLambda }}^{(k)'}.\nonumber \\ \end{aligned}$$
(16)

Then, the \(g\)th term of the expected complete-data log-likelihood \(Q_2({\varvec{\theta }}_2; {\varvec{\theta }}^{(k+1/2)})\) becomes

$$\begin{aligned}&Q_2\left( {\varvec{\varLambda }},{\varvec{\varPsi }},{\varvec{\varOmega }}_g;{\varvec{\theta }}^{(k+1/2)}\right) = \text {C}\left( {\varvec{\theta }}_1^{(k+1)}\right) +\frac{1}{2} n_g^{(k+1)}\ln \left| {\varvec{\varPsi }}^{-1}\right| + \frac{1}{2}n_g^{(k+1)}\ln \left| {\varvec{\varOmega }}_g^{-1}\right| \nonumber \\&\quad -\frac{1}{2}\text {tr}\left\{ \sum _{g=1}^G\sum _{i=1}^n{\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} \left[ Z_{ig}W_{ig}({\varvec{X}}_{ig}-{\varvec{\varLambda }}{\varvec{U}}_{ig}) ({\varvec{X}}_{ig}-{\varvec{\varLambda }}{\varvec{U}}_{ig})'|{\mathcal {S}}\right] {\varvec{\varPsi }}^{-1}\right\} \nonumber \\&\quad -\frac{1}{2}\text {tr}\left\{ \sum _{i=1}^n {\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}} \left[ Z_{ig}W_{ig}\left( {\varvec{U}}_{ig}-{\varvec{\xi }}_g^{(k+1)}\right) \left( {\varvec{U}}_{ig}-{\varvec{\xi }}_g^{(k+1)}\right) '\Big | {\mathcal {S}}\right] {\varvec{\varOmega }}_g^{-1}\right\} , \end{aligned}$$

where \(\text {C}({\varvec{\theta }}_1^{(k+1)})\) denotes the terms in the complete-data log-likelihood that do not depend on \({\varvec{\theta }}_2\). The expected complete-data log-likelihood from the second cycle is maximized for \({\varvec{\varOmega }}_g\) by satisfying

$$\begin{aligned} \frac{\partial Q_2}{\partial {\varvec{\varOmega }}_g^{-1}} \!=\! \frac{1}{2} n_g^{(k+1)} {\varvec{\varOmega }}_g \!-\!\frac{1}{2} n_g^{(k+1)} \left\{ {\mathbb {E}}_{{\varvec{\theta }}^{(k+1/2)}}\left[ Z_{ig}W_{ig}({\varvec{U}}_{ig}\!-\!{\varvec{\xi }}_g)({\varvec{U}}_{ig}\!-\!{\varvec{\xi }}_g)' \Big | {\mathcal {S}}\right] \right\} ' \!=\! {\varvec{0}},\nonumber \\ \end{aligned}$$
(17)

which yields

$$\begin{aligned} {\varvec{\varOmega }}_g^{(k+1)}={\varvec{\gamma }}_g^{(k)'}{\varvec{S}}_g^{(k+1)}{\varvec{\gamma }}_g^{(k)} +{\varvec{\varOmega }}_g^{(k)} \left( {\varvec{I}}_q-{\varvec{\varLambda }}^{(k)'}{\varvec{\gamma }}_g^{(k)} \right) , \end{aligned}$$

where

$$\begin{aligned} {\varvec{S}}_g^{(k+1)} = \frac{1}{n_g^{(k+1)}}\sum _{i=1}^n z_{ig}^{(k+1)} w_{ig}^{(k+1)}{\varvec{\delta }}_{ig}^{(k+1/2)} {\varvec{\delta }}_{ig}^{(k+1/2)'}. \end{aligned}$$

The expected complete-data log-likelihood from the second cycle is maximized for \({\varvec{\varPsi }}\) using (16) and satisfies

$$\begin{aligned} \frac{\partial Q_2}{\partial {\varvec{\varPsi }}^{-1}}&= \sum _{g=1}^G \Bigg (\frac{1}{2}n_g^{(k+1)}{\varvec{\varPsi }}-\frac{1}{2}n_g^{(k+1)}{\varvec{S}}_g^{(k+1)} + \frac{1}{2}n_g^{(k+1)} {\varvec{S}}_g^{(k+1)} {\varvec{\gamma }}_g^{(k)} {\varvec{\varLambda }}^{(k)'}\\&\quad \quad +\frac{1}{2}n_g^{(k+1)} {\varvec{S}}_g^{(k+1)} {\varvec{\varLambda }}^{(k)}{\varvec{\gamma }}_g^{(k)'} \Bigg ) \\&= -\frac{1}{2}\sum _{g=1}^G n_g^{(k+1)}{\varvec{\varLambda }}^{(k)}\left[ {\varvec{\gamma }}_g^{(k)'}{\varvec{S}}_g^{(k+1)}{\varvec{\gamma }}_g^{(k)} +\left( {\varvec{I}}_q-{\varvec{\gamma }}_g^{(k)'}{\varvec{\varLambda }}^{(k)}\right) {\varvec{\varOmega }}_g^{(k)}\right] ' {\varvec{\varLambda }}^{(k)'} = {\varvec{0}}. \end{aligned}$$

This yields

$$\begin{aligned} {\varvec{\varPsi }}^{(k+1)}&=\text {diag}\Bigg (\frac{1}{n}\sum _{g=1}^G n_g^{(k+1)}\Bigg [ \left( {\varvec{\varLambda }}^{(k)}{\varvec{\gamma }}_g^{(k)'}-{\varvec{I}}_p\right) {\varvec{S}}_g^{(k+1)} \left( {\varvec{\varLambda }}^{(k)}{\varvec{\gamma }}_g^{(k)'}-{\varvec{I}}_p\right) '\\&+{\varvec{\varLambda }}^{(k)}{\varvec{\varOmega }}_g^{(k)} \left( {\varvec{I}}_q-{\varvec{\varLambda }}^{(k)'}{\varvec{\gamma }}_g^{(k)} \right) {\varvec{\varLambda }}^{(k)'}\Bigg ]\Bigg ) \end{aligned}$$

The expected complete-data log-likelihood from the second cycle is maximized for \({\varvec{\varLambda }}\) using (11 12 13) and satisfies

$$\begin{aligned} \frac{\partial Q_2}{\partial {\varvec{\varLambda }}}= & {} -\sum _{g=1}^G\sum _{i=1}^nz_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{x}}_i{\varvec{\eta }}_{ig}^{(k+1/2)'} +{\varvec{\varLambda }}\sum _{g=1}^G\sum _{i=1}^n z_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{\eta }}_{ig}^{(k+1/2)} {\varvec{\eta }}_{ig}^{(k+1/2)'}\\&+{\varvec{\varLambda }}\sum _{g=1}^G\sum _{i=1}^nz_{ig}^{(k+1)} \left[ \left( {\varvec{I}}_q-{\varvec{\gamma }}_g^{(k)'}{\varvec{\varLambda }}^{(k)}\right) {\varvec{\varOmega }}_g^{(k)} \right] = {\varvec{0}}, \end{aligned}$$

which yields

$$\begin{aligned} {\varvec{\varLambda }}^{(k+1)}&= \left( \sum _{g=1}^G\sum _{i=1}^nz_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{x}}_i{\varvec{\eta }}_{ig}^{(k+1/2)'} \right) \\&\!\times \!\, \left( \sum _{g=1}^G\sum _{i=1}^n z_{ig}^{(k+1)}w_{ig}^{(k+1)}{\varvec{\eta }}_{ig}^{(k+1/2)} {\varvec{\eta }}_{ig}^{(k+1/2)'}+ \sum _{g=1}^Gn_g^{(k+1)} ({\varvec{I}}_q-{\varvec{\gamma }}_g^{(k)'}{\varvec{\varLambda }}^{(k)}){\varvec{\varOmega }}_g^{(k)} \right) ^{-1}. \end{aligned}$$

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Subedi, S., Punzo, A., Ingrassia, S. et al. Cluster-weighted \(t\)-factor analyzers for robust model-based clustering and dimension reduction. Stat Methods Appl 24, 623–649 (2015). https://doi.org/10.1007/s10260-015-0298-7

Download citation

Keywords

  • Cluster-weighted models
  • Factor analyzers
  • Multivariate \(t\)-distributions
  • Parsimonious models