
Mixtures of restricted skew-t factor analyzers with common factor loadings

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

Mixtures of common t factor analyzers (MCtFA) have shown their effectiveness in robustifying mixtures of common factor analyzers (MCFA) for model-based clustering of high-dimensional data with heavy tails. However, the MCtFA model may still lack robustness against observations whose distributions are highly asymmetric. This paper presents a further robust extension of the MCFA and MCtFA models, called the mixture of common restricted skew-t factor analyzers (MCrstFA), which assumes a restricted multivariate skew-t distribution for the common factors. The MCrstFA model can accommodate severely non-normal (skewed and leptokurtic) random phenomena while preserving the parsimony of the factor-analytic representation and supporting graphical visualization in low-dimensional plots. A computationally feasible expectation conditional maximization either (ECME) algorithm is developed to carry out maximum likelihood estimation. The numbers of factors and mixture components are determined simultaneously using common penalized likelihood criteria. The usefulness of the proposed model is illustrated with simulated and real datasets, and experimental results demonstrate its superiority over some existing competitors.



Acknowledgements

The authors gratefully acknowledge the Coordinating Editor, Maurizio Vichi, the Associate Editor and three anonymous referees for their comments and suggestions, which greatly improved this paper. W.L. Wang and T.I. Lin would like to acknowledge the support of the Ministry of Science and Technology of Taiwan under Grant Nos. MOST 105-2118-M-035-004-MY2 and MOST 105-2118-M-005-003-MY2, respectively. L.M. Castro acknowledges support from Grant FONDECYT 1170258 from the Chilean government.

Author information

Correspondence to Tsung-I Lin.

Appendices

Appendix A: Proof of hierarchical representation (8)

It follows from (7) that

$$\begin{aligned}&E\left( {\varvec{Y}}_j{\varvec{U}}_{ij}^{\top }\mid \gamma _j,\tau _j,Z_{ij}=1\right) \nonumber \\&\quad =E\left[ E\left( {\varvec{Y}}_j{\varvec{U}}_{ij}^{\top }\mid {\varvec{U}}_{ij},\gamma _j,\tau _j,Z_{ij}=1\right) \mid \gamma _j,\tau _j,Z_{ij}=1\right] \\&\quad =E\left[ E({\varvec{Y}}_j\mid {\varvec{U}}_{ij},\gamma _j,\tau _j,Z_{ij}=1){\varvec{U}}_{ij}^{\top }\mid \gamma _j,\tau _j,Z_{ij}=1\right] \\&\quad =E\left( {\varvec{A}}{\varvec{U}}_{ij}{\varvec{U}}_{ij}^{\top }\mid \gamma _j,\tau _j,Z_{ij}=1\right) \\&\quad ={\varvec{A}}\left[ \tau _j^{-1}{\varvec{\varOmega }}_i+({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j)({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j)^{\top }\right] , \end{aligned}$$

and

$$\begin{aligned}&\text{ cov }\left( {\varvec{Y}}_j, {\varvec{U}}_{ij}^{\top }\mid \gamma _j,\tau _j,Z_{ij}=1\right) \\&\quad =E\left( {\varvec{Y}}_j{\varvec{U}}_{ij}^{\top }\mid \gamma _j,\tau _j,Z_{ij}=1\right) -E({\varvec{Y}}_j\mid \gamma _j,\tau _j,Z_{ij}=1)E\left( {\varvec{U}}_{ij}^{\top }\mid \gamma _j,\tau _j,Z_{ij}=1\right) \\&\quad ={\varvec{A}}\left[ \tau _j^{-1}{\varvec{\varOmega }}_i+({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j)({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j)^{\top }\right] -({\varvec{\mu }}_i+{\varvec{\alpha }}_i\gamma _j)({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j)^{\top }\\&\quad ={\varvec{A}}\left[ \tau _j^{-1}{\varvec{\varOmega }}_i+({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j)({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j)^{\top }\right] -{\varvec{A}}({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j)({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j)^{\top }\\&\quad =\tau _j^{-1}{\varvec{A}}{\varvec{\varOmega }}_i. \end{aligned}$$

This gives rise to the following joint distribution:

$$\begin{aligned}\left[ \begin{array}{c} {\varvec{Y}}_j\\ {\varvec{U}}_{ij} \end{array} \right] \bigg |(\gamma _j,\tau _j,Z_{ij}=1)\sim N_{p+q} \left( \left[ \begin{array}{c} {\varvec{\mu }}_i+{\varvec{\alpha }}_i\gamma _j\\ {\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j \end{array} \right] , \tau _j^{-1}\left[ \begin{array}{cc} {\varvec{\varSigma }}_i &{} {\varvec{A}}{\varvec{\varOmega }}_i\\ {\varvec{\varOmega }}_i^{\top }{\varvec{A}}^{\top } &{} {\varvec{\varOmega }}_i \end{array} \right] \right) . \end{aligned}$$

We then have the following standard results:

$$\begin{aligned}&E({\varvec{U}}_{ij}\mid {\varvec{y}}_j,\gamma _j,\tau _j,Z_{ij}=1)\\&\quad =({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j)+\left( \tau _j^{-1}{\varvec{\varOmega }}_i^{\top }{\varvec{A}}^{\top }\right) (\tau _j^{-1}{\varvec{\varSigma }}_i)^{-1}({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j)\\&\quad ={\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j), \end{aligned}$$

and

$$\begin{aligned} \text{ cov }({\varvec{U}}_{ij}\mid {\varvec{y}}_j,\gamma _j,\tau _j,Z_{ij}=1)= & {} \tau _j^{-1}{\varvec{\varOmega }}_i-\left( \tau _j^{-1}{\varvec{\varOmega }}_i^{\top }{\varvec{A}}^{\top }\right) (\tau _j^{-1}{\varvec{\varSigma }}_i)^{-1}(\tau _j^{-1}{\varvec{A}}{\varvec{\varOmega }}_i)\\= & {} \tau _j^{-1}\left( {\varvec{I}}_q-{\varvec{\beta }}_i^{\top }{\varvec{A}}\right) {\varvec{\varOmega }}_i, \end{aligned}$$

where \({\varvec{\beta }}_i={\varvec{\varSigma }}_i^{-1}{\varvec{A}}{\varvec{\varOmega }}_i\). Using the characterization of the multivariate normal distribution, we can obtain

$$\begin{aligned} {\varvec{U}}_{ij}\mid ({\varvec{y}}_j,\gamma _j,\tau _j,Z_{ij}=1) \sim N_q\left( {\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j),\,\tau _j^{-1}\left( {\varvec{I}}_q-{\varvec{\beta }}_i^{\top }{\varvec{A}}\right) {\varvec{\varOmega }}_i\right) . \end{aligned}$$
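As a numerical sanity check of this conditional distribution, the closed-form covariance \(\tau _j^{-1}({\varvec{I}}_q-{\varvec{\beta }}_i^{\top }{\varvec{A}}){\varvec{\varOmega }}_i\) can be compared against the Schur complement of the joint covariance above. The sketch below uses hypothetical dimensions and randomly generated parameters; the construction \({\varvec{\varSigma }}_i={\varvec{A}}{\varvec{\varOmega }}_i{\varvec{A}}^{\top }+{\varvec{D}}_i\) with diagonal \({\varvec{D}}_i\) is adopted here only to guarantee a valid positive-definite joint covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, tau = 4, 2, 1.7                            # hypothetical dimensions and scale

# Hypothetical ingredients: common loading A, factor covariance Omega,
# diagonal error covariance D; Sigma = A Omega A^T + D (assumed structure)
A = rng.normal(size=(p, q))
L = rng.normal(size=(q, q))
Omega = L @ L.T + np.eye(q)                      # positive definite q x q
D = np.diag(rng.uniform(0.5, 1.5, size=p))
Sigma = A @ Omega @ A.T + D

beta = np.linalg.solve(Sigma, A @ Omega)         # beta_i = Sigma_i^{-1} A Omega_i

# Closed form from the text: cov(U | y, gamma, tau, Z=1) = tau^{-1} (I_q - beta^T A) Omega
cov_text = (np.eye(q) - beta.T @ A) @ Omega / tau

# Schur complement of the joint covariance tau^{-1} [[Sigma, A Omega], [Omega A^T, Omega]]
cov_schur = (Omega - (A @ Omega).T @ np.linalg.solve(Sigma, A @ Omega)) / tau

assert np.allclose(cov_text, cov_schur)
```

The agreement holds for any positive-definite \({\varvec{\varSigma }}_i\), since both expressions reduce to \(\tau _j^{-1}({\varvec{\varOmega }}_i-{\varvec{\varOmega }}_i{\varvec{A}}^{\top }{\varvec{\varSigma }}_i^{-1}{\varvec{A}}{\varvec{\varOmega }}_i)\).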

With similar arguments, we have

$$\begin{aligned}&f\left( {\varvec{y}}_j,\gamma _j,\tau _j\mid z_{ij}=1\right) \\&\quad =f\left( {\varvec{y}}_j\mid \gamma _j,\tau _j,z_{ij}=1\right) f\left( \gamma _j\mid \tau _j,z_{ij}=1\right) f\left( \tau _j\mid z_{ij}=1\right) \\&\quad =\frac{2|{\varvec{\varSigma }}_i|^{-\frac{1}{2}}\left( \frac{\nu _i}{2}\right) ^{\frac{\nu _i}{2}}}{\left( 2\pi \right) ^{\frac{p+1}{2}}{\varGamma }\left( \frac{\nu _i}{2}\right) }\tau _j^{\frac{p+\nu _i+1}{2}-1}\exp \left\{ -\frac{\tau _j}{2}\left[ \frac{\left( \gamma _j-h_i\right) ^2}{\sigma _i^2}+\delta _{ij}+\nu _i\right] \right\} ,\\&f\left( {\varvec{y}}_j,\tau _j\mid z_{ij}=1\right) =\frac{2|{\varvec{\varSigma }}_i|^{-\frac{1}{2}}\sigma _i\left( \frac{\nu _i}{2}\right) ^{\frac{\nu _i}{2}}}{\left( 2\pi \right) ^{\frac{p}{2}}{\varGamma }\left( \frac{\nu _i}{2}\right) }\tau _j^{\frac{p+\nu _i}{2}-1}\nonumber \\&\quad \exp \left\{ -\frac{\tau _j}{2}\left[ \delta _{ij}+\nu _i\right] \right\} {\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) ,\\&f\left( \gamma _j\mid {\varvec{y}}_j,\tau _j,z_{ij}=1\right) =\frac{f\left( {\varvec{y}}_j,\gamma _j,\tau _j\mid z_{ij}=1\right) }{f\left( {\varvec{y}}_j,\tau _j\mid z_{ij}=1\right) } =\frac{\phi \left( \gamma _j;h_i,\tau _j^{-1}\sigma _i^2\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }. \end{aligned}$$

Hence, it is trivial to establish that \(\gamma _j\mid ({\varvec{y}}_j,\tau _j,Z_{ij}=1) \sim TN(h_i,\tau _j^{-1}\sigma _i^2;(0,\infty ))\). Furthermore, standard calculation gives

$$\begin{aligned} f(\tau _j\mid {\varvec{y}}_j,Z_{ij}=1)= & {} \frac{f({\varvec{y}}_j,\tau _j\mid Z_{ij}=1)}{f({\varvec{y}}_j\mid Z_{ij}=1)}\\= & {} \frac{\tau _j^{\frac{\nu _i+p}{2}-1}}{{\varGamma }\left( \frac{\nu _i+p}{2}\right) }\left( \frac{\nu _i+\delta _{ij}}{2}\right) ^{\frac{\nu _i+p}{2}}\frac{{\varPhi }(\sqrt{\tau _j}M_{ij})}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\\&\times \exp \left\{ -\frac{\tau _j}{2}\left[ \delta _{ij}+\nu _i\right] \right\} \\= & {} \frac{{\varPhi }(\sqrt{\tau _j}M_{ij})}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }g\left( \tau _j;\frac{\nu _i+p}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) . \end{aligned}$$
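The density \(f(\tau _j\mid {\varvec{y}}_j,Z_{ij}=1)\) above is a gamma density tilted by \({\varPhi }(\sqrt{\tau _j}M_{ij})\), and its normalizing constant is exactly the Student-\(t\) probability \(T\big (M_{ij}\sqrt{(\nu _i+p)/(\nu _i+\delta _{ij})};\nu _i+p\big )\). A quick numerical check, with purely illustrative values of \(\nu _i\), \(p\), \(\delta _{ij}\) and \(M_{ij}\):

```python
import numpy as np
from scipy import stats, integrate

nu, p, delta, M = 5.0, 3.0, 2.0, 0.8   # hypothetical, illustrative values

# integral of Phi(sqrt(tau) * M) against Gamma(tau; shape=(nu+p)/2, rate=(nu+delta)/2)
integrand = lambda tau: (stats.norm.cdf(np.sqrt(tau) * M)
                         * stats.gamma.pdf(tau, a=(nu + p) / 2, scale=2.0 / (nu + delta)))
val, _ = integrate.quad(integrand, 0.0, np.inf)

# ... equals the Student-t CDF appearing in the normalizer
tcdf = stats.t.cdf(M * np.sqrt((nu + p) / (nu + delta)), df=nu + p)

assert abs(val - tcdf) < 1e-6
```

This is the same identity used in part (k) of Appendix B to derive \(E(\log \tau _j\mid {\varvec{y}}_j,z_{ij}=1)\).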

Appendix B: Proof of Proposition 1

  1. (a)

    Standard calculation of conditional expectation yields

    $$\begin{aligned}&E(\tau _j\mid {\varvec{y}}_j,\,z_{ij}=1)=\int _{0}^{\infty }\tau _j f(\tau _j\mid {\varvec{y}}_j,\,z_{ij}=1)d\tau _j\nonumber \\&\quad =\int _{0}^{\infty }\tau _j\frac{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }g\left( \tau _j;\frac{\nu _i+p}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) d\tau _j\nonumber \\&\quad =\frac{\left( \frac{\nu _i+p}{\nu _i+\delta _{ij}}\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\int _{0}^{\infty }{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) g\left( \tau _j;\frac{\nu _i+p+2}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) d\tau _j\nonumber \\&\quad =\left( \frac{\nu _i+p}{\nu _i+\delta _{ij}}\right) \frac{T\left( M_{ij}\sqrt{\frac{\nu _i+p+2}{\nu _i+\delta _{ij}}};\nu _i+p+2\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }. \end{aligned}$$
    (B.1)
  2. (b)

    Because \(\gamma _j\mid ({\varvec{y}}_j,\tau _j,Z_{ij}=1) \sim TN(h_i,\tau _j^{-1}\sigma _i^2;(0,\infty ))\), we obtain

    $$\begin{aligned} E(\gamma _j\mid {\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)=h_{ij}+\frac{\sigma _i}{\sqrt{\tau _j}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }. \end{aligned}$$
    (B.2)
  3. (c)

    We first need to show

    $$\begin{aligned}&E\left( \tau _j^{\frac{k}{2}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\bigg |{\varvec{y}}_j,\,z_{ij}=1\right) \nonumber \\&\quad =\frac{1}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\int _{0}^{\infty }\tau _j^{\frac{k}{2}}\phi \left( \sqrt{\tau _j}M_{ij}\right) g\left( \tau _j;\frac{\nu _i+p}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) d\tau _j\nonumber \\&\quad =\frac{1}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\int _{0}^{\infty }\tau _j^{\frac{k-1}{2}}\phi \big (M_{ij};0,\,\tau _j^{-1}\big )g\left( \tau _j;\frac{\nu _i+p}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) d\tau _j\nonumber \\&\quad =\frac{{\varGamma }\left( \frac{\nu _i+p+k-1}{2}\right) \int _{0}^{\infty }\phi \left( M_{ij};0,\,\tau _j^{-1}\right) g\left( \tau _j;\frac{\nu _i+p+k-1}{2},\,\frac{\nu _i+\delta _{ij}}{2}\right) d\tau _j}{{\varGamma }\left( \frac{\nu _i+p}{2}\right) \left( \frac{\nu _i+\delta _{ij}}{2}\right) ^{\frac{k-1}{2}}T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\nonumber \\&\quad =\frac{{\varGamma }\left( \frac{\nu _i+p+k-1}{2}\right) \sqrt{\frac{\nu _i+p+k-1}{\nu _i+\delta _{ij}}}\,t\left( M_{ij}\sqrt{\frac{\nu _i+p+k-1}{\nu _i+\delta _{ij}}};\nu _i+p+k-1\right) }{{\varGamma }\left( \frac{\nu _i+p}{2}\right) \left( \frac{\nu _i+\delta _{ij}}{2}\right) ^{\frac{k-1}{2}}T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }. \end{aligned}$$
    (B.3)

    Applying the result in (B.3) with \(k=-\,1\) and (B.2) yields

    $$\begin{aligned}&E(\gamma _j\mid {\varvec{y}}_j,\,z_{ij}=1) =E[E(\gamma _j\mid {\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)\mid {\varvec{y}}_j,\,z_{ij}=1]\nonumber \\&\quad =E\left[ h_{ij}+\frac{\sigma _i}{\sqrt{\tau _j}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\bigg |{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =h_{ij}+\sigma _iE\left( \frac{1}{\sqrt{\tau _j}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\bigg |{\varvec{y}}_j,\,z_{ij}=1\right) \nonumber \\&\quad =h_{ij}+\frac{\sigma _i}{\sqrt{\frac{\nu _i+p-2}{\nu _i+\delta _{ij}}}}\frac{t\left( M_{ij}\sqrt{\frac{\nu _i+p-2}{\nu _i+\delta _{ij}}};\nu _i+p-2\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }. \end{aligned}$$
    (B.4)
  4. (d)

    Using (B.1), (B.2) and (B.3) with \(k=1\), we have

    $$\begin{aligned}&E(\tau _j\gamma _j|{\varvec{y}}_j,\,z_{ij}=1) =E[E(\tau _j\gamma _j|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1]\nonumber \\&\quad =E[\tau _jE(\gamma _j|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1]\nonumber \\&\quad =E\left[ \tau _j\left( h_{ij}+\frac{\sigma _i}{\sqrt{\tau _j}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\right) \bigg |{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =h_{ij} E(\tau _j|{\varvec{y}}_j,\,z_{ij}=1)+\sigma _i E\left[ \sqrt{\tau _j}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\bigg |{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =h_{ij}\left[ \frac{\nu _i+p}{\nu _i+\delta _{ij}}\frac{T\left( M_{ij}\sqrt{\frac{\nu _i+p+2}{\nu _i+\delta _{ij}}};\nu _i+p+2\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\right] \nonumber \\&\qquad +\,\sigma _i\left[ \sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}}\frac{t\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\right] . \end{aligned}$$
    (B.5)
  5. (e)

    Using the result of (B.2), the second moment of a truncated normal distribution is given by

    $$\begin{aligned} E\left( \gamma _j^2|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1\right)= & {} h_{ij}E(\gamma _j|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)+\frac{\sigma _i^2}{\tau _j}\nonumber \\= & {} h_{ij}\left( h_{ij}+\frac{\sigma _i}{\sqrt{\tau _j}}\frac{\phi \left( \sqrt{\tau _j}M_{ij}\right) }{{\varPhi }\left( \sqrt{\tau _j}M_{ij}\right) }\right) +\frac{\sigma _i^2}{\tau _j}. \end{aligned}$$
    (B.6)
  6. (f)

    Applying the double expectation and using (B.5) and (B.6), we have

    $$\begin{aligned} E\left( \tau _j\gamma _j^2|{\varvec{y}}_j,\,z_{ij}=1\right)= & {} E\left[ E\left( \tau _j\gamma _j^2|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1\right) |{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\= & {} E\left[ \tau _jE\left( \gamma _j^2|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1\right) |{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\= & {} E\left\{ \tau _j\left[ h_{ij}E(\gamma _j|{\varvec{y}}_j,\,\tau _j,\,z_{ij}=1)+\tau _j^{-1}\sigma _i^2\right] |{\varvec{y}}_j,\,z_{ij}=1\right\} \nonumber \\= & {} h_{ij} E\left( \tau _j\gamma _j|{\varvec{y}}_j,\,z_{ij}=1\right) +\sigma _i^2. \end{aligned}$$
    (B.7)
  7. (g)

    Applying the double expectation and the result of (B.4), we have

    $$\begin{aligned}&E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,z_{ij}=1)=E\left[ E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =E\left[ {\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j)|{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad ={\varvec{\xi }}_i+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i)+({\varvec{\lambda }}_i-{\varvec{\beta }}_i^{\top }{\varvec{\alpha }}_i)E(\gamma _j|{\varvec{y}}_j,\,z_{ij}=1). \end{aligned}$$
    (B.8)
  8. (h)

    Applying the double expectation and using (B.1) and (B.5), we have

    $$\begin{aligned}&E(\tau _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,z_{ij}=1)=E\left[ E(\tau _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =E\left[ \tau _j E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =E\left\{ \tau _j\left[ {\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j)\right] |{\varvec{y}}_j,\,z_{ij}=1\right\} \nonumber \\&\quad =\left[ {\varvec{\xi }}_i+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i)\right] E(\tau _j|{\varvec{y}}_j,\,z_{ij}=1)\nonumber \\&\qquad +\,\left( {\varvec{\lambda }}_i-{\varvec{\beta }}_i^{\top }{\varvec{\alpha }}_i\right) E(\tau _j\gamma _j|{\varvec{y}}_j,\,z_{ij}=1). \end{aligned}$$
    (B.9)
  9. (i)

    Applying the double expectation and using (B.5) and (B.7), we have

    $$\begin{aligned}&E\left( \tau _j\gamma _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,z_{ij}=1\right) \\&\quad =E\left[ E(\tau _j\gamma _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \nonumber \\&\quad =E\left[ \tau _j\gamma _j E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \\&\quad =E\left\{ \tau _j\gamma _j[{\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j+{\varvec{\beta }}_i^\top ({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j)]|{\varvec{y}}_j,\,z_{ij}=1\right\} \nonumber \\&\quad =\left[ {\varvec{\xi }}_i+{\varvec{\beta }}_i^\top ({\varvec{y}}_j-{\varvec{\mu }}_i)\right] E(\tau _j\gamma _j|{\varvec{y}}_j,\,z_{ij}=1)\\&\qquad +\,({\varvec{\lambda }}_i-{\varvec{\beta }}_i^\top {\varvec{\alpha }}_i)E(\tau _j\gamma _j^2|{\varvec{y}}_j,\,z_{ij}=1). \end{aligned}$$
  10. (j)

    Applying the double expectation and using (B.8) and (B.9), we have

    $$\begin{aligned}&E\left( \tau _j{\varvec{U}}_{ij}{\varvec{U}}_{ij}^{\top }|{\varvec{y}}_j,\,z_{ij}=1\right) =E\left[ E\left( \tau _j{\varvec{U}}_{ij}{\varvec{U}}_{ij}^{\top }|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1\right) |{\varvec{y}}_j,\,z_{ij}=1\right] \\&\quad =E\left[ \tau _jE({\varvec{U}}_{ij}{\varvec{U}}_{ij}^{\top }|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)|{\varvec{y}}_j,\,z_{ij}=1\right] \\&\quad =E\big \{\tau _j[E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)E({\varvec{U}}_{ij}^{\top }|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)\\&\qquad +\,\text{ cov }({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)]|{\varvec{y}}_j,\,z_{ij}=1\big \}\\&\quad =E\big \{\tau _j[E({\varvec{U}}_{ij}|{\varvec{y}}_j,\,\gamma _j,\,\tau _j,\,z_{ij}=1)({\varvec{\xi }}_i+{\varvec{\lambda }}_i\gamma _j+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i-{\varvec{\alpha }}_i\gamma _j))^{\top }\\&\qquad +\,\tau _j^{-1}({\varvec{I}}_q-{\varvec{\beta }}_i^{\top }{\varvec{A}}){\varvec{\varOmega }}_i]|{\varvec{y}}_j,\,z_{ij}=1\big \}\\&\quad =E(\gamma _j\tau _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,z_{ij}=1)({\varvec{\lambda }}_i-{\varvec{\beta }}_i^{\top }{\varvec{\alpha }}_i)^{\top }\\&\qquad +\,E(\tau _j{\varvec{U}}_{ij}|{\varvec{y}}_j,\,z_{ij}=1)[{\varvec{\xi }}_i+{\varvec{\beta }}_i^{\top }({\varvec{y}}_j-{\varvec{\mu }}_i)]^{\top }+({\varvec{I}}_q-{\varvec{\beta }}_i^{\top }{\varvec{A}}){\varvec{\varOmega }}_i. \end{aligned}$$
  11. (k)

    It is known that \(\int _0^{\infty }f(\tau _j|{\varvec{y}}_j,\,Z_{ij}=1)d\tau _j=1\), that is,

    $$\begin{aligned} \int _0^{\infty }\frac{{\varPhi }\big (\sqrt{\tau _j}M_{ij}\big )}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }\frac{\left( \frac{\nu _i+\delta _{ij}}{2}\right) ^{\frac{\nu _i+p}{2}}}{{\varGamma }\left( \frac{\nu _i+p}{2}\right) }\tau _j^{\frac{\nu _i+p}{2}-1}\exp \left\{ -\frac{\nu _i+\delta _{ij}}{2}\tau _j\right\} d\tau _j =1. \end{aligned}$$

    Then

    $$\begin{aligned} \frac{d}{d\nu _i}\int _0^{\infty }b_j\tau _j^{\frac{\nu _i+p}{2}-1}{\varPhi }\big (\sqrt{\tau _j}M_{ij}\big )\exp \left\{ -\frac{\nu _i+\delta _{ij}}{2}\tau _j\right\} d\tau _j=0, \end{aligned}$$

    where

    $$\begin{aligned} b_j=\frac{{\left( \frac{\nu _i+\delta _{ij}}{2}\right) }^{\left( \nu _i+p\right) /2}}{{\varGamma }\left( \frac{\nu _i+p}{2}\right) T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }. \end{aligned}$$

    By Leibniz’s rule, we can obtain

    $$\begin{aligned}&E(\log \tau _j|{\varvec{y}}_j,\,z_{ij}=1)-E(\tau _j|{\varvec{y}}_j,\,z_{ij}=1)+\log \left( \frac{\nu _i+\delta _{ij}}{2}\right) +\left( \frac{\nu _i+p}{\nu _i+\delta _{ij}}\right) \\&\quad -\,\mathrm{DG}\left( \frac{\nu _i+p}{2}\right) -\frac{\int _{-\infty }^{M_{ij}}t\left( x;0,\frac{\nu _i+\delta _{ij}}{\nu _i+p},\nu _i+p\right) f_{\nu _i}(x)dx}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }=0, \end{aligned}$$

    where

    $$\begin{aligned} f_{\nu _i}(x)= & {} \mathrm{DG}\left( \frac{\nu _i+p+1}{2}\right) -\mathrm{DG}\left( \frac{\nu _i+p}{2}\right) -\frac{1}{\pi (\nu _i+\delta _{ij})}\nonumber \\&-\,\log \left( 1+\frac{x^2}{\nu _i+\delta _{ij}}\right) +\frac{(\nu _i+p+1)x^2}{(\nu _i+\delta _{ij})(x^2+\nu _i+\delta _{ij})}. \end{aligned}$$
    (B.10)

    It follows that

    $$\begin{aligned} E(\log \tau _j|{\varvec{y}}_j,\,z_{ij}=1)= & {} E(\tau _j|{\varvec{y}}_j,\,z_{ij}=1)-\log \left( \frac{\nu _i+\delta _{ij}}{2}\right) -\left( \frac{\nu _i+p}{\nu _i+\delta _{ij}}\right) \\&+\,\mathrm{DG}\left( \frac{\nu _i+p}{2}\right) +\,\frac{\int _{-\infty }^{M_{ij}}t\left( x;0,\frac{\nu _i+\delta _{ij}}{\nu _i+p},\nu _i+p\right) f_{\nu _i}(x)dx}{T\left( M_{ij}\sqrt{\frac{\nu _i+p}{\nu _i+\delta _{ij}}};\nu _i+p\right) }. \end{aligned}$$

Appendix C: Proof of CM-steps

  1. (a)

    By the Lagrange multiplier method, we define

    $$\begin{aligned} L(\pi _i,\lambda )=Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})-\lambda \left( \sum _{i=1}^g\pi _i-1\right) , \end{aligned}$$

    and then take partial derivatives, yielding

    $$\begin{aligned} \frac{\partial L(\pi _i,\lambda )}{\partial \pi _i} = \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\frac{1}{\pi _i}-\lambda =0,\quad \text{ and } \quad \frac{\partial L(\pi _i,\lambda )}{\partial \lambda } = -\left( \sum _{i=1}^g\pi _i-1\right) =0. \end{aligned}$$

    Since \(\sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}=n\), we obtain \(\hat{\pi }_i^{(k+1)}=\sum _{j=1}^n{\hat{z}}_{ij}^{(k)}/n\).

  (b)

    Differentiating \(Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})\) with respect to \({\varvec{\xi }}_i\) leads to

    $$\begin{aligned} \frac{\partial Q}{\partial {\varvec{\xi }}_i}= & {} -\frac{1}{2}\frac{\partial }{\partial {\varvec{\xi }}_i}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\mathrm{tr}\bigg \{{\varvec{\varOmega }}_i^{-1}\Big [-{\hat{{\varvec{\eta }}}_{ij}}^{(k)}{\varvec{\xi }}_i^\top -{\varvec{\xi }}_i{\hat{{\varvec{\eta }}}_{ij}}^{(k)^\top }\\&+\,{\varvec{\xi }}_i{\hat{\tau }_{ij}}^{(k)}{\varvec{\xi }}_i^\top +{\varvec{\xi }}_i{{\hat{s}}_{1ij}}^{(k)}{\varvec{\lambda }}_i^\top +{\varvec{\lambda }}_i{{\hat{s}}_{1ij}}^{(k)}{\varvec{\xi }}_i^\top \Big ]\bigg \}\\= & {} {\varvec{\varOmega }}_i^{-1}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\left[ {\hat{{\varvec{\eta }}}_{ij}}^{(k)}-{\hat{\tau }_{ij}}^{(k)}{\varvec{\xi }}_i-{{\hat{s}}_{1ij}}^{(k)}{\varvec{\lambda }}_i\right] . \end{aligned}$$

    Moreover, the partial derivative of \(Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})\) with respect to \({\varvec{\lambda }}_i\) is

    $$\begin{aligned} \frac{\partial Q}{\partial {\varvec{\lambda }}_i}= & {} -\frac{1}{2}\frac{\partial }{\partial {\varvec{\lambda }}_i}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\mathrm{tr}\bigg \{{\varvec{\varOmega }}_i^{-1}\Big [-{\hat{{\varvec{\zeta }}}_{ij}}^{(k)}{\varvec{\lambda }}_i^\top +{\varvec{\xi }}_i{{\hat{s}}_{1ij}}^{(k)}{\varvec{\lambda }}_i^\top \\&-\,{\varvec{\lambda }}_i{\hat{{\varvec{\zeta }}}_{ij}}^{(k)^\top }+{\varvec{\lambda }}_i{{\hat{s}}_{1ij}}^{(k)}{\varvec{\xi }}_i^\top +{\varvec{\lambda }}_i{{\hat{s}}_{2ij}}^{(k)}{\varvec{\lambda }}_i^\top \Big ]\bigg \}\\= & {} {\varvec{\varOmega }}_i^{-1}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\left[ {\hat{{\varvec{\zeta }}}_{ij}}^{(k)}-{{\hat{s}}_{1ij}}^{(k)}{\varvec{\xi }}_i-{{\hat{s}}_{2ij}}^{(k)}{\varvec{\lambda }}_i\right] . \end{aligned}$$

    Setting these two derivatives equal to zero, we get

    $$\begin{aligned} \frac{\partial Q}{\partial {\varvec{\xi }}_i}= & {} \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{\varOmega }}_i^{-1}({\hat{{\varvec{\eta }}}_{ij}}^{(k)}-{{\hat{s}}_{1ij}}^{(k)}{\varvec{\lambda }}_i)-\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{\varOmega }}_i^{-1}{\hat{\tau }_{ij}}^{(k)}{\varvec{\xi }}_i=\mathbf 0, \end{aligned}$$
    (C.1)
    $$\begin{aligned} \frac{\partial Q}{\partial {\varvec{\lambda }}_i}= & {} \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{\varOmega }}_i^{-1}({\hat{{\varvec{\zeta }}}_{ij}}^{(k)}-{{\hat{s}}_{1ij}}^{(k)}{\varvec{\xi }}_i)-\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{\varOmega }}_i^{-1}{{\hat{s}}_{2ij}}^{(k)}{\varvec{\lambda }}_i=\mathbf 0. \end{aligned}$$
    (C.2)

    After rearrangement, (C.1) and (C.2) can be rewritten as

    $$\begin{aligned} \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{\tau }_{ij}}^{(k)}{\varvec{\xi }}_i+\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{{\hat{s}}_{1ij}}^{(k)}{\varvec{\lambda }}_i= & {} \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{{\varvec{\eta }}}_{ij}}^{(k)},\\ \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{{\hat{s}}_{1ij}}^{(k)}{\varvec{\xi }}_i+\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{{\hat{s}}_{2ij}}^{(k)}{\varvec{\lambda }}_i= & {} \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{{\varvec{\zeta }}}_{ij}}^{(k)}. \end{aligned}$$

    Using Cramer’s rule, the solutions of the two linear equations are

    $$\begin{aligned} \hat{{\varvec{\xi }}}_i^{\left( k+1\right) }= \frac{ \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{{\varvec{\eta }}}_{ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{2ij}}^{\left( k\right) }\right) - \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{{\varvec{\zeta }}}_{ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{1ij}}^{\left( k\right) }\right) }{ \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{\tau }_{ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{2ij}}^{\left( k\right) }\right) - \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{1ij}}^{\left( k\right) }\right) ^2}, \end{aligned}$$

    and

    $$\begin{aligned} \hat{{\varvec{\lambda }}}_i^{\left( k+1\right) }=\frac{ \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{\tau }_{ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{{\varvec{\zeta }}}_{ij}}^{\left( k\right) }\right) - \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{1ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{{\varvec{\eta }}}_{ij}}^{\left( k\right) }\right) }{ \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{\hat{\tau }_{ij}}^{\left( k\right) }\right) \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{2ij}}^{\left( k\right) }\right) - \left( \sum _{j=1}^n{{\hat{z}}_{ij}}^{\left( k\right) }{{\hat{s}}_{1ij}}^{\left( k\right) }\right) ^2}. \end{aligned}$$
  (c)

    The partial derivative of \(Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})\) with respect to \({\varvec{A}}\) is

    $$\begin{aligned} \frac{\partial Q}{\partial {\varvec{A}}}= & {} -\frac{1}{2}\frac{\partial }{\partial {\varvec{A}}}\sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\mathrm{tr}\bigg (-{\varvec{D}}_i^{-1}{\varvec{y}}_j{\hat{{\varvec{\eta }}}_{ij}}^{(k)^\top }{\varvec{A}}^\top \nonumber \\&-\,{\varvec{D}}_i^{-1}{\varvec{A}}{\hat{{\varvec{\eta }}}_{ij}}^{(k)}{\varvec{y}}_j^\top +{\varvec{D}}_i^{-1}{\varvec{A}}{\hat{{\varvec{\varPsi }}}_{ij}}^{(k)}{\varvec{A}}^\top \bigg )\nonumber \\= & {} \sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{D}}_i^{-1}{\varvec{y}}_j{\hat{{\varvec{\eta }}}_{ij}}^{(k)^\top }-\sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{D}}_i^{-1}{\varvec{A}}{\hat{{\varvec{\varPsi }}}_{ij}}^{(k)}. \end{aligned}$$
    (C.3)

    Equating (C.3) to the zero matrix, we have

    $$\begin{aligned} \hat{{\varvec{A}}}^{(k+1)}=\left( \sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\varvec{y}}_j{\hat{{\varvec{\eta }}}_{ij}}^{(k)^\top }\right) \left( \sum _{i=1}^g\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{{\varvec{\varPsi }}}_{ij}}^{(k)}\right) ^{-1}. \end{aligned}$$
  (d)

    The partial derivative of \(Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})\) with respect to \({\varvec{\varOmega }}_i^{-1}\) is

    $$\begin{aligned} \frac{\partial Q}{\partial {\varvec{\varOmega }}_i^{-1}}= & {} \frac{1}{2}\frac{\partial }{\partial {\varvec{\varOmega }}_i^{-1}}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\left\{ \log |{\varvec{\varOmega }}_i^{-1}|-\mathrm{tr}\left( {\varvec{\varOmega }}_i^{-1}{\varvec{\varLambda }}_{ij}\right) \right\} \nonumber \\= & {} \frac{1}{2}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\Big [2{\varvec{\varOmega }}_i-\mathrm{Diag}\{{\varvec{\varOmega }}_i\}-\big (2{\varvec{\varLambda }}_{ij}-\mathrm{Diag}\{{\varvec{\varLambda }}_{ij}\}\big )\Big ]. \end{aligned}$$
    (C.4)

    Equating (C.4) to the zero matrix gives

    $$\begin{aligned} {\hat{{\varvec{\varOmega }}}_i}^{(k+1)}=\frac{\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{{\varvec{\varLambda }}}_{ij}}^{(k+1)}}{\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}}. \end{aligned}$$
  (e)

    Taking the partial derivative of \(Q({\varvec{\varTheta }}\mid \hat{{\varvec{\varTheta }}}^{(k)})\) with respect to \({\varvec{D}}_i^{-1}\) yields

    $$\begin{aligned} \frac{\partial Q}{\partial {\varvec{D}}_i^{-1}}= & {} \frac{1}{2}\frac{\partial }{\partial {\varvec{D}}_i^{-1}}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}\left[ \log |{\varvec{D}}_i^{-1}|-\mathrm{tr}\left( {\varvec{D}}_i^{-1}{\varvec{\varUpsilon }}_{ij}\right) \right] \nonumber \\= & {} \frac{1}{2}\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}({\varvec{D}}_i-{\varvec{\varUpsilon }}_{ij}). \end{aligned}$$
    (C.5)

    We have the following estimator

    $$\begin{aligned} {\hat{{\varvec{D}}}_i}^{(k+1)}=\frac{\mathrm{Diag}\left\{ \sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}{\hat{{\varvec{\varUpsilon }}}_{ij}}^{(k+1)}\right\} }{\sum _{j=1}^n{{\hat{z}}_{ij}}^{(k)}} \end{aligned}$$

    obtained by equating (C.5) to the zero matrix.
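For implementation, the updates in step (b) reduce, for each component, to a \(2\times 2\) linear system in the vectors \({\varvec{\xi }}_i\) and \({\varvec{\lambda }}_i\) whose coefficients are the scalar weighted sums. The following Python sketch (all variable names and values are illustrative, not taken from the paper) checks the Cramer's-rule solution against a direct linear solve:

```python
import numpy as np

rng = np.random.default_rng(0)
q = 3                      # number of factors (illustrative)

# Scalar coefficients: weighted sums over j at iteration k (illustrative values)
a = 5.0                    # sum_j z_ij * tau_ij
b = 1.2                    # sum_j z_ij * s1_ij
c = 4.0                    # sum_j z_ij * s2_ij
e = rng.normal(size=q)     # sum_j z_ij * eta_ij   (a q-vector)
f = rng.normal(size=q)     # sum_j z_ij * zeta_ij  (a q-vector)

# Cramer's rule for the system  a*xi + b*lam = e,  b*xi + c*lam = f
det = a * c - b ** 2
xi_hat = (c * e - b * f) / det
lam_hat = (a * f - b * e) / det

# Cross-check with a direct 2x2 solve applied coordinate-wise
sol = np.linalg.solve(np.array([[a, b], [b, c]]), np.stack([e, f]))
```

Because the coefficient matrix is the same for both unknowns, one solve per component suffices regardless of the factor dimension \(q\).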

Appendix D: Parameter estimation for the MCghstFA model using the ECM algorithm

Table 5 Comparison of some characterizations between the MCrstFA and MCghstFA models

According to Table 5, the MCghstFA model admits a three-level hierarchy:

$$\begin{aligned} \left[ \begin{array}{c}{\varvec{Y}}_{j} \\ {\varvec{U}}_{ij} \end{array}\right] \Big | (W_{j}, Z_{ij}=1)\sim & {} N_{p+q} \left( \left[ \begin{array}{c}{\varvec{A}} {\varvec{\xi }}_i + W_{j}{\varvec{A}}{\varvec{\lambda }}_i \\ {\varvec{\xi }}_i + W_{j}{\varvec{\lambda }}_i \end{array}\right] , W_{j}\left[ \begin{array}{cc}{\varvec{A}}{\varvec{{\varOmega }}}_i{\varvec{A}}^\top +{\varvec{D}} &{} {\varvec{A}}{\varvec{{\varOmega }}}_i \\ {\varvec{{\varOmega }}}_i{\varvec{A}}^\top &{} {\varvec{{\varOmega }}}_i \end{array}\right] \right) , \nonumber \\ W_{j}\mid Z_{ij}=1\sim & {} {\varGamma }^{-1}\left( \frac{\nu _i}{2}, \frac{\nu _i}{2}\right) ,\nonumber \\ {\varvec{Z}}_j\sim & {} {\mathscr {M}}(1;\pi _1,\ldots ,\pi _g). \end{aligned}$$
(D.1)

From (D.1), it can be verified that

$$\begin{aligned} {\varvec{Y}}_j \mid (W_{j}, Z_{ij}=1) \sim N_p \big ({\varvec{A}} {\varvec{\xi }}_i + W_{j}{\varvec{A}}{\varvec{\lambda }}_i, W_{j}({\varvec{A}}{\varvec{{\varOmega }}}_i{\varvec{A}}^\top +{\varvec{D}})\big ), \end{aligned}$$

and

$$\begin{aligned} {\varvec{U}}_{ij}\mid ({\varvec{y}}_j,W_j,Z_{ij}=1) \sim N_q({\varvec{\mu }}_{2\cdot 1},{\varvec{{\varSigma }}}_{22\cdot 1}), \end{aligned}$$
(D.2)

where \({\varvec{\mu }}_{2\cdot 1}={\varvec{\xi }}_i+W_j{\varvec{\lambda }}_i+ {\varvec{{\varOmega }}}_i{\varvec{A}}^{\top }({\varvec{A}}{\varvec{{\varOmega }}}_i{\varvec{A}}^{\top }+{\varvec{D}})^{-1}({\varvec{y}}_j-{\varvec{A}}{\varvec{\xi }}_i-W_j{\varvec{A}}{\varvec{\lambda }}_i)\) and \({\varvec{{\varSigma }}}_{22\cdot 1}=W_j({\varvec{{\varOmega }}}_i-{\varvec{{\varOmega }}}_i{\varvec{A}}^{\top }({\varvec{A}}{\varvec{{\varOmega }}}_i{\varvec{A}}^{\top }+{\varvec{D}})^{-1}{\varvec{A}}{\varvec{{\varOmega }}}_i)= W_j({\varvec{{\varOmega }}}^{-1}_i+{\varvec{A}}^{\top }{\varvec{D}}^{-1}{\varvec{A}})^{-1}\).
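The simplification of \({\varvec{{\varSigma }}}_{22\cdot 1}\) rests on the Woodbury-type identity \({\varvec{{\varOmega }}}_i-{\varvec{{\varOmega }}}_i{\varvec{A}}^{\top }({\varvec{A}}{\varvec{{\varOmega }}}_i{\varvec{A}}^{\top }+{\varvec{D}})^{-1}{\varvec{A}}{\varvec{{\varOmega }}}_i=({\varvec{{\varOmega }}}^{-1}_i+{\varvec{A}}^{\top }{\varvec{D}}^{-1}{\varvec{A}})^{-1}\), which is easy to check numerically; a minimal NumPy sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 6, 2
A = rng.normal(size=(p, q))                     # common loading matrix
Omega = np.array([[1.3, 0.3], [0.3, 1.3]])      # SPD q x q factor scale matrix
D = np.diag(rng.uniform(0.5, 1.5, size=p))      # diagonal error covariance

S = A @ Omega @ A.T + D                         # marginal scale matrix of Y
lhs = Omega - Omega @ A.T @ np.linalg.inv(S) @ A @ Omega
rhs = np.linalg.inv(np.linalg.inv(Omega) + A.T @ np.linalg.inv(D) @ A)
```

The right-hand form only requires inverting \(q\times q\) and diagonal \(p\times p\) matrices, which is why it is the computationally preferred expression.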

A positive random variable \(W\) is said to follow the generalized inverse Gaussian (GIG) distribution (Good 1953), denoted by \(W\sim \mathrm{GIG}(\psi ,\chi ,r)\), if it has the pdf

$$\begin{aligned} f_{GIG}(w;\psi ,\chi ,r)=\frac{\chi ^{-r}(\sqrt{\chi \psi })^{r} }{2K_{r}(\sqrt{\chi \psi }) }w^{r-1}\exp \left\{ -\frac{1}{2}(\chi w^{-1}+\psi w) \right\} , \end{aligned}$$
(D.3)

where \(\psi ,\chi \in {\mathbb {R}}^+\), \(r\in {\mathbb {R}}\), and \(K_r\) is the modified Bessel function of the third kind with index \(r\). Some particular moments of the GIG distribution have tractable forms, for instance,

$$\begin{aligned} E(W^a)= (\chi /\psi )^{a/2} \frac{K_{r+a}(\sqrt{\psi \chi })}{K_r(\sqrt{\psi \chi })},~a\in {\mathbb {R}}, \end{aligned}$$
(D.4)

and

$$\begin{aligned} E(\log W)= \log (\chi /\psi )^{1/2} +\frac{K^{\prime }_{r}(\sqrt{\psi \chi })}{K_r(\sqrt{\psi \chi })}, \end{aligned}$$
(D.5)

where

$$\begin{aligned} K'_{r}(x)=\frac{dK_{r}(x)}{dr}=\frac{1}{2}\int _{0}^{\infty }\log (y)y^{r-1}\exp \left\{ -\frac{x}{2}\left( y+\frac{1}{y} \right) \right\} d y, \quad x>0.\nonumber \\ \end{aligned}$$
(D.6)
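The moment identities (D.4)–(D.5) can be checked numerically against direct integration of the density (D.3); a sketch in Python (the GIG parameter values are illustrative, and \(K^{\prime }_{r}\) is approximated by a central finite difference in the order rather than by integrating (D.6)):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv  # modified Bessel function of the third kind

psi, chi, r = 2.0, 3.0, -1.5   # illustrative GIG parameters
s = np.sqrt(chi * psi)

def gig_pdf(w):
    # GIG density (D.3)
    const = (psi / chi) ** (r / 2.0) / (2.0 * kv(r, s))
    return const * w ** (r - 1.0) * np.exp(-0.5 * (chi / w + psi * w))

# E(W^a): Bessel-ratio formula (D.4) versus direct numerical integration
a = 1.0
moment_bessel = (chi / psi) ** (a / 2.0) * kv(r + a, s) / kv(r, s)
moment_quad, _ = quad(lambda w: w ** a * gig_pdf(w), 0.0, np.inf)

# E(log W): formula (D.5), with K'_r by a central finite difference in r
h = 1e-6
kprime = (kv(r + h, s) - kv(r - h, s)) / (2.0 * h)
elog_bessel = 0.5 * np.log(chi / psi) + kprime / kv(r, s)
elog_quad, _ = quad(lambda w: np.log(w) * gig_pdf(w), 0.0, np.inf)
```

Both Bessel-function expressions agree with the integrals to numerical precision, so the E-step below never needs explicit integration.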

By Bayes’ Theorem, the conditional pdf of \(W_j\) given \({\varvec{y}}_j\) can be written as

$$\begin{aligned} f(w_j\mid {\varvec{y}}_j,Z_{ij}=1)\propto & {} w_j^{-(\nu _i+p)/2-1} \exp \Big \{-\frac{1}{2}\Big [(\nu _i+{\varDelta }_{ij})w_j^{-1}\\&+\,({\varvec{\lambda }}^{\top }_i{\varvec{A}}^{\top }\big ({\varvec{A}}{\varvec{\varOmega }}_i{\varvec{A}}^{\top }+{\varvec{D}})^{-1}{\varvec{A}}{\varvec{\lambda }}_i\big )w_j\Big ]\Big \}, \end{aligned}$$

where \({\varDelta }_{ij}=({\varvec{y}}_j-{\varvec{A}}{\varvec{\xi }}_i)^{\top }({\varvec{A}}{\varvec{\varOmega }}_i{\varvec{A}}^{\top }+{\varvec{D}})^{-1}({\varvec{y}}_j-{\varvec{A}}{\varvec{\xi }}_i)\). It follows from (D.3) that

$$\begin{aligned} W_j\mid ({\varvec{y}}_j,Z_{ij}=1)\sim \mathrm{GIG}\left( {\varvec{\lambda }}^{\top }_i{\varvec{A}}^{\top }({\varvec{A}}{\varvec{\varOmega }}_i{\varvec{A}}^{\top }+{\varvec{D}})^{-1}{\varvec{A}}{\varvec{\lambda }}_i,\nu _i+{\varDelta }_{ij},-\frac{\nu _i+p}{2}\right) . \end{aligned}$$

Alternatively, the MCghstFA model can be represented by a four-level hierarchy:

$$\begin{aligned} {\varvec{Y}}_{j} \mid ({\varvec{U}}_{ij}, W_{j}, Z_{ij}=1)\sim & {} N_p({\varvec{A}}{\varvec{U}}_{ij}, W_{j}{\varvec{D}}),\nonumber \\ {\varvec{U}}_{ij} \mid (W_{j}, Z_{ij}=1)\sim & {} N_q({\varvec{\xi }}_i + W_{j} {\varvec{\lambda }}_i, W_{j}{\varvec{{\varOmega }}}_i),\nonumber \\ W_{j}\mid Z_{ij}=1\sim & {} {\varGamma }^{-1}\left( \frac{\nu _i}{2}, \frac{\nu _i}{2}\right) ,\nonumber \\ {\varvec{Z}}_j\sim & {} {\mathscr {M}}(1;\pi _1,\ldots ,\pi _g). \end{aligned}$$
(D.7)
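As an aside, the hierarchy (D.7) yields a direct recipe for simulating MCghstFA data; a minimal NumPy sketch (all parameter values and dimensions are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, g = 200, 4, 2, 2
pi = np.array([0.6, 0.4])                    # mixing proportions
nu = np.array([5.0, 8.0])                    # degrees of freedom
A = rng.normal(size=(p, q))                  # common factor loadings
xi = rng.normal(size=(g, q))                 # factor locations
lam = rng.normal(scale=0.5, size=(g, q))     # skewness parameters
Omega = np.stack([np.eye(q)] * g)            # factor scale matrices
D = 0.3 * np.eye(p)                          # common diagonal error covariance

z = rng.choice(g, size=n, p=pi)                   # Z_j ~ multinomial
w = 1.0 / rng.gamma(nu[z] / 2.0, 2.0 / nu[z])     # W_j | Z: inverse gamma(nu_i/2, nu_i/2)
U = np.array([rng.multivariate_normal(xi[i] + wj * lam[i], wj * Omega[i])
              for i, wj in zip(z, w)])            # U_ij | W_j, Z
Y = np.array([rng.multivariate_normal(A @ u, wj * D)
              for u, wj in zip(U, w)])            # Y_j | U_ij, W_j
```

Note that the inverse-gamma draw uses the fact that if \(X\sim {\varGamma }(\nu /2, \nu /2)\) (rate parameterization) then \(1/X\sim {\varGamma }^{-1}(\nu /2, \nu /2)\).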

From (D.7), the complete-data log-likelihood function for \({\varvec{\varTheta }}\) on the basis of \({\varvec{Y}}_{c}=\{{\varvec{y}}_{j},{\varvec{U}}_{ij},W_{j},{\varvec{Z}}_j\}^{n}_{j=1}\), for \(i=1,\ldots ,g\), is given by

$$\begin{aligned}&\ell _c({\varvec{{\varTheta }}}\mid {\varvec{Y}}_{c}) = \sum ^{g}_{i=1} \sum ^{n}_{j=1} Z_{ij} \log \bigg \{ \pi _i \phi _p({\varvec{y}}_{j}\mid {\varvec{A}}{\varvec{U}}_{ij}, W_{j}{\varvec{D}}) \phi _q({\varvec{U}}_{ij}\mid {\varvec{\xi }}_i \nonumber \\&\qquad + \,W_{j} {\varvec{\lambda }}_i, W_{j}{\varvec{{\varOmega }}}_i)\times \, f\left( w_{j}\mid \frac{\nu _i}{2},\frac{\nu _i}{2}\right) \bigg \}\nonumber \\&\quad =\sum ^{g}_{i=1} \sum ^{n}_{j=1} Z_{ij}\bigg \{\log \pi _i-\frac{1}{2}\log |W_j{\varvec{D}}|-\frac{1}{2}\log |W_j{\varvec{{\varOmega }}}_i|\nonumber \\&\qquad -\,\frac{W^{-1}_j}{2}\Big [({\varvec{y}}_j-{\varvec{A}}{\varvec{U}}_{ij})^{\top }{\varvec{D}}^{-1}({\varvec{y}}_j-{\varvec{A}}{\varvec{U}}_{ij})\nonumber \\&\qquad +\,({\varvec{U}}_{ij}-{\varvec{\xi }}_{i}-W_j {\varvec{\lambda }}_{i})^{\top }{\varvec{{\varOmega }}}^{-1}_i({\varvec{U}}_{ij}-{\varvec{\xi }}_{i}-W_j {\varvec{\lambda }}_{i})\Big ]\nonumber \\&\qquad + \,\frac{\nu _i}{2}\log \left( \frac{\nu _i}{2}\right) - \log {\varGamma }\left( \frac{\nu _i}{2}\right) -\left( \frac{\nu _i}{2}+1\right) \log W_{j} -\frac{\nu _i}{2W_{j}}\bigg \}. \end{aligned}$$
(D.8)

To evaluate the expected value of (D.8), called the Q function, we first calculate

$$\begin{aligned} {\hat{z}}_{ij}=E\left( Z_{ij}\mid {\varvec{y}}_{j},\hat{{\varvec{\varTheta }} }\right) =\frac{\hat{\pi }_{i}\zeta \left( {\varvec{y}}_{j};\hat{{\varvec{A}}}\hat{{\varvec{\xi }}}_i,\hat{{\varvec{A}}}\hat{{\varvec{\varOmega }}}_i\hat{{\varvec{A}}}^{\top }+\hat{{\varvec{D}}}, \hat{{\varvec{A}}}\hat{{\varvec{\lambda }}}_i,\hat{\nu }_i\right) }{f({\varvec{y}}_j; \hat{{\varvec{\varTheta }}})}, \end{aligned}$$
(D.9)

which is the posterior probability of \({\varvec{y}}_j\) belonging to the ith component of the mixture. In addition, we utilize the results (D.4) and (D.5) to calculate the following conditional expectations:

$$\begin{aligned} {\hat{s}}_{1ij}= & {} E(W_{j} \mid {\varvec{y}}_{j}, Z_{ij}=1, \hat{{\varvec{{\varTheta }}}})\nonumber \\= & {} \left( \frac{\hat{\nu }_i+\hat{{\varDelta }}_{ij}}{\hat{{\varvec{\lambda }}}^{\top }_i\hat{{\varvec{A}}}^{\top }(\hat{{\varvec{A}}}\hat{{\varvec{\varOmega }}}_i\hat{{\varvec{A}}}^{\top }+\hat{{\varvec{D}}})^{-1}\hat{{\varvec{A}}}\hat{{\varvec{\lambda }}}_i}\right) ^{1/2} \frac{K_{-\frac{(\hat{\nu }_i+p)}{2}+1}(\hat{\omega }_{ij})}{K_{-\frac{(\hat{\nu }_i+p)}{2}}(\hat{\omega }_{ij})},\nonumber \\ {\hat{s}}_{2ij}= & {} E(W_{j}^{-1} \mid {\varvec{y}}_{j}, Z_{ij}=1, \hat{{\varvec{{\varTheta }}}})\nonumber \\= & {} \left( \frac{\hat{\nu }_i+\hat{{\varDelta }}_{ij}}{\hat{{\varvec{\lambda }}}^{\top }_i\hat{{\varvec{A}}}^{\top }(\hat{{\varvec{A}}}\hat{{\varvec{\varOmega }}}_i\hat{{\varvec{A}}}^{\top }+\hat{{\varvec{D}}})^{-1}\hat{{\varvec{A}}}\hat{{\varvec{\lambda }}}_i}\right) ^{-1/2} \frac{K_{-\frac{(\hat{\nu }_i+p)}{2}-1}(\hat{\omega }_{ij})}{K_{-\frac{(\hat{\nu }_i+p)}{2}}(\hat{\omega }_{ij})},\nonumber \\ {\hat{s}}_{3ij}= & {} E(\log W_{j} \mid {\varvec{y}}_{j}, Z_{ij}=1, \hat{{\varvec{{\varTheta }}}})\nonumber \\= & {} \log \left( \frac{\hat{\nu }_i+\hat{{\varDelta }}_{ij}}{\hat{{\varvec{\lambda }}}^{\top }_i\hat{{\varvec{A}}}^{\top } (\hat{{\varvec{A}}}\hat{{\varvec{\varOmega }}}_i\hat{{\varvec{A}}}^{\top }+\hat{{\varvec{D}}})^{-1}\hat{{\varvec{A}}}\hat{{\varvec{\lambda }}}_i}\right) ^{1/2} +\frac{K^{\prime }_{-\frac{(\hat{\nu }_i+p)}{2}}(\hat{\omega }_{ij})}{K_{-\frac{(\hat{\nu }_i+p)}{2}}(\hat{\omega }_{ij})}, \end{aligned}$$
(D.10)

where \(\hat{\omega }_{ij}=\sqrt{(\hat{\nu }_i+\hat{{\varDelta }}_{ij})\hat{{\varvec{\lambda }}}^{\top }_i\hat{{\varvec{A}}}^{\top }(\hat{{\varvec{A}}}\hat{{\varvec{\varOmega }}}_i\hat{{\varvec{A}}}^{\top }+\hat{{\varvec{D}}})^{-1}\hat{{\varvec{A}}}\hat{{\varvec{\lambda }}}_i}\) and \(K^{\prime }_{-\frac{(\hat{\nu }_i+p)}{2}}(\hat{\omega }_{ij})\) is evaluated via (D.6). By (D.2), we obtain

$$\begin{aligned} \hat{{\varvec{u}}}_{ij}= & {} E\left( {\varvec{U}}_{ij} \mid {\varvec{y}}_{j}, Z_{ij}=1, \hat{{\varvec{{\varTheta }}}}\right) =E\left[ E({\varvec{U}}_{ij} \mid {\varvec{y}}_{j}, W_j,Z_{ij}=1)\mid {\varvec{y}}_{j}, Z_{ij}=1, \hat{{\varvec{{\varTheta }}}} \right] \nonumber \\= & {} \hat{{\varvec{\xi }}}_i+{\hat{s}}_{1ij}\hat{{\varvec{\lambda }}}_i+\hat{{\varvec{\gamma }}}^{\top }_i({\varvec{y}}_j-\hat{{\varvec{A}}}\hat{{\varvec{\xi }}}_i-{\hat{s}}_{1ij} \hat{{\varvec{A}}}\hat{{\varvec{\lambda }}}_i), \end{aligned}$$
(D.11)
$$\begin{aligned} \hat{{\varvec{\eta }}}_{ij}= & {} E\left( W^{-1}_j{\varvec{U}}_{ij} \mid {\varvec{y}}_{j}, Z_{ij}=1, \hat{{\varvec{{\varTheta }}}}\right) \nonumber \\= & {} E\left[ W^{-1}_jE({\varvec{U}}_{ij} \mid {\varvec{y}}_{j}, W_j,Z_{ij}=1)\mid {\varvec{y}}_{j}, Z_{ij}=1, \hat{{\varvec{{\varTheta }}}} \right] \nonumber \\= & {} \hat{{\varvec{\lambda }}}_i-\hat{{\varvec{\gamma }}}^{\top }_i\hat{{\varvec{A}}}\hat{{\varvec{\lambda }}}_i+{\hat{s}}_{2ij}\Big (\hat{{\varvec{\xi }}}_i+\hat{{\varvec{\gamma }}}^{\top }_i({\varvec{y}}_j-\hat{{\varvec{A}}}\hat{{\varvec{\xi }}}_i)\Big ), \end{aligned}$$
(D.12)

and

$$\begin{aligned} \hat{{\varvec{{\varPsi }}}}_{ij}= & {} E\left( W^{-1}_j{\varvec{U}}_{ij} {\varvec{U}}^{\top }_{ij}\mid {\varvec{y}}_{j}, Z_{ij}=1, \hat{{\varvec{{\varTheta }}}}\right) \nonumber \\= & {} E\left[ W^{-1}_jE({\varvec{U}}_{ij}{\varvec{U}}^{\top }_{ij} \mid {\varvec{y}}_{j}, W_j,Z_{ij}=1)\mid {\varvec{y}}_{j}, Z_{ij}=1, \hat{{\varvec{{\varTheta }}}} \right] \nonumber \\= & {} E\Big [ W^{-1}_j\big \{ E({\varvec{U}}_{ij}\mid {\varvec{y}}_{j}, W_j,Z_{ij}=1)E({\varvec{U}}^{\top }_{ij}\mid {\varvec{y}}_{j}, W_j,Z_{ij}=1)\nonumber \\&+\,\mathrm{cov}({\varvec{U}}_{ij}\mid {\varvec{y}}_{j}, W_j,Z_{ij}=1)\big \} \mid {\varvec{y}}_{j}, Z_{ij}=1, \hat{{\varvec{{\varTheta }}}} \Big ] \nonumber \\= & {} \hat{{\varvec{\eta }}}_{ij}\Big (\hat{{\varvec{\xi }}}_i+\hat{{\varvec{\gamma }}}^{\top }_i({\varvec{y}}_j-\hat{{\varvec{A}}}\hat{{\varvec{\xi }}}_i)\Big )^{\top } +\hat{{\varvec{u}}}_{ij}\left( \hat{{\varvec{\lambda }}}_i-\hat{{\varvec{\gamma }}}^{\top }_i\hat{{\varvec{A}}}\hat{{\varvec{\lambda }}}_i\right) ^{\top }\nonumber \\&+\left( \hat{{\varvec{{\varOmega }}}}^{-1}_i+\hat{{\varvec{A}}}^{\top }\hat{{\varvec{D}}}^{-1}\hat{{\varvec{A}}}\right) ^{-1}, \end{aligned}$$
(D.13)

where \(\hat{{\varvec{\gamma }}}_i = (\hat{{\varvec{A}}} \hat{{\varvec{{\varOmega }}}}_i \hat{{\varvec{A}}}^{\top } + \hat{{\varvec{D}}})^{-1}\hat{{\varvec{A}}}\hat{{\varvec{{\varOmega }}}}_i\).
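The conditional expectations in (D.10) are plain GIG moments and can be coded directly; a sketch (the function name and input values are illustrative, and \(K^{\prime }_{r}\) is approximated by a central finite difference rather than by integrating (D.6)):

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the third kind

def gig_conditional_moments(Delta, nu, psi, p):
    """s1 = E(W|y), s2 = E(1/W|y), s3 = E(log W|y) for
    W | y ~ GIG(psi, nu + Delta, -(nu + p)/2), following (D.4)-(D.5)."""
    chi = nu + Delta
    r = -(nu + p) / 2.0
    omega = np.sqrt(chi * psi)               # the quantity omega_ij in (D.10)
    ratio = np.sqrt(chi / psi)
    s1 = ratio * kv(r + 1, omega) / kv(r, omega)
    s2 = kv(r - 1, omega) / (ratio * kv(r, omega))
    h = 1e-6                                 # K'_r by central finite difference
    kprime = (kv(r + h, omega) - kv(r - h, omega)) / (2 * h)
    s3 = np.log(ratio) + kprime / kv(r, omega)
    return s1, s2, s3

# Illustrative inputs: Delta plays the role of the Mahalanobis-type distance,
# psi the scalar quadratic form lambda' A' (A Omega A' + D)^{-1} A lambda
s1, s2, s3 = gig_conditional_moments(Delta=3.0, nu=5.0, psi=0.8, p=4)
```

Jensen's inequality provides useful sanity checks on such E-step code: \(E(W)E(W^{-1})\ge 1\) and \(\log E(W)\ge E(\log W)\).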

After some algebraic manipulations, the resulting Q function, with additive constants omitted, is given by

$$\begin{aligned} Q({\varvec{{\varTheta }}}\mid \hat{{\varvec{{\varTheta }}}})= & {} \sum ^{g}_{i=1}\sum ^{n}_{j=1}{\hat{z}}_{ij}\Bigg \{\log \pi _i -\frac{1}{2}\log |{\varvec{D}}| -\frac{1}{2}\log |{\varvec{{\varOmega }}}_i| -\frac{1}{2} \mathrm{tr}\Big ({\varvec{{\varvec{D}}}}^{-1}\Big [{\hat{s}}_{2ij}{\varvec{y}}_j{\varvec{y}}^{\top }_j\nonumber \\&-\,{\varvec{y}}_j\hat{{\varvec{\eta }}}_{ij}^{\top }{\varvec{A}}^{\top } -{\varvec{A}}\hat{{\varvec{\eta }}}_{ij}{\varvec{y}}^{\top }_j+{\varvec{A}}\hat{{\varvec{\varPsi }}}_{ij}{\varvec{A}}^{\top }\Big ]\Big )\nonumber \\&-\,\frac{1}{2}\mathrm{tr}\bigg ({\varvec{{\varvec{{\varOmega }}}}}^{-1}_i\Big [\hat{{\varvec{{\varPsi }}}}_{ij}-\hat{{\varvec{\eta }}}_{ij}{\varvec{\xi }}_{i}^{\top }-{\varvec{\xi }}_{i}\hat{{\varvec{\eta }}}_{ij}^{\top }\nonumber \\&+\,{\hat{s}}_{2ij}{\varvec{\xi }}_{i}{\varvec{\xi }}_{i}^{\top }+{\hat{s}}_{1ij}{\varvec{\lambda }}_i{\varvec{\lambda }}^{\top }_i-(\hat{{\varvec{u}}}_{ij}-{\varvec{\xi }}_i){\varvec{\lambda }}^{\top }_i-{\varvec{\lambda }}_i(\hat{{\varvec{u}}}_{ij}-{\varvec{\xi }}_i)^{\top }\Big ]\bigg )\nonumber \\&+ \,\left( \frac{\nu _i}{2}\right) \log \left( \frac{\nu _i}{2}\right) - \frac{\nu _i}{2}{\hat{s}}_{2ij}-\frac{\nu _i}{2}{\hat{s}}_{3ij}- \log {\varGamma }\left( \frac{\nu _i}{2}\right) \Bigg \}. \end{aligned}$$
(D.14)

Taking partial derivatives of (D.14) with respect to \({\varvec{\xi }}_i\) and \({\varvec{\lambda }}_i\) and equating them to zero vectors yield

$$\begin{aligned}&\sum ^{n}_{j=1}{\hat{z}}_{ij}\left( \hat{{\varvec{\eta }}}_{ij}-{\hat{s}}_{2ij}{\varvec{\xi }}_{i}-{\varvec{\lambda }}_{i}\right) ={\varvec{0}},\end{aligned}$$
(D.15)
$$\begin{aligned}&\sum ^{n}_{j=1}{\hat{z}}_{ij}\left( \hat{{\varvec{u}}}_{ij}-{\varvec{\xi }}_{i}-{\hat{s}}_{1ij}{\varvec{\lambda }}_{i}\right) ={\varvec{0}}. \end{aligned}$$
(D.16)

In summary, the ECM algorithm for estimating the parameters of MCghstFA proceeds as follows:

  1. E-step:

    Given the current value \({\varvec{\varTheta }}=\hat{{\varvec{\varTheta }}}\), compute \({\hat{z}}_{ij}\), \({\hat{s}}_{1ij}\), \({\hat{s}}_{2ij}\), \({\hat{s}}_{3ij}\), \(\hat{{\varvec{u}}}_{ij}\), \(\hat{{\varvec{\eta }}}_{ij}\) and \(\hat{{\varvec{{\varPsi }}}}_{ij}\) as defined in (D.9)–(D.13) for \(i=1,\ldots ,g\) and \(j=1,\ldots ,n\).

  2. CM step 1:

    Maximizing (D.14) with respect to \(\pi _i\) via the Lagrange multiplier method gives \(\hat{\pi }_i={\hat{n}}_i/n\), where \({\hat{n}}_i=\sum _{j=1}^n {\hat{z}}_{ij}\).

  3. CM step 2:

    Update the parameters \({\varvec{\xi }}_i\) and \({\varvec{\lambda }}_i\) by solving Eqs. (D.15) and (D.16) simultaneously. Simple matrix algebra yields

    $$\begin{aligned} \hat{{\varvec{\xi }}}_i= & {} \frac{\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{1ij}\big )\big (\sum _{j=1}^n{\hat{z}}_{ij}\hat{{\varvec{\eta }}}_{ij}\big )- {\hat{n}}_i\big (\sum _{j=1}^n{\hat{z}}_{ij}\hat{{\varvec{u}}}_{ij}\big )}{\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{1ij}\big )\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{2ij}\big )-{\hat{n}}^2_i} \end{aligned}$$

    and

    $$\begin{aligned} \hat{{\varvec{\lambda }}}_i= & {} \frac{\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{2ij}\big )\big (\sum _{j=1}^n{\hat{z}}_{ij}\hat{{\varvec{u}}}_{ij}\big )- {\hat{n}}_i\big (\sum _{j=1}^n{\hat{z}}_{ij}\hat{{\varvec{\eta }}}_{ij}\big )}{\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{1ij}\big )\big (\sum _{j=1}^n{\hat{z}}_{ij}{\hat{s}}_{2ij}\big )-{\hat{n}}^2_i}. \end{aligned}$$
  4. CM step 3:

    The updates for \({\varvec{A}}\), \({\varvec{{\varOmega }}}_i\) and \({\varvec{D}}\) are given by

    $$\begin{aligned} \hat{{\varvec{A}}}= & {} \left( \sum _{i=1}^g\sum _{j=1}^n{\hat{z}}_{ij}{\varvec{y}}_j\hat{{\varvec{\eta }}}_{ij}^{\top }\right) \left( \sum _{i=1}^g\sum _{j=1}^n{\hat{z}}_{ij}\hat{{\varvec{{\varPsi }}}}_{ij}\right) ^{-1},\\ \hat{{\varvec{{\varOmega }}}}_i= & {} \frac{1}{{\hat{n}}_i}\sum _{j=1}^n{\hat{z}}_{ij}\Big [\hat{{\varvec{{\varPsi }}}}_{ij}-\hat{{\varvec{\eta }}}_{ij}\hat{{\varvec{\xi }}}_{i}^{\top } -\hat{{\varvec{\xi }}}_{i}\hat{{\varvec{\eta }}}_{ij}^{\top }+{\hat{s}}_{2ij}\hat{{\varvec{\xi }}}_{i}\hat{{\varvec{\xi }}}_{i}^{\top }+{\hat{s}}_{1ij}\hat{{\varvec{\lambda }}}_i\hat{{\varvec{\lambda }}}^{\top }_i\\&-\,(\hat{{\varvec{u}}}_{ij}-\hat{{\varvec{\xi }}}_i)\hat{{\varvec{\lambda }}}^{\top }_i-\hat{{\varvec{\lambda }}}_i(\hat{{\varvec{u}}}_{ij}-\hat{{\varvec{\xi }}}_i)^{\top }\Big ],\\ \hat{{\varvec{D}}}= & {} \frac{1}{n}\mathrm{Diag}\Bigg \{\sum _{i=1}^g\sum _{j=1}^n{\hat{z}}_{ij}\big ({\hat{s}}_{2ij}{\varvec{y}}_j{\varvec{y}}^{\top }_j-{\varvec{y}}_j\hat{{\varvec{\eta }}}_{ij}^{\top }\hat{{\varvec{A}}}^{\top }\big )\Bigg \}. \end{aligned}$$
  5. CM step 4:

    Obtain \(\hat{\nu }_i\) as the root of the following equation:

    $$\begin{aligned} \log \left( \frac{\nu _i}{2}\right) - \mathrm{DG}\left( \frac{\nu _i}{2}\right) + 1 -\frac{1}{{\hat{n}}_i} \sum _{j=1}^n {\hat{z}}_{ij}({\hat{s}}_{2ij} + {\hat{s}}_{3ij})=0. \end{aligned}$$
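CM step 4 is a one-dimensional root-finding problem that any bracketing solver handles; a sketch using `scipy.optimize.brentq` (the function name and the E-step input values are illustrative placeholders):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

def update_nu(z, s2, s3):
    """Root of log(nu/2) - DG(nu/2) + 1 - (1/n_i) sum_j z*(s2 + s3) = 0."""
    c = np.sum(z * (s2 + s3)) / np.sum(z)
    f = lambda nu: np.log(nu / 2.0) - digamma(nu / 2.0) + 1.0 - c
    # f -> +inf as nu -> 0 and f -> 1 - c < 0 for c > 1, so a root is bracketed
    return brentq(f, 1e-2, 200.0)

# Illustrative E-step quantities for one component (not from the paper)
rng = np.random.default_rng(3)
z = rng.uniform(0.2, 1.0, size=50)    # posterior membership weights
s2 = rng.uniform(1.0, 1.6, size=50)   # E(1/W_j | y_j)
s3 = rng.uniform(0.05, 0.3, size=50)  # E(log W_j | y_j)
nu_hat = update_nu(z, s2, s3)
```

In practice one would also cap the search interval (here \([0.01, 200]\)) to keep the degrees of freedom within a numerically sensible range.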

Cite this article

Wang, WL., Castro, L.M., Chang, YT. et al. Mixtures of restricted skew-t factor analyzers with common factor loadings. Adv Data Anal Classif 13, 445–480 (2019). https://doi.org/10.1007/s11634-018-0317-2
