
Unsupervised Classification with a Family of Parsimonious Contaminated Shifted Asymmetric Laplace Mixtures


Abstract

A family of parsimonious contaminated shifted asymmetric Laplace mixtures is developed for the unsupervised classification of asymmetric clusters in the presence of outliers and noise. A series of constraints is applied to a modified factor analyzer structure for the component scale matrices, yielding a family of twelve models. The modified factor analyzer structure and these parsimonious constraints make the models effective for the analysis of high-dimensional data by reducing the number of free parameters that must be estimated. A variant of the expectation-maximization algorithm is developed for parameter estimation, and convergence issues are discussed and addressed. Popular model selection criteria, such as the Bayesian information criterion and the integrated complete likelihood (ICL), are utilized, and a novel modification of the ICL is also considered. Through a series of simulation studies and real data analyses that include comparisons to well-established methods, we demonstrate the improvement in classification performance obtained using the proposed family of models.


Data Availability

All data analyzed in this study are presented in full within the article.


Acknowledgements

We would like to thank the editor and the three anonymous referees for their constructive feedback, which helped us improve the paper.

Funding

This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.

Author information


Corresponding author

Correspondence to Brian C. Franczak.

Ethics declarations

Ethics Approval

The research study did not involve any human participants or animals.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Updates Used in the AECM Presented in Section 4

A.1 E-Step in the First Alternation

In the E-step of the first alternation of iteration \((k+1)\) of the proposed AECM, the sources of missing data required to compute Eq. 31 are replaced with the expected values \(z^{(k+1)}_{ig}\), \(v^{(k+1)}_{ig}\), \(E^{(k+1)}_{1ig}\), \(E^{(k+1)}_{2ig}\), \(\widetilde{E}^{(k+1)}_{1ig}, \widetilde{E}^{(k+1)}_{2ig}\), respectively, for \(g=1,\ldots ,G\). Formally, these expected values can be written as follows:

$$\begin{aligned} z^{(k+1)}_{ig}&:=\mathbb {E}[Z_{ig} \mid \textbf{X}_i = \textbf{x}_i]= \frac{\pi ^{(k)}_g f_{\text {CSAL}}(\textbf{x}_i\ |\ \rho ^{(k)}_g,\eta ^{(k)}_g,\varvec{\mu }^{(k)}_g,\varvec{\alpha }^{(k)}_g,\varvec{\Sigma }^{(k)}_g)}{\sum ^G_{h=1}\pi ^{(k)}_hf_{\text {CSAL}}(\textbf{x}_i\ |\ \rho ^{(k)}_h,\eta ^{(k)}_h,\varvec{\mu }^{(k)}_h,\varvec{\alpha }^{(k)}_h,\varvec{\Sigma }^{(k)}_h)},\\ v^{(k+1)}_{ig}&:=\mathbb {E}[V_{ig} \mid \textbf{X}_i = \textbf{x}_i]=\frac{\rho ^{(k)}_gf_{\text {SAL}}(\textbf{x}_i\ |\ \varvec{\mu }^{(k)}_g,\varvec{\alpha }^{(k)}_g,\varvec{\Sigma }^{(k)}_g)}{f_{\text {CSAL}}(\textbf{x}_i\ |\ \rho ^{(k)}_g,\eta ^{(k)}_g,\varvec{\mu }^{(k)}_g,\varvec{\alpha }^{(k)}_g,\varvec{\Sigma }^{(k)}_g)},\\ E^{(k+1)}_{1ig}&:=\mathbb {E}[W_{ig} \mid \textbf{X}_i = \textbf{x}_i,Z_{ig}=1,V_{ig}=1]=\sqrt{\frac{b^{(k)}_{ig}}{a^{(k)}_g}}R_{\upsilon }\left( \sqrt{a^{(k)}_g b^{(k)}_{ig}}\right) , \\ E^{(k+1)}_{2ig}&:=\mathbb {E}[1/W_{ig} \mid \textbf{X}_i = \textbf{x}_i,Z_{ig}=1,V_{ig}=1]=\sqrt{\frac{a^{(k)}_g}{b^{(k)}_{ig}}}R_{\upsilon }\left( \sqrt{a^{(k)}_g b^{(k)}_{ig}}\right) -\frac{2\upsilon }{b^{(k)}_{ig}}, \\ \widetilde{E}^{(k+1)}_{1ig}&:=\mathbb {E}\left[ \widetilde{W}_{ig} \mid \textbf{X}_i = \textbf{x}_i,Z_{ig}=1,V_{ig}=0\right] =\sqrt{\frac{\widetilde{b}^{(k)}_{ig}}{a^{(k)}_g}}R_{\upsilon }\left( \sqrt{a^{(k)}_g \widetilde{b}^{(k)}_{ig}}\right) , \\ \widetilde{E}^{(k+1)}_{2ig}&:=\mathbb {E}\left[ 1/\widetilde{W}_{ig} \mid \textbf{X}_i = \textbf{x}_i,Z_{ig}=1,V_{ig}=0\right] =\sqrt{\frac{a^{(k)}_g}{\widetilde{b}^{(k)}_{ig}}}R_{\upsilon }\left( \sqrt{a^{(k)}_g \widetilde{b}^{(k)}_{ig}}\right) -\frac{2\upsilon }{\widetilde{b}^{(k)}_{ig}} \end{aligned}$$

where \(a^{(k)}_g=2+\varvec{\alpha}^{(k)'}_g(\varvec{\Sigma}^{(k)}_g)^{-1}\varvec{\alpha}^{(k)}_g\), \(b^{(k)}_{ig}=\delta(\textbf{x}_i,\varvec{\mu}^{(k)}_g\mid \varvec{\Sigma}^{(k)}_g)\), \(\widetilde{b}^{(k)}_{ig}=\delta(\textbf{x}_i,\varvec{\mu}^{(k)}_g\mid \eta^{(k)}_g\varvec{\Sigma}^{(k)}_g)\), \(\upsilon = (2-p)/2\), \(\varvec{\Sigma}_g^{(k)} = \varvec{\Lambda}_g^{(k)}\varvec{\Lambda}_g^{(k)'} + \omega_g^{(k)}\varvec{\Delta}_g^{(k)}\), and all other terms are as defined for Eqs. 6–9 in Section 2.1. The closed-form updates for \(E_{1ig}\), \(E_{2ig}\), \(\widetilde{E}_{1ig}\), and \(\widetilde{E}_{2ig}\) follow from the conditional relationship between a standard exponential random variable and a SAL random vector given in Eq. 14.
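To make these quantities concrete, the following Python sketch computes \(z_{ig}\), \(v_{ig}\), and the pair \((E_{1ig}, E_{2ig})\) for a single observation. It assumes that \(R_{\upsilon}(x)\) denotes the Bessel-function ratio \(K_{\upsilon+1}(x)/K_{\upsilon}(x)\), which is standard for conditional moments of this generalized inverse Gaussian form; the density callables f_sal and f_csal are hypothetical placeholders for the SAL and CSAL densities, not functions defined in this paper.

```python
# A minimal sketch (not the authors' implementation) of the A.1 E-step.
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind


def bessel_ratio(nu, x):
    """Assumed definition: R_nu(x) = K_{nu+1}(x) / K_nu(x)."""
    return kv(nu + 1.0, x) / kv(nu, x)


def gig_moments(a_g, b_ig, nu):
    """E[W | x, Z=1, V=1] and E[1/W | x, Z=1, V=1]; the tilde versions
    use b_ig evaluated with the inflated scale eta_g * Sigma_g instead."""
    root = np.sqrt(a_g * b_ig)
    E1 = np.sqrt(b_ig / a_g) * bessel_ratio(nu, root)
    E2 = np.sqrt(a_g / b_ig) * bessel_ratio(nu, root) - 2.0 * nu / b_ig
    return E1, E2


def e_step_weights(x, comps, f_sal, f_csal):
    """Component weights z_ig and 'good point' weights v_ig for one x.

    comps: list of dicts with keys pi, rho, eta, mu, alpha, Sigma;
    f_sal(x, mu, alpha, Sigma) and f_csal(x, comp) are user-supplied
    density callables (hypothetical signatures)."""
    num = np.array([c["pi"] * f_csal(x, c) for c in comps])
    z = num / num.sum()
    v = np.array([c["rho"] * f_sal(x, c["mu"], c["alpha"], c["Sigma"])
                  / f_csal(x, c) for c in comps])
    return z, v
```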

A.2 CM-Step 1 in the First Alternation

In the first CM-step of the first alternation of iteration \((k+1)\) of the proposed AECM, the maximum likelihood estimates (MLEs) for the parameters in \(\varvec{\vartheta }_{11}=\{\pi _g,\rho _g,\varvec{\mu }_g,\varvec{\alpha }_g\}_{g=1}^G\) are given by

$$\begin{aligned} {\pi}^{(k+1)}_g = \frac{n_g^{(k)}}{n}, \end{aligned}$$
(43)
$$\begin{aligned} {\rho}^{(k+1)}_g = \frac{n^{(k)}_{g,\text{good}}}{n^{(k)}_g}, \end{aligned}$$
(44)
$$\begin{aligned} {\varvec{\mu}}^{(k+1)}_g = \frac{B^{(k)}\sum^n_{i=1}a^{(k)}_{ig}\textbf{x}_i-C^{(k)}\sum^n_{i=1}b^{(k)}_{ig}\textbf{x}_i}{B^{(k)}A^{(k)}-(C^{(k)})^2}, \end{aligned}$$
(45)
$$\begin{aligned} {\varvec{\alpha}}^{(k+1)}_g = \frac{A^{(k)}\sum^n_{i=1} b_{ig}^{(k)}\textbf{x}_i-C^{(k)}\sum^n_{i=1}a^{(k)}_{ig}\textbf{x}_i}{B^{(k)}A^{(k)}-(C^{(k)})^2}, \end{aligned}$$
(46)

where

$$\begin{aligned} a^{(k)}_{ig}&=z^{(k)}_{ig}\left(v^{(k)}_{ig} E^{(k)}_{2ig} + \frac{1-v^{(k)}_{ig}}{\eta_g^{(k)}}\widetilde{E}^{(k)}_{2ig}\right),\quad b^{(k)}_{ig} = z^{(k)}_{ig}\left(v^{(k)}_{ig}+\frac{1-v^{(k)}_{ig}}{\sqrt{\eta_g^{(k)}}}\right),\\ A^{(k)}&= \sum^n_{i=1} z^{(k)}_{ig}\left(v^{(k)}_{ig} E^{(k)}_{2ig} + \frac{1-v^{(k)}_{ig}}{\eta_g^{(k)}}\widetilde{E}^{(k)}_{2ig}\right), \quad B^{(k)} = \sum^n_{i=1} z^{(k)}_{ig}\left(v^{(k)}_{ig} E^{(k)}_{1ig} + (1-v^{(k)}_{ig})\widetilde{E}^{(k)}_{1ig}\right),\\ C^{(k)}&= \sum^n_{i=1} z^{(k)}_{ig}\left(v^{(k)}_{ig} + \frac{1-v^{(k)}_{ig}}{\sqrt{\eta_g^{(k)}}}\right), \end{aligned}$$

and \(n_g^{(k)} = \sum_{i=1}^n z_{ig}^{(k)}\) and \(n_{g,\text{good}}^{(k)} = \sum_{i=1}^n z_{ig}^{(k)}v_{ig}^{(k)}\) were defined for Eq. 31.
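As an illustration of how these closed-form updates assemble, a NumPy sketch for a single component \(g\) follows; the vectors z, v, E1, E2, E1t, E2t hold the E-step quantities of Appendix A.1 (the t suffix marking the tilde versions), and eta holds \(\eta_g^{(k)}\). These names are assumptions of the sketch, not the paper's notation.

```python
# A hedged sketch of CM-step 1 (Eqs. 43-46) for one component g.
import numpy as np


def cm_step1_component(X, z, v, E1, E2, E1t, E2t, eta):
    """X is (n, p); z, v, E1, E2, E1t, E2t are length-n E-step vectors."""
    n = X.shape[0]
    a = z * (v * E2 + (1.0 - v) / eta * E2t)        # a_ig
    b = z * (v + (1.0 - v) / np.sqrt(eta))          # b_ig
    A = a.sum()                                     # A = sum_i a_ig
    B = (z * (v * E1 + (1.0 - v) * E1t)).sum()      # B
    C = b.sum()                                     # C = sum_i b_ig
    denom = B * A - C ** 2
    pi_g = z.sum() / n                              # Eq. (43)
    rho_g = (z * v).sum() / z.sum()                 # Eq. (44)
    mu_g = (B * (a[:, None] * X).sum(axis=0)        # Eq. (45)
            - C * (b[:, None] * X).sum(axis=0)) / denom
    alpha_g = (A * (b[:, None] * X).sum(axis=0)     # Eq. (46)
               - C * (a[:, None] * X).sum(axis=0)) / denom
    return pi_g, rho_g, mu_g, alpha_g
```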

A.3 E-Step in the Second Alternation

For the factor analysis model defined in Eq. 18,

$$\begin{aligned} \textbf{X}= \varvec{\mu }+ \varvec{\Lambda }\textbf{u}+ \varvec{\epsilon }, \end{aligned}$$

we have \(\textbf{X}\sim \text {N}_p\left( \varvec{\mu },\varvec{\Lambda }\varvec{\Lambda }'+\varvec{\Psi }\right) \) and \(\textbf{u}\sim \text {N}_q\left( \textbf{0},\textbf{I}_q\right) \), where all terms are as defined for Eq. 18. So, one can show that

$$\begin{aligned} \textbf{X}\mid \textbf{U}=\textbf{u}&\sim \text{N}_p\left(\varvec{\mu}+\varvec{\Lambda}\textbf{u},\varvec{\Psi}\right), \\ \textbf{U}\mid \textbf{X}=\textbf{x}&\sim \text{N}_q\left(\varvec{\Lambda}'\left(\varvec{\Lambda}\varvec{\Lambda}' + \varvec{\Psi}\right)^{-1}\left(\textbf{x}-\varvec{\mu}\right),\ \textbf{I}_q - \varvec{\Lambda}'\left(\varvec{\Lambda}\varvec{\Lambda}' + \varvec{\Psi}\right)^{-1}\varvec{\Lambda}\right). \end{aligned}$$

It follows that the conditional expectations for the latent factor \(\textbf{U}\) and the outer product \(\textbf{U}\textbf{U}'\) are given by

$$\begin{aligned} \mathbb{E}\big[\textbf{U}\mid \textbf{X}= \textbf{x}\big]&= \varvec{\beta}(\textbf{x}-\varvec{\mu}^*), \end{aligned}$$
(47)
$$\begin{aligned} \mathbb{E}\big[\textbf{U}\textbf{U}' \mid \textbf{X}= \textbf{x}\big]&= \textbf{I}_q - \varvec{\beta}\varvec{\Lambda}+\varvec{\beta}(\textbf{x}-\varvec{\mu}^*)(\textbf{x}-\varvec{\mu}^*)'\varvec{\beta}', \end{aligned}$$
(48)

where \(\varvec{\beta}=\varvec{\Lambda}'(\varvec{\Lambda}\varvec{\Lambda}'+\varvec{\Psi})^{-1}\). Using Eqs. 47 and 48, we can write the expected values required to compute the expectation of Eq. 35 in the E-step of the second alternation. Formally, these expected values can be written as follows:

$$\begin{aligned} \mathbb{E}\left[Z_{ig}\left(V_{ig}+\frac{1-V_{ig}}{\sqrt{\eta_g^{(k+1)}}}\right)\textbf{U}_{ig}\mid \textbf{X}_i = \textbf{x}_i\right]&=z_{ig}^{(k+1/2)}\varvec{\beta}_g^{(k)}\left(v^{(k+1/2)}_{ig}+\frac{1-v^{(k+1/2)}_{ig}}{\sqrt{\eta_g^{(k+1)}}}\right)(\textbf{x}_i-\varvec{\mu}_g^{(k+1)}) \\&\quad -z_{ig}^{(k+1/2)}\varvec{\beta}_g^{(k)}\left(v^{(k+1/2)}_{ig}E^{(k+1/2)}_{1ig}+\left(1-v^{(k+1/2)}_{ig}\right)\widetilde{E}^{(k+1/2)}_{1ig}\right)\varvec{\alpha}^{(k+1)}_g, \\ \mathbb{E}\left[Z_{ig}V_{ig}W^{-1}_{ig}\textbf{U}_{ig} \mid \textbf{X}_i = \textbf{x}_i\right]&=z_{ig}^{(k+1/2)}v_{ig}^{(k+1/2)}\varvec{\beta}_g^{(k)}\left(E^{(k+1/2)}_{2ig}(\textbf{x}_i-\varvec{\mu}_g^{(k+1)})-\varvec{\alpha}^{(k+1)}_g\right), \\ \mathbb{E}\left[Z_{ig}\left(\frac{1-V_{ig}}{\eta_g^{(k+1)}}\right)\widetilde{W}^{-1}_{ig}\textbf{U}_{ig} \mid \textbf{X}_i = \textbf{x}_i\right]&=z_{ig}^{(k+1/2)}\varvec{\beta}_g^{(k)}\left(\frac{1-v_{ig}^{(k+1/2)}}{\eta_g^{(k+1)}}\widetilde{E}^{(k+1/2)}_{2ig}(\textbf{x}_i-\varvec{\mu}_g^{(k+1)})-\frac{1-v_{ig}^{(k+1/2)}}{\sqrt{\eta_g^{(k+1)}}}\varvec{\alpha}^{(k+1)}_g\right), \\ \mathbb{E}\left[Z_{ig}V_{ig}W^{-1}_{ig}\textbf{U}_{ig}\textbf{U}_{ig}' \mid \textbf{X}_i = \textbf{x}_i\right]&=z_{ig}^{(k+1/2)}v^{(k+1/2)}_{ig}\bigg[\textbf{I}_q -\varvec{\beta}^{(k)}_g\varvec{\Lambda}_g^{(k)} + \varvec{\beta}_g^{(k)}\bigg(E^{(k+1/2)}_{2ig}(\textbf{x}_i-\varvec{\mu}_g^{(k+1)})(\textbf{x}_i-\varvec{\mu}_g^{(k+1)})' \\&\quad -2(\textbf{x}_i-\varvec{\mu}_g^{(k+1)})(\varvec{\alpha}_g^{(k+1)})' +E^{(k+1/2)}_{1ig}\varvec{\alpha}_g^{(k+1)}(\varvec{\alpha}_g^{(k+1)})'\bigg)\left(\varvec{\beta}_g^{(k)}\right)'\bigg], \\ \mathbb{E}\left[Z_{ig}\left(\frac{1-V_{ig}}{\eta_g^{(k+1)}}\right)\widetilde{W}^{-1}_{ig}\textbf{U}_{ig}\textbf{U}_{ig}' \mid \textbf{X}_i = \textbf{x}_i\right]&=z_{ig}^{(k+1/2)}\frac{1-v^{(k+1/2)}_{ig}}{\eta_g^{(k+1)}}\bigg[\textbf{I}_q -\varvec{\beta}^{(k)}_g\varvec{\Lambda}_g^{(k)} + \varvec{\beta}_g^{(k)}\bigg(\widetilde{E}^{(k+1/2)}_{2ig}(\textbf{x}_i-\varvec{\mu}_g^{(k+1)})(\textbf{x}_i-\varvec{\mu}_g^{(k+1)})' \\&\quad -2\sqrt{\eta_g^{(k+1)}}(\textbf{x}_i-\varvec{\mu}_g^{(k+1)})(\varvec{\alpha}_g^{(k+1)})' +\eta_g^{(k+1)}\widetilde{E}^{(k+1/2)}_{1ig}\varvec{\alpha}_g^{(k+1)}(\varvec{\alpha}_g^{(k+1)})'\bigg)\left(\varvec{\beta}_g^{(k)}\right)'\bigg], \end{aligned}$$

where \(z_{ig}^{(k+1/2)}\), \(v^{(k+1/2)}_{ig}\), \(E^{(k+1/2)}_{1ig}\), \(\widetilde{E}^{(k+1/2)}_{1ig}\), \(E^{(k+1/2)}_{2ig}\), and \(\widetilde{E}^{(k+1/2)}_{2ig}\) are computed using the updates given in Appendix A.1, but with the required model parameters replaced with their estimates from iteration \((k+1)\), i.e., from the first alternation of the proposed AECM detailed in Section 4.
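For completeness, the sketch below shows how the factor-score moments in Eqs. 47 and 48 might be computed in practice, using the Woodbury identity to avoid inverting the full \(p\times p\) matrix when \(q\ll p\); storing \(\varvec{\Psi}\) as a vector of diagonal entries is an implementation convenience assumed here, not the paper's notation.

```python
# A sketch of Eqs. 47-48; psi holds the diagonal of Psi as a length-p vector.
import numpy as np


def factor_moments(x, mu_star, Lambda, psi):
    """E[U | X = x] and E[UU' | X = x] for one factor analyzer."""
    p, q = Lambda.shape
    psi_inv = 1.0 / psi
    # Woodbury identity: (Lambda Lambda' + Psi)^{-1} =
    #   Psi^{-1} - Psi^{-1} Lambda (I_q + Lambda' Psi^{-1} Lambda)^{-1}
    #   Lambda' Psi^{-1}
    M = np.eye(q) + (Lambda.T * psi_inv) @ Lambda                # q x q
    Sigma_inv = (np.diag(psi_inv)
                 - (psi_inv[:, None] * Lambda)
                 @ np.linalg.solve(M, Lambda.T * psi_inv))
    beta = Lambda.T @ Sigma_inv                                  # q x p
    r = beta @ (x - mu_star)
    EU = r                                                       # Eq. (47)
    EUU = np.eye(q) - beta @ Lambda + np.outer(r, r)             # Eq. (48)
    return EU, EUU
```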

A.4 Illustrative Example of the CM-Step in the Second Alternation

To fit the CCUU model, the CM-Step of the second alternation of the proposed AECM is as follows. First, recall that the CCUU model scale matrix has the form

$$\begin{aligned} \varvec{\Sigma }_g=\varvec{\Lambda }\varvec{\Lambda }'+\omega _g\varvec{\Delta }, \end{aligned}$$
(49)

where all terms are as defined for Eqs. 18 and 29 for \(g=1,\ldots ,G\). Let

$$\begin{aligned} \textbf{S}^{(k+1)}&= \sum^G_{g=1}\pi_g^{(k+1)}\textbf{S}_g^{(k+1)}, \\ \varvec{\Theta}_g^{(k+1/2)}&= \textbf{I}_q-\varvec{\beta}_g^{(k)}\varvec{\Lambda}^{(k)} +\varvec{\beta}_g^{(k)}\textbf{S}_g^{(k+1)}\varvec{\beta}_g^{(k)'}, \end{aligned}$$

with \(\varvec{\beta}_g^{(k)}\) defined according to the imposed constraints. Maximizing Eq. 37 gives the following updating equations:

$$\begin{aligned} \varvec{\beta}_g^{(k)}&= \varvec{\Lambda}^{(k)'}\left(\varvec{\Lambda}^{(k)}\varvec{\Lambda}^{(k)'}+\omega_g^{(k)}\varvec{\Delta}^{(k)}\right)^{-1}, \\ \varvec{\Lambda}^{(k+1)}&= \left[\sum^G_{g=1}\frac{n^{(k+1/2)}_g}{\omega_g^{(k)}}\textbf{S}_g^{(k+1)}\varvec{\beta}_g^{(k)'}\right]\left[\sum^G_{g=1}\frac{n^{(k+1/2)}_g}{\omega_g^{(k)}}\varvec{\Theta}_g^{(k+1/2)}\right]^{-1}, \\ \omega_g^{(k+1)}&= \frac{1}{p}\text{tr}\left\{\left(\varvec{\Delta}^{(k)}\right)^{-1}\left[\textbf{S}_g^{(k+1)} -2\varvec{\Lambda}^{(k+1)}\varvec{\beta}_g^{(k)}\textbf{S}_g^{(k+1)}+\varvec{\Lambda}^{(k+1)}\varvec{\Theta}_g^{(k+1/2)}\varvec{\Lambda}^{(k+1)'}\right]\right\}, \\ \varvec{\Delta}^{(k+1)}&= \frac{1}{\kappa}\text{diag}\left\{\varvec{\Xi}^{(k+1/2)}\right\}, \end{aligned}$$

where

$$\begin{aligned} \varvec{\Xi }^{(k+1/2)}&= \sum ^G_{g=1}\frac{n_g^{(k+1/2)}}{\omega _g^{(k+1)}}\left[ \textbf{S}_g^{(k+1)}-2\varvec{\Lambda }^{(k+1)}\varvec{\beta }_g^{(k)}\textbf{S}_g^{(k+1)} + \varvec{\Lambda }^{(k+1)}\varvec{\Theta }_g^{(k+1/2)}\varvec{\Lambda }^{(k+1)'}\right] , \\ \kappa&= \left( \prod ^p_{j=1}\varvec{\Xi }_{jj}^{(k+1/2)}\right) ^{1/p}. \end{aligned}$$
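Gathering these formulas, one pass of the CCUU CM-step could be sketched as follows; S_list, n_g, Lambda, Delta, and omega are assumed containers for \(\textbf{S}_g^{(k+1)}\), \(n_g^{(k+1/2)}\), \(\varvec{\Lambda}^{(k)}\), the diagonal of \(\varvec{\Delta}^{(k)}\), and \(\omega_g^{(k)}\), respectively.

```python
# A hedged sketch of the A.4 CCUU updates; Delta is a length-p vector
# holding the diagonal of the common Delta matrix.
import numpy as np


def ccuu_cm_step(S_list, n_g, Lambda, Delta, omega):
    p, q = Lambda.shape
    G = len(S_list)
    betas, Thetas = [], []
    for g in range(G):
        Sigma_g = Lambda @ Lambda.T + omega[g] * np.diag(Delta)
        beta_g = Lambda.T @ np.linalg.inv(Sigma_g)               # q x p
        Theta_g = (np.eye(q) - beta_g @ Lambda
                   + beta_g @ S_list[g] @ beta_g.T)              # q x q
        betas.append(beta_g)
        Thetas.append(Theta_g)
    # Common loading matrix Lambda^{(k+1)}.
    lhs = sum(n_g[g] / omega[g] * S_list[g] @ betas[g].T for g in range(G))
    rhs = sum(n_g[g] / omega[g] * Thetas[g] for g in range(G))
    Lambda_new = lhs @ np.linalg.inv(rhs)
    # Component-specific scale omega_g^{(k+1)}.
    omega_new = np.array([
        np.trace(np.diag(1.0 / Delta)
                 @ (S_list[g] - 2.0 * Lambda_new @ betas[g] @ S_list[g]
                    + Lambda_new @ Thetas[g] @ Lambda_new.T)) / p
        for g in range(G)])
    # Common diagonal Delta^{(k+1)}, normalized so that |Delta| = 1.
    Xi = sum(n_g[g] / omega_new[g]
             * (S_list[g] - 2.0 * Lambda_new @ betas[g] @ S_list[g]
                + Lambda_new @ Thetas[g] @ Lambda_new.T) for g in range(G))
    xi = np.diag(Xi).copy()
    kappa = np.exp(np.log(xi).sum() / p)   # (prod_j Xi_jj)^{1/p}, via logs
    return Lambda_new, omega_new, xi / kappa
```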

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

McLaughlin, P., Franczak, B.C. & Kashlak, A.B. Unsupervised Classification with a Family of Parsimonious Contaminated Shifted Asymmetric Laplace Mixtures. J Classif 41, 65–93 (2024). https://doi.org/10.1007/s00357-023-09460-0
