Unsupervised Classification with a Family of Parsimonious Contaminated Shifted Asymmetric Laplace Mixtures

McLaughlin, Paul; Franczak, Brian C.; Kashlak, Adam B.

doi:10.1007/s00357-023-09460-0

Published: 06 January 2024

Journal of Classification, Volume 41, pages 65–93 (2024)

Paul McLaughlin¹,
Brian C. Franczak² &
Adam B. Kashlak¹

Abstract

A family of parsimonious contaminated shifted asymmetric Laplace mixtures is developed for unsupervised classification of asymmetric clusters in the presence of outliers and noise. A series of constraints is applied to a modified factor analyzer structure of the component scale matrices, yielding a family of twelve models. The modified factor analyzer structure and these parsimonious constraints reduce the number of free parameters that need to be estimated, which makes the models effective for the analysis of high-dimensional data. A variant of the expectation-maximization algorithm is developed for parameter estimation, and convergence issues are discussed and addressed. Popular model selection criteria, such as the Bayesian information criterion and the integrated complete likelihood (ICL), are utilized, and a novel modification to the ICL is also considered. Through a series of simulation studies and real data analyses, which include comparisons to well-established methods, we demonstrate the improvements in classification performance obtained with the proposed family of models.

Data Availability

All data analyzed in this article are fully presented in the article.

Acknowledgements

We would like to thank the editor and three anonymous referees for their constructive feedback that, in our opinion, helped us improve the paper.

Funding

This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.

Author information

Authors and Affiliations

Department of Mathematical & Statistical Sciences, University of Alberta, Edmonton, Alberta, T6G 2R3, Canada
Paul McLaughlin & Adam B. Kashlak
Department of Mathematics & Statistics, MacEwan University, Edmonton, Alberta, T5J 4S2, Canada
Brian C. Franczak

Authors

Paul McLaughlin
Brian C. Franczak
Adam B. Kashlak

Corresponding author

Correspondence to Brian C. Franczak.

Ethics declarations

Ethics Approval

The research study did not involve any human participants or animals.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Updates Used in the AECM Presented in Section 4

A.1 E-Step in the First Alternation

In the E-step of the first alternation of iteration $(k+1)$ of the proposed AECM, the sources of missing data required to compute Eq. 31 are replaced with the expected values $z^{(k+1)}_{ig}$, $v^{(k+1)}_{ig}$, $E^{(k+1)}_{1ig}$, $E^{(k+1)}_{2ig}$, $\widetilde{E}^{(k+1)}_{1ig}, \widetilde{E}^{(k+1)}_{2ig}$, respectively, for $g=1,\ldots ,G$. Formally, these expected values can be written as follows:

$$\begin{aligned} z^{(k+1)}_{ig}&:=\mathbb {E}[Z_{ig} \mid \textbf{X}_i = \textbf{x}_i]= \frac{\pi ^{(k)}_g f_{\text {CSAL}}(\textbf{x}_i\ |\ \rho ^{(k)}_g,\eta ^{(k)}_g,\varvec{\mu }^{(k)}_g,\varvec{\alpha }^{(k)}_g,\varvec{\Sigma }^{(k)}_g)}{\sum ^G_{h=1}\pi ^{(k)}_hf_{\text {CSAL}}(\textbf{x}_i\ |\ \rho ^{(k)}_h,\eta ^{(k)}_h,\varvec{\mu }^{(k)}_h,\varvec{\alpha }^{(k)}_h,\varvec{\Sigma }^{(k)}_h)},\\ v^{(k+1)}_{ig}&:=\mathbb {E}[V_{ig} \mid \textbf{X}_i = \textbf{x}_i]=\frac{\rho ^{(k)}_gf_{\text {SAL}}(\textbf{x}_i\ |\ \varvec{\mu }^{(k)}_g,\varvec{\alpha }^{(k)}_g,\varvec{\Sigma }^{(k)}_g)}{f_{\text {CSAL}}(\textbf{x}_i\ |\ \rho ^{(k)}_g,\eta ^{(k)}_g,\varvec{\mu }^{(k)}_g,\varvec{\alpha }^{(k)}_g,\varvec{\Sigma }^{(k)}_g)},\\ E^{(k+1)}_{1ig}&:=\mathbb {E}[W_{ig} \mid \textbf{X}_i = \textbf{x}_i,Z_{ig}=1,V_{ig}=1]=\sqrt{\frac{b^{(k)}_{ig}}{a^{(k)}_g}}R_{\upsilon }\left( \sqrt{a^{(k)}_g b^{(k)}_{ig}}\right) , \\ E^{(k+1)}_{2ig}&:=\mathbb {E}[1/W_{ig} \mid \textbf{X}_i = \textbf{x}_i,Z_{ig}=1,V_{ig}=1]=\sqrt{\frac{a^{(k)}_g}{b^{(k)}_{ig}}}R_{\upsilon }\left( \sqrt{a^{(k)}_g b^{(k)}_{ig}}\right) -\frac{2\upsilon }{b^{(k)}_{ig}}, \\ \widetilde{E}^{(k+1)}_{1ig}&:=\mathbb {E}\left[ \widetilde{W}_{ig} \mid \textbf{X}_i = \textbf{x}_i,Z_{ig}=1,V_{ig}=0\right] =\sqrt{\frac{\widetilde{b}^{(k)}_{ig}}{a^{(k)}_g}}R_{\upsilon }\left( \sqrt{a^{(k)}_g \widetilde{b}^{(k)}_{ig}}\right) , \\ \widetilde{E}^{(k+1)}_{2ig}&:=\mathbb {E}\left[ 1/\widetilde{W}_{ig} \mid \textbf{X}_i = \textbf{x}_i,Z_{ig}=1,V_{ig}=0\right] =\sqrt{\frac{a^{(k)}_g}{\widetilde{b}^{(k)}_{ig}}}R_{\upsilon }\left( \sqrt{a^{(k)}_g \widetilde{b}^{(k)}_{ig}}\right) -\frac{2\upsilon }{\widetilde{b}^{(k)}_{ig}} \end{aligned}$$

where $a^{(k)}_g=2+\varvec{\alpha }^{(k)'}_g(\varvec{\Sigma }^{(k)}_g)^{-1}\varvec{\alpha }^{(k)}_g,\ b^{(k)}_{ig}=\delta (\textbf{x}_i,\varvec{\mu }^{(k)}_g\mid \varvec{\Sigma }^{(k)}_g),\ \widetilde{b}^{(k)}_{ig}=\delta (\textbf{x}_i,\varvec{\mu }^{(k)}_g\mid \eta ^{(k)}_g\varvec{\Sigma }^{(k)}_g)$, $\upsilon = (2-p)/2$, $\varvec{\Sigma }_g^{(k)} = \varvec{\Lambda }_g^{(k)}\varvec{\Lambda }_g^{(k)'} + \omega _g^{(k)}\varvec{\Delta }_g^{(k)}$, and all other terms are previously defined for Eqs. 6 – 9 in Section 2.1. The closed-form updates for $E_{1ig},\ E_{2ig},\ \widetilde{E}_{1ig},$ $\widetilde{E}_{2ig}$ follow from the conditional relationship between a standard exponential random variable and a SAL random vector given in Eq. 14.
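As a rough illustration of these E-step quantities, the sketch below computes $z_{ig}$ and $v_{ig}$ for a single observation from density values that are assumed to be precomputed, and evaluates $E_{1ig}$ and $E_{2ig}$ in the univariate case $p=1$, where $\upsilon = 1/2$. The function names are hypothetical. The sketch also assumes that $R_{\upsilon}(c) = K_{\upsilon+1}(c)/K_{\upsilon}(c)$, a ratio of modified Bessel functions of the third kind, which is not restated in this appendix; under that assumption, $R_{1/2}(c) = 1 + 1/c$ exactly.

```python
import math


def csal_e_step_weights(pi, rho, f_sal, f_csal):
    """Posterior weights z_ig and v_ig for one observation x_i.

    pi[g]     : mixing proportion of component g
    rho[g]    : proportion of 'good' points in component g
    f_sal[g]  : SAL density of x_i under component g (assumed precomputed)
    f_csal[g] : contaminated SAL density of x_i under component g
                (assumed precomputed)
    """
    weighted = [p * f for p, f in zip(pi, f_csal)]
    total = sum(weighted)
    z = [w / total for w in weighted]                       # z_ig
    v = [r * fs / fc for r, fs, fc in zip(rho, f_sal, f_csal)]  # v_ig
    return z, v


def gig_moments_univariate(a, b):
    """E[W | x] and E[1/W | x] for p = 1, so upsilon = (2 - p) / 2 = 1/2.

    Assumption: R_upsilon(c) = K_{upsilon+1}(c) / K_upsilon(c); for
    upsilon = 1/2 this ratio has the closed form 1 + 1/c.
    """
    upsilon = 0.5
    c = math.sqrt(a * b)
    r = 1.0 + 1.0 / c
    e1 = math.sqrt(b / a) * r
    e2 = math.sqrt(a / b) * r - 2.0 * upsilon / b
    return e1, e2
```

For $p=1$ the second moment simplifies to $\sqrt{a/b}$, which gives a quick sanity check on any general implementation that evaluates the Bessel functions numerically.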

A.2 CM-Step 1 in the First Alternation

In the first CM-step of the first alternation of iteration $(k+1)$ of the proposed AECM, the maximum likelihood estimates (MLEs) for the parameters in $\varvec{\vartheta }_{11}=\{\pi _g,\rho _g,\varvec{\mu }_g,\varvec{\alpha }_g\}_{g=1}^G$ are given by

$$\begin{aligned} \pi ^{(k+1)}_g&= \frac{n_g^{(k)}}{n}, \end{aligned}$$

(43)

$$\begin{aligned} \rho ^{(k+1)}_g&= \frac{n^{(k)}_{g,\text {good}}}{n^{(k)}_g}, \end{aligned}$$

(44)

$$\begin{aligned} \varvec{\mu }^{(k+1)}_g&= \frac{B^{(k)}\sum ^n_{i=1}a^{(k)}_{ig}\textbf{x}_i-C^{(k)}\sum ^n_{i=1}b^{(k)}_{ig}\textbf{x}_i}{B^{(k)}A^{(k)}-(C^{(k)})^2}, \end{aligned}$$

(45)

$$\begin{aligned} \varvec{\alpha }^{(k+1)}_g&= \frac{A^{(k)}\sum ^n_{i=1} b_{ig}^{(k)}\textbf{x}_i-C^{(k)}\sum ^n_{i=1}a^{(k)}_{ig}\textbf{x}_i}{B^{(k)}A^{(k)}-(C^{(k)})^2}, \end{aligned}$$

(46)

where

$$\begin{aligned}&a^{(k)}_{ig}=z^{(k)}_{ig}\left( v^{(k)}_{ig} E^{(k)}_{2ig} + \frac{1-v^{(k)}_{ig}}{\eta _g^{(k)}}\widetilde{E}^{(k)}_{2ig}\right) ,\quad b^{(k)}_{ig} = z^{(k)}_{ig}\left( v^{(k)}_{ig}+\frac{1-v^{(k)}_{ig}}{\sqrt{\eta _g^{(k)}}}\right) \\&A^{(k)}= \sum ^n_{i=1} z^{(k)}_{ig} \left( v^{(k)}_{ig} E^{(k)}_{2ig} + \frac{1-v^{(k)}_{ig}}{\eta _g^{(k)}}\widetilde{E}^{(k)}_{2ig} \right) , \quad \! B^{(k)} \!=\! \sum ^n_{i=1} z^{(k)}_{ig} \left( v^{(k)}_{ig} E^{(k)}_{1ig} \!+\! (1-v^{(k)}_{ig}) \widetilde{E}^{(k)}_{1ig}\right) ,\\&C^{(k)}= \sum ^n_{i=1} z^{(k)}_{ig} \left( v^{(k)}_{ig} + \frac{1-v^{(k)}_{ig}}{\sqrt{\eta _g^{(k)}}}\right) , \end{aligned}$$

and $n_g^{(k)} = \sum _{i=1}^n{z_{ig}^{(k)}}$ and $n_{g,\text {good}}^{(k)} = \sum _{i=1}^n{z_{ig}^{(k)}v_{ig}^{(k)}}$ are as defined for Eq. 31.
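The updates in Eqs. 43 to 46 can be sketched for a single component $g$ as follows. This is a minimal, unoptimized illustration and not the authors' implementation: the function name, argument names, and return structure are hypothetical, the E-step quantities are assumed to be supplied as plain lists, and $n_g$ and $n_{g,\text{good}}$ are taken as sums over the observations $i$.

```python
import math


def cm_step_one(x, z, v, e1, e2, e1_bad, e2_bad, eta):
    """Eqs. 43-46 for one component g (hypothetical helper).

    x                : list of n observations, each a list of p floats
    z, v             : lists of z_ig and v_ig
    e1, e2           : lists of E_1ig and E_2ig
    e1_bad, e2_bad   : lists of the tilde versions of E_1ig and E_2ig
    eta              : current inflation parameter eta_g
    """
    n, p = len(x), len(x[0])
    a = [z[i] * (v[i] * e2[i] + (1.0 - v[i]) / eta * e2_bad[i])
         for i in range(n)]
    b = [z[i] * (v[i] + (1.0 - v[i]) / math.sqrt(eta)) for i in range(n)]
    A = sum(a)
    B = sum(z[i] * (v[i] * e1[i] + (1.0 - v[i]) * e1_bad[i])
            for i in range(n))
    C = sum(b)

    n_g = sum(z)                                   # sum over observations
    n_good = sum(zi * vi for zi, vi in zip(z, v))

    # Weighted sums of the observations, one entry per coordinate.
    sum_ax = [sum(a[i] * x[i][j] for i in range(n)) for j in range(p)]
    sum_bx = [sum(b[i] * x[i][j] for i in range(n)) for j in range(p)]

    denom = B * A - C * C
    mu = [(B * sum_ax[j] - C * sum_bx[j]) / denom for j in range(p)]
    alpha = [(A * sum_bx[j] - C * sum_ax[j]) / denom for j in range(p)]
    return {"pi": n_g / n, "rho": n_good / n_g, "mu": mu, "alpha": alpha,
            "A": A, "B": B, "C": C, "sum_ax": sum_ax, "sum_bx": sum_bx}
```

One way to check such a routine is to verify that the returned estimates satisfy the two linear equations from which Eqs. 45 and 46 follow, namely $A\varvec{\mu } + C\varvec{\alpha } = \sum _i a_{ig}\textbf{x}_i$ and $C\varvec{\mu } + B\varvec{\alpha } = \sum _i b_{ig}\textbf{x}_i$.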

A.3 E-Step in the Second Alternation

For the single-factor analysis model defined in Eq. 18,

$$\begin{aligned} \textbf{X}= \varvec{\mu }+ \varvec{\Lambda }\textbf{u}+ \varvec{\epsilon }, \end{aligned}$$

we have $\textbf{X}\sim \text {N}_p\left( \varvec{\mu },\varvec{\Lambda }\varvec{\Lambda }'+\varvec{\Psi }\right) $ and $\textbf{u}\sim \text {N}_q\left( \textbf{0},\textbf{I}_q\right) $, where all terms are as defined for Eq. 18. So, one can show that

$$\begin{aligned} \textbf{X}\mid \textbf{U}&=\textbf{u}\sim \text {N}_p\left( \varvec{\mu }+\varvec{\Lambda }\textbf{u},\varvec{\Psi }\right) , \\ \textbf{U}\mid \textbf{X}&=\textbf{x}\sim \text {N}_q\left( \varvec{\Lambda }'\left( \varvec{\Lambda }\varvec{\Lambda }' + \varvec{\Psi }\right) ^{-1}\left( \textbf{x}-\varvec{\mu }\right) , \left( \textbf{I}_q - \varvec{\Lambda }'\left( \varvec{\Lambda }\varvec{\Lambda }' + \varvec{\Psi }\right) ^{-1}\varvec{\Lambda }\right) \right) . \end{aligned}$$

It follows that the conditional expectations for the latent factor $\textbf{U}$ and the outer product $\textbf{U}\textbf{U}'$ are given by

$$\begin{aligned} \mathbb {E}\big [\textbf{U}\mid \textbf{X}= \textbf{x}\big ]&= \varvec{\beta }(\textbf{x}-\varvec{\mu }^*) \end{aligned}$$

(47)

$$\begin{aligned} \mathbb {E}\big [\textbf{U}\textbf{U}' \mid \textbf{X}= \textbf{x}\big ]&= \textbf{I}_q -\varvec{\beta }\varvec{\Lambda }+\varvec{\beta }(\textbf{x}-\varvec{\mu }^*)(\textbf{x}-\varvec{\mu }^*)'\varvec{\beta }' \end{aligned}$$

(48)

where $\varvec{\beta }=\varvec{\Lambda }'(\varvec{\Lambda }\varvec{\Lambda }'+\varvec{\Psi })^{-1}$. Using Eqs. 47 and 48, we can write the conditional expectations needed to compute the expected value of Eq. 35 in the E-step of the second alternation. Formally, these expected values can be written as follows:

$$\begin{aligned}
\mathbb {E}\left[ Z_{ig}\left( V_{ig}+\frac{1-V_{ig}}{\sqrt{\eta _g^{(k+1)}}} \right) \textbf{U}_{ig}\mid \textbf{X}_i = \textbf{x}_i\right]&=z_{ig}^{(k+1/2)}\varvec{\beta }_g^{(k)} \left( v^{(k+1/2)}_{ig}+\frac{1-v^{(k+1/2)}_{ig}}{\sqrt{\eta _g^{(k+1)}}}\right) (\textbf{x}_i-\varvec{\mu }_g^{(k+1)}) \\
&\quad -z_{ig}^{(k+1/2)}\varvec{\beta }_g^{(k)}\left( v^{(k+1/2)}_{ig}E^{(k+1/2)}_{1ig}+\left( 1-v^{(k+1/2)}_{ig}\right) \widetilde{E}^{(k+1/2)}_{1ig}\right) \varvec{\alpha }^{(k+1)}_g, \\
\mathbb {E}\left[ Z_{ig}V_{ig}W^{-1}_{ig}\textbf{U}_{ig} \mid \textbf{X}_i = \textbf{x}_i\right]&=z_{ig}^{(k+1/2)}v_{ig}^{(k+1/2)}\varvec{\beta }_g^{(k)}\left( E^{(k+1/2)}_{2ig}(\textbf{x}_i-\varvec{\mu }_g^{(k+1)})-\varvec{\alpha }^{(k+1)}_g\right) , \\
\mathbb {E}\left[ Z_{ig}\left( \frac{1-V_{ig}}{\eta _g^{(k+1)}}\right) \widetilde{W}^{-1}_{ig}\textbf{U}_{ig} \mid \textbf{X}_i = \textbf{x}_i\right]&=z_{ig}^{(k+1/2)}\varvec{\beta }_g^{(k)}\left( \frac{1-v_{ig}^{(k+1/2)}}{\eta _g^{(k+1)}}\widetilde{E}^{(k+1/2)}_{2ig}(\textbf{x}_i-\varvec{\mu }_g^{(k+1)}) -\frac{1-v_{ig}^{(k+1/2)}}{\sqrt{\eta _g^{(k+1)}}}\varvec{\alpha }^{(k+1)}_g\right) , \\
\mathbb {E}\left[ Z_{ig}V_{ig}W^{-1}_{ig}\textbf{U}_{ig}\textbf{U}_{ig}' \mid \textbf{X}_i = \textbf{x}_i\right]&=z_{ig}^{(k+1/2)}v^{(k+1/2)}_{ig}\bigg [\textbf{I}_q -\varvec{\beta }^{(k)}_g\varvec{\Lambda }_g^{(k)} + \varvec{\beta }_g^{(k)}\bigg (E^{(k+1/2)}_{2ig} (\textbf{x}_i-\varvec{\mu }_g^{(k+1)})(\textbf{x}_i-\varvec{\mu }_g^{(k+1)})' \\
&\quad -2(\textbf{x}_i-\varvec{\mu }_g^{(k+1)})(\varvec{\alpha }_g^{(k+1)})' +E^{(k+1/2)}_{1ig}\varvec{\alpha }_g^{(k+1)}(\varvec{\alpha }_g^{(k+1)})' \bigg )\left( \varvec{\beta }_g^{(k)}\right) '\bigg ], \\
\mathbb {E}\left[ Z_{ig}\left( \frac{1-V_{ig}}{\eta _g^{(k+1)}}\right) \widetilde{W}^{-1}_{ig}\textbf{U}_{ig}\textbf{U}_{ig}' \mid \textbf{X}_i = \textbf{x}_i\right]&=z_{ig}^{(k+1/2)}\frac{1-v^{(k+1/2)}_{ig}}{\eta _g^{(k+1)}}\Bigg [\textbf{I}_q -\varvec{\beta }^{(k)}_g\varvec{\Lambda }_g^{(k)} + \varvec{\beta }_g^{(k)}\Bigg (\widetilde{E}^{(k+1/2)}_{2ig} (\textbf{x}_i-\varvec{\mu }_g^{(k+1)})(\textbf{x}_i-\varvec{\mu }_g^{(k+1)})' \\
&\quad -2\sqrt{\eta _g^{(k+1)}}(\textbf{x}_i-\varvec{\mu }_g^{(k+1)})(\varvec{\alpha }_g^{(k+1)})' +\widetilde{E}^{(k+1/2)}_{1ig}\eta _g^{(k+1)}\varvec{\alpha }_g^{(k+1)}(\varvec{\alpha }_g^{(k+1)})' \Bigg )\left( \varvec{\beta }_g^{(k)}\right) '\Bigg ],
\end{aligned}$$

where $z_{ig}^{(k+1/2)}$, $v^{(k+1/2)}_{ig}$, $E^{(k+1/2)}_{1ig}$, $\widetilde{E}^{(k+1/2)}_{1ig}$, $E^{(k+1/2)}_{2ig}$, and $\widetilde{E}^{(k+1/2)}_{2ig}$ are computed using the updates given in Appendix A.1, but with the required model parameters replaced with their estimates from iteration $(k+1)$, i.e., from the first alternation of the proposed AECM detailed in Section 4.
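The posterior moments of the latent factor that underlie this E-step can be checked numerically in the simplest setting. The sketch below assumes a single latent factor ($q=1$), a diagonal $\varvec{\Psi }$, and the plain factor analysis model of Eq. 18 with no SAL weighting; the function name and arguments are hypothetical. With $q=1$, the Sherman-Morrison identity gives $\varvec{\beta } = \varvec{\Lambda }'\varvec{\Psi }^{-1}/(1+s)$ and $1-\varvec{\beta }\varvec{\Lambda } = 1/(1+s)$, where $s=\varvec{\Lambda }'\varvec{\Psi }^{-1}\varvec{\Lambda }$, so no $p\times p$ matrix inversion is needed.

```python
def factor_posterior_q1(x, mu, lam, psi):
    """Posterior moments of a scalar latent factor U given X = x.

    Assumes q = 1 and diagonal Psi (illustrative simplification).
    x, mu : observation and location vector, lists of length p
    lam   : the single column of Lambda, list of length p
    psi   : diagonal of Psi, list of length p (all entries > 0)
    """
    # s = Lambda' Psi^{-1} Lambda, a scalar when q = 1.
    s = sum(l * l / d for l, d in zip(lam, psi))
    # beta = Lambda' (Lambda Lambda' + Psi)^{-1}, via Sherman-Morrison.
    beta = [(l / d) / (1.0 + s) for l, d in zip(lam, psi)]
    # Conditional mean beta (x - mu).
    mean_u = sum(bj * (xj - mj) for bj, xj, mj in zip(beta, x, mu))
    # Conditional variance I_q - beta Lambda, equal to 1 / (1 + s).
    var_u = 1.0 - sum(bj * lj for bj, lj in zip(beta, lam))
    # E[U U' | x] = conditional variance + squared conditional mean.
    second_moment = var_u + mean_u * mean_u
    return beta, mean_u, second_moment
```

The same shortcut generalizes to $q>1$ through the Woodbury identity, which replaces the $p\times p$ inverse by a $q\times q$ one.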

A.4 Illustrative Example of the CM-Step in the Second Alternation

To fit the CCUU model, the CM-Step of the second alternation of the proposed AECM is as follows. First, recall that the CCUU model scale matrix has the form

$$\begin{aligned} \varvec{\Sigma }_g=\varvec{\Lambda }\varvec{\Lambda }'+\omega _g\varvec{\Delta }, \end{aligned}$$

(49)

where all terms are as defined for Eqs. 18 and 29 for $g=1,\ldots ,G$. Let

$$\begin{aligned} \textbf{S}^{(k+1)}&= \sum ^G_{g=1}\pi _g^{(k+1)}\textbf{S}_g^{(k+1)}, \\ \varvec{\Theta }^{(k+1/2)}&= \textbf{I}_q-\varvec{\beta }^{(k)}\varvec{\Lambda }^{(k)} +\varvec{\beta }^{(k)}\textbf{S}^{(k+1)}\varvec{\beta }^{(k)'}, \end{aligned}$$

with $\varvec{\beta }^{(k)}$ defined according to the imposed constraints. Maximizing Eq. 37 gives the following updating equations:

$$\begin{aligned} \varvec{\beta }_g^{(k)}&= \varvec{\Lambda }^{(k)'}\left( \varvec{\Lambda }^{(k)}\varvec{\Lambda }^{(k)'}+\omega _g^{(k)}\varvec{\Delta }^{(k)}\right) ^{-1}, \\ \varvec{\Lambda }^{(k+1)}&= \left[ \sum ^G_{g=1}\frac{n^{(k+1/2)}_g}{\omega _g^{(k)}}\textbf{S}_g^{(k+1)}\varvec{\beta }_g^{(k)'}\right] \left[ \sum ^G_{g=1}\frac{n^{(k+1/2)}_g}{\omega _g^{(k)}}\varvec{\Theta }_g^{(k+1/2)}\right] ^{-1}, \\ \omega _g^{(k+1)}&= \frac{1}{p}\text {tr}\left\{ \left( \varvec{\Delta }^{(k)}\right) ^{-1}\left[ \textbf{S}_g^{(k+1)} -2\varvec{\Lambda }^{(k+1)}\varvec{\beta }_g^{(k)}\textbf{S}_g^{(k+1)}+\varvec{\Lambda }^{(k+1)}\varvec{\Theta }_g^{(k+1/2)}\varvec{\Lambda }^{(k+1)'}\right] \right\} , \\ \varvec{\Delta }^{(k+1)}&= \frac{1}{\kappa }\text {diag}\left\{ \varvec{\Xi }^{(k+1/2)}\right\} , \end{aligned}$$

where

$$\begin{aligned} \varvec{\Xi }^{(k+1/2)}&= \sum ^G_{g=1}\frac{n_g^{(k+1/2)}}{\omega _g^{(k+1)}}\left[ \textbf{S}_g^{(k+1)}-2\varvec{\Lambda }^{(k+1)}\varvec{\beta }_g^{(k)}\textbf{S}_g^{(k+1)} + \varvec{\Lambda }^{(k+1)}\varvec{\Theta }_g^{(k+1/2)}\varvec{\Lambda }^{(k+1)'}\right] , \\ \kappa&= \left( \prod ^p_{j=1}\varvec{\Xi }_{jj}^{(k+1/2)}\right) ^{1/p}. \end{aligned}$$
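The role of $\kappa $ in the update for $\varvec{\Delta }$ can be seen in a small sketch: dividing the diagonal of $\varvec{\Xi }$ by the geometric mean of its entries yields a diagonal matrix with unit determinant. The helper below is hypothetical and takes the diagonal of $\varvec{\Xi }^{(k+1/2)}$ as an already computed list of positive numbers; the geometric mean is evaluated on the log scale only as a precaution against overflow when $p$ is large.

```python
import math


def update_delta(xi_diag):
    """Return (diag of Delta, kappa) from the diagonal of Xi.

    xi_diag : list of the p diagonal entries of Xi, assumed positive
    """
    p = len(xi_diag)
    # kappa = (prod_j Xi_jj)^(1/p), computed via logs for stability.
    log_kappa = sum(math.log(d) for d in xi_diag) / p
    kappa = math.exp(log_kappa)
    delta_diag = [d / kappa for d in xi_diag]
    return delta_diag, kappa
```

For instance, a diagonal of `[2.0, 8.0]` has geometric mean 4, so the rescaled diagonal is `[0.5, 2.0]`, whose product is 1.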

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

McLaughlin, P., Franczak, B.C. & Kashlak, A.B. Unsupervised Classification with a Family of Parsimonious Contaminated Shifted Asymmetric Laplace Mixtures. J Classif 41, 65–93 (2024). https://doi.org/10.1007/s00357-023-09460-0

Accepted: 07 December 2023
Published: 06 January 2024
Issue Date: March 2024
DOI: https://doi.org/10.1007/s00357-023-09460-0
