Abstract
In recent years, finite mixtures of skew distributions have been gaining popularity as a flexible tool for modelling data with asymmetric distributional features. Parameter estimation for these mixture models via the traditional EM algorithm requires the number of components to be specified a priori. In this paper, we consider unsupervised learning of skew mixture models, where the optimal number of components is estimated during the parameter estimation process. We adopt a component-wise EM algorithm and use the minimum message length (MML) criterion. For illustrative purposes, we focus on the case of a finite mixture of multivariate skew \(t\)-distributions. The performance of the approach is demonstrated on a real dataset from flow cytometry, where our mixture model was used to provide an automated segmentation of cell populations.
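The idea of estimating the number of components during fitting can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes univariate Gaussian components as a stand-in for the multivariate skew \(t\) components, and the names `normal_pdf`, `cem_mml_fit`, `k_max`, `n_sweeps` and `n_par` are hypothetical. The weight update, which subtracts half the number of per-component parameters from each effective count and clips at zero, is the form usually associated with MML-based mixture learning (Figueiredo and Jain). The full message length and any outer search over the number of components are omitted.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def cem_mml_fit(x, k_max=6, n_sweeps=200, n_par=2):
    """Component-wise EM with MML-style weight shrinkage (1-D Gaussian stand-in).

    n_par is the number of free parameters per component (mean and variance).
    All names here are illustrative assumptions, not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Deterministic start: spread the means over the empirical quantiles.
    means = np.quantile(x, (np.arange(k_max) + 0.5) / k_max)
    variances = np.full(k_max, x.var())
    weights = np.full(k_max, 1.0 / k_max)
    alive = np.ones(k_max, dtype=bool)

    for _ in range(n_sweeps):
        for m in range(k_max):
            if not alive[m]:
                continue
            # E-step with the latest parameters of all surviving components.
            dens = np.zeros((k_max, n))
            for j in np.flatnonzero(alive):
                # The tiny constant guards against 0/0 when densities underflow.
                dens[j] = weights[j] * normal_pdf(x, means[j], variances[j]) + 1e-300
            resp = dens / dens.sum(axis=0)
            counts = resp.sum(axis=1)

            # MML-motivated update: subtract n_par / 2 from each effective count
            # and clip at zero, so weakly supported components are annihilated.
            shrunk = np.maximum(0.0, counts - n_par / 2.0) * alive
            weights[m] = shrunk[m] / shrunk.sum()
            weights /= weights.sum()

            if weights[m] == 0.0:
                alive[m] = False
                continue
            # M-step for component m only; the other components are untouched.
            means[m] = resp[m] @ x / counts[m]
            variances[m] = resp[m] @ (x - means[m]) ** 2 / counts[m] + 1e-6

    return weights[alive], means[alive], variances[alive]


# Illustrative run on synthetic data with two well-separated groups.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-5.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
w, mu, var = cem_mml_fit(x, k_max=6)
print(len(w), "components survive; means:", np.round(mu, 2))
```

Updating one component at a time means that the mixing proportion freed by an annihilated component is redistributed to the survivors immediately, within the same sweep. In this low-dimensional stand-in the penalty `n_par / 2` is small, so the pruning pressure is weaker than it would be for multivariate components with many parameters.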
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Lee, S.X., McLachlan, G.J. (2016). Unsupervised Component-Wise EM Learning for Finite Mixtures of Skew t-distributions. In: Li, J., Li, X., Wang, S., Li, J., Sheng, Q. (eds) Advanced Data Mining and Applications. ADMA 2016. Lecture Notes in Computer Science, vol 10086. Springer, Cham. https://doi.org/10.1007/978-3-319-49586-6_49
DOI: https://doi.org/10.1007/978-3-319-49586-6_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49585-9
Online ISBN: 978-3-319-49586-6
eBook Packages: Computer Science, Computer Science (R0)