Abstract
Mixtures with random covariates are statistical models that can be used for clustering and for density estimation of a random vector composed of a response variable and a set of covariates. In this class, the generalized linear Gaussian cluster-weighted model (GLGCWM) assumes, in each mixture component, an exponential family distribution for the response variable and a multivariate Gaussian distribution for the vector of real-valued covariates. For the sake of parsimony, a family of fourteen models is introduced here by imposing constraints on the eigen-decomposed covariance matrices of the Gaussian distribution. An EM algorithm for computing maximum likelihood estimates of the parameters of these models is described. Finally, this novel family of models is applied to a real data set, where it achieves good classification performance, especially in comparison with other well-established mixture-based approaches.
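The structure described in the abstract can be sketched in formulas. The block below is only an illustrative outline under assumed notation: the symbols G, \pi_g, \boldsymbol{\zeta}_g, \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g, \lambda_g, \boldsymbol{\Gamma}_g, \boldsymbol{\Delta}_g and z_{ig} are not taken from the abstract and may differ from those used in the chapter itself.

```latex
% Joint density of covariates x (dimension d) and response y, with G components.
% q is an exponential family density for y given x (parameters zeta_g, assumed name);
% phi is the d-variate Gaussian density of the covariates.
p(\mathbf{x}, y)
  = \sum_{g=1}^{G} \pi_g \,
    q(y \mid \mathbf{x}; \boldsymbol{\zeta}_g) \,
    \phi(\mathbf{x}; \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g),
\qquad \pi_g > 0, \quad \sum_{g=1}^{G} \pi_g = 1.

% Eigen-decomposition of each component covariance matrix:
% lambda_g = |Sigma_g|^{1/d} (volume), Gamma_g orthogonal (orientation),
% Delta_g diagonal with |Delta_g| = 1 (shape).
\boldsymbol{\Sigma}_g
  = \lambda_g \, \boldsymbol{\Gamma}_g \boldsymbol{\Delta}_g \boldsymbol{\Gamma}_g^{\top}.

% E-step quantity of the EM algorithm: posterior probability that
% observation i belongs to component g, given the current parameter values.
z_{ig}
  = \frac{\pi_g \, q(y_i \mid \mathbf{x}_i; \boldsymbol{\zeta}_g) \,
          \phi(\mathbf{x}_i; \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g)}
         {\sum_{j=1}^{G} \pi_j \, q(y_i \mid \mathbf{x}_i; \boldsymbol{\zeta}_j) \,
          \phi(\mathbf{x}_i; \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}.
```

In this parameterization, parsimony comes from constraining the volume, shape, and orientation terms, for example by forcing them to be equal across components or by fixing the shape or orientation matrix to the identity. The fourteen models mentioned in the abstract correspond to different combinations of such constraints.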
Acknowledgements
The authors acknowledge the financial support from the grant “Finite mixture and latent variable models for causal inference and analysis of socio-economic data” (FIRB 2012-Futuro in ricerca) funded by the Italian Government (RBFR12SHVV).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Punzo, A., Ingrassia, S. (2015). Parsimonious Generalized Linear Gaussian Cluster-Weighted Models. In: Morlini, I., Minerva, T., Vichi, M. (eds) Advances in Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-17377-1_21
DOI: https://doi.org/10.1007/978-3-319-17377-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17376-4
Online ISBN: 978-3-319-17377-1
eBook Packages: Mathematics and Statistics (R0)