Families of Parsimonious Finite Mixtures of Regression Models

  • Utkarsh J. Dang
  • Paul D. McNicholas
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Finite mixtures of regression (FMR) models offer a flexible framework for investigating heterogeneity in data with functional dependencies. These models can be conveniently used for unsupervised learning on data with clear regression relationships. We extend such models by imposing an eigen-decomposition on the multivariate error covariance matrix. By constraining parts of this decomposition, we obtain families of parsimonious mixtures of regressions and mixtures of regressions with concomitant variables. These families of models account for correlations between multiple responses. An expectation-maximization algorithm is presented for parameter estimation and performance is illustrated on simulated and real data.
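
The eigen-decomposition mentioned in the abstract is not written out on this page. As a sketch, assuming the chapter follows the standard parameterization used in the Gaussian parsimonious clustering literature, the error covariance matrix of mixture component g (for a p-dimensional response) could be written as below. The symbols \lambda_g, \Gamma_g, and \Delta_g are assumed notation and do not come from the text above.

```latex
\boldsymbol{\Sigma}_g
  = \lambda_g \, \boldsymbol{\Gamma}_g \, \boldsymbol{\Delta}_g \, \boldsymbol{\Gamma}_g^{\top},
\qquad
\lambda_g = |\boldsymbol{\Sigma}_g|^{1/p} > 0,
\qquad
|\boldsymbol{\Delta}_g| = 1
```

Under this assumed form, \lambda_g is a scalar governing the volume of the component, \Gamma_g is the orthogonal matrix of eigenvectors governing its orientation, and \Delta_g is a diagonal matrix governing its shape. "Constraining parts of this decomposition" would then mean, for example, forcing \lambda_g, \Gamma_g, or \Delta_g to be equal across components, or setting \Gamma_g or \Delta_g to the identity matrix, each choice giving a member of the family with fewer free covariance parameters.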

Keywords

Concomitant variables; EM algorithm; Finite mixtures of regressions; Mixture models; Multivariate response

Notes

Acknowledgements

This work is supported by an Alexander Graham Bell Canada Graduate Scholarship (CGS-D; Dang) as well as a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (McNicholas).

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Biology, McMaster University, Hamilton, Canada
  2. Department of Mathematics & Statistics, McMaster University, Hamilton, Canada
