Mixture separation for mixed-mode data

Statistics and Computing

Abstract

One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.
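The fitting idea described in the abstract can be illustrated with a minimal EM sketch. This is not the authors' implementation. It assumes a single continuous variable, categorical variables already collapsed into one cell index, and one variance shared across all cells and all mixture components (one reading of "homogeneous"). The function name `em_location_mixture` and all parameter names are hypothetical. Under these assumptions the density of an observation in cell s with continuous value y is the sum over components i of pi_i * p_is * N(y; mu_is, var).

```python
import math
import random


def normal_pdf(y, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_location_mixture(cells, ys, n_cells, n_groups, n_iter=50, seed=0):
    """EM for a finite mixture of location models (illustrative sketch).

    cells[k] is the categorical cell of observation k (0 .. n_cells - 1),
    ys[k] is its continuous value. Assumes one shared variance.
    """
    rng = random.Random(seed)
    n = len(ys)
    # Random soft initial memberships, one row per observation.
    resp = []
    for _ in range(n):
        row = [rng.random() + 0.1 for _ in range(n_groups)]
        total = sum(row)
        resp.append([r / total for r in row])

    loglik_trace = []
    for _ in range(n_iter):
        # M-step: weighted proportions, cell probabilities, cell means.
        group_w = [sum(resp[k][i] for k in range(n)) for i in range(n_groups)]
        pi = [w / n for w in group_w]
        p = [[0.0] * n_cells for _ in range(n_groups)]
        mu = [[0.0] * n_cells for _ in range(n_groups)]
        for i in range(n_groups):
            for s in range(n_cells):
                w = sum(resp[k][i] for k in range(n) if cells[k] == s)
                wy = sum(resp[k][i] * ys[k] for k in range(n) if cells[k] == s)
                p[i][s] = w / group_w[i]
                mu[i][s] = wy / w if w > 0 else 0.0
        # Pooled variance over all cells and components (an assumption).
        var = sum(
            resp[k][i] * (ys[k] - mu[i][cells[k]]) ** 2
            for k in range(n)
            for i in range(n_groups)
        ) / n
        var = max(var, 1e-8)

        # E-step: posterior membership probabilities and log-likelihood.
        loglik = 0.0
        for k in range(n):
            s = cells[k]
            joint = [
                pi[i] * p[i][s] * normal_pdf(ys[k], mu[i][s], var)
                for i in range(n_groups)
            ]
            total = max(sum(joint), 1e-300)
            loglik += math.log(total)
            resp[k] = [j / total for j in joint]
        loglik_trace.append(loglik)

    return pi, p, mu, var, loglik_trace
```

Each iteration costs on the order of (observations × components × cells) operations. The cost therefore depends on the number of cells actually modelled, and no numerical integration is involved. As with any EM fit, the result depends on the starting values, so several random starts are advisable.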

References

  • Ashford, J. R. and Sowden, R. R. (1970) Multi-variate probit analysis. Biometrics, 26, 535–46.
  • Cormack, R. M. (1971) A review of classification (with discussion). Journal of the Royal Statistical Society, Series A, 134, 321–67.
  • Cox, D. R. and Wermuth, N. (1992) Response models for mixed binary and quantitative variables. Biometrika, 79, 441–61.
  • Day, N. E. (1969) Estimating the components of a mixture of normal distributions. Biometrika, 56, 463–74.
  • Demers, S., Kim, J., Legendre, P. and Legendre, L. (1992) Analyzing multivariate flow cytometric data in aquatic sciences. Cytometry, 13, 291–8.
  • Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.
  • Edwards, D. (1990) Hierarchical interaction models (with discussion). Journal of the Royal Statistical Society, Series B, 52, 3–20.
  • Everitt, B. S. (1988) A finite mixture model for the clustering of mixed mode data. Statistics and Probability Letters, 6, 305–9.
  • Everitt, B. S. (1993) Cluster Analysis, 3rd Edn. Edward Arnold, London.
  • Everitt, B. S. and Merette, C. (1990) The clustering of mixed-mode data: a comparison of possible approaches. Journal of Applied Statistics, 17, 283–97.
  • Gordon, A. D. (1981) Classification. Chapman and Hall, London.
  • Krzanowski, W. J. (1975) Discrimination and classification using both binary and continuous variables. Journal of the American Statistical Association, 70, 782–90.
  • Krzanowski, W. J. (1983) Distance between populations using mixed continuous and categorical variables. Biometrika, 70, 235–43.
  • Krzanowski, W. J. (1993) The location model for mixtures of categorical and continuous variables. Journal of Classification, 10, 25–49.
  • McLachlan, G. J. (1982) The classification and mixture maximum likelihood approaches to cluster analysis. In P. R. Krishnaiah and L. N. Kanal (eds.), Handbook of Statistics, Vol. 2, pp. 199–208. North-Holland, Amsterdam.
  • McLachlan, G. J. (1992) Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York.
  • McLachlan, G. J. and Basford, K. E. (1988) Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York.
  • Olkin, I. and Tate, R. F. (1961) Multivariate correlation models with mixed discrete and continuous variables. Annals of Mathematical Statistics, 32, 448–65 (correction 39, 1358–9).
  • Whittaker, J. (1990) Graphical Models in Applied Multivariate Statistics. Wiley, Chichester.
  • Wolfe, J. H. (1970) Pattern clustering by multivariate mixture analysis. Multivariate Behavioral Research, 5, 329–50.

Cite this article

Lawrence, C.J., Krzanowski, W.J. Mixture separation for mixed-mode data. Stat Comput 6, 85–92 (1996). https://doi.org/10.1007/BF00161577
