Choosing the Number of Component Clusters in the Mixture-Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix

  • Conference paper
Information and Classification

Part of the book series: Studies in Classification, Data Analysis and Knowledge Organization (STUDIES CLASS)

Abstract

This paper considers the problem of choosing the number of component clusters of individuals within the context of the standard mixture of multivariate normal distributions. Often the number of mixture clusters K is unknown and variable, and needs to be estimated. A two-stage iterative maximum-likelihood procedure is used as a clustering criterion to estimate the parameters of the mixture-model under several different covariance structures. An approximate component-wise inverse-Fisher information matrix (IFIM) for the mixture-model is obtained. Then the informational complexity (ICOMP) criterion of IFIM of this author (Bozdogan 1988, 1990a, 1990b) is derived and proposed as a new criterion for choosing the number of clusters in the mixture-model. For comparative purposes, Akaike’s (1973) information criterion (AIC) and Rissanen’s (1978) minimum description length (MDL) criterion are also introduced and derived for the mixture-model. Numerical examples are shown on simulated multivariate normal data sets with a known number of mixture clusters to illustrate the significance of ICOMP in choosing the number of clusters and the best fitting model.
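As a rough illustration of how the criteria named in the abstract are scored, the sketch below uses their commonly quoted general forms: AIC = -2 log L + 2m, MDL scaled as -2 log L + m log n, and ICOMP(IFIM) = -2 log L + 2 C1(IFIM), where C1 is the maximal covariance complexity written in terms of eigenvalues. These are assumptions drawn from the general literature, not the mixture-specific expressions derived in the paper, and every function name here is hypothetical. The parameter count assumes unrestricted component covariances.

```python
import math


def n_free_params(K, p):
    """Free parameters of a K-component, p-variate normal mixture with
    unrestricted covariances (an assumed covariance structure):
    (K - 1) mixing proportions + K*p means + K*p*(p+1)/2 covariance terms."""
    return (K - 1) + K * p + K * p * (p + 1) // 2


def aic(loglik, m):
    """Akaike's information criterion: -2 log L + 2m."""
    return -2.0 * loglik + 2.0 * m


def mdl(loglik, m, n):
    """Minimum description length, scaled by 2 to match AIC: -2 log L + m log n."""
    return -2.0 * loglik + m * math.log(n)


def c1_complexity(eigenvalues):
    """C1 complexity of a positive-definite matrix from its eigenvalues:
    (s/2) * log(arithmetic mean / geometric mean), which is zero when all
    eigenvalues are equal and grows as they spread apart."""
    s = len(eigenvalues)
    arithmetic_mean = sum(eigenvalues) / s
    mean_log = sum(math.log(v) for v in eigenvalues) / s
    return 0.5 * s * (math.log(arithmetic_mean) - mean_log)


def icomp(loglik, ifim_eigenvalues):
    """ICOMP(IFIM) in its general form: lack of fit plus twice the
    complexity of the estimated inverse-Fisher information matrix."""
    return -2.0 * loglik + 2.0 * c1_complexity(ifim_eigenvalues)
```

In use, each candidate K would be fitted by maximum likelihood, the maximized log-likelihood (and, for ICOMP, the eigenvalues of the estimated IFIM) passed to these functions, and the K with the smallest criterion value selected.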

References

  • Akaike, H. (1973), Information Theory and an Extension of the Maximum Likelihood Principle, in: B. N. Petrov and F. Csaki (eds.), Second International Symposium on Information Theory, Akadémiai Kiadó, Budapest, 267–281.

  • Binder, D.A. (1978), Bayesian Cluster Analysis, Biometrika, 65, 31–38.

  • Bock, H. H. (1981), Statistical Testing and Evaluation Methods in Cluster Analysis, in: J. K. Ghosh and J. Roy (eds.), Proceedings of the Indian Statistical Institute Golden Jubilee International Conference on Statistics: Applications and New Directions, December 16–19, Calcutta, 116–146.

  • Bozdogan, H. (1981), Multi-Sample Cluster Analysis and Approaches to Validity Studies in Clustering Individuals, Ph.D. thesis, Department of Mathematics, University of Illinois at Chicago, Chicago, Illinois 60680.

  • Bozdogan, H. (1983), Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria, Technical Report No. UIC/DQM/A83-1, June 16, 1983, ARO Contract DAAG29-82-k-0155, Quantitative Methods Department, University of Illinois at Chicago, Chicago, Illinois 60680.

  • Bozdogan, H. (1987), Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions, Psychometrika, Vol. 52, No. 3, Special Section (invited paper), 345–370.

  • Bozdogan, H. (1988), ICOMP: A New Model Selection Criterion, in: Hans H. Bock (ed.), Classification and Related Methods of Data Analysis, North-Holland, Amsterdam, April, 599–608.

  • Bozdogan, H. (1990a), On the Information-Based Measure of Covariance Complexity and its Application to the Evaluation of Multivariate Linear Models, Communications in Statistics, Theory and Methods, 19(1), 221–278.

  • Bozdogan, H. (1990b), Multisample Cluster Analysis of the Common Principal Component Model in K Groups Using an Entropic Statistical Complexity Criterion, invited paper presented at the International Symposium on Theory and Practice of Classification, December 16–19, Puschino, Soviet Union.

  • Bozdogan, H. (1992), Mixture-Model Cluster Analysis and Choosing the Number of Clusters Using a New Informational Complexity ICOMP, AIC, and MDL Model-Selection Criteria, invited paper presented at the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, May 24–29, The University of Tennessee, Knoxville, TN 37996, USA. To appear in: H. Bozdogan (ed.), Multivariate Statistical Modeling, Vol. II, Kluwer Academic Publishers, Holland, Dordrecht.

  • Day, N.E. (1969), Estimating the Components of a Mixture of Normal Distributions, Biometrika, 11, 235–254.

  • Everitt, B.S., and Hand, D.J. (1981), Finite Mixture Distributions, Chapman and Hall, New York.

  • Hartigan, J.A. (1975), Clustering Algorithms, John Wiley & Sons, New York.

  • Hartigan, J.A. (1977), Distribution Problems in Clustering, in: J. Van Ryzin (ed.), Classification and Clustering, Academic Press, New York, 45–71.

  • John, S. (1970), On Identifying the Population of Origin of Each Observation in a Mixture of Observations from Two Normal Populations, Technometrics, 12, 553–563.

  • Kullback, S., and Leibler, R.A. (1951), On Information and Sufficiency, Ann. Math. Statist., 22, 79–86.

  • Magnus, J.R. (1989), Linear Structures, Oxford University Press, New York.

  • Magnus, J.R. (1989), Personal correspondence.

  • Magnus, J.R., and Neudecker, H. (1988), Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons, New York.

  • Maklad, M.S., and Nichols, T. (1980), A New Approach to Model Structure Discrimination, IEEE Trans. on Systems, Man, and Cybernetics, SMC-10, No. 2, 78–84.

  • McLachlan, G.J., and Basford, K.E. (1988), Mixture Models: Inference and Applications to Clustering, Marcel Dekker, Inc., New York.

  • Rissanen, J. (1976), Minmax Entropy Estimation of Models for Vector Processes, in: R.K. Mehra and D.G. Lainiotis (eds.), System Identification, Academic Press, New York, 97–119.

  • Rissanen, J. (1978), Modeling by Shortest Data Description, Automatica, Vol. 14, 465–471.

  • Rissanen, J. (1989), Stochastic Complexity in Statistical Inquiry, World Scientific Publishing Company, Teaneck, New Jersey.

  • Sclove, S.L. (1977), Population Mixture Models and Clustering Algorithms, Communications in Statistics, Theory and Methods, A6, 417–434.

  • Sclove, S.L. (1982), Application of the Conditional Population Mixture Model to Image Segmentation, Technical Report A82-1, 1982, ARO Contract DAAG29-82-K-0155, University of Illinois at Chicago, Chicago, Illinois 60680.

  • Scott, A.J., and Symons, M.J. (1971), Clustering Methods Based on Likelihood Ratio Criteria, Biometrics, 27, 387–397.

  • Symons, M. J. (1981), Clustering Criteria and Multivariate Normal Mixtures, Biometrics, 37, 35–43.

  • Titterington, D.M., Smith, A.F.M., and Makov, U.E. (1985), Statistical Analysis of Finite Mixture Distributions, John Wiley & Sons, New York.

  • Van Emden, M.H. (1971), An Analysis of Complexity, Mathematical Center Tracts, 35, Amsterdam.

  • Wolfe, J.H. (1967), Normix: Computational Methods for Estimating the Parameters of Multivariate Normal Mixtures of Distributions, Research Memorandum, SRM 68-2, U.S. Naval Personnel Research Activity, San Diego, California.

  • Wolfe, J.H. (1970), Pattern Clustering by Multivariate Mixture Analysis, Multivariate Behavioral Res., 5, 329–350.

Copyright information

© 1993 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Bozdogan, H. (1993). Choosing the Number of Component Clusters in the Mixture-Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix. In: Opitz, O., Lausen, B., Klar, R. (eds) Information and Classification. Studies in Classification, Data Analysis and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-50974-2_5

  • DOI: https://doi.org/10.1007/978-3-642-50974-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56736-3

  • Online ISBN: 978-3-642-50974-2

  • eBook Packages: Springer Book Archive
