Choosing the Number of Component Clusters in the Mixture-Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix

Bozdogan, Hamparsum

doi:10.1007/978-3-642-50974-2_5

Part of the book series: Studies in Classification, Data Analysis and Knowledge Organization (STUDIES CLASS)

Abstract

This paper considers the problem of choosing the number of component clusters of individuals within the context of the standard mixture of multivariate normal distributions. Often the number of mixture clusters K is unknown and variable, and it needs to be estimated. A two-stage iterative maximum-likelihood procedure is used as a clustering criterion to estimate the parameters of the mixture-model under several different covariance structures. An approximate component-wise inverse-Fisher information matrix (IFIM) for the mixture-model is obtained. Then the informational complexity (ICOMP) criterion of IFIM of this author (Bozdogan 1988, 1990a, 1990b) is derived and proposed as a new criterion for choosing the number of clusters in the mixture-model. For comparative purposes, Akaike’s (1973) information criterion (AIC) and Rissanen’s (1978) minimum description length (MDL) criterion are also introduced and derived for the mixture-model. Numerical examples are shown on simulated multivariate normal data sets with a known number of mixture clusters to illustrate the significance of ICOMP in choosing the number of clusters and the best fitting model.
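
The idea of scoring candidate values of K with an information criterion can be illustrated with a minimal sketch. It is not the chapter's procedure: it assumes univariate data (the chapter treats multivariate normals), a plain EM fit rather than the two-stage procedure, a quantile-based initialisation, and the generic forms AIC = -2 log L + 2m and MDL = -2 log L + m log n, which may differ in scaling from the mixture-model forms derived in the chapter. ICOMP is left out because it needs the approximate IFIM, which this page does not reproduce. All function names and settings below are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_loglik(data, weights, means, variances):
    """Log-likelihood of the data under a univariate normal mixture."""
    total = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(dens, 1e-300))  # guard against log(0)
    return total


def fit_mixture(data, k, n_iter=100, var_floor=1e-3):
    """Plain EM for a k-component univariate normal mixture; returns log L."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: means at evenly spaced sample quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    pooled_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [pooled_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior membership probabilities for every observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: update mixing proportions, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj < 1e-10:  # effectively empty component: leave it unchanged
                continue
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # floor avoids degenerate spikes
    return mixture_loglik(data, weights, means, variances)


def n_params(k):
    """Free parameters: (k - 1) proportions + k means + k variances."""
    return 3 * k - 1


def aic(loglik, m):
    return -2.0 * loglik + 2.0 * m


def mdl(loglik, m, n):
    return -2.0 * loglik + m * math.log(n)


# Simulated data with a known number of clusters (two).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(8.0, 1.0) for _ in range(200)]

scores = {}
for k in range(1, 5):
    ll = fit_mixture(data, k)
    m = n_params(k)
    scores[k] = {"AIC": aic(ll, m), "MDL": mdl(ll, m, len(data))}

for k, s in scores.items():
    print(k, round(s["AIC"], 1), round(s["MDL"], 1))
print("K chosen by AIC:", min(scores, key=lambda k: scores[k]["AIC"]))
print("K chosen by MDL:", min(scores, key=lambda k: scores[k]["MDL"]))
```

The value of K with the smallest score is the one selected. Both criteria share the lack-of-fit term -2 log L and differ only in the penalty, so MDL penalises extra components more heavily than AIC once log n exceeds 2.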

References

Akaike, H. (1973), Information Theory and an Extension of the Maximum Likelihood Principle, in: B. N. Petrov and F. Csaki (eds.), Second International Symposium on Information Theory, Akademiai Kiado, Budapest, 267–281.
Binder, D.A. (1978), Bayesian Cluster Analysis, Biometrika, 65, 31–38.
Bock, H. H. (1981), Statistical Testing and Evaluation Methods in Cluster Analysis, in: J. K. Ghosh and J. Roy (eds.), Proceedings of the Indian Statistical Institute Golden Jubilee International Conference on Statistics: Applications and New Directions, December 16–19, Calcutta, 116–146.
Bozdogan, H. (1981), Multi-Sample Cluster Analysis and Approaches to Validity Studies in Clustering Individuals, Ph.D. thesis, Department of Mathematics, University of Illinois at Chicago, Chicago, Illinois 60680.
Bozdogan, H. (1983), Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria, Technical Report No. UIC/DQM/A83-1, June 16, 1983, ARO Contract DAAG29-82-K-0155, Quantitative Methods Department, University of Illinois at Chicago, Chicago, Illinois 60680.
Bozdogan, H. (1987), Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions, Psychometrika, Vol. 52, No. 3, Special Section (invited paper), 345–370.
Bozdogan, H. (1988), ICOMP: A New Model Selection Criterion, in: Hans H. Bock (ed.), Classification and Related Methods of Data Analysis, North-Holland, Amsterdam, April, 599–608.
Bozdogan, H. (1990a), On the Information-Based Measure of Covariance Complexity and its Application to the Evaluation of Multivariate Linear Models, Communications in Statistics, Theory and Methods, 19(1), 221–278.
Bozdogan, H. (1990b), Multisample Cluster Analysis of the Common Principal Component Model in K Groups Using an Entropic Statistical Complexity Criterion, invited paper presented at the International Symposium on Theory and Practice of Classification, December 16–19, Puschino, Soviet Union.
Bozdogan, H. (1992), Mixture-Model Cluster Analysis and Choosing the Number of Clusters Using a New Informational Complexity ICOMP, AIC, and MDL Model-Selection Criteria, invited paper presented at the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, May 24–29, The University of Tennessee, Knoxville, TN 37996, USA. To appear in: H. Bozdogan (ed.), Multivariate Statistical Modeling, Vol. II, Kluwer Academic Publishers, Holland, Dordrecht.
Day, N.E. (1969), Estimating the Components of a Mixture of Normal Distributions, Biometrika, 56, 463–474.
Everitt, B.S., and Hand, D.J. (1981), Finite Mixture Distributions, Chapman and Hall, New York.
Hartigan, J.A. (1975), Clustering Algorithms, John Wiley & Sons, New York.
Hartigan, J.A. (1977), Distribution Problems in Clustering, in: J. Van Ryzin (ed.), Classification and Clustering, Academic Press, New York, 45–71.
John, S. (1970), On Identifying the Population of Origin of Each Observation in a Mixture of Observations from Two Normal Populations, Technometrics, 12, 553–563.
Kullback, S., and Leibler, R.A. (1951), On Information and Sufficiency, Ann. Math. Statist., 22, 79–86.
Magnus, J.R. (1989), Linear Structures, Oxford University Press, New York.
Magnus, J.R. (1989), Personal correspondence.
Magnus, J.R., and Neudecker, H. (1988), Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons, New York.
Maklad, M.S., and Nichols, T. (1980), A New Approach to Model Structure Discrimination, IEEE Trans. on Systems, Man, and Cybernetics, SMC-10, No. 2, 78–84.
McLachlan, G.J., and Basford, K.E. (1988), Mixture Models: Inference and Applications to Clustering, Marcel Dekker, Inc., New York.
Rissanen, J. (1976), Minmax Entropy Estimation of Models for Vector Processes, in: R.K. Mehra and D.G. Lainiotis (eds.), System Identification, Academic Press, New York, 97–119.
Rissanen, J. (1978), Modeling by Shortest Data Description, Automatica, Vol. 14, 465–471.
Rissanen, J. (1989), Stochastic Complexity in Statistical Inquiry, World Scientific Publishing Company, Teaneck, New Jersey.
Sclove, S.L. (1977), Population Mixture Models and Clustering Algorithms, Communications in Statistics, Theory and Methods, A6, 417–434.
Sclove, S.L. (1982), Application of the Conditional Population Mixture Model to Image Segmentation, Technical Report A82-1, 1982, ARO Contract DAAG29-82-K-0155, University of Illinois at Chicago, Chicago, Illinois 60680.
Scott, A.J., and Symons, M.J. (1971), Clustering Methods Based on Likelihood Ratio Criteria, Biometrics, 27, 389–397.
Symons, M. J. (1981), Clustering Criteria and Multivariate Normal Mixtures, Biometrics, 37, 35–43.
Titterington, D.M., Smith, A.F.M., and Makov, U.E. (1985), Statistical Analysis of Finite Mixture Distributions, John Wiley & Sons, New York.
Van Emden, M.H. (1971), An Analysis of Complexity, Mathematical Centre Tracts, 35, Amsterdam.
Wolfe, J.H. (1967), Normix: Computational Methods for Estimating the Parameters of Multivariate Normal Mixtures of Distributions, Research Memorandum, SRM 68-2, U.S. Naval Personnel Research Activity, San Diego, California.
Wolfe, J.H. (1970), Pattern Clustering by Multivariate Mixture Analysis, Multivariate Behavioral Res., 5, 329–350.

Author information

Authors and Affiliations

Department of Statistics, The University of Tennessee, Knoxville, TN, 37996-0532, USA
Hamparsum Bozdogan

Editor information

Editors and Affiliations

Lehrstuhl für Mathematische Methoden der Wirtschaftswissenschaften, Universität Augsburg, Universitätsstr. 2, D-86135, Augsburg, Germany
Otto Opitz
Forschungsinstitut für Kinderernährung, Heinstück 11, D-44225, Dortmund, Germany
Berthold Lausen
Abteilung für Medizinische Informatik, Universitäts-Klinikum Freiburg, Stefan-Meier-Str. 26, D-79104, Freiburg, Germany
Rüdiger Klar

About this paper

Cite this paper

Bozdogan, H. (1993). Choosing the Number of Component Clusters in the Mixture-Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix. In: Opitz, O., Lausen, B., Klar, R. (eds) Information and Classification. Studies in Classification, Data Analysis and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-50974-2_5

DOI: https://doi.org/10.1007/978-3-642-50974-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56736-3
Online ISBN: 978-3-642-50974-2
eBook Packages: Springer Book Archive
