Abstract
This paper introduces Two-Stage Multi-Sample Cluster Analysis (TSMSCA), that is, the problem of first grouping samples and then improving the homogeneity of the groups by reassigning individual objects, as a general approach to ‘classical’ discriminant analysis (DA).
Akaike’s Information Criterion (AIC) and Bozdogan’s CAIC are derived and used in TSMSCA to choose the best-fitting model and the best partition among all possible clustering alternatives. This approach determines the dimension of the discriminant space, and a decision-tree classifier identifies the best lower-dimensional models, yielding a hierarchy of efficient separation and assignment rules. At each step of the hierarchy, the classification performance of the best discriminant model is evaluated either by cross-validation or by conditional clustering.
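As a rough illustration of criterion-based partition selection, the sketch below scores candidate groupings of samples with AIC = -2 log L + 2k and CAIC = -2 log L + k(log n + 1), where k is the number of free parameters and n the number of objects. It assumes multivariate normal clusters with their own means and a common covariance matrix; that model, the function names, and the toy data are illustrative assumptions and are not taken from the chapter.

```python
import numpy as np

def gaussian_loglik_common_cov(clusters):
    """Maximised log-likelihood: cluster-specific means, one shared covariance."""
    n = sum(len(c) for c in clusters)
    p = clusters[0].shape[1]
    # Pooled within-cluster scatter; dividing by n gives the ML covariance estimate.
    scatter = sum((c - c.mean(axis=0)).T @ (c - c.mean(axis=0)) for c in clusters)
    _, logdet = np.linalg.slogdet(scatter / n)
    return -0.5 * n * (p * np.log(2.0 * np.pi) + logdet + p)

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def caic(loglik, k, n):
    return -2.0 * loglik + k * (np.log(n) + 1.0)

def score_partition(samples, partition):
    """Return (AIC, CAIC) for one way of merging the samples into clusters."""
    clusters = [np.vstack([samples[i] for i in block]) for block in partition]
    n = sum(len(c) for c in clusters)
    p = clusters[0].shape[1]
    k = len(clusters) * p + p * (p + 1) // 2  # means + shared covariance
    loglik = gaussian_loglik_common_cov(clusters)
    return aic(loglik, k), caic(loglik, k, n)

# Toy data (made up): samples 0 and 1 coincide, sample 2 is shifted away.
base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]])
samples = [base, base.copy(), base + 5.0]

# All five ways of partitioning three samples.
candidates = [
    [[0, 1, 2]],
    [[0, 1], [2]],
    [[0, 2], [1]],
    [[0], [1, 2]],
    [[0], [1], [2]],
]
best_by_aic = min(candidates, key=lambda part: score_partition(samples, part)[0])
best_by_caic = min(candidates, key=lambda part: score_partition(samples, part)[1])
print(best_by_aic, best_by_caic)
```

Enumerating every partition is only feasible for a handful of samples; the point here is the scoring step, in which the partition with the smallest criterion value is retained.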
Cross-validation reassigns one object at a time based only on the tentatively updated model, whereas the conditional clustering method actually executes reassignments of objects via a transfer and swapping algorithm given the best discriminant model as the initial partition.
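The cross-validation side of this contrast can be sketched as a leave-one-out check: each object is held out in turn, the group means and pooled covariance are re-estimated without it, and the object is then allocated to the nearest group. The allocation rule used here (minimum Mahalanobis distance, equal priors, common covariance) and the function name `loo_reassignments` are assumptions made for illustration; the chapter's own rule may differ. Nothing is actually moved: the function only reports where each object would go.

```python
import numpy as np

def loo_reassignments(X, labels):
    """For each object, the group it would be allocated to when held out.

    Assumes every group keeps at least one object after the hold-out.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)  # sorted group identifiers
    proposed = np.empty_like(labels)
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        Xk, lk = X[keep], labels[keep]
        # Re-estimate the model without object i (the tentative update).
        means = np.array([Xk[lk == g].mean(axis=0) for g in groups])
        resid = Xk - means[np.searchsorted(groups, lk)]
        cov = resid.T @ resid / (len(Xk) - len(groups))
        # Squared Mahalanobis distance from object i to every group mean.
        diff = X[i] - means
        d2 = np.einsum("gj,jk,gk->g", diff, np.linalg.inv(cov), diff)
        proposed[i] = groups[np.argmin(d2)]
    return proposed

# Toy data (made up): two tight groups; the last object sits in group 1's
# region but carries the label 0.
X = [[0, 0], [1, 0], [0, 1], [1, 1],
     [5, 5], [6, 5], [5, 6], [6, 6],
     [5.5, 5.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]

proposed = loo_reassignments(X, labels)
moved = [i for i, (old, new) in enumerate(zip(labels, proposed)) if old != new]
print(moved)
```

A conditional clustering step would differ in that each accepted transfer or swap changes the partition that later decisions are based on, whereas this check always starts from the original labels.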
Numerical examples on real data sets demonstrate the generality and versatility of the proposed approach.
References
Akaike, H. (1973). ‘Information Theory and an Extension of the Maximum Likelihood Principle’ in Second International Symposium on Information Theory, (B.N. Petrov and F. Csaki, editors). Akademiai Kiado: Budapest, 267–281.
Akaike, H. (1974). ‘A New Look at the Statistical Model Identification,’ IEEE Transactions on Automatic Control AC-19, 716–723.
Akaike, H. (1977). ‘On Entropy Maximization Principle’ in Proceedings on Applications of Statistics (P.R. Krishnaiah, editor). North-Holland: Amsterdam, 27–47.
Akaike, H. (1979). ‘A Bayesian Analysis of the Minimum AIC Procedure,’ Annals of the Institute of Statistical Mathematics (Part A) 30, 9–14.
Akaike, H. (1981). ‘Likelihood of a Model and Information Criteria,’ Journal of Econometrics 16, 3–14.
Andrews, D.F., and Herzberg, A.M. (1985). Data. A Collection of Problems from Many Fields for the Student and Research Worker. Springer: New York.
Banfield, C.F., and Bassill, L.C. (1977). ‘Algorithm AS 113: A Transfer Algorithm for Non-hierarchical Classification,’ Applied Statistics 26, 206–210.
Box, G.E.P. (1949). ‘A General Distribution Theory for a Class of Likelihood Criteria,’ Biometrika 36, 317–346.
Box, G.E.P., and Cox, D.R. (1964). ‘An Analysis of Transformations,’ (with discussion), Journal of the Royal Statistical Society (B) 26, 211–252.
Bozdogan, H. (1983). ‘Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria,’ Technical Report No. UIC/DQM/A83-1, June 16, 1983, Army Research Office Contract DAAG29-82-K-0155, University of Illinois at Chicago, Box 4348, Chicago, Illinois 60680.
Bozdogan, H. (1984). ‘AIC-Replacements for Multivariate Multi-Sample Conventional Tests of Homogeneity Models,’ Technical Paper #4 in Statistics, Department of Mathematics, University of Virginia, Charlottesville, VA, 22903.
Bozdogan, H. (1986). ‘Multi-Sample Cluster Analysis as a General Alternative to Multiple Comparison Procedures,’ Bulletin of Informatics and Cybernetics, Research Association of Statistical Sciences 22, 95–130.
Bozdogan, H. (1987). ‘Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions,’ (to appear in the Special Issue of Psychometrika).
Bozdogan, H., and Sclove, S.L. (1984). ‘Multi-Sample Cluster Analysis Using Akaike’s Information Criterion,’ Annals of the Institute of Statistical Mathematics (Part B) 36, 243–253.
Duran, B.S., and Odell, P.L. (1974). Cluster Analysis: A Survey. Springer: New York.
Eisenblätter, D. (1987). Two-Stage Multi-Sample Cluster Analysis, Ph.D. Thesis (anticipated), Seminar für Wirtschafts- und Sozialstatistik der Universität zu Köln.
Fahrmeir, L., and Hamerle, A., editors (1984). Multivariate statistische Verfahren, de Gruyter: Berlin.
Fisher, R.A. (1936). ‘The Use of Multiple Measurements in Taxonomic Problems,’ Annals of Eugenics 7, 179–188.
Ganesalingam, S., and McLachlan, G.J. (1979). ‘A Case Study of Two Clustering Methods Based on Maximum Likelihood,’ Statistica Neerlandica 33, 81–90.
Johnson, R.A., and Wichern, D. (1983). Applied Multivariate Statistical Analysis. Prentice Hall: New York.
Lachenbruch, P.A. (1975). Discriminant Analysis. Hafner Press: New York.
Lachenbruch, P.A., and Mickey, M.R. (1968). ‘Estimation of Error Rates in Discriminant Analysis,’ Technometrics 10, 1–11.
Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979). Multivariate Analysis. Academic Press: New York.
Schwarz, G. (1978). ‘Estimating the Dimension of a Model,’ Annals of Statistics 6, 461–464.
Sclove, S.L. (1977). ‘Population Mixture Models and Clustering Algorithms,’ Communications in Statistics A 6, 417–434.
Sclove, S.L. (1983). ‘Application of the Conditional Population Mixture Model to Image Segmentation,’ IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-5, 428–433.
Seber, G.A. (1984). Multivariate Observations. Wiley: New York.
Späth, H. (1975). Cluster-Analyse-Algorithmen. Oldenbourg: München.
Späth, H. (1983). Cluster-Formation und -Analyse. Oldenbourg: München.
Symons, M.J. (1981). ‘Clustering Criteria and Multivariate Normal Mixtures,’ Biometrics 37, 35–43.
Titterington, D.M., Smith, A.F.M., and Makov, U.E. (1985). Statistical Analysis of Finite Mixture Distributions. Wiley: New York.
Wilks, S.S. (1932). ‘Certain Generalizations in the Analysis of Variance,’ Biometrika 24, 471–494.
Copyright information
© 1987 D. Reidel Publishing Company, Dordrecht, Holland
About this chapter
Cite this chapter
Eisenblätter, D., Bozdogan, H. (1987). Two-Stage Multi-Sample Cluster Analysis as a General Approach to Discriminant Analysis. In: Bozdogan, H., Gupta, A.K. (eds) Multivariate Statistical Modeling and Data Analysis. Theory and Decision Library, vol 8. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-3977-6_6
DOI: https://doi.org/10.1007/978-94-009-3977-6_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-8264-8
Online ISBN: 978-94-009-3977-6
eBook Packages: Springer Book Archive