The Effects of Initial Values and the Covariance Structure on the Recovery of some Clustering Methods

  • Istvan Hajnal
  • Geert Loosveldt
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Several clustering methods are compared in a simulation study. The data used in the analysis are generated in a mixture modeling framework. The methods included are several hierarchical methods, k-means as implemented in the FASTCLUS procedure of SAS, and cluster analysis by means of normal mixtures with the NORMIX program. We demonstrate that the poor recovery found in some studies for normal-mixture clustering is partly due to the use of bad initial values and partly due to the specification of the covariance structure within the clusters. We further find that an important factor in the relative success of FASTCLUS lies in its initial seed selection.
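The role of initial values can be illustrated with a small sketch. It is not the authors' simulation design and does not reproduce the FASTCLUS seed selection or NORMIX. It assumes three spherical normal components with hypothetical centers, a plain Lloyd-style k-means, and the adjusted Rand index of Hubert and Arabie (1985) as the recovery measure. All function and variable names are illustrative.

```python
import random
from math import comb


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd-style k-means started from the given seeds (illustrative)."""
    centroids = [tuple(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Recompute each centroid; keep the old one if a cluster is empty.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    return labels


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie, 1985) between two partitions."""
    cells, rows, cols = {}, {}, {}
    for a, b in zip(labels_a, labels_b):
        cells[(a, b)] = cells.get((a, b), 0) + 1
        rows[a] = rows.get(a, 0) + 1
        cols[b] = cols.get(b, 0) + 1
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(len(labels_a), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Assumed toy mixture: three spherical normal components, 50 points each.
rng = random.Random(0)
centers = [(0.0, 0.0), (100.0, 0.0), (100.0, 20.0)]
points, truth = [], []
for k, (cx, cy) in enumerate(centers):
    for _ in range(50):
        points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
        truth.append(k)

# Good seeds: one point from each true cluster.
good_seeds = [points[0], points[50], points[100]]
# Bad seeds: two points from the first cluster, none from the third.
bad_seeds = [points[0], points[1], points[50]]

ari_good = adjusted_rand_index(truth, kmeans(points, good_seeds))
ari_bad = adjusted_rand_index(truth, kmeans(points, bad_seeds))
print("ARI with good seeds:", ari_good)
print("ARI with bad seeds:", ari_bad)
```

With the bad seeds, the first cluster stays split and the other two are merged, so recovery drops even though the clusters are well separated. This is the kind of sensitivity to starting values the abstract refers to, shown here only for k-means on a toy example.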

Keywords

Covariance Structure, Cluster Centroid, True Cluster, Normal Mixture, Hierarchical Cluster Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. BAYNE, C.K., BEAUCHAMP, J.J., BEGOVICH, C.L. and KANE, V.E. (1980): Monte Carlo comparisons of selected clustering procedures. Pattern Recognition, 12, 51–6.
  2. DONOGHUE, J.R. (1995): The effects of within-group covariance structure on recovery in cluster analysis. I. The bivariate case. Multivariate Behavioral Research, 30(2), 227–254.
  3. EVERITT, B.S. (1974): Cluster Analysis. Heinemann Educational Books, London, UK.
  4. HUBERT, L. and ARABIE, P. (1985): Comparing partitions. Journal of Classification, 2, 193–218.
  5. MCLACHLAN, G.J. and BASFORD, K.E. (1988): Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York.
  6. MEZZICH, J.E. (1978): Evaluating clustering methods for psychiatric diagnosis. Biological Psychiatry, 13(2), 265–281.
  7. MILLIGAN, G.W. (1980): An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45(3), 325–342.
  8. MILLIGAN, G.W. (1981): A review of Monte Carlo tests of cluster analysis. Multivariate Behavioral Research, 16, 379–407.
  9. MILLIGAN, G.W. (1996): Clustering validation: Results and implications for applied analysis. In: G. De Soete, P. Arabie and L.J. Hubert (Eds.): Clustering and Classification. World Scientific Publ., River Edge, NJ, 341–375.
  10. PRICE, L.J. (1993): Identifying cluster overlap with NORMIX population membership probabilities. Multivariate Behavioral Research, 28(2), 235–262.
  11. SAS Institute Inc. (1989): SAS/STAT User’s Guide, Version 6, Fourth Edition, Volume 1, ANOVA-FREQ. SAS Institute, Cary, NC.
  12. WOLFE, J.H. (1970): Pattern clustering by multivariate mixture analysis. Multivariate Behavioral Research, 5, 329–350.
  13. WOLFE, J.H. (1978): Comparative cluster analysis of patterns of vocational interest. Multivariate Behavioral Research, 13, 33–44.

Copyright information

© Springer-Verlag Berlin · Heidelberg 2000

Authors and Affiliations

  • Istvan Hajnal (1)
  • Geert Loosveldt (1)
  1. Department of Sociology, University of Leuven, Leuven, Belgium
