Psychometrika, Volume 45, Issue 3, pp. 325–342

An examination of the effect of six types of error perturbation on fifteen clustering algorithms

Glenn W. Milligan

Abstract

An evaluation of fifteen clustering methods was conducted. Artificial clusters that exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random seed points were used to start the K-means algorithms. However, two alternative starting procedures for the nonhierarchical methods produced greatly enhanced cluster recovery and were found to be robust with respect to all of the types of error examined.
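The contrast the abstract draws, between K-means runs started from random seed points and runs started some other way, can be illustrated with a small sketch. It is not the study's code and makes several assumptions: it uses a plain Lloyd-style K-means; random seeds are drawn as k distinct data points (one common convention); the non-random start simply takes the centroids of a supplied partition, a hypothetical stand-in because the abstract does not describe the two alternative procedures; and recovery is scored with Rand's (1971) index, one standard agreement measure that is not necessarily the one used in the paper. The function names `kmeans`, `centroids`, `random_seeds` and `rand_index`, and the toy data, are all hypothetical.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(points, labels, k):
    """Mean vector of each group 0..k-1 of a partition (every group must be non-empty)."""
    groups = [[p for p, g in zip(points, labels) if g == j] for j in range(k)]
    return [tuple(sum(col) / len(m) for col in zip(*m)) for m in groups]


def kmeans(points, seeds, max_iter=100):
    """Lloyd-style K-means started from the given seed points.

    Assign each point to its nearest center, move each center to the mean of
    its members, and stop once the assignment no longer changes.
    """
    centers = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        updated = []
        for j, old in enumerate(centers):
            members = [p for p, g in zip(points, labels) if g == j]
            # An empty cluster keeps its previous center (an assumed convention).
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            )
        centers = updated
    return labels


def random_seeds(points, k, rng):
    """Random start (assumed convention): k distinct data points drawn uniformly."""
    return rng.sample(points, k)


def rand_index(a, b):
    """Rand index: share of object pairs on which two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical toy data: three well-separated 2-D clusters of five points each.
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
centres = [(0, 0), (10, 0), (0, 10)]
points = [(cx + ox, cy + oy) for cx, cy in centres for ox, oy in offsets]
truth = [c for c in range(3) for _ in offsets]

rng = random.Random(1)
start_random = kmeans(points, random_seeds(points, 3, rng))
# Seeding from the generating partition's centroids recovers it exactly here.
start_partition = kmeans(points, centroids(points, truth, 3))
print(rand_index(truth, start_random), rand_index(truth, start_partition))
```

On data this clean a random start will often succeed too; the sketch only shows where the choice of starting points enters the algorithm, not the magnitude of the effect the study reports.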

Key words

clustering algorithms, clustering validation, Monte Carlo research



Copyright information

© The Psychometric Society 1980

Authors and Affiliations

Glenn W. Milligan, Faculty of Management Sciences, The Ohio State University, Columbus, Ohio
