Psychometrika, Volume 50, Issue 1, pp 123–127

An algorithm for generating artificial test clusters

  • Glenn W. Milligan
Computational Psychometrics

Abstract

An algorithm for generating artificial data sets that contain distinct, nonoverlapping clusters is presented. The algorithm is useful for generating test data sets for Monte Carlo validation research on clustering methods or statistics. The algorithm generates data sets that contain 1, 2, 3, 4, or 5 clusters. By default, the data are embedded in a 4-, 6-, or 8-dimensional space. Three different patterns for assigning the points to the clusters are provided. One pattern assigns the points equally to the clusters, while the remaining two schemes produce clusters of unequal sizes. Finally, a number of methods for introducing error into the data have been incorporated in the algorithm.
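The abstract does not say how the clusters are kept apart or how the two unequal-size patterns are defined. The following Python sketch shows one simple way to produce data with the stated properties. It is not Milligan's procedure. Every specific here is a hypothetical choice made for illustration: the function names `cluster_sizes` and `generate`, box-shaped clusters, uniform sampling inside each box, side lengths drawn from [10, 40], the `gap` parameter, and the "one_small" (10%) and "one_large" (60%) patterns. Nonoverlap is guaranteed by laying the clusters end to end along the first dimension with a positive gap between them. Error perturbation is not sketched.

```python
import random


def cluster_sizes(n_points, k, pattern="equal"):
    """Split n_points among k clusters under one of three hypothetical patterns.

    "equal":     every cluster gets (nearly) the same number of points.
    "one_small": cluster 0 gets 10% of the points; the rest share the remainder.
    "one_large": cluster 0 gets 60% of the points; the rest share the remainder.
    """
    if k == 1:
        return [n_points]
    if pattern == "equal":
        first = n_points // k
    elif pattern == "one_small":
        first = round(0.10 * n_points)
    elif pattern == "one_large":
        first = round(0.60 * n_points)
    else:
        raise ValueError(f"unknown pattern: {pattern!r}")
    rest = n_points - first
    sizes = [first] + [rest // (k - 1)] * (k - 1)
    # Hand out any leftover points one at a time so the sizes sum to n_points.
    for i in range(rest - sum(sizes[1:])):
        sizes[1 + i] += 1
    return sizes


def generate(n_points=50, k=3, dims=4, pattern="equal", gap=1.0, seed=None):
    """Return (points, labels) for k clusters that cannot overlap.

    Hypothetical scheme: each cluster is an axis-aligned box whose side
    lengths are drawn uniformly from [10, 40]. Boxes are placed end to end
    along dimension 0 with `gap` units between them, so their ranges on that
    axis are disjoint. Points are sampled uniformly inside their box.
    """
    if not 1 <= k <= 5:
        raise ValueError("the abstract describes 1 to 5 clusters")
    if gap <= 0:
        raise ValueError("gap must be positive to keep clusters apart")
    rng = random.Random(seed)
    points, labels = [], []
    left = 0.0  # where the next box starts on dimension 0
    for label, size in enumerate(cluster_sizes(n_points, k, pattern)):
        lengths = [rng.uniform(10, 40) for _ in range(dims)]
        lows = [left] + [rng.uniform(0, 40) for _ in range(dims - 1)]
        for _ in range(size):
            points.append([lows[d] + rng.uniform(0, lengths[d]) for d in range(dims)])
            labels.append(label)
        left += lengths[0] + gap
    return points, labels
```

For example, `cluster_sizes(50, 4, "one_large")` returns `[30, 7, 7, 6]`, and `generate(n_points=60, k=5, dims=8, seed=7)` yields 60 eight-dimensional points in five clusters whose first-coordinate ranges do not intersect. Separating the clusters along a single axis is the simplest way to guarantee nonoverlap, but it also makes the structure easy to find. A more demanding generator would separate clusters less conspicuously.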

Key words

Classification, Monte Carlo methods, numerical taxonomy



Copyright information

© The Psychometric Society 1985

Authors and Affiliations

  • Glenn W. Milligan (1)
  1. Faculty of Management Sciences, The Ohio State University, Columbus
