The Effect of Measurement Error on Determining the Number of Clusters in Cluster Analysis

  • M. C. Cooper
  • G. W. Milligan

Summary

Market researchers examining market segmentation and other aggregation issues can use cluster analysis to form segments of consumers or organizations. When segments are formed from attitude information, or even from demographic data, measurement error may be present.

Previous research (Milligan and Cooper 1985) indicated that two stopping rules for determining the number of clusters in a data set were superior in the error-free data sets examined. The present research reconfirmed the performance of the pseudo-t and pseudo-F statistics as the best rules over a larger number of replications of error-free data. In addition, it examined the performance of stopping rules under low-error and high-error conditions. The low-error condition represents small measurement error arising from the data collection instrument or from the respondent. The high-error condition is more severe and can obscure clusters by causing cluster boundaries to overlap.

As one would expect, the ability to recover the true cluster structure deteriorated as more error was introduced into the data. For some stopping rules, recovery varied with the number of clusters in the data set. Two-cluster structures were particularly difficult to recover.

The two best stopping rules for the error-free data were also clearly superior in the error-perturbed conditions. Thus, these rules appear to be robust across the conditions tested here.
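
As a rough illustration of how a stopping rule of this kind operates, the sketch below computes a pseudo-F statistic at each level of a hierarchy and selects the level with the largest value, first on error-free data and then on perturbed copies. It assumes the pseudo-F takes the Calinski and Harabasz (1974) form, [B/(k-1)] / [W/(n-k)], where B and W are the between-cluster and within-cluster sums of squares. It is not the authors' procedure: the centroid-linkage clustering, the synthetic three-cluster data, the noise levels, and all function names are assumptions chosen for illustration.

```python
import random


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(clusters):
    """Pseudo-F in the assumed Calinski-Harabasz form: [B/(k-1)] / [W/(n-k)]."""
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    grand = centroid([p for c in clusters for p in c])
    centers = [centroid(c) for c in clusters]
    between = sum(len(c) * sq_dist(m, grand) for c, m in zip(clusters, centers))
    within = sum(sq_dist(p, m) for c, m in zip(clusters, centers) for p in c)
    return (between / (k - 1)) / (within / (n - k))


def hierarchy_levels(points, k_max=6):
    """Centroid-linkage agglomeration (an illustrative choice of method).

    Returns {k: clusters} for every hierarchy level with 2 <= k <= k_max.
    """
    clusters = [[p] for p in points]
    levels = {}
    while len(clusters) > 2:
        centers = [centroid(c) for c in clusters]
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        # Merge the two clusters whose centroids are closest.
        i, j = min(pairs, key=lambda ab: sq_dist(centers[ab[0]], centers[ab[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        if len(clusters) <= k_max:
            levels[len(clusters)] = clusters
    return levels


def choose_k(points, k_max=6):
    """Stopping rule: pick the hierarchy level with the largest pseudo-F."""
    scores = {k: pseudo_f(c) for k, c in hierarchy_levels(points, k_max).items()}
    return max(scores, key=scores.get), scores


def make_clusters(rng, centers, per_cluster=15, spread=0.5):
    """Hypothetical error-free data: tight 2-D Gaussian clusters."""
    return [[rng.gauss(cx, spread), rng.gauss(cy, spread)]
            for cx, cy in centers for _ in range(per_cluster)]


def perturb(rng, points, sd):
    """Add independent Gaussian 'measurement error' to every coordinate."""
    return [[x + rng.gauss(0.0, sd) for x in p] for p in points]


rng = random.Random(0)
clean = make_clusters(rng, [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)])
# The noise levels below are arbitrary stand-ins for the error conditions.
for label, sd in [("error-free", 0.0), ("low error", 0.5), ("high error", 3.0)]:
    best_k, scores = choose_k(perturb(rng, clean, sd))
    print(label, "chosen k =", best_k, "pseudo-F at k=3:", round(scores[3], 1))
```

With widely separated clusters the pseudo-F should peak at the true number of clusters, and the value at that level falls as the added error grows, which mirrors the deterioration in recovery described in the summary. Whether the rule still selects the true level under heavy error depends on the particular data and noise draw.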

Keywords

Statistical Analysis System; Multivariate Normal Distribution; Hierarchy Level; Cluster Criterion; Hierarchical Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Baker FB, Hubert LJ (1972) Measuring the Power of Hierarchical Cluster Analysis. Journal of the American Statistical Association 70: 31–38
  2. Beale EML (1969) Cluster Analysis. Scientific Control Systems, London
  3. Calinski RB, Harabasz JA (1974) A Dendrite Method for Cluster Analysis. Communications in Statistics 3: 1–27
  4. Cooper MC (1987) The Effect of Measurement Error on Determining the Number of Clusters. Working Paper Series, College of Business, The Ohio State University, Columbus, Ohio
  5. Dalrymple-Alford EC (1970) The Measurement of Clustering in Free Recall. Psychological Bulletin 75: 32–34
  6. Davies DL, Bouldin DWA (1979) A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and Machine Intelligence 1: 224–227
  7. Duda RO, Hart PE (1973) Pattern Classification and Scene Analysis. Wiley, New York
  8. Frey T, Van Groenewoud HA (1972) A Cluster Analysis of the D-Squared Matrix of White Spruce Stands in Saskatchewan Based on the Maximum-Minimum Principle. Journal of Ecology 60: 873–886
  9. Johnson SC (1967) Hierarchical Clustering Schemes. Psychometrika 32: 241–254
  10. Milligan GW (1985) An Algorithm for Generating Artificial Test Clusters. Psychometrika 50(1): 123–127
  11. Milligan GW, Cooper MC (1985) An Examination of Procedures for Determining the Number of Clusters in a Data Set. Psychometrika 50(2): 159–179
  12. Milligan GW, Cooper MC (1988) A Review of Clustering Methodology. Applied Psychological Measurement, in press
  13. Mojena R (1977) Hierarchical Grouping Methods and Stopping Rules: An Evaluation. The Computer Journal 20: 359–363
  14. Punj G, Stewart DW (1983) Cluster Analysis in Marketing Research: Review and Suggestions for Application. Journal of Marketing Research 20: 134–148
  15. Rohlf FJ (1974) Methods of Comparing Classifications. Annual Review of Ecology and Systematics 5: 101–113
  16. Sarle WS (1983) Cubic Clustering Criterion. Technical Report A-108, Cary NC, SAS Institute
  17. Sneath PHA (1977) A Method for Testing the Distinctness of Clusters: A Test of the Disjunction of Two Clusters in Euclidean Space as Measured by Their Overlap. Mathematical Geology 9: 123–143

Copyright information

© Springer-Verlag Berlin · Heidelberg 1988

Authors and Affiliations

  • M. C. Cooper (1)
  • G. W. Milligan (1)
  1. Faculty of Marketing, Faculty of Management Sciences, The Ohio State University, Columbus, USA
