The Effect of Measurement Error on Determining the Number of Clusters in Cluster Analysis

Conference paper in Data, Expert Knowledge and Decisions

Summary

Market researchers examining market segmentation and other aggregation issues can use cluster analysis to form segments of consumers or organizations. When segments are formed from attitudinal information or even demographic data, the data may contain measurement error.

Previous research (Milligan and Cooper 1985) indicated that two stopping rules for determining the number of clusters in a data set were superior on the error-free data sets examined. The present research, using a larger number of replications of error-free data, reconfirmed the pseudo-t and pseudo-F statistics as the best-performing rules. It also examined how the stopping rules perform under low-error and high-error conditions. The low-error condition represents small measurement error, arising either from the data collection instrument or from respondent error. The high-error condition is more severe and can obscure clusters because their boundaries overlap.
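The two statistics named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume that pseudo-F is the Calinski and Harabasz (1974) criterion, [B/(k−1)] / [W/(n−k)], where B and W are the between-cluster and pooled within-cluster sums of squares for a k-cluster partition of n objects. We also assume that "pseudo-t" refers to the pseudo-t² statistic reported by SAS, (W_M − W_A − W_B) / [(W_A + W_B)/(n_A + n_B − 2)], which compares two clusters A and B with their merger M. The helper names (`within_ss`, `pseudo_f`, `pseudo_t2`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dims)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_ss(points: Sequence[Point]) -> float:
    """Sum of squared distances from each point to the centroid of the group."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def pseudo_f(clusters: List[List[Point]]) -> float:
    """Calinski-Harabasz pseudo-F for a partition: [B/(k-1)] / [W/(n-k)]."""
    all_points = [p for cluster in clusters for p in cluster]
    k, n = len(clusters), len(all_points)
    if not 2 <= k < n:
        raise ValueError("pseudo-F requires 2 <= k < n")
    w = sum(within_ss(c) for c in clusters)  # pooled within-cluster SS
    b = within_ss(all_points) - w            # between-cluster SS = total SS - W
    if w == 0:
        raise ValueError("within-cluster SS is zero; pseudo-F is undefined")
    return (b / (k - 1)) / (w / (n - k))


def pseudo_t2(a: List[Point], b: List[Point]) -> float:
    """Pseudo-t^2 for merging clusters a and b (SAS-style definition, assumed)."""
    w_a, w_b = within_ss(a), within_ss(b)
    w_merged = within_ss(a + b)              # within SS if a and b were joined
    df = len(a) + len(b) - 2
    if df <= 0 or w_a + w_b == 0:
        raise ValueError("pseudo-t^2 is undefined for these clusters")
    return (w_merged - w_a - w_b) / ((w_a + w_b) / df)


if __name__ == "__main__":
    a = [(0.0, 0.0), (0.0, 2.0)]
    b = [(10.0, 0.0), (10.0, 2.0)]
    print(pseudo_f([a, b]))  # 50.0
    print(pseudo_t2(a, b))   # 50.0
```

In the four-point example, both statistics equal 50. When only two clusters cover all n points, the two formulas coincide. In general, a large pseudo-F for a partition, or a large pseudo-t² for a proposed merger, is read as evidence for keeping the clusters separate.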

As one would expect, the ability to recover the true cluster structure deteriorated as more error was introduced into the data. For some stopping rules, recovery varied with the number of clusters present in the data set. Structures with two clusters were particularly difficult to recover.

The two best stopping rules for the error-free data were also clearly superior in the error-perturbed conditions. These rules therefore appear to be robust across the conditions tested here.
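The sketch below simulates the kind of experiment the summary describes, to show how error perturbation can interfere with a stopping rule. Every design choice here is an assumption for illustration. The summary does not describe the paper's data generator (Milligan 1985) or its error model. In their place, the sketch makes these choices:

  • Gaussian clusters with additive Gaussian measurement error.
  • Ward's minimum-variance agglomerative clustering, chosen for simplicity.
  • A pseudo-F stopping rule that selects the k in 2..6 with the largest value.

All function names, cluster centers, and `error_sd` levels (0, 1 and 4) are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(p[j] for p in points) / len(points) for j in range(len(points[0]))]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_ss(points: Sequence[Point]) -> float:
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def pseudo_f(clusters: List[List[Point]]) -> float:
    """Calinski-Harabasz pseudo-F: [B/(k-1)] / [W/(n-k)]."""
    all_points = [p for c in clusters for p in c]
    k, n = len(clusters), len(all_points)
    w = sum(within_ss(c) for c in clusters)
    b = within_ss(all_points) - w
    return (b / (k - 1)) / (w / (n - k))


def make_data(centers, n_per, spread, error_sd, rng) -> List[Point]:
    """True points are Gaussian around each center; observed = true + error."""
    data = []
    for center in centers:
        for _ in range(n_per):
            true_point = [x + rng.gauss(0.0, spread) for x in center]
            # Hypothetical error model: independent additive Gaussian noise.
            data.append(tuple(x + rng.gauss(0.0, error_sd) for x in true_point))
    return data


def ward_partitions(points, k_min: int, k_max: int) -> Dict[int, List[List[Point]]]:
    """Ward agglomerative clustering; returns the partition at each k in [k_min, k_max]."""
    clusters = [[p] for p in points]
    partitions: Dict[int, List[List[Point]]] = {}
    while True:
        m = len(clusters)
        if m <= k_max:
            partitions[m] = [list(c) for c in clusters]
        if m <= k_min:
            return partitions
        cents = [centroid(c) for c in clusters]
        best = None
        for i in range(m):
            for j in range(i + 1, m):
                ni, nj = len(clusters[i]), len(clusters[j])
                # Ward merge cost: increase in pooled within-cluster SS.
                cost = ni * nj / (ni + nj) * sq_dist(cents[i], cents[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected


def choose_k(points, k_max: int = 6) -> int:
    """Stopping rule: the k in 2..k_max whose Ward partition maximizes pseudo-F."""
    parts = ward_partitions(points, 2, k_max)
    return max(parts, key=lambda k: pseudo_f(parts[k]))


if __name__ == "__main__":
    rng = random.Random(1988)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]  # three true clusters
    for label, error_sd in [("error-free", 0.0), ("low error", 1.0), ("high error", 4.0)]:
        points = make_data(centers, 20, 1.0, error_sd, rng)
        print(f"{label}: selected k = {choose_k(points)}")
```

Under these assumptions, the error-free run should select k = 3. As `error_sd` grows, the clusters overlap and the selected k may drift away from 3, which illustrates the kind of deterioration described above. Results for a single seed are anecdotal. A study like the one summarized here would repeat such runs over many replications and count how often each rule finds the correct number of clusters.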

References

  • Baker FB, Hubert LJ (1975) Measuring the Power of Hierarchical Cluster Analysis. Journal of the American Statistical Association 70: 31–38

  • Beale EML (1969) Cluster Analysis. Scientific Control Systems, London

  • Calinski RB, Harabasz J (1974) A Dendrite Method for Cluster Analysis. Communications in Statistics 3: 1–27

  • Cooper MC (1987) The Effect of Measurement Error on Determining the Number of Clusters. Working Paper Series, College of Business, The Ohio State University, Columbus, Ohio

  • Dalrymple-Alford EC (1970) The Measurement of Clustering in Free Recall. Psychological Bulletin 75: 32–34

  • Davies DL, Bouldin DW (1979) A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and Machine Intelligence 1: 224–227

  • Duda RO, Hart PE (1973) Pattern Classification and Scene Analysis. Wiley, New York

  • Frey T, Van Groenewoud H (1972) A Cluster Analysis of the D-Squared Matrix of White Spruce Stands in Saskatchewan Based on the Maximum-Minimum Principle. Journal of Ecology 60: 873–886

  • Johnson SC (1967) Hierarchical Clustering Schemes. Psychometrika 32: 241–254

  • Milligan GW (1985) An Algorithm for Generating Artificial Test Clusters. Psychometrika 50: 123–127

  • Milligan GW, Cooper MC (1985) An Examination of Procedures for Determining the Number of Clusters in a Data Set. Psychometrika 50: 159–179

  • Milligan GW, Cooper MC (1988) A Review of Clustering Methodology. Applied Psychological Measurement, in press

  • Mojena R (1977) Hierarchical Grouping Methods and Stopping Rules: An Evaluation. The Computer Journal 20: 359–363

  • Punj G, Stewart DW (1983) Cluster Analysis in Marketing Research: Review and Suggestions for Application. Journal of Marketing Research 20: 134–148

  • Rohlf FJ (1974) Methods of Comparing Classifications. Annual Review of Ecology and Systematics 5: 101–113

  • Sarle WS (1983) Cubic Clustering Criterion. Technical Report A-108, SAS Institute, Cary NC

  • Sneath PHA (1977) A Method for Testing the Distinctness of Clusters: A Test of the Disjunction of Two Clusters in Euclidean Space as Measured by Their Overlap. Mathematical Geology 9: 123–143

Copyright information

© 1988 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Cooper, M.C., Milligan, G.W. (1988). The Effect of Measurement Error on Determining the Number of Clusters in Cluster Analysis. In: Gaul, W., Schader, M. (eds) Data, Expert Knowledge and Decisions. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-73489-2_27

  • DOI: https://doi.org/10.1007/978-3-642-73489-2_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-73491-5

  • Online ISBN: 978-3-642-73489-2

  • eBook Packages: Springer Book Archive