The Effect of Measurement Error on Determining the Number of Clusters in Cluster Analysis

Cooper, M. C.; Milligan, G. W.

doi:10.1007/978-3-642-73489-2_27


Summary

Market researchers examining market segmentation and other aggregation issues can use cluster analysis to form segments of consumers or organizations. When the segments are formed using attitude information or even demographic data, the possibility of measurement error exists.

Previous research (Milligan and Cooper 1985) indicated that two stopping rules for determining the number of clusters in a data set were superior in the error-free data sets examined. The present research reconfirmed the pseudo-t and pseudo-F statistics as the best rules across a larger number of replications of error-free data. It also examined the performance of stopping rules under low-error and high-error conditions. The low-error condition represents small measurement error arising from the data collection instrument or from the respondent. The high-error condition is more severe and can obscure clusters by making their boundaries overlap.

As one would expect, the ability to recover the true cluster structure deteriorated as more error was introduced into the data. For some stopping rules, recovery varied with the number of clusters in the data set. Data sets containing two clusters were particularly difficult to recover.

The two best stopping rules for the error-free data were also clearly superior in the error-perturbed conditions. Thus, these rules appear to be robust across the conditions tested here.
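As a rough illustration of why added error makes cluster structure harder to detect, the sketch below computes a variance-ratio index of the kind usually called pseudo-F (between-cluster over within-cluster mean squares, as in Calinski and Harabasz 1974) on synthetic data. It assumes that the pseudo-F rule discussed above is this index; the summary does not give its formula. The function names, cluster centers, sample sizes, and noise levels are hypothetical choices made for the example and are not the design used in the study.

```python
import numpy as np


def pseudo_f(X, labels):
    """Variance-ratio index: (between SS / (k - 1)) / (within SS / (n - k))."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    groups = np.unique(labels)
    k = groups.size
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n clusters")
    grand_mean = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for g in groups:
        members = X[labels == g]
        centroid = members.mean(axis=0)
        # Between-cluster sum of squares, weighted by cluster size.
        between += members.shape[0] * np.sum((centroid - grand_mean) ** 2)
        # Within-cluster sum of squares around the cluster centroid.
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))


def make_clusters(noise_sd, rng, n_per_cluster=50):
    """Three separated 2-D clusters plus added error (hypothetical setup)."""
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    labels = np.repeat(np.arange(3), n_per_cluster)
    X = centers[labels] + rng.normal(0.0, 0.5, size=(labels.size, 2))
    # Simulated measurement error: independent normal noise on every value.
    X = X + rng.normal(0.0, noise_sd, size=X.shape)
    return X, labels


if __name__ == "__main__":
    for sd in (0.0, 1.0, 3.0):
        X, labels = make_clusters(sd, np.random.default_rng(0))
        print(f"error sd = {sd}: pseudo-F = {pseudo_f(X, labels):.1f}")
```

When such an index is used as a stopping rule, it is typically evaluated at each level of a clustering hierarchy and the level with the largest value is taken as the number of clusters. Here it is evaluated only on the known partition, to show how the index shrinks as the simulated error grows.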


References

Baker FB, Hubert LJ (1972) Measuring the Power of Hierarchical Cluster Analysis. Journal of the American Statistical Association 70: 31–38
Beale EML (1969) Cluster Analysis. Scientific Control Systems, London
Calinski RB, Harabasz JA (1974) A Dendrite Method for Cluster Analysis. Communications in Statistics 3: 1–27
Cooper MC (1987) The Effect of Measurement Error on Determining the Number of Clusters. Working Paper Series, College of Business, The Ohio State University, Columbus, Ohio
Dalrymple-Alford EC (1970) The Measurement of Clustering in Free Recall. Psychological Bulletin 75: 32–34
Davies DL, Bouldin DWA (1979) A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and Machine Intelligence 1: 224–227
Duda RO, Hart PE (1973) Pattern Classification and Scene Analysis. Wiley, New York
Frey T, Van Groenewoud HA (1972) A Cluster Analysis of the D-Squared Matrix of White Spruce Stands in Saskatchewan Based on the Maximum-Minimum Principle. Journal of Ecology 60: 873–886
Johnson SC (1967) Hierarchical Clustering Schemes. Psychometrika 32: 241–254
Milligan GW (1985) An Algorithm for Generating Artificial Test Clusters. Psychometrika 50(1): 123–127
Milligan GW, Cooper MC (1985) An Examination of Procedures for Determining the Number of Clusters in a Data Set. Psychometrika 50(2): 159–179
Milligan GW, Cooper MC (1988) A Review of Clustering Methodology. Applied Psychological Measurement, in press
Mojena R (1977) Hierarchical Grouping Methods and Stopping Rules: An Evaluation. The Computer Journal 20: 359–363
Punj G, Stewart DW (1983) Cluster Analysis in Marketing Research: Review and Suggestions for Application. Journal of Marketing Research 20: 134–148
Rohlf FJ (1974) Methods of Comparing Classifications. Annual Review of Ecology and Systematics 5: 101–113
Sarle WS (1983) Cubic Clustering Criterion. Technical Report A-108, Cary NC, SAS Institute
Sneath PHA (1977) A Method for Testing the Distinctness of Clusters: A Test of the Disjunction of Two Clusters in Euclidean Space as Measured by Their Overlap. Mathematical Geology 9: 123–143


Author information

Authors and Affiliations

Faculty of Marketing, Faculty of Management Sciences, The Ohio State University, 1775 College Road, Columbus, Ohio, 43210, USA
M. C. Cooper & G. W. Milligan


Editor information

Editors and Affiliations

Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), Kollegiengebäude am Schloß, Bau III, 7500, Karlsruhe 1, Germany
Wolfgang Gaul
Institut für Informatik, Universität der Bundeswehr, Holstenhofweg 85, 2000, Hamburg 70, Germany
Martin Schader


About this paper

Cite this paper

Cooper, M.C., Milligan, G.W. (1988). The Effect of Measurement Error on Determining the Number of Clusters in Cluster Analysis. In: Gaul, W., Schader, M. (eds) Data, Expert Knowledge and Decisions. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-73489-2_27


DOI: https://doi.org/10.1007/978-3-642-73489-2_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-73491-5
Online ISBN: 978-3-642-73489-2