The Effect of Measurement Error on Determining the Number of Clusters in Cluster Analysis
Market researchers examining market segmentation and other aggregation issues can use cluster analysis to form segments of consumers or organizations. When the segments are formed using attitude information or even demographic data, the possibility of measurement error exists.
Previous research (Milligan and Cooper 1985) indicated that two stopping rules for determining the number of clusters in a data set were superior in the error-free data sets examined. The present research reconfirmed the pseudo-t and pseudo-F statistics as the best rules across a larger number of replications of error-free data. In addition, it examined the performance of stopping rules in low-error and high-error conditions. The low-error condition represents small measurement error arising from the data collection instrument or from the respondent. The high-error condition is more severe and can obscure clusters by causing cluster boundaries to overlap.
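To make the idea of a stopping rule concrete, the sketch below computes a pseudo-F value for a given partition. It assumes that pseudo-F refers to the variance-ratio statistic commonly given that name (between-cluster sum of squares over k - 1, divided by within-cluster sum of squares over n - k); the abstract itself does not state the formula. The function names `centroid`, `sq_dist`, and `pseudo_f` are illustrative and do not come from the paper.

```python
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(fmean(coord) for coord in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(points, labels):
    """Variance-ratio (pseudo-F) statistic for one partition (assumed definition).

    points: list of numeric tuples; labels: cluster label for each point.
    """
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("pseudo-F needs 2 <= k < n")

    grand = centroid(points)
    between = 0.0  # between-cluster sum of squares
    within = 0.0   # within-cluster sum of squares
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, grand)
        within += sum(sq_dist(p, c) for p in members)

    if within == 0:
        return float("inf")  # perfectly tight clusters
    return (between / (k - 1)) / (within / (n - k))


# Toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [0, 0, 1, 1]  # partition matching the visible structure
bad = [0, 1, 0, 1]   # partition cutting across it
print(pseudo_f(points, good), pseudo_f(points, bad))
```

Used as a stopping rule, the statistic would be evaluated for the partition at each candidate number of clusters, and the number giving the largest value would be chosen.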
As one would expect, the ability to recover the true cluster structure deteriorated as more error was introduced into the data. For some stopping rules, recovery varied with the number of clusters in the data set. The two-cluster structure was particularly difficult to recover.
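The abstract does not specify how error was introduced. As a rough illustration only, the sketch below assumes independent Gaussian noise added to every coordinate, with a small standard deviation standing in for the low-error condition and a larger one for the high-error condition. The function name `perturb` and the values of `sigma` are hypothetical.

```python
import random


def perturb(points, sigma, seed=0):
    """Return a copy of points with Gaussian noise (std sigma) added to each coordinate.

    This is an assumed error model, not the one used in the study.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


# Two tight, well-separated groups (toy data).
clean = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

low_error = perturb(clean, sigma=0.5)   # hypothetical "low-error" setting
high_error = perturb(clean, sigma=5.0)  # hypothetical "high-error" setting
print(low_error)
print(high_error)
```

With a large enough `sigma`, points from different groups start to overlap, which is the mechanism by which high error can blur cluster boundaries.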
The two best stopping rules for the error-free data were also clearly superior in the error-perturbed conditions. Thus, these rules appear to be robust across the conditions tested here.
- Beale EML (1969) Cluster Analysis. Scientific Control Systems, London
- Cooper MC (1987) The Effect of Measurement Error on Determining the Number of Clusters. Working Paper Series, College of Business, The Ohio State University, Columbus, Ohio
- Duda RO, Hart PE (1973) Pattern Classification and Scene Analysis. Wiley, New York
- Milligan GW, Cooper MC (1988) A Review of Clustering Methodology. Applied Psychological Measurement, in press
- Sarle WS (1983) Cubic Clustering Criterion. Technical Report A-108, SAS Institute, Cary, NC