A study of standardization of variables in cluster analysis
 Glenn W. Milligan,
 Martha C. Cooper
Abstract
A methodological problem in applied clustering involves the decision of whether or not to standardize the input variables prior to the computation of a Euclidean distance dissimilarity measure. Existing results have been mixed, with some studies recommending standardization and others suggesting that it may not be desirable. The existence of numerous approaches to standardization complicates the decision process. The present simulation study examined the standardization problem. A variety of data structures were generated which varied the intercluster spacing and the scales for the variables. The data sets were examined in four different types of error environments. These involved error-free data, error-perturbed distances, inclusion of outliers, and the addition of random noise dimensions. Recovery of true cluster structure as found by four clustering methods was measured at the correct partition level and at reduced levels of coverage. Results for eight standardization strategies are presented. It was found that those approaches which standardize by division by the range of the variable gave consistently superior recovery of the underlying cluster structure. The result held over different error conditions, separation distances, clustering methods, and coverage levels. The traditional z-score transformation was found to be less effective in several situations.
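The two kinds of transformation contrasted in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function names (`zscore`, `range_scale`, `standardize`, `euclidean`) and the toy data are hypothetical, and the range variant shown here subtracts the minimum before dividing by the range, which is one of several ways to "standardize by division by the range".

```python
import math
import statistics


def zscore(column):
    """Center on the mean and divide by the sample standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # assumes the variable is not constant
    return [(x - mean) / sd for x in column]


def range_scale(column):
    """Subtract the minimum and divide by the range (max - min)."""
    lo, hi = min(column), max(column)  # assumes hi > lo
    return [(x - lo) / (hi - lo) for x in column]


def standardize(rows, method):
    """Apply `method` to each variable (column) of a list of observations."""
    columns = [method(list(col)) for col in zip(*rows)]
    return [list(obs) for obs in zip(*columns)]


def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data, made up for illustration: two variables on very different scales.
data = [
    [1.0, 1000.0],
    [2.0, 1100.0],
    [9.0, 5000.0],
    [10.0, 5200.0],
]

raw = euclidean(data[0], data[1])          # dominated by the second variable
by_range = standardize(data, range_scale)  # every variable mapped onto [0, 1]
by_z = standardize(data, zscore)           # every variable: mean 0, unit sd
```

On the raw data the distance between the first two observations is driven almost entirely by the large-scale second variable. After either transformation both variables contribute on comparable scales, and the resulting distance matrix can be passed to whichever clustering method is being used.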
 Journal

Journal of Classification
Volume 5, Issue 2, pp 181-204
 Cover Date
 1988-09-01
 DOI
 10.1007/BF01897163
 Print ISSN
 0176-4268
 Online ISSN
 1432-1343
 Publisher
 Springer-Verlag
 Keywords

 Standard scores
 Cluster analysis
 Authors

 Glenn W. Milligan ^{(1)}
 Martha C. Cooper ^{(2)}
 Author Affiliations

 1. Faculty of Management Sciences, The Ohio State University, 301 Hagerty Hall, 43210, Columbus, Ohio, USA
 2. Faculty of Marketing, The Ohio State University, 421 Hagerty Hall, 43210, Columbus, Ohio, USA