A study of standardization of variables in cluster analysis

Milligan, Glenn W.; Cooper, Martha C.

doi:10.1007/BF01897163

Published: September 1988

Volume 5, pages 181–204, (1988)

Journal of Classification

Abstract

A methodological problem in applied clustering involves the decision of whether or not to standardize the input variables prior to the computation of a Euclidean distance dissimilarity measure. Existing results have been mixed, with some studies recommending standardization and others suggesting that it may not be desirable. The existence of numerous approaches to standardization further complicates the decision process. The present simulation study examined the standardization problem. A variety of data structures were generated that varied the intercluster spacing and the scales of the variables. The data sets were examined in four different types of error environments: error-free data, error-perturbed distances, inclusion of outliers, and the addition of random noise dimensions. Recovery of the true cluster structure by four clustering methods was measured at the correct partition level and at reduced levels of coverage. Results for eight standardization strategies are presented. Approaches that standardize by dividing by the range of the variable gave consistently superior recovery of the underlying cluster structure. This result held across different error conditions, separation distances, clustering methods, and coverage levels. The traditional z-score transformation was found to be less effective in several situations.
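The abstract contrasts the traditional z-score with standardization by division by the range, each applied before computing Euclidean distances. The following is a minimal sketch of those two transformations applied variable by variable, not the authors' simulation code. The abstract does not spell out all eight strategies, so only these two families are shown. The function names, the use of the n - 1 sample standard deviation, the decision to reject constant variables, and the toy data are all assumptions made for illustration.

```python
import math
from typing import List

# A data matrix: one inner list per object, one column per variable.
Matrix = List[List[float]]


def _columns(data: Matrix) -> List[List[float]]:
    """Transpose the data so each inner list holds one variable."""
    return [list(col) for col in zip(*data)]


def zscore(data: Matrix) -> Matrix:
    """Traditional z-score per variable: (x - mean) / sd (assumed n - 1 sample sd)."""
    n = len(data)
    params = []
    for col in _columns(data):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        if sd == 0:
            raise ValueError("cannot z-score a constant variable")
        params.append((mean, sd))
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in data]


def range_standardize(data: Matrix, subtract_min: bool = False) -> Matrix:
    """Divide each variable by its range (max - min).

    With subtract_min=True the result is (x - min) / (max - min), which maps
    every variable onto [0, 1]. Subtracting a constant does not change
    Euclidean distances, so both forms yield the same distance matrix.
    """
    params = []
    for col in _columns(data):
        lo, hi = min(col), max(col)
        if hi == lo:
            raise ValueError("cannot range-standardize a constant variable")
        params.append((lo if subtract_min else 0.0, hi - lo))
    return [[(x - shift) / r for x, (shift, r) in zip(row, params)] for row in data]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two objects (rows)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(data: Matrix) -> Matrix:
    """Full symmetric matrix of pairwise Euclidean distances."""
    return [[euclidean(a, b) for b in data] for a in data]


if __name__ == "__main__":
    # Hypothetical data: variable 2 is on a scale about 100x larger than variable 1.
    data = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
    print(round(euclidean(data[0], data[1]), 3))  # 200.002, dominated by variable 2
    print(round(euclidean(*range_standardize(data)[:2]), 3))  # 1.118
    print(round(euclidean(*zscore(data)[:2]), 3))  # 2.236
```

On the raw data the distance between the first two objects is driven almost entirely by the large-scale variable; after either transformation both variables contribute. Because Euclidean distance is unchanged by adding a constant to a variable, x / (max - min) and (x - min) / (max - min) give identical distances, and in the z-score only the division by the standard deviation, not the centering, affects the distances.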

Author information

Authors and Affiliations

Faculty of Management Sciences, The Ohio State University, 301 Hagerty Hall, 43210, Columbus, Ohio, USA
Glenn W. Milligan
Faculty of Marketing, The Ohio State University, 421 Hagerty Hall, 43210, Columbus, Ohio, USA
Martha C. Cooper

About this article

Cite this article

Milligan, G.W., Cooper, M.C. A study of standardization of variables in cluster analysis. Journal of Classification 5, 181–204 (1988). https://doi.org/10.1007/BF01897163

Issue Date: September 1988
DOI: https://doi.org/10.1007/BF01897163
