
Abstract

Several standardization methods are investigated in conjunction with the K-means algorithm under various conditions. We find that traditional standardization methods (i.e., z-scores) are inferior to alternative standardization methods. Suggestions for future work on combining standardization with variable selection are also discussed.
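The abstract describes standardizing variables before running K-means. The Python sketch below (standard library only) shows where that step fits. It applies two column-wise standardizations: the traditional z-score, (x - mean) / sd, and range scaling, (x - min) / (max - min). Range scaling is used here only as one commonly used alternative; the abstract does not say which alternatives the chapter compares or favors. The K-means routine is a plain Lloyd-style batch update seeded with the first k rows, which is a simplifying assumption. The function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def z_score(data: Sequence[Sequence[float]]) -> Matrix:
    """Traditional standardization: (x - mean) / sd, computed per variable (column)."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    # Sample standard deviation with an n - 1 denominator (an assumption).
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in data]


def range_scale(data: Sequence[Sequence[float]]) -> Matrix:
    """Alternative standardization: (x - min) / (max - min), per variable."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in data]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the criterion K-means minimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: Matrix, k: int, max_iter: int = 100) -> List[int]:
    """Lloyd-style K-means seeded with the first k rows (a simplifying assumption)."""
    centroids = [list(row) for row in data[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j, r=row: _sq_dist(r, centroids[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # no observation changed cluster, so the partition is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical data: variable 1 separates two groups; variable 2 has a much
# larger scale but the same mean (100) in both groups.
data = [[1.0, 100.0], [5.0, 105.0], [1.2, 110.0], [5.2, 95.0], [0.8, 90.0], [4.8, 100.0]]
z_labels = kmeans(z_score(data), k=2)
r_labels = kmeans(range_scale(data), k=2)
print("z-score:", z_labels)
print("range:  ", r_labels)
```

Both scalers assume that no variable is constant, because a zero standard deviation or zero range would cause a division by zero. In this tiny example, both standardizations recover the same partition. The sketch does not reproduce the chapter's simulation comparison. It only shows where the standardization step sits relative to the clustering.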




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Steinley, D. (2004). Standardizing Variables in K-means Clustering. In: Banks, D., McMorris, F.R., Arabie, P., Gaul, W. (eds) Classification, Clustering, and Data Mining Applications. Studies in Classification, Data Analysis, and Knowledge Organisation. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17103-1_6


  • DOI: https://doi.org/10.1007/978-3-642-17103-1_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22014-5

  • Online ISBN: 978-3-642-17103-1

  • eBook Packages: Springer Book Archive
