On Initializations for the Minkowski Weighted K-Means

  • Renato Cordeiro de Amorim
  • Peter Komisarczuk
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7619)

Abstract

Minkowski Weighted K-Means is a variant of K-Means set in the Minkowski space that automatically computes a weight for each feature in each cluster. Like K-Means itself, its accuracy depends heavily on the initial centroids it is given. In this paper we discuss experiments comparing six initializations, random initialization and five others set in the Minkowski space, in terms of accuracy, processing time, and recovery of the Minkowski exponent p.

We have found that the Ward method in the Minkowski space tends to outperform other initializations, with the exception of low-dimensional Gaussian Models with noise features. In these, a modified version of intelligent K-Means excels.
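As an illustration of the cluster-specific feature weighting mentioned above, the following is a minimal sketch assuming the formulation commonly attributed to de Amorim and Mirkin: a distance of the form sum_v w_kv^p |y_iv - c_kv|^p, with weights derived from per-feature dispersions within a cluster. The function names, the requirement p > 1, and the assumption of strictly positive dispersions are choices made for this example only, and the sketch does not cover any of the initializations compared in the paper.

```python
def minkowski_weighted_distance(x, centroid, weights, p):
    """Weighted Minkowski distance (to the power p) between a point and a centroid.

    `weights` holds the feature weights of the centroid's cluster.
    """
    return sum((w ** p) * abs(a - c) ** p
               for a, c, w in zip(x, centroid, weights))


def update_weights(dispersions, p):
    """Feature weights for one cluster from its per-feature dispersions.

    Assumes p > 1 and strictly positive dispersions (hypothetical
    preconditions for this sketch). Each weight is
    w_v = 1 / sum_u (D_v / D_u) ** (1 / (p - 1)), so the weights sum to 1
    and features with lower dispersion receive larger weights.
    """
    exponent = 1.0 / (p - 1.0)
    return [1.0 / sum((d_v / d_u) ** exponent for d_u in dispersions)
            for d_v in dispersions]


# Example: the second feature is three times as dispersed as the first,
# so it receives the smaller weight.
example_weights = update_weights([1.0, 3.0], p=2)
example_distance = minkowski_weighted_distance(
    [1.0, 2.0], [0.0, 0.0], example_weights, p=2)
```

In a full clustering loop these two steps would alternate with the assignment and centroid updates; how the initial centroids are chosen is exactly what the paper's experiments vary.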

Keywords

Minkowski K-Means; K-Means Initializations; Lp Space; Minkowski Space; Feature Weighting; Noise Features; intelligent K-Means; Ward Method

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Renato Cordeiro de Amorim (1, 2)
  • Peter Komisarczuk (2)
  1. Department of Computer Science and Information Systems, Birkbeck University of London, UK
  2. School of Computing and Technology, University of West London, UK
