Weighting Features for Partition around Medoids Using the Minkowski Metric

  • Renato Cordeiro de Amorim
  • Trevor Fenner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7619)

Abstract

In this paper we introduce the Minkowski weighted partition around medoids algorithm (MW-PAM). It extends the popular partition around medoids algorithm (PAM) by automatically assigning K weights to each feature in a dataset, where K is the number of clusters. The weights are computed from the within-cluster variance of each feature, with all dissimilarities measured under the Minkowski metric.
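
As a rough sketch only (our illustration, not the authors' implementation), the two ingredients described above can be written as a weighted Minkowski dissimilarity plus a dispersion-based weight update. The update rule below follows the form used in Minkowski weighted K-means, assumes p > 1, and the small `eps` guard against zero dispersion is our own addition:

```python
import numpy as np

def mw_dissimilarity(x, medoid, weights, p):
    """Weighted Minkowski-p dissimilarity: sum_v (w_v ** p) * |x_v - m_v| ** p.
    The p-th root is omitted, since it does not change which medoid is closest."""
    return np.sum(weights ** p * np.abs(x - medoid) ** p)

def update_weights(X, labels, medoids, p, eps=1e-12):
    """Recompute the K x V weight matrix from within-cluster dispersions:
    D[k, v] = sum over points i in cluster k of |X[i, v] - medoids[k, v]| ** p
    W[k, v] = 1 / sum_u (D[k, v] / D[k, u]) ** (1 / (p - 1)).
    Features that are compact within a cluster (small D) receive larger weights."""
    K, V = medoids.shape
    W = np.empty((K, V))
    for k in range(K):
        D = (np.abs(X[labels == k] - medoids[k]) ** p).sum(axis=0) + eps
        for v in range(V):
            W[k, v] = 1.0 / np.sum((D[v] / D) ** (1.0 / (p - 1)))
    return W
```

Each row of W sums to one, so the K weight vectors give every cluster its own view of feature relevance, which is what lets the method flag features that are irrelevant for some clusters but not others.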

Through extensive experiments we show that MW-PAM, particularly when initialized with the Build algorithm (also using the Minkowski metric), outperforms other medoid-based algorithms in both cluster recovery accuracy and the identification of irrelevant features.
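
For context, Build is Kaufman and Rousseeuw's greedy seeding procedure for PAM. A minimal sketch of its Minkowski-metric version, unweighted here and with a function name of our own choosing, might look like:

```python
import numpy as np

def build_init(X, K, p):
    """Greedy Build seeding: start from the most central point, then repeatedly
    add the candidate that most reduces the total dissimilarity of points to
    their nearest chosen medoid."""
    D = (np.abs(X[:, None, :] - X[None, :, :]) ** p).sum(axis=2)  # pairwise Minkowski-p (p-th power)
    medoids = [int(np.argmin(D.sum(axis=1)))]        # most central point first
    while len(medoids) < K:
        closest = D[:, medoids].min(axis=1)          # each point's nearest chosen medoid
        gains = np.maximum(closest[None, :] - D, 0.0).sum(axis=1)  # improvement each candidate brings
        gains[medoids] = -np.inf                     # never re-select a chosen medoid
        medoids.append(int(np.argmax(gains)))
    return medoids
```

In a weighted setting, the same greedy scheme would simply score candidates with the weighted dissimilarity once feature weights are available.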

Keywords

PAM · Medoids · Minkowski metric · Feature weighting · Build · Lp space


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Renato Cordeiro de Amorim (1)
  • Trevor Fenner (1)
  1. Department of Computer Science and Information Systems, Birkbeck, University of London, UK
