A Median-Based Consensus Rule for Distance Exponent Selection in the Framework of Intelligent and Weighted Minkowski Clustering

  • Renato Cordeiro de Amorim
  • Nadia Tahiri
  • Boris Mirkin
  • Vladimir Makarenkov (corresponding author)
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

The intelligent Minkowski and weighted Minkowski K-means are recently developed clustering algorithms that compute cluster-specific feature weights. These weights follow the intuitive idea that a feature with a low dispersion within a given cluster should receive a greater weight in that cluster than a feature with a high dispersion. The final clustering produced by these techniques depends on the choice of the Minkowski exponent. The median-based central consensus rule introduced in this paper allows one to select an optimal value of this exponent. The rule compares the clustering solutions obtained for different Minkowski exponents using the Adjusted Rand Index (ARI) and selects, as the central (median) solution, the clustering with the highest average ARI with respect to all other solutions. Our experiments, carried out with both real-world and synthetic data, show that the proposed median-based consensus procedure usually outperforms strategies that select the exponent maximizing the Silhouette or Calinski–Harabasz cluster validity index.
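A minimal sketch of the selection rule follows, under simplifying assumptions: the helper minkowski_kmeans is an illustrative, unweighted stand-in that assigns points by Minkowski p-distance but approximates centers by component-wise means, whereas the paper's intelligent weighted Minkowski K-means computes proper Minkowski centers and dispersion-based feature weights. Only select_central_clustering mirrors the consensus rule itself; the function names, exponent grid, and data are hypothetical.

import numpy as np
from sklearn.metrics import adjusted_rand_score

def minkowski_kmeans(X, k, p, n_iter=100, seed=0):
    """Simplified, unweighted stand-in for Minkowski K-means.
    Assignment uses the Minkowski p-distance; centers are approximated
    by component-wise means (an assumption made for brevity; the paper's
    method uses Minkowski centers and feature weights)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for it in range(n_iter):
        # Per-point, per-center Minkowski p-distance (to the p-th power).
        d = (np.abs(X[:, None, :] - centers[None, :, :]) ** p).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def select_central_clustering(labelings):
    """The consensus rule from the abstract: return the index of the
    clustering with the highest average ARI to all other clusterings."""
    n = len(labelings)
    ari = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            ari[i, j] = ari[j, i] = adjusted_rand_score(labelings[i], labelings[j])
    # Average over the other n-1 solutions; the zero diagonal is excluded
    # so a solution is not rewarded for agreeing with itself.
    return int(np.argmax(ari.sum(axis=1) / (n - 1)))

# Usage: cluster the same data for a grid of candidate exponents,
# then pick the central solution and, with it, the exponent.
X = np.random.default_rng(1).normal(size=(200, 4))
exponents = np.arange(1.0, 5.01, 0.5)
labelings = [minkowski_kmeans(X, k=3, p=p) for p in exponents]
best = select_central_clustering(labelings)
print(f"selected Minkowski exponent p = {exponents[best]:.1f}")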

References

  1. Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I.: An extensive comparative study of cluster validity indices. Pattern Recogn. 46, 243–256 (2012)
  2. Ball, G.H., Hall, D.J.: A clustering technique for summarizing multivariate data. Behav. Sci. 12, 153–155 (1967)
  3. Caliński, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. Theory Methods 3, 1–27 (1974)
  4. Chan, E.Y., Ching, W.K., Ng, M.K., Huang, J.Z.: An optimization algorithm for clustering using weighted dissimilarity measures. Pattern Recogn. 37, 943–952 (2004)
  5. de Amorim, R.C., Mirkin, B.: Minkowski metric, feature weighting and anomalous cluster initializing in K-means clustering. Pattern Recogn. 45, 1061–1075 (2012)
  6. Field, A.: Discovering Statistics Using SPSS. SAGE Publications, New Delhi (2005)
  7. Huang, J.Z., Ng, M.K., Rong, H., Li, Z.: Automated variable weighting in k-means type clustering. IEEE Trans. Pattern Anal. Mach. Intell. 27, 657–668 (2005)
  8. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 193–218 (1985)
  9. Jain, A.K.: 50 years beyond K-means. Pattern Recogn. Lett. 31, 651–666 (2010)
  10. Ji, J., Bai, T., Zhou, C., Ma, C., Wang, Z.: An improved k-prototypes clustering algorithm for mixed numeric and categorical data. Neurocomputing 120, 590–596 (2013)
  11. Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: Open Source Scientific Tools for Python (2011)
  12. Lichman, M.: UCI Machine Learning Repository. School of Information and Computer Sciences, University of California, Irvine (2013)
  13. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M., Neyman, J. (eds.) Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297. University of California Press, Berkeley, CA (1967)
  14. Makarenkov, V., Legendre, P.: Optimal variable weighting for ultrametric and additive trees and K-means partitioning. J. Classif. 18, 245–271 (2001)
  15. The MathWorks Inc.: MATLAB 2010. Natick, MA (2010)
  16. Milligan, G.W., Cooper, M.C.: An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159–179 (1985)
  17. Mirkin, B.: Clustering: A Data Recovery Approach. CRC Press, London (2012)
  18. Murtagh, F.: Complexities of hierarchic clustering algorithms: state of the art. Comput. Stat. 1, 101–113 (1984)
  19. Murtagh, F., Contreras, P.: Methods of hierarchical clustering (2011). arXiv preprint arXiv:1105.0121
  20. Pal, S.K., Majumder, D.D.: Fuzzy sets and decision making approaches in vowel and speaker recognition. IEEE Trans. Syst. Man Cybern. 7, 625–629 (1977)
  21. Pollard, K.S., Van Der Laan, M.J.: A method to identify significant clusters in gene expression data. Bepress, pp. 318–325 (2002)
  22. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2013)
  23. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)
  24. Steinley, D.: K-means: a half-century synthesis. Br. J. Math. Stat. Psychol. 59, 1–34 (2006)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Renato Cordeiro de Amorim (1)
  • Nadia Tahiri (2)
  • Boris Mirkin (3, 4)
  • Vladimir Makarenkov (2) (corresponding author)
  1. School of Computer Science, University of Hertfordshire, Hatfield, UK
  2. Département d'informatique, Université du Québec à Montréal, Montreal, Canada
  3. Department of Data Analysis and Machine Intelligence, National Research University Higher School of Economics, Moscow, Russia
  4. Department of Computer Science and Information Systems, Birkbeck, University of London, London, UK